Topic Change Point Detection Model
Model Details
- Model Name: Falconsai/topic_change_point
- Model Type: Fine-tuned google/t5-small
- Language: English
- License: MIT
Overview
The Topic Change Point Detection model is designed to identify topics and track how they change within a block of text. It is based on the google/t5-small model, fine-tuned on a custom dataset that maps texts to their respective topic changes. This model can be used to analyze and categorize texts according to their topics and the transitions between them.
Model Architecture
The base model architecture is T5 (Text-To-Text Transfer Transformer), which treats every NLP problem as a text-to-text problem. The specific version used here is google/t5-small, which has been fine-tuned to identify topics and predict where they change within a text.
Fine-Tuning Data
The model was fine-tuned on a dataset of texts paired with their corresponding topic changes. The dataset is expected as a file with two columns: text and topic_changes.
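As a rough illustration of the two-column layout, the sketch below writes and reads such a dataset with Python's standard csv module. The CSV format, the example row, and the helper names write_rows and read_rows are assumptions made for this example; the card does not state the actual file format or how the topic_changes values are encoded.

```python
import csv
import io

# Column names taken from the card; everything else here is illustrative.
REQUIRED_COLUMNS = ("text", "topic_changes")


def write_rows(rows, handle):
    """Write rows (dicts with the two required keys) as CSV with a header."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def read_rows(handle):
    """Read rows back, checking that both required columns are present."""
    reader = csv.DictReader(handle)
    missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Skip rows whose text field is empty.
    return [dict(row) for row in reader if row["text"].strip()]


# Made-up example row; the real training data and its label format are unknown.
example_rows = [
    {
        "text": "We talked about the weather. Then we moved on to work deadlines.",
        "topic_changes": "weather -> work deadlines",
    },
]

buffer = io.StringIO()  # in-memory stand-in for a file on disk
write_rows(example_rows, buffer)
buffer.seek(0)
loaded = read_rows(buffer)
print(loaded)
```

Validating the header up front makes a mislabeled column fail early rather than during tokenization.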
Intended Use
The model is intended for identifying topics and detecting topic changes across a block of text. Its initial use case is session assessment in psychology and psychiatry. It can also be applied to content analysis, document insights, conversational analysis, and other areas where understanding the flow of topics is important.
How to Use
Inference
To use this model for inference, you need to load the fine-tuned model and tokenizer. Here is an example of how to do this using the transformers library:
Running Pipeline
# Use a pipeline as a high-level helper
from transformers import pipeline
text_block = 'Your block of text here.'
pipe = pipeline("summarization", model="Falconsai/topic_change_point")
res1 = pipe(text_block, max_length=1024, min_length=512, do_sample=False)
print(res1)
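The summarization pipeline in transformers returns a list of dictionaries, one per input, each with a "summary_text" key. The sketch below shows one way to pull out the generated strings. The helper name summary_texts is hypothetical, and sample_output is a made-up placeholder with the same shape as a real result, not actual model output.

```python
def summary_texts(pipeline_output):
    """Return the generated strings from a summarization pipeline result."""
    return [item["summary_text"] for item in pipeline_output]


# Placeholder mimicking the structure of a pipeline result (not real output).
sample_output = [{"summary_text": "placeholder generated text"}]

texts = summary_texts(sample_output)
print(texts[0])
```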
Running on CPU
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Falconsai/topic_change_point")
model = AutoModelForSeq2SeqLM.from_pretrained("Falconsai/topic_change_point")
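input_text = 'Your block of text here.'
# Tokenize the input and return PyTorch tensors
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
# Generate the output sequence on CPU
outputs = model.generate(input_ids)
# Decode the generated tokens, dropping special tokens such as padding
print(tokenizer.decode(outputs[0], skip_special_tokens=True))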
Running on GPU
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Falconsai/topic_change_point")
model = AutoModelForSeq2SeqLM.from_pretrained("Falconsai/topic_change_point", device_map="auto")
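input_text = 'Your block of text here.'
# Tokenize the input and move the tensors to the GPU
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
# Decode the generated tokens, dropping special tokens such as padding
print(tokenizer.decode(outputs[0], skip_special_tokens=True))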
Training
The training process involves the following steps:
- Load and Explore Data: Load the dataset and perform initial exploration to understand the data distribution.
- Preprocess Data: Tokenize the text blocks and prepare them for the T5 model.
- Fine-Tune Model: Fine-tune the google/t5-small model using the preprocessed data.
- Evaluate Model: Evaluate the model's performance on a validation set to ensure it's learning correctly.
- Save Model: Save the fine-tuned model for future use.
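The data-preparation steps above can be sketched as follows. This is a minimal illustration, not the author's training script: the function names, the 10% validation fraction, the seed, and the length limits are all assumptions. The preprocess function expects an object that behaves like a Hugging Face tokenizer (callable with text, text_target, max_length, and truncation), such as the tokenizer of the base model.

```python
import random


def train_validation_split(rows, validation_fraction=0.1, seed=42):
    """Shuffle rows deterministically and split them into (train, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation example, even for tiny datasets.
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def preprocess(batch, tokenizer, max_input_len=512, max_target_len=128):
    """Tokenize a batch of texts and targets for a seq2seq model.

    `batch` is a dict with list-valued "text" and "topic_changes" entries.
    The length limits are illustrative assumptions, not documented values.
    """
    model_inputs = tokenizer(
        batch["text"], max_length=max_input_len, truncation=True
    )
    labels = tokenizer(
        text_target=batch["topic_changes"],
        max_length=max_target_len,
        truncation=True,
    )
    # Seq2seq training expects the target token ids under the "labels" key.
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

A function shaped like preprocess can typically be applied over a dataset in batches before handing the result to a trainer. The fine-tuning, evaluation, and saving steps are left out here because the card does not specify the hyperparameters used.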
Evaluation
The model's performance should be evaluated on a separate validation set to ensure it accurately predicts topic changes. Metrics such as accuracy, precision, recall, and F1 score can be used to assess its performance.
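One possible way to compute precision, recall, and F1 for change points is sketched below. It assumes that predicted and reference change points have already been converted to integer positions (for example, sentence indices). That representation, the tolerance parameter, and the function name change_point_scores are assumptions for illustration, since the card does not document the model's output format or the evaluation procedure actually used.

```python
def change_point_scores(predicted, reference, tolerance=0):
    """Return (precision, recall, f1) for predicted vs. reference change points.

    A predicted point counts as correct if it lies within `tolerance`
    positions of a reference point that has not been matched yet.
    Matching is greedy in sorted order, which is a simplification.
    """
    predicted_points = sorted(set(predicted))
    unmatched = sorted(set(reference))
    n_reference = len(unmatched)

    true_positives = 0
    for point in predicted_points:
        candidates = [r for r in unmatched if abs(r - point) <= tolerance]
        if candidates:
            # Match the closest remaining reference point and consume it.
            closest = min(candidates, key=lambda r: abs(r - point))
            unmatched.remove(closest)
            true_positives += 1

    precision = true_positives / len(predicted_points) if predicted_points else 0.0
    recall = true_positives / n_reference if n_reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up positions: two of three predictions fall within one position
# of a reference change point.
scores = change_point_scores([3, 8, 15], [3, 9, 20], tolerance=1)
print(scores)
```

A nonzero tolerance is a design choice: it gives credit for near misses, which may be reasonable when the exact boundary between topics is itself ambiguous.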
Limitations
- Data Dependency: The model's performance is highly dependent on the quality and representativeness of the training data.
- Generalization: The model may not generalize well to conversation texts that are significantly different from the training data.
Ethical Considerations
When deploying the model, be mindful of the ethical implications, including but not limited to:
- Privacy: Ensure that text data used for training and inference does not contain sensitive or personally identifiable information.
- Bias: Be aware of potential biases in the training data that could affect the model's predictions.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Citation
If you use this model in your research, please cite it as follows:
@misc{topic_change_point,
author = {Michael Stattelman},
title = {Topic Change Point Detection},
year = {2024},
publisher = {Falcons.ai},
}