license: artistic-2.0
language:
- en
library_name: transformers
pipeline_tag: text2text-generation
tags:
- code
- keyword-generation
- t5
- english
KeywordGen-v1 Model
KeywordGen-v1 is a T5-based model fine-tuned for keyword generation from a piece of text. Given an input text, the model will return relevant keywords.
Model details
This model is the T5 base model fine-tuned on a custom dataset of texts paired with their keywords. It generates keywords by predicting the relevant words or phrases present in the input text.
Important Usage Note
This model works best on longer inputs and generates 3 keywords as output. For the most accurate results, I recommend inputs of at least 4 to 5 sentences. Shorter inputs may lead to suboptimal keywords.
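A rough pre-check for the length guidance above could look like the sketch below. The helper names `count_sentences` and `has_enough_sentences`, the regex-based sentence split, and the default threshold of 4 are illustrative assumptions, not part of the model or of the Transformers library.

```python
import re


def count_sentences(text: str) -> int:
    """Roughly count sentences by splitting after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def has_enough_sentences(text: str, minimum: int = 4) -> bool:
    """Return True if the text has at least `minimum` sentences (assumed threshold)."""
    return count_sentences(text) >= minimum
```

This is only a heuristic: abbreviations such as "e.g." are counted as sentence ends, so treat the result as a hint rather than a strict gate.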
Suggestion for Usage
This model was made to generate keywords from reviews. For best results, combine multiple reviews into a single text and pass that to the model as one input.
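Combining reviews can be as simple as joining them with spaces and adding the "Keyword: " prefix that the model expects. The function name `build_input` and the choice to drop empty reviews are assumptions made for illustration.

```python
from typing import Iterable


def build_input(reviews: Iterable[str], prefix: str = "Keyword: ") -> str:
    """Join several reviews into one prefixed input string for the model."""
    # Trim whitespace and skip empty entries so the joined text stays clean.
    cleaned = [r.strip() for r in reviews if r and r.strip()]
    return prefix + " ".join(cleaned)
```

For example, `build_input(["Great sound.", "Battery lasts long."])` returns `"Keyword: Great sound. Battery lasts long."`, which can be passed to the tokenizer as a single input.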
How to use
You can use this model in your application with the Hugging Face Transformers library. Prefix your input with "Keyword: " so that the model generates keywords. Here is an example:
from transformers import T5TokenizerFast, T5ForConditionalGeneration
# Load the tokenizer and model
tokenizer = T5TokenizerFast.from_pretrained('mrutyunjay-patil/keywordGen-v1')
model = T5ForConditionalGeneration.from_pretrained('mrutyunjay-patil/keywordGen-v1')
# Define the input text
input_text = "Keyword: I recently purchased the new headphones and they are incredible. The sound quality is superb, providing crystal clear audio in all ranges. The noise-cancelling feature is very effective, blocking out almost all ambient noise. I also love the comfortable design - they fit perfectly over my ears and don't cause any discomfort, even after long periods of use. The battery life is also impressive, lasting up to 20 hours on a single charge. Overall, I'm extremely satisfied with this product."
# Encode the input text
input_ids = tokenizer.encode(input_text, return_tensors='pt')
# Generate the keywords
outputs = model.generate(input_ids)
# Decode the outputs
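# skip_special_tokens drops markers such as <pad> and </s> from the decoded text
keywords = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(keywords)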
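The decoded output is a single string. If you need the keywords as a list, a small helper like the one below may help. It assumes the keywords are separated by commas, which this card does not specify, and that special tokens such as `<pad>` and `</s>` have already been removed (for example by passing `skip_special_tokens=True` to `decode`). The name `split_keywords` is hypothetical.

```python
from typing import List


def split_keywords(decoded: str, separator: str = ",") -> List[str]:
    """Split a decoded output string into trimmed, non-empty keywords.

    The comma separator is an assumption; adjust it to match the actual output.
    """
    return [k.strip() for k in decoded.split(separator) if k.strip()]
```

If the model turns out to use a different delimiter, pass it through the `separator` argument.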
Limitations and bias
As this is the first version, the model might perform poorly on texts that are very different from the texts in the training data. It might also be biased towards the types of text or keywords that are overrepresented in the training data.