---
license: gpl-3.0
language:
- en
tags:
- feature extraction
- mobile apps
- reviews
- token classification
- named entity recognition
pipeline_tag: token-classification
widget:
- text: "The share note file feature is completely useless."
  example_title: "Example 1"
- text: "Great app I've tested a lot of free habit tracking apps and this is by far my favorite."
  example_title: "Example 2"
- text: "The only negative feedback I can give about this app is the difficulty level to set a sleep timer on it."
  example_title: "Example 3"
- text: "Does what you want with a small pocket size checklist reminder app"
  example_title: "Example 4"
- text: "Very bad because call recording notification send other person"
  example_title: "Example 5"
- text: "I originally downloaded the app for pomodoro timing, but I stayed for the project management features, with syncing."
  example_title: "Example 6"
- text: "It works accurate and I bought a portable one lap gps tracker it have a great battery Life"
  example_title: "Example 7"
- text: "I'm my phone the notifications of group message are not at a time please check what was the reason behind it because due to this default I loose some opportunity"
  example_title: "Example 8"
- text: "There is no setting for recurring alarms"
  example_title: "Example 9"
---

# T-FREX XLNet base model

---

Please cite this research as:

_Q. Motger, A. Miaschi, F. Dell’Orletta, X. Franch, and J. Marco, ‘T-FREX: A Transformer-based Feature Extraction Method from Mobile App Reviews’, in Proceedings of The IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2024. Pre-print available at: https://arxiv.org/abs/2401.03833_

---

T-FREX is a transformer-based feature extraction method for mobile app reviews that fine-tunes Large Language Models (LLMs) on a named entity recognition task. We collect a dataset of ground-truth features from users of a real crowdsourced software recommendation platform, and we use this dataset to fine-tune multiple LLMs under different data configurations. We assess the performance of T-FREX with respect to this ground truth, and we complement our analysis by comparing T-FREX with a baseline method from the field. Finally, we assess the quality of new features predicted by T-FREX through an external human evaluation. Results show that, on average, T-FREX outperforms the traditional syntactic-based method, especially when discovering new features from a domain for which the model has been fine-tuned.
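
The comparison against ground truth described above can be illustrated with a small scoring sketch. It assumes that features are compared as exact, case-insensitive string matches and scored with precision, recall and F1. The helper name `score_features` and the sample features are hypothetical, and the evaluation protocol used in the paper may differ.

```python
def score_features(predicted, expected):
    """Return (precision, recall, f1) for exact, case-insensitive matches.

    Assumption: each feature is a plain string and duplicates are ignored.
    """
    pred = {p.strip().lower() for p in predicted}
    gold = {g.strip().lower() for g in expected}
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: two of the three predictions match the annotated features.
precision, recall, f1 = score_features(
    predicted=["share note file", "sleep timer", "dark mode"],
    expected=["Share note file", "sleep timer"],
)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```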

Source code for data generation, fine-tuning and model inference is available in the original [GitHub repository](https://github.com/gessi-chatbots/t-frex/).

## Model description

This version of T-FREX has been fine-tuned for [token classification](https://huggingface.co/docs/transformers/tasks/token_classification#train) from the [XLNet base model](https://huggingface.co/xlnet-base-cased).

## Model variations

T-FREX includes a set of released, fine-tuned models that are compared in the original study (pre-print available at https://arxiv.org/abs/2401.03833).

- [**t-frex-bert-base-uncased**](https://huggingface.co/quim-motger/t-frex-bert-base-uncased)
- [**t-frex-bert-large-uncased**](https://huggingface.co/quim-motger/t-frex-bert-large-uncased)
- [**t-frex-roberta-base**](https://huggingface.co/quim-motger/t-frex-roberta-base)
- [**t-frex-roberta-large**](https://huggingface.co/quim-motger/t-frex-roberta-large)
- [**t-frex-xlnet-base-cased**](https://huggingface.co/quim-motger/t-frex-xlnet-base-cased)
- [**t-frex-xlnet-large-cased**](https://huggingface.co/quim-motger/t-frex-xlnet-large-cased)
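
Since all released variants share one naming scheme, switching between them can be reduced to a small lookup. The sketch below is only a convenience: `TFREX_VARIANTS` and `tfrex_model_id` are hypothetical names, not part of the T-FREX code base, and the ids are derived from the links listed above.

```python
# Base checkpoint names taken from the list of released models.
TFREX_VARIANTS = (
    "bert-base-uncased",
    "bert-large-uncased",
    "roberta-base",
    "roberta-large",
    "xlnet-base-cased",
    "xlnet-large-cased",
)


def tfrex_model_id(variant):
    """Map a base checkpoint name to its T-FREX model id (hypothetical helper)."""
    if variant not in TFREX_VARIANTS:
        raise ValueError(f"Unknown T-FREX variant: {variant!r}")
    return f"quim-motger/t-frex-{variant}"


# Print the model id of every released variant.
for variant in TFREX_VARIANTS:
    print(tfrex_model_id(variant))
```

The returned id can be used wherever a model name is expected, for example as `model_name` in the usage snippet in the "How to use" section.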

## How to use

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the pre-trained model and tokenizer
model_name = "quim-motger/t-frex-xlnet-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Create a pipeline for named entity recognition
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

# Example text
text = "The share note file feature is completely useless."

# Perform named entity recognition
entities = ner_pipeline(text)

# Print the recognized entities
for entity in entities:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Score: {entity['score']:.4f}")

# Example with multiple texts
texts = [
    "Great app I've tested a lot of free habit tracking apps and this is by far my favorite.",
    "The only negative feedback I can give about this app is the difficulty level to set a sleep timer on it."
]

# Perform named entity recognition on multiple texts
for text in texts:
    entities = ner_pipeline(text)
    print(f"Text: {text}")
    for entity in entities:
        print(f"  Entity: {entity['word']}, Label: {entity['entity']}, Score: {entity['score']:.4f}")
```
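
The pipeline above returns one entry per token, so a multi-word feature such as "share note file" arrives as several pieces. The following sketch shows one possible way to merge those pieces into whole feature phrases using the character offsets in the pipeline output. It assumes the labels follow a `B-feature` / `I-feature` scheme and that each entry carries `start` and `end` offsets; `merge_feature_spans` is a hypothetical helper, and the entities below are hand-written mock values rather than real model output.

```python
def merge_feature_spans(text, entities):
    """Group consecutive B-/I- tagged tokens into feature phrases.

    `entities` is a list of dicts shaped like the token-level output of the
    `ner` pipeline, with `entity`, `score`, `start` and `end` keys.
    """
    spans = []
    for token in sorted(entities, key=lambda t: t["start"]):
        label = token["entity"]
        if label == "O":
            continue
        prefix = label.split("-", 1)[0]
        # An I- token extends the previous span only if it directly follows it
        # (at most one character, such as a space, in between).
        extends_previous = (
            prefix == "I"
            and len(spans) > 0
            and token["start"] - spans[-1]["end"] <= 1
        )
        if extends_previous:
            spans[-1]["end"] = token["end"]
            spans[-1]["scores"].append(token["score"])
        else:
            spans.append(
                {"start": token["start"], "end": token["end"], "scores": [token["score"]]}
            )
    return [
        {
            "feature": text[span["start"]:span["end"]],
            "score": sum(span["scores"]) / len(span["scores"]),
        }
        for span in spans
    ]


text = "The share note file feature is completely useless."
# Mock token-level predictions (assumed labels and made-up scores).
mock_entities = [
    {"entity": "B-feature", "score": 0.99, "word": "▁share", "start": 4, "end": 9},
    {"entity": "I-feature", "score": 0.98, "word": "▁note", "start": 10, "end": 14},
    {"entity": "I-feature", "score": 0.97, "word": "▁file", "start": 15, "end": 19},
]
features = merge_feature_spans(text, mock_entities)
for item in features:
    print(f"Feature: {item['feature']}, Mean score: {item['score']:.4f}")
```

The `pipeline` function also accepts an `aggregation_strategy` argument (for example `"simple"`) that groups tokens inside the library; the manual version is shown here only to make the grouping logic explicit.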