---
license: apache-2.0
datasets:
- sekarmulyani/ulasan-beauty-products
language:
- id
metrics:
- f1
- roc_auc
- accuracy
pipeline_tag: text-classification
widget:
- text: mengecewakan, sebulan baru sampai, botolnya pecah
  example_title: Contoh 1
- text: kulit saya jadi kering dan bruntusan
  example_title: Contoh 2
- text: barang sudah datang tapi kurirnya lemot
  example_title: Contoh 3
- text: lipbalm oke, sesuai harga
  example_title: Contoh 4
- text: recommended buat yang pengen keliatan cantik
  example_title: Contoh 5
library_name: transformers
tags:
- rating classification
- e-commerce
---

# E-Commerce Review Rating Classification: Women's Beauty Products

## en:

This model is fine-tuned from the [IndoBERTweet](https://huggingface.co/indolem/indobertweet-base-uncased) pretrained model. Its primary objective is to address a bias frequently encountered in product reviews on e-commerce platforms: the disconnect between the language used in a review and the rating the user gives. The model was developed with a focus on mitigating this problem.

Several limitations should be noted. First, the model was fine-tuned only on reviews of women's beauty products, so it may not generalize to other product categories. Second, the training dataset was collected by scraping. While efficient, this technique risks introducing biases of its own.

The model's output has substantial potential to improve the quality of reviews on e-commerce platforms, but it still requires critical review. In-depth evaluation is needed to determine how far the model succeeds in addressing bias in product reviews.
Understanding the limitations and potential impact of the scraped dataset is also essential when interpreting the model's outputs.

> This project is academic work, undertaken as a graduation requirement of the Information System undergraduate program at the Computer Science Faculty, Amikom University of Purwokerto.

---

## Training Data (WIP)

This model is the **epoch 4** checkpoint.

| **Epoch** | **Training Loss** | **Validation Loss** | **F1** | **ROC AUC** | **Validation Accuracy** | **Test Accuracy** |
|-----------|-------------------|---------------------|--------|-------------|-------------------------|-------------------|
| 1 | 0.374800 | 0.374789 | 0.438253 | 0.643794 | 0.344436 | 51.56% |
| 2 | 0.346300 | 0.367311 | 0.469088 | 0.660424 | 0.384696 | 52.22% |
| 3 | 0.311500 | 0.386395 | 0.480959 | 0.669563 | 0.423579 | 51.25% |
| **4** | **0.261800** | **0.431841** | **0.496517** | **0.680931** | **0.458986** | **51.27%** |
| 5 | 0.222300 | 0.478353 | 0.495308 | 0.681398 | 0.468297 | 50.43% |
| 6 | 0.198800 | 0.536330 | 0.496174 | 0.682431 | 0.473149 | 50.69% |
| 7 | 0.166200 | 0.608345 | 0.492919 | 0.680791 | 0.472166 | 49.78% |
| 8 | 0.142400 | 0.651709 | 0.496586 | 0.683545 | 0.480428 | 50.07% |

Note: The low accuracy might be attributable to reviewer bias in the validation and test datasets.

---

# E-Commerce Review Rating Classification: Women's Beauty Products

## id:

Building this model involved a series of steps, starting from the [IndoBERTweet](https://huggingface.co/indolem/indobertweet-base-uncased) pretrained model. The main goal behind its development is to address the bias that often appears in product reviews on e-commerce platforms. A classic problem on such platforms is the mismatch between the words in a review and the rating the user gives. This model was created with a focus on solving that problem.

Its development, however, has several limitations worth noting. First, the model was fine-tuned only on reviews of women's beauty products, so its capability is limited to this category. In addition, the training dataset was produced by scraping. Although efficient, this technique can introduce bias in review ratings when the data is not supervised.

The model's final output, while it has great potential to improve the quality of reviews on e-commerce platforms, still needs critical review. In-depth evaluation is needed to understand how far the model succeeds in addressing bias in product reviews. Understanding the limitations and potential impact of the scraped dataset is also important when interpreting the model's output.

> This project is intended for academic achievement and was carried out as a requirement for obtaining a bachelor's degree in the Information Systems Study Program, Faculty of Computer Science, Universitas Amikom Purwokerto.

---

# BibTeX

```
@misc{sekar_mulyani_2023,
  author    = { {Sekar Mulyani} },
  title     = { indobertweet-ulasan-beauty-products (Revision 7dbef46) },
  year      = 2023,
  url       = { https://huggingface.co/sekarmulyani/indobertweet-ulasan-beauty-products },
  doi       = { 10.57967/hf/1033 },
  publisher = { Hugging Face }
}
```
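# Usage sketch

The card declares `library_name: transformers` and `pipeline_tag: text-classification`, so the checkpoint can presumably be loaded through the standard `pipeline` API. The following is a minimal sketch under that assumption, not the author's own inference code. The model id is taken from the BibTeX URL; the helper names `top_label` and `classify` are hypothetical, and the label names the model returns depend on the checkpoint's configuration, which this card does not list.

```python
from typing import List

# Model id taken from the URL in the BibTeX entry of this card.
MODEL_ID = "sekarmulyani/indobertweet-ulasan-beauty-products"


def top_label(scores: List[dict]) -> str:
    """Return the label with the highest score from one pipeline result.

    `scores` is a list of {"label": str, "score": float} dicts, the shape
    the text-classification pipeline returns when asked for all labels.
    """
    return max(scores, key=lambda item: item["score"])["label"]


def classify(texts: List[str]) -> List[str]:
    """Predict one label per review (hypothetical helper).

    Assumes `transformers` and a backend such as PyTorch are installed,
    and that the checkpoint can be downloaded from the Hugging Face Hub.
    """
    # Imported lazily so that top_label stays usable without transformers.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    # top_k=None asks for the score of every label, not only the best one.
    results = classifier(texts, top_k=None)
    return [top_label(scores) for scores in results]
```

Calling `classify` with a list of Indonesian reviews, such as the widget examples in the front matter, would return one label string per review. Given the accuracy figures in the training table, any prediction should be treated as a rough signal rather than a reliable rating.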