WangchanBERTa Base for Sentiment Analysis

This is a fine-tuned version of the WangchanBERTa model, trained for Thai-language sentiment analysis using simpletransformers.

Model Details

  • Model Name: WangchanBERTa Base Sentiment Analysis
  • Pretrained Base Model: airesearch/wangchanberta-base-att-spm-uncased
  • Architecture: CamemBERT
  • Language: Thai
  • Task: Sentiment Classification

Training Configuration

  • Training Dataset: Not specified
  • Number of Training Epochs: 6
  • Train Batch Size: 16
  • Eval Batch Size: 32
  • Learning Rate: 2e-5
  • Optimizer: AdamW
  • Scheduler: Cosine
  • Gradient Accumulation Steps: 2
  • Seed: 42
  • Training Framework: simpletransformers
  • FP16: Disabled
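
As a rough sketch, the settings above could be written as a simpletransformers argument dictionary. The key names follow the fields of simpletransformers' `ClassificationArgs`. Mapping "Cosine" to `cosine_schedule_with_warmup`, the `build_model` helper, and `num_labels=3` (taken from the three labels in the usage example) are assumptions, not details confirmed by the author.

```python
# Hyperparameters copied from the Training Configuration list.
train_args = {
    "num_train_epochs": 6,
    "train_batch_size": 16,
    "eval_batch_size": 32,
    "learning_rate": 2e-5,
    "optimizer": "AdamW",
    # Assumption: "Cosine" refers to this scheduler option.
    "scheduler": "cosine_schedule_with_warmup",
    "gradient_accumulation_steps": 2,
    "manual_seed": 42,
    "fp16": False,
}

# Gradients are accumulated over 2 batches of 16 before each optimizer step,
# so each update sees 32 examples (assuming a single device).
effective_batch_size = (
    train_args["train_batch_size"] * train_args["gradient_accumulation_steps"]
)
print("Effective batch size:", effective_batch_size)


def build_model(num_labels=3, use_cuda=False):
    """Hypothetical helper: create a classifier configured with train_args."""
    # Imported lazily so the dictionary above can be inspected
    # without simpletransformers installed.
    from simpletransformers.classification import ClassificationModel

    return ClassificationModel(
        "camembert",  # model type matching the CamemBERT architecture
        "airesearch/wangchanberta-base-att-spm-uncased",
        num_labels=num_labels,
        args=train_args,
        use_cuda=use_cuda,
    )
```

Calling `build_model()` downloads the base checkpoint; training would then run through `train_model` on a DataFrame of texts and integer labels, which is not shown here because the dataset is not specified.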

Model Performance

No performance metrics (such as accuracy or F1-score) have been reported for this model.

Usage

To use this model, you can load it as follows:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F
from pythainlp.tokenize import word_tokenize

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
model_name = "Pongsathorn/wangchanberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Map class indices to sentiment labels.
id2label = {
    0: "pos",
    1: "neu",
    2: "neg",
}

input_text = "พนักงานบริการดีมาก สัญญาณก็ดี แต่ร้านอยู่ที่ไหน อยากได้ข้อมูลเพิ่มเติม จะได้ประกาศบนเว็บถูก"

# Segment the Thai text into words, then join the words with spaces.
segmented_text = word_tokenize(input_text, engine="longest")
preprocessed_text = " ".join(segmented_text)

# Tokenize into PyTorch tensors, truncating to the model's maximum length.
inputs = tokenizer(preprocessed_text, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Convert logits to class probabilities and pick the most likely class.
probs = F.softmax(logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()
predicted_label = id2label[predicted_class]

print("Predicted Label (ID):", predicted_class)
print("Predicted Label (Description):", predicted_label)
max_prob = probs.max().item()
print(f"Maximum Probability: {max_prob:.4f}")
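
The final steps of the usage example (softmax, argmax, label lookup) can be illustrated without downloading the model. The sketch below uses plain Python and made-up logits. The `softmax` and `decode` helpers and the example values are hypothetical and only show how three raw scores turn into a label and a confidence.

```python
import math

# Same label mapping as in the usage example.
id2label = {0: "pos", 1: "neu", 2: "neg"}


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits for illustration; real values come from model(**inputs).logits.
example_logits = [2.0, 0.5, -1.0]
label, confidence = decode(example_logits)
print(f"{label} ({confidence:.4f})")
```

Because the largest logit sits at index 0, the example decodes to "pos". With a real model output, `logits[0].tolist()` would supply the list passed to `decode`.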

Model size: 105M params (Safetensors, tensor type F32)