Update README.md
README.md
@@ -101,6 +101,63 @@ chatbot = pipeline("text-generation", model="behbudiy/Mistral-7B-Instruct-Uz")
chatbot(messages)
```

## Information on Evaluation Method

To evaluate translation, we used the FLORES+ Uz-En / En-Uz datasets, merging the dev and test sets to create a larger evaluation set for each of the Uz-En and En-Uz subsets.
We used the following prompt for one-shot Uz-En evaluation of both the base model and the Uzbek-optimized model (for En-Uz evaluation, we swapped the words "English" and "Uzbek").

```python
prompt = f'''You are a professional Uzbek-English translator. Your task is to accurately translate the given Uzbek text into English.

Instructions:
1. Translate the text from Uzbek to English.
2. Maintain the original meaning and tone.
3. Use appropriate English grammar and vocabulary.
4. If you encounter an ambiguous or unfamiliar word, provide the most likely translation based on context.
5. Output only the English translation, without any additional comments.

Example:
Uzbek: "Bugun ob-havo juda yaxshi, quyosh charaqlab turibdi."
English: "The weather is very nice today, the sun is shining brightly."

Now, please translate the following Uzbek text into English:
"{sentence}"
'''
```

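As a rough illustration of how the two translation directions can share one prompt, the sketch below turns the language names in the prompt above into placeholders and shows one way to merge the dev and test splits. The names `TEMPLATE`, `EXAMPLES`, `build_translation_prompt`, and `merge_splits` are hypothetical, and swapping the example pair together with the language names is an assumption rather than something the evaluation description confirms.

```python
# Hypothetical sketch: the template text is copied from the prompt above,
# with the language names replaced by {src} / {tgt} placeholders.
TEMPLATE = '''You are a professional {src}-{tgt} translator. Your task is to accurately translate the given {src} text into {tgt}.

Instructions:
1. Translate the text from {src} to {tgt}.
2. Maintain the original meaning and tone.
3. Use appropriate {tgt} grammar and vocabulary.
4. If you encounter an ambiguous or unfamiliar word, provide the most likely translation based on context.
5. Output only the {tgt} translation, without any additional comments.

Example:
{src}: "{example_src}"
{tgt}: "{example_tgt}"

Now, please translate the following {src} text into {tgt}:
"{sentence}"
'''

# The one-shot example pair from the prompt above, keyed by language name.
EXAMPLES = {
    "Uzbek": "Bugun ob-havo juda yaxshi, quyosh charaqlab turibdi.",
    "English": "The weather is very nice today, the sun is shining brightly.",
}


def build_translation_prompt(sentence, src="Uzbek", tgt="English"):
    """Fill the template for one direction (assumed: the example pair swaps too)."""
    return TEMPLATE.format(
        src=src,
        tgt=tgt,
        example_src=EXAMPLES[src],
        example_tgt=EXAMPLES[tgt],
        sentence=sentence,
    )


def merge_splits(dev, test):
    """Concatenate dev and test examples into one larger evaluation list."""
    return list(dev) + list(test)
```

Calling `build_translation_prompt(sentence)` with the defaults reproduces the Uz-En prompt; passing `src="English", tgt="Uzbek"` gives the En-Uz variant.
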
To assess the model's ability in Uzbek sentiment analysis, we used the **risqaliyevds/uzbek-sentiment-analysis** dataset, for which we created binary labels (0: Negative, 1: Positive) using the GPT-4o API (see the **behbudiy/uzbek-sentiment-analysis** dataset).
We used the following prompt for the evaluation:

```python
prompt = f'''Given the following text, determine the sentiment as either 'Positive' or 'Negative.' Respond with only the word 'Positive' or 'Negative' without any additional text or explanation.

Text: {text}"
'''
```

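Because the prompt asks for a single word while the dataset labels are 0 and 1, the model's reply has to be mapped back to a label before scoring. The following is a minimal sketch of one way to do that; `parse_sentiment` and `accuracy` are hypothetical helpers, and using plain accuracy as the score is an assumption, since the metric is not stated here.

```python
from typing import Optional, Sequence

# Label mapping taken from the description above (0: Negative, 1: Positive).
LABELS = {"negative": 0, "positive": 1}


def parse_sentiment(response: str) -> Optional[int]:
    """Map a one-word model reply to 0/1, or None if it is not recognized."""
    # Drop surrounding whitespace, quotes, and a trailing period.
    word = response.strip().strip("'\".").lower()
    return LABELS.get(word)


def accuracy(predictions: Sequence[Optional[int]], gold: Sequence[int]) -> float:
    """Fraction of examples whose parsed prediction equals the gold label."""
    if not gold:
        raise ValueError("gold labels must not be empty")
    # An unparseable reply (None) never equals a gold label, so it counts as wrong.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```
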
For Uzbek news classification, we used the **risqaliyevds/uzbek-zero-shot-classification** dataset and asked the model to predict the category of each news article using the following prompt:

```python
prompt = f'''Classify the given Uzbek news article into one of the following categories. Provide only the category number as the answer.

Categories:
0 - Politics (Siyosat)
1 - Economy (Iqtisodiyot)
2 - Technology (Texnologiya)
3 - Sports (Sport)
4 - Culture (Madaniyat)
5 - Health (Salomatlik)
6 - Family and Society (Oila va Jamiyat)
7 - Education (Ta'lim)
8 - Ecology (Ekologiya)
9 - Foreign News (Xorijiy Yangiliklar)

Now classify this article:
"{text}"

Answer (number only):"
'''
```

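Since the prompt requests only a category number, the reply can be converted to a class id with a small parser. The sketch below is one possible approach, not the procedure actually used; `CATEGORIES` mirrors the list in the prompt above, and `parse_category` is a hypothetical helper that treats out-of-range or missing numbers as unparseable.

```python
import re
from typing import Optional

# Category ids and names as listed in the classification prompt.
CATEGORIES = {
    0: "Politics (Siyosat)",
    1: "Economy (Iqtisodiyot)",
    2: "Technology (Texnologiya)",
    3: "Sports (Sport)",
    4: "Culture (Madaniyat)",
    5: "Health (Salomatlik)",
    6: "Family and Society (Oila va Jamiyat)",
    7: "Education (Ta'lim)",
    8: "Ecology (Ekologiya)",
    9: "Foreign News (Xorijiy Yangiliklar)",
}


def parse_category(response: str) -> Optional[int]:
    """Return the first integer in the reply if it is a valid category id."""
    match = re.search(r"[0-9]+", response)
    if match is None:
        return None
    value = int(match.group())
    # Numbers outside 0-9 are treated as invalid answers.
    return value if value in CATEGORIES else None
```
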
## More
For more details and examples, refer to the base model below:
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3