Commit cfa72b8 by azimjon
1 parent: 8318dde

Update README.md

Files changed (1)
  1. README.md +57 -0
README.md CHANGED
@@ -101,6 +101,63 @@ chatbot = pipeline("text-generation", model="behbudiy/Mistral-7B-Instruct-Uz")
  chatbot(messages)
  ```
 
+ ## Information on Evaluation Method
+
+ To evaluate translation, we used the FLORES+ Uz-En and En-Uz datasets, merging the dev and test sets to create a larger evaluation set for each direction.
+ We used the following prompt for one-shot Uz-En evaluation of both the base model and the Uzbek-optimized model (for En-Uz evaluation, we swapped the words "English" and "Uzbek").
+
+ ```python
+ prompt = f'''You are a professional Uzbek-English translator. Your task is to accurately translate the given Uzbek text into English.
+
+ Instructions:
+ 1. Translate the text from Uzbek to English.
+ 2. Maintain the original meaning and tone.
+ 3. Use appropriate English grammar and vocabulary.
+ 4. If you encounter an ambiguous or unfamiliar word, provide the most likely translation based on context.
+ 5. Output only the English translation, without any additional comments.
+
+ Example:
+ Uzbek: "Bugun ob-havo juda yaxshi, quyosh charaqlab turibdi."
+ English: "The weather is very nice today, the sun is shining brightly."
+
+ Now, please translate the following Uzbek text into English:
+ "{sentence}"
+ '''
+ ```
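The direction swap described above could be expressed as one parameterized template. The following is a minimal sketch, not the evaluation code itself: the helper name `build_translation_prompt`, its parameters, and the choice to swap the one-shot example along with the language names are all assumptions made for illustration.

```python
# Hypothetical helper: the name, the parameters, and the swapped one-shot
# example are assumptions, not taken from the original evaluation code.
EXAMPLE = {
    "Uzbek": "Bugun ob-havo juda yaxshi, quyosh charaqlab turibdi.",
    "English": "The weather is very nice today, the sun is shining brightly.",
}


def build_translation_prompt(sentence: str, source: str = "Uzbek", target: str = "English") -> str:
    """Fill the one-shot translation template for the given direction."""
    return f'''You are a professional {source}-{target} translator. Your task is to accurately translate the given {source} text into {target}.

Instructions:
1. Translate the text from {source} to {target}.
2. Maintain the original meaning and tone.
3. Use appropriate {target} grammar and vocabulary.
4. If you encounter an ambiguous or unfamiliar word, provide the most likely translation based on context.
5. Output only the {target} translation, without any additional comments.

Example:
{source}: "{EXAMPLE[source]}"
{target}: "{EXAMPLE[target]}"

Now, please translate the following {source} text into {target}:
"{sentence}"
'''


# Uz-En uses the defaults; En-Uz swaps the two language names.
uz_en = build_translation_prompt("Men kitob o'qiyapman.")
en_uz = build_translation_prompt("I am reading a book.", source="English", target="Uzbek")
```

Keeping both directions in one template means any later wording change applies to Uz-En and En-Uz alike.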
+
+ To assess the model's ability in Uzbek sentiment analysis, we used the **risqaliyevds/uzbek-sentiment-analysis** dataset, for which we created binary labels (0: Negative, 1: Positive) using the GPT-4o API (see the **behbudiy/uzbek-sentiment-analysis** dataset).
+ We used the following prompt for the evaluation:
+
+ ```python
+ prompt = f'''Given the following text, determine the sentiment as either 'Positive' or 'Negative.' Respond with only the word 'Positive' or 'Negative' without any additional text or explanation.
+
+ Text: "{text}"
+ '''
+ ```
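Because the prompt asks for a single word, scoring comes down to mapping each reply onto the binary labels (0: Negative, 1: Positive) and comparing with the reference. The following is a minimal sketch of that step: the function names, the sample replies, and the rule that unparseable replies count as wrong are assumptions, not the original scoring code.

```python
from typing import Optional


def parse_sentiment(output: str) -> Optional[int]:
    """Map a model reply to 1 (Positive), 0 (Negative), or None if unrecognized."""
    # Drop surrounding whitespace, quotes, and a trailing period before matching.
    cleaned = output.strip().strip("'\".").lower()
    if cleaned.startswith("positive"):
        return 1
    if cleaned.startswith("negative"):
        return 0
    return None


def accuracy(predictions, labels) -> float:
    """Fraction of correct predictions; unparseable replies (None) count as wrong."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up replies and labels, for illustration only.
replies = ["Positive", " negative.", "'Positive'", "I think it is good"]
gold = [1, 0, 0, 1]
preds = [parse_sentiment(r) for r in replies]
score = accuracy(preds, gold)
```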
+ For Uzbek news classification, we used the **risqaliyevds/uzbek-zero-shot-classification** dataset and asked the model to predict the category of each news article using the following prompt:
+
+ ```python
+ prompt = f'''Classify the given Uzbek news article into one of the following categories. Provide only the category number as the answer.
+
+ Categories:
+ 0 - Politics (Siyosat)
+ 1 - Economy (Iqtisodiyot)
+ 2 - Technology (Texnologiya)
+ 3 - Sports (Sport)
+ 4 - Culture (Madaniyat)
+ 5 - Health (Salomatlik)
+ 6 - Family and Society (Oila va Jamiyat)
+ 7 - Education (Ta'lim)
+ 8 - Ecology (Ekologiya)
+ 9 - Foreign News (Xorijiy Yangiliklar)
+
+ Now classify this article:
+ "{text}"
+
+ Answer (number only):
+ '''
+ ```
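Since the prompt requests only a category number, the reply has to be converted to an integer before it can be compared with the reference label. One possible way to do that is sketched below: the category table is copied from the prompt, while the function name `parse_category` and the "first standalone digit" rule are assumptions, not the original evaluation code.

```python
import re
from typing import Optional

# Category ids and English names as listed in the prompt.
CATEGORIES = {
    0: "Politics",
    1: "Economy",
    2: "Technology",
    3: "Sports",
    4: "Culture",
    5: "Health",
    6: "Family and Society",
    7: "Education",
    8: "Ecology",
    9: "Foreign News",
}


def parse_category(output: str) -> Optional[int]:
    """Return the first standalone digit 0-9 in the reply, or None if there is none."""
    # \b on both sides rejects multi-digit numbers such as "12".
    match = re.search(r"\b([0-9])\b", output)
    return int(match.group(1)) if match else None


# Made-up replies, for illustration only.
replies = ["3", " 7 - Education", "Answer: 9.", "Sports", "12"]
parsed = [parse_category(r) for r in replies]
```

Replies with no usable digit come back as `None`, so they can be counted separately from wrong predictions.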
+
  ## More
  For more details and examples, refer to the base model below:
  https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3