isaiahbjork committed
Commit c945c04
1 Parent(s): 81f8b0a

Update README.md

Files changed (1)
  1. README.md +95 -0

README.md CHANGED
 
# Llama 3.1 8B Logic

Prompt the model to "use COT" and it will think things out logically.

Basic Compound Words Evaluation (below):
- Accuracy: 86.00%
- Correct predictions: 129
- Total predictions: 150

## Example (Trained)

### Instruction:
 
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)
```

# Evaluation - Google Colab

```python
import re
import random
from transformers import TextStreamer

# Parse the model output and extract the predicted count
def extract_count(output):
    match = re.search(r'The letter "[a-z]" (?:appears|occurs|is found|exists) (\d+)', output)
    if match:
        return int(match.group(1))
    return None

# Generate test data: (word, letter, actual_count) tuples
def generate_test_data(num_words=150):
    words = ["Airplane", "Airport", "Angelfish", "Antfarm", "Ballpark", "Beachball", "Bikerack", "Billboard", "Blackhole", "Blueberry", "Boardwalk", "Bodyguard", "Bookstore", "Bow Tie", "Brainstorm", "Busboy", "Cabdriver", "Candlestick", "Car wash", "Cartwheel", "Catfish", "Caveman", "Chocolate chip", "Crossbow", "Daydream", "Deadend", "Doghouse", "Dragonfly", "Dress shoes", "Dropdown", "Earlobe", "Earthquake", "Eyeballs", "Father-in-law", "Fingernail", "Firecracker", "Firefighter", "Firefly", "Firework", "Fishbowl", "Fisherman", "Fishhook", "Football", "Forget", "Forgive", "French fries", "Goodnight", "Grandchild", "Groundhog", "Hairband", "Hamburger", "Handcuff", "Handout", "Handshake", "Headband", "Herself", "High heels", "Honeydew", "Hopscotch", "Horseman", "Horseplay", "Hotdog", "Ice cream", "Itself", "Kickball", "Kickboxing", "Laptop", "Lifetime", "Lighthouse", "Mailman", "Midnight", "Milkshake", "Moonrocks", "Moonwalk", "Mother-in-law", "Movie theater", "Newborn", "Newsletter", "Newspaper", "Nightlight", "Nobody", "Northpole", "Nosebleed", "Outer space", "Over-the-counter", "Overestimate", "Paycheck", "Policeman", "Ponytail", "Post card", "Racquetball", "Railroad", "Rainbow", "Raincoat", "Raindrop", "Rattlesnake", "Rockband", "Rocketship", "Rowboat", "Sailboat", "Schoolbooks", "Schoolwork", "Shoelace", "Showoff", "Skateboard", "Snowball", "Snowflake", "Softball", "Solar system", "Soundproof", "Spaceship", "Spearmint", "Starfish", "Starlight", "Stingray", "Strawberry", "Subway", "Sunglasses", "Sunroof", "Supercharge", "Superman", "Superstar", "Tablespoon", "Tailbone", "Tailgate", "Take down", "Takeout", "Taxpayer", "Teacup", "Teammate", "Teaspoon", "Tennis shoes", "Throwback", "Timekeeper", "Timeline", "Timeshare", "Tugboat", "Tupperware", "Underestimate", "Uplift", "Upperclassman", "Uptown", "Video game", "Wallflower", "Waterboy", "Watermelon", "Wheelchair", "Without", "Workboots", "Worksheet"]
    letters = "aeioulprts"
    test_data = []
    for word in words[:num_words]:
        letter = random.choice(letters)
        actual_count = word.lower().count(letter)  # Use lower() to count case-insensitively
        test_data.append((word, letter, actual_count))
    return test_data

# Alpaca prompt template
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{0}

### Input:
{1}

### Response:
"""

# Generate test data
test_data = generate_test_data()

# Run evaluation
correct_predictions = 0
total_predictions = 0

for word, letter, actual_count in test_data:
    input_text = f"How many {letter}'s in {word}?"
    prompt = alpaca_prompt.format(
        "You are an expert at logic puzzles, reasoning, and planning",
        input_text,
    )

    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    text_streamer = TextStreamer(tokenizer)
    output = model.generate(**inputs, streamer=text_streamer, max_new_tokens=256)

    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    predicted_count = extract_count(decoded_output)

    total_predictions += 1

    if predicted_count is not None:
        if predicted_count == actual_count:
            correct_predictions += 1
    else:
        # If predicted_count is None and actual_count is 0, consider it correct
        if actual_count == 0:
            correct_predictions += 1
        print(f"Warning: Could not extract a count from the model's response for '{word}'.")

    print(f"Word: {word}, Letter: {letter}")
    print(f"Actual count: {actual_count}, Predicted count: {predicted_count}")
    print("Correct" if (predicted_count == actual_count or (predicted_count is None and actual_count == 0)) else "Incorrect")

    # Calculate and print the running accuracy after each word
    current_accuracy = correct_predictions / total_predictions
    print(f"Current Accuracy: {current_accuracy:.2%}")
    print(f"Correct predictions: {correct_predictions}")
    print(f"Total predictions: {total_predictions}")
    print("---")

# Calculate final accuracy
accuracy = correct_predictions / total_predictions if total_predictions > 0 else 0
print(f"\nAccuracy: {accuracy:.2%}")
print(f"Correct predictions: {correct_predictions}")
print(f"Total predictions: {total_predictions}")
```
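As a quick sanity check of the parsing step, `extract_count` can be exercised on a few hand-written strings. The sample responses below are hypothetical model outputs, not real generations; they just show which phrasings the regex accepts:

```python
import re

def extract_count(output):
    # Same regex as the evaluation script: matches 'The letter "x" appears/occurs/is found/exists N'
    match = re.search(r'The letter "[a-z]" (?:appears|occurs|is found|exists) (\d+)', output)
    if match:
        return int(match.group(1))
    return None

print(extract_count('The letter "a" appears 3 times in "Airplane".'))  # 3
print(extract_count('The letter "e" occurs 2 times.'))                 # 2
print(extract_count('I could not count that.'))                        # None
```

Note that the regex requires a lowercase letter in double quotes, so responses phrased any other way fall through to `None` and trigger the warning branch above.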

- **Developed by:** isaiahbjork
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
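Because `generate_test_data` uses the global `random.choice`, letter picks change between runs. A minimal, model-free sketch (an assumption for illustration, not part of the card's script) shows how seeding a local `random.Random` makes the test tuples reproducible while keeping the same case-insensitive labels:

```python
import random

def generate_test_data(words, letters="aeioulprts", seed=0):
    # A seeded local RNG makes the letter choices reproducible across runs
    rng = random.Random(seed)
    test_data = []
    for word in words:
        letter = rng.choice(letters)
        actual_count = word.lower().count(letter)  # case-insensitive ground-truth count
        test_data.append((word, letter, actual_count))
    return test_data

sample = generate_test_data(["Airplane", "Blueberry"])
print(sample)  # two (word, letter, count) tuples; letters depend on the seed
```

With a fixed seed, re-running the evaluation scores the model on the identical set of (word, letter) questions, so accuracy changes reflect the model rather than the sampling.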