Model Accuracy Mismatch
#2, opened by armankaz
I created the test data as follows:
python3 generate_home_assistant_data.py --train --test --large --sharegpt
I downloaded the safetensors checkpoints and evaluated the model with the following command:
python3 evaluate.py ./models/acon96/Home-3B-v3-GGUF --batch-size 16 --overwrite
This is the output:
Accuracy: 73.65% (1929/2619): : 2544it [15:18, 2.77it/s]
Final Accuracy Rating: 73.65%
Color Mismatches: 12
Can you specify which version of the code and data you used for evaluation, and how we can reproduce your results?
I also downloaded the test data from https://huggingface.co/datasets/acon96/Home-Assistant-Requests and was still unable to reproduce your results:
Accuracy: 76.69% (1934/2522): : 2448it [14:45, 2.77it/s]
Final Accuracy Rating: 76.69%
Color Mismatches: 6
It looks like the later entries, which do not require a function call, are difficult for the model. It provides the correct answer but appends function calls that should not be there:
{
"expected": "the front porch light timer is set for one and a half hours and will go off in 03:49:6.",
"actual": "the front porch light timer is currently set for 03:49:06 and will turn off afterwards.\n```homeassistant\n{\"service\": \"timer.cancel\", \"target_device\": \"timer.front_porch_lights\"}\n```\n```python\n{\"service\": \"timer.cancel\", \"target_device\": \"timer.front_porch_lights\"}\n```\n```yaml\n{\n\"service\": \"timer.cancel\",\n\"target_device\": \"timer.front_porch_lights\"\n}\n```\n```yaml\ntimer:\n - duration",
"extra_response": true
},
{
"expected": "the laundry cycle tracker is set for fifteen minutes and will go off in 01:25:43.",
"actual": "the laundry cycle tracker will end in 01:25:43.\nservice: timer.cancel()\n```homeassistant\n{\"service\": \"timer.cancel\", \"target_device\": \"timer.laundry_machine_1\"}\n```\n```yaml\n{\n\"service\": \"timer.cancel\",\n\"target_device\": \"timer.laundry_machine_1\"\n}\n```\n```python\nfrom homeassistant.core import ServiceError\nimport time\n\ndef cancel_timer(service):\n service.cancel()\n\ntry:\n # Cancel",
"extra_response": true
}
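
To make it easier to count these cases, here is a minimal sketch of how the failed entries could be screened for appended blocks. It is not taken from evaluate.py; the helper names `split_response` and `has_unexpected_block` are hypothetical, and it assumes that an expected answer without a fenced block means no function call is wanted.

```python
FENCE = "`" * 3  # built this way so the sketch contains no literal fence marker


def split_response(actual):
    """Return the plain text before the first fence and the language tags of all fenced blocks."""
    parts = actual.split(FENCE)
    text = parts[0].strip()
    # Odd-indexed parts lie inside a fence; their first line is the language tag.
    # A truncated (unclosed) final block is still counted.
    tags = [part.split("\n", 1)[0].strip() for part in parts[1::2]]
    return text, tags


def has_unexpected_block(expected, actual):
    """True when the expected answer has no fenced block but the model output does."""
    return FENCE not in expected and FENCE in actual


# Shortened version of the second failure above (hypothetical trimming of the payload).
expected = "the laundry cycle tracker is set for fifteen minutes and will go off in 01:25:43."
actual = (
    "the laundry cycle tracker will end in 01:25:43.\n"
    + FENCE + "homeassistant\n{\"service\": \"timer.cancel\"}\n" + FENCE + "\n"
    + FENCE + "yaml\ntimer:\n - duration"
)

text, tags = split_response(actual)
print(text)
print(tags)
print(has_unexpected_block(expected, actual))
```

Running this over the failure log would show how much of the accuracy gap comes from extra blocks rather than from wrong answers.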