Not quite as good as I hoped.

#1
by DonaldSeibert - opened
  • It did not successfully answer the question “how many examples of w show up in this sentence.” It appeared to understand the question but got confused on the answer.
  • It also failed on the parity problem. It wrote some good Python code, but it thought ‘1111’ had an even number of ones.
  • It did get a historical question correct that a lot of models get wrong: I asked whether Ferdinand Porsche was a Nazi, and it answered correctly that he was. (I believe that because a lot of the information on official Porsche sites and fan sites is evasive on this topic, the models that ingest that data tend to be similarly evasive.)
NVIDIA org

but it thought ‘1111’ had an even number of ones

Can you please double check this one? This is indeed an even number (4).

Quite right, it looks like I reported the error backwards. It first said there was an odd number of ones, and when I asked it to check, it changed its mind. Same with 1001.
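For reference, the two parity cases discussed above can be checked deterministically. This is only a minimal sketch: the helper name `ones_parity` is made up for illustration, and it is not the code the model produced.

```python
def ones_parity(bits: str) -> str:
    """Return 'even' or 'odd' for the number of '1' characters in bits."""
    # str.count counts non-overlapping occurrences of the substring "1".
    return "even" if bits.count("1") % 2 == 0 else "odd"


# The two strings mentioned in this thread.
for bits in ("1111", "1001"):
    print(bits, bits.count("1"), ones_parity(bits))
```

Both strings contain an even number of ones (4 and 2), so an answer of "odd" for either one is the error.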

It answered #1 fine for me:

A fun task!

Let's count the occurrences of the letter "w" in your sentence:

**Sentence:** "how many examples of 'w' show up in this sentence."

**Count:**

1. **how** - 1 "w"
2. **many** - no "w" (just a distraction with an "m")
3. **examples** - no "w"
4. **of 'w'** - 1 "w" (this one is even quoted!)
5. **show** - 1 "w"
6. **up** - no "w"
7. **in this sentence** - no "w"

**Total count of 'w' occurrences:** 3

So, there are **3 examples of 'w'** in your sentence. How's that for meta-counting?

Sure, it sometimes works! But it didn't work when I tried it multiple times, so it is clearly not reliable on a type of question that it was advertised as capable of answering.
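As a ground truth to compare model answers against, the letter count for the quoted sentence can be computed directly. A minimal sketch; `count_letter` is a hypothetical helper name, and the case-insensitive matching is an assumption about what the question intends.

```python
def count_letter(sentence: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a sentence."""
    return sentence.lower().count(letter.lower())


# The sentence as quoted in the model's answer above.
sentence = "how many examples of 'w' show up in this sentence."
print(count_letter(sentence, "w"))
```

The count is 3 (one each in "how", the quoted 'w', and "show"), which matches the model's answer in the run quoted above.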

@DonaldSeibert, it looks like you got this model to load. Did you have to do anything special to do so? I have attempted to bring it up on 2x H100 GPUs with 80 GB of VRAM each, but it keeps giving CUDA out of memory errors. (More details in https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct/discussions/10)
Thanks in advance for your time.
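A rough back-of-the-envelope estimate may help frame the memory question. This sketch only counts the weights, and every number in it is an assumption: about 70 billion parameters (taken from the model name), 2 bytes per parameter (bf16 or fp16), and decimal gigabytes. It ignores the KV cache, activations, and CUDA context overhead, so it does not by itself explain the errors.

```python
# Assumed values, not measured ones.
NUM_PARAMS = 70_000_000_000   # ~70B, inferred from the model name
BYTES_PER_PARAM = 2           # bf16/fp16 weights
NUM_GPUS = 2
VRAM_PER_GPU_GB = 80

# Weight memory in decimal GB (1 GB = 1e9 bytes).
weights_gb = NUM_PARAMS * BYTES_PER_PARAM / 1e9
total_vram_gb = NUM_GPUS * VRAM_PER_GPU_GB
headroom_gb = total_vram_gb - weights_gb

print(f"weights: {weights_gb:.0f} GB")
print(f"total VRAM: {total_vram_gb} GB")
print(f"headroom before cache/activations: {headroom_gb:.0f} GB")
```

Under these assumptions the weights alone take about 140 GB of the 160 GB available, which leaves little room for everything else.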
