How can I send the model a reference text that is split into multiple text blocks?

by FrankDase - opened Nov 21, 2023

Nov 21, 2023 • edited Nov 21, 2023

Hi,

I receive some long text answers from my database that I would like to send to the LLM together with my question.
I tell the model that I will send multiple text blocks and that it should not answer until everything has been sent and I have asked my question.
But for some reason it answers after each text input.

Is that not possible with this model?

Example:

My script is sending it this way:

{
  model: 'llama2-13b-chat-german.ggmlv3.q4_0.bin',
  messages: [
    {
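      role: 'user',
      // English translation of the content below: The following script is part of a larger text that I want to analyze. I will send you the text in sections (chunks), each shorter than 1000 characters, to respect the token limits. Please wait with your analysis or answer until I have sent all parts of the text. I will signal when the entire text has been transmitted and I am ready to receive your comprehensive analysis. China's head of state Xi Jinping has called for an "immediate ceasefire" in the war between Israel and the radical Islamist Hamas and for the release of the "civilian captives". According to the state news agency Xinhua, Xi said at a virtual special summit of the BRICS states Brazil, Russia, India, China and South Africa that "all parties to the conflict" should stop the shelling and fighting "immediately". That was section 1 of 2. Please wait for the following sections before you answer.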
      content: 'Das folgende Skript ist Teil eines größeren Textes, den ich analysieren möchte. Ich werde dir den Text in Abschnitten (Chunks) senden, die jeweils weniger als 1000 Zeichen lang sind, um die Token-Limits zu beachten. Bitte warte mit deiner Analyse oder Antwort, bis ich alle Teile des Textes gesendet habe. Ich werde dir signalisieren, wenn der gesamte Text übermittelt wurde und ich bereit bin, deine umfassende Analyse zu erhalten. Chinas Staatschef Xi Jinping hat eine "sofortige Waffenruhe" im Krieg zwischen Israel und der radikalislamischen Hamas und eine Freilassung der "zivilen Gefangenen" gefordert. Xi sagte laut der staatlichen Nachrichtenagentur Xinhua bei einem virtuellen Sondergipfel der BRICS-Staaten Brasilien, Russland, Indien, China und Südafrika, "alle Konfliktparteien" sollten den Beschuss und die Kampfhandlungen "sofort" einstellen. Das war Abschnitt 1 von 2. Bitte warte auf die folgenden Abschnitte, bevor du antwortest.'
    }
  ],
  temperature: 0.7
}
{
  model: 'llama2-13b-chat-german.ggmlv3.q4_0.bin',
  messages: [
    {
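      role: 'user',
      // English translation of the content below: According to the report, Xi also called on the parties to the conflict to end "all violence and attacks on civilians" and to release "civilian captives". In remarks at the video conference, which were translated by an interpreter, the Chinese president also called for an "international peace conference" to end the Gaza war. This must also address "a swift resolution of the Palestinian question" that is "comprehensive, just and lasting", without which there will be "no lasting peace" in the Middle East. That was the last section. All parts of the text have been sent. Based on the entire transmitted text, here is my specific question: What is Xi Jinping demanding?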
      content: 'Xi rief die Konfliktparteien dem Bericht zufolge zudem dazu auf, "jegliche Gewalt und Angriffe auf Zivilisten" zu beenden und "zivile Gefangene" freizulassen. In Äußerungen Xis bei der Videokonferenz, die von einem Dolmetscher übersetzt wurden, rief der chinesische Präsident zudem zu einer "internationalen Friedenskonferenz" zur Beendigung des Gaza-Kriegs auf. Dabei müsse es auch um "eine baldige Lösung der Palästina-Frage" gehen, die "umfassend, gerecht und nachhaltig" sei und ohne die es im Nahen Osten "keinen nachhaltigen Frieden" geben werde. Das war der letzte Abschnitt. Alle Teile des Textes wurden gesendet. Basierend auf dem gesamten übermittelten Text, hier ist meine spezifische Frage: Was fordert Xi Jinping?'
    }
  ],
  temperature: 0.7
}

Thanks in advance,
Frank

jphme

Owner Nov 23, 2023

Hi Frank,

I don't know which software you are using, but here are some tips:

Use the new EM German Leo Mistral; it should be much better than this model (and as it's a 7B model, you can fit more into its context window; it has a context size of 4096 tokens, easily extendable to 8k+).
It doesn't make sense to split the text into separate chunks to stay under the context window limit (unless you summarize or otherwise shorten the chunks), because the text has to be in the context window to get "attention" from the model, and it makes no difference whether it is in one message or spread over several.
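A minimal sketch of the point above, in the same Node.js style as the question. It assumes the API server handles each request independently, as OpenAI-style chat endpoints usually do, so a chunk sent in an earlier request is not visible to a later one. The helper name `buildSingleRequest`, the placeholder chunks, and the `Frage:` prefix are illustrative assumptions, not part of LocalAI or the model:

```javascript
// Hypothetical helper (not a LocalAI function): put all chunks and the
// question into ONE user message, so the whole reference text is inside the
// model's context window for a single call.
function buildSingleRequest(model, chunks, question, temperature = 0.7) {
  // Join the database chunks back into one reference text.
  const referenceText = chunks.join('\n\n');
  return {
    model,
    messages: [
      {
        role: 'user',
        // The "Frage:" prefix is an arbitrary choice for this sketch.
        content: `${referenceText}\n\nFrage: ${question}`,
      },
    ],
    temperature,
  };
}

// Placeholder chunks stand in for the texts returned by the database.
const request = buildSingleRequest(
  'llama2-13b-chat-german.ggmlv3.q4_0.bin',
  ['<chunk 1 from the database>', '<chunk 2 from the database>'],
  'Was fordert Xi Jinping?'
);
console.log(JSON.stringify(request, null, 2));
```

If the combined text is larger than the context window, sending it in pieces does not help; it would have to be summarized or shortened first.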

Hope that helps!

jphme changed discussion status to closed Nov 23, 2023

dho

Nov 24, 2023

I would disagree, sorry. I have tested various RAG tasks against both this LLM and leo-mistral-hessianai-7b-chat. For more complex reasoning tasks, Llama 2 gives better answers. In addition, Mistral hallucinates more frequently, but this can be stopped by QLoRA fine-tuning.

FrankDase

Nov 24, 2023

The EM German Leo Mistral is faster, but Llama 2 gives more robust answers. With the Leo Mistral I often get answers with repeated text.

To answer the question about which software I use: I use https://localai.io as the API server, and for the frontend I wrote my own application with Node.js and Express.
Example of my frontend: https://share.vidyard.com/watch/p8mWyKyHFnWXQ5Nynit9D4?

jphme

Owner Nov 24, 2023

> I would disagree, sorry. I have tested various RAG tasks against both this LLM and leo-mistral-hessianai-7b-chat. For more complex reasoning tasks, Llama 2 gives better answers. In addition, Mistral hallucinates more frequently, but this can be stopped by QLoRA fine-tuning.

@dho Many thanks for your feedback! Would you be able to provide an example of one of these RAG-related reasoning tasks (preferably here: https://github.com/jphme/EM_German/issues )?
For this model, RAG was more of an afterthought/gimmick, but we are currently preparing the data for the next model generation with more robust RAG capabilities, and I would love to add some training examples for cases like the ones you mentioned.

@FrankDase Same for you: can you give an example of one of these prompts? Did you try increasing the temperature and the presence/frequency penalty slightly?
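As a rough sketch of that suggestion, the request body could be extended as shown here. The field names `presence_penalty` and `frequency_penalty` follow the OpenAI-style chat API; whether the server and the model backend in use actually honor them is an assumption, and the values are arbitrary starting points, not recommendations from this thread. The helper name `withRepetitionControls` is hypothetical:

```javascript
// Hypothetical helper: return a copy of a chat request with slightly raised
// sampling settings to discourage repeated text.
// ASSUMPTION: the server accepts the OpenAI-style fields presence_penalty and
// frequency_penalty; the default values below are illustrative only.
function withRepetitionControls(request, overrides = {}) {
  return {
    ...request,
    temperature: 0.8,
    presence_penalty: 0.2,
    frequency_penalty: 0.2,
    ...overrides, // caller-supplied values win over the defaults above
  };
}

const baseRequest = {
  model: 'llama2-13b-chat-german.ggmlv3.q4_0.bin',
  messages: [{ role: 'user', content: 'Was fordert Xi Jinping?' }],
  temperature: 0.7,
};

const tunedRequest = withRepetitionControls(baseRequest, { frequency_penalty: 0.3 });
console.log(JSON.stringify(tunedRequest, null, 2));
```

The original request object is left unchanged, so both variants can be sent and their answers compared.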

Many thanks!
