Multi-round conversation w/ PKV cache example code
#5, opened by Xenova (HF staff)
Hi there! According to your README, the model supports multi-round conversations. Does this also work when passing past key values from one round to the next? If so, could you provide example code for it? Reusing the cache should dramatically improve performance. Thanks!
Great! It will significantly reduce time-to-first-token for the web demo I'm working on. If it doesn't work, that's alright: the output will be the same, just a bit slower, since the KV cache has to be recomputed on the second round.
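To illustrate the point about getting the same results with or without the cache, here is a minimal toy sketch in pure Python. It is not this model's API: `embed`, `attend`, and `forward` are hypothetical stand-ins for a single causal attention layer with no positional encoding, and `past_key_values` here is just a pair of Python lists. It only shows that feeding the second round's tokens on top of cached keys/values gives the same outputs as re-running the whole conversation.

```python
import math

D = 4  # toy hidden size (assumption, chosen only for illustration)

def embed(token_id):
    # Hypothetical deterministic embedding standing in for a real model's layer.
    return [math.sin(token_id * (i + 1)) for i in range(D)]

def attend(query, keys, values):
    # Scaled dot-product attention of one query over all visible keys/values.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(D) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(D)]

def forward(token_ids, past_key_values=None):
    """Process new tokens causally, reusing cached keys/values when provided.

    Returns (outputs for the new tokens, updated cache).
    """
    if past_key_values is None:
        keys, values = [], []
    else:
        keys, values = list(past_key_values[0]), list(past_key_values[1])
    outputs = []
    for token_id in token_ids:
        x = embed(token_id)
        keys.append(x)    # toy model: query, key, and value are all the embedding
        values.append(x)
        outputs.append(attend(x, keys, values))
    return outputs, (keys, values)

round1 = [3, 1, 4, 1, 5]  # made-up token ids for the first round
round2 = [9, 2, 6]        # made-up token ids for the second round

# With the cache: only the new round's tokens are processed.
_, cache = forward(round1)
cached_out, cache = forward(round2, past_key_values=cache)

# Without the cache: the whole conversation is recomputed from scratch.
full_out, _ = forward(round1 + round2)
uncached_out = full_out[len(round1):]

same = all(
    math.isclose(a, b)
    for row_a, row_b in zip(cached_out, uncached_out)
    for a, b in zip(row_a, row_b)
)
print("outputs match:", same)
```

In a real model the cache also has to line up with position ids and the attention mask, so the second round's inputs must be offset by the number of cached tokens; the sketch leaves that out.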
I've updated the model card + released the demo! :)
Model: https://huggingface.co/Xenova/nanoLLaVA
Demo: https://huggingface.co/spaces/Xenova/experimental-nanollava-webgpu