New preprint out with colleagues from MIT and IBM Research:
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (2405.12981)
We introduce a simple mechanism for sharing keys and values across layers, reducing the memory needed for the KV cache during inference!
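To give a rough feel for the idea, here's a minimal PyTorch sketch of cross-layer KV sharing with a sharing factor of 2: every other layer computes fresh key/value projections, and the following layer reuses them, so only half the layers would contribute entries to a KV cache. Names like `CLALayer` and `owns_kv` are hypothetical illustration, not the paper's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLALayer(nn.Module):
    """Hypothetical attention layer that can reuse another layer's K/V."""

    def __init__(self, d_model: int, n_heads: int, owns_kv: bool):
        super().__init__()
        self.n_heads = n_heads
        self.owns_kv = owns_kv  # if False, reuse the previous layer's K/V
        self.q_proj = nn.Linear(d_model, d_model)
        if owns_kv:
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x, shared_kv=None):
        B, T, D = x.shape
        H, Hd = self.n_heads, D // self.n_heads
        q = self.q_proj(x).view(B, T, H, Hd).transpose(1, 2)
        if self.owns_kv:
            k = self.k_proj(x).view(B, T, H, Hd).transpose(1, 2)
            v = self.v_proj(x).view(B, T, H, Hd).transpose(1, 2)
            shared_kv = (k, v)  # only these would be stored in the KV cache
        else:
            k, v = shared_kv  # cross-layer reuse: no new cache entries
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, D)
        return self.o_proj(out) + x, shared_kv

# Stack with sharing factor 2: every odd layer reuses the even layer's K/V.
layers = nn.ModuleList(
    CLALayer(d_model=64, n_heads=4, owns_kv=(i % 2 == 0)) for i in range(4)
)
x = torch.randn(2, 8, 64)
kv = None
for layer in layers:
    x, kv = layer(x, kv)
print(x.shape)  # torch.Size([2, 8, 64])
```

In this toy setup only the KV-owning layers would need cache storage at inference, which is where the memory reduction comes from; see the preprint for the actual design and evaluation.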