Aurora-M: The First Open Source Biden-Harris Executive Order Red teamed Multilingual Language Model Apr 2 • 6
Post 1751 New preprint out with colleagues from MIT and IBM Research: Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (2405.12981). We introduce a simple mechanism for sharing keys and values across layers, reducing the memory needed for the KV cache during inference!
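To make the memory claim concrete, here is a rough back-of-the-envelope sketch of how KV cache size shrinks when several layers share one set of keys and values. It is not the preprint's implementation: the function name, the `sharing_factor` parameter, and the model dimensions in the usage lines are all hypothetical, chosen only for illustration.

```python
import math


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2, sharing_factor=1):
    """Estimate KV cache size in bytes.

    sharing_factor is an assumed knob: the number of adjacent layers that
    reuse one set of keys and values (1 means no cross-layer sharing).
    """
    if sharing_factor < 1:
        raise ValueError("sharing_factor must be >= 1")
    # Only one layer per sharing group needs to store its own K and V.
    layers_with_own_kv = math.ceil(num_layers / sharing_factor)
    # The leading 2 accounts for storing both keys and values.
    return (2 * layers_with_own_kv * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical configuration: 32 layers, 8 KV heads, head_dim 128,
# 4096 cached tokens, 2 bytes per element (e.g. fp16).
baseline = kv_cache_bytes(32, 8, 128, 4096)
shared = kv_cache_bytes(32, 8, 128, 4096, sharing_factor=2)
print(f"baseline: {baseline / 2**20:.0f} MiB, shared: {shared / 2**20:.0f} MiB")
```

Under these assumptions, sharing between pairs of layers halves the cache, since only every second layer stores its own keys and values. Any effect on model quality is outside what this arithmetic captures.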
Post 2521 Thrilled to unveil DS-MoE: a dense training and sparse inference scheme for enhanced computational and memory efficiency in your MoE models! Discover more in our blog: https://huggingface.co/blog/bpan/ds-moe and dive into the details with our paper: Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models (2404.05567)
Dolomite Engine Sample
This collection contains a sample dataset and model trained via dolomite-engine. Repo: https://github.com/ibm-granite/dolomite-engine/
mayank-mishra/glaive-code-assisstant-v3-20k Viewer • Updated Jun 5 • 20k • 60
mayank-mishra/granite-3b-code-glaive-20k Text Generation • Updated Jun 5 • 15