Abstract
We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
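Since all artifacts are openly released, the checkpoints can presumably be loaded with the Hugging Face `transformers` library. A minimal sketch follows, assuming the weights are hosted under the EleutherAI organization as `EleutherAI/llemma_7b` (the exact repository ID is an assumption, not stated in the abstract):

```python
# Minimal sketch of loading a released Llemma checkpoint with
# Hugging Face transformers. The repository ID below is an assumption
# about where the released weights are hosted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/llemma_7b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/llemma_7b",
    torch_dtype=torch.bfloat16,  # the paper trains in bf16
    device_map="auto",           # requires the `accelerate` package
)

# Llemma is a base model, so prompt it completion-style.
prompt = "Problem: What is the remainder when $2^{10}$ is divided by 7?\nSolution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```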
Community
Amazing
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text (2023)
- TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models (2023)
- FIMO: A Challenge Formal Dataset for Automated Theorem Proving (2023)
- Qwen Technical Report (2023)
- Code Llama: Open Foundation Models for Code (2023)
Introduces Llemma, an LLM for mathematical reasoning: continued pretraining of Code Llama on Proof-Pile-2 (scientific papers, mathematical web data, and mathematical code); releases 7B and 34B models (the latter outperforms Google's Minerva on math problems), showing that a domain-specific language model can give better performance at a smaller size.
- Data: the custom code dataset AlgebraicStack, OpenWebMath, the arXiv subset of RedPajama, and general data sources.
- Model: standard decoder-only Llama 2 architecture (initialized from Code Llama, which was trained on code), trained with the autoregressive language modeling objective on Proof-Pile-2.
- Training: bf16 mixed precision using GPT-NeoX with tensor parallelism and ZeRO sharding; Flash Attention 2 for better throughput and lower memory usage; RoPE for long-context fine-tuning.
- Results: outperforms open models on chain-of-thought mathematical problem solving (GSM8k, OCW, SAT, etc.) and matches Minerva; better than Code Llama at tool use (GSM8k + Python). Best perplexity comes from a 2:4:1 arXiv-to-web-to-code mixture (sketched below).
- The appendix covers dataset creation (composition and processing), evaluation details, and additional results.

From EleutherAI and CMU.
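A minimal sketch of what the 2:4:1 arXiv:web:code mixture implies for data sampling, assuming each training document is drawn from one of the three Proof-Pile-2 subsets with probability proportional to its mixture weight. The subset names and the `sample_source` helper are illustrative, not the authors' actual pipeline:

```python
# Hypothetical illustration of the 2:4:1 arXiv:web:code data mixture:
# each training document is drawn from one of the three subsets with
# probability proportional to its mixture weight.
import random

MIXTURE = {"arxiv": 2, "web": 4, "code": 1}  # 2:4:1 weights from the paper

def sample_source(weights=MIXTURE, rng=random):
    """Pick a data source with probability proportional to its weight."""
    sources = list(weights)
    return rng.choices(sources, weights=[weights[s] for s in sources], k=1)[0]

# Empirical draw frequencies approach 2/7, 4/7, and 1/7 respectively.
counts = {s: 0 for s in MIXTURE}
for _ in range(70_000):
    counts[sample_source()] += 1
print(counts)
```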
Very nice
Where can we access this model on Hugging Face?