Aurora-M: The First Open Source Biden-Harris Executive Order Red-teamed Multilingual Language Model Apr 2 • 6
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated 1 day ago • 184
Granite 3.0 Language Models Collection A series of language models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 30 days ago • 93
Power-LM Collection Dense and MoE LLMs trained with the Power learning rate scheduler. • 4 items • Updated Oct 17 • 15
Granite Code Models Collection A series of code models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 23 items • Updated 30 days ago • 178
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27 • 37
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23 • 22