We just released a paper (NeuZip) that losslessly compresses model weights to reduce VRAM usage, letting you run larger models. This should be particularly useful when VRAM is insufficient during training or inference. Specifically, we look inside each floating-point number and find that the exponent bits are highly compressible (as shown in the figure below); a small sketch of that observation follows.
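A minimal sketch of the observation, not the NeuZip code itself: reinterpret bfloat16 weights as raw bits, pull out the 8-bit exponent field, and compare how well a generic compressor shrinks the exponents versus the full tensor. Here zlib stands in for whatever entropy coder the paper uses, and a Gaussian-initialized tensor stands in for real pretrained weights.

```python
import zlib
import numpy as np
import torch

# Stand-in for pretrained parameters: a bfloat16 weight matrix.
weights = torch.randn(1024, 1024, dtype=torch.bfloat16)

# Reinterpret each bfloat16 as 16 raw bits: 1 sign | 8 exponent | 7 mantissa.
bits = weights.view(torch.int16).numpy().view(np.uint16)
exponents = ((bits >> 7) & 0xFF).astype(np.uint8)  # isolate the exponent field

raw_bytes = bits.tobytes()       # full tensor, 2 bytes per weight
exp_bytes = exponents.tobytes()  # exponents only, 1 byte per weight

# Exponents cluster tightly, so they compress far better than the full tensor.
print(f"full tensor -> {len(zlib.compress(raw_bytes)) / len(raw_bytes):.1%} of original size")
print(f"exponents   -> {len(zlib.compress(exp_bytes)) / len(exp_bytes):.1%} of original size")
```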
Lightweight implementation of the newly introduced “Differential Transformer”: it proposes a differential attention mechanism that computes attention scores as the difference between two separate softmax attention maps, thereby reducing noise in the attention blocks (see the sketch below). [[[Differential nanoGPT]]] :)
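A single-head sketch of the idea, assuming PyTorch; the actual Differential Transformer also uses multi-head splitting, causal masking, per-head normalization, and a re-parameterized learnable lambda, all omitted here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttentionSketch(nn.Module):
    """Two independent (Q, K) projections produce two softmax attention maps;
    their difference, scaled by a learnable lambda, weights the values."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q1 = nn.Linear(d_model, d_model, bias=False)
        self.k1 = nn.Linear(d_model, d_model, bias=False)
        self.q2 = nn.Linear(d_model, d_model, bias=False)
        self.k2 = nn.Linear(d_model, d_model, bias=False)
        self.v = nn.Linear(d_model, d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)
        self.lam = nn.Parameter(torch.tensor(0.5))  # placeholder for the paper's learned lambda
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        a1 = F.softmax(self.q1(x) @ self.k1(x).transpose(-2, -1) * self.scale, dim=-1)
        a2 = F.softmax(self.q2(x) @ self.k2(x).transpose(-2, -1) * self.scale, dim=-1)
        attn = a1 - self.lam * a2  # differential attention map: noise common to both maps cancels
        return self.out(attn @ self.v(x))

x = torch.randn(2, 16, 64)
print(DiffAttentionSketch(64)(x).shape)  # torch.Size([2, 16, 64])
```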