A Closer Look into Mixture-of-Experts in Large Language Models • arXiv:2406.18219 • published Jun 26, 2024
Efficient Continual Pre-training by Mitigating the Stability Gap • arXiv:2406.14833 • published Jun 21, 2024
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters • arXiv:2405.16287 • published May 25, 2024
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training • arXiv:2405.15319 • published May 24, 2024
COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning • arXiv:2403.18058 • published Mar 26, 2024
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models • arXiv:2404.03543 • published Apr 4, 2024
Think Before You Act: Decision Transformers with Internal Working Memory • arXiv:2305.16338 • published May 24, 2023
ChatMusician: Understanding and Generating Music Intrinsically with LLM • arXiv:2402.16153 • published Feb 25, 2024
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding • arXiv:2402.16671 • published Feb 26, 2024
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement • arXiv:2402.14658 • published Feb 22, 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling • arXiv:2402.12226 • published Feb 19, 2024
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark • arXiv:2401.11944 • published Jan 22, 2024
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models • arXiv:2401.06951 • published Jan 13, 2024
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI • arXiv:2311.16502 • published Nov 27, 2023