brkkaya's Collections
Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts
Paper
•
2309.07430
•
Published
•
27
MindAgent: Emergent Gaming Interaction
Paper
•
2309.09971
•
Published
•
11
Cure the headache of Transformers via Collinear Constrained Attention
Paper
•
2309.08646
•
Published
•
12
Contrastive Decoding Improves Reasoning in Large Language Models
Paper
•
2309.09117
•
Published
•
37
Uncovering mesa-optimization algorithms in Transformers
Paper
•
2309.05858
•
Published
•
12
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
Paper
•
2312.14878
•
Published
•
13
Time is Encoded in the Weights of Finetuned Language Models
Paper
•
2312.13401
•
Published
•
19
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Paper
•
2312.12456
•
Published
•
41
HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
Paper
•
2312.11666
•
Published
•
12
Cascade Speculative Drafting for Even Faster LLM Inference
Paper
•
2312.11462
•
Published
•
8
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Paper
•
2312.10003
•
Published
•
35
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper
•
2312.15166
•
Published
•
56
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
Paper
•
2401.00788
•
Published
•
21
Improving Text Embeddings with Large Language Models
Paper
•
2401.00368
•
Published
•
79
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
Paper
•
2401.01055
•
Published
•
54
Hyena Hierarchy: Towards Larger Convolutional Language Models
Paper
•
2302.10866
•
Published
•
7
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
•
2401.02954
•
Published
•
40
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper
•
2401.02669
•
Published
•
14
LLaMA Pro: Progressive LLaMA with Block Expansion
Paper
•
2401.02415
•
Published
•
53
LLM Augmented LLMs: Expanding Capabilities through Composition
Paper
•
2401.02412
•
Published
•
36
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper
•
2401.00908
•
Published
•
180
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper
•
2401.01325
•
Published
•
26
A Comprehensive Study of Knowledge Editing for Large Language Models
Paper
•
2401.01286
•
Published
•
16
Unicron: Economizing Self-Healing LLM Training at Scale
Paper
•
2401.00134
•
Published
•
9
The Internal State of an LLM Knows When It's Lying
Paper
•
2304.13734
•
Published
•
2
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper
•
2312.00752
•
Published
•
138
PromptBench: A Unified Library for Evaluation of Large Language Models
Paper
•
2312.07910
•
Published
•
15
SparQ Attention: Bandwidth-Efficient LLM Inference
Paper
•
2312.04985
•
Published
•
38
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution
Paper
•
2401.00935
•
Published
•
17
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper
•
2309.14717
•
Published
•
44
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
Paper
•
2401.04398
•
Published
•
20
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Paper
•
2208.07339
•
Published
•
4
Sigmoid Loss for Language Image Pre-Training
Paper
•
2303.15343
•
Published
•
4
Accelerating LLM Inference with Staged Speculative Decoding
Paper
•
2308.04623
•
Published
•
23