Adding `safetensors` variant of this model
#19 opened 8 months ago
by
SFconvertbot
Adding Evaluation Results
#18 opened 8 months ago
by
leaderboard-pr-bot
Any plans for Mixtral 128k?
#17 opened 9 months ago
by
sirus
Transformers fix to mixed precision at long context lengths
1
#16 opened 11 months ago
by
nbroad
How much compute (e.g., GPUs and GPU hours) did you need to fine-tune this?
1
#15 opened 12 months ago
by
zohadev
Yarn-StableLM-Epoch?
#14 opened 12 months ago
by
KnutJaegersberg
Instruction fine-tuning and training script, QLoRA, etc.
#13 opened 12 months ago
by
aamir1122a
Add widget examples
#11 opened 12 months ago
by
mishig
Using this model with vLLM
1
#10 opened 12 months ago
by
haltux
Can't deploy an inference endpoint to any provider
2
#9 opened 12 months ago
by
ejkkan
Pretraining from scratch?
#8 opened 12 months ago
by
MengboZhou
Fine-tuned with all parameters?
1
#6 opened 12 months ago
by
MengboZhou
VRAM usage for full 128k tokens
7
#5 opened 12 months ago
by
Hypersniper
sliding_window = 131072? Sliding window attention doesn't work for 128?
1
#4 opened 12 months ago
by
keyishen
Smaller shards, please
#2 opened 12 months ago
by
lskywalker
Instruct Version?
8
#1 opened 12 months ago
by
mrfakename