4090 Graphics Card: 24 GB of Video Memory Not Enough
#2 — opened by xldistance
With max_position_embeddings set to 40000 and the cache in FP8 mode, the model takes up 70 GB of video memory, so a 4090 can't run it properly.
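As a back-of-envelope check, most of that 70 GB is the FP16 weights, not the cache. The sketch below estimates weight and KV-cache memory; the layer/head counts are assumptions for a typical Yi-34B-style architecture (60 layers, 8 KV heads with GQA, head_dim 128), not figures from this thread:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, ctx_len, bytes_per_elem):
    # K and V tensors per layer: each is ctx_len x kv_heads x head_dim
    return 2 * layers * ctx_len * kv_heads * head_dim * bytes_per_elem

GB = 1024**3
params = 34e9

weights_fp16 = params * 2    # ~2 bytes/param: this alone is ~63 GiB
weights_4bit = params * 0.5  # ~4 bits/param with an exl2-style quant

# FP8 cache = 1 byte per element (assumed architecture numbers)
cache_fp8 = kv_cache_bytes(60, 8, 128, 40_000, 1)

print(f"FP16 weights:        {weights_fp16 / GB:.1f} GiB")
print(f"4-bit weights:       {weights_4bit / GB:.1f} GiB")
print(f"FP8 KV cache @ 40k:  {cache_fp8 / GB:.1f} GiB")
```

Under these assumptions, FP16 weights alone exceed any single-GPU budget, while a ~4-bit quant plus the FP8 cache fits under 24 GiB, which is the point of the linked write-up below.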
See the post "how_i_run_34b_models_at_75k_context_on_24gb_fast".
I am making exl2 quantizations right now.
Sorry thought I posted the link: https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/