Using acceleration libraries such as DeepSpeed

#8
by marccasals - opened

I am trying to load the model using the DeepSpeed library. Is it possible to optimize this model with DeepSpeed? I have tried setting

replace_with_kernel_inject=True

But it doubled the amount of GPU RAM needed. Is there any solution?
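For reference, my setup looks roughly like the sketch below; the checkpoint id and dtype are just placeholders for whatever I am actually loading:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute the OpenLLaMA variant you are using.
model_id = "openlm-research/open_llama_7b"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Wrap the model with DeepSpeed-Inference and inject its optimized kernels.
ds_engine = deepspeed.init_inference(
    model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,  # this flag is what roughly doubles GPU memory use for me
)
model = ds_engine.module
```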

OpenLM Research org

When evaluating with lm-eval-harness, the model does work with accelerate. After all, this model should be fully compatible with LLaMA, so any inference tricks that work for LLaMA should also apply to OpenLLaMA.
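For example, loading through accelerate's device_map should look roughly like this (a minimal sketch; the checkpoint id and generation settings are illustrative, not specific to this thread):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint id; any OpenLLaMA size should load the same way.
model_id = "openlm-research/open_llama_7b"

# use_fast=False: the OpenLLaMA release notes advise against the fast tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)

# device_map="auto" lets accelerate place the weights across available GPUs
# (spilling to CPU if necessary) instead of duplicating them in memory.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```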

young-geng changed discussion status to closed
