torch.compile error
Hi, I have this problem when I run a torch.compile code snippet:
Traceback (most recent call last):
File "/data/gemma_torch/torch_compile.py", line 40, in
outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)
...
torch._dynamo.exc.Unsupported: reconstruct: UserDefinedObjectVariable(HybridCache)
from user code:
File "/home/.local/lib/python3.10/site-packages/transformers/models/gemma2/modeling_gemma2.py", line 1111, in forward
return CausalLMOutputWithPast(
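For context, this is roughly what the script looks like around that line (a simplified sketch; the model id, prompt, and cache size below are placeholders, not the exact values from torch_compile.py):

# Sketch of the setup that triggers the error (values are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.cache_utils import HybridCache

model_id = "google/gemma-2-2b-it"  # assumed; the 9B variant hits the same path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")

# Compile the forward pass; the exact torch.compile arguments are not shown here.
model.forward = torch.compile(model.forward)

# Pre-allocate the hybrid (sliding-window + global attention) cache used by Gemma 2.
past_key_values = HybridCache(
    config=model.config,
    max_batch_size=1,
    max_cache_len=512,
    device=model.device,
    dtype=model.dtype,
)

model_inputs = tokenizer("some prompt", return_tensors="pt").to("cuda")
outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)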
Versions:
torch: 2.3.1
transformers: 4.42.4
python: 3.10
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
GPU: NVIDIA L40S
Has anyone encountered this problem? What can I try to get it running?
Hi @LD-inform, it seems there is an issue with the installed torch version 2.3.1. Could you please try again after upgrading the torch library to the latest version 2.5.1 using !pip install -U torch, and let us know if the issue still persists.
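For example, after the upgrade (and restarting the runtime) you can confirm the installed version with:

import torch
print(torch.__version__)          # should print 2.5.1
print(torch.cuda.is_available())  # sanity check that the CUDA build still works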
Thank you @Renu11.
I had to add this parameter: torch.compile(..., backend="eager").
Without it, I got this error:
/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1465, in _call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Should never be installed
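For reference, the workaround looks roughly like this (only the backend argument reflects the actual change; what exactly gets compiled is simplified here):

# Workaround sketch: force the eager backend so Inductor is bypassed.
model.forward = torch.compile(model.forward, backend="eager")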
However, the speed increased by only 5 t/s (from 44 t/s to 49 t/s) with the 2B model, and with the 9B model it even decreased from 26 t/s to 20 t/s.
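(Roughly how I measured the throughput; the timing code below is a simplified sketch, not the exact script:)

import time
torch.cuda.synchronize()
start = time.perf_counter()
outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start
new_tokens = outputs.shape[-1] - model_inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} t/s")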
I also tried it with a different prompt, but I doubt that is the problem:
input_text = "user\nExplain in details difference between integrals and derivatives.\nmodel"
Is there anything else I can try?