Getting an error when running on CPU

#4 · opened by mohi0170

/bin/python3 "/media/mohi/Disk 1/Solutyics/GLM_4_Testing/task1.py"
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 10.98it/s]
Traceback (most recent call last):
File "/media/mohi/Disk 1/Solutyics/GLM_4_Testing/task1.py", line 20, in
output = model.generate(
File "/home/mohi/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/mohi/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1989, in generate
result = self._sample(
File "/home/mohi/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2932, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mohi/.cache/huggingface/modules/transformers_modules/THUDM/LongWriter-glm4-9b/81b025e373d163efd7908a787b3fb907424c6184/modeling_chatglm.py", line 801, in forward
transformer_outputs = self.transformer(
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mohi/.cache/huggingface/modules/transformers_modules/THUDM/LongWriter-glm4-9b/81b025e373d163efd7908a787b3fb907424c6184/modeling_chatglm.py", line 707, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mohi/.cache/huggingface/modules/transformers_modules/THUDM/LongWriter-glm4-9b/81b025e373d163efd7908a787b3fb907424c6184/modeling_chatglm.py", line 551, in forward
layer_ret = layer(
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mohi/.cache/huggingface/modules/transformers_modules/THUDM/LongWriter-glm4-9b/81b025e373d163efd7908a787b3fb907424c6184/modeling_chatglm.py", line 454, in forward
attention_output, kv_cache = self.self_attention(
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mohi/.cache/huggingface/modules/transformers_modules/THUDM/LongWriter-glm4-9b/81b025e373d163efd7908a787b3fb907424c6184/modeling_chatglm.py", line 351, in forward
context_layer = self.core_attention(query_layer, key_layer, value_layer, attention_mask)
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/mohi/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mohi/.cache/huggingface/modules/transformers_modules/THUDM/LongWriter-glm4-9b/81b025e373d163efd7908a787b3fb907424c6184/modeling_chatglm.py", line 211, in forward
context_layer = flash_attn_unpadded_func(
TypeError: 'NoneType' object is not callable

pip install flash-attn

Defaulting to user installation because normal site-packages is not writeable
Collecting flash-attn
Using cached flash_attn-2.6.3.tar.gz (2.6 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
fatal: not a git repository (or any of the parent directories): .git
/tmp/pip-install-dt3l45za/flash-attn_76bb505f607b4d9783ee43defc787cf6/setup.py:95: UserWarning: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
warnings.warn(
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-install-dt3l45za/flash-attn_76bb505f607b4d9783ee43defc787cf6/setup.py", line 179, in
CUDAExtension(
File "/home/mohi/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1077, in CUDAExtension
library_dirs += library_paths(cuda=True)
File "/home/mohi/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1204, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
File "/home/mohi/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2419, in _join_cuda_home
raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

  torch.__version__  = 2.3.0+cu121
  
  
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

How can we fix this error?

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org
edited Aug 16

Hi, sorry for the inconvenience. Currently our code relies on FlashAttention2, which only supports GPU environments. We will soon (hopefully by the end of this week) update the code to remove the dependency on FlashAttention2. Please stay tuned.
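
For context on the traceback above: the modeling code imports the FlashAttention kernel optionally and falls back to None when the package is missing, so the call site fails with `TypeError: 'NoneType' object is not callable`. A minimal sketch of that pattern (illustrative names and a simplified call, not the actual modeling_chatglm.py code):

```python
# Minimal sketch of an optional flash-attn import (illustrative only).
try:
    # flash-attn only builds and runs on CUDA GPUs; the exact import path
    # differs between flash-attn versions.
    from flash_attn.flash_attn_interface import flash_attn_unpadded_func
except ImportError:
    flash_attn_unpadded_func = None  # stays None on CPU-only machines


def core_attention(query_layer, key_layer, value_layer, attention_mask):
    # If the import failed, flash_attn_unpadded_func is None, so this call
    # raises: TypeError: 'NoneType' object is not callable
    context_layer = flash_attn_unpadded_func(
        query_layer, key_layer, value_layer, attention_mask
    )
    return context_layer
```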

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org
edited Aug 17

Good news! We've updated modeling_chatglm.py to remove the dependency on FlashAttention2. You should now be able to run it on CPU.
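
For anyone landing here later, a minimal sketch of loading the model on CPU with the updated remote code (standard transformers/torch calls; the plain-prompt usage below is an assumption, so check the model card for the recommended chat format). If you loaded the model before the update, make sure the new modeling_chatglm.py is actually fetched, e.g. by clearing the cached copy under ~/.cache/huggingface/modules/transformers_modules/.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "THUDM/LongWriter-glm4-9b"

# trust_remote_code=True pulls the model's custom modeling_chatglm.py from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,  # safe default on CPU
)
model.eval()

prompt = "Write a short story about a lighthouse keeper."
inputs = tokenizer(prompt, return_tensors="pt")  # plain prompt; the chat template may give better results
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Expect generation with a 9B model on CPU to be quite slow; this is just to confirm the FlashAttention2 requirement is gone.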
