no kernel image is available for execution on the device

#110
by hzxie - opened

Using ZeroGPU for projects with custom CUDA extensions is highly challenging.

  1. The lack of available nvcc compilers means I must precompile the .whl file on my local machine.
  2. Even with a precompiled .whl, I encounter the error "no kernel image is available for execution on the device," despite ensuring that my local build environment exactly matches the dependencies on ZeroGPU.

I've already spent hours troubleshooting this without success. Could you offer any suggestions?

Related spaces: https://huggingface.co/spaces/hzxie/city-dreamer
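
For context, "no kernel image is available" usually means the precompiled extension contains no kernels for the compute capability of the GPU that ZeroGPU actually assigns (typically an A100 at the time of writing). A minimal diagnostic sketch, run inside a @spaces.GPU-decorated function so a GPU is attached; the .whl must have been built with TORCH_CUDA_ARCH_LIST covering the capability it reports:

import torch

def print_cuda_build_info():
    # Compute capability of the GPU assigned at runtime, e.g. (8, 0) on A100.
    print("Device capability:", torch.cuda.get_device_capability(0))
    # Architectures the installed PyTorch binary was built for, e.g. ['sm_80', ...].
    print("Supported archs:", torch.cuda.get_arch_list())
    # CUDA version PyTorch was built against; the local .whl build must match it.
    print("Torch CUDA version:", torch.version.cuda)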

Error Logs:

To create a public link, set `share=True` in `launch()`.
[INFO] 2024-09-22 05:09:00,080 generated new fontManager
[INFO] 2024-09-22 05:09:00,340 HTTP Request: POST http://device-api.zero/schedule?cgroupPath=%2Fkubepods.slice%2Fkubepods-burstable.slice%2Fkubepods-burstable-pod8a1341ae_14c0_4a55_a7a2_186a8f02387f.slice%2Fcri-containerd-64dfe113d446587ed59593a3f5a144d066dc5f901e9ee7868a35c6b9bb77552c.scope&taskId=140441127454608&enableQueue=true&token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpcCI6IjIxOS43NC4xMTYuMjA4IiwidXNlciI6Imh6eGllIiwidXVpZCI6bnVsbCwiZXhwIjoxNzI2OTc0NTk5fQ.P-YarKitYJ1myDu5A2W8tIx55s_KYL7YnLys3eMtZWU "HTTP/1.1 200 OK"
[INFO] 2024-09-22 05:09:00,385 HTTP Request: POST http://device-api.zero/allow?allowToken=9867384f64ee42173d467ec34e06815732f3034e8d19d7002f4add340d19a832&pid=276 "HTTP/1.1 200 OK"
[INFO] 2024-09-22 05:09:01,231 CUDA is available: True
[INFO] 2024-09-22 05:09:01,231 PyTorch is built with CUDA: 12.1
[INFO] 2024-09-22 05:09:05,008 Generating latent codes ...
[INFO] 2024-09-22 05:09:05,208 Generating seg volume ...
Error in extrude_tensor_ext_cuda_forward: no kernel image is available for execution on the device
[INFO] 2024-09-22 05:09:05,251 Rendering City Image ...
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 256, in thread_wrapper
    res = future.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 94, in get_generated_city
    return citydreamer.inference.generate_city(
  File "/home/user/app/citydreamer/inference.py", line 80, in generate_city
    img = render(
  File "/home/user/app/citydreamer/inference.py", line 508, in render
    buildings = torch.unique(voxel_id[voxel_id > CONSTANTS["BLD_INS_LABEL_MIN"]])
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
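As the log itself notes, the line in the traceback may not be the real culprit, since CUDA reports kernel errors asynchronously. A minimal sketch to localize the failing kernel; the variable has to be set before CUDA is first initialized:

import os
# Make kernel launches synchronous so the traceback points at the kernel
# that actually failed; must be set before CUDA is initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the variable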

In the global scope, after a ZeroGPU Space starts, the GPU is barely visible. For ordinary libraries the usual workaround is to install them as follows, but this gets tough when it comes to custom CUDA extensions...

import os, subprocess
subprocess.run('pip install flash-attn --no-build-isolation',
               # merge with os.environ so PATH survives in the subprocess
               env={**os.environ, 'FLASH_ATTENTION_SKIP_CUDA_BUILD': "TRUE"}, shell=True)

@John6666
Thanks for your reply, but it doesn't solve my issue.

I moved the CUDA-extension build steps to the global scope, as below.

import os
import subprocess

# Compile CUDA extensions
# Ref: https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/110#66ef9672127751a231f83f80
ext_dir = os.path.join(os.path.dirname(__file__), "citydreamer", "extensions")
for e in os.listdir(ext_dir):
    if os.path.isdir(os.path.join(ext_dir, e)):
        subprocess.call(
            ["pip", "install", "--no-build-isolation", "./%s" % e], cwd=ext_dir
        )

And I got the following errors.

Processing ./extrude_tensor
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'error'
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [12 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/home/user/app/citydreamer/extensions/extrude_tensor/setup.py", line 20, in <module>
          CUDAExtension(
        File "/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1074, in CUDAExtension
          library_dirs += library_paths(cuda=True)
        File "/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1201, in library_paths
          if (not os.path.exists(_join_cuda_home(lib_dir)) and
        File "/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2407, in _join_cuda_home
          raise OSError('CUDA_HOME environment variable is not set. '
      OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Powered by Google (Bing)!
As expected, this can't fool the installer? 😅
https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/5

# In a shell:
export CUDA_HOME=/usr/local/cuda-X.X

# Or from Python, before torch.utils.cpp_extension is imported:
import os
os.environ["CUDA_HOME"] = "/usr/local/cuda-X.X"
from torch.utils.cpp_extension import CUDA_HOME

I don't think I can. Before I resorted to precompiling the .whl files, I encountered a runtime error stating that nvcc cannot be found, even inside functions decorated with @spaces.GPU.

It seems nvcc really doesn't exist, at least as far as I can search. If its presence doesn't actually matter, the check can be fooled, but if it is genuinely needed, is there any way around it...?

https://huggingface.co/spaces/TencentARC/InstantMesh/blob/main/app.py

import os
import shutil

def find_cuda():
    # Check if CUDA_HOME or CUDA_PATH environment variables are set
    cuda_home = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH')

    if cuda_home and os.path.exists(cuda_home):
        return cuda_home

    # Search for the nvcc executable in the system's PATH
    nvcc_path = shutil.which('nvcc')

    if nvcc_path:
        # Remove the 'bin/nvcc' part to get the CUDA installation path
        cuda_path = os.path.dirname(os.path.dirname(nvcc_path))
        return cuda_path

    return None

cuda_path = find_cuda()

if cuda_path:
    print(f"CUDA installation found at: {cuda_path}")
else:
    print("CUDA installation not found")

↓

CUDA installation not found

@John6666

This demo does not contain any CUDA extensions. No need to try, this function will definitely return None.

> No need to try, this function will definitely return None.

Okay.

I tried packages.txt, to no avail: I could install ffmpeg (irrelevant here), but I couldn't get hold of cuda or cuda-toolkit in any way...
https://huggingface.co/docs/hub/spaces-dependencies

@John6666
I solved this problem by manually installing CUDA toolkit.

import os
import subprocess

def install_cuda_toolkit():
    # Pick the toolkit that matches the CUDA version PyTorch was built with.
    CUDA_TOOLKIT_URL = "https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run"
    # CUDA_TOOLKIT_URL = "https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run"
    CUDA_TOOLKIT_FILE = "/tmp/%s" % os.path.basename(CUDA_TOOLKIT_URL)
    subprocess.call(["wget", "-q", CUDA_TOOLKIT_URL, "-O", CUDA_TOOLKIT_FILE])
    subprocess.call(["chmod", "+x", CUDA_TOOLKIT_FILE])
    # --silent --toolkit installs only the toolkit (nvcc, headers, libraries)
    # into /usr/local/cuda, without touching the driver.
    subprocess.call([CUDA_TOOLKIT_FILE, "--silent", "--toolkit"])

    os.environ["CUDA_HOME"] = "/usr/local/cuda"
    os.environ["PATH"] = "%s/bin:%s" % (os.environ["CUDA_HOME"], os.environ["PATH"])
    os.environ["LD_LIBRARY_PATH"] = "%s/lib:%s" % (
        os.environ["CUDA_HOME"],
        "" if "LD_LIBRARY_PATH" not in os.environ else os.environ["LD_LIBRARY_PATH"],
    )
    # Fix: arch_list[-1] += '+PTX'; IndexError: list index out of range
    os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0;8.6"
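
For completeness, a sketch of how the pieces fit together in app.py, assuming the extension layout from earlier in this thread (generate_city_image is a hypothetical entry point). The toolkit install and the extension builds must both run in the global scope, before any @spaces.GPU function is called:

import os
import subprocess

import spaces

install_cuda_toolkit()  # defined above: installs nvcc and sets CUDA_HOME

# Build the bundled extensions now that nvcc is available.
ext_dir = os.path.join(os.path.dirname(__file__), "citydreamer", "extensions")
for e in os.listdir(ext_dir):
    if os.path.isdir(os.path.join(ext_dir, e)):
        subprocess.call(
            ["pip", "install", "--no-build-isolation", "./%s" % e], cwd=ext_dir
        )

@spaces.GPU
def generate_city_image():
    # The compiled extensions only touch the GPU here, after ZeroGPU
    # has attached one to this process.
    ...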
hzxie changed discussion status to closed

@John6666

But the inference results are not the same as the ones on my local machine.
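One possible cause, though not confirmed in this thread: ZeroGPU assigns A100s, which use TF32 for float32 matmuls by default, so low-order bits can differ from GPUs without TF32. A quick sketch to rule that out:

import torch

# Disable TF32 so float32 matmuls match full-precision results from
# GPUs that do not use TF32 by default.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False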
