mmcv error for python 3.10, cu118 and torch2.1.0 for custom characters.

#2
by H-Liu1997 - opened

Hi @hysts, I'm almost at the final stage of setting up this project, but I ran into an mmcv error.

Are there any example projects that run mmcv successfully on ZeroGPU?

My project drives 2D human video from speech audio, and mmcv is critical for letting users upload their custom videos. I've confirmed that the current scripts work well on my local machine with Python 3.10, cu122, and torch 2.1.0. Video examples are below.

Building the current project, including mmcv, takes around 45 minutes, which makes debugging time-consuming. Thanks!

logs below:

 File "/home/user/app/SMPLer-X/app.py", line 133, in <module>
    infer(os.path.join(video_folder, video_input), 0.5, False, False, inferer, OUT_FOLDER)
  File "/home/user/app/SMPLer-X/app.py", line 103, in infer
    _, _, _ = inferer.infer(original_img, in_threshold, frame, multi_person, not(render_mesh))
  File "/home/user/app/SMPLer-X/main/inference.py", line 55, in infer
    mmdet_results = inference_detector(self.model, original_img)
  File "/usr/local/lib/python3.10/site-packages/mmdet/apis/inference.py", line 189, in inference_detector
    results = model.test_step(data_)[0]
  File "/usr/local/lib/python3.10/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/mmengine/model/base_model/base_model.py", line 361, in _run_forward
    results = self(**data, mode=mode)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/detectors/base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/detectors/two_stage.py", line 231, in predict
    rpn_results_list = self.rpn_head.predict(
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 197, in predict
    predictions = self.predict_by_feat(
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 279, in predict_by_feat
    results = self._predict_by_feat_single(
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/dense_heads/rpn_head.py", line 233, in _predict_by_feat_single
    return self._bbox_post_process(
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/dense_heads/rpn_head.py", line 284, in _bbox_post_process
    det_bboxes, keep_idxs = batched_nms(bboxes, results.scores,
  File "/usr/local/lib/python3.10/site-packages/mmcv/ops/nms.py", line 303, in batched_nms
    dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)
  File "/usr/local/lib/python3.10/site-packages/mmengine/utils/misc.py", line 395, in new_func
    output = old_func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/mmcv/ops/nms.py", line 127, in nms
    inds = NMSop.apply(boxes, scores, iou_threshold, offset, score_threshold,
  File "/usr/local/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/site-packages/mmcv/ops/nms.py", line 27, in forward
    inds = ext_module.nms(
RuntimeError: nms_impl: implementation for device cuda:0 not found.

@H-Liu1997 Ah, I don't think mmcv works on ZeroGPU. I'll assign a normal GPU grant then.

@H-Liu1997 it looks like the GitHub repository disappeared.

I am getting 404 on https://github.com/CyberAgentAILab/TANGO

@hlevring Hi, the GitHub repo is under review by the engineering team, and I don't know when they will finish. The scripts on GitHub and here (Hugging Face) are the same, so you can just git clone this repo.

If you hit an error with custom characters, that is due to the mmcv install error here. I recommend cloning this repo and setting up the environment from requirements.txt, with Python 3.10 and torch 2.1.0.

@hysts Thanks for the generous GPU grant from Hugging Face. There is still a bug with mmcv on the L40S, so I may put this on hold for a few days and look for a repo that uses mmcv successfully.

@hysts I found that the L40S is on cu123. I will update my requirements.txt.

@hysts Hi, please ignore my previous messages; I will summarize here. I want to set up mmcv correctly in a Gradio-based environment.

Possible solution: could we choose which image to pull, such as cu121 or cu122 instead of cu123? I tested that my code works on Google Colab (T4, cu122). I suspect cu123 is too new here.

For example, could I ask for a Dockerfile like gradio cu121-cudnn xxx?

@H-Liu1997
I'm confused. Looks like the requirements.txt in your Space has mmcv and your Space is up. So, I guess at least it was successfully installed, no? Or, is it still broken for some reason?
What is the error, then? Is it really about the CUDA minor version? Or is it the same error as https://huggingface.co/spaces/H-Liu1997/TANGO/discussions/2#670c09fca317a660f0e3843c ?

Anyway, if the error has something to do with CUDA, maybe that's because you installed mmcv at build time. On Spaces infra, CUDA is not available at build time, so if mmcv requires CUDA to build its CUDA kernels or something similar, it just doesn't work, even if you use Docker as the Space SDK.
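One way to check whether the installed mmcv was actually built with CUDA is to compare what torch and mmcv report at runtime. The following is only a rough sketch: the helper name `cuda_build_report` is made up for illustration, and it assumes that `mmcv.ops.get_compiling_cuda_version` is available in the installed mmcv build.

```python
import importlib


def cuda_build_report():
    """Collect CUDA-related version info; missing packages are reported as None.

    Hypothetical helper for illustration, not part of mmcv or torch.
    """
    report = {"torch": None, "torch_cuda": None,
              "cuda_available": None, "mmcv_compiled_cuda": None}
    try:
        torch = importlib.import_module("torch")
        report["torch"] = torch.__version__
        report["torch_cuda"] = torch.version.cuda  # None for CPU-only torch builds
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        pass  # torch is not installed in this environment
    try:
        ops = importlib.import_module("mmcv.ops")
        # Assumption: this reports the CUDA version mmcv's ops were compiled with.
        report["mmcv_compiled_cuda"] = ops.get_compiling_cuda_version()
    except ImportError:
        pass  # mmcv (or its compiled ops extension) is not installed
    return report


if __name__ == "__main__":
    for key, value in cuda_build_report().items():
        print(f"{key}: {value}")
```

If `mmcv_compiled_cuda` does not look like a CUDA version while torch reports one, that would be consistent with the ops having been compiled without CUDA, which is what the `nms_impl: implementation for device cuda:0 not found` error suggests.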

A typical solution for this kind of issue is to install the package, which is mmcv in your case, at startup time instead of build time by running something like the following in your app.py.

import shlex
import subprocess

# Install mmcv at startup rather than at build time; check=True raises if pip fails.
subprocess.run(shlex.split("pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.4/index.html"), check=True)
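A slightly more defensive variant of the snippet above, again only as a sketch. The function names (`mmcv_index_url`, `mmcv_install_command`, `ensure_mmcv`) are hypothetical; the URL pattern is taken from the command above, and the CUDA and torch tags are assumptions you would need to match to your own environment. It skips the install when mmcv is already importable, so restarts are not slowed down.

```python
import importlib.util
import subprocess
import sys


def mmcv_index_url(cuda_tag, torch_tag):
    """Build the wheel index URL, following the pattern used in the command above."""
    return f"https://download.openmmlab.com/mmcv/dist/{cuda_tag}/{torch_tag}/index.html"


def mmcv_install_command(version="2.2.0", cuda_tag="cu121", torch_tag="torch2.4"):
    """Return the pip command as an argument list (no shell parsing needed)."""
    return [
        sys.executable, "-m", "pip", "install",
        f"mmcv=={version}",
        "-f", mmcv_index_url(cuda_tag, torch_tag),
    ]


def ensure_mmcv(**kwargs):
    """Install mmcv at startup only if it is not importable yet.

    Returns True if an install was attempted, False if mmcv was already present.
    """
    if importlib.util.find_spec("mmcv") is not None:
        return False
    # check=True raises CalledProcessError if pip fails, so the app stops early.
    subprocess.run(mmcv_install_command(**kwargs), check=True)
    return True
```

In app.py this would be called once, before anything imports mmcv or mmdet, for example `ensure_mmcv(version="2.2.0", cuda_tag="cu121", torch_tag="torch2.4")`. Using `sys.executable -m pip` targets the same interpreter that runs the app.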

As for the Docker SDK, I think you can just search on the Hub, but the following Dockerfile is the one I used in my old Spaces, so maybe it can be helpful. (I guess some of the packages I installed in it are old, so you might want to update the versions, though.)

FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
    git \
    git-lfs \
    wget \
    curl \
    # python build dependencies \
    build-essential \
    libssl-dev \
    zlib1g-dev \
    libbz2-dev \
    libreadline-dev \
    libsqlite3-dev \
    libncursesw5-dev \
    xz-utils \
    tk-dev \
    libxml2-dev \
    libxmlsec1-dev \
    libffi-dev \
    liblzma-dev \
    # gradio dependencies \
    ffmpeg  && \
    rm -rf /var/lib/apt/lists/*

RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:${PATH}
WORKDIR ${HOME}/app

RUN curl https://pyenv.run | bash
ENV PATH=${HOME}/.pyenv/shims:${HOME}/.pyenv/bin:${PATH}
ARG PYTHON_VERSION=3.10.13
RUN pyenv install ${PYTHON_VERSION} && \
    pyenv global ${PYTHON_VERSION} && \
    pyenv rehash && \
    pip install --no-cache-dir -U pip setuptools wheel && \
    pip install "huggingface-hub==0.19.3" "hf-transfer==0.1.4"

COPY --chown=1000 . ${HOME}/app
RUN pip install -r ${HOME}/app/requirements.txt

ENV PYTHONPATH=${HOME}/app \
    PYTHONUNBUFFERED=1 \
    HF_HUB_ENABLE_HF_TRANSFER=1 \
    GRADIO_ALLOW_FLAGGING=never \
    GRADIO_NUM_PORTS=1 \
    GRADIO_SERVER_NAME=0.0.0.0 \
    GRADIO_THEME=huggingface \
    TQDM_POSITION=-1 \
    TQDM_MININTERVAL=1 \
    SYSTEM=spaces
CMD ["python", "app.py"]

@hysts Thanks! I finally downgraded to Python 3.9 and cu117, and it works. Now it is all set!

H-Liu1997 changed discussion status to closed
