Dynamic ZeroGPU Duration
Hi everyone, I want to share my code for requesting a dynamic GPU duration on ZeroGPU.
I'm happy to contribute this code to the `spaces` package, but I can't find its repository. (The link on PyPI mistakenly points to the huggingface_hub repo, and I can't find the relevant code there.) Does Hugging Face plan to open-source the `spaces` repo?
```python
from typing import Callable
from functools import partial

import gradio as gr
import spaces
import spaces.config
from spaces.zero.decorator import P, R


def _dynGPU(
    fn: Callable[P, R] | None, duration: Callable[P, int], min=30, max=300, step=10
) -> Callable[P, R]:
    if not spaces.config.Config.zero_gpu:
        return fn

    funcs = [
        (t, spaces.GPU(duration=t)(lambda *args, **kwargs: fn(*args, **kwargs)))
        for t in range(min, max + 1, step)
    ]

    def wrapper(*args, **kwargs):
        requirement = duration(*args, **kwargs)

        # find the function that satisfies the duration requirement
        for t, func in funcs:
            if t >= requirement:
                gr.Info(f"Acquiring ZeroGPU for {t} seconds")
                return func(*args, **kwargs)

        # if no function is found, use the last (longest) one
        gr.Info(f"Acquiring ZeroGPU for {funcs[-1][0]} seconds")
        return funcs[-1][1](*args, **kwargs)

    return wrapper


def dynGPU(
    fn: Callable[P, R] | None = None,
    duration: Callable[P, int] = lambda: 60,
    min=30,
    max=300,
    step=10,
) -> Callable[P, R]:
    if fn is None:
        return partial(_dynGPU, duration=duration, min=min, max=max, step=step)
    return _dynGPU(fn, duration, min, max, step)
```
It's very similar to the `@spaces.GPU` decorator, but it accepts `duration` as a function that takes the same parameters as the decorated function and returns the desired GPU time in seconds.
I have tested it in my space: https://huggingface.co/spaces/JacobLinCool/vocal-separation
The usage in my space requests GPU time based on the audio length:
```python
import os
import tempfile
from typing import Tuple

import gradio as gr
import librosa


def measure_duration(audio: str, model: str) -> int:
    y, sr = librosa.load(audio, sr=44100)
    # request roughly one second of GPU time per three seconds of audio
    return int(librosa.get_duration(y=y, sr=sr) / 3.0)


@dynGPU(duration=measure_duration)
def separate(audio: str, model: str) -> Tuple[str, str]:
    separator = separators[model]  # `separators` is defined elsewhere in the space
    outs = separator.separate(audio)
    outs = [os.path.join(tempfile.gettempdir(), out) for out in outs]

    # roformers
    if len(outs) == 2:
        return outs[1], outs[0]

    # demucs
    if len(outs) == 4:
        bgm = merge(outs[:3])  # `merge` is a helper defined elsewhere in the space
        return outs[3], bgm

    raise gr.Error("Unknown output format")
```
This works well for me, and I think others may be interested in it.
This looks cool!
How would you measure the duration for text generation, say using llama-cpp-python? Basically by the size of the model file?
Just curious...
Thank you for sharing!
I haven't tried it on text-generation tasks yet, but I think some experiments are needed; it largely depends on prior experience.
The estimate would be based on two aspects: model size and user input (e.g., audio duration for audio tasks and prompt length for text generation).
In theory, you could calculate the FLOPs the model needs during computation, but hardware performance varies.
Very interesting! This would improve how GPU time is consumed, giving a better experience for users on ZeroGPU.
You can download the source distribution from the Download Files section on PyPI: https://pypi.org/project/spaces/#files
Hi @JacobLinCool, thanks for your contribution!
The `spaces` package (more specifically, the `spaces.zero` sub-package) is not (yet?) open-sourced, but I'm happy to integrate "dynamic duration" into `@spaces.GPU`.
Technically speaking, I think we should be able to do it without wrapping one function per duration (so we'll benefit from idle-reuse whatever the duration).
(If interested, you can take a look at `spaces.zero.client` to see how `duration` ends up being used.)
API-wise, I was thinking of something like:
```python
def get_duration(prompt, steps):
    return steps // 7


@spaces.GPU(duration=get_duration)
def generate(prompt, steps):
    return pipe(prompt, num_inference_steps=steps)
```
The rule would be pretty simple: if the `duration` kwarg is callable, then it will be called with the same `*args` and `**kwargs` as the current `@spaces.GPU`-decorated function call (just like in your `dynGPU` version), and it should return a duration.
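That rule can be sketched as a standalone decorator. This is not the real `spaces` internals (which handle queueing and scheduling); the `gpu` name and the `last_duration` attribute here are hypothetical, added only so the resolved duration can be inspected.

```python
from functools import wraps
from typing import Callable, Union


def gpu(duration: Union[int, Callable[..., int]] = 60):
    """Simplified model of a @spaces.GPU-style decorator that accepts
    either a fixed duration or a callable computing it per call."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            # Resolve the duration at call time, mirroring the proposed rule:
            # a callable gets the decorated function's own args and kwargs.
            seconds = duration(*args, **kwargs) if callable(duration) else duration
            # A real implementation would hand `seconds` to the ZeroGPU
            # scheduler; here we just record it for inspection.
            wrapper.last_duration = seconds
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@gpu(duration=lambda prompt, steps: steps // 7)
def generate(prompt, steps):
    return f"ran {steps} steps"
```

Because the duration is resolved inside a single wrapper rather than by pre-building one wrapped function per duration value, there is nothing to enumerate over a `min`/`max`/`step` range.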
I agree that creating a function for each duration is monkey-patching-like.
After digging into the code inside the `spaces` package, just like you said, one approach may be to calculate the `timedelta` with the user function before calling `client.schedule` in `generator_function_wrapper`.
Looking forward to it being integrated!
Dynamic duration is now available. Feel free to test it out!
(it's a power user feature for now but it will be in the README at some point)