Do you have ONNX inference?

#10
by zoldaten - opened

Hi!
I saw ONNX models in the repo. Do you have inference code for them?

BRIA AI org

@Xenova can you explain how to run ONNX inference here?

This should do the trick:

import onnxruntime as ort
import numpy as np
from PIL import Image
import requests

image_size = (1024, 1024)

def transform_image(image):
    # Resize image
    image = image.resize(image_size)
    
    # Convert image to NumPy array and normalize to [0, 1]
    image_array = np.asarray(image, dtype=np.float32) / 255.0
    
    # Normalize with given mean and std
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    normalized_image = (image_array - mean) / std
    
    # Rearrange dimensions to match tensor format (C, H, W)
    transformed_image = np.transpose(normalized_image, (2, 0, 1))
    
    return np.expand_dims(transformed_image, axis=0)

# Load image from URL
url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/ryan-gosling.jpg'
image = Image.open(requests.get(url, stream=True).raw).convert('RGB')  # ensure a 3-channel RGB input
pixel_values = transform_image(image)

# wget https://huggingface.co/briaai/RMBG-2.0/resolve/main/onnx/model.onnx
session = ort.InferenceSession('model.onnx')
outputs = session.run(['alphas'], {'pixel_values': pixel_values})  # 'alphas' is the predicted foreground mask

# Convert the mask to an 8-bit image and apply it as the alpha channel
mask = Image.fromarray((outputs[0].squeeze() * 255).astype(np.uint8))
image.putalpha(mask.resize(image.size))
image  # displays in a notebook; use image.save('output.png') in a script
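Side note: if you have a GPU, you can also choose execution providers when creating the session. A minimal sketch, assuming the onnxruntime-gpu package is installed (it falls back to CPU otherwise):

# Prefer CUDA when available, falling back to the CPU provider
session = ort.InferenceSession(
    'model.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],
)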


2024-11-20 14:32:32.5115801 [W:onnxruntime:, execution_frame.cc:651 onnxruntime::ExecutionFrame::AllocateMLValueTensorPreAllocateBuffer] Shape mismatch attempting to re-use buffer. {1,24,24,1536} != {1,256,1536}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.

Thanks, that works, but...

Inference time is the same! I compared model_q4f16.onnx and model_uint8.onnx against briaai/RMBG-2.0.
The ONNX models take the same time, or even longer:
20 sec on CPU (standard model) vs 37 sec (ONNX).

Moreover, I get this warning with onnxruntime==1.18.0 and 1.19.2 (a memory leak?):
"[W:onnxruntime:, execution_frame.cc:660 AllocateMLValueTensorPreAllocateBuffer] Shape mismatch attempting to re-use buffer. {1,36,36,768} != {1,1024,768}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model."

Hmm, I didn't see those warnings for the fp32 model - do you also get them with that one?

As for performance, q4 is typically slower than fp32 on CPU, but q8 should be faster.
Conversely, on GPU, q4 should be faster and q8 slower.
Would you like to run additional benchmarks?
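For reference, here is a minimal timing sketch along those lines - a sketch only, assuming the variant files mentioned above (model_uint8.onnx, model_q4f16.onnx) have been downloaded next to model.onnx:

import time
import numpy as np
import onnxruntime as ort

# Dummy input matching the (1, 3, 1024, 1024) shape produced by the preprocessing above
pixel_values = np.random.rand(1, 3, 1024, 1024).astype(np.float32)

for path in ['model.onnx', 'model_uint8.onnx', 'model_q4f16.onnx']:
    session = ort.InferenceSession(path)
    session.run(['alphas'], {'pixel_values': pixel_values})  # warm-up run
    start = time.perf_counter()
    session.run(['alphas'], {'pixel_values': pixel_values})
    print(f'{path}: {time.perf_counter() - start:.2f} s')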

Sure. But where is the q8 model?
