error
Traceback (most recent call last):
  File "c:\Users\jmes1\Downloads\molmo-7B-D-bnb-4bit.py", line 17, in <module>
    model = AutoModelForCausalLM.from_pretrained(repo_name, **arguments)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 557, in from_pretrained
    cls.register(config.__class__, model_class, exist_ok=True)
  File "c:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 584, in register
    raise ValueError(
ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers_modules.allenai.Molmo-7B-D-0924.b72f6745657cddaf97041d88eb02b23756338219.config_molmo.MolmoConfig'> and you passed <class 'transformers_modules.cyan2k.molmo-7B-D-bnb-4bit.8b3b140bc14c05c77c30a112a939d8d2c4e0ee42.config_molmo.MolmoConfig'>. Fix one of those so they match!
Can you help me?
Excellent job quantizing with BNB; thanks, I was looking for this. Here's a sample script for @win10. If it still doesn't work, it might have something to do with the versions of the libraries you have installed.
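One other thing worth checking for the config_class error specifically: it usually means transformers cached the remote code from one repo (allenai/Molmo-7B-D-0924) and is mixing it with the code from the other (cyan2k/molmo-7B-D-bnb-4bit). A first thing I'd try, though I haven't verified it on your machine, is clearing the dynamic-module cache so everything gets re-downloaded from a single repo:

import shutil
from pathlib import Path

# Assumption: default HF cache location; adjust if you set HF_HOME or HF_MODULES_CACHE.
modules_cache = Path.home() / ".cache" / "huggingface" / "modules" / "transformers_modules"
if modules_cache.exists():
    shutil.rmtree(modules_cache)  # transformers re-fetches the repo's code on the next from_pretrained call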
SAMPLE SCRIPT
import sys
import os
from pathlib import Path

def set_cuda_paths():
    # Prepend the NVIDIA pip-wheel DLL folders inside the venv so CUDA is found on Windows
    venv_base = Path(sys.executable).parent.parent
    nvidia_base_path = venv_base / 'Lib' / 'site-packages' / 'nvidia'
    cuda_path = nvidia_base_path / 'cuda_runtime' / 'bin'
    cublas_path = nvidia_base_path / 'cublas' / 'bin'
    cudnn_path = nvidia_base_path / 'cudnn' / 'bin'
    paths_to_add = [str(cuda_path), str(cublas_path), str(cudnn_path)]
    env_vars = ['CUDA_PATH', 'CUDA_PATH_V12_1', 'PATH']
    for env_var in env_vars:
        current_value = os.environ.get(env_var, '')
        new_value = os.pathsep.join((paths_to_add + [current_value]) if current_value else paths_to_add)
        os.environ[env_var] = new_value

set_cuda_paths()

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

model_path = r"D:\Scripts\bench_vision\cyan2k--molmo-7B-D-bnb-4bit"

class VisionModel:
    def __init__(self):
        self.model = None
        self.processor = None
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def initialize_model_and_processor(self):
        self.processor = AutoProcessor.from_pretrained(
            model_path,
            trust_remote_code=True,
            torch_dtype='auto',
            device_map='auto'
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            trust_remote_code=True,
            torch_dtype='auto',
            device_map='auto'
        )

    def process_single_image(self, image_path):
        image = Image.open(image_path)
        # Ensure the image is in RGB format
        if image.mode != "RGB":
            image = image.convert("RGB")
        text = "Describe this image in as much detail as possible, but be succinct and don't repeat yourself."
        inputs = self.processor.process(images=[image], text=text)
        # Move inputs to the model's device and add a batch dimension
        inputs = {k: v.to(self.device).unsqueeze(0) for k, v in inputs.items()}
        output = self.model.generate_from_batch(
            inputs,
            GenerationConfig(max_new_tokens=500, stop_strings=["<|endoftext|>"]),
            tokenizer=self.processor.tokenizer
        )
        # Decode only the newly generated tokens (everything after the prompt)
        generated_tokens = output[0, inputs['input_ids'].size(1):]
        generated_text = self.processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
        print(f"\nGenerated Text:\n{generated_text}\n")

if __name__ == "__main__":
    image_path = r"D:\Scripts\bench_vision\IMG_140531.JPG"
    vision_model = VisionModel()
    vision_model.initialize_model_and_processor()
    vision_model.process_single_image(image_path)
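For what it's worth, set_cuda_paths() assumes the CUDA runtime, cuBLAS, and cuDNN DLLs were installed into the venv via the NVIDIA pip wheels (nvidia-cuda-runtime-cu12 and friends). If loading still fails, a quick sanity check I'd run first (my own snippet, not part of the original script):

import sys
from pathlib import Path

# Check that the DLL folders set_cuda_paths() points at actually exist in this venv
nvidia_base = Path(sys.executable).parent.parent / 'Lib' / 'site-packages' / 'nvidia'
for sub in ('cuda_runtime', 'cublas', 'cudnn'):
    p = nvidia_base / sub / 'bin'
    print(p, '-> OK' if p.exists() else '-> MISSING (install the matching nvidia-*-cu12 wheel)')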
Moreover, I modified some of the source code, specifically image_preprocessing_molmo.py. After you download all the files, replace the resize_and_pad function with my custom version. It no longer uses tensorflow and instead relies on more commonly used libraries. I did this because I ran into massive problems trying to install and run tensorflow and its massive dependencies:
CUSTOM resize_and_pad FUNCTION
# Note: this version needs torch, torchvision, and
# "from torchvision.transforms import InterpolationMode" (plus Tuple from typing)
# imported at the top of image_preprocessing_molmo.py in place of the tensorflow imports.
def resize_and_pad(
    image: np.ndarray,
    desired_output_size: List[int],
    resize_method: str = "bilinear",
    pad_value: float = 0,
    normalize: bool = True,
    image_mean: Optional[List[float]] = OPENAI_CLIP_MEAN,
    image_std: Optional[List[float]] = OPENAI_CLIP_STD,
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Resize and pad the image to the desired output size.

    Args:
        image (np.ndarray): Input image as a NumPy array.
        desired_output_size (List[int]): Desired output size as [height, width].
        resize_method (str, optional): Resize interpolation method. Defaults to "bilinear".
        pad_value (float, optional): Padding value. Defaults to 0.
        normalize (bool, optional): Whether to normalize the image. Defaults to True.
        image_mean (Optional[List[float]], optional): Mean for normalization. Defaults to OPENAI_CLIP_MEAN.
        image_std (Optional[List[float]], optional): Standard deviation for normalization. Defaults to OPENAI_CLIP_STD.

    Returns:
        Tuple[np.ndarray, np.ndarray]: Resized and padded image, and image mask.
    """
    desired_height, desired_width = desired_output_size
    height, width = image.shape[:2]

    # Calculate scaling factors and pick the one that preserves aspect ratio
    scale_y = desired_height / height
    scale_x = desired_width / width
    scale = min(scale_x, scale_y)
    scaled_height = int(height * scale)
    scaled_width = int(width * scale)

    # Convert the image to a PyTorch tensor (C, H, W) and normalize to [0, 1]
    image_tensor = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0

    # Map the requested resize method to a torchvision interpolation mode
    if resize_method.lower() == "bilinear":
        interpolation = InterpolationMode.BILINEAR
    elif resize_method.lower() == "nearest":
        interpolation = InterpolationMode.NEAREST
    elif resize_method.lower() == "bicubic":
        interpolation = InterpolationMode.BICUBIC
    elif resize_method.lower() == "lanczos":
        interpolation = InterpolationMode.LANCZOS
    else:
        raise ValueError(f"Unsupported resize method: {resize_method}")

    # Resize the image
    resized_image = torchvision.transforms.Resize(
        [scaled_height, scaled_width],
        interpolation=interpolation,
        antialias=True
    )(image_tensor)

    # Clip the image to ensure values are within [0, 1]
    resized_image = torch.clamp(resized_image, 0.0, 1.0)

    # Convert back to NumPy (H, W, C)
    resized_image_np = resized_image.permute(1, 2, 0).numpy()

    # Calculate symmetric padding to center the resized image
    top_pad = (desired_height - scaled_height) // 2
    bottom_pad = desired_height - scaled_height - top_pad
    left_pad = (desired_width - scaled_width) // 2
    right_pad = desired_width - scaled_width - left_pad

    # Pad the image using NumPy
    padded_image = np.pad(
        resized_image_np,
        pad_width=((top_pad, bottom_pad), (left_pad, right_pad), (0, 0)),
        mode='constant',
        constant_values=pad_value
    )

    # Create the image mask (True where real pixels are, False in the padding)
    image_mask = np.pad(
        np.ones((scaled_height, scaled_width), dtype=bool),
        pad_width=((top_pad, bottom_pad), (left_pad, right_pad)),
        mode='constant',
        constant_values=False
    )

    if normalize:
        padded_image = normalize_image(padded_image, offset=image_mean, scale=image_std)

    return padded_image, image_mask
I have verified that the above script works as long as you modify the source code as I outlined.
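If you want to sanity-check the replacement function on its own before wiring it into the model, here's a tiny smoke test (my own snippet; the sizes are arbitrary, and it assumes you run it inside image_preprocessing_molmo.py or after importing resize_and_pad from it):

import numpy as np

# Fake 480x640 RGB image; normalize=False so the CLIP mean/std aren't needed here
img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
padded, mask = resize_and_pad(img, [336, 336], normalize=False)
print(padded.shape, mask.shape)  # expect (336, 336, 3) and (336, 336)
print(mask[:, 0].sum())          # number of real (non-padding) rows in the first column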
I am getting this error:

PS C:\Users\15023\Documents\Models\FD> & c:/Users/15023/Documents/Models/GPD/.venv/Scripts/python.exe c:/Users/15023/Documents/Models/D/molmo_test.py
2024-09-29 18:03:15.863138: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-09-29 18:03:16.651278: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|██████████████████████████████| 2/2 [00:03<00:00, 1.66s/it]
2024-09-29 18:03:24.165424: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "c:\Users\15023\Documents\Models\FPD\molmo_test.py", line 33, in <module>
    attention_mask=inputs["attention_mask"],
                   ~~~~~~^^^^^^^^^^^^^^^^^^
KeyError: 'attention_mask'
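In case it helps: as far as I can tell, the output of the Molmo processor's process() call doesn't include an attention_mask key (only entries like input_ids, images, image_input_idx, and image_masks), so any script that indexes inputs["attention_mask"] will hit exactly this KeyError. My guess, untested against your molmo_test.py, is to either drop that argument entirely, as the sample script above does, or synthesize a mask yourself:

# Hypothetical workaround (assumes torch is imported and inputs came from processor.process):
# build an all-ones mask from input_ids instead of reading a key the processor never produced.
attention_mask = torch.ones_like(inputs["input_ids"])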