OCR Correcteur v1
These LoRA weights were fine-tuned on a French OCR dataset. The base architecture is Flan-T5 large, and training used a sample of 1,000 examples. A stronger model is in the works.
- Install dependencies
!pip install -q transformers accelerate peft diffusers
!pip install -U bitsandbytes
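Before loading the model, it can help to confirm that the packages imported in the next step are actually available. The following is a minimal sketch using only the standard library; `REQUIRED` and `missing_packages` are illustrative names that are not part of this model card, and `torch` is assumed to be preinstalled (as it usually is on Colab) since the pip commands above do not install it.

```python
from importlib.util import find_spec

# Packages imported by the loading and inference steps (assumed list).
REQUIRED = ["torch", "transformers", "accelerate", "peft", "bitsandbytes"]


def missing_packages(names):
    """Return the names from `names` that cannot be found as importable modules."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed")
```

If anything is reported as missing, rerun the install commands before continuing.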
- Load and merge adapters in 8-bit (recommended)
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig

# Load the PEFT config, which records the base model the adapter was trained on
peft_model_id = "jeanflop/ocr_correcteur-v1"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 8-bit, together with its tokenizer.
# device_map={"": 0} places the whole model on GPU 0; change the index to target another GPU.
base_model = AutoModelForSeq2SeqLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map={"": 0},
)
peft_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")

# Attach the LoRA adapter to the base model
peft_model = PeftModel.from_pretrained(base_model, peft_model_id, device_map={"": 0})
peft_model.eval()  # inference mode (disables dropout)
print("Peft model loaded")
- Run inference (recommended)
Add your text
text = "Your OCR text here"  # placeholder: replace with the OCR output to correct
inputs = f"""
Fix text : {text}"""
Run
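from textwrap import fill

# Cap both tokenization and generation at 512 tokens
peft_model.config.max_length = 512
peft_tokenizer.model_max_length = 512
# Tokenize the prompt and move the tensors to the model's device
inputs = peft_tokenizer(inputs, return_tensors="pt").to(peft_model.device)
outputs = peft_model.generate(**inputs, max_length=512)
# Decode the first sequence, dropping special tokens such as <pad> and </s>
answer = peft_tokenizer.decode(outputs[0], skip_special_tokens=True)
print(fill(answer, width=80))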
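Since generation above is capped at 512 tokens, a full OCR page may need to be split before correction. The following is a minimal sketch, assuming that a whitespace word count is a good enough proxy for token count; `build_prompt`, `split_into_chunks`, `correct_long_text`, `correct_fn` and the default `max_words=200` are hypothetical names and values chosen for illustration, not part of this model card.

```python
def build_prompt(text: str) -> str:
    """Wrap raw OCR text in the same prompt format used in the inference step."""
    return f"\nFix text : {text}"


def split_into_chunks(text: str, max_words: int = 200) -> list:
    """Split text into chunks of at most `max_words` whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def correct_long_text(text: str, correct_fn, max_words: int = 200) -> str:
    """Apply `correct_fn` (prompt -> corrected string) to each chunk and rejoin."""
    chunks = split_into_chunks(text, max_words)
    return " ".join(correct_fn(build_prompt(chunk)) for chunk in chunks)
```

Here `correct_fn` would wrap the tokenizer, `generate` and `decode` calls from the inference step. Splitting on word boundaries can cut a sentence in half, so a sentence-aware splitter may give better corrections near chunk edges.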
Model tree for jeanflop/ocr_correcteur-v1
Base model
google/flan-t5-large