Aekanun committed on
Commit a28e515
1 Parent(s): 11aa08b

Update README.md

Files changed (1)
  1. README.md +29 -90
README.md CHANGED
@@ -18,93 +18,32 @@ pipeline_tag: image-to-text
  A LoRA-adapted vision-language model based on Llama-3.2-11B-Vision-Instruct that transcribes Thai handwritten text from images.

- ## Model Architecture
-
- - Base: Llama-3.2-11B-Vision-Instruct
-
- ## Inference
-
- ### Single Image
- ```python
- import torch
- from transformers import AutoModelForVision2Seq, AutoProcessor
- from peft import PeftModel
- from PIL import Image
-
- def load_model():
-     # Model paths
-     base_model_path = "meta-llama/Llama-3.2-11B-Vision-Instruct"
-     adapter_path = "Aekanun/thai-handwriting-llm"
-
-     # Load processor
-     processor = AutoProcessor.from_pretrained(
-         base_model_path,
-         use_auth_token=True
-     )
-
-     # Load base model
-     base_model = AutoModelForVision2Seq.from_pretrained(
-         base_model_path,
-         device_map="auto",
-         torch_dtype=torch.float16,
-         trust_remote_code=True,
-         use_auth_token=True
-     )
-
-     # Load adapter
-     model = PeftModel.from_pretrained(
-         base_model,
-         adapter_path,
-         device_map="auto",
-         torch_dtype=torch.float16,
-         use_auth_token=True
-     )
-
-     return model, processor
-
- def transcribe_thai_handwriting(image_path, model, processor):
-     # Load and prepare image
-     image = Image.open(image_path)
-
-     # Create prompt
-     prompt = """Transcribe the Thai handwritten text from the provided image.
- Only return the transcription in Thai language."""
-
-     # Prepare inputs
-     messages = [
-         {
-             "role": "user",
-             "content": [
-                 {"type": "text", "text": prompt},
-                 {"type": "image", "image": image}
-             ],
-         }
-     ]
-
-     # Process with model
-     text = processor.apply_chat_template(messages, tokenize=False)
-     inputs = processor(text=text, images=image, return_tensors="pt")
-     inputs = {k: v.to(model.device) for k, v in inputs.items()}
-
-     # Generate
-     with torch.no_grad():
-         outputs = model.generate(
-             **inputs,
-             max_new_tokens=512,
-             do_sample=False,
-             pad_token_id=processor.tokenizer.pad_token_id
-         )
-
-     # Decode output
-     transcription = processor.decode(outputs[0], skip_special_tokens=True)
-     return transcription.strip()
-
- # Example usage
- if __name__ == "__main__":
-     # Load model
-     model, processor = load_model()
-
-     # Transcribe image
-     image_path = "path/to/your/image.jpg"
-     result = transcribe_thai_handwriting(image_path, model, processor)
-     print(f"Transcription: {result}")
 
+ ## Model Description
+ - Base Model: Llama-3.2-11B-Vision-Instruct
+ - Training Technique: LoRA adaptation
+ - Quantization: Supports 4-bit inference
+ - Dataset: iapp/thai_handwriting_dataset
+
+ ## Demo
+
+ Try the model via our web interface:
+ 🔗 [Thai-HandWriting-to-Text](https://huggingface.co/spaces/Aekanun/Thai-HandWriting-to-Text)
+
+ ### Features
+ - Supports both general handwriting and medical prescriptions
+ - Simple drag-and-drop interface
+ - Real-time text recognition
+ - No setup required
+
+ ### Example Use Cases
+ 1. General Thai handwriting transcription
+ 2. Medical prescription reading
+ 3. Handwritten document digitization
+
+ ## Limitations
+ - Designed specifically for Thai handwriting
+ - Performance may vary with image quality
+ - Requires clear handwriting for best results
+
+ ## License
+ This model is released under the Apache 2.0 license.
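
The new README lists 4-bit inference support, but this commit no longer includes any loading code. The following is a minimal sketch of how local 4-bit inference might look, not the author's confirmed method. It assumes that `Aekanun/thai-handwriting-llm` is still a LoRA adapter for the gated base model (as in the removed snippet), that `transformers` (4.45 or later), `peft`, `bitsandbytes`, and `Pillow` are installed, that a Hugging Face login with access to the base model is already configured, and that NF4 is an acceptable 4-bit scheme (the README does not say which one is used). The helper names are hypothetical.

```python
from typing import Any

BASE_MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"
ADAPTER_ID = "Aekanun/thai-handwriting-llm"  # adapter repo named in the removed snippet

PROMPT = (
    "Transcribe the Thai handwritten text from the provided image.\n"
    "Only return the transcription in Thai language."
)


def four_bit_settings() -> dict[str, Any]:
    """Assumed NF4 settings; the README does not specify the 4-bit scheme."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    }


def build_messages(prompt: str = PROMPT) -> list[dict[str, Any]]:
    """Chat payload: one image placeholder followed by the text instruction."""
    return [
        {
            "role": "user",
            "content": [{"type": "image"}, {"type": "text", "text": prompt}],
        }
    ]


def load_model_4bit():
    # Heavy imports stay local so the helpers above work without a GPU stack.
    import torch
    from peft import PeftModel
    from transformers import (
        AutoProcessor,
        BitsAndBytesConfig,
        MllamaForConditionalGeneration,
    )

    quant_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.float16, **four_bit_settings()
    )
    processor = AutoProcessor.from_pretrained(BASE_MODEL_ID)
    base = MllamaForConditionalGeneration.from_pretrained(
        BASE_MODEL_ID, quantization_config=quant_config, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return model, processor


def transcribe(image_path: str, model, processor, max_new_tokens: int = 512) -> str:
    import torch
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    text = processor.apply_chat_template(
        build_messages(), tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        images=image, text=text, add_special_tokens=False, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return processor.decode(new_tokens, skip_special_tokens=True).strip()
```

Usage would be `model, processor = load_model_4bit()` followed by `transcribe("path/to/your/image.jpg", model, processor)`. Unlike the removed snippet, this sketch requests a generation prompt and slices off the prompt tokens before decoding, so the returned string should contain only the model's reply.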