Model Card: Falconsai/florence-2-invoice

Developed by: Michael Stattelman for Falcons.ai
Funded by: Falcons.ai

Model Sources:

Repository: https://github.com/Falcons-ai/florence2_invoice_finetuning

Model Overview

Falconsai/florence-2-invoice is a fine-tuned version of the microsoft/Florence-2-base-ft model. This model has been specifically trained to identify and extract key fields from invoice images. The fine-tuning process utilized a curated dataset of invoices annotated to recognize the following fields:

- Billing address
- Discount percentage
- Due date
- Email client
- Header
- Invoice date
- Invoice number
- Name client
- Products
- Remise
- Shipping address
- Subtotal
- Tax
- Tax percentage
- Tel client
- Total

Base Model

The base model used for fine-tuning is microsoft/Florence-2-base-ft, a prompt-driven vision-language model developed by Microsoft that handles tasks such as object detection, captioning, and phrase grounding.

Fine-tuning Configuration

Fine-tuning used Low-Rank Adaptation (LoRA) with the following configuration:

LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "linear", "Conv2d", "lm_head", "fc2"],
    task_type="CAUSAL_LM",
    lora_dropout=0.05,
    bias="none",
    inference_mode=False,
    use_rslora=True,
    init_lora_weights="gaussian",
    revision=REVISION
)
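One consequence of this configuration is easy to miss: with `use_rslora=True`, the low-rank update is scaled by `lora_alpha / sqrt(r)` rather than the standard `lora_alpha / r`. The snippet below is a minimal sketch of that arithmetic for the values above; `lora_scaling` is a hypothetical helper written for illustration, not a PEFT function.

```python
import math


def lora_scaling(r: int, lora_alpha: float, use_rslora: bool) -> float:
    """Return the factor applied to the low-rank update B @ A."""
    if use_rslora:
        # Rank-stabilized LoRA scales by alpha / sqrt(r).
        return lora_alpha / math.sqrt(r)
    # Standard LoRA scales by alpha / r.
    return lora_alpha / r


standard = lora_scaling(r=8, lora_alpha=8, use_rslora=False)
rank_stabilized = lora_scaling(r=8, lora_alpha=8, use_rslora=True)
print(f"standard LoRA scaling: {standard:.3f}")         # 1.000
print(f"rsLoRA scaling:        {rank_stabilized:.3f}")  # 2.828
```

With r=8 and lora_alpha=8, the adapter update is therefore weighted roughly 2.8 times more heavily than the same numbers would give under standard LoRA scaling, which is worth keeping in mind when changing `r`.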

Hardware Used

Fine-tuning was conducted on an Alienware system.

Dataset

The model was trained on a curated dataset of invoice images. Each invoice was annotated with the fields listed above, so that the model learns to detect and extract key information across a range of invoice formats.
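The annotation format itself is defined in the repository, not in this card. As an illustration only, Florence-2 detection targets are usually written as a label followed by four location tokens, `label<loc_x1><loc_y1><loc_x2><loc_y2>`, with coordinates quantized into 1000 bins. The sketch below shows one way a labeled box could be serialized that way; `to_loc_tokens` is a hypothetical helper, and the exact rounding rule is an assumption.

```python
def to_loc_tokens(label, box, image_width, image_height, bins=1000):
    """Serialize one labeled box as 'label<loc_x1><loc_y1><loc_x2><loc_y2>'."""
    x1, y1, x2, y2 = box

    def quantize(value, size):
        # Map a pixel coordinate to an integer bin in [0, bins - 1].
        # Assumption: floor-style quantization with clamping.
        return min(bins - 1, max(0, int(value * bins / size)))

    coords = (
        quantize(x1, image_width),
        quantize(y1, image_height),
        quantize(x2, image_width),
        quantize(y2, image_height),
    )
    return label + "".join(f"<loc_{c}>" for c in coords)


# Made-up box on an 800 x 1000 pixel invoice image.
target = to_loc_tokens("Invoice date", (100, 50, 400, 80), image_width=800, image_height=1000)
print(target)  # Invoice date<loc_125><loc_50><loc_500><loc_80>
```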

Usage

Inference

To use this model for inference, you can load it via the Hugging Face Transformers library:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor


def run_florence_invoice(img, task_prompt, text_input=None):
    # Load the image and ensure it is in RGB format
    image = Image.open(img)
    if image.mode != "RGB":
        image = image.convert("RGB")

    # Load the model and processor (requires a CUDA-capable GPU).
    # Note: they are reloaded on every call; load them once outside
    # the function when processing many invoices.
    model_id = "Falconsai/florence-2-invoice"
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).eval().cuda()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    # Build the prompt: the task token, optionally followed by extra text
    prompt = task_prompt if text_input is None else task_prompt + text_input

    with torch.no_grad():
        inputs = processor(text=prompt, images=image, return_tensors="pt")
        generated_ids = model.generate(
            input_ids=inputs["input_ids"].cuda(),
            pixel_values=inputs["pixel_values"].cuda(),
            max_new_tokens=1024,
            num_beams=3,
        )
        # Keep special tokens: the post-processor needs the location tokens
        generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
        parsed_answer = processor.post_process_generation(
            generated_text, task=task_prompt, image_size=(image.width, image.height)
        )

    # Release the model and processor before returning
    del model
    del processor

    return parsed_answer

# Call the function as follows.
# Return all fields identified:
img = './invoice.png'
results = run_florence_invoice(img, '<OD>')

# Return a specific field:
img = './invoice.png'
results = run_florence_invoice(img, "<CAPTION_TO_PHRASE_GROUNDING>", text_input="invoice date")
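For both task prompts, Florence-2's post-processing returns a dictionary keyed by the task prompt, containing parallel `bboxes` and `labels` lists; the output locates fields on the page and does not transcribe their text. The sketch below groups boxes by label so each field can be looked up directly. `boxes_by_label` is a hypothetical helper, and the sample coordinates and label strings are made up for illustration; the labels this model actually emits may be spelled differently.

```python
from collections import defaultdict


def boxes_by_label(parsed_answer, task_prompt):
    """Group bounding boxes by field label for one task's parsed output."""
    detections = parsed_answer.get(task_prompt, {})
    grouped = defaultdict(list)
    for box, label in zip(detections.get("bboxes", []), detections.get("labels", [])):
        grouped[label].append(tuple(box))
    return dict(grouped)


# Hypothetical result, shaped like the output of the '<OD>' task.
example = {
    "<OD>": {
        "bboxes": [[40.5, 30.0, 310.0, 58.5], [420.0, 700.0, 560.0, 730.0]],
        "labels": ["Invoice number", "Total"],
    }
}

fields = boxes_by_label(example, "<OD>")
for label, boxes in fields.items():
    print(label, boxes)
```

Each box is `(x1, y1, x2, y2)` in pixel coordinates of the input image, so after rounding to integers it can be passed to `Image.crop` and the cropped region handed to an OCR step if the field's text is needed.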

Applications

This model is ideal for automating the extraction of key information from invoices in various business and financial applications. It can significantly reduce the manual effort required for data entry and validation in accounting and bookkeeping processes.

Evaluation

The model has been evaluated on a held-out set of annotated invoice images. The evaluation metrics used included precision, recall, and F1-score for each of the identified fields. Detailed evaluation results and visualizations are available in the results directory of the repository.
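The card does not say how predictions were matched to annotations. A common convention for detection-style output, assumed here, is to count a predicted box as correct when its intersection over union (IoU) with an unmatched ground-truth box of the same field is at least 0.5. The sketch below computes precision, recall, and F1 for one field under that assumption; `iou` and `field_scores` are hypothetical helpers, not code from the repository.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def field_scores(predicted, truth, iou_threshold=0.5):
    """Precision, recall and F1 for one field label.

    Each prediction is greedily matched to at most one unmatched truth box.
    """
    unmatched = list(truth)
    tp = 0
    for box in predicted:
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(predicted) - tp
    fn = len(truth) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up boxes: one good prediction, one spurious one, one missed annotation.
pred = [(10, 10, 110, 40), (300, 300, 350, 320)]
gold = [(12, 11, 108, 41), (500, 500, 600, 530)]
print(field_scores(pred, gold))  # (0.5, 0.5, 0.5)
```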

Limitations

The model's performance depends on the quality and variability of the training dataset. It may perform worse on invoices that differ significantly from those seen during training.
Fine-tuning was conducted with a specific LoRA configuration, which may need to be adjusted for different use cases or datasets.

Contact

For more information or questions about this model, please contact the developers at [[email protected]].

License

This model is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgments

We would like to thank Microsoft for developing the Florence-2 vision model and the broader machine learning community for their contributions and support.