Phi-2 model fine-tuned for named entity recognition task

The model was fine-tuned on one quarter of the CoNLL-2012 OntoNotes v5 dataset.

The prompts and expected outputs were constructed as described in [1]; a construction sketch follows the example below.

Example input:

I am an excellent linguist. The task is to label location entities in the given sentence. Below are some examples.

Input: Only France and Britain backed Fischler's proposal.
Output: Only @@France## and @@Britain## backed Fischler's proposal.

Input: Germany imported 47,000 sheeps from Britain last year, nearly half of total imports.
Output: @@Germany## imported 47,000 sheeps from @@Britain## last year, nearly half of total imports.

Input: It brought in 4275 tonnes of British mutton, some 10% of overall imports.
Output: It brought in 4275 tonnes of British mutton, some 10% of overall imports.

Input: China says Taiwan spoils atmosphere for talks.
Output: 

Expected output:

@@China## says @@Taiwan## spoils atmosphere for talks.
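
As a rough illustration of how such marked outputs can be produced (not the exact preprocessing script used for this model), the @@/## markers can be generated from token-level BIO labels along the following lines; the helper name and the B-GPE/I-GPE/O label scheme are assumptions for the sketch:

from typing import List

def mark_entities(tokens: List[str], bio_labels: List[str]) -> str:
    """Wrap entity spans in @@...## markers, GPT-NER style [1].

    Illustrative sketch only; assumes simple BIO labels such as B-GPE/I-GPE/O.
    """
    out, inside = [], False
    for token, label in zip(tokens, bio_labels):
        if label.startswith("B-"):
            if inside:
                out[-1] += "##"  # close the previous entity span
            out.append("@@" + token)
            inside = True
        elif label.startswith("I-") and inside:
            out.append(token)
        else:
            if inside:
                out[-1] += "##"  # close the open entity span
                inside = False
            out.append(token)
    if inside:
        out[-1] += "##"
    return " ".join(out)

tokens = ["China", "says", "Taiwan", "spoils", "atmosphere", "for", "talks", "."]
labels = ["B-GPE", "O", "B-GPE", "O", "O", "O", "O", "O"]
print(mark_entities(tokens, labels))
# @@China## says @@Taiwan## spoils atmosphere for talks .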

Model Trained Using AutoTrain

This model was trained using the AutoTrain DPO trainer. For more information, please visit AutoTrain.

Hyperparameters:

{
    "model": "microsoft/phi-2",
    "train_split": "train",
    "valid_split": null,
    "add_eos_token": true,
    "block_size": 1024,
    "model_max_length": 2048,
    "padding": "right",
    "trainer": "dpo",
    "use_flash_attention_2": false,
    "log": "tensorboard",
    "disable_gradient_checkpointing": false,
    "logging_steps": -1,
    "evaluation_strategy": "epoch",
    "save_total_limit": 1,
    "save_strategy": "epoch",
    "auto_find_batch_size": false,
    "mixed_precision": "bf16",
    "lr": 3e-05,
    "epochs": 1,
    "batch_size": 2,
    "warmup_ratio": 0.05,
    "gradient_accumulation": 1,
    "optimizer": "adamw_torch",
    "scheduler": "linear",
    "weight_decay": 0.0,
    "max_grad_norm": 1.0,
    "seed": 42,
    "apply_chat_template": false,
    "quantization": "int4",
    "target_modules": "",
    "merge_adapter": false,
    "peft": true,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "model_ref": null,
    "dpo_beta": 0.1,
}
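
The quantization and LoRA settings above map roughly onto standard transformers/peft objects. The snippet below is a minimal sketch under that assumption, not the code AutoTrain runs internally; the DPO training loop itself and most other arguments are omitted:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# "quantization": "int4" -> 4-bit loading of the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # "mixed_precision": "bf16"
)

base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
)

# "lora_r": 16, "lora_alpha": 32, "lora_dropout": 0.05
# target_modules is left unset to mirror the empty value above; specify it
# explicitly if peft cannot infer defaults for the architecture.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)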

Usage


from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "pahautelman/phi2-ner-dpo-v1"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path
).eval()

prompt = 'Label the person entities in the given sentence: Russian President Vladimir Putin is due to arrive in Havana a few hours from now to become the first post-Soviet leader to visit Cuba.'

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors='pt')
outputs = model.generate(
    inputs.to(model.device),
    max_new_tokens=9,
    do_sample=False,  # greedy decoding
)
output = tokenizer.batch_decode(outputs)[0]

# Model response: "Answer: Russian President, Vladimir Putin"
print(output)
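
When the model is prompted in the few-shot format shown above and answers with @@...## markers, the labeled entities can be recovered with a simple regular expression. This post-processing step is an assumption for illustration, not something shipped with the model:

import re

marked = "@@China## says @@Taiwan## spoils atmosphere for talks."
entities = re.findall(r"@@(.+?)##", marked)
print(entities)  # ['China', 'Taiwan']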

References:

[1] Wang et al., "GPT-NER: Named Entity Recognition via Large Language Models", 2023.
