---
library_name: peft
license: apache-2.0
base_model: meta-llama/Llama-2-7b-chat-hf
language:
- en
metrics:
- f1
---

## Overview

The model is a LoRA adapter based on Llama-2-7b-chat-hf. It has been trained on a [re-annotated version](https://github.com/Teddy-Li/MulVOIEL/tree/master/CaRB/data) of the [CaRB dataset](https://github.com/dair-iitd/CaRB).

The model produces multi-valent Open IE tuples. These are relations with varying numbers of arguments (1, 2, or more). Here is an example.

Consider the following sentence (taken from the CaRB dev set):

```text
Earlier this year , President Bush made a final `` take - it - or - leave it '' offer on the minimum wage
```

Our model would extract the following relation from the sentence:

`<President Bush, made, a final "take-it-or-leave-it" offer, on the minimum wage, earlier this year>`

Here _President Bush_ is the subject, _made_ is the predicate, _a final "take-it-or-leave-it" offer_ is the direct object, and _on the minimum wage_ and _earlier this year_ are salient _complements_.

Below we briefly describe how to use our model. Further details are in our [MulVOIEL repository on GitHub](https://github.com/Teddy-Li/MulVOIEL/).

## Getting Started

### Model Output Format

Given a sentence, the model produces textual predictions in the following format:

` ,, ( ###) ,, ( ###) , ( ###) , ...`

### How to Use

1. Install the relevant libraries and clone the [MulVOIEL](https://github.com/Teddy-Li/MulVOIEL/) repository. The example in step 2 imports `llamaOIE` and `llamaOIE_dataset` from this repository, so run it from inside the `MulVOIEL` directory:

   ```bash
   pip install transformers datasets peft torch
   git clone https://github.com/Teddy-Li/MulVOIEL
   cd MulVOIEL
   ```

2. Load the model and perform inference (example):

   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer
   from peft import PeftModel
   import torch
   from llamaOIE import parse_outstr_to_triples
   from llamaOIE_dataset import prepare_input

   base_model_name = "meta-llama/Llama-2-7b-chat-hf"
   peft_adapter_name = "Teddy487/LLaMA2-7b-for-OpenIE"

   # Load the base model, then attach the LoRA adapter on top of it
   model = AutoModelForCausalLM.from_pretrained(base_model_name)
   model = PeftModel.from_pretrained(model, peft_adapter_name)
   model.eval()
   tokenizer = AutoTokenizer.from_pretrained(base_model_name)

   input_text = "Earlier this year , President Bush made a final `` take - it - or - leave it '' offer on the minimum wage"
   # Wrap the raw sentence in the prompt format expected by the adapter
   input_text, _ = prepare_input({'s': input_text}, tokenizer, has_labels=False)
   input_ids = tokenizer(input_text, return_tensors="pt").input_ids

   with torch.no_grad():
       # Greedy decoding and the 256-token cap are illustrative choices,
       # not settings prescribed by the authors
       outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)

   # input_ids has shape (1, prompt_length): slice off the prompt tokens
   # so that only the generated extraction is decoded
   outstr = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
   triples = parse_outstr_to_triples(outstr)
   for tpl in triples:
       print(tpl)
   ```

🍺

## Model Performance

The main benefit of our model is that it extracts finer-grained information for predicates. We also compare against prior state-of-the-art Open IE models on a roughly comparable basis. Our method performs comparably to these models, and in some cases better, while producing finer-grained and more complex outputs.
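Here is what "finer-grained" means in practice. The example relation from the Overview has a variable number of argument slots, where a fixed (subject, relation, object) triple has exactly three. The sketch below shows one way such a relation could be held in code. The class `MultiValentTuple` and its field names are hypothetical and are not taken from the MulVOIEL code. It only mirrors the subject / predicate / direct object / complements breakdown described in the Overview.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MultiValentTuple:
    """Hypothetical container for one multi-valent Open IE relation.

    Field names are illustrative assumptions, not the MulVOIEL API.
    """
    subject: str
    predicate: str
    direct_object: Optional[str] = None
    complements: List[str] = field(default_factory=list)

    def arguments(self) -> List[str]:
        # All non-predicate slots; the count varies from relation to relation
        args = [self.subject]
        if self.direct_object is not None:
            args.append(self.direct_object)
        args.extend(self.complements)
        return args

    def render(self) -> str:
        # Angle-bracket notation used in the Overview example
        parts = [self.subject, self.predicate]
        if self.direct_object is not None:
            parts.append(self.direct_object)
        parts.extend(self.complements)
        return "<" + ", ".join(parts) + ">"


example = MultiValentTuple(
    subject="President Bush",
    predicate="made",
    direct_object='a final "take-it-or-leave-it" offer',
    complements=["on the minimum wage", "earlier this year"],
)
print(example.render())
# <President Bush, made, a final "take-it-or-leave-it" offer, on the minimum wage, earlier this year>
```

A relation with no object and no complements, such as `MultiValentTuple("The storm", "ended")`, simply renders with fewer slots.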
We report evaluation results using the macro F-1 metric. We also report the average [Levenshtein distance](https://pypi.org/project/python-Levenshtein/) between gold and predicted relations, where lower is better:

| Model | Levenshtein Distance | Macro F-1 |
| --- | --- | --- |
| [LoRA LLaMA2-7b](https://huggingface.co/Teddy487/LLaMA2-7b-for-OpenIE) | 5.85 | 50.2 |
| [LoRA LLaMA3-8b](https://huggingface.co/Teddy487/LLaMA3-8b-for-OpenIE) | **5.04** | **55.3** |
| RNN OIE * | - | 49.0 |
| IMOJIE * | - | 53.5 |
| Open IE 6 * | - | 54.0/52.7 |

The scores of models marked with an asterisk (*) are not directly comparable to ours, for two reasons. We evaluate model predictions at a finer granularity, and we use a different train/dev/test split from the original CaRB dataset.
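For readers unfamiliar with the distance metric, the sketch below computes a Levenshtein (edit) distance and averages it over (gold, predicted) pairs. It rests on several assumptions. It takes the pairs as already aligned, and this card does not state how gold and predicted relations are matched. It also leaves the comparison unit open: it compares elements of any sequence, so it works on characters (strings) or tokens (lists), and which unit the reported scores use is likewise not stated. It is a plain dynamic-programming illustration, not the scoring script behind the table.

```python
from typing import Iterable, Sequence, Tuple


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of single-element insertions, deletions and
    substitutions needed to turn sequence a into sequence b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def average_levenshtein(pairs: Iterable[Tuple[Sequence, Sequence]]) -> float:
    """Mean edit distance over pre-aligned (gold, predicted) pairs (assumption)."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (gold, predicted) pair")
    return sum(levenshtein(gold, pred) for gold, pred in pairs) / len(pairs)


print(average_levenshtein([("kitten", "sitting"), ("flaw", "lawn")]))  # 2.5
```

Passing token lists instead of strings, e.g. `levenshtein("a b c".split(), "a c".split())`, gives a token-level distance of 1.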