# A MBARTHEZ MODEL TRAINED FOR QUESTION GENERATION

## Training

The model has been trained on several French and English question-answering corpora (FQuAD, PIAF and SQuAD).

## Generate

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and the model (an access token is required)
access_token = "hf_......"
tokenizer = AutoTokenizer.from_pretrained("ThomasGerald/MBARTHEZ-QG", use_auth_token=access_token)
model = AutoModelForSeq2SeqLM.from_pretrained("ThomasGerald/MBARTHEZ-QG", use_auth_token=access_token)

# Example input text; a special token can be used to delimit the support of the question
text = ("La recherche moderne considère généralement que la langue grecque n'est pas née en Grèce, "
        "mais elle n'est pas arrivée à un consensus quant à la date d'arrivée des groupes parlant un "
        "« proto-grec », qui s'est produite durant des phases préhistoriques pour lesquelles il n'y a "
        "pas de texte indiquant quelles langues étaient parlées. Les premiers textes écrits en grec sont "
        "les tablettes en linéaire B de l'époque mycénienne, au XIVe siècle av. J.-C., ce qui indique que "
        "des personnes parlant un dialecte grec sont présentes en Grèce au plus tard durant cette période. "
        "La linguistique n'est pas en mesure de trancher, pas plus que l'archéologie.")
tokenized_text = tokenizer([text], return_tensors="pt")

# The output language is controlled by the forced beginning-of-sentence token
# (two tokens are possible: '[fr_XX]' and '[en_XX]')
output_ids = model.generate(**tokenized_text,
                            forced_bos_token_id=tokenizer.convert_tokens_to_ids('[fr_XX]'))

# Decoding
output = tokenizer.batch_decode(output_ids, skip_special_tokens=False)
# output:
# ['[fr_XX] Quels sont les premiers textes écrits en grec?']
```

We can also generate a question in English from a French context by specifying the English beginning-of-sentence token ('[en_XX]'). Reusing the objects defined in the previous snippet, English questions can be generated by executing:

```python
output_ids = model.generate(**tokenized_text,
                            forced_bos_token_id=tokenizer.convert_tokens_to_ids('[en_XX]'))
output = tokenizer.batch_decode(output_ids, skip_special_tokens=False)
# output:
# ['[en_XX] What are the first texts written in grec?']
```

Of course, you can also generate questions from English text:

```python
# Example input text; a special token can be used to delimit the support of the question
text = ("By 371 BC, Thebes was in the ascendancy, defeating Sparta at "
        "the Battle of Leuctra, killing the Spartan king Cleombrotus I, "
        "and invading Laconia. Further Theban successes against Sparta "
        "in 369 led to Messenia gaining independence; Sparta never recovered "
        "from the loss of Messenia's fertile land and the helot workforce it "
        "provided. The rising power of Thebes led Sparta and Athens to join "
        "forces; in 362 they were defeated by Thebes at the Battle of Mantinea. "
        "In the aftermath of Mantinea, none of the major Greek states were able "
        "to dominate. Though Thebes had won the battle, their general Epaminondas "
        "was killed, and they spent the following decades embroiled in wars with "
        "their neighbours; Athens, meanwhile, saw its second naval alliance, "
        "formed in 377, collapse in the mid-350s.")
tokenized_text = tokenizer([text], return_tensors="pt")

# French question
output_ids = model.generate(**tokenized_text,
                            forced_bos_token_id=tokenizer.convert_tokens_to_ids('[fr_XX]'))
# Decoding
output = tokenizer.batch_decode(output_ids, skip_special_tokens=False)
# Notice that "Sparta" is not translated ("Sparte" in French)
# ['[fr_XX] À quelle bataille Sparta a-t-il été vaincu par Thebes?']

# English question
output_ids = model.generate(**tokenized_text,
                            forced_bos_token_id=tokenizer.convert_tokens_to_ids('[en_XX]'))
# Decoding
output = tokenizer.batch_decode(output_ids, skip_special_tokens=False)
# ['[en_XX] At what battle did Thebes defeat Sparta?']
```
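
If you want several candidate questions for the same context, the standard `generate` beam-search options can be used. The snippet below is only a minimal sketch reusing `model`, `tokenizer` and `tokenized_text` from the examples above; `num_beams`, `num_return_sequences` and `max_length` are ordinary `transformers` generation parameters, and the values shown are illustrative rather than recommended settings.

```python
# Sketch: return several candidate French questions for the same context.
# Reuses `model`, `tokenizer` and `tokenized_text` defined in the snippets above.
output_ids = model.generate(
    **tokenized_text,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids('[fr_XX]'),
    num_beams=4,              # beam search over 4 hypotheses
    num_return_sequences=4,   # return every kept hypothesis
    max_length=64,            # cap the length of the generated questions
)

# skip_special_tokens=True drops the '[fr_XX]' marker and other special tokens
questions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
for question in questions:
    print(question)
```

Decoding with `skip_special_tokens=True` also removes the leading language token, which is convenient when the questions are used downstream.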