metadata

license: mit

logo

XMAiNframe: A Large Language Model for Mainframe Modernization

Introduction

We are introducing XMAiNframe, a state-of-the-art large language model (LLM) specifically designed with knowledge of mainframe legacy systems and COBOL codebases. XMAiNframe is built on top of DeepSeek-Coder 7B and is available with 7B and 10.5B parameters. Additionally, we present MainframeBench, a comprehensive benchmark for assessing mainframe knowledge, including multiple-choice questions, question answering, and COBOL code summarization. Our empirical evaluations demonstrate that XMAiNframe consistently outperforms existing state-of-the-art LLMs across these tasks. Specifically, XMAiNframe achieves 30% higher accuracy than DeepSeek-Coder on multiple-choice questions, doubles the BLEU score of Mixtral-Instruct 8x7B on question answering, and scores six times higher than GPT-3.5 on COBOL summarization. Our work highlights the potential of XMAiNframe to drive significant advancements in managing and modernizing legacy systems, thereby enhancing productivity and saving time for software developers.

Model Versions

We release XMAiNframe with 7B and 10.5B parameters, including base and instruct models, to the public. XMAiNframe 10.5B is expanded from DeepSeek-Coder 7B by the depth up-scaling method without introducing additional modules or dynamic expert selection methods.

Model	Download
XMAiNframe-base-7b	🤗 HuggingFace
XMAiNframe-instruct-7b	🤗 HuggingFace
XMAiNframe-base-10.5b	🤗 HuggingFace
XMAiNframe-instruct-10.5b	🤗 HuggingFace

Quickstart

Here provides a code snippet with apply_chat_template to show you how to load the tokenizer and model and how to generate contents.

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Fsoft-AIC/XMAiNframe-instruct-7b")
model = AutoModelForCausalLM.from_pretrained("Fsoft-AIC/XMAiNframe-instruct-7b")
messages=[
    {'from':'system', 'value': "You are a helpful assistant"},
    {'from': 'human', 'value': 'What is the future of Mainframe?'}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
 
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))

Additional Information

Other Resources:

Github: https://github.com/FSoft-AI4Code/XMainframe
Paper: https://arxiv.org/abs/2408.04660

License

MIT License

Citation Information

More details can be found in our paper.

If you're using XMAiNframe, please cite using this BibTeX:

@misc{dau2024xmainframelargelanguagemodel,
      title={XMainframe: A Large Language Model for Mainframe Modernization}, 
      author={Anh T. V. Dau and Hieu Trung Dao and Anh Tuan Nguyen and Hieu Trung Tran and Phong X. Nguyen and Nghi D. Q. Bui},
      year={2024},
      eprint={2408.04660},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.04660}, 
}

Contact us

If you have any questions, comments or suggestions, please do not hesitate to contact us.

Website: fpt-aicenter
Email: [email protected]