---
language:
- en
widget:
- text: >
    Bag-of-feature representations can be described by analogy to bag-of-words representations.
- text: >
    Self-attention is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence.
license: mit
pipeline_tag: text2text-generation
datasets:
- cnn_dailymail
---

# Bart-Large Expansion Model

![Bart Logo](https://huggingface.co/front/assets/huggingface_logo.svg)

This repository contains the **Bart-Large-paper2slides-expander Model**, which was pre-trained on the CNN/Daily Mail dataset and fine-tuned on the [Automatic Slide Generation from Scientific Papers dataset](https://www.kaggle.com/datasets/andrewmvd/automatic-slide-generation-from-scientific-papers) with an unsupervised learning algorithm from the paper '[Unsupervised Machine Translation Using Monolingual Corpora Only](https://arxiv.org/abs/1711.00043)'. Its primary focus is to expand **scientific text**, producing alternative, expanded versions with improved clarity and accuracy.

The model was trained in parallel with the [**Bart-Large-paper2slides-summarizer Model**](https://huggingface.co/com3dian/Bart-large-paper2slides-summarizer) from the same contributor.

## Model Details

- **Model Architecture**: Bart-Large
- **Fine-tuning Dataset**: [Automatic Slide Generation from Scientific Papers](https://www.kaggle.com/datasets/andrewmvd/automatic-slide-generation-from-scientific-papers)
- **Fine-tuning Method**: Unsupervised Learning

[Bart](https://huggingface.co/transformers/model_doc/bart.html) (Bidirectional and Auto-Regressive Transformers) is a sequence-to-sequence (seq2seq) model developed by Facebook AI Research. It has shown strong performance on a range of natural language processing (NLP) tasks such as text summarization, text generation, and machine translation.

Bart-Large is the larger version of the Bart model: it has 12 encoder layers and 12 decoder layers, for a total of roughly 400 million parameters.

## Usage

To use this model, you can leverage the Hugging Face [Transformers](https://huggingface.co/transformers/) library. Here's an example of how to use it in Python:

```python
from transformers import BartTokenizer, BartForConditionalGeneration, pipeline

# Load the model and tokenizer
model_name = "com3dian/Bart-large-paper2slides-expander"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Generate the expanded text from the input text
input_text = "Your input text here..."
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids)

# Decode the generated expansion
expanded_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(expanded_text)

# Or use the pipeline API
expander = pipeline("text2text-generation", model=model_name)
expanded_text = expander(input_text, max_length=50, min_length=30, do_sample=False)
print(expanded_text)
```

Ensure you have the `transformers` library installed before running the code. You can install it using `pip`:

```
pip install transformers
```

## Model Fine-tuning Details

The fine-tuning process for this model involved training on the slide generation dataset using unsupervised learning techniques. Unsupervised learning here means training the model without explicit human-labeled targets: instead, the expander learns to "back-expand" the summaries produced by the summarization model into the original texts.
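As a rough illustration of this back-expansion objective, below is a minimal sketch of what a single training step could look like. It assumes the summaries are generated on the fly by the companion summarizer checkpoint and that the reconstruction loss is the standard seq2seq cross-entropy; the generation settings, single-example batch, and loop structure are illustrative assumptions, not the exact training code used for this model.

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

# Illustrative setup (assumption): the expander is trained to reconstruct the
# original text from summaries produced by the companion summarizer.
summarizer_name = "com3dian/Bart-large-paper2slides-summarizer"
expander_name = "com3dian/Bart-large-paper2slides-expander"

tokenizer = BartTokenizer.from_pretrained(expander_name)
summarizer = BartForConditionalGeneration.from_pretrained(summarizer_name)
expander = BartForConditionalGeneration.from_pretrained(expander_name)

optimizer = torch.optim.AdamW(expander.parameters(), lr=2e-6)

original_text = "An original paragraph from a scientific paper..."

# 1) Summarize the original text; no gradients flow through the summarizer.
with torch.no_grad():
    summary_ids = summarizer.generate(
        **tokenizer(original_text, return_tensors="pt", truncation=True),
        max_length=64,
    )
summary_text = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# 2) Train the expander to map the summary back to the original text.
inputs = tokenizer(summary_text, return_tensors="pt", truncation=True)
labels = tokenizer(original_text, return_tensors="pt", truncation=True).input_ids
loss = expander(**inputs, labels=labels).loss

loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In this scheme the summaries could equally be precomputed offline; the essential point is that reconstructing the original text supplies the training signal, so no human-written expansion targets are needed.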
The specific hyperparameters and training details used for fine-tuning this model are as follows:

- Batch Size: 4
- Learning Rate: 2e-6
- Training Steps: 3*7
- Optimizer: AdamW

## Acknowledgments

We would like to acknowledge the authors of the Bart model and the creators of the slide generation dataset for their valuable contributions, which have enabled the development of this fine-tuned model.

If you use this model or find it helpful in your work, please consider citing the original Bart model, the slide generation dataset, and [this paper](https://studenttheses.uu.nl/handle/20.500.12932/45939) to provide proper credit to the respective authors.

## License

This model and the associated code are released under the [MIT license](https://opensource.org/license/mit/).