MU-NLPC/calc-baseline-t5-large

Text2Text Generation

Transformers

PyTorch

text-generation-inference

Inference Endpoints

emnlp 2023 committed on Jun 26, 2023

Commit 3a78404 • 1 Parent(s): d80f6b2

Update README.md

Files changed (1)

README.md +0 -153

@@ -1,153 +0,0 @@
----
-# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
-# Doc / guide: https://huggingface.co/docs/hub/model-cards
-datasets:
-- emnlp2023/Calc-gsm8k
-- emnlp2023/Calc-aqua_rat
-- emnlp2023/Calc-math_qa
-- emnlp2023/Calc-ape210k
-metrics:
-- exact_match
-- rouge
-model-index:
-- name: calc-t5-large-baseline
-  results:
-  - task:
-      type: question-answering
-      name: Question Answering
-    dataset:
-      type: gsm8k
-      name: GSM8K
-      split: validation
-    metrics:
-    - type: exact_match
-      value: 0.420
-    - type: rouge
-      value: 0.627
-  - task:
-      type: question-answering
-      name: Question Answering
-    dataset:
-      type: aqua_rat
-      name: AQUA-RAT
-      split: validation
-    metrics:
-    - type: exact_match
-      value: 0.06
-    - type: rouge
-      value: 0.323
-license: apache-2.0
-language:
-- en
----
-# Model Card for calc-t5-large-baseline
-<!-- Provide a quick summary of what the model is/does. -->
-This model generates reasoning chains for mathematical questions while **using an external tool: a Sympy calculator**.
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-To offload symbolic reasoning from the stochastic language model,
-we train this model to use a calculator **for all applicable numeric operations**.
-To do so, the model is trained to construct calls to the tool's API in this format:
-```html
-<gadget id="calculator">100/2</gadget> <output>50</output>
-```
-where the `<gadget>` segment triggers a call to the tool,
-and the tool's output is then appended to the model's decoder input context inside the `<output>` segment.
-- **Developed by:** Anonymous
-- **Model type:** Autoregressive Encoder-Decoder
-- **Language(s):** en
-- **Finetuned from:** google/calc-t5-large-baseline
-### Model Sources
-<!-- Provide the basic links for the model. -->
-- **Repository:** https://github.com/emnlp2023/gadgets
-- **Paper:** Stay tuned!
-## Usage
-In addition to conventional generation, tool-augmented generation requires
-(1) an implementation of the tool(s) and
-(2) a customized generate() method that augments the input context on demand with the tools' outputs.
-Both components are implemented in the attached **gadget_assisted_model.py** and **gadget.py** in this model's repo
-and in the project's [home repo](https://github.com/emnlp2023/gadgets).
-After adding these two scripts to your directory, you can use the model as follows:
-```python
-from gadget_assisted_model import GadgetAssistedModel
-from gadget import Calculator
-from transformers import T5ForConditionalGeneration, T5Tokenizer
-class GadgetAssistedT5(GadgetAssistedModel, T5ForConditionalGeneration):
-    # GadgetAssistedModel overrides the standard generate() from transformers
-    pass
-model = GadgetAssistedT5.from_pretrained("emnlp2023/calc-t5-large-baseline")
-tokenizer = T5Tokenizer.from_pretrained("emnlp2023/calc-t5-large-baseline")
-model.prepare_for_generate(tokenizer,
-                           enabled_gadgets=[Calculator()],
-                           default_max_tokens=512)
-query = """
-    The profit from a business transaction is shared among 2 business partners,
-    Mike and Johnson in the ratio 2:5 respectively.
-    If Johnson got $2500, how much will Mike have
-    after spending some of his share on a shirt that costs $200?
-"""
-inputs = tokenizer(query, return_tensors="pt")
-output_ids = model.generate(**inputs)
-tokenizer.decode(output_ids[0], spaces_between_special_tokens=False)
-```
-This returns:
-```html
-According to the ratio, Mike got 2/5*$2500 = $<gadget id="calculator">2/5*2500</gadget><output>1_000</output> 1000
-Mike will have $1000-$200 = $<gadget id="calculator">1000-200</gadget><output>800</output> 800 after buying a shirt.
-Final result is<result>800</result></s>
-```
-### Out-of-Scope Usage
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-Given the limited complexity of the exercises seen in training, this model will not work well for tasks requiring
-more complex algebraic operations, including equations, variables, and operations outside the scope of (+-*/).
-## Training Details
-### Training Data
-<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-This model was trained on our calculator-augmented versions of the [ape210k dataset (GitHub)](https://github.com/Chenny0808/ape210k),
-[mathqa HF dataset](https://huggingface.co/datasets/math_qa),
-[gsm8k HF dataset](https://huggingface.co/datasets/gsm8k),
-and [aqua_rat](https://huggingface.co/datasets/aqua_rat),
-in a standard auto-regressive setup, i.e. conditional next-token prediction with a teacher-forced prefix.
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-The model was fine-tuned from [google/calc-t5-large-baseline](https://huggingface.co/google/calc-t5-large-baseline) for TODO steps
-aiming to maximise the exact-match ratio on a validation split of the questions from the [gsm8k dataset](https://huggingface.co/datasets/gsm8k).
-We fine-tune only TODO of the parameters, finding that this circumvents overfitting to the relatively small training dataset.
-The full training configuration can be identified from the [training script](https://github.com/emnlp2023/gadgets/blob/9185d1fc4b4812321179f8e5cad3e2f2a764f1df/examples/train_gsm8k_flan-t5-slice.py).
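
The tool-call mechanism described in the removed README (a `<gadget>` segment triggers the calculator, and its result is appended to the decoder context inside an `<output>` segment) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from gadget.py or gadget_assisted_model.py: the names `evaluate`, `extend_context`, and `final_result` are hypothetical, and a small standard-library evaluator limited to `+ - * /` stands in for the Sympy calculator, so its output formatting may differ from the real tool (for example, it prints `1000` rather than `1_000`).

```python
import ast
import operator
import re
from fractions import Fraction

# Hypothetical stand-in for the Sympy calculator: supports only + - * /.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str) -> str:
    """Evaluate a basic arithmetic expression exactly and return it as text."""

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return Fraction(str(node.value))
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    value = walk(ast.parse(expression.strip(), mode="eval").body)
    # Integers are printed without a fractional part, other values as floats.
    return str(value.numerator) if value.denominator == 1 else str(float(value))


# Matches a calculator call only when it is the last thing in the text.
GADGET_CALL = re.compile(r'<gadget id="calculator">([^<]*)</gadget>\s*$')
RESULT = re.compile(r"<result>(.*?)</result>", re.DOTALL)


def extend_context(decoder_text: str) -> str:
    """If the text ends with a calculator call, append its <output> segment."""
    match = GADGET_CALL.search(decoder_text)
    if match is None:
        return decoder_text
    return f"{decoder_text}<output>{evaluate(match.group(1))}</output>"


def final_result(generated: str):
    """Return the content of the first <result> segment, or None if absent."""
    match = RESULT.search(generated)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    prefix = 'Mike got 2/5*$2500 = $<gadget id="calculator">2/5*2500</gadget>'
    print(extend_context(prefix))
    print(final_result("Final result is<result>800</result></s>"))
```

In an actual generation loop, a step like `extend_context` would run each time the decoder closes a `<gadget>` segment, and generation would then resume from the extended context. Comparing `final_result` against the reference answer is one plausible way to compute an exact-match score such as the one reported in the card's metadata.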