nielsr HF staff committed on
Commit 6677612
1 Parent(s): 50df240

Update README.md

  1. README.md +75 -74
README.md CHANGED
@@ -1,75 +1,76 @@
- ---
- language: en
- tags:
- - tapex
- license: mit
- ---
-
- # TAPEX (base-sized model)
-
- TAPEX was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. The original repo can be found [here](https://github.com/microsoft/Table-Pretraining).
-
- ## Model description
-
- TAPEX (**Ta**ble **P**re-training via **Ex**ecution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with *table reasoning* skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
-
- TAPEX is based on the BART architecture, the transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
-
- This model is the `tapex-base` model fine-tuned on the [WikiTableQuestions](https://huggingface.co/datasets/wikitablequestions) dataset.
-
- ## Intended Uses
-
- You can use the model for table question answering on *complex* questions. Some **solvable** questions are shown below (corresponding tables not shown):
-
- | Question | Answer |
- |:---: |:---:|
- | according to the table, what is the last title that spicy horse produced? | Akaneiro: Demon Hunters |
- | what is the difference in runners-up from coleraine academical institution and royal school dungannon? | 20 |
- | what were the first and last movies greenstreet acted in? | The Maltese Falcon, Malaya |
- | in which olympic games did arasay thondike not finish in the top 20? | 2012 |
- | which broadcaster hosted 3 titles but they had only 1 episode? | Channel 4 |
-
-
- ### How to Use
-
- Here is how to use this model in transformers:
-
- ```python
- from transformers import TapexTokenizer, BartForConditionalGeneration
- import pandas as pd
-
- tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-base-finetuned-wtq")
- model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-base-finetuned-wtq")
-
- data = {
-     "year": [1896, 1900, 1904, 2004, 2008, 2012],
-     "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
- }
- table = pd.DataFrame.from_dict(data)
-
- # tapex accepts uncased input since it is pre-trained on the uncased corpus
- query = "In which year did beijing host the Olympic Games?"
- encoding = tokenizer(table=table, query=query, return_tensors="pt")
-
- outputs = model.generate(**encoding)
-
- print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
- # [' 2008.0']
- ```
-
- ### How to Eval
-
- Please find the eval script [here](https://github.com/SivilTaram/transformers/tree/add_tapex_bis/examples/research_projects/tapex).
-
- ### BibTeX entry and citation info
-
- ```bibtex
- @inproceedings{
- liu2022tapex,
- title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
- author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
- booktitle={International Conference on Learning Representations},
- year={2022},
- url={https://openreview.net/forum?id=O50443AsCP}
- }
+ ---
+ language: en
+ tags:
+ - tapex
+ - table-question-answering
+ license: mit
+ ---
+
+ # TAPEX (base-sized model)
+
+ TAPEX was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. The original repo can be found [here](https://github.com/microsoft/Table-Pretraining).
+
+ ## Model description
+
+ TAPEX (**Ta**ble **P**re-training via **Ex**ecution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with *table reasoning* skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
+
+ TAPEX is based on the BART architecture, the transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
+
+ This model is the `tapex-base` model fine-tuned on the [WikiTableQuestions](https://huggingface.co/datasets/wikitablequestions) dataset.
+
+ ## Intended Uses
+
+ You can use the model for table question answering on *complex* questions. Some **solvable** questions are shown below (corresponding tables not shown):
+
+ | Question | Answer |
+ |:---: |:---:|
+ | according to the table, what is the last title that spicy horse produced? | Akaneiro: Demon Hunters |
+ | what is the difference in runners-up from coleraine academical institution and royal school dungannon? | 20 |
+ | what were the first and last movies greenstreet acted in? | The Maltese Falcon, Malaya |
+ | in which olympic games did arasay thondike not finish in the top 20? | 2012 |
+ | which broadcaster hosted 3 titles but they had only 1 episode? | Channel 4 |
+
+
+ ### How to Use
+
+ Here is how to use this model in transformers:
+
+ ```python
+ from transformers import TapexTokenizer, BartForConditionalGeneration
+ import pandas as pd
+
+ tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-base-finetuned-wtq")
+ model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-base-finetuned-wtq")
+
+ data = {
+     "year": [1896, 1900, 1904, 2004, 2008, 2012],
+     "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
+ }
+ table = pd.DataFrame.from_dict(data)
+
+ # tapex accepts uncased input since it is pre-trained on the uncased corpus
+ query = "In which year did beijing host the Olympic Games?"
+ encoding = tokenizer(table=table, query=query, return_tensors="pt")
+
+ outputs = model.generate(**encoding)
+
+ print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
+ # [' 2008.0']
+ ```
+
+ ### How to Eval
+
+ Please find the eval script [here](https://github.com/SivilTaram/transformers/tree/add_tapex_bis/examples/research_projects/tapex).
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @inproceedings{
+ liu2022tapex,
+ title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
+ author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
+ booktitle={International Conference on Learning Representations},
+ year={2022},
+ url={https://openreview.net/forum?id=O50443AsCP}
+ }
  ```
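
The usage example in the README above prints `[' 2008.0']`: the decoded answer is a string with a leading space and a float-style rendering of the year. A minimal sketch of tidying such strings before comparing them with expected answers might look like the following. `normalize_answer` is a hypothetical helper introduced here for illustration; it is not part of `transformers` or of the TAPEX repository, and the decoded list is hard-coded from the README output rather than produced by the model.

```python
def normalize_answer(raw: str) -> str:
    """Strip surrounding whitespace and render integer-valued numbers without '.0'.

    Hypothetical post-processing helper; non-numeric answers are returned
    unchanged apart from the whitespace strip.
    """
    text = raw.strip()
    try:
        number = float(text)
    except ValueError:
        return text  # not numeric, e.g. "Channel 4"
    if number.is_integer():
        return str(int(number))
    return text  # keep genuine decimals such as "3.5" as written


# Output copied from the README example (assumed here, not recomputed).
decoded = [" 2008.0"]
answers = [normalize_answer(answer) for answer in decoded]
print(answers)
```

Whether this normalization is appropriate depends on how the reference answers are formatted, so treat it as a starting point rather than the evaluation procedure used by the authors.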