Chenxi Whitehouse committed
Commit 147d3e2
Parent(s): 90d6532
update file name

README.md CHANGED
@@ -87,13 +87,13 @@ bash script/scraper.sh <split> <start_idx> <end_idx>

### 2. Rank the sentences in the knowledge store with BM25
Then, we rank the scraped sentences for each claim using BM25 (based on the similarity to the claim), keeping the top 100 sentences per claim.
-See [bm25_sentences.py](https://huggingface.co/chenxwh/AVeriTeC/blob/main/src/reranking/bm25_sentences.py) for more argument options. We provide the output file for this step on the dev set [here]().
+See [bm25_sentences.py](https://huggingface.co/chenxwh/AVeriTeC/blob/main/src/reranking/bm25_sentences.py) for more argument options. We provide the output file for this step on the dev set [here](https://huggingface.co/chenxwh/AVeriTeC/blob/main/data_store/dev_top_k_sentences.json).
```bash
python -m src.reranking.bm25_sentences
```

### 3. Generate question-answer pairs for the top sentences
-We use [BLOOM](https://huggingface.co/bigscience/bloom-7b1) to generate QA pairs for each of the top 100 sentences, providing the 10 closest claim-QA pairs from the training set as in-context examples. See [question_generation_top_sentences.py](https://huggingface.co/chenxwh/AVeriTeC/blob/main/src/reranking/question_generation_top_sentences.py) for more argument options. We provide the output file for this step on the dev set [here]().
+We use [BLOOM](https://huggingface.co/bigscience/bloom-7b1) to generate QA pairs for each of the top 100 sentences, providing the 10 closest claim-QA pairs from the training set as in-context examples. See [question_generation_top_sentences.py](https://huggingface.co/chenxwh/AVeriTeC/blob/main/src/reranking/question_generation_top_sentences.py) for more argument options. We provide the output file for this step on the dev set [here](https://huggingface.co/chenxwh/AVeriTeC/blob/main/data_store/dev_top_k_qa.json).
```bash
python -m src.reranking.question_generation_top_sentences
```
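The BM25 step in the README hunk above can be sketched with a small self-contained re-implementation. The function name `bm25_rank`, the whitespace tokenization, and the parameter defaults are illustrative assumptions, not the repository's actual code (see `bm25_sentences.py` for that):

```python
import math
from collections import Counter

def bm25_rank(claim, sentences, k1=1.5, b=0.75, top_k=100):
    """Score each sentence against the claim with BM25 and return the top_k.

    Illustrative sketch: tokenization is plain lowercased whitespace splitting,
    and the +1 inside the log keeps IDF non-negative.
    """
    docs = [s.lower().split() for s in sentences]
    query = claim.lower().split()
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()  # document frequency of each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    order = sorted(range(n), key=scores.__getitem__, reverse=True)
    return [sentences[i] for i in order[:top_k]]
```

In the actual pipeline this ranking runs once per claim over that claim's scraped sentences, and only the 100 highest-scoring sentences are kept for the question-generation step.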
src/reranking/question_generation_top_sentences.py CHANGED
@@ -58,13 +58,13 @@ if __name__ == "__main__":
    parser.add_argument(
        "-i",
        "--top_k_target_knowledge",
-        default="data_store/
+        default="data_store/dev_top_k_sentences.json",
        help="Json file where the top k sentences for each claim are saved.",
    )
    parser.add_argument(
        "-o",
        "--output_questions",
-        default="data_store/
+        default="data_store/dev_top_k_qa.json",
        help="Json file where the generated question-answer pairs are saved.",
    )
    parser.add_argument(
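The in-context prompting that `question_generation_top_sentences.py` performs with BLOOM can be sketched as follows. The template, field names, and `build_qg_prompt` helper are hypothetical illustrations of the few-shot pattern described in the README, not the script's actual prompt format:

```python
def build_qg_prompt(claim, evidence_sentence, examples, n_shots=10):
    """Assemble a few-shot prompt for a causal LM such as BLOOM.

    `examples` holds claim-QA pairs from the training set, assumed to be
    pre-sorted by similarity to `claim` so the first n_shots are the closest.
    The "Evidence/Claim/Question answered" template is an assumption made
    for illustration only.
    """
    blocks = []
    for ex in examples[:n_shots]:
        blocks.append(
            "Evidence: {evidence}\nClaim: {claim}\nQuestion answered: {question}".format(**ex)
        )
    # Leave the final "Question answered:" open for the model to complete.
    blocks.append(f"Evidence: {evidence_sentence}\nClaim: {claim}\nQuestion answered:")
    return "\n\n".join(blocks)
```

The prompt built this way would be passed to the model's generation call, with the completion after the final "Question answered:" taken as the generated question for that evidence sentence.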