Update README.md
README.md
CHANGED
@@ -25,16 +25,12 @@ This model is a fine-tuned version of [microsoft/phi-1_5](https://huggingface.co
|
|
25 |
It achieves the following results on the evaluation set:
|
26 |
- Loss: 2.1587
|
27 |
|
28 |
-
## Model description
|
29 |
-
|
30 |
-
## Sample Code
|
31 |
-
|
32 |
### Requirements
|
33 |
```python
|
34 |
!pip install accelerate transformers einops datasets peft bitsandbytes
|
35 |
```
|
36 |
|
37 |
-
|
38 |
If you prefer, you can use the test dataset from [zelalt/scientific-papers](https://huggingface.co/datasets/zelalt/scientific-papers)
|
39 |
or [zelalt/arxiv-papers](https://huggingface.co/datasets/zelalt/arxiv-papers), or read your own PDF as text with `PyPDF2.PdfReader` and pass that text to the LLM, prefixed with the prompt "What is the title of this paper?".
|
40 |
|
@@ -44,14 +40,14 @@ from datasets import load_dataset
|
|
44 |
test_dataset = load_dataset("zelalt/scientific-papers", split='train')
|
45 |
test_dataset = test_dataset.rename_column('full_text', 'text')
|
46 |
|
47 |
-
def
|
48 |
text = f"What is the title of this paper? {example['text'][:180]}\n\nAnswer: "
|
49 |
return {'text': text}
|
50 |
|
51 |
-
formatted_dataset = test_dataset.map(
|
52 |
```
|
53 |
|
54 |
-
###
|
55 |
```python
|
56 |
|
57 |
import torch
|
@@ -79,17 +75,18 @@ text = tokenizer.batch_decode(outputs)[0]
|
|
79 |
print(text)
|
80 |
```
|
81 |
|
82 |
-
|
|
|
83 |
|
84 |
### Output
|
85 |
Input:
|
86 |
-
```
|
87 |
What is the title of this paper? Bursting Dynamics of the 3D Euler Equations\nin Cylindrical Domains\nFrançois Golse ∗ †\nEcole Polytechnique, CMLS\n91128 Palaiseau Cedex, France\nAlex Mahalov ‡and Basil Nicolaenko §\n\nAnswer:
|
88 |
```
|
89 |
|
90 |
## Output from LLM:
|
91 |
|
92 |
-
```
|
93 |
What is the title of this paper? Bursting Dynamics of the 3D Euler Equations
|
94 |
in Cylindrical Domains
|
95 |
François Golse ∗ †
|
|
|
25 |
It achieves the following results on the evaluation set:
|
26 |
- Loss: 2.1587
|
27 |
|
|
|
|
|
|
|
|
|
28 |
### Requirements
|
29 |
```python
|
30 |
!pip install accelerate transformers einops datasets peft bitsandbytes
|
31 |
```
|
32 |
|
33 |
+
## Test Dataset
|
34 |
If you prefer, you can use the test dataset from [zelalt/scientific-papers](https://huggingface.co/datasets/zelalt/scientific-papers)
|
35 |
or [zelalt/arxiv-papers](https://huggingface.co/datasets/zelalt/arxiv-papers), or read your own PDF as text with `PyPDF2.PdfReader` and pass that text to the LLM, prefixed with the prompt "What is the title of this paper?".
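
The PDF route could look roughly like the sketch below. It assumes PyPDF2 is installed and that the title appears on the first page. The helper names `build_title_prompt` and `read_pdf_first_page` are hypothetical and not part of this model card. The 180-character cut mirrors the `formatting` function used for the test dataset.

```python
def build_title_prompt(paper_text: str, max_chars: int = 180) -> str:
    """Wrap the first `max_chars` characters of a paper in the title question."""
    return f"What is the title of this paper? {paper_text[:max_chars]}\n\nAnswer: "


def read_pdf_first_page(pdf_path: str) -> str:
    """Extract the text of the first page of a PDF (needs `pip install PyPDF2`)."""
    # Imported lazily so build_title_prompt works without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can yield nothing for scanned pages; fall back to "".
    return reader.pages[0].extract_text() or ""
```

For example, `build_title_prompt(read_pdf_first_page("paper.pdf"))` would produce a prompt string that can be tokenized and passed to the model; `paper.pdf` is a placeholder path.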
|
36 |
|
|
|
40 |
test_dataset = load_dataset("zelalt/scientific-papers", split='train')
|
41 |
test_dataset = test_dataset.rename_column('full_text', 'text')
|
42 |
|
43 |
+
def formatting(example):
|
44 |
text = f"What is the title of this paper? {example['text'][:180]}\n\nAnswer: "
|
45 |
return {'text': text}
|
46 |
|
47 |
+
formatted_dataset = test_dataset.map(formatting)
|
48 |
```
|
49 |
|
50 |
+
### Sample Code
|
51 |
```python
|
52 |
|
53 |
import torch
|
|
|
75 |
print(text)
|
76 |
```
|
77 |
|
78 |
+
**Notes**
|
79 |
+
- Once the first run has loaded the model and tokenizer, rerun only the generation part; reloading them can exhaust RAM and crash the session.
|
80 |
|
81 |
### Output
|
82 |
Input:
|
83 |
+
```markdown
|
84 |
What is the title of this paper? Bursting Dynamics of the 3D Euler Equations\nin Cylindrical Domains\nFrançois Golse ∗ †\nEcole Polytechnique, CMLS\n91128 Palaiseau Cedex, France\nAlex Mahalov ‡and Basil Nicolaenko §\n\nAnswer:
|
85 |
```
|
86 |
|
87 |
## Output from LLM:
|
88 |
|
89 |
+
```markdown
|
90 |
What is the title of this paper? Bursting Dynamics of the 3D Euler Equations
|
91 |
in Cylindrical Domains
|
92 |
François Golse ∗ †
|