---
license: apache-2.0
---
# Try out in the Hosted inference API

In the right panel, you can try the model (although it only handles a short sequence length).
Enter the document you want to summarize in the panel on the right.

# Model Loading

The model (based on a GPT2 base architecture) can be loaded in the following way:
|
11 |
+
```
|
12 |
+
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
|
13 |
+
|
14 |
+
model = GPT2LMHeadModel.from_pretrained("philippelaban/summary_loop46")
|
15 |
+
tokenizer = GPT2TokenizerFast.from_pretrained("philippelaban/summary_loop46")
|
16 |
+
```
|
17 |
+
|
18 |
+
# Example Use
|
19 |
+
```
|
20 |
+
document = "Bouncing Boulders Point to Quakes on Mars. A preponderance of boulder tracks on the red planet may be evidence of recent seismic activity. If a rock falls on Mars, and no one is there to see it, does it leave a trace? Yes, and it's a beautiful herringbone-like pattern, new research reveals. Scientists have now spotted thousands of tracks on the red planet created by tumbling boulders. Delicate chevron-shaped piles of Martian dust and sand frame the tracks, the team showed, and most fade over the course of a few years. Rockfalls have been spotted elsewhere in the solar system, including on the moon and even a comet. But a big open question is the timing of these processes on other worlds — are they ongoing or did they predominantly occur in the past?"
|
21 |
+
|
22 |
+
tokenized_document = tokenizer([document], max_length=300, truncation=True, return_tensors="pt")["input_ids"].cuda()
|
23 |
+
input_shape = tokenized_document.shape
|
24 |
+
outputs = model.generate(tokenized_document, do_sample=False, max_length=500, num_beams=4, num_return_sequences=4, no_repeat_ngram_size=6, return_dict_in_generate=True, output_scores=True)
|
25 |
+
candidate_sequences = outputs.sequences[:, input_shape[1]:] # Remove the encoded text, keep only the summary
|
26 |
+
candidate_scores = outputs.sequences_scores.tolist()
|
27 |
+
|
28 |
+
for candidate_tokens, score in zip(candidate_sequences, candidate_scores):
|
29 |
+
summary = tokenizer.decode(candidate_tokens)
|
30 |
+
print("[Score: %.3f] %s" % (score, summary[:summary.index("END")]))
|
31 |
+
```
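The final `print` above assumes the literal marker `END` always appears in the decoded text; `str.index` raises a `ValueError` when it does not. A defensive variant (a hypothetical helper, not part of the original repo) could look like this:

```python
def trim_at_end_marker(summary: str, marker: str = "END") -> str:
    """Return the text before the first end-of-summary marker,
    or the whole string if the marker never appears."""
    idx = summary.find(marker)  # find() returns -1 instead of raising
    return summary[:idx] if idx != -1 else summary

print(trim_at_end_marker("A short summary.END extra tokens"))  # -> "A short summary."
```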

# Example output

```
[Score: -0.059] The Kiribati government confirmed that 36 passengers on a flight that had come from Fiji on January 14 had all tested positive for the virus
[Score: -0.066] The Kiribati government confirmed that 36 passengers on a flight that had come from Fiji on January 14 had all tested positive for the virus
[Score: -0.073] The Kiribati government confirmed that 36 passengers on a flight that had come from Fiji on January 14 had all tested positive for the virus. Authorities had been managing the passengers from the time they entered into quarantine in Kiribati
[Score: -0.076] The Kiribati government confirmed that 36 passengers on a flight that had come from Fiji on January 14 had all tested positive for the virus. Authorities had taken all precautions and have been managing the passengers from Kiribati
```
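The scores above come from `outputs.sequences_scores`, which holds the final beam-search scores; assuming `generate`'s default `length_penalty=1.0`, each score is roughly the average per-token log-probability of that summary. A small sketch (not part of the model card) that exponentiates a score into an average per-token probability, which can be easier to eyeball when comparing beams:

```python
import math

def score_to_avg_token_prob(score: float) -> float:
    # A beam score of s (avg per-token log-prob) corresponds to an
    # average per-token probability of e^s.
    return math.exp(score)

# Best beam above: a score of -0.059 means roughly a 0.94 average
# per-token probability.
print(round(score_to_avg_token_prob(-0.059), 3))
```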

# Github repo

More information, the scoring function, the training script, and an example training log are available in the GitHub repo: https://github.com/CannyLab/summary_loop