zelalt committed on
Commit 6f4048e
1 Parent(s): 0ed453c

Update README.md

Files changed (1)
  1. README.md +37 -4
README.md CHANGED
@@ -25,15 +25,48 @@ It achieves the following results on the evaluation set:
 
  ## Model description
 
- ### Sample Code
 
  ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
- model = AutoModelForCausalLM.from_pretrained("zelalt/titletor-phi_1-5", trust_remote_code=True)
- tokenizer = AutoTokenizer.from_pretrained("zelalt/titletor-phi_1-5", trust_remote_code=True)
- inputs = tokenizer(f'''What is the title of this paper? ....[your pdf as text]\n\nAnswer: ''', return_tensors="pt", return_attention_mask=False)
  outputs = model.generate(**inputs,max_new_tokens=50, pad_token_id = tokenizer.eos_token_id, eos_token_id = tokenizer.eos_token_id)
  text = tokenizer.batch_decode(outputs)[0]
  print(text)
 
  ## Model description
 
+ ## Sample Code
 
+ ### Test Dataset
+ If you prefer, you can use a test dataset from [zelalt/scientific-papers](https://huggingface.co/datasets/zelalt/scientific-papers)
+ or [zelalt/arxiv-papers](https://huggingface.co/datasets/zelalt/arxiv-papers), or read your PDF as text with `PyPDF2.PdfReader` and pass that text to the LLM, prefixed with the prompt "What is the title of this paper?".
+
+ ```python
+ from datasets import load_dataset
+
+ test_dataset = load_dataset("zelalt/scientific-papers", split='train')
+ test_dataset = test_dataset.rename_column('full_text', 'text')
+
+ def formatting_prompts_func(example):
+     text = f"What is the title of this paper? {example['text'][:180]}\n\nAnswer: "
+     return {'text': text}
+
+ formatted_dataset = test_dataset.map(formatting_prompts_func)
+ ```
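
For the PDF route mentioned under Test Dataset, a minimal sketch might look like the following. It assumes PyPDF2 is installed. `read_pdf_text` and `build_prompt` are hypothetical helper names, and the 180-character cut-off simply mirrors the dataset snippet rather than a documented requirement.

```python
def read_pdf_text(pdf_path: str) -> str:
    """Concatenate the extracted text of every page in a PDF file."""
    # Imported lazily so the prompt helper below works without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can yield an empty result for image-only pages.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def build_prompt(paper_text: str, max_chars: int = 180) -> str:
    """Build the title-extraction prompt in the same shape as the dataset snippet."""
    # Assumption: truncating to the first `max_chars` characters is enough,
    # since the title normally appears at the very start of a paper.
    return f"What is the title of this paper? {paper_text[:max_chars]}\n\nAnswer: "


# Hypothetical usage (the path is a placeholder):
# prompt = build_prompt(read_pdf_text("paper.pdf"))
```

The resulting `prompt` string can be passed to the tokenizer in place of the hand-written prompt.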
+
+ ### Inference
  ```python
+
  import torch
+ from peft import PeftModel, PeftConfig
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
+ peft_model_id = "zelalt/titletor-phi_1-5"
+ config = PeftConfig.from_pretrained(peft_model_id)
+ model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path,trust_remote_code=True)
+ model = PeftModel.from_pretrained(model, peft_model_id)
+
+ #Put as string
+ inputs = tokenizer(f'''What is the title of this paper? ...[your pdf as text]..\n\nAnswer: ''', return_tensors="pt", return_attention_mask=False)
+ outputs = model.generate(**inputs,max_new_tokens=50, pad_token_id = tokenizer.eos_token_id, eos_token_id = tokenizer.eos_token_id)
+ text = tokenizer.batch_decode(outputs)[0]
+ print(text)
+ ```
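
Because `batch_decode` is applied to the full output sequence of a causal language model, the printed text contains the prompt as well as the generated answer. One possible way to isolate the title is sketched below. `extract_title` is a hypothetical helper, not part of the model card, and the `<|endoftext|>` default is an assumption that should be checked against `tokenizer.eos_token`.

```python
def extract_title(decoded: str,
                  marker: str = "Answer: ",
                  eos_token: str = "<|endoftext|>") -> str:
    """Return the first non-empty line generated after the last `marker`."""
    # Keep only what follows the last occurrence of the marker;
    # if the marker is absent, rpartition leaves the whole string in the tail.
    tail = decoded.rpartition(marker)[2]
    # Drop the end-of-text token and anything generated after it.
    tail = tail.split(eos_token, 1)[0]
    # Assumption: a title fits on one line, so take the first non-empty one.
    for line in tail.splitlines():
        if line.strip():
            return line.strip()
    return ""


# Hypothetical usage with the `text` variable from the inference snippet:
# print(extract_title(text))
```

Splitting on the last marker is a heuristic: it would misfire if the model itself generated the string "Answer: " inside its output.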
+
+ ```python
+ #Put from dataset
+ inputs = tokenizer(f'''{formatted_dataset['text'][120]}''', return_tensors="pt", return_attention_mask=False)
  outputs = model.generate(**inputs,max_new_tokens=50, pad_token_id = tokenizer.eos_token_id, eos_token_id = tokenizer.eos_token_id)
  text = tokenizer.batch_decode(outputs)[0]
  print(text)