---
license: afl-3.0
language:
- en
widget:
- text: "What's my name?"
  example_title: "Who am I?"
- text: "How to make a campfire"
  example_title: "Tutorial"
---

# Supervised Finetuning demonstration

Models are finetuned on conversations generated and curated by the [Open Assistant](https://github.com/LAION-AI/Open-Assistant) project.

# Mixing reward model with sampling

We can use a reward model to rank sampled answers and keep the highest-scoring one, as in this example code:

```
import torch
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b-base-finetuned/checkpoint-1000")
model = AutoModelForCausalLM.from_pretrained("facebook/galactica-1.3b-base-finetuned/checkpoint-1000").eval().half().cuda()

reward_name = "theblackcat102/electra-large-reward-model"
rank_model, rank_tokenizer = AutoModelForSequenceClassification.from_pretrained(reward_name), AutoTokenizer.from_pretrained(reward_name)
rank_model = rank_model.eval().half().cuda()

# Separator tokens used to mark the question/answer segments in the generated text.
# These values are assumptions about the checkpoint's prompt format; adjust them to
# the special tokens your model was actually trained with.
QUESTION_TOKEN = '<question>'
ANSWER_TOKEN = '<answer>'
ANSWER_END_TOKEN = '</answer>'

questions = ["How do I make a resume?"]
full_results = {}
total_scores = 0

for question in questions:
    inputs = tokenizer(question, return_tensors="pt", padding=True).to(0)
    if 'token_type_ids' in inputs:
        inputs.pop('token_type_ids')
    # sample many candidate answers for the same question
    outputs = model.generate(**inputs,
        do_sample=True,
        top_k=60,
        max_length=220,
        num_return_sequences=80,
        early_stopping=True
    )
    print(question)
    results = []
    for i, beam_output in enumerate(outputs):
        output = tokenizer.decode(beam_output, truncate_before_pattern=[r"\n\n^#", "^'''", "\n\n\n"])
        # split the decoded text into the question part and the answer part
        question_text, answer = output.split(ANSWER_TOKEN, maxsplit=1)
        answer = answer.split(ANSWER_END_TOKEN)[0].replace('<|endoftext|>', '').lstrip().split(QUESTION_TOKEN)[0]
        # score the (question, answer) pair with the reward model
        rank_inputs = rank_tokenizer(question_text, answer, return_tensors="pt",
                                     padding=True, max_length=512, truncation=True).to(rank_model.device)
        score = rank_model(**rank_inputs).logits[0].cpu().detach()
        results.append((answer, score, output))
    full_results[question] = results
    # keep the highest-scoring answer
    sorted_result = sorted(results, key=lambda x: x[1], reverse=True)
    total_scores += sorted_result[0][1].item()
    print('score', sorted_result[0][1].item())
    print('-----Best rank-----')
    print(sorted_result[0][0])
    print('-------------------')
```

Check out the Weights & Biases [report](https://api.wandb.ai/report/theblackcat102/8yg0c0r2) for training details.

Thanks to [BASIC Lab](https://basiclab.lab.nycu.edu.tw/Yummy/index.html#) for the compute resources. BASIC Lab is an academic research lab focusing on multi-modality learning and data mining.
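
For reference, here is a minimal sketch of scoring a single question/answer pair with the reward model on its own (the reward model name comes from the example above; the question and answer strings are purely illustrative):

```
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_name = "theblackcat102/electra-large-reward-model"
rank_model = AutoModelForSequenceClassification.from_pretrained(reward_name).eval()
rank_tokenizer = AutoTokenizer.from_pretrained(reward_name)

question = "How do I make a resume?"  # illustrative question
answer = "Start by listing your work history, education and relevant skills."  # illustrative answer

# The reward model takes a (question, answer) pair and outputs a single logit.
inputs = rank_tokenizer(question, answer, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    score = rank_model(**inputs).logits[0].item()
print(score)
```

A higher score means the reward model prefers that answer, so ranking sampled candidates by this score and keeping the top one is what the loop above does.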