---
base_model: unsloth/llama-3.2-1b-instruct-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
---

# How to use?

- We use Unsloth for faster inference and load the adapter:

```python
from unsloth import FastLanguageModel

max_seq_length = 8192
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "patched-codes/Llama-3.2-1B-FastApply",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
```

- The model takes the original code and an update snippet as input and generates the final updated code:

```python
original_code = """import React from 'react';
import { Loader } from 'lucide-react';

interface ButtonProps {
  text: string;
  onClick?: () => void;
  loading?: boolean;
  disabled?: boolean;
  icon?: React.ReactNode;
}

const Button: React.FC<ButtonProps> = ({ text, onClick, loading = false, disabled = false, icon }) => (
  <button onClick={onClick} disabled={disabled || loading}>
    {loading ? <Loader className="animate-spin" /> : icon}
    {text}
  </button>
);

export default Button;
"""

update_snippet = """interface ButtonProps {
  variant?: 'primary' | 'secondary' | 'danger';
  size?: 'small' | 'medium' | 'large';
  // ... other props
}

const Button: React.FC<ButtonProps> = ({
  variant = 'primary',
  size = 'medium',
  // ... other props
}) => (
  // ... updated component body
);
"""
```

- Prepare your input following the prompt structure:

```python
input_text = f"""
Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.

<code>{original_code}</code>

<update>{update_snippet}</update>

Provide the complete updated code.
"""

messages = [
    {"role": "system", "content": "You are a coding assistant that helps merge code updates, ensuring every modification is fully integrated."},
    {"role": "user", "content": input_text.strip()},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,  # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer, skip_prompt = True)
output = model.generate(
    input_ids = inputs,
    streamer = text_streamer,
    max_new_tokens = 8192,
    use_cache = True,
    temperature = 1.5,
    min_p = 0.1,
)

response = tokenizer.decode(output[0][len(inputs[0]):])
updated_code = response.split("<updated-code>")[1].split("</updated-code>")[0]
```

# Uploaded model

- **Developed by:** patched-codes
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3.2-1b-instruct-bnb-4bit

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
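
# Robust output extraction

The `split`-based extraction above raises an `IndexError` if the model ever omits the `<updated-code>` tags. Below is a slightly more defensive sketch, not part of the original workflow: it assumes the tag names from the prompt above and the Llama-3 `<|eot_id|>` end-of-turn token.

```python
import re

def extract_updated_code(response: str) -> str:
    """Return the code between <updated-code> tags, falling back to the raw response."""
    match = re.search(r"<updated-code>(.*?)</updated-code>", response, re.DOTALL)
    code = match.group(1) if match else response
    # tokenizer.decode without skip_special_tokens leaves the end-of-turn token in place
    return code.replace("<|eot_id|>", "").strip()

updated_code = extract_updated_code(response)
```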