Update README.md
README.md
---
base_model: HuggingFaceH4/starchat2-15b-sft-v0.1
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
- HuggingFaceH4/orca_dpo_pairs
model-index:
- name: starchat2-15b-v0.1
  results: []
---

<img src="https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1/resolve/main/model_logo.png" alt="StarChat2 15B Logo" width="800" style="margin-left: auto; margin-right: auto; display: block;"/>

# Model Card for StarChat2 15B

StarChat is a series of language models that are trained to act as helpful coding assistants. StarChat2 is the latest model in the series, and is a fine-tuned version of [StarCoder2](https://huggingface.co/bigcode/starcoder2-15b) that was trained with SFT and DPO on a mix of synthetic datasets.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Model type:** A 16B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- **Language(s) (NLP):** Primarily English and 80+ programming languages.
- **License:** BigCode Open RAIL-M v1
- **Finetuned from model:** [bigcode/starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/huggingface/alignment-handbook
- **Demo:** https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground

## Intended uses & limitations

The model was fine-tuned on a blend of chat, code, math, and reasoning datasets. As a result, it can be used for chat, and you can test its coding capabilities in our [demo](https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground).

Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:

```python
# pip install 'transformers @ git+https://github.com/huggingface/transformers.git@831bc25d8fdb85768402f772cf65cc3d7872b211'
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/starchat2-15b-v0.1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "system",
        "content": "You are StarChat2, an expert programming assistant",
    },
    {"role": "user", "content": "Write a simple website in HTML. When a user clicks the button, it shows a random Chuck Norris joke."},
]
outputs = pipe(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    stop_sequence="<|im_end|>",  # stop at the chat template's end-of-turn token
)
# With chat-style input, `generated_text` holds the full conversation;
# the last message is the assistant's reply.
print(outputs[0]["generated_text"][-1]["content"])
```
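
If you need more control over tokenization or generation, here is a minimal sketch (not from the original card) that loads the model directly with `AutoModelForCausalLM` and formats the conversation via the tokenizer's chat template; it assumes the same `<|im_end|>` end-of-turn token as the pipeline example above, and the prompt shown is just an illustration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/starchat2-15b-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "system", "content": "You are StarChat2, an expert programming assistant"},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]
# apply_chat_template renders the messages in the model's chat format and
# appends the generation prompt for the assistant's turn.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```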

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

StarChat2 15B has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
Models trained primarily on code data will also have a more skewed demographic bias commensurate with the demographics of the GitHub community; for more on this, see the [StarCoder2 dataset](https://huggingface.co/datasets/bigcode/the-stack-v2).

Since the base model was pretrained on a large corpus of code, it may produce code snippets that are syntactically valid but semantically incorrect.
For example, it may produce code that does not compile or that produces incorrect results.
It may also produce code that is vulnerable to security exploits.
We have also observed that the model has a tendency to produce false URLs, which should be carefully inspected before clicking.

StarChat2 15B was fine-tuned from the base model [StarCoder2](https://huggingface.co/bigcode/starcoder2-15b); please refer to its model card's [Limitations Section](https://huggingface.co/bigcode/starcoder2-15b#limitations) for relevant information.
In particular, the model was evaluated on some categories of gender biases, propensity for toxicity, and risk of suggesting code completions with known security flaws; these evaluations are reported in its [technical report](https://huggingface.co/papers/2402.19173).

## Training details

This model is a fine-tuned version of [starchat2-15b-sft-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-sft-v0.1) on the HuggingFaceH4/ultrafeedback_binarized and the HuggingFaceH4/orca_dpo_pairs datasets.
It achieves the following results on the evaluation set:
- Loss: 0.4347
- Rewards/chosen: -0.9461
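
For context on these metrics: DPO training optimizes the following objective (the standard formulation from the DPO paper, not something stated in this card), where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen SFT reference model, and $\beta$ controls how far the policy may drift from the reference:

```latex
% Standard DPO objective (Rafailov et al., 2023).
% x: prompt, y_w: chosen completion, y_l: rejected completion.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[
      \log \sigma\!\left(
          \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

In the convention used by TRL's `DPOTrainer` (which the alignment handbook builds on), `Rewards/chosen` is the implicit reward $\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}$ averaged over the evaluation set. A negative value means the chosen completions became less likely under the trained policy than under the reference; this is common in practice and is only a concern if the margin over the rejected completions fails to grow.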