metadata
datasets:
  - ehartford/dolphin
  - player1537/Bloom-560m-trained-on-Dolphin
language:
  - en
library_name: transformers
pipeline_tag: text-generation

Model Card for player1537/Dolphinette

Dolphinette is my latest attempt at creating a small LLM intended to run locally on one's own laptop or cell phone. I believe that personalized LLMs will be one of the largest driving forces behind widespread LLM usage.

Dolphinette is a fine-tuned version of bigscience/bloom-560m, trained using the ehartford/dolphin dataset. The model was trained as a LoRA using this Google Colab notebook and then the LoRA was merged into the original model using this Google Colab notebook.
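The notebooks themselves are not reproduced here, so the following is only a minimal sketch of the arithmetic that a standard LoRA merge performs, not the author's actual code. It assumes the usual LoRA formulation, in which a frozen weight matrix W is updated to W + (alpha / r) * B @ A. The helper names (`matmul`, `merge_lora`) and all numeric values are hypothetical and chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * (B @ A), the merged weight matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # shapes: (out, r) @ (r, in) -> (out, in)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy example: a 2x2 frozen weight with a rank-1 adapter (all values made up).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape (r=1, in=2)
lora_b = [[0.5], [0.0]]  # shape (out=2, r=1)
merged = merge_lora(w, lora_a, lora_b, alpha=2.0, r=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be loaded like any ordinary checkpoint.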

Uses

Dolphinette is trained to follow instructions and uses the following template:

<s>INSTRUCTION: You are an AI assistant that follows instruction extremely well. Help as much as you can. INPUT: Answer this question: what is the capital of France? OUTPUT:

More formally, this function was used to format each training example:

from typing import Any, Dict, Optional


def __text(datum: Optional[Dict[Any, Any]] = None, /, **kwargs) -> str:
    r"""

    >>> __text({
    ...   "instruction": "Test instruction.",
    ...   "input": "Test input.",
    ...   "output": "Test output.",
    ... })
    '<s>INSTRUCTION: Test instruction. INPUT: Test input. OUTPUT: Test output.</s>'

    >>> __text({
    ...   "instruction": "Test instruction.",
    ...   "input": "Test input.",
    ...   "output": None,
    ... })
    '<s>INSTRUCTION: Test instruction. INPUT: Test input. OUTPUT:'

    """

    if datum is None:
        datum = kwargs

    return (
        f"""<s>"""
        f"""INSTRUCTION: {datum['instruction']} """
        f"""INPUT: {datum['input']} """
        f"""OUTPUT: {datum['output']}</s>"""
    ) if datum.get('output', None) is not None else (
        f"""<s>"""
        f"""INSTRUCTION: {datum['instruction']} """
        f"""INPUT: {datum['input']} """
        f"""OUTPUT:"""
    )

The instructions in the original training set, and the number of times each one appeared, are as follows.

  • 165175: You are an AI assistant. User will you give you a task. Your goal is to complete the task as faithfully as you can. While performing the task think step-by-step and justify your steps.
  • 136285: You are a helpful assistant, who always provide explanation. Think like you are answering to a five year old.
  • 110127: You are an AI assistant. You will be given a task. You must generate a detailed and long answer.
  • 63267: (nothing)
  • 57303: You are an AI assistant that follows instruction extremely well. Help as much as you can.
  • 51266: You are an AI assistant. Provide a detailed answer so user don’t need to search outside to understand the answer.
  • 19146: You are an AI assistant that helps people find information.
  • 18008: You are an AI assistant that helps people find information. User will you give you a question. Your task is to answer as faithfully as you can. While answering think step-bystep and justify your answer.
  • 17181: You are an AI assistant that helps people find information. Provide a detailed answer so user don’t need to search outside to understand the answer.
  • 9938: You should describe the task and explain your answer. While answering a multiple choice question, first output the correct answer(s). Then explain why other answers are wrong. Think like you are answering to a five year old.
  • 8730: You are an AI assistant. You should describe the task and explain your answer. While answering a multiple choice question, first output the correct answer(s). Then explain why other answers are wrong. You might need to use additional knowledge to answer the question.
  • 8599: Explain how you used the definition to come up with the answer.
  • 8459: User will you give you a task with some instruction. Your job is follow the instructions as faithfully as you can. While answering think step-by-step and justify your answer.
  • 7401: You are an AI assistant, who knows every language and how to translate one language to another. Given a task, you explain in simple steps what the task is asking, any guidelines that it provides. You solve the task and show how you used the guidelines to solve the task.
  • 7212: You are a teacher. Given a task, you explain in simple steps what the task is asking, any guidelines it provides and how to use those guidelines to find the answer.
  • 6372: Given a definition of a task and a sample input, break the definition into small parts. Each of those parts will have some instruction. Explain their meaning by showing an example that meets the criteria in the instruction. Use the following format: Part # : a key part of the definition. Usage: Sample response that meets the criteria from the key part. Explain why you think it meets the criteria.
  • 55: You are an AI assistant. Provide a detailed answer so user don't need to search outside to understand the answer.
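A tally like the one above could be produced along the following lines. This is a sketch under assumptions, not the script that generated the counts: it assumes each record is a dict with an `instruction` field (the same field name the formatting function uses), and the helper name `count_instructions`, the `(nothing)` label for empty instructions, and the sample records are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_instructions(records: Iterable[Dict[str, str]]) -> List[Tuple[str, int]]:
    """Count each distinct instruction, most frequent first."""
    # Empty or missing instructions are grouped under the label "(nothing)".
    counts = Counter(record.get("instruction") or "(nothing)" for record in records)
    return counts.most_common()


# Made-up records for illustration only; not taken from the real dataset.
sample = [
    {"instruction": "You are an AI assistant that helps people find information."},
    {"instruction": ""},
    {"instruction": "You are an AI assistant that helps people find information."},
]

for text, n in count_instructions(sample):
    print(f"{n}: {text}")
```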

Direct Use

Using the Hugging Face transformers library, you can load and run this model as follows:

import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    'player1537/Dolphinette',
)

tokenizer = transformers.AutoTokenizer.from_pretrained(
    'player1537/Dolphinette',
)

pipeline = transformers.pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
)

completion = pipeline(
    (
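        # Adjacent string literals are concatenated, so each piece needs its
        # own trailing space to keep the words separated.
        r"""<s>INSTRUCTION: You are an AI assistant that helps people find """
        r"""information. INPUT: Answer this question: what is the capital of """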
        r"""France? Be concise. OUTPUT:"""
    ),
    return_full_text=False,
    max_new_tokens=512,
)
completion = completion[0]['generated_text']

print(completion)
#=>  The capital of France is the city of Paris. It's located in the country of
#=>  France, which means it's a geographical location in Europe. It is
#=>  consistently called "La capitale de France" ("La capital de la France"),
#=>  its localization literally refers to theThiest city of France.
#=>  
#=> According to the English translation of the French, the capital is the place
#=> where people live for their livelihood or business. However, the actual
#=> location you are looking at is the capital of France, the city located in
#=> the center of the country along several important international routes.
#=>  
#=> The capital of France generally refers to one or a few urban locations that
#=> represent particular cities in Europe. Depending on your nationality or
#=> culture, refinements can be added to the name of the city, and the
#=> announcement can be 'tel Aviv', 'Edinburgh', 'Corinthus', 'Palace of Culture
#=> and Imperials' (a French title), 'Languedoc', `Paris' or 'Belfast'.
#=>  
#=> To be clear, the city of paris is the capital of France, and it is the
#=> geographical location of the city, not the city itself.
#=>  
#=> Conclusion: The capital of France is the city of Paris, which is the
#=> most-visited international destination in Europe.

This model is very wordy, but for less contrived tasks I have found it to work well enough.
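One possible way to cope with the verbosity is to post-process the completion and keep only its first paragraph. The sketch below is an assumption about what a caller might want, not part of the model or of the transformers library; the helper name `first_paragraph` is hypothetical, and the example string is a shortened version of the output shown above. Lowering `max_new_tokens` in the pipeline call is another option.

```python
def first_paragraph(completion: str) -> str:
    """Return the text before the first blank line, joined onto one line."""
    lines = []
    for line in completion.strip().splitlines():
        if not line.strip():
            break  # a blank (or whitespace-only) line ends the first paragraph
        lines.append(line.strip())
    return " ".join(lines)


# Shortened version of the sample completion, including its leading space.
example = (
    " The capital of France is the city of Paris.\n"
    " \n"
    "According to the English translation of the French, ..."
)
print(first_paragraph(example))
```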