	---
	license: apache-2.0
	language:
	- en
	---

	# GPT-NeoXT-Chat-Base-20B

	> TLDR: As part of OpenChatKit (codebase available [here](https://github.com/togethercomputer/OpenChaT)),
> GPT-NeoXT-Chat-Base-20B is a 20B-parameter language model, fine-tuned from EleutherAI’s GPT-NeoX with over 40 million instructions on 100% carbon-negative compute.

We base GPT-NeoXT-Chat-Base-20B on EleutherAI’s GPT-NeoX model and fine-tune it on data focused on dialog-style interactions.
We focused the tuning on several tasks, such as question answering, classification, extraction, and summarization.
We fine-tuned the model on a collection of 43 million high-quality instructions.
Together partnered with LAION and Ontocord, who both helped curate the dataset used to fine-tune the model.
	You can read more about this process and the availability of this dataset in LAION’s blog post [here](...).

	## Model Details
	- Developed by: \[TODO\] Together Computer, LAION, Ontocord, ...
	- Model type: Language Model
	- Language(s): English
	- License: Apache 2.0
- Model Description: A 20B-parameter open-source chat model, fine-tuned from EleutherAI’s GPT-NeoX with over 40M instructions on 100% carbon-negative compute.
	- Resources for more information: [GitHub Repository](https://github.com/togethercomputer/OpenChaT).

	## Examples
	\[TODO\] sync with the blog post

	## Training Examples

The training data consists of pairs of human queries and corresponding bot responses, with human queries prefixed with `<human>:` and bot responses prefixed with `<bot>:`.
	An example of the data format is as follows:
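The sketch below is illustrative only and is not an excerpt from the training data. It assumes each turn sits on its own line, and that at inference time a prompt ends with an open `<bot>:` prefix whose completion is cut at the next `<human>:` marker. The helper names (`format_dialogue`, `build_prompt`, `extract_reply`) are hypothetical.

```python
from typing import List, Tuple

# Speaker prefixes described in this card.
PREFIXES = {"human": "<human>:", "bot": "<bot>:"}


def format_dialogue(turns: List[Tuple[str, str]]) -> str:
    """Render (speaker, text) turns as one prefixed line per turn (newline separation assumed)."""
    lines = []
    for speaker, text in turns:
        if speaker not in PREFIXES:
            raise ValueError(f"unknown speaker: {speaker!r}")
        lines.append(f"{PREFIXES[speaker]} {text.strip()}")
    return "\n".join(lines)


def build_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Append a new human turn plus an open '<bot>:' prefix for the model to complete (assumed usage)."""
    return format_dialogue(history + [("human", user_message)]) + "\n" + PREFIXES["bot"]


def extract_reply(completion: str) -> str:
    """Keep generated text up to the next '<human>:' marker, if the model emits one."""
    return completion.split(PREFIXES["human"], 1)[0].strip()


if __name__ == "__main__":
    sample = format_dialogue([
        ("human", "What is the capital of France?"),
        ("bot", "The capital of France is Paris."),
    ])
    print(sample)
    # <human>: What is the capital of France?
    # <bot>: The capital of France is Paris.
```

Cutting the completion at the next `<human>:` keeps the model from continuing the conversation on the user's behalf; whether OpenChatKit itself does exactly this is not stated here.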



	# Uses
	\[TODO\]

	## Direct Use
	\[TODO\]

The model is intended for research purposes only. Possible research areas and tasks include:

	- Safe deployment of models which have the potential to generate harmful content.
	- Probing and understanding the limitations and biases of generative models.
	- Generation of artworks and use in design and other artistic processes.
	- Applications in educational or creative tools.
	- Research on generative models.

	Excluded uses are described below.

	### Misuse, Malicious Use, and Out-of-Scope Use

The OpenChatKit community provides GPT-NeoXT-Chat-Base-20B as an open-source tool for building chatbots.
	The community is not responsible for any misuse, malicious use, or out-of-scope use of the model.
	It is the responsibility of the end user to ensure that the model is used in a responsible and ethical manner.

	#### Out-of-Scope Use

GPT-NeoXT-Chat-Base-20B is designed for use in chatbot applications and may not perform well outside that scope.
	For example, it may not be suitable for use in safety-critical applications or for making decisions that have a significant impact on individuals or society.
	It is important to consider the limitations of the model and to only use it for its intended purpose.

	#### Misuse and Malicious Use

	GPT-NeoXT-Chat-Base-20B is designed for use in chatbot applications and should not be used for any other purpose.
	Misuse of the model, such as using it to engage in illegal or unethical activities, is strictly prohibited and goes against the principles of the OpenChatKit community project.

	Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:

	- Generating fake news, misinformation, or propaganda
	- Promoting hate speech, discrimination, or violence against individuals or groups
	- Impersonating individuals or organizations without their consent
	- Engaging in cyberbullying or harassment
- Creating defamatory content
	- Spamming or scamming
	- Sharing confidential or sensitive information without proper authorization
	- Violating the terms of use of the model or the data used to train it
	- Creating automated bots for malicious purposes such as spreading malware, phishing scams, or spamming

	## Limitations

	GPT-NeoXT-Chat-Base-20B, like other language model-based chatbots, has limitations that should be taken into consideration.
	For example, the model may not always provide accurate or relevant answers, particularly for questions that are complex, ambiguous, or outside of its training data.
	We therefore welcome contributions from individuals and organizations, and encourage collaboration towards creating a more robust and inclusive chatbot.

	## Training

### Training Data
	\[TODO\]

### Training Procedure

	- Hardware: 2 x 8 x A100 GPUs
	- Optimizer: [8bit-AdamW](https://github.com/TimDettmers/bitsandbytes)
	- Gradient Accumulations: 2
	- Batch: 2 x 2 x 64 x 2048 = 524288 tokens
- Learning rate: warmed up to 1e-6 over 100 steps, then held constant
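The batch size and learning-rate schedule above can be checked with a short sketch. The card does not say what each factor in the batch product stands for; the reading used here (nodes × gradient-accumulation steps × sequences per step on one node × tokens per sequence) is an assumption, as is the linear shape of the warmup.

```python
# Hypothetical reading of "Batch: 2 x 2 x 64 x 2048"; the card does not label the factors.
NODES = 2               # assumed: 2 nodes of 8 A100 GPUs each
GRAD_ACCUMULATION = 2   # from "Gradient Accumulations: 2"
SEQS_PER_STEP = 64      # assumed: sequences per micro-step on one node
SEQ_LEN = 2048          # assumed: tokens per sequence

PEAK_LR = 1e-6          # from "warmed up to 1e-6"
WARMUP_STEPS = 100      # from "over 100 steps"


def tokens_per_optimizer_step(nodes: int, grad_accum: int, seqs: int, seq_len: int) -> int:
    """Tokens consumed by one optimizer update."""
    return nodes * grad_accum * seqs * seq_len


def learning_rate(step: int, peak_lr: float = PEAK_LR, warmup_steps: int = WARMUP_STEPS) -> float:
    """Ramp linearly to peak_lr over warmup_steps (shape assumed), then hold it constant."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


if __name__ == "__main__":
    print(tokens_per_optimizer_step(NODES, GRAD_ACCUMULATION, SEQS_PER_STEP, SEQ_LEN))  # 524288
    print(learning_rate(0), learning_rate(99), learning_rate(10_000))
```

Whatever the true factor labels are, the product reproduces the stated 524288 tokens per optimizer step.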

	## Evaluation Results
	\[TODO\]

	## Environmental Impact
	\[TODO\]
	Stable Diffusion v1 Estimated Emissions
We estimate the following CO2 emissions using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700), based on the hardware, runtime, cloud provider, and compute region.

	- Hardware Type: A100 PCIe 40GB
	- Hours used: 200000
	- Cloud Provider: AWS
	- Compute Region: US-east
	- Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid): 15000 kg CO2 eq.
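The formula in the last bullet is a plain product, sketched below. The inputs in the example are assumptions, not values stated in this card: 0.25 kW (250 W) per A100 PCIe 40GB, "Hours used" read as total GPU-hours, and a grid intensity of 0.3 kg CO2 eq. per kWh. Under those assumptions the product comes out at about 15000 kg CO2 eq.

```python
def carbon_emitted_kg(power_kw: float, hours: float, kg_co2eq_per_kwh: float) -> float:
    """Power consumption x Time x Carbon produced based on location of power grid."""
    energy_kwh = power_kw * hours  # total energy drawn over the run
    return energy_kwh * kg_co2eq_per_kwh


if __name__ == "__main__":
    # Assumed inputs (not from this card): 250 W per GPU, hours as GPU-hours,
    # and 0.3 kg CO2 eq. per kWh for the grid.
    estimate = carbon_emitted_kg(power_kw=0.25, hours=200_000, kg_co2eq_per_kwh=0.3)
    print(f"{estimate:,.0f} kg CO2 eq.")
```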