wassemgtk committed
Commit 08e3c90 (1 parent: 5ed2e8f)

Update README.md

Files changed (1): README.md (+35, -3)
README.md CHANGED
@@ -3,6 +3,7 @@ language:
 - en
 datasets:
 - English
+- Writer/palmyra-data-index
 tags:
 - text generation
 - pytorch
@@ -12,13 +13,15 @@ tags:
 - NeMo
 pipeline_tag: text-generation
 library_name: transformers
+license: apache-2.0
 ---
 
-license: cc-by-4.0
 
 
 # Palmyra Large 20B
 
+**Palmyra-Large is a 20B-parameter causal decoder-only model built by [Writer](https://www.Writer.com) and trained on 800B+ tokens of [Palmyra-Index-Data](https://huggingface.co/datasets/Writer/palmyra-data-index) enhanced with curated corpora.**
+
 <style>
 img {
 display: inline;
@@ -28,10 +31,37 @@ img {
 |[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-20B-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)
 
 
-## Model Description
+## Model Details
 
 Palmyra Large was pre-trained primarily on English text; a trace amount of non-English data from CommonCrawl remains in the training corpus. Like GPT-3, it is a decoder-only model, pre-trained with a self-supervised causal language modeling (CLM) objective and evaluated using the prompts and general experimental setup of GPT-3.
 
+### Model Description
+
+- **Developed by:** [https://www.writer.com](https://www.writer.com)
+- **Model type:** Causal decoder-only
+- **Language(s) (NLP):** English (with limited capabilities in German, Spanish, French, and Swedish)
+- **License:** Apache 2.0
+
+## Uses
+
+### Direct Use
+
+Research on large language models; as a foundation for further specialization and fine-tuning for specific use cases (e.g., summarization, text generation, chatbots).
+
+### Out-of-Scope Use
+
+Production use without adequate assessment of risks and mitigations; any use case that may be considered irresponsible or harmful.
+
+## Bias, Risks, and Limitations
+
+Palmyra-Large-20B is trained mostly on English, with limited capabilities in German, Spanish, French, and Swedish. It will not generalize appropriately to other languages. Furthermore, because it is trained on large-scale corpora representative of the web, it carries the stereotypes and biases commonly encountered online.
+
+### Recommendations
+
+We recommend that users of Palmyra-Large-20B fine-tune it for their specific tasks of interest, and that guardrails and appropriate precautions be taken for any production use.
+
 ### Use case
 Palmyra Large is extremely powerful while being extremely fast. This model excels at many nuanced tasks such as sentiment classification and summarization.
@@ -88,4 +118,6 @@ To cite this model:
 year = 2023,
 month = March
 }
-```
+```
+## Contact
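
Since the updated card declares `library_name: transformers` and `pipeline_tag: text-generation`, a minimal usage sketch follows. It is not part of the commit: the repo id `Writer/palmyra-large` is an assumption (the diff never names the model repo), and the generation settings are illustrative only.

```python
# Minimal sketch of loading the card's checkpoint with the standard
# transformers causal-LM API. The repo id below is an assumption, not
# something stated in this commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Writer/palmyra-large"  # assumed repo id for this model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~40 GB of weights at 20B params in fp16
    device_map="auto",          # needs the `accelerate` package installed
)

# One of the use cases named in the card: summarization.
prompt = "Summarize the following text:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For quick experiments the higher-level `pipeline("text-generation", model=model_id)` interface works as well; either way, the card's out-of-scope note still applies to any production deployment.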