Burc Gokden committed
Commit 5a550ce
1 Parent(s): d804801
Initial commit
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
36 |
+
*.keras filter=lfs diff=lfs merge=lfs -text
|
37 |
+
*.data-* filter=lfs diff=lfs merge=lfs -text
|
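As a rough illustration of what the two added rules cover, the sketch below checks this commit's file names against the patterns visible in this hunk. It is an approximation only: it uses Python's `fnmatchcase` on the basename, which agrees with gitattributes matching for simple slash-free patterns like these but does not implement the full gitattributes rules. The helper name `is_lfs_tracked` is hypothetical, and the rules above line 33 of `.gitattributes` are not shown in this hunk, so they are not included.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the patterns visible in this hunk; the real file has more rules above line 33.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "*.keras", "*.data-*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: slash-free patterns are compared with the basename only."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for path in [
    "pldrllmv5-1-104M.keras",
    "tf-checkpoint/pldrllmv5-1-104M.data-00000-of-00001",
    "tf-checkpoint/pldrllmv5-1-104M.index",
]:
    print(path, is_lfs_tracked(path))
```

Under these visible rules the `.keras` file and the checkpoint data shard are routed through LFS, while the `.index` file is not, which is consistent with it appearing as a plain binary file in this commit.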
README.md
CHANGED
@@ -1,3 +1,63 @@
|
|
1 |
-
---
|
2 |
-
|
3 |
-
|
1 |
+
---
|
2 |
+
language:
|
3 |
+
- en
|
4 |
+
tags:
|
5 |
+
- text-generation
|
6 |
+
- large-language-model
|
7 |
+
- power-law-decoder-representations
|
8 |
+
- pldr-llm
|
9 |
+
- tensorflow
|
10 |
+
license: apache-2.0
|
11 |
+
datasets:
|
12 |
+
- tiiuae/falcon-refinedweb
|
13 |
+
---
|
14 |
+
|
15 |
+
# PLDR-LLM-v5-1-104M
|
16 |
+
|
17 |
+
## Model Description
|
18 |
+
|
19 |
+
PLDR-LLM-v5-1-104M is a Large Language Model from Power Law Decoder Representations (PLDR-LLM), a language model architecture that uses power law graph attention to generate deductive and inductive outputs. The model has 104M parameters. It corresponds to PLDRv5-1, whose architecture and training details are given in Tables 1 and 2 of the research paper [PLDR-LLM: Large Language Model from Power Law Decoder Representations](https://arxiv.org/abs/2410.16703).
|
20 |
+
|
21 |
+
## Training data
|
22 |
+
|
23 |
+
PLDR-LLM-v5-1-104M was pretrained on [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), a publicly available English web dataset with extensive filtering and deduplication.
|
24 |
+
|
25 |
+
## Training procedure
|
26 |
+
|
27 |
+
This model was trained on ~8B tokens from RefinedWeb over 250k steps per rank. It was trained autoregressively with a cross-entropy loss and without DAG regularization on the deductive outputs.
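For readers unfamiliar with the objective, here is a minimal sketch of an autoregressive cross-entropy loss: the scores at position t are evaluated against the token at position t+1. It is a plain-Python illustration of the general technique, not the training code of this model; the function name and the toy inputs are assumptions made for the example.

```python
import math


def next_token_cross_entropy(logits, token_ids):
    """Mean cross-entropy of predicting token t+1 from the logits at position t.

    logits: one list of vocabulary scores per input position.
    token_ids: the token ids of the same sequence.
    """
    total = 0.0
    count = 0
    # Position t is scored against the token at t+1, so the last position has no target.
    for scores, target in zip(logits[:-1], token_ids[1:]):
        peak = max(scores)  # subtract the max before exponentiating, for stability
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]  # negative log-probability of the target
        count += 1
    return total / count


# Toy check: uniform scores over a 4-token vocabulary give a loss of ln(4).
vocab_size = 4
uniform_logits = [[0.0] * vocab_size for _ in range(3)]
loss = next_token_cross_entropy(uniform_logits, [2, 0, 3])
print(loss)
```

A model that spreads probability evenly over the vocabulary scores ln(vocabulary size) on this loss, which is a handy baseline when reading training curves.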
|
28 |
+
|
29 |
+
## Intended Use and Limitations
|
30 |
+
|
31 |
+
This model is intended for research purposes. Given a text prompt as input, it performs next-token prediction to generate a continuation. The context length of this model is 1024 tokens.
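The figures quoted in this card (~8B tokens, 250k steps per rank, 1024-token context) can be related with a little arithmetic. The sketch below is a back-of-the-envelope estimate only: it assumes every training sequence fills the full context, and the value of 32 sequences per step is a hypothetical round figure, not one stated in this card. The actual batch configuration is in Table 2 of the paper.

```python
approx_total_tokens = 8e9   # "~8B tokens" from the training procedure
steps_per_rank = 250_000    # "250k steps per rank"
context_length = 1024       # context length stated in this card

tokens_per_step = approx_total_tokens / steps_per_rank
sequences_per_step = tokens_per_step / context_length
print(tokens_per_step)      # 32000.0
print(sequences_per_step)   # 31.25

# Hypothetical round figure: 32 full-length sequences per step across all ranks.
assumed_sequences_per_step = 32
implied_total = assumed_sequences_per_step * context_length * steps_per_rank
print(implied_total)        # 8192000000
```

Under that assumption the total comes to about 8.19B tokens, in line with the "~8B" quoted above.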
|
32 |
+
|
33 |
+
### How to use
|
34 |
+
|
35 |
+
- The TensorFlow model checkpoint and tokenizer can be loaded into the PLDR-LLM framework to generate text, as described in the code repository used to train this model: [LLM-from-Power-Law-Decoder-Representations](https://github.com/burcgokden/LLM-from-Power-Law-Decoder-Representations).
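Before wiring the checkpoint into the framework, it can help to confirm what it contains. The sketch below derives the checkpoint prefix from the file names in this repository and, if TensorFlow is installed, lists the stored variables with `tf.train.list_variables`. It only inspects the checkpoint; building the model and generating text still requires the PLDR-LLM code from the repository linked above, whose API is not shown here. The helper names are hypothetical.

```python
import re


def checkpoint_prefix(shard_path: str) -> str:
    """Strip the '.data-XXXXX-of-YYYYY' or '.index' suffix to get the checkpoint prefix."""
    return re.sub(r"\.(data-\d{5}-of-\d{5}|index)$", "", shard_path)


def list_checkpoint_variables(prefix: str):
    """Return (name, shape) pairs stored in the checkpoint. Requires TensorFlow."""
    import tensorflow as tf  # imported lazily so checkpoint_prefix works without it

    return tf.train.list_variables(prefix)


prefix = checkpoint_prefix("tf-checkpoint/pldrllmv5-1-104M.data-00000-of-00001")
print(prefix)  # tf-checkpoint/pldrllmv5-1-104M
```

With the repository files downloaded, `list_checkpoint_variables(prefix)` should return the variable names and shapes, which can be compared against the architecture in Table 1 of the paper.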
|
36 |
+
|
37 |
+
### LM Evaluation Harness Support
|
38 |
+
|
39 |
+
- The Keras model can be used with a fork of the LM Evaluation Harness that adds PLDR-LLM support: [lm-evaluation-harness-with-PLDR-LLM](https://github.com/burcgokden/lm-evaluation-harness-with-PLDR-LLM).
|
40 |
+
|
41 |
+
### Limitations and Biases
|
42 |
+
|
43 |
+
Large language models may generate text that is profane, lewd, socially unacceptable or offensive, depending on the contents of the dataset on which they were pretrained. RefinedWeb is a dataset that is as toxic and biased as the Pile. Please see the papers for [RefinedWeb](https://arxiv.org/abs/2306.01116) and [the Pile](https://arxiv.org/pdf/2101.00027) for more information. Large language models are also susceptible to hallucinations and may generate text that contains incorrect, irrelevant or misleading information. Since it is very hard to anticipate the contents of generated text ahead of time, the output of large language models needs to be heavily moderated and curated to prevent undesired content from appearing without warning.
|
44 |
+
|
45 |
+
## Eval results
|
46 |
+
|
47 |
+
Evaluation results on benchmarks in zero-shot and few-shot settings, and a comparison with LLMs of similar size reported in the literature, can be found in Tables 3 and 4 of the [PLDR-LLM paper](https://arxiv.org/abs/2410.16703).
|
48 |
+
|
49 |
+
### BibTeX entry and citation info
|
50 |
+
|
51 |
+
Please cite this model as:
|
52 |
+
|
53 |
+
```bibtex
|
54 |
+
@misc{gokden2024pldrllm,
|
55 |
+
title={PLDR-LLM: Large Language Model from Power Law Decoder Representations},
|
56 |
+
author={Burc Gokden},
|
57 |
+
year={2024},
|
58 |
+
eprint={2410.16703},
|
59 |
+
archivePrefix={arXiv},
|
60 |
+
primaryClass={cs.CL},
|
61 |
+
url={https://arxiv.org/abs/2410.16703},
|
62 |
+
}
|
63 |
+
```
|
pldrllmv5-1-104M.keras
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:9967a3fe5492beae11527bb68883752c2c6e7b227f78b7b55c5ddc8299374d74
|
3 |
+
size 418760931
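The three lines above are a Git LFS pointer: the spec version, the SHA-256 of the real file, and its size in bytes. A downloaded copy of the model file can be checked against them. The following is a minimal sketch using only the standard library; the helper names are hypothetical and the local file path passed to `matches_pointer` is whatever location the file was downloaded to.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream the file and compare its byte count and SHA-256 with the pointer."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:9967a3fe5492beae11527bb68883752c2c6e7b227f78b7b55c5ddc8299374d74\n"
    "size 418760931\n"
)
print(pointer["algo"], pointer["size"])  # sha256 418760931
```

Calling `matches_pointer("pldrllmv5-1-104M.keras", pointer)` on a complete download should return `True`; a truncated or corrupted file fails on either the size or the digest.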
|
refinedweb-tokenizer-pldr-llm-paper.tar.gz
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:64ad7731741e37d2df354c9827e24524a18da91f3ac06f214368a5c7331f7097
|
3 |
+
size 1842758
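The tokenizer ships as a gzip-compressed tar archive, so it has to be unpacked before the framework can load it. The sketch below extracts such an archive with the standard library and returns the member names. The layout inside the archive is not documented in this commit, so the code makes no assumption about it; the function name and the target directory are hypothetical.

```python
import tarfile


def extract_tokenizer(archive_path: str, target_dir: str) -> list:
    """Extract a .tar.gz archive into target_dir and return its member names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        try:
            # The "data" filter rejects absolute paths, links outside the target, etc.
            archive.extractall(target_dir, filter="data")
        except TypeError:
            # Older Python versions without the filter argument: no safety checks.
            archive.extractall(target_dir)
    return names
```

For example, `extract_tokenizer("refinedweb-tokenizer-pldr-llm-paper.tar.gz", "tokenizer")` unpacks the archive into a local `tokenizer` directory; how the extracted files are then passed to the PLDR-LLM framework is described in the code repository.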
|
tf-checkpoint/pldrllmv5-1-104M.data-00000-of-00001
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:f03d42a5785c83af49a0100650869b35b9af55e3787bffdd08018e9b915eadf3
|
3 |
+
size 417201808
|
tf-checkpoint/pldrllmv5-1-104M.index
ADDED
Binary file (65 kB)