PLDR-LLM-v5-DAG-2-110M

Model Description

PLDR-LLM-v5-DAG-2-110M is a large language model from power law decoder representations (PLDR-LLM), a new language model architecture that uses power law graph attention to generate deductive and inductive outputs. The model has 110M parameters. It corresponds to PLDRv5-DAG-2, whose architecture and training details are given in Tables 1 and 2 of the research paper PLDR-LLM: Large Language Model from Power Law Decoder Representations.

Training data

PLDR-LLM-v5-DAG-2-110M was pretrained on RefinedWeb, a publicly available English web dataset with extensive filtering and deduplication.

Training procedure

This model was trained on ~8B tokens from RefinedWeb over 250k steps per rank. It was trained autoregressively with cross-entropy loss, with DAG regularization applied to the deductive outputs.
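
As a rough sanity check on these figures, the arithmetic below derives the token budget implied per training step. It is only a sketch under assumptions this card does not state: that the ~8B total is spread evenly over the 250k steps, and that every training sequence fills the model's 1024-token context. The number of ranks and the per-rank batch size are not given here, so the result is a total summed over all ranks; Table 2 of the paper has the actual settings.

```python
# Back-of-the-envelope estimate using the rounded figures quoted in this card.
TOTAL_TOKENS = 8e9       # "~8B tokens"
STEPS = 250_000          # "250k steps per rank"
CONTEXT_LENGTH = 1024    # context length of this model

# Tokens consumed per training step, summed over all ranks (assumes an even spread).
tokens_per_step = TOTAL_TOKENS / STEPS

# Equivalent number of full-length sequences per step (assumes no padding).
sequences_per_step = tokens_per_step / CONTEXT_LENGTH

print(f"tokens per step: {tokens_per_step:,.0f}")                     # 32,000
print(f"full-context sequences per step: {sequences_per_step:.2f}")  # 31.25
```

Because the inputs are rounded, the result should be read as "roughly 32 full-context sequences per step across all ranks", not as the exact batch configuration.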

Intended Use and Limitations

This model is intended for research purposes. Given a text prompt as input, it performs next-token prediction to generate a continuation. The context length of this model is 1024 tokens.
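
To illustrate what next-token prediction under a fixed context length looks like, here is a minimal, framework-free sketch of greedy decoding. It does not use the PLDR-LLM API: `greedy_continue`, `next_token_logits`, and `toy_logits` are hypothetical names introduced for this example, the toy function stands in for a real model, and the sliding-window truncation is one common way to respect a context limit, not necessarily what the PLDR-LLM framework does. The framework's own generation code may also use sampling strategies other than greedy selection.

```python
from typing import Callable, List, Sequence

CONTEXT_LENGTH = 1024  # context length stated in this model card


def greedy_continue(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    context_length: int = CONTEXT_LENGTH,
) -> List[int]:
    """Append max_new_tokens greedily chosen token ids to prompt_ids."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Feed only the most recent tokens so the input fits the context window.
        window = ids[-context_length:]
        logits = next_token_logits(window)
        # Greedy choice: the index (token id) with the largest logit.
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        ids.append(next_id)
    return ids


def toy_logits(window: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a model: favors (last token + 1) mod 5."""
    vocab_size = 5
    target = (window[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_continue(toy_logits, [0], max_new_tokens=6))
# [0, 1, 2, 3, 4, 0, 1]
```

With a real model, `next_token_logits` would wrap tokenization-free model inference over token ids, and the returned ids would be decoded back to text by the tokenizer.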

How to use

The TensorFlow model checkpoint and tokenizer can be loaded into the PLDR-LLM framework to generate text, as described in the code repository used to train this model: LLM-from-Power-Law-Decoder-Representations.

LM Evaluation Harness Support

The Keras model can be used with a fork of the LM Evaluation Harness that adds PLDR-LLM support: lm-evaluation-harness-with-PLDR-LLM.

Limitations and Biases

Large language models may generate text that is profane, lewd, socially unacceptable, or offensive, depending on the contents of the dataset on which they were pretrained. RefinedWeb is a dataset that is as toxic and biased as the Pile. Please see the papers for RefinedWeb and the Pile for more information. Large language models are also susceptible to hallucinations and may generate text that contains incorrect, irrelevant, or misleading information. Since it is very hard to predict the contents of generated text ahead of time, the output of large language models needs to be heavily moderated and curated to prevent undesired content from appearing without warning.

Eval results

Evaluation results on benchmarks in zero-shot and few-shot settings, along with a comparison to LLMs of similar size reported in the literature, can be found in Tables 3 and 4 of the PLDR-LLM paper.

BibTeX entry and citation info

Please cite this model as:

@misc{gokden2024pldrllm,
      title={PLDR-LLM: Large Language Model from Power Law Decoder Representations}, 
      author={Burc Gokden},
      year={2024},
      eprint={2410.16703},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.16703}, 
}
