tokenizer-dna-clm / README.md
---
license: mit
tags:
  - biology
  - genomics
  - dna
---

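Tokenizer for causal language modeling of DNA sequences. The vocabulary assigns one token ID to each lowercase nucleotide (`a`, `c`, `g`, `t`), plus a padding token and an unknown token: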

    "vocab": {
      "[PAD]": 0,
      "[UNK]": 1,
      "a": 2,
      "c": 3,
      "g": 4,
      "t": 5
    },
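
As a rough illustration of what this vocabulary implies, the sketch below maps a DNA string to token IDs in plain Python. It is not the tokenizer's actual implementation: the helper names `encode` and `decode`, the right-padding behavior, and the choice not to lowercase input are assumptions made for the example. Only the six vocabulary entries come from the excerpt above.

```python
# Illustrative sketch only: mirrors the vocabulary excerpt above,
# not the real tokenizer's code path.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "a": 2, "c": 3, "g": 4, "t": 5}
ID_TO_TOKEN = {idx: tok for tok, idx in VOCAB.items()}


def encode(sequence, max_length=None):
    """Map each character to its ID; anything outside the vocab becomes [UNK].

    Assumption: input is NOT lowercased, so "A" maps to [UNK] here.
    Assumption: padding (if requested) is appended on the right.
    """
    ids = [VOCAB.get(base, VOCAB["[UNK]"]) for base in sequence]
    if max_length is not None:
        ids = ids[:max_length]  # truncate if too long
        ids += [VOCAB["[PAD]"]] * (max_length - len(ids))  # pad if too short
    return ids


def decode(ids):
    """Map IDs back to tokens, dropping [PAD]."""
    return "".join(ID_TO_TOKEN[i] for i in ids if i != VOCAB["[PAD]"])


print(encode("acgtn"))              # [2, 3, 4, 5, 1]
print(encode("acg", max_length=5))  # [2, 3, 4, 0, 0]
print(decode([2, 3, 4, 0, 0]))      # acg
```

With this mapping, any character outside `a`, `c`, `g`, `t` (for example `n`) falls back to `[UNK]`.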