Commit 7585b58 by mofosyne
1 Parent(s): 367a722

tiny llama llamafile generated and documented

.args ADDED
@@ -0,0 +1,4 @@
+ -m
+ TinyLLama-v0-5M-F16.gguf
+ --host
+ 0.0.0.0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.gguf filter=lfs diff=lfs merge=lfs -text
+ *.llamafile filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1 @@
+ *.log
.gitmodules ADDED
@@ -0,0 +1,6 @@
+ [submodule "maykeye_tinyllama"]
+ path = maykeye_tinyllama
+ url = https://huggingface.co/Maykeye/TinyLLama-v0
+ [submodule "llama.cpp"]
+ path = llama.cpp
+ url = git@github.com:mofosyne/llama.cpp.git
README.md CHANGED
@@ -1,3 +1,84 @@
  ---
+ base_model: Maykeye/TinyLLama-v0
+ language:
+ - en
  license: apache-2.0
+ tags:
+ - llamafile
+ - model-conversion
+ - text-generation
+ - gguf
  ---
+
+ # TinyLLama-v0 - llamafile
+ - Model creator: [Maykeye](https://huggingface.co/Maykeye)
+ - Original model: [TinyLLama-v0](https://huggingface.co/Maykeye/TinyLLama-v0)
+
+ ## Description
+
+ This repo contains llamafile format model files for [Maykeye/TinyLLama-v0](https://huggingface.co/Maykeye/TinyLLama-v0), a recreation of [roneneldan/TinyStories-1M](https://huggingface.co/roneneldan/TinyStories-1M), which was part of the research paper [TinyStories: How Small Can Language Models Be and Still Speak Coherent English?](https://arxiv.org/abs/2305.07759) by Ronen Eldan and Yuanzhi Li.
+
+ The paper's abstract reads:
+
+ > Language models (LMs) are powerful tools for natural language processing, but they often struggle to produce coherent and fluent text when they are small. Models with around 125M parameters such as GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and consistent English text beyond a few words even after extensive training. This raises the question of whether the emergence of the ability to produce coherent English text only occurs at larger scales (with hundreds of millions of parameters or more) and complex architectures (with many layers of global attention).
+
+ > In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters), or have much simpler architectures (with only one transformer block), yet still produce fluent and consistent stories with several paragraphs that are diverse and have almost perfect grammar, and demonstrate reasoning capabilities.
+
+ > We also introduce a new paradigm for the evaluation of language models: We suggest a framework which uses GPT-4 to grade the content generated by these models as if those were stories written by students and graded by a (human) teacher. This new paradigm overcomes the flaws of standard benchmarks which often requires the model's output to be very structures, and moreover provides a multidimensional score for the model, providing scores for different capabilities such as grammar, creativity and consistency.
+
+ > We hope that TinyStories can facilitate the development, analysis and research of LMs, especially for low-resource or specialized domains, and shed light on the emergence of language capabilities in LMs.
+
+ Maykeye's replication effort did not get down to 1M parameters, but it did reach 5M parameters, which is still quite an achievement (as far as known replication efforts have shown).
+
+ This conversion to llamafile should give you an easy way to try this model, and the llamafile ecosystem in general, as it is quite small compared to larger chat-capable models. The tradeoff is that this is a text generation model: although llamafile will open a web server, the model will not chat with you as you might expect. Instead, you give it a story prompt and it generates a story for you. Don't expect great stories at this size, but it is an interesting demo of how small an AI model can be squeezed while still generating recognisable English.
+
+ ## Usage In Linux
+
+ ```bash
+ # Make the file executable if it is not already
+ chmod +x TinyLLama-v0-5M-F16.llamafile
+
+ # To start the llamafile in web server mode just call this directly
+ ./TinyLLama-v0-5M-F16.llamafile
+
+ # To start the llamafile in command line use this command
+ # (Seems like there is an issue with omitting the -m so for now putting it in. Issue ticket: https://github.com/Mozilla-Ocho/llamafile/issues/189)
+ ./TinyLLama-v0-5M-F16.llamafile --cli -m TinyLLama-v0-5M-F16.gguf -p "A dog and a cat"
+ ```
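Since this is a text generation model rather than a chat model, one way to try it in web server mode is to send a story prompt directly. The following is a rough sketch, not a documented part of this repo: it assumes the bundled server exposes the llama.cpp-style `/completion` endpoint with `prompt` and `n_predict` fields, and that it listens on `http://localhost:8080`. The function names and the `LLAMAFILE_URL` variable are hypothetical helpers introduced here for illustration.

```bash
#!/bin/sh
# Sketch only: endpoint, field names and port are assumptions based on the
# llama.cpp server example, not something this repository confirms.

# Build the JSON request body for a story prompt.
# $1 = story prompt, $2 = max tokens to generate (defaults to 64)
build_completion_body() {
  # Escape backslashes and double quotes so the prompt stays valid JSON.
  # (Newlines in the prompt are not handled by this simple escaping.)
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"prompt": "%s", "n_predict": %d}\n' "$escaped" "${2:-64}"
}

# Send the prompt to a running llamafile server (requires curl).
# LLAMAFILE_URL is a hypothetical override for the assumed default address.
request_story() {
  curl -s -H 'Content-Type: application/json' \
    -d "$(build_completion_body "$1" "$2")" \
    "${LLAMAFILE_URL:-http://localhost:8080}/completion"
}

# Example, with the llamafile already running in another terminal:
#   request_story "A dog and a cat" 48
```

Keeping the body builder separate from the network call makes it easy to inspect the exact JSON before sending it.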
+
+ ## About llamafile
+
+ llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023. It uses Cosmopolitan Libc to turn LLM weights into runnable llama.cpp binaries that run on the stock installs of six OSes for both ARM64 and AMD64.
+
+ ## Replication Steps Assumptions
+
+ * You have already installed llamafile at `/usr/local/bin/llamafile`
+ * You have already pulled in all the submodules, including Maykeye's model in safetensors format
+ * Your git has LFS configured correctly; otherwise you hit this issue https://github.com/ggerganov/llama.cpp/issues/1994 where the `.safetensors` file doesn't download properly (and only a small pointer file is downloaded)
+ * You are using a llama.cpp repo that has some extra changes to convert.py to support metadata import (for now it's pointed to my repo)
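The LFS assumption above can be checked before running the conversion. The following is a minimal sketch: Git LFS pointer files are small text files whose first line begins with `version https://git-lfs.github.com/spec/v1`, so finding that line means only the pointer was downloaded. The function name `is_lfs_pointer` and the file name `model.safetensors` in the usage comment are assumptions for illustration; check the actual file name in the submodule folder.

```bash
#!/bin/sh
# Return 0 (true) if the given file looks like a Git LFS pointer
# instead of the real model weights.
is_lfs_pointer() {
  # Only the start of the file matters; real weights are binary data and
  # do not begin with the LFS spec line.
  head -c 200 "$1" 2>/dev/null |
    grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Example (hypothetical file name):
#   if is_lfs_pointer maykeye_tinyllama/model.safetensors; then
#     echo "Only an LFS pointer was downloaded; run 'git lfs pull' in the submodule" >&2
#   fi
```

If the check reports a pointer, fetching the real file with Git LFS inside the submodule should resolve it before `convert.py` is run.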
+
+ ## Replication Steps
+
+ ```bash
+ # Pull both the model folder and llama.cpp (for the conversion script)
+ git submodule update --init
+
+ # Convert from safetensors to gguf
+ # (Assuming llama.cpp is checked out as a submodule in this folder)
+ ./llama.cpp/convert.py maykeye_tinyllama --metadata maykeye_tinyllama-metadata.json
+
+ # Copy the generated gguf to this folder
+ cp maykeye_tinyllama/TinyLLama-v0-5M-F16.gguf TinyLLama-v0-5M-F16.gguf
+
+ # Get the llamafile engine
+ cp /usr/local/bin/llamafile TinyLLama-v0-5M-F16.llamafile
+
+ # Combine
+ zipalign -j0 \
+ TinyLLama-v0-5M-F16.llamafile \
+ TinyLLama-v0-5M-F16.gguf \
+ .args
+
+ # Test
+ ./TinyLLama-v0-5M-F16.llamafile
+ ```
llama.cpp ADDED
@@ -0,0 +1 @@
+ Subproject commit 8f4412980b41ccdc164ff220bfcd564f2a4a86cb
llamafile-creation.sh ADDED
@@ -0,0 +1,23 @@
+ #!/bin/sh
+
+ # Pull both the model folder and llama.cpp (for the conversion script)
+ git submodule update --init
+
+ # Convert from safetensors to gguf
+ # (Assuming llama.cpp is checked out as a submodule in this folder)
+ ./llama.cpp/convert.py maykeye_tinyllama --metadata maykeye_tinyllama-metadata.json
+
+ # Copy the generated gguf to this folder
+ cp maykeye_tinyllama/TinyLLama-v0-5M-F16.gguf TinyLLama-v0-5M-F16.gguf
+
+ # Get the llamafile engine
+ cp /usr/local/bin/llamafile TinyLLama-v0-5M-F16.llamafile
+
+ # Combine
+ zipalign -j0 \
+ TinyLLama-v0-5M-F16.llamafile \
+ TinyLLama-v0-5M-F16.gguf \
+ .args
+
+ # Test
+ ./TinyLLama-v0-5M-F16.llamafile
maykeye_tinyllama ADDED
@@ -0,0 +1 @@
+ Subproject commit 8c7ff07ec91bbe08ba42634a8611deb028a77896
maykeye_tinyllama-metadata.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "general.name": "TinyLLama",
+ "general.version": "v0",
+ "general.author": "mofosyne",
+ "general.url": "https://huggingface.co/mofosyne/TinyLLama-v0-llamafile",
+ "general.description": "This gguf is ported from a first version of Maykeye attempt at recreating roneneldan/TinyStories-1M but using Llama architecture",
+ "general.license": "apache-2.0",
+ "general.source_url": "https://huggingface.co/Maykeye/TinyLLama-v0",
+ "general.source_hf_repo": "https://huggingface.co/Maykeye/TinyLLama-v0"
+ }