Christoph Holthaus committed
Commit b8c846d • 1 Parent(s): d65f135

switch over to gradio "native"

Files changed (3):
  1. README.md +4 -8
  2. gradio_app.py → app.py +2 -2
  3. requirements.txt +2 -1
README.md CHANGED
@@ -1,11 +1,11 @@
 ---
 title: Test
 emoji: 🔥
-colorFrom: indigo
+colorFrom: red
 colorTo: yellow
-sdk: docker
+sdk: gradio
 pinned: false
-license: apache-2.0
+license: mit
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
@@ -14,17 +14,13 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 This is a test ...
 
 TASKS:
-- for fast debug: add a debug mode that lets me run CLI commands directly? -> never for prod!
-- prod-harden docker with proper users etc., OR mention this is only a dev build and intended for messing with, no read-only filesystem etc.
 - rewrite generation from scratch or reuse the mistral space's implementation if possible; alternatively use https://github.com/abetlen/llama-cpp-python#chat-completion or https://huggingface.co/spaces/deepseek-ai/deepseek-coder-7b-instruct/blob/main/app.py
 - write IN LARGE LETTERS that this is not the original model but a quantized one that can run on free CPU inference
 - test multimodal with llama?
-- can I use swap in docker to maximize usable memory?
 - proper token handling - make it a real chat (if not handled automatically by the chat-completion interface ...)
-- maybe run a local webserver and have gradio use it only as a backend? (better for async but maybe worse to control - just an idea)
 - check how much parallel generation is possible, or whether there is only one queue, and configure accordingly
 - move the model to download into an env var with proper error handling
-- chore: clean up .gitignore, dockerfile etc.
+- chore: clean up .gitignore etc.
 - update all deps to one up-to-date version, then PIN them!
 - write a short guide on how to clone and run custom 7b models in separate spaces
 - make a PR for popular repos to include this in their readme etc.
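
For the chat-related tasks in the list above (llama-cpp-python chat completion and "make it a real chat" token handling), a minimal sketch could look like the following. This is a hypothetical illustration, not code from this commit: gr.ChatInterface, the n_ctx/max_tokens values, and the system prompt are all assumptions here, and it presumes ./model.bin already exists locally.

# Hypothetical sketch for the chat tasks - not part of this commit.
from llama_cpp import Llama
import gradio as gr

llm = Llama(model_path="./model.bin", n_ctx=2048)  # assumed context size

def respond(message, history):
    # Replay prior turns so the model sees the whole conversation,
    # instead of treating every message as a fresh prompt.
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for user_turn, bot_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": bot_turn})
    messages.append({"role": "user", "content": message})

    # Chat-completion API from the llama-cpp-python link in the TASKS list.
    result = llm.create_chat_completion(messages=messages, max_tokens=256)
    return result["choices"][0]["message"]["content"]

demo = gr.ChatInterface(respond)
demo.queue().launch()  # queue() serializes requests, which also touches the parallel-generation question
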
gradio_app.py → app.py RENAMED
@@ -5,8 +5,8 @@ import gradio as gr
 import psutil
 
 # Initing things
-print("! INITING LLAMA MODEL !")
-llm = Llama(model_path="./model.bin") # LLaMa model
+print("debug: init model")
+llm = Llama(model_path="./model.bin") # LLaMa model
 llama_model_name = "TheBloke/dolphin-2.2.1-AshhLimaRP-Mistral-7B-GGUF"
 print("! INITING DONE !")
 
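
The "move the model to download into an env var" task from the README could start from a sketch like this one. It is an assumption-heavy illustration: the MODEL_REPO/MODEL_FILE variable names and the huggingface_hub download path are invented here, while the actual app.py in this commit simply hardcodes ./model.bin as shown above.

# Hypothetical sketch for env-var-driven model download - not from this commit.
import os
import sys

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

repo_id = os.environ.get("MODEL_REPO", "TheBloke/dolphin-2.2.1-AshhLimaRP-Mistral-7B-GGUF")
filename = os.environ.get("MODEL_FILE")  # e.g. a .gguf file name from that repo

if filename is None:
    sys.exit("MODEL_FILE env var is required")

try:
    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
except Exception as err:  # covers bad repo/file names and network failures
    sys.exit(f"could not download {filename} from {repo_id}: {err}")

llm = Llama(model_path=model_path)
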
requirements.txt CHANGED
@@ -1,2 +1,3 @@
 psutil
-gradio
+gradio
+llama_cpp
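
On the "update all deps ... then PIN them!" task: the llama_cpp module imported in app.py is distributed on PyPI as llama-cpp-python, so a pinned requirements.txt would likely name that package instead. A sketch with purely illustrative version numbers, not verified against this Space:

psutil==5.9.6
gradio==4.7.1
llama-cpp-python==0.2.20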