Christoph Holthaus committed
Commit b8c846d • 1 Parent(s): d65f135

switch over to gradio "native"

Files changed (3):
  1. README.md +4 -8
  2. gradio_app.py → app.py +2 -2
  3. requirements.txt +2 -1
README.md CHANGED
@@ -1,11 +1,11 @@
 ---
 title: Test
 emoji: 🔥
-colorFrom: indigo
+colorFrom: red
 colorTo: yellow
-sdk: docker
+sdk: gradio
 pinned: false
-license: apache-2.0
+license: mit
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
@@ -14,17 +14,13 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 This is a test ...
 
 TASKS:
-- for fast debug: add a debug mode that lets me run CLI commands directly? -> never for prod!
-- prod-harden docker with proper users etc., OR mention this is only a dev build and intended for messing with, no read-only filesystem etc.
 - rewrite generation from scratch or reuse the mistral space's implementation if possible; alternatively use https://github.com/abetlen/llama-cpp-python#chat-completion or https://huggingface.co/spaces/deepseek-ai/deepseek-coder-7b-instruct/blob/main/app.py
 - write IN LARGE LETTERS that this is not the original model but a quantized one that can run on free CPU inference
 - test multimodal with llama?
-- can I use swap in docker to maximize usable memory?
 - proper token handling - make it a real chat (if not handled automatically by the chat-completion interface ...)
-- maybe run a local webserver and have gradio use it only as a backend? (better for async but maybe worse to control - just an idea)
 - check how much parallel generation is possible, or whether there is only one queue, and configure accordingly
 - move the model to download into an env var with proper error handling
-- chore: clean up .gitignore, dockerfile etc.
+- chore: clean up .gitignore etc.
 - update all deps to one up-to-date version, then PIN them!
 - write a short guide on how to clone and run custom 7b models in separate spaces
 - make a PR for popular repos to include this in their readme etc.
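
For the chat-related tasks in the list above (llama-cpp-python chat completion and "make it a real chat" token handling), a minimal sketch could look like the following. This is a hypothetical illustration, not code from this commit: gr.ChatInterface, the n_ctx/max_tokens values, and the system prompt are all assumptions here, and it presumes ./model.bin already exists locally.

# Hypothetical sketch for the chat tasks - not part of this commit.
from llama_cpp import Llama
import gradio as gr

llm = Llama(model_path="./model.bin", n_ctx=2048)  # assumed context size

def respond(message, history):
    # Replay prior turns so the model sees the whole conversation,
    # instead of treating every message as a fresh prompt.
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for user_turn, bot_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": bot_turn})
    messages.append({"role": "user", "content": message})

    # Chat-completion API from the llama-cpp-python link in the TASKS list.
    result = llm.create_chat_completion(messages=messages, max_tokens=256)
    return result["choices"][0]["message"]["content"]

demo = gr.ChatInterface(respond)
demo.queue().launch()  # queue() serializes requests, which also touches the parallel-generation question
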
gradio_app.py → app.py RENAMED
@@ -5,8 +5,8 @@ import gradio as gr
 import psutil
 
 # Initing things
-print("! INITING LLAMA MODEL !")
-llm = Llama(model_path="./model.bin") # LLaMa model
+print("debug: init model")
+llm = Llama(model_path="./model.bin") # LLaMa model
 llama_model_name = "TheBloke/dolphin-2.2.1-AshhLimaRP-Mistral-7B-GGUF"
 print("! INITING DONE !")
 
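
The "move the model to download into an env var" task from the README could start from a sketch like this one. It is an assumption-heavy illustration: the MODEL_REPO/MODEL_FILE variable names and the huggingface_hub download path are invented here, while the actual app.py in this commit simply hardcodes ./model.bin as shown above.

# Hypothetical sketch for env-var-driven model download - not from this commit.
import os
import sys

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

repo_id = os.environ.get("MODEL_REPO", "TheBloke/dolphin-2.2.1-AshhLimaRP-Mistral-7B-GGUF")
filename = os.environ.get("MODEL_FILE")  # e.g. a .gguf file name from that repo

if filename is None:
    sys.exit("MODEL_FILE env var is required")

try:
    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
except Exception as err:  # covers bad repo/file names and network failures
    sys.exit(f"could not download {filename} from {repo_id}: {err}")

llm = Llama(model_path=model_path)
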
requirements.txt CHANGED
@@ -1,2 +1,3 @@
 psutil
-gradio
+gradio
+llama_cpp
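
On the "update all deps ... then PIN them!" task: the llama_cpp module imported in app.py is distributed on PyPI as llama-cpp-python, so a pinned requirements.txt would likely name that package instead. A sketch with purely illustrative version numbers, not verified against this Space:

psutil==5.9.6
gradio==4.7.1
llama-cpp-python==0.2.20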