Unable to load in ollama built from PR branch
I built ollama with the llama.cpp submodule from the branch in the PR, but whenever I try to load the IQ4_KM GGUF file I get the "invalid file magic" error. Does what I have here look right for ollama on Windows 11?
PS C:\Users\Justin\Workspace\ollama> cd .\llm\llama.cpp\
PS C:\Users\Justin\Workspace\ollama\llm\llama.cpp> git status
HEAD detached at ea1aeba4
nothing to commit, working tree clean
PS C:\Users\Justin\Workspace\ollama\llm\llama.cpp> cd ..
PS C:\Users\Justin\Workspace\ollama\llm> cd ..
PS C:\Users\Justin\Workspace\ollama> $env:CGO_ENABLED="1"
PS C:\Users\Justin\Workspace\ollama> go generate ./...
Submodule path '../llama.cpp': checked out 'ea1aeba48b42f5f6ec41412ef271f5d708b0fa2f'
Updated 0 paths from the index
Checking for MinGW...
CommandType Name Version Source
----------- ---- ------- ------
Application gcc.exe 0.0.0.0 C:\ProgramData\mingw64\mingw64\bin\gcc.exe
Application mingw32-make.exe 0.0.0.0 C:\ProgramData\mingw64\mingw64\bin\mingw32-make.exe
Building static library
...
Generating build details from Git
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.44.0.windows.1")
build_info.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\common\build_info.dir\Release\build_info.lib
ggml.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\ggml.dir\Release\ggml.lib
llama.cpp
C:\Users\Justin\Workspace\ollama\llm\llama.cpp\llama.cpp(14181,9): warning C4297: 'llama_load_model_from_file': function assumed not to throw an exception but does [C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\llama.vcxproj]
C:\Users\Justin\Workspace\ollama\llm\llama.cpp\llama.cpp(14181,9): __declspec(nothrow), throw(), noexcept(true), or noexcept was specified on the function
Auto build dll exports
Creating library C:/Users/Justin/Workspace/ollama/llm/build/windows/amd64/cuda_v12.3/Release/llama.lib and object C:/Users/Justin/Workspace/ollama/llm/build/windows/amd64/cuda_v12.3/Release/llama.exp
llama.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\bin\Release\llama.dll
llava.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\examples\llava\llava.dir\Release\llava.lib
common.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\common\Release\common.lib
Creating library C:/Users/Justin/Workspace/ollama/llm/build/windows/amd64/cuda_v12.3/ext_server/Release/ollama_llama_server.lib and object C:/Users/Justin/Workspace/ollama/llm/build/windows/amd64/cuda_v12.3/ext_server/Release/ollama_llama_server.exp
ollama_llama_server.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\bin\Release\ollama_llama_server.exe
gzip not installed, not compressing files
Updated 1 path from the index
Updated 1 path from the index
go generate completed. LLM runners: cpu cpu_avx cpu_avx2 cuda_v12.3 cuda_v12.4
PS C:\Users\Justin\Workspace\ollama> go build .
PS C:\Users\Justin\Workspace\ollama> .\ollama.exe serve
time=2024-04-08T19:24:05.373-05:00 level=INFO source=images.go:793 msg="total blobs: 53"
time=2024-04-08T19:24:06.287-05:00 level=INFO source=images.go:800 msg="total unused blobs removed: 2"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.EmbeddingsHandler (5 handlers)
[GIN-debug] POST /api/create --> github.com/ollama/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST /api/push --> github.com/ollama/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST /api/show --> github.com/ollama/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.ChatHandler (6 handlers)
[GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2024-04-08T19:24:06.288-05:00 level=INFO source=routes.go:1121 msg="Listening on [::]:11434 (version 0.0.0)"
time=2024-04-08T19:24:06.298-05:00 level=INFO source=payload.go:28 msg="extracting embedded files" dir=C:\Users\Justin\AppData\Local\Temp\ollama4257825441\runners
time=2024-04-08T19:24:06.329-05:00 level=INFO source=payload.go:41 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v12.3]"
PS C:\Users\Justin\Workspace\ollama> .\ollama.exe create commandrplus -f C:\Users\Justin\Downloads\ModelFile
transferring model data
creating model layer
Error: invalid file magic
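As a sanity check on the file itself: a valid GGUF starts with the ASCII magic "GGUF" in its first four bytes, and anything else there (a truncated or mis-assembled download, for example) will trip this error during create. A minimal PowerShell sketch to check it, assuming the placeholder path below is replaced with the actual .gguf the Modelfile points at:

```powershell
# Placeholder path - point this at the .gguf referenced by the Modelfile's FROM line
$path = "C:\Users\Justin\Downloads\model.gguf"

$fs  = [System.IO.File]::OpenRead($path)   # stream the file so the (very large) model isn't read into memory
$buf = New-Object byte[] 4
[void]$fs.Read($buf, 0, 4)
$fs.Close()

# A valid GGUF prints "GGUF" here; anything else explains the "invalid file magic"
[System.Text.Encoding]::ASCII.GetString($buf)
```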
@gigq Not sure where the problem is, but you should take a look at this -> https://www.reddit.com/r/LocalLLaMA/comments/1bymeyw/command_r_plus_104b_working_with_ollama_using/
Thanks. Oddly, with that build of ollama and llama.cpp I'm able to load his model from https://ollama.com/sammcj/cohereforai_c4ai-command-r-plus but not yours. I was actually trying to load yours because, while the sammcj one loads, its output is a little off: small prompts seem fine, but as soon as I give it any real context I start getting repeating output from the LLM, or even just garbage word-list output. I just wanted to try another model to see whether the issue was the model or my build.
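For what it's worth, a repeatable way to compare the two quants on the same longer prompt is to call the /api/generate route (shown in the server log above) directly. A sketch only; the model name and prompt text are placeholders:

```powershell
# Placeholder model name and prompt; stream = $false returns a single JSON object instead of a stream
$body = @{
    model  = "sammcj/cohereforai_c4ai-command-r-plus"
    prompt = "Paste a long, real-context prompt here to reproduce the repetition"
    stream = $false
} | ConvertTo-Json

Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -ContentType "application/json" -Body $body |
    Select-Object -ExpandProperty response
```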
@gigq You might want to report this to sammcj. I see in the link you provided that he used my imatrix on his FP16 weights, so maybe the issue is there? I always use an imatrix with the same weights it was computed on, but I'm not sure whether that's a hard requirement.
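For context, this is roughly the flow with the llama.cpp tools: the importance matrix is computed from one set of weights and then passed to the quantizer. A sketch only; the file names, the calibration text, and the output quant type are placeholders, and the exact binary names depend on the build:

```powershell
# Compute the importance matrix from the same FP16 weights that will be quantized
.\imatrix.exe -m model-f16.gguf -f calibration.txt -o imatrix.dat

# Quantize those weights using that imatrix (IQ4_XS here is just an example type)
.\quantize.exe --imatrix imatrix.dat model-f16.gguf model-iq4_xs.gguf IQ4_XS
```

The open question in the thread is whether an imatrix computed against one FP16 checkpoint transfers cleanly to a different FP16 checkpoint of the same model.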