Hosting using TGI: "ValueError: Unsupported model type phi3"
I tried to host Phi-3 using the TGI server, but it fails with the error below. It looks like support for hosting Phi-3 with TGI is not yet available. Any leads?
I am using this TGI server image: us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-hf-tgi-serve:20240220_0936_RC01
I am trying to host it in a Kubernetes cluster on Google Cloud.
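For context, here is a simplified single-node equivalent of what my Kubernetes Deployment runs (a sketch: the model id is a placeholder for the Phi-3 checkpoint I'm serving, and I'm assuming the image exposes the standard text-generation-launcher flags):
# Simplified docker equivalent of the pod spec (model id is a placeholder)
docker run --gpus all --shm-size 1g -p 8080:80 \
  us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-hf-tgi-serve:20240220_0936_RC01 \
  --model-id microsoft/Phi-3-mini-4k-instruct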
Error:
2024-04-25T20:37:04.979957Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-25T20:37:04.981095Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-25T20:37:15.030108Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-25T20:37:25.039480Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-25T20:37:35.049359Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-25T20:37:37.784256Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in call
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 89, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 235, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 196, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/init.py", line 475, in get_model
raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type phi3
2024-04-25T20:37:38.656203Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 89, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 235, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 196, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/init.py", line 475, in get_model
raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type phi3
rank=0
2024-04-25T20:37:38.717605Z ERROR text_generation_launcher: Shard 0 failed to start
2024-04-25T20:37:38.717631Z INFO text_generation_launcher: Shutting down shards
Looking at the commit history, it doesn't appear that TGI supported Phi-3 until a couple of days ago (when Phi-3 was released). TGI hasn't cut another release since then, so you'd need to grab an automated build, e.g.: docker pull ghcr.io/huggingface/text-generation-inference:sha-ee47973
Note: Phi-3-mini-4k-instruct on TGI works fine for me. Phi-3-mini-128k-instruct has some interesting rope factors, and I haven't gotten it to work.
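For reference, roughly how I run that build (a minimal sketch; port, volume, and model id are just my setup):
# Pull the automated build with Phi-3 support and serve the 4k checkpoint
docker pull ghcr.io/huggingface/text-generation-inference:sha-ee47973
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:sha-ee47973 \
  --model-id microsoft/Phi-3-mini-4k-instruct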
You can pass --trust-remote-code when initializing the TGI container. By default, implementations fall back to transformers if a model is not supported in TGI.
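For example, appended to a standard docker run (a sketch; the flag is forwarded to text-generation-launcher):
# Trust remote code so unsupported architectures can fall back to transformers
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:sha-ee47973 \
  --model-id microsoft/Phi-3-mini-4k-instruct \
  --trust-remote-code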
Dear @gugarosa,
I am facing the same rope factor issue when starting the TGI container for Phi-3-mini-128k-instruct; surprisingly, no issues spotted for Phi-3-mini-4k-instruct, as mentioned by @writerflether. Here is the error log:
tgi-container-1 | 2024-05-06T13:15:53.544574Z INFO download: text_generation_launcher: Successfully downloaded weights.
tgi-container-1 | 2024-05-06T13:15:53.544838Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
tgi-container-1 | 2024-05-06T13:15:55.366489Z ERROR text_generation_launcher: Error when initializing model
tgi-container-1 | Traceback (most recent call last):
tgi-container-1 | File "/opt/conda/bin/text-generation-server", line 8, in <module>
tgi-container-1 | sys.exit(app())
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
tgi-container-1 | return get_command(self)(*args, **kwargs)
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
tgi-container-1 | return self.main(*args, **kwargs)
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
tgi-container-1 | return _main(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
tgi-container-1 | rv = self.invoke(ctx)
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
tgi-container-1 | return _process_result(sub_ctx.command.invoke(sub_ctx))
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
tgi-container-1 | return ctx.invoke(self.callback, **ctx.params)
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
tgi-container-1 | return __callback(*args, **kwargs)
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
tgi-container-1 | return callback(**use_params) # type: ignore
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
tgi-container-1 | server.serve(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 253, in serve
tgi-container-1 | asyncio.run(
tgi-container-1 | File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
tgi-container-1 | return loop.run_until_complete(main)
tgi-container-1 | File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
tgi-container-1 | self.run_forever()
tgi-container-1 | File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
tgi-container-1 | self._run_once()
tgi-container-1 | File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
tgi-container-1 | handle._run()
tgi-container-1 | File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
tgi-container-1 | self._context.run(self._callback, *self._args)
tgi-container-1 | > File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 217, in serve_inner
tgi-container-1 | model = get_model(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 333, in get_model
tgi-container-1 | return FlashLlama(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 84, in __init__
tgi-container-1 | model = FlashLlamaForCausalLM(prefix, config, weights)
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 385, in __init__
tgi-container-1 | self.model = FlashLlamaModel(prefix, config, weights)
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 309, in __init__
tgi-container-1 | [
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 310, in <listcomp>
tgi-container-1 | FlashLlamaLayer(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 249, in __init__
tgi-container-1 | self.self_attn = FlashLlamaAttention(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 107, in __init__
tgi-container-1 | self.rotary_emb = PositionRotaryEmbedding.static(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/layers.py", line 1032, in static
tgi-container-1 | scaling_factor = rope_scaling["factor"]
tgi-container-1 | KeyError: 'factor'
tgi-container-1 |
tgi-container-1 | 2024-05-06T13:15:55.847905Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
tgi-container-1 |
tgi-container-1 | The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
tgi-container-1 | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
tgi-container-1 | /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:658: UserWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
tgi-container-1 | warnings.warn(
tgi-container-1 | Traceback (most recent call last):
tgi-container-1 | File "/opt/conda/bin/text-generation-server", line 8, in <module>
tgi-container-1 | sys.exit(app())
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
tgi-container-1 | server.serve(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 253, in serve
tgi-container-1 | asyncio.run(
tgi-container-1 | File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
tgi-container-1 | return loop.run_until_complete(main)
tgi-container-1 | File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
tgi-container-1 | return future.result()
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 217, in serve_inner
tgi-container-1 | model = get_model(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 333, in get_model
tgi-container-1 | return FlashLlama(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 84, in __init__
tgi-container-1 | model = FlashLlamaForCausalLM(prefix, config, weights)
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 385, in __init__
tgi-container-1 | self.model = FlashLlamaModel(prefix, config, weights)
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 309, in __init__
tgi-container-1 | [
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 310, in <listcomp>
tgi-container-1 | FlashLlamaLayer(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 249, in __init__
tgi-container-1 | self.self_attn = FlashLlamaAttention(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 107, in __init__
tgi-container-1 | self.rotary_emb = PositionRotaryEmbedding.static(
tgi-container-1 | File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/layers.py", line 1032, in static
tgi-container-1 | scaling_factor = rope_scaling["factor"]
tgi-container-1 | KeyError: 'factor'
tgi-container-1 | rank=0
tgi-container-1 | Error: ShardCannotStart
tgi-container-1 | 2024-05-06T13:15:55.947161Z ERROR text_generation_launcher: Shard 0 failed to start
tgi-container-1 | 2024-05-06T13:15:55.947187Z INFO text_generation_launcher: Shutting down shards
tgi-container-1 exited with code 1
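For what it's worth, the KeyError appears to come from the rope_scaling block in Phi-3-mini-128k-instruct's config.json: if I read the configs right, it carries long_factor/short_factor lists (the "su"/"longrope" scaling scheme) instead of the single "factor" key this TGI build reads, while the 4k checkpoint sets rope_scaling to null and never hits that code path. A quick way to confirm (a sketch; assumes python3 and network access to the Hub):
# Print the rope_scaling keys that TGI's PositionRotaryEmbedding.static trips over
curl -s https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/raw/main/config.json \
  | python3 -c "import json,sys; print(list(json.load(sys.stdin)['rope_scaling']))"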
Were you able to fix it? I am getting similar errors:
2024-05-22T02:27:07.993760Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-05-22T02:27:07.993970Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-05-22T02:27:11.749336Z WARN text_generation_launcher: Unable to use Flash Attention V2: GPU with CUDA capability 7 5 is not supported for Flash Attention V2
2024-05-22T02:27:12.082919Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 240, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 201, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 661, in get_model
raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type phi3_v
2024-05-22T02:27:12.697919Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 240, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 201, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 661, in get_model
raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type phi3_v
rank=0
2024-05-22T02:27:12.797090Z ERROR text_generation_launcher: Shard 0 failed to start
2024-05-22T02:27:12.797112Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
How I am running it:
token=token
model=microsoft/Phi-3-vision-128k-instruct
volume=$PWD/phi3/data
docker run --gpus all --shm-size 1g -p 8080:80 -e HUGGING_FACE_HUB_TOKEN=$token -v $volume:/data ghcr.io/huggingface/text-generation-inference:sha-ee47973 --model-id $model
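(For what it's worth, this repo's config declares model_type phi3_v, a vision-language architecture distinct from the text-only phi3 models, which is presumably why this build rejects it. A quick check, assuming python3 and Hub access:)
# Print the model_type TGI dispatches on; this repo reports phi3_v
curl -s https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/raw/main/config.json \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['model_type'])"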