TGI error sliding window null
#1
by
sd3ntato
- opened
$ docker run \
> --gpus all \
> --shm-size 1g \
> -v ~/models:/data \
> ghcr.io/huggingface/text-generation-inference:1.3.0 \
> --model-id TheBloke/mixtral-8x7b-v0.1-AWQ --port 3000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
2024-01-11T11:11:54.309311Z INFO text_generation_launcher: Args { model_id: "TheBloke/mixtral-8x7b-v0.1-AWQ", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Awq), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 3696, max_total_tokens: 4096, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "b1a6fd99478a", port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-01-11T11:11:54.309424Z INFO download: text_generation_launcher: Starting download process.
2024-01-11T11:11:57.699491Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-01-11T11:11:58.313113Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-01-11T11:11:58.313354Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-01-11T11:12:02.347372Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 89, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 215, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 161, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 310, in get_model
return FlashMixtral(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mixtral.py", line 21, in __init__
super(FlashMixtral, self).__init__(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 318, in __init__
SLIDING_WINDOW_BLOCKS = math.ceil(config.sliding_window / BLOCK_SIZE)
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'