/Users/cfruan/miniconda3/envs/mlc-chat-venv/bin/python -m mlc_llm gen_config /Users/Shared/models/Meta-Llama-3.1-70B-Instruct --quantization q0f16 --conv-template llama-3_1 --output local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
[2024-07-23 17:43:51] INFO auto_config.py:116: Found model configuration: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/config.json
[2024-07-23 17:43:51] INFO auto_config.py:154: Found model type: llama. Use `--model-type` to override.
[2024-07-23 17:43:51] INFO llama_model.py:62: context_window_size not found in config.json. Falling back to max_position_embeddings (131072)
[2024-07-23 17:43:51] INFO llama_model.py:82: prefill_chunk_size defaults to 2048
[2024-07-23 17:43:51] INFO config.py:107: Overriding max_batch_size from 1 to 80
[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting bos_token_id: 128000
[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting eos_token_id: [128001, 128008, 128009]
[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting temperature: 0.6
[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting top_p: 0.9
[2024-07-23 17:43:51] INFO gen_config.py:158: Not found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/tokenizer.model
[2024-07-23 17:43:51] INFO gen_config.py:156: Found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/tokenizer.json. Copying to local_dir/Llama-3.1-70B-Instruct-q0f16-MLC/tokenizer.json
[2024-07-23 17:43:51] INFO gen_config.py:158: Not found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/vocab.json
[2024-07-23 17:43:51] INFO gen_config.py:158: Not found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/merges.txt
[2024-07-23 17:43:51] INFO gen_config.py:158: Not found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/added_tokens.json
[2024-07-23 17:43:51] INFO gen_config.py:156: Found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/tokenizer_config.json. Copying to local_dir/Llama-3.1-70B-Instruct-q0f16-MLC/tokenizer_config.json
[2024-07-23 17:43:51] INFO gen_config.py:217: Detected tokenizer info: {'token_postproc_method': 'byte_level', 'prepend_space_in_encode': False, 'strip_space_in_decode': False}
[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting pad_token_id: 0
[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting presence_penalty: 0.0
[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting frequency_penalty: 0.0
[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting repetition_penalty: 1.0
[2024-07-23 17:43:51] INFO gen_config.py:245: Dumping configuration file to: local_dir/Llama-3.1-70B-Instruct-q0f16-MLC/mlc-chat-config.json

/Users/cfruan/miniconda3/envs/mlc-chat-venv/bin/python -m mlc_llm convert_weight /Users/Shared/models/Meta-Llama-3.1-70B-Instruct --quantization q0f16 --output local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
[2024-07-23 17:43:52] INFO auto_config.py:116: Found model configuration: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/config.json
[2024-07-23 17:43:52] INFO auto_device.py:88: Not found device: cuda:0
[2024-07-23 17:43:53] INFO auto_device.py:88: Not found device: rocm:0
[2024-07-23 17:43:54] INFO auto_device.py:79: Found device: metal:0
[2024-07-23 17:43:55] INFO auto_device.py:88: Not found device: vulkan:0
[2024-07-23 17:43:55] INFO auto_device.py:88: Not found device: opencl:0
[2024-07-23 17:43:55] INFO auto_device.py:35: Using device: metal:0
[2024-07-23 17:43:55] INFO auto_weight.py:71: Finding weights in: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct
[2024-07-23 17:43:55] INFO auto_weight.py:137: Not found Huggingface PyTorch
[2024-07-23 17:43:55] INFO auto_weight.py:144: Found source weight format: huggingface-safetensor. Source configuration: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json
[2024-07-23 17:43:55] INFO auto_weight.py:107: Using source weight configuration: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json. Use `--source` to override.
[2024-07-23 17:43:55] INFO auto_weight.py:111: Using source weight format: huggingface-safetensor. Use `--source-format` to override.
[2024-07-23 17:43:55] INFO auto_config.py:154: Found model type: llama. Use `--model-type` to override.
[2024-07-23 17:43:55] INFO llama_model.py:62: context_window_size not found in config.json. Falling back to max_position_embeddings (131072)
[2024-07-23 17:43:55] INFO llama_model.py:82: prefill_chunk_size defaults to 2048
Weight conversion with arguments:
  --config /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/config.json
  --quantization NoQuantize(name='q0f16', kind='no-quant', model_dtype='float16')
  --model-type llama
  --device metal:0
  --source /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json
  --source-format huggingface-safetensor
  --output local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
Start storing to cache local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
  0%| | 0/483 [00:00<?, ?it/s]
[2024-07-23 17:44:00] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00030-of-00030.safetensors
[2024-07-23 17:44:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "lm_head.weight", shape: (128256, 8192), dtype: float16
[2024-07-23 17:44:09] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00030-of-00030.safetensors
[2024-07-23 17:44:09] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00001-of-00030.safetensors
[2024-07-23 17:44:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.embed_tokens.weight", shape: (128256, 8192), dtype: float16
[2024-07-23 17:44:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:44:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:44:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:26] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00002-of-00030.safetensors
[2024-07-23 17:44:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:44:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:44:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:44:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:44:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:44:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:44:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:50] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00001-of-00030.safetensors
[2024-07-23 17:44:50] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00002-of-00030.safetensors
[2024-07-23 17:44:51] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00005-of-00030.safetensors
[2024-07-23 17:44:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:44:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:09] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00005-of-00030.safetensors
[2024-07-23 17:45:09] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00006-of-00030.safetensors
[2024-07-23 17:45:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:25] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00007-of-00030.safetensors
[2024-07-23 17:45:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:46] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00006-of-00030.safetensors
[2024-07-23 17:45:46] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00007-of-00030.safetensors
[2024-07-23 17:45:46] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00008-of-00030.safetensors
[2024-07-23 17:45:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:46:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:46:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:46:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:46:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:46:06] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00008-of-00030.safetensors
[2024-07-23 17:46:06] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00009-of-00030.safetensors
|
20%|ββ | 95/483 [02:05<04:45, 1.36it/s]
[2024-07-23 17:46:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.21.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
20%|ββ | 95/483 [02:08<04:45, 1.36it/s]
20%|ββ | 96/483 [02:08<08:04, 1.25s/it]
[2024-07-23 17:46:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.21.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
20%|ββ | 96/483 [02:08<08:04, 1.25s/it]
20%|ββ | 97/483 [02:09<08:38, 1.34s/it]
[2024-07-23 17:46:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.21.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
20%|ββ | 97/483 [02:11<08:38, 1.34s/it]
20%|ββ | 98/483 [02:13<13:13, 2.06s/it]
[2024-07-23 17:46:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.21.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
20%|ββ | 98/483 [02:13<13:13, 2.06s/it]
20%|ββ | 99/483 [02:13<09:29, 1.48s/it]
[2024-07-23 17:46:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.21.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
20%|ββ | 99/483 [02:13<09:29, 1.48s/it]
21%|ββ | 100/483 [02:14<07:24, 1.16s/it]
[2024-07-23 17:46:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
21%|ββ | 100/483 [02:14<07:24, 1.16s/it]
[2024-07-23 17:46:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
21%|ββ | 100/483 [02:14<07:24, 1.16s/it]
21%|ββ | 102/483 [02:15<06:09, 1.03it/s]
[2024-07-23 17:46:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
21%|ββ | 102/483 [02:17<06:09, 1.03it/s]
21%|βββ | 103/483 [02:19<10:12, 1.61s/it]
[2024-07-23 17:46:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
21%|βββ | 103/483 [02:19<10:12, 1.61s/it]
22%|βββ | 104/483 [02:19<07:42, 1.22s/it]
[2024-07-23 17:46:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
22%|βββ | 104/483 [02:19<07:42, 1.22s/it]
22%|βββ | 105/483 [02:19<06:31, 1.03s/it]
[2024-07-23 17:46:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
22%|βββ | 105/483 [02:19<06:31, 1.03s/it]
22%|βββ | 106/483 [02:20<05:22, 1.17it/s]
[2024-07-23 17:46:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
22%|βββ | 106/483 [02:20<05:22, 1.17it/s]
[2024-07-23 17:46:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
22%|βββ | 106/483 [02:20<05:22, 1.17it/s]
22%|βββ | 108/483 [02:21<05:02, 1.24it/s]
[2024-07-23 17:46:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
22%|βββ | 108/483 [02:23<05:02, 1.24it/s]
23%|βββ | 109/483 [02:25<09:08, 1.47s/it]
[2024-07-23 17:46:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
23%|βββ | 109/483 [02:25<09:08, 1.47s/it]
23%|βββ | 110/483 [02:25<06:57, 1.12s/it]
[2024-07-23 17:46:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
23%|βββ | 110/483 [02:25<06:57, 1.12s/it]
23%|βββ | 111/483 [02:25<05:57, 1.04it/s]
[2024-07-23 17:46:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
23%|βββ | 111/483 [02:25<05:57, 1.04it/s]
23%|βββ | 112/483 [02:26<04:58, 1.24it/s]
[2024-07-23 17:46:26] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00009-of-00030.safetensors |
|
23%|βββ | 112/483 [02:26<04:58, 1.24it/s]
[2024-07-23 17:46:27] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00010-of-00030.safetensors |
|
23%|βββ | 112/483 [02:26<04:58, 1.24it/s]
[2024-07-23 17:46:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
23%|βββ | 112/483 [02:28<04:58, 1.24it/s]
23%|βββ | 113/483 [02:28<07:42, 1.25s/it]
[2024-07-23 17:46:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
23%|βββ | 113/483 [02:29<07:42, 1.25s/it]
24%|βββ | 114/483 [02:30<08:11, 1.33s/it]
[2024-07-23 17:46:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
24%|βββ | 114/483 [02:31<08:11, 1.33s/it]
24%|βββ | 115/483 [02:33<12:30, 2.04s/it]
[2024-07-23 17:46:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
24%|βββ | 115/483 [02:33<12:30, 2.04s/it]
24%|βββ | 116/483 [02:33<09:00, 1.47s/it]
[2024-07-23 17:46:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
24%|βββ | 116/483 [02:34<09:00, 1.47s/it]
24%|βββ | 117/483 [02:34<07:17, 1.19s/it]
[2024-07-23 17:46:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
24%|βββ | 117/483 [02:34<07:17, 1.19s/it]
24%|βββ | 118/483 [02:34<05:49, 1.04it/s]
[2024-07-23 17:46:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
24%|βββ | 118/483 [02:34<05:49, 1.04it/s]
[2024-07-23 17:46:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
24%|βββ | 118/483 [02:35<05:49, 1.04it/s]
25%|βββ | 120/483 [02:36<05:13, 1.16it/s]
[2024-07-23 17:46:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
25%|βββ | 120/483 [02:37<05:13, 1.16it/s]
25%|βββ | 121/483 [02:39<09:11, 1.52s/it]
[2024-07-23 17:46:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
25%|βββ | 121/483 [02:39<09:11, 1.52s/it]
25%|βββ | 122/483 [02:40<06:56, 1.15s/it]
[2024-07-23 17:46:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
25%|βββ | 122/483 [02:40<06:56, 1.15s/it]
25%|βββ | 123/483 [02:40<05:54, 1.02it/s]
[2024-07-23 17:46:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
25%|βββ | 123/483 [02:40<05:54, 1.02it/s]
26%|βββ | 124/483 [02:40<04:54, 1.22it/s]
[2024-07-23 17:46:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
26%|βββ | 124/483 [02:42<04:54, 1.22it/s]
26%|βββ | 125/483 [02:44<09:30, 1.59s/it]
[2024-07-23 17:46:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
26%|βββ | 125/483 [02:44<09:30, 1.59s/it]
26%|βββ | 126/483 [02:45<07:51, 1.32s/it]
[2024-07-23 17:46:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
26%|βββ | 126/483 [02:45<07:51, 1.32s/it]
26%|βββ | 127/483 [02:45<06:14, 1.05s/it]
[2024-07-23 17:46:46] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00010-of-00030.safetensors |
|
26%|βββ | 127/483 [02:45<06:14, 1.05s/it]
[2024-07-23 17:46:46] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00011-of-00030.safetensors |
|
26%|βββ | 127/483 [02:45<06:14, 1.05s/it]
[2024-07-23 17:46:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
26%|βββ | 127/483 [02:48<06:14, 1.05s/it]
27%|βββ | 128/483 [02:48<08:47, 1.48s/it]
[2024-07-23 17:46:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
27%|βββ | 128/483 [02:48<08:47, 1.48s/it]
27%|βββ | 129/483 [02:49<08:51, 1.50s/it]
[2024-07-23 17:46:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
27%|βββ | 129/483 [02:49<08:51, 1.50s/it]
[2024-07-23 17:46:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
27%|βββ | 129/483 [02:49<08:51, 1.50s/it]
[2024-07-23 17:46:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
27%|βββ | 129/483 [02:50<08:51, 1.50s/it]
27%|βββ | 132/483 [02:51<05:30, 1.06it/s]
[2024-07-23 17:46:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
27%|βββ | 132/483 [02:52<05:30, 1.06it/s]
28%|βββ | 133/483 [02:54<08:53, 1.53s/it]
[2024-07-23 17:46:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
28%|βββ | 133/483 [02:54<08:53, 1.53s/it]
28%|βββ | 134/483 [02:54<06:57, 1.20s/it]
[2024-07-23 17:46:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
28%|βββ | 134/483 [02:55<06:57, 1.20s/it]
28%|βββ | 135/483 [02:55<05:59, 1.03s/it]
[2024-07-23 17:46:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
28%|βββ | 135/483 [02:55<05:59, 1.03s/it]
28%|βββ | 136/483 [02:55<05:00, 1.15it/s]
[2024-07-23 17:46:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
28%|βββ | 136/483 [02:55<05:00, 1.15it/s]
[2024-07-23 17:46:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
28%|βββ | 136/483 [02:56<05:00, 1.15it/s]
29%|βββ | 138/483 [02:57<04:41, 1.23it/s]
[2024-07-23 17:46:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
29%|βββ | 138/483 [02:58<04:41, 1.23it/s]
29%|βββ | 139/483 [03:00<08:12, 1.43s/it]
[2024-07-23 17:47:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
29%|βββ | 139/483 [03:00<08:12, 1.43s/it]
29%|βββ | 140/483 [03:00<06:17, 1.10s/it]
[2024-07-23 17:47:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
29%|βββ | 140/483 [03:01<06:17, 1.10s/it]
29%|βββ | 141/483 [03:01<05:25, 1.05it/s]
[2024-07-23 17:47:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
29%|βββ | 141/483 [03:01<05:25, 1.05it/s]
29%|βββ | 142/483 [03:01<04:32, 1.25it/s]
[2024-07-23 17:47:02] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00012-of-00030.safetensors |
|
29%|βββ | 142/483 [03:01<04:32, 1.25it/s]
[2024-07-23 17:47:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
29%|βββ | 142/483 [03:07<04:32, 1.25it/s]
30%|βββ | 143/483 [03:08<14:34, 2.57s/it]
[2024-07-23 17:47:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
30%|βββ | 143/483 [03:09<14:34, 2.57s/it]
30%|βββ | 144/483 [03:09<11:23, 2.02s/it]
[2024-07-23 17:47:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
30%|βββ | 144/483 [03:09<11:23, 2.02s/it]
30%|βββ | 145/483 [03:09<08:42, 1.55s/it]
[2024-07-23 17:47:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
30%|βββ | 145/483 [03:09<08:42, 1.55s/it]
[2024-07-23 17:47:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
30%|βββ | 145/483 [03:10<08:42, 1.55s/it]
30%|βββ | 147/483 [03:11<06:33, 1.17s/it]
[2024-07-23 17:47:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
30%|βββ | 147/483 [03:11<06:33, 1.17s/it]
[2024-07-23 17:47:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
30%|βββ | 147/483 [03:11<06:33, 1.17s/it]
[2024-07-23 17:47:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
30%|βββ | 147/483 [03:11<06:33, 1.17s/it]
31%|βββ | 150/483 [03:12<04:37, 1.20it/s]
[2024-07-23 17:47:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
31%|βββ | 150/483 [03:14<04:37, 1.20it/s]
31%|ββββ | 151/483 [03:16<07:05, 1.28s/it]
[2024-07-23 17:47:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
31%|ββββ | 151/483 [03:16<07:05, 1.28s/it]
31%|ββββ | 152/483 [03:16<05:41, 1.03s/it]
[2024-07-23 17:47:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
31%|ββββ | 152/483 [03:16<05:41, 1.03s/it]
32%|ββββ | 153/483 [03:16<05:01, 1.09it/s]
[2024-07-23 17:47:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
32%|ββββ | 153/483 [03:16<05:01, 1.09it/s]
32%|ββββ | 154/483 [03:17<04:18, 1.27it/s]
[2024-07-23 17:47:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
32%|ββββ | 154/483 [03:17<04:18, 1.27it/s]
[2024-07-23 17:47:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
32%|ββββ | 154/483 [03:17<04:18, 1.27it/s]
32%|ββββ | 156/483 [03:18<04:08, 1.32it/s]
[2024-07-23 17:47:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
32%|ββββ | 156/483 [03:19<04:08, 1.32it/s]
33%|ββββ | 157/483 [03:21<07:08, 1.31s/it]
[2024-07-23 17:47:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
33%|ββββ | 157/483 [03:21<07:08, 1.31s/it]
33%|ββββ | 158/483 [03:21<05:29, 1.02s/it]
[2024-07-23 17:47:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
33%|ββββ | 158/483 [03:22<05:29, 1.02s/it]
33%|ββββ | 159/483 [03:22<04:48, 1.12it/s]
[2024-07-23 17:47:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
33%|ββββ | 159/483 [03:22<04:48, 1.12it/s]
33%|ββββ | 160/483 [03:22<04:04, 1.32it/s]
[2024-07-23 17:47:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
33%|ββββ | 160/483 [03:22<04:04, 1.32it/s]
33%|ββββ | 161/483 [03:23<03:42, 1.45it/s]
[2024-07-23 17:47:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
33%|ββββ | 161/483 [03:23<03:42, 1.45it/s]
34%|ββββ | 162/483 [03:23<03:15, 1.65it/s]
[2024-07-23 17:47:24] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00012-of-00030.safetensors |
|
34%|ββββ | 162/483 [03:23<03:15, 1.65it/s]
[2024-07-23 17:47:24] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00011-of-00030.safetensors |
|
34%|ββββ | 162/483 [03:23<03:15, 1.65it/s]
[2024-07-23 17:47:24] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00013-of-00030.safetensors |
|
34%|ββββ | 162/483 [03:23<03:15, 1.65it/s]
[2024-07-23 17:47:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
34%|ββββ | 162/483 [03:26<03:15, 1.65it/s]
34%|ββββ | 163/483 [03:26<06:03, 1.14s/it]
[2024-07-23 17:47:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
34%|ββββ | 163/483 [03:26<06:03, 1.14s/it]
34%|ββββ | 164/483 [03:27<06:30, 1.23s/it]
[2024-07-23 17:47:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
34%|ββββ | 164/483 [03:28<06:30, 1.23s/it]
34%|ββββ | 165/483 [03:30<09:36, 1.81s/it]
[2024-07-23 17:47:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
34%|ββββ | 165/483 [03:30<09:36, 1.81s/it]
34%|ββββ | 166/483 [03:30<06:54, 1.31s/it]
[2024-07-23 17:47:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
34%|ββββ | 166/483 [03:30<06:54, 1.31s/it]
[2024-07-23 17:47:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
34%|ββββ | 166/483 [03:31<06:54, 1.31s/it]
35%|ββββ | 168/483 [03:32<05:26, 1.04s/it]
[2024-07-23 17:47:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
35%|ββββ | 168/483 [03:33<05:26, 1.04s/it]
35%|ββββ | 169/483 [03:35<08:14, 1.57s/it]
[2024-07-23 17:47:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
35%|ββββ | 169/483 [03:35<08:14, 1.57s/it]
35%|ββββ | 170/483 [03:35<06:12, 1.19s/it]
[2024-07-23 17:47:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
35%|ββββ | 170/483 [03:35<06:12, 1.19s/it]
35%|ββββ | 171/483 [03:36<05:15, 1.01s/it]
[2024-07-23 17:47:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
35%|ββββ | 171/483 [03:36<05:15, 1.01s/it]
36%|ββββ | 172/483 [03:36<04:23, 1.18it/s]
[2024-07-23 17:47:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
36%|ββββ | 172/483 [03:36<04:23, 1.18it/s]
[2024-07-23 17:47:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
36%|ββββ | 172/483 [03:37<04:23, 1.18it/s]
36%|ββββ | 174/483 [03:38<04:04, 1.27it/s]
[2024-07-23 17:47:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
36%|ββββ | 174/483 [03:39<04:04, 1.27it/s]
36%|ββββ | 175/483 [03:41<07:03, 1.38s/it]
[2024-07-23 17:47:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
36%|ββββ | 175/483 [03:41<07:03, 1.38s/it]
36%|ββββ | 176/483 [03:41<05:22, 1.05s/it]
[2024-07-23 17:47:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
36%|ββββ | 176/483 [03:41<05:22, 1.05s/it]
37%|ββββ | 177/483 [03:41<04:39, 1.10it/s]
[2024-07-23 17:47:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
37%|ββββ | 177/483 [03:42<04:39, 1.10it/s]
37%|ββββ | 178/483 [03:42<03:57, 1.29it/s]
[2024-07-23 17:47:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
37%|ββββ | 178/483 [03:42<03:57, 1.29it/s]
37%|ββββ | 179/483 [03:42<03:36, 1.41it/s]
[2024-07-23 17:47:43] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00013-of-00030.safetensors |
|
37%|ββββ | 179/483 [03:42<03:36, 1.41it/s]
[2024-07-23 17:47:43] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00014-of-00030.safetensors |
|
37%|ββββ | 179/483 [03:43<03:36, 1.41it/s]
[2024-07-23 17:47:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
37%|ββββ | 179/483 [03:45<03:36, 1.41it/s]
37%|ββββ | 180/483 [03:45<06:41, 1.32s/it]
[2024-07-23 17:47:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
37%|ββββ | 180/483 [03:46<06:41, 1.32s/it]
37%|ββββ | 181/483 [03:47<06:51, 1.36s/it]
[2024-07-23 17:47:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
37%|ββββ | 181/483 [03:48<06:51, 1.36s/it]
38%|ββββ | 182/483 [03:50<09:37, 1.92s/it]
[2024-07-23 17:47:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
38%|ββββ | 182/483 [03:50<09:37, 1.92s/it]
38%|ββββ | 183/483 [03:50<06:54, 1.38s/it]
[2024-07-23 17:47:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
38%|ββββ | 183/483 [03:50<06:54, 1.38s/it]
38%|ββββ | 184/483 [03:50<05:25, 1.09s/it]
[2024-07-23 17:47:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
38%|ββββ | 184/483 [03:50<05:25, 1.09s/it]
[2024-07-23 17:47:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
38%|ββββ | 184/483 [03:51<05:25, 1.09s/it]
39%|ββββ | 186/483 [03:52<04:33, 1.09it/s]
[2024-07-23 17:47:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
39%|ββββ | 186/483 [03:53<04:33, 1.09it/s]
39%|ββββ | 187/483 [03:55<07:18, 1.48s/it]
[2024-07-23 17:47:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
39%|ββββ | 187/483 [03:55<07:18, 1.48s/it]
39%|ββββ | 188/483 [03:55<05:31, 1.12s/it]
[2024-07-23 17:47:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:47:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:47:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:47:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:48:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:48:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:48:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:48:02] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00014-of-00030.safetensors |
|
[2024-07-23 17:48:03] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00015-of-00030.safetensors |
|
[2024-07-23 17:48:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:48:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:48:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:48:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:48:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:48:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:48:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:48:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:48:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:48:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:48:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:48:20] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00015-of-00030.safetensors |
|
[2024-07-23 17:48:20] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00003-of-00030.safetensors |
|
[2024-07-23 17:48:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.4.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.4.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:48:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.4.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:48:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.4.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:48:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:48:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:48:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:48:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:48:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:48:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:48:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:48:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:48:40] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00003-of-00030.safetensors |
|
[2024-07-23 17:48:40] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00016-of-00030.safetensors |
|
[2024-07-23 17:48:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:48:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:48:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:48:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:48:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:48:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:48:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:48:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:48:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:48:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:48:55] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00017-of-00030.safetensors |
|
[2024-07-23 17:49:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:49:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:49:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:49:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:49:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:49:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:49:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:49:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:49:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:49:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:49:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:49:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:49:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:49:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:49:17] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00016-of-00030.safetensors |
|
[2024-07-23 17:49:17] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00017-of-00030.safetensors |
|
[2024-07-23 17:49:17] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00018-of-00030.safetensors |
|
[2024-07-23 17:49:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:49:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:49:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:49:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:49:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:49:30] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:49:30] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:49:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:49:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:49:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:49:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:49:36] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00018-of-00030.safetensors |
|
[2024-07-23 17:49:36] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00019-of-00030.safetensors |
|
[2024-07-23 17:49:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:49:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:49:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:49:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:49:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:49:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:49:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:49:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:49:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:49:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:49:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:49:56] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00019-of-00030.safetensors |
|
[2024-07-23 17:49:56] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00020-of-00030.safetensors |
|
[2024-07-23 17:49:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:49:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:50:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:50:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:50:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:50:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:50:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:50:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:50:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:50:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:50:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:50:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:50:14] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00020-of-00030.safetensors |
|
[2024-07-23 17:50:14] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00021-of-00030.safetensors |
|
[2024-07-23 17:50:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:50:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:50:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:50:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:50:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:50:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:50:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:50:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:50:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:50:29] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00022-of-00030.safetensors |
|
[2024-07-23 17:50:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:50:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:50:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:50:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:50:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:50:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:50:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:50:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:50:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:50:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:50:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:50:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:50:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:50:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:50:50] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00021-of-00030.safetensors |
|
[2024-07-23 17:50:51] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00022-of-00030.safetensors |
|
[2024-07-23 17:50:51] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00023-of-00030.safetensors |
|
[2024-07-23 17:50:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:50:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:50:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:50:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:51:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:51:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:51:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:51:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:51:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:51:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:51:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:51:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:51:10] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00023-of-00030.safetensors |
|
[2024-07-23 17:51:10] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00024-of-00030.safetensors |
|
[2024-07-23 17:51:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:51:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:51:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:51:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:51:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:51:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:51:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:51:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:51:29] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00024-of-00030.safetensors |
|
[2024-07-23 17:51:29] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00025-of-00030.safetensors |
|
[2024-07-23 17:51:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:51:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:51:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:51:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:51:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:51:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:51:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:51:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:51:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:51:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:51:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:51:47] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00025-of-00030.safetensors |
|
[2024-07-23 17:51:47] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00026-of-00030.safetensors |
|
[2024-07-23 17:51:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:51:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:51:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:51:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:51:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:51:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:51:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:51:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:52:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:52:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:52:02] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors |
|
[2024-07-23 17:52:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:52:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:52:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:52:10] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00026-of-00030.safetensors |
|
[2024-07-23 17:52:11] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors |
|
[2024-07-23 17:52:11] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00004-of-00030.safetensors |
|
[2024-07-23 17:52:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:52:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:52:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:52:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:52:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:52:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:52:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:52:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:52:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:52:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:52:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:52:29] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00004-of-00030.safetensors |
|
[2024-07-23 17:52:30] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors |
|
[2024-07-23 17:52:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
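Note that model-00027-of-00030.safetensors is reopened here after model-00004: parameters are converted in model order, so the loader returns to a shard whenever later layers live in it. To see which shard holds a given parameter, the standard Hugging Face shard index can be queried; a minimal sketch in Python, assuming the checkpoint ships the usual model.safetensors.index.json with a "weight_map" field (that file and field are assumptions, not confirmed by this log):

import json

index_path = "/Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json"
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# e.g. maps "model.layers.71.input_layernorm.weight" to its shard file
print(weight_map["model.layers.71.input_layernorm.weight"])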
[2024-07-23 17:52:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:52:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:52:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:52:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:52:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:52:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:52:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:52:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:52:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:52:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:52:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:52:45] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors |
|
[2024-07-23 17:52:45] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00028-of-00030.safetensors |
|
[2024-07-23 17:52:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:52:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:52:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:52:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:52:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:52:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:52:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:52:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:53:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:53:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:53:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:53:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:53:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:53:04] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00028-of-00030.safetensors |
|
[2024-07-23 17:53:04] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00029-of-00030.safetensors |
|
[2024-07-23 17:53:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:53:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:53:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:53:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:53:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:53:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:53:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:53:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:53:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:53:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:53:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:53:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.input_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:53:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16 |
|
[2024-07-23 17:53:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16 |
|
[2024-07-23 17:53:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 |
|
[2024-07-23 17:53:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16 |
|
[2024-07-23 17:53:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16 |
|
[2024-07-23 17:53:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.norm.weight[0m", shape: (8192,), dtype: float16 |
|
100%|██████████| 483/483 [09:23<00:00, 1.17s/it] |
|
[2024-07-23 17:53:24] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00029-of-00030.safetensors |
|
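The Loading/Unloading pairs above show the conversion streaming one safetensors shard at a time, which is why the peak RAM reported just below stays far under the total bytes read from disk. A rough sketch of that access pattern using the safetensors safe_open API; the loop body, globbing, and framework choice are illustrative assumptions, not the converter's actual code:

from pathlib import Path
from safetensors import safe_open

model_dir = Path("/Users/Shared/models/Meta-Llama-3.1-70B-Instruct")

for shard in sorted(model_dir.glob("model-*-of-00030.safetensors")):
    with safe_open(str(shard), framework="pt") as f:   # open one shard
        for name in f.keys():
            tensor = f.get_tensor(name)  # materialize one tensor at a time
            # ... map / convert the tensor, then let it go out of scope
    # leaving the `with` block releases the shard before the next one is opened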
[2024-07-23 17:53:24] INFO stats.py:77: [92mTime usage[0m: HF loading: 82.243 sec; Pre-quantization mapping: 178.396 sec; Quantization: 0.000 sec |
|
[2024-07-23 17:53:24] INFO stats.py:91: [92mRAM usage[0m: Peak RAM: 17.375 GB. Total bytes loaded from disk: 271.521 GB |
|
[2024-07-23 17:53:24] INFO convert_weight.py:155: [92mParameter size[0m after quantization: 131.417 GB |
|
[2024-07-23 17:53:24] INFO convert_weight.py:160: [92mTotal parameters[0m: 72,885,788,672 |
|
[2024-07-23 17:53:24] INFO convert_weight.py:161: [92mBits per parameter[0m: 15.488 |
|
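The summary figures above are mutually consistent, assuming the reported sizes are in GiB (2^30 bytes). A quick cross-check in Python:

# Cross-check of the reported stats (assumes the "GB" figures are GiB).
param_size_gib = 131.417           # "Parameter size after quantization"
total_params = 72_885_788_672      # "Total parameters"

bits_per_param = param_size_gib * 2**30 * 8 / total_params
print(f"{bits_per_param:.3f}")     # ~15.488, matching "Bits per parameter"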
[2024-07-23 17:53:24] INFO convert_weight.py:166: Saved to directory: [1mlocal_dir/Llama-3.1-70B-Instruct-q0f16-MLC[0m |
|
|
|
All finished, 323 total shards committed, record saved to local_dir/Llama-3.1-70B-Instruct-q0f16-MLC/ndarray-cache.json |
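To sanity-check the committed output, the record file referenced above can be read back; a minimal sketch, assuming ndarray-cache.json keeps its shard list under a "records" field (the field name is an assumption, not confirmed by this log):

import json

out_dir = "local_dir/Llama-3.1-70B-Instruct-q0f16-MLC"
with open(f"{out_dir}/ndarray-cache.json") as f:
    cache = json.load(f)

print(len(cache.get("records", [])))  # the log above reports 323 shards committed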
|
|