/Users/cfruan/miniconda3/envs/mlc-chat-venv/bin/python -m mlc_llm gen_config /Users/Shared/models/Meta-Llama-3.1-70B-Instruct --quantization q0f16 --conv-template llama-3_1 --output local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
[2024-07-23 17:43:51] INFO auto_config.py:116: Found model configuration: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/config.json
[2024-07-23 17:43:51] INFO auto_config.py:154: Found model type: llama. Use `--model-type` to override.
[2024-07-23 17:43:51] INFO llama_model.py:62: context_window_size not found in config.json. Falling back to max_position_embeddings (131072)
[2024-07-23 17:43:51] INFO llama_model.py:82: prefill_chunk_size defaults to 2048
[2024-07-23 17:43:51] INFO config.py:107: Overriding max_batch_size from 1 to 80
[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting bos_token_id: 128000
[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting eos_token_id: [128001, 128008, 128009]
[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting temperature: 0.6
[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting top_p: 0.9
[2024-07-23 17:43:51] INFO gen_config.py:158: Not found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/tokenizer.model
[2024-07-23 17:43:51] INFO gen_config.py:156: Found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/tokenizer.json. Copying to local_dir/Llama-3.1-70B-Instruct-q0f16-MLC/tokenizer.json
[2024-07-23 17:43:51] INFO gen_config.py:158: Not found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/vocab.json
[2024-07-23 17:43:51] INFO gen_config.py:158: Not found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/merges.txt
[2024-07-23 17:43:51] INFO gen_config.py:158: Not found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/added_tokens.json
[2024-07-23 17:43:51] INFO gen_config.py:156: Found tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/tokenizer_config.json. Copying to local_dir/Llama-3.1-70B-Instruct-q0f16-MLC/tokenizer_config.json
[2024-07-23 17:43:51] INFO gen_config.py:217: Detected tokenizer info: {'token_postproc_method': 'byte_level', 'prepend_space_in_encode': False, 'strip_space_in_decode': False}
[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting pad_token_id: 0
[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting presence_penalty: 0.0
[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting frequency_penalty: 0.0
[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting repetition_penalty: 1.0
[2024-07-23 17:43:51] INFO gen_config.py:245: Dumping configuration file to: local_dir/Llama-3.1-70B-Instruct-q0f16-MLC/mlc-chat-config.json
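For reference, the dumped mlc-chat-config.json can be sanity-checked with a few lines of Python. A minimal sketch (not produced by the run above), assuming the top-level field names mirror the values gen_config reports setting:

import json

with open("local_dir/Llama-3.1-70B-Instruct-q0f16-MLC/mlc-chat-config.json") as f:
    cfg = json.load(f)

# Values reported in the log: context window 131072, temperature 0.6,
# top_p 0.9 (field names assumed from the log messages).
for key in ("context_window_size", "temperature", "top_p"):
    print(key, "=", cfg.get(key))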
/Users/cfruan/miniconda3/envs/mlc-chat-venv/bin/python -m mlc_llm convert_weight /Users/Shared/models/Meta-Llama-3.1-70B-Instruct --quantization q0f16 --output local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
[2024-07-23 17:43:52] INFO auto_config.py:116: Found model configuration: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/config.json
[2024-07-23 17:43:52] INFO auto_device.py:88: Not found device: cuda:0
[2024-07-23 17:43:53] INFO auto_device.py:88: Not found device: rocm:0
[2024-07-23 17:43:54] INFO auto_device.py:79: Found device: metal:0
[2024-07-23 17:43:55] INFO auto_device.py:88: Not found device: vulkan:0
[2024-07-23 17:43:55] INFO auto_device.py:88: Not found device: opencl:0
[2024-07-23 17:43:55] INFO auto_device.py:35: Using device: metal:0
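auto_device probes CUDA, ROCm, Metal, Vulkan, and OpenCL in that order and settles on the first device found, Metal here. A minimal sketch of the same probe, assuming a TVM runtime install (the runtime mlc_llm builds on):

import tvm

# Walk the same device list auto_device checks; Device.exist reports
# whether the runtime can actually reach the device.
for name, dev in [("cuda", tvm.cuda(0)), ("rocm", tvm.rocm(0)),
                  ("metal", tvm.metal(0)), ("vulkan", tvm.vulkan(0)),
                  ("opencl", tvm.opencl(0))]:
    print(f"{name}:0 ->", "found" if dev.exist else "not found")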
[2024-07-23 17:43:55] INFO auto_weight.py:71: Finding weights in: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct
[2024-07-23 17:43:55] INFO auto_weight.py:137: Not found Huggingface PyTorch
[2024-07-23 17:43:55] INFO auto_weight.py:144: Found source weight format: huggingface-safetensor. Source configuration: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json
[2024-07-23 17:43:55] INFO auto_weight.py:107: Using source weight configuration: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json. Use `--source` to override.
[2024-07-23 17:43:55] INFO auto_weight.py:111: Using source weight format: huggingface-safetensor. Use `--source-format` to override.
[2024-07-23 17:43:55] INFO auto_config.py:154: Found model type: llama. Use `--model-type` to override.
[2024-07-23 17:43:55] INFO llama_model.py:62: context_window_size not found in config.json. Falling back to max_position_embeddings (131072)
[2024-07-23 17:43:55] INFO llama_model.py:82: prefill_chunk_size defaults to 2048
Weight conversion with arguments:
--config /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/config.json
--quantization NoQuantize(name='q0f16', kind='no-quant', model_dtype='float16')
--model-type llama
--device metal:0
--source /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json
--source-format huggingface-safetensor
--output local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
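Before the conversion loop below starts, the shard layout named in --source can be enumerated from Python. A sketch assuming the standard Hugging Face safetensors index format, whose "weight_map" maps each tensor name to the shard file that stores it:

import json
from collections import Counter

index_path = "/Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json"
with open(index_path) as f:
    index = json.load(f)

# Count how many tensors live in each of the 30 shards that the loader
# loads and unloads during conversion.
for shard, n_tensors in sorted(Counter(index["weight_map"].values()).items()):
    print(shard, n_tensors)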
Start storing to cache local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
[2024-07-23 17:44:00] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00030-of-00030.safetensors
[2024-07-23 17:44:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "lm_head.weight", shape: (128256, 8192), dtype: float16
[2024-07-23 17:44:09] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00030-of-00030.safetensors
[2024-07-23 17:44:09] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00001-of-00030.safetensors
[2024-07-23 17:44:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.embed_tokens.weight", shape: (128256, 8192), dtype: float16
[2024-07-23 17:44:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:44:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:44:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:26] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00002-of-00030.safetensors
[2024-07-23 17:44:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:44:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:44:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:44:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:44:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:44:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:44:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:50] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00001-of-00030.safetensors
[2024-07-23 17:44:50] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00002-of-00030.safetensors
[2024-07-23 17:44:51] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00005-of-00030.safetensors
[2024-07-23 17:44:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:44:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:44:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:44:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:09] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00005-of-00030.safetensors
[2024-07-23 17:45:09] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00006-of-00030.safetensors
[2024-07-23 17:45:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:25] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00007-of-00030.safetensors
[2024-07-23 17:45:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:46] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00006-of-00030.safetensors
[2024-07-23 17:45:46] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00007-of-00030.safetensors
[2024-07-23 17:45:46] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00008-of-00030.safetensors
[2024-07-23 17:45:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:45:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:45:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:45:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:45:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:45:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:46:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:46:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:46:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:46:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:46:06] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00008-of-00030.safetensors
[2024-07-23 17:46:06] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00009-of-00030.safetensors
[2024-07-23 17:46:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:46:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:46:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:46:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:46:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:46:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:46:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:46:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:46:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:46:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:46:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:46:26] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00009-of-00030.safetensors
[2024-07-23 17:46:27] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00010-of-00030.safetensors
[2024-07-23 17:46:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:46:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:46:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:46:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:46:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:46:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:46:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:46:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:46:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:46:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:46:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:46:46] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00010-of-00030.safetensors
[2024-07-23 17:46:46] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00011-of-00030.safetensors
[2024-07-23 17:46:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:46:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:46:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:46:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:46:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:46:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.28.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:46:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.28.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:46:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.28.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:47:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.28.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.28.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:47:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.28.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:47:02] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00012-of-00030.safetensors
[2024-07-23 17:47:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.29.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:47:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.29.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:47:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.29.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:47:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.29.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.29.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:47:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.29.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.30.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.30.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:47:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.30.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:47:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.30.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.30.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:47:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.30.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:47:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.31.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.31.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:47:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.31.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:47:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.31.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.31.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:47:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.31.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:47:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.32.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:47:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.32.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:47:24] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00012-of-00030.safetensors
[2024-07-23 17:47:24] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00011-of-00030.safetensors
[2024-07-23 17:47:24] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00013-of-00030.safetensors
[2024-07-23 17:47:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.32.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.32.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:47:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.32.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:47:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.32.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.33.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.33.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:47:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.33.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:47:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.33.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.33.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:47:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.33.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:47:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.34.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.34.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:47:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.34.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:47:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.34.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.34.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:47:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.34.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:47:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.35.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:47:43] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00013-of-00030.safetensors
[2024-07-23 17:47:43] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00014-of-00030.safetensors
[2024-07-23 17:47:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.35.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.35.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:47:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.35.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:47:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.35.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.35.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:47:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.36.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.36.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:47:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.36.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:47:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.36.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.36.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:47:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.36.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:47:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.37.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:47:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.37.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:48:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.37.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:48:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.37.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:48:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.37.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:48:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.37.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:48:02] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00014-of-00030.safetensors
[2024-07-23 17:48:03] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00015-of-00030.safetensors
[2024-07-23 17:48:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.38.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:48:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.38.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:48:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.38.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:48:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.38.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:48:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.38.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:48:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.38.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:48:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.39.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:48:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.39.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:48:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.39.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 204/483 [04:12<03:45, 1.24it/s] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 205/483 [04:14<06:28, 1.40s/it] [2024-07-23 17:48:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.39.post_attention_layernorm.weight", shape: (8192,), dtype: float16
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 205/483 [04:14<06:28, 1.40s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 206/483 [04:14<04:53, 1.06s/it] [2024-07-23 17:48:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.39.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 206/483 [04:15<04:53, 1.06s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 207/483 [04:15<04:12, 1.09it/s] [2024-07-23 17:48:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.39.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 207/483 [04:15<04:12, 1.09it/s] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 208/483 [04:15<03:31, 1.30it/s] [2024-07-23 17:48:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.40.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 208/483 [04:17<03:31, 1.30it/s] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 209/483 [04:19<06:40, 1.46s/it] [2024-07-23 17:48:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.40.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 209/483 [04:19<06:40, 1.46s/it] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 210/483 [04:19<05:33, 1.22s/it] [2024-07-23 17:48:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.40.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 210/483 [04:19<05:33, 1.22s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 211/483 [04:20<04:26, 1.02it/s] [2024-07-23 17:48:20] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00015-of-00030.safetensors
44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 211/483 [04:20<04:26, 1.02it/s] [2024-07-23 17:48:20] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00003-of-00030.safetensors
44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 211/483 [04:20<04:26, 1.02it/s] [2024-07-23 17:48:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.input_layernorm.weight", shape: (8192,), dtype: float16
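Note the jump from layer 39 back to layer 4 here: the converter walks parameters in its own traversal order, opens whichever shard holds the next tensor (the tensors for layers 4-7 live in model-00003-of-00030), and unloads fully consumed shards to bound peak memory; occasionally two shards stay resident at once when a layer's tensors straddle a shard boundary, as the overlapping Loading/Unloading lines for model-00021 and model-00022 show further down. A minimal sketch of this streaming pattern, assuming the standard sharded-checkpoint layout where model.safetensors.index.json maps each parameter to its shard (huggingface_loader.py's actual internals may differ):

    import json
    from safetensors import safe_open

    # Illustrative shard-by-shard streaming, not the actual mlc_llm code.
    def stream_parameters(model_dir, param_order):
        with open(f"{model_dir}/model.safetensors.index.json") as f:
            weight_map = json.load(f)["weight_map"]   # param name -> shard file
        current_shard, handle = None, None
        for name in param_order:
            shard = weight_map[name]
            if shard != current_shard:                # the Loading/Unloading lines
                handle = safe_open(f"{model_dir}/{shard}", framework="np")
                current_shard = shard                 # old handle is dropped here
            yield name, handle.get_tensor(name)       # convert, then move on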
44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 211/483 [04:23<04:26, 1.02it/s] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 212/483 [04:23<07:11, 1.59s/it] [2024-07-23 17:48:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 212/483 [04:23<07:11, 1.59s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 213/483 [04:24<06:59, 1.55s/it] [2024-07-23 17:48:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 213/483 [04:25<06:59, 1.55s/it] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 214/483 [04:27<09:09, 2.04s/it] [2024-07-23 17:48:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.post_attention_layernorm.weight", shape: (8192,), dtype: float16
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 214/483 [04:27<09:09, 2.04s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 215/483 [04:27<06:32, 1.47s/it] [2024-07-23 17:48:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.input_layernorm.weight", shape: (8192,), dtype: float16
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 215/483 [04:27<06:32, 1.47s/it] [2024-07-23 17:48:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 215/483 [04:28<06:32, 1.47s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 217/483 [04:29<04:58, 1.12s/it] [2024-07-23 17:48:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 217/483 [04:30<04:58, 1.12s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 218/483 [04:32<07:12, 1.63s/it] [2024-07-23 17:48:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.post_attention_layernorm.weight", shape: (8192,), dtype: float16
45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 218/483 [04:32<07:12, 1.63s/it] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 219/483 [04:32<05:25, 1.23s/it] [2024-07-23 17:48:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 219/483 [04:32<05:25, 1.23s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 220/483 [04:33<04:33, 1.04s/it] [2024-07-23 17:48:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 220/483 [04:33<04:33, 1.04s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 221/483 [04:33<03:45, 1.16it/s] [2024-07-23 17:48:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.input_layernorm.weight", shape: (8192,), dtype: float16
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 221/483 [04:33<03:45, 1.16it/s] [2024-07-23 17:48:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 221/483 [04:34<03:45, 1.16it/s] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 223/483 [04:34<03:27, 1.25it/s] [2024-07-23 17:48:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 223/483 [04:36<03:27, 1.25it/s] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 224/483 [04:38<05:55, 1.37s/it] [2024-07-23 17:48:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.post_attention_layernorm.weight", shape: (8192,), dtype: float16
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 224/483 [04:38<05:55, 1.37s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 225/483 [04:38<04:30, 1.05s/it] [2024-07-23 17:48:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 225/483 [04:38<04:30, 1.05s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 226/483 [04:38<03:53, 1.10it/s] [2024-07-23 17:48:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 226/483 [04:38<03:53, 1.10it/s] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 227/483 [04:39<03:16, 1.30it/s] [2024-07-23 17:48:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 227/483 [04:39<03:16, 1.30it/s] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 228/483 [04:39<02:58, 1.43it/s] [2024-07-23 17:48:40] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00003-of-00030.safetensors
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 228/483 [04:39<02:58, 1.43it/s] [2024-07-23 17:48:40] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00016-of-00030.safetensors
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 228/483 [04:39<02:58, 1.43it/s] [2024-07-23 17:48:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.40.input_layernorm.weight", shape: (8192,), dtype: float16
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 228/483 [04:42<02:58, 1.43it/s] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 229/483 [04:42<05:11, 1.23s/it] [2024-07-23 17:48:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.40.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 229/483 [04:42<05:11, 1.23s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 230/483 [04:43<05:28, 1.30s/it] [2024-07-23 17:48:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.40.post_attention_layernorm.weight", shape: (8192,), dtype: float16
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 230/483 [04:43<05:28, 1.30s/it] [2024-07-23 17:48:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.41.input_layernorm.weight", shape: (8192,), dtype: float16
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 230/483 [04:43<05:28, 1.30s/it] [2024-07-23 17:48:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.41.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 230/483 [04:44<05:28, 1.30s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 233/483 [04:45<03:32, 1.18it/s] [2024-07-23 17:48:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.41.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 233/483 [04:46<03:32, 1.18it/s] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 234/483 [04:48<05:33, 1.34s/it] [2024-07-23 17:48:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.41.post_attention_layernorm.weight", shape: (8192,), dtype: float16
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 234/483 [04:48<05:33, 1.34s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 235/483 [04:48<04:22, 1.06s/it] [2024-07-23 17:48:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.41.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 235/483 [04:48<04:22, 1.06s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 236/483 [04:49<03:49, 1.07it/s] [2024-07-23 17:48:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.41.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 236/483 [04:49<03:49, 1.07it/s] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 237/483 [04:49<03:14, 1.27it/s] [2024-07-23 17:48:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.42.input_layernorm.weight", shape: (8192,), dtype: float16
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 237/483 [04:49<03:14, 1.27it/s] [2024-07-23 17:48:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.42.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 237/483 [04:50<03:14, 1.27it/s] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 239/483 [04:50<03:05, 1.31it/s] [2024-07-23 17:48:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.42.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 239/483 [04:52<03:05, 1.31it/s] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 240/483 [04:54<05:24, 1.33s/it] [2024-07-23 17:48:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.42.post_attention_layernorm.weight", shape: (8192,), dtype: float16
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 240/483 [04:54<05:24, 1.33s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 241/483 [04:54<04:08, 1.03s/it] [2024-07-23 17:48:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.42.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 241/483 [04:54<04:08, 1.03s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 242/483 [04:54<03:35, 1.12it/s] [2024-07-23 17:48:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.42.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 242/483 [04:54<03:35, 1.12it/s] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 243/483 [04:55<03:02, 1.32it/s] [2024-07-23 17:48:55] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00017-of-00030.safetensors
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 243/483 [04:55<03:02, 1.32it/s] [2024-07-23 17:49:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.43.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 243/483 [05:00<03:02, 1.32it/s] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 244/483 [05:02<09:59, 2.51s/it] [2024-07-23 17:49:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.43.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 244/483 [05:02<09:59, 2.51s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 245/483 [05:02<07:49, 1.97s/it] [2024-07-23 17:49:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.43.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 245/483 [05:02<07:49, 1.97s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 246/483 [05:03<05:59, 1.52s/it] [2024-07-23 17:49:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.43.input_layernorm.weight", shape: (8192,), dtype: float16
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 246/483 [05:03<05:59, 1.52s/it] [2024-07-23 17:49:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.43.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 246/483 [05:03<05:59, 1.52s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 248/483 [05:04<04:31, 1.15s/it] [2024-07-23 17:49:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.43.post_attention_layernorm.weight", shape: (8192,), dtype: float16
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 248/483 [05:04<04:31, 1.15s/it] [2024-07-23 17:49:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.44.input_layernorm.weight", shape: (8192,), dtype: float16
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 248/483 [05:04<04:31, 1.15s/it] [2024-07-23 17:49:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.44.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 248/483 [05:05<04:31, 1.15s/it] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 251/483 [05:06<03:11, 1.21it/s] [2024-07-23 17:49:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.44.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 251/483 [05:07<03:11, 1.21it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 252/483 [05:09<04:53, 1.27s/it] [2024-07-23 17:49:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.44.post_attention_layernorm.weight", shape: (8192,), dtype: float16
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 252/483 [05:09<04:53, 1.27s/it] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 253/483 [05:09<03:55, 1.02s/it] [2024-07-23 17:49:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.44.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 253/483 [05:09<03:55, 1.02s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 254/483 [05:09<03:28, 1.10it/s] [2024-07-23 17:49:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.44.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 254/483 [05:10<03:28, 1.10it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 255/483 [05:10<02:58, 1.28it/s] [2024-07-23 17:49:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.45.input_layernorm.weight", shape: (8192,), dtype: float16
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 255/483 [05:10<02:58, 1.28it/s] [2024-07-23 17:49:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.45.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 255/483 [05:10<02:58, 1.28it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 257/483 [05:11<02:50, 1.33it/s] [2024-07-23 17:49:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.45.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 257/483 [05:13<02:50, 1.33it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 258/483 [05:14<04:55, 1.31s/it] [2024-07-23 17:49:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.45.post_attention_layernorm.weight", shape: (8192,), dtype: float16
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 258/483 [05:14<04:55, 1.31s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 259/483 [05:14<03:47, 1.01s/it] [2024-07-23 17:49:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.45.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 259/483 [05:15<03:47, 1.01s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 260/483 [05:15<03:17, 1.13it/s] [2024-07-23 17:49:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.45.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 260/483 [05:15<03:17, 1.13it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 261/483 [05:15<02:47, 1.32it/s] [2024-07-23 17:49:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.46.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 261/483 [05:16<02:47, 1.32it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 262/483 [05:16<02:32, 1.45it/s] [2024-07-23 17:49:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.46.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 262/483 [05:16<02:32, 1.45it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 263/483 [05:16<02:13, 1.64it/s] [2024-07-23 17:49:17] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00016-of-00030.safetensors
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 263/483 [05:16<02:13, 1.64it/s] [2024-07-23 17:49:17] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00017-of-00030.safetensors
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 263/483 [05:16<02:13, 1.64it/s] [2024-07-23 17:49:17] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00018-of-00030.safetensors
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 263/483 [05:17<02:13, 1.64it/s] [2024-07-23 17:49:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.46.input_layernorm.weight", shape: (8192,), dtype: float16
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 263/483 [05:19<02:13, 1.64it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 264/483 [05:19<04:12, 1.15s/it] [2024-07-23 17:49:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.46.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 264/483 [05:19<04:12, 1.15s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 265/483 [05:20<04:30, 1.24s/it] [2024-07-23 17:49:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.46.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 265/483 [05:22<04:30, 1.24s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 266/483 [05:23<06:34, 1.82s/it] [2024-07-23 17:49:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.46.post_attention_layernorm.weight", shape: (8192,), dtype: float16
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 266/483 [05:24<06:34, 1.82s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 267/483 [05:24<04:42, 1.31s/it] [2024-07-23 17:49:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.47.input_layernorm.weight", shape: (8192,), dtype: float16
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 267/483 [05:24<04:42, 1.31s/it] [2024-07-23 17:49:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.47.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 267/483 [05:24<04:42, 1.31s/it] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 269/483 [05:25<03:41, 1.04s/it] [2024-07-23 17:49:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.47.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 269/483 [05:26<03:41, 1.04s/it] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 270/483 [05:28<05:33, 1.57s/it] [2024-07-23 17:49:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.47.post_attention_layernorm.weight", shape: (8192,), dtype: float16
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 270/483 [05:28<05:33, 1.57s/it] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 271/483 [05:28<04:11, 1.19s/it] [2024-07-23 17:49:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.47.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 271/483 [05:29<04:11, 1.19s/it] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 272/483 [05:29<03:33, 1.01s/it] [2024-07-23 17:49:30] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.47.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 272/483 [05:29<03:33, 1.01s/it] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 273/483 [05:29<02:57, 1.18it/s] [2024-07-23 17:49:30] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.48.input_layernorm.weight", shape: (8192,), dtype: float16
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 273/483 [05:29<02:57, 1.18it/s] [2024-07-23 17:49:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.48.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 273/483 [05:30<02:57, 1.18it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 275/483 [05:31<02:44, 1.26it/s] [2024-07-23 17:49:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.48.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 275/483 [05:32<02:44, 1.26it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 276/483 [05:34<04:44, 1.37s/it] [2024-07-23 17:49:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.48.post_attention_layernorm.weight", shape: (8192,), dtype: float16
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 276/483 [05:34<04:44, 1.37s/it] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 277/483 [05:34<03:35, 1.05s/it] [2024-07-23 17:49:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.48.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 277/483 [05:34<03:35, 1.05s/it] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 278/483 [05:35<03:07, 1.10it/s] [2024-07-23 17:49:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.48.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 278/483 [05:35<03:07, 1.10it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 279/483 [05:35<02:38, 1.29it/s] [2024-07-23 17:49:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.49.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 279/483 [05:35<02:38, 1.29it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 280/483 [05:36<02:24, 1.41it/s] [2024-07-23 17:49:36] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00018-of-00030.safetensors
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 280/483 [05:36<02:24, 1.41it/s] [2024-07-23 17:49:36] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00019-of-00030.safetensors
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 280/483 [05:36<02:24, 1.41it/s] [2024-07-23 17:49:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.49.input_layernorm.weight", shape: (8192,), dtype: float16
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 280/483 [05:38<02:24, 1.41it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 281/483 [05:38<04:31, 1.34s/it] [2024-07-23 17:49:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.49.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 281/483 [05:39<04:31, 1.34s/it] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 282/483 [05:40<04:36, 1.38s/it] [2024-07-23 17:49:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.49.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 282/483 [05:41<04:36, 1.38s/it] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 283/483 [05:43<06:23, 1.92s/it] [2024-07-23 17:49:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.49.post_attention_layernorm.weight", shape: (8192,), dtype: float16
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 283/483 [05:43<06:23, 1.92s/it] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 284/483 [05:43<04:34, 1.38s/it] [2024-07-23 17:49:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.49.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 284/483 [05:43<04:34, 1.38s/it] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 285/483 [05:44<03:35, 1.09s/it] [2024-07-23 17:49:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.50.input_layernorm.weight", shape: (8192,), dtype: float16
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 285/483 [05:44<03:35, 1.09s/it] [2024-07-23 17:49:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.50.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 285/483 [05:44<03:35, 1.09s/it] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 287/483 [05:45<03:00, 1.09it/s] [2024-07-23 17:49:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.50.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 287/483 [05:46<03:00, 1.09it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 288/483 [05:48<04:50, 1.49s/it] [2024-07-23 17:49:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.50.post_attention_layernorm.weight", shape: (8192,), dtype: float16
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 288/483 [05:48<04:50, 1.49s/it] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 289/483 [05:48<03:38, 1.13s/it] [2024-07-23 17:49:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.50.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 289/483 [05:49<03:38, 1.13s/it] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 290/483 [05:49<03:06, 1.03it/s] [2024-07-23 17:49:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.50.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 290/483 [05:49<03:06, 1.03it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 291/483 [05:49<02:34, 1.24it/s] [2024-07-23 17:49:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.51.input_layernorm.weight", shape: (8192,), dtype: float16
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 291/483 [05:49<02:34, 1.24it/s] [2024-07-23 17:49:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.51.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 291/483 [05:50<02:34, 1.24it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 293/483 [05:51<02:26, 1.30it/s] [2024-07-23 17:49:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.51.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 293/483 [05:52<02:26, 1.30it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 294/483 [05:54<04:14, 1.35s/it] [2024-07-23 17:49:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.51.post_attention_layernorm.weight", shape: (8192,), dtype: float16
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 294/483 [05:54<04:14, 1.35s/it] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 295/483 [05:54<03:13, 1.03s/it] [2024-07-23 17:49:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.51.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 295/483 [05:54<03:13, 1.03s/it] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 296/483 [05:55<02:47, 1.12it/s] [2024-07-23 17:49:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.51.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 296/483 [05:55<02:47, 1.12it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 297/483 [05:55<02:20, 1.32it/s] [2024-07-23 17:49:56] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00019-of-00030.safetensors
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 297/483 [05:55<02:20, 1.32it/s] [2024-07-23 17:49:56] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00020-of-00030.safetensors
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 297/483 [05:55<02:20, 1.32it/s] [2024-07-23 17:49:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.52.input_layernorm.weight", shape: (8192,), dtype: float16
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 297/483 [05:57<02:20, 1.32it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 298/483 [05:57<03:49, 1.24s/it] [2024-07-23 17:49:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.52.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 298/483 [05:58<03:49, 1.24s/it] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 299/483 [05:59<03:59, 1.30s/it] [2024-07-23 17:50:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.52.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 299/483 [06:00<03:59, 1.30s/it] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 300/483 [06:02<05:38, 1.85s/it] [2024-07-23 17:50:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.52.post_attention_layernorm.weight", shape: (8192,), dtype: float16
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 300/483 [06:02<05:38, 1.85s/it] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 301/483 [06:02<04:03, 1.34s/it] [2024-07-23 17:50:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.52.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 301/483 [06:02<04:03, 1.34s/it] 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 302/483 [06:03<03:18, 1.10s/it] [2024-07-23 17:50:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.52.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 302/483 [06:03<03:18, 1.10s/it] 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 303/483 [06:03<02:40, 1.12it/s] [2024-07-23 17:50:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.53.input_layernorm.weight", shape: (8192,), dtype: float16
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 303/483 [06:03<02:40, 1.12it/s] [2024-07-23 17:50:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.53.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 303/483 [06:04<02:40, 1.12it/s] 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 305/483 [06:05<02:24, 1.23it/s] [2024-07-23 17:50:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.53.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 305/483 [06:06<02:24, 1.23it/s] 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 306/483 [06:08<04:07, 1.40s/it] [2024-07-23 17:50:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.53.post_attention_layernorm.weight", shape: (8192,), dtype: float16
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 306/483 [06:08<04:07, 1.40s/it] 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 307/483 [06:08<03:06, 1.06s/it] [2024-07-23 17:50:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.53.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 307/483 [06:08<03:06, 1.06s/it] 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 308/483 [06:08<02:40, 1.09it/s] [2024-07-23 17:50:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.53.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 308/483 [06:09<02:40, 1.09it/s] 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 309/483 [06:09<02:14, 1.29it/s] [2024-07-23 17:50:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.54.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 309/483 [06:10<02:14, 1.29it/s] 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 310/483 [06:12<04:10, 1.45s/it] [2024-07-23 17:50:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.54.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 310/483 [06:12<04:10, 1.45s/it] 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 311/483 [06:13<03:28, 1.21s/it] [2024-07-23 17:50:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.54.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 311/483 [06:13<03:28, 1.21s/it] 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 312/483 [06:13<02:46, 1.03it/s] [2024-07-23 17:50:14] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00020-of-00030.safetensors
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 312/483 [06:13<02:46, 1.03it/s] [2024-07-23 17:50:14] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00021-of-00030.safetensors
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 312/483 [06:13<02:46, 1.03it/s] [2024-07-23 17:50:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.54.input_layernorm.weight", shape: (8192,), dtype: float16
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 312/483 [06:16<02:46, 1.03it/s] 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 313/483 [06:16<04:06, 1.45s/it] [2024-07-23 17:50:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.54.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 313/483 [06:16<04:06, 1.45s/it] 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 314/483 [06:17<04:05, 1.45s/it] [2024-07-23 17:50:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.54.post_attention_layernorm.weight", shape: (8192,), dtype: float16
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 314/483 [06:17<04:05, 1.45s/it] [2024-07-23 17:50:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.55.input_layernorm.weight", shape: (8192,), dtype: float16
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 314/483 [06:17<04:05, 1.45s/it] [2024-07-23 17:50:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.55.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 314/483 [06:18<04:05, 1.45s/it] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 317/483 [06:18<02:31, 1.10it/s] [2024-07-23 17:50:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.55.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 317/483 [06:20<02:31, 1.10it/s] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 318/483 [06:22<03:49, 1.39s/it] [2024-07-23 17:50:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.55.post_attention_layernorm.weight", shape: (8192,), dtype: float16
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 318/483 [06:22<03:49, 1.39s/it] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 319/483 [06:22<02:59, 1.09s/it] [2024-07-23 17:50:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.55.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 319/483 [06:22<02:59, 1.09s/it] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 320/483 [06:22<02:35, 1.05it/s] [2024-07-23 17:50:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.55.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 320/483 [06:22<02:35, 1.05it/s] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 321/483 [06:23<02:10, 1.24it/s] [2024-07-23 17:50:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.56.input_layernorm.weight", shape: (8192,), dtype: float16
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 321/483 [06:23<02:10, 1.24it/s] [2024-07-23 17:50:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.56.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 321/483 [06:23<02:10, 1.24it/s] 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 323/483 [06:24<02:03, 1.30it/s] [2024-07-23 17:50:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.56.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 323/483 [06:25<02:03, 1.30it/s] 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 324/483 [06:27<03:32, 1.34s/it] [2024-07-23 17:50:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.56.post_attention_layernorm.weight", shape: (8192,), dtype: float16
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 324/483 [06:27<03:32, 1.34s/it] 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 325/483 [06:27<02:42, 1.03s/it] [2024-07-23 17:50:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.56.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 325/483 [06:28<02:42, 1.03s/it] 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 326/483 [06:28<02:20, 1.12it/s] [2024-07-23 17:50:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.56.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 326/483 [06:28<02:20, 1.12it/s] 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 327/483 [06:28<01:58, 1.32it/s] [2024-07-23 17:50:29] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00022-of-00030.safetensors
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 327/483 [06:28<01:58, 1.32it/s] [2024-07-23 17:50:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.57.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 327/483 [06:33<01:58, 1.32it/s] 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 328/483 [06:35<06:17, 2.44s/it] [2024-07-23 17:50:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.57.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 328/483 [06:35<06:17, 2.44s/it] 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 329/483 [06:36<04:55, 1.92s/it] [2024-07-23 17:50:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.57.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 329/483 [06:36<04:55, 1.92s/it] 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 330/483 [06:36<03:46, 1.48s/it] [2024-07-23 17:50:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.57.input_layernorm.weight", shape: (8192,), dtype: float16
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 330/483 [06:36<03:46, 1.48s/it] [2024-07-23 17:50:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.57.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 330/483 [06:37<03:46, 1.48s/it] 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 332/483 [06:38<02:51, 1.13s/it] [2024-07-23 17:50:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.57.post_attention_layernorm.weight", shape: (8192,), dtype: float16
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 332/483 [06:38<02:51, 1.13s/it] [2024-07-23 17:50:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.58.input_layernorm.weight", shape: (8192,), dtype: float16
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 332/483 [06:38<02:51, 1.13s/it] [2024-07-23 17:50:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.58.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 332/483 [06:38<02:51, 1.13s/it] 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 335/483 [06:39<02:00, 1.23it/s] [2024-07-23 17:50:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.58.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 335/483 [06:40<02:00, 1.23it/s] 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 336/483 [06:42<03:06, 1.27s/it] [2024-07-23 17:50:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.58.post_attention_layernorm.weight", shape: (8192,), dtype: float16
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 336/483 [06:42<03:06, 1.27s/it] 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 337/483 [06:42<02:28, 1.02s/it] [2024-07-23 17:50:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.58.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 337/483 [06:42<02:28, 1.02s/it] 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 338/483 [06:43<02:10, 1.11it/s] [2024-07-23 17:50:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.58.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 338/483 [06:43<02:10, 1.11it/s] 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 339/483 [06:43<01:51, 1.29it/s] [2024-07-23 17:50:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.59.input_layernorm.weight", shape: (8192,), dtype: float16
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 339/483 [06:43<01:51, 1.29it/s] [2024-07-23 17:50:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.59.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 339/483 [06:44<01:51, 1.29it/s] 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 341/483 [06:45<01:46, 1.33it/s] [2024-07-23 17:50:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.59.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 341/483 [06:46<01:46, 1.33it/s] 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 342/483 [06:48<03:04, 1.31s/it] [2024-07-23 17:50:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.59.post_attention_layernorm.weight", shape: (8192,), dtype: float16
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 342/483 [06:48<03:04, 1.31s/it] 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 343/483 [06:48<02:21, 1.01s/it] [2024-07-23 17:50:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.59.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 343/483 [06:48<02:21, 1.01s/it] 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 344/483 [06:48<02:03, 1.13it/s] [2024-07-23 17:50:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.59.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 344/483 [06:49<02:03, 1.13it/s] 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 345/483 [06:49<01:44, 1.33it/s] [2024-07-23 17:50:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.60.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 345/483 [06:49<01:44, 1.33it/s] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 346/483 [06:49<01:34, 1.45it/s] [2024-07-23 17:50:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.60.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 346/483 [06:50<01:34, 1.45it/s] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 347/483 [06:50<01:22, 1.65it/s] [2024-07-23 17:50:50] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00021-of-00030.safetensors
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 347/483 [06:50<01:22, 1.65it/s] [2024-07-23 17:50:51] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00022-of-00030.safetensors
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 347/483 [06:50<01:22, 1.65it/s] [2024-07-23 17:50:51] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00023-of-00030.safetensors
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 347/483 [06:50<01:22, 1.65it/s] [2024-07-23 17:50:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.60.input_layernorm.weight", shape: (8192,), dtype: float16
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 347/483 [06:52<01:22, 1.65it/s] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 348/483 [06:52<02:41, 1.20s/it] [2024-07-23 17:50:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.60.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 348/483 [06:53<02:41, 1.20s/it] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 349/483 [06:54<02:50, 1.27s/it] [2024-07-23 17:50:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.60.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 349/483 [06:55<02:50, 1.27s/it] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 350/483 [06:57<04:04, 1.84s/it] [2024-07-23 17:50:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.60.post_attention_layernorm.weight", shape: (8192,), dtype: float16
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 350/483 [06:57<04:04, 1.84s/it] 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 351/483 [06:57<02:54, 1.32s/it] [2024-07-23 17:50:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.61.input_layernorm.weight", shape: (8192,), dtype: float16
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 351/483 [06:57<02:54, 1.32s/it] [2024-07-23 17:50:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.61.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 351/483 [06:58<02:54, 1.32s/it] 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 353/483 [06:59<02:15, 1.04s/it] [2024-07-23 17:51:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.61.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 353/483 [07:00<02:15, 1.04s/it] 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 354/483 [07:02<03:22, 1.57s/it] [2024-07-23 17:51:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.61.post_attention_layernorm.weight", shape: (8192,), dtype: float16
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 354/483 [07:02<03:22, 1.57s/it] 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 355/483 [07:02<02:32, 1.19s/it] [2024-07-23 17:51:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.61.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 355/483 [07:02<02:32, 1.19s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 356/483 [07:02<02:08, 1.01s/it] [2024-07-23 17:51:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.61.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 356/483 [07:03<02:08, 1.01s/it] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 357/483 [07:03<01:46, 1.18it/s] [2024-07-23 17:51:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.62.input_layernorm.weight", shape: (8192,), dtype: float16
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 357/483 [07:03<01:46, 1.18it/s] [2024-07-23 17:51:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.62.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 357/483 [07:03<01:46, 1.18it/s] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 359/483 [07:04<01:37, 1.27it/s] [2024-07-23 17:51:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.62.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 359/483 [07:06<01:37, 1.27it/s] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 360/483 [07:07<02:48, 1.37s/it] [2024-07-23 17:51:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.62.post_attention_layernorm.weight", shape: (8192,), dtype: float16
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 360/483 [07:08<02:48, 1.37s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 361/483 [07:08<02:07, 1.04s/it] [2024-07-23 17:51:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.62.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 361/483 [07:08<02:07, 1.04s/it] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 362/483 [07:08<01:49, 1.10it/s] [2024-07-23 17:51:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.62.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 362/483 [07:08<01:49, 1.10it/s] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 363/483 [07:09<01:32, 1.29it/s] [2024-07-23 17:51:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.63.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 363/483 [07:09<01:32, 1.29it/s] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 364/483 [07:09<01:24, 1.41it/s] [2024-07-23 17:51:10] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00023-of-00030.safetensors
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 364/483 [07:09<01:24, 1.41it/s] [2024-07-23 17:51:10] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00024-of-00030.safetensors
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 364/483 [07:09<01:24, 1.41it/s] [2024-07-23 17:51:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.63.input_layernorm.weight", shape: (8192,), dtype: float16
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 364/483 [07:12<01:24, 1.41it/s] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 365/483 [07:12<02:39, 1.35s/it] [2024-07-23 17:51:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.63.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 365/483 [07:13<02:39, 1.35s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 366/483 [07:13<02:41, 1.38s/it] [2024-07-23 17:51:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.63.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 366/483 [07:15<02:41, 1.38s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 367/483 [07:17<03:42, 1.92s/it] [2024-07-23 17:51:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.63.post_attention_layernorm.weight", shape: (8192,), dtype: float16
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 367/483 [07:17<03:42, 1.92s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 368/483 [07:17<02:38, 1.38s/it] [2024-07-23 17:51:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.63.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 368/483 [07:17<02:38, 1.38s/it] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 369/483 [07:17<02:04, 1.09s/it] [2024-07-23 17:51:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.64.input_layernorm.weight", shape: (8192,), dtype: float16
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 369/483 [07:17<02:04, 1.09s/it] [2024-07-23 17:51:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.64.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 369/483 [07:18<02:04, 1.09s/it] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 371/483 [07:19<01:42, 1.09it/s] [2024-07-23 17:51:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.64.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 371/483 [07:20<01:42, 1.09it/s] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 372/483 [07:22<02:44, 1.48s/it] [2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.64.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.64.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.64.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.65.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.65.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:51:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.65.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:51:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.65.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.65.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:51:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.65.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:51:29] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00024-of-00030.safetensors
[2024-07-23 17:51:29] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00025-of-00030.safetensors
[2024-07-23 17:51:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.66.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.66.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:51:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.66.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:51:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.66.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.66.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:51:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.66.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:51:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.67.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.67.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:51:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.67.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:51:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.67.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.67.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:51:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.67.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:51:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.68.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:51:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.68.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:51:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.68.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:51:47] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00025-of-00030.safetensors
[2024-07-23 17:51:47] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00026-of-00030.safetensors
[2024-07-23 17:51:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.68.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.68.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:51:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.68.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.69.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.69.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:51:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.69.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:51:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.69.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.69.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:51:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.69.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:51:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.70.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:51:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.70.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:51:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.70.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:52:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.70.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.70.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:52:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.70.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:52:02] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors
[2024-07-23 17:52:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.71.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:52:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.71.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:52:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.71.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:52:10] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00026-of-00030.safetensors
[2024-07-23 17:52:11] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors
[2024-07-23 17:52:11] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00004-of-00030.safetensors
[2024-07-23 17:52:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:52:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:52:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:52:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:52:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:52:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:52:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:52:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:52:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:52:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:52:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:52:29] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00004-of-00030.safetensors
[2024-07-23 17:52:30] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors
[2024-07-23 17:52:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.71.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.71.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:52:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.71.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.72.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.72.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:52:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.72.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:52:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.72.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.72.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:52:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.72.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:52:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.73.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.73.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:52:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.73.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:52:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.73.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.73.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:52:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.73.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:52:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.74.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:52:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.74.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:52:45] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors
[2024-07-23 17:52:45] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00028-of-00030.safetensors
[2024-07-23 17:52:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.74.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.74.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:52:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.74.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:52:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.74.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.75.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.75.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:52:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.75.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:52:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.75.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.75.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:52:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.75.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:52:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.76.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:52:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.76.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:53:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.76.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:53:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.76.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:53:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.76.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:53:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.76.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:53:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.77.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:53:04] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00028-of-00030.safetensors
[2024-07-23 17:53:04] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00029-of-00030.safetensors
[2024-07-23 17:53:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.77.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:53:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.77.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:53:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.77.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:53:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.77.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:53:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.77.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:53:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.78.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:53:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.78.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:53:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.78.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:53:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.78.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:53:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.78.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:53:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.78.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:53:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.79.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:53:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.79.mlp.down_proj.weight", shape: (8192, 28672), dtype: float16
[2024-07-23 17:53:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.79.mlp.gate_up_proj.weight", shape: (57344, 8192), dtype: float16
[2024-07-23 17:53:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.79.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-07-23 17:53:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.79.self_attn.qkv_proj.weight", shape: (10240, 8192), dtype: float16
[2024-07-23 17:53:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.79.self_attn.o_proj.weight", shape: (8192, 8192), dtype: float16
[2024-07-23 17:53:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.norm.weight", shape: (8192,), dtype: float16
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 483/483 [09:23<00:00, 1.17s/it]
[2024-07-23 17:53:24] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00029-of-00030.safetensors
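The alternating "Loading HF parameters from" / "Unloading HF weight file" lines above show the loader streaming one safetensors shard at a time: each shard is opened, its tensors are mapped and emitted, and the file is released before the next one is opened, which is why peak RAM (reported below) stays near a couple of shards' worth rather than the full amount read from disk. A minimal sketch of that pattern using the safetensors Python package; the names stream_shards and emit are illustrative, not MLC's internals:

from safetensors import safe_open

def stream_shards(shard_paths, emit):
    """Visit every tensor while holding at most one shard in memory."""
    for path in shard_paths:
        # Mirrors "Loading HF parameters from: <path>" in the log.
        with safe_open(path, framework="np") as shard:
            for name in shard.keys():
                emit(name, shard.get_tensor(name))
        # Leaving the with-block mirrors "Unloading HF weight file: <path>".

Here emit(name, tensor) would convert and write each parameter to the output cache before the next shard is opened.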
[2024-07-23 17:53:24] INFO stats.py:77: Time usage: HF loading: 82.243 sec; Pre-quantization mapping: 178.396 sec; Quantization: 0.000 sec
[2024-07-23 17:53:24] INFO stats.py:91: RAM usage: Peak RAM: 17.375 GB. Total bytes loaded from disk: 271.521 GB
[2024-07-23 17:53:24] INFO convert_weight.py:155: Parameter size after quantization: 131.417 GB
[2024-07-23 17:53:24] INFO convert_weight.py:160: Total parameters: 72,885,788,672
[2024-07-23 17:53:24] INFO convert_weight.py:161: Bits per parameter: 15.488
[2024-07-23 17:53:24] INFO convert_weight.py:166: Saved to directory: local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
All finished, 323 total shards committed, record saved to local_dir/Llama-3.1-70B-Instruct-q0f16-MLC/ndarray-cache.json
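The three summary figures are internally consistent if the reported sizes are read as GiB (2^30 bytes): 131.417 GiB of float16 weights over 72,885,788,672 parameters gives exactly the logged 15.488 bits per parameter. A quick sanity check (not part of the conversion output, and assuming the GiB reading):

param_size_bytes = 131.417 * 2**30   # "Parameter size after quantization"
total_params = 72_885_788_672        # "Total parameters"
print(round(param_size_bytes * 8 / total_params, 3))  # -> 15.488 bits per parameter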