The `c4ai-command-r-08-2024` (*NOT* 'plus') model might need the last 2 layers excluding...
by jukofyork
For all the control vectors I have uploaded here, I already exclude the last layer, but the `c4ai-command-r-08-2024` model looks a bit odd:
```
Loading pre/post prompt stems from 'data/prompt_stems.json'... Done (50 + 50 loaded).
Loading prompt continuations from 'data/writing_style_continuations/language.json'... Done (3 classes; each with 10 continuations loaded).
Loading writing prompts from 'data/writing_prompts.txt'... Done (11835 loaded).
Generating dataset samples... Done ([3 classes x 2730 prompts] 8190 generated).
Loading '/mnt/data/c4ai-command-r-08-2024' model and tokenizer...
Loading checkpoint shards: 100%|██████████| 14/14 [00:18<00:00, 1.31s/it]
Tokenizing prompts: 100%|██████████| 8190/8190 [00:03<00:00, 2067.67it/s]
Sampling hidden states: 100%|██████████| 8190/8190 [43:52<00:00, 3.11it/s]
Saving to 'command-r-08-2024:32b-language__hidden_state_samples.pt'... Done.
```
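In case it helps anyone reading along, here is a minimal sketch of roughly what the sampling stage above does: grab the last-token hidden state at every layer for each prompt and save the stacks to a `.pt` file. The class names, prompts, and file layout are illustrative assumptions, not the actual script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/mnt/data/c4ai-command-r-08-2024"  # path from the log above

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# Placeholder prompts; the real run uses 3 classes x 2730 prompts each,
# built from the prompt stems + per-class continuations.
prompts_by_class = {
    "class_a": ["Write the opening chapter of a mystery novel."],
    "class_b": ["Write the opening chapter of a mystery novel."],
    "class_c": ["Write the opening chapter of a mystery novel."],
}

@torch.no_grad()
def last_token_states(prompt: str) -> torch.Tensor:
    """Return a (num_layers + 1, hidden_size) stack of last-token hidden states."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model(**inputs, output_hidden_states=True)
    # out.hidden_states is a tuple: embedding output plus one entry per layer.
    return torch.stack([h[0, -1].float().cpu() for h in out.hidden_states])

samples = {cls: torch.stack([last_token_states(p) for p in prompts])
           for cls, prompts in prompts_by_class.items()}
torch.save(samples, "command-r-08-2024:32b-language__hidden_state_samples.pt")
```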
```
Testing Eigenvector Directions for layers 1 to 39:
- Layer 1: [1/8192 filtered] [1/8192 selected] Δ = 484%, Δσ² = 82.9%, σ = (0.040, 0.039), μ = (-0.081, 0.093 [53.5%]) --> μ' = (0.006, -0.087, 0.087)
- Layer 2: [1/8192 filtered] [1/8192 selected] Δ = 328%, Δσ² = 76.6%, σ = (0.071, 0.062), μ = (-0.118, 0.124 [51.4%]) --> μ' = (0.003, -0.121, 0.121)
- Layer 3: [1/8192 filtered] [1/8192 selected] Δ = 318%, Δσ² = 76.1%, σ = (0.053, 0.058), μ = (-0.095, 0.104 [52.1%]) --> μ' = (0.004, -0.100, 0.100)
- Layer 4: [1/8192 filtered] [1/8192 selected] Δ = 326%, Δσ² = 76.5%, σ = (0.063, 0.059), μ = (-0.110, 0.110 [50.1%]) --> μ' = (0.000, -0.110, 0.110)
- Layer 5: [1/8192 filtered] [1/8192 selected] Δ = 311%, Δσ² = 75.7%, σ = (0.104, 0.092), μ = (-0.172, 0.173 [50.2%]) --> μ' = (0.001, -0.173, 0.173)
- Layer 6: [1/8192 filtered] [1/8192 selected] Δ = 376%, Δσ² = 79.0%, σ = (0.103, 0.093), μ = (-0.190, 0.191 [50.2%]) --> μ' = (0.001, -0.190, 0.190)
- Layer 7: [1/8192 filtered] [1/8192 selected] Δ = 533%, Δσ² = 84.2%, σ = (0.194, 0.189), μ = (-0.438, 0.445 [50.4%]) --> μ' = (0.004, -0.442, 0.442)
- Layer 8: [1/8192 filtered] [1/8192 selected] Δ = 177%, Δσ² = 63.9%, σ = (0.355, 0.327), μ = (-0.451, 0.457 [50.4%]) --> μ' = (0.003, -0.454, 0.454)
- Layer 9: [1/8192 filtered] [1/8192 selected] Δ = 322%, Δσ² = 76.3%, σ = (0.304, 0.256), μ = (-0.513, 0.495 [49.1%]) --> μ' = (-0.009, -0.504, 0.504)
- Layer 10: [1/8192 filtered] [1/8192 selected] Δ = 359%, Δσ² = 78.2%, σ = (0.361, 0.303), μ = (-0.640, 0.620 [49.2%]) --> μ' = (-0.010, -0.630, 0.630)
- Layer 11: [1/8192 filtered] [1/8192 selected] Δ = 411%, Δσ² = 80.4%, σ = (0.362, 0.304), μ = (-0.693, 0.663 [48.9%]) --> μ' = (-0.015, -0.678, 0.678)
- Layer 12: [1/8192 filtered] [1/8192 selected] Δ = 362%, Δσ² = 78.4%, σ = (0.490, 0.347), μ = (-0.819, 0.797 [49.3%]) --> μ' = (-0.011, -0.808, 0.808)
- Layer 13: [1/8192 filtered] [1/8192 selected] Δ = 402%, Δσ² = 80.1%, σ = (0.440, 0.336), μ = (-0.794, 0.777 [49.5%]) --> μ' = (-0.008, -0.786, 0.786)
- Layer 14: [1/8192 filtered] [1/8192 selected] Δ = 406%, Δσ² = 80.2%, σ = (0.432, 0.375), μ = (-0.826, 0.805 [49.4%]) --> μ' = (-0.010, -0.815, 0.815)
- Layer 15: [1/8192 filtered] [1/8192 selected] Δ = 347%, Δσ² = 77.6%, σ = (0.869, 0.837), μ = (-1.601, 1.577 [49.6%]) --> μ' = (-0.012, -1.589, 1.589)
- Layer 16: [1/8192 filtered] [1/8192 selected] Δ = 348%, Δσ² = 77.7%, σ = (0.758, 0.810), μ = (-1.478, 1.451 [49.5%]) --> μ' = (-0.014, -1.465, 1.465)
- Layer 17: [1/8192 filtered] [1/8192 selected] Δ = 376%, Δσ² = 79.0%, σ = (0.774, 0.767), μ = (-1.548, 1.440 [48.2%]) --> μ' = (-0.054, -1.494, 1.494)
- Layer 18: [1/8192 filtered] [1/8192 selected] Δ = 295%, Δσ² = 74.7%, σ = (1.847, 2.050), μ = (-3.320, 3.379 [50.4%]) --> μ' = (0.029, -3.350, 3.350)
- Layer 19: [1/8192 filtered] [1/8192 selected] Δ = 269%, Δσ² = 72.9%, σ = (1.918, 2.467), μ = (-3.522, 3.725 [51.4%]) --> μ' = (0.102, -3.623, 3.623)
- Layer 20: [1/8192 filtered] [1/8192 selected] Δ = 264%, Δσ² = 72.5%, σ = (1.966, 2.447), μ = (-3.555, 3.655 [50.7%]) --> μ' = (0.050, -3.605, 3.605)
- Layer 21: [1/8192 filtered] [1/8192 selected] Δ = 416%, Δσ² = 80.6%, σ = (2.123, 2.250), μ = (-4.450, 4.469 [50.1%]) --> μ' = (0.010, -4.460, 4.460)
- Layer 22: [1/8192 filtered] [1/8192 selected] Δ = 425%, Δσ² = 80.9%, σ = (2.139, 2.362), μ = (-4.411, 4.873 [52.5%]) --> μ' = (0.231, -4.642, 4.642)
- Layer 23: [1/8192 filtered] [1/8192 selected] Δ = 414%, Δσ² = 80.5%, σ = (2.409, 2.619), μ = (-4.793, 5.444 [53.2%]) --> μ' = (0.326, -5.118, 5.118)
- Layer 24: [1/8192 filtered] [1/8192 selected] Δ = 413%, Δσ² = 80.5%, σ = (2.571, 2.794), μ = (-4.981, 5.932 [54.4%]) --> μ' = (0.475, -5.456, 5.456)
- Layer 25: [1/8192 filtered] [1/8192 selected] Δ = 588%, Δσ² = 85.5%, σ = (3.041, 3.471), μ = (-7.186, 8.640 [54.6%]) --> μ' = (0.727, -7.913, 7.913)
- Layer 26: [1/8192 filtered] [1/8192 selected] Δ = 425%, Δσ² = 81.0%, σ = (3.390, 4.202), μ = (-6.925, 8.812 [56.0%]) --> μ' = (0.943, -7.869, 7.869)
- Layer 27: [1/8192 filtered] [1/8192 selected] Δ = 367%, Δσ² = 78.6%, σ = (4.587, 6.006), μ = (-8.345, 12.118 [59.2%]) --> μ' = (1.887, -10.231, 10.231)
- Layer 28: [1/8192 filtered] [1/8192 selected] Δ = 381%, Δσ² = 79.2%, σ = (4.967, 6.463), μ = (-9.155, 13.327 [59.3%]) --> μ' = (2.086, -11.241, 11.241)
- Layer 29: [1/8192 filtered] [1/8192 selected] Δ = 337%, Δσ² = 77.1%, σ = (5.460, 7.438), μ = (-9.122, 14.810 [61.9%]) --> μ' = (2.844, -11.966, 11.966)
- Layer 30: [1/8192 filtered] [1/8192 selected] Δ = 333%, Δσ² = 76.9%, σ = (6.468, 7.391), μ = (-9.324, 16.030 [63.2%]) --> μ' = (3.353, -12.677, 12.677)
- Layer 31: [1/8192 filtered] [1/8192 selected] Δ = 309%, Δσ² = 75.6%, σ = (7.178, 9.654), μ = (-10.768, 19.141 [64.0%]) --> μ' = (4.186, -14.955, 14.955)
- Layer 32: [1/8192 filtered] [1/8192 selected] Δ = 288%, Δσ² = 74.2%, σ = (6.315, 9.500), μ = (-9.564, 17.803 [65.1%]) --> μ' = (4.120, -13.683, 13.683)
- Layer 33: [2/8192 filtered] [1/8192 selected] Δ = 260%, Δσ² = 72.2%, σ = (10.247, 12.558), μ = (-12.651, 24.317 [65.8%]) --> μ' = (5.833, -18.484, 18.484)
- Layer 34: [1/8192 filtered] [1/8192 selected] Δ = 277%, Δσ² = 73.4%, σ = (9.566, 12.614), μ = (-12.380, 24.846 [66.7%]) --> μ' = (6.233, -18.613, 18.613)
- Layer 35: [1/8192 filtered] [1/8192 selected] Δ = 292%, Δσ² = 74.5%, σ = (11.524, 13.668), μ = (-15.386, 27.774 [64.4%]) --> μ' = (6.194, -21.580, 21.580)
- Layer 36: [1/8192 filtered] [1/8192 selected] Δ = 277%, Δσ² = 73.5%, σ = (14.924, 16.169), μ = (-18.578, 33.208 [64.1%]) --> μ' = (7.315, -25.893, 25.893)
- Layer 37: [1/8192 filtered] [1/8192 selected] Δ = 277%, Δσ² = 73.5%, σ = (17.302, 23.267), μ = (-23.946, 44.281 [64.9%]) --> μ' = (10.167, -34.113, 34.113)
- Layer 38: [1/8192 filtered] [1/8192 selected] Δ = 265%, Δσ² = 72.6%, σ = (21.397, 29.677), μ = (-29.617, 54.661 [64.9%]) --> μ' = (12.522, -42.139, 42.139)
- Layer 39: [1/8192 filtered] [1/8192 selected] Δ = 285%, Δσ² = 74.0%, σ = (355.582, 395.628), μ = (-453.536, 815.162 [64.3%]) --> μ' = (180.813, -634.349, 634.349)
```
The numbers seem to grow massively for the second-to-last layer too:
```
- Layer 38: [1/8192 filtered] [1/8192 selected] Δ = 265%, Δσ² = 72.6%, σ = (21.397, 29.677), μ = (-29.617, 54.661 [64.9%]) --> μ' = (12.522, -42.139, 42.139)
- Layer 39: [1/8192 filtered] [1/8192 selected] Δ = 285%, Δσ² = 74.0%, σ = (355.582, 395.628), μ = (-453.536, 815.162 [64.3%]) --> μ' = (180.813, -634.349, 634.349)
```
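(As an aside, reading the log, each μ' triple appears to be the midpoint and ∓half-separation of the two projected class means: for layer 39, (-453.536 + 815.162)/2 = 180.813 and (815.162 + 453.536)/2 = 634.349.)

If you want to sanity-check the blow-up yourself from the saved samples, the growth shows up directly in the raw hidden-state magnitudes. A quick sketch, assuming the `{class: (num_prompts, num_layers+1, hidden_size)}` layout from the earlier snippet (the real file format may differ):

```python
import torch

samples = torch.load("command-r-08-2024:32b-language__hidden_state_samples.pt")

all_states = torch.cat(list(samples.values()))    # (N, num_layers+1, hidden)
mean_norms = all_states.norm(dim=-1).mean(dim=0)  # mean L2 norm per layer

for layer, n in enumerate(mean_norms.tolist()):
    print(f"layer {layer:2d}: mean ||h|| = {n:10.2f}")
# Expect the final one or two layers to dwarf everything else, matching
# the sigma/mu blow-up for layers 38-39 in the log above.
```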
To exclude this second-to-last layer you'll need to use llama.cpp's `--control-vector-layer-range` option:

```
--control-vector-layer-range START END
        layer range to apply the control vector(s) to, start and end inclusive
```

So in this case you would add `--control-vector-layer-range 1 38` to your other command-line arguments.
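For example, something like this (the binary name and GGUF filenames are just placeholders; substitute whatever model and control-vector files you're actually using):

```sh
./llama-cli -m c4ai-command-r-08-2024-Q6_K.gguf \
    --control-vector command-r-08-2024:32b-language__debias.gguf \
    --control-vector-layer-range 1 38 \
    -p "Write the opening chapter of a mystery novel."
```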
NOTE: It may actually turn out to be fine: all models have huge numbers in their last layer like this, and including/excluding them didn't make a noticeable difference either way... I just can't test it, as both my GPUs are busy creating more control vectors and I didn't notice this until just now.
jukofyork changed discussion title from "The `c4ai-command-r-08-2024` model might need the last 2 layers excluding..." to "The `c4ai-command-r-08-2024` (*NOT* 'plus') model might need the last 2 layers excluding..."