/usr/local/lib/python3.10/dist-packages/lightning_fabric/connector.py:558: `precision=16` is supported for historical reasons but its usage is discouraged. Please set your precision to 16-mixed instead!
INFO:pytorch_lightning.utilities.rank_zero:Using 16bit Automatic Mixed Precision (AMP)
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loggers/wandb.py:389: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
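The connector warning above refers to the legacy precision=16 spelling; recent Lightning versions prefer "16-mixed". A minimal sketch of a Trainer set up the way these logs suggest (single CUDA device, 16-bit AMP, a reused WandbLogger); the max_epochs value and logger wiring are assumptions, not read from the run:

import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

# Assumed configuration; precision="16-mixed" replaces the discouraged
# precision=16 spelling flagged by the connector warning above.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    precision="16-mixed",   # 16-bit automatic mixed precision (AMP)
    logger=WandbLogger(),   # reuses the in-progress wandb run, per the warning above
    max_epochs=100,         # hypothetical value, not shown in the log
)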
INFO:pytorch_lightning.callbacks.model_summary:
  | Name             | Type                        | Params
-----------------------------------------------------------------
0 | train_acc        | MulticlassAccuracy          | 0
1 | valid_acc        | MulticlassAccuracy          | 0
2 | test_acc         | MulticlassAccuracy          | 0
3 | val_f1_score     | MulticlassF1Score           | 0
4 | train_f1_score   | MulticlassF1Score           | 0
5 | test_f1_score    | MulticlassF1Score           | 0
6 | confusion_matrix | MulticlassConfusionMatrix   | 0
7 | gcn              | SGCN                        | 36.5 K
8 | encoder          | MoE_TransformerGraphEncoder | 6.8 M
9 | out              | Sequential                  | 18.6 K
-----------------------------------------------------------------
6.9 M     Trainable params
0         Non-trainable params
6.9 M     Total params
27.527    Total estimated model params size (MB)
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved. New best score: 0.263
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.135 >= min_delta = 1e-08. New best score: 0.398
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.008 >= min_delta = 1e-08. New best score: 0.406
Epoch 00006: reducing learning rate of group 0 to 5.0000e-04.
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.471 >= min_delta = 1e-08. New best score: 0.877
Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.016 >= min_delta = 1e-08. New best score: 0.893
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.016 >= min_delta = 1e-08. New best score: 0.909
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.006 >= min_delta = 1e-08. New best score: 0.915
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.006 >= min_delta = 1e-08. New best score: 0.920
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.002 >= min_delta = 1e-08. New best score: 0.923
Epoch 00017: reducing learning rate of group 0 to 1.2500e-04.
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.002 >= min_delta = 1e-08. New best score: 0.925
Epoch 00020: reducing learning rate of group 0 to 6.2500e-05.
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.002 >= min_delta = 1e-08. New best score: 0.927
Epoch 00023: reducing learning rate of group 0 to 3.1250e-05.
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.003 >= min_delta = 1e-08. New best score: 0.930
Epoch 00026: reducing learning rate of group 0 to 1.5625e-05.
INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.003 >= min_delta = 1e-08. New best score: 0.933
Epoch 00029: reducing learning rate of group 0 to 7.8125e-06.
Epoch 00032: reducing learning rate of group 0 to 5.0000e-06.
INFO:pytorch_lightning.callbacks.early_stopping:Monitored metric val_accuracy did not improve in the last 50 records. Best score: 0.933. Signaling Trainer to stop.
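The callback messages above are consistent with an EarlyStopping callback watching val_accuracy (min_delta=1e-08, patience of 50 validation checks, mode "max") plus a ReduceLROnPlateau scheduler halving the learning rate from an initial 1e-3 down to a 5e-6 floor. A minimal sketch under those assumptions; the scheduler patience is inferred from the spacing of the reductions, not taken from the code:

import torch
from pytorch_lightning.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="val_accuracy",
    min_delta=1e-08,
    patience=50,   # "did not improve in the last 50 records"
    mode="max",
)

# Assumed configure_optimizers hook on the LightningModule that would produce
# the "reducing learning rate of group 0" messages (factor 0.5, floor 5e-6).
def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=2, min_lr=5e-6  # patience is a guess
    )
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "monitor": "val_accuracy"},
    }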
model_args ModelArgs :  ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8)
model_args =:  ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8)
model_args ModelArgs :  ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8)
model_args =:  ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8)
model_args ModelArgs :  ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8)
model_args =:  ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8)
model_args ModelArgs :  ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8)
model_args =:  ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8)
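The repeated model_args printouts above correspond to a small configuration object; a minimal sketch of dataclasses that would reproduce that repr (field names and values come straight from the printout, everything else is assumed):

from dataclasses import dataclass, field

@dataclass
class MoeArgs:
    num_experts: int = 8
    num_experts_per_tok: int = 2

@dataclass
class ModelArgs:
    dim: int = 128
    hidden_dim: int = 512
    norm_eps: float = 1e-05
    moe: MoeArgs = field(default_factory=MoeArgs)
    max_batch_size: int = 32
    max_seq_len: int = 8

print("model_args =: ", ModelArgs())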
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
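The numbered dump that follows is the model's full module tree; a minimal sketch of the loop that would produce it, assuming model is the MoE_GCN LightningModule built above:

# Walk every submodule and print its index, qualified name and repr,
# yielding lines of the form: 0 -> ('', MoE_GCN(...))
for i, named_module in enumerate(model.named_modules()):
    print(i, "->", named_module)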
0 -> ('', MoE_GCN(
  (train_acc): MulticlassAccuracy()
  (valid_acc): MulticlassAccuracy()
  (test_acc): MulticlassAccuracy()
  (val_f1_score): MulticlassF1Score()
  (train_f1_score): MulticlassF1Score()
  (test_f1_score): MulticlassF1Score()
  (confusion_matrix): MulticlassConfusionMatrix()
  (gcn): SGCN(
    (conv_layers): ModuleList(
      (0): unit_gcn(
        (conv_list): ModuleList(
          (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))
        )
        (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act): Mish()
      )
      (1): unit_gcn(
        (conv_list): ModuleList(
          (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
        )
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act): Mish()
      )
      (2): unit_gcn(
        (conv_list): ModuleList(
          (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
        )
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act): Mish()
      )
    )
  )
  (encoder): MoE_TransformerGraphEncoder(
    (layers): ModuleList(
      (0-3): 4 x MoE_TransformerGraphEncoderLayer(
        (attention): Residual(
          (sublayer): MultiHeadAttention(
            (heads): ModuleList(
              (0-7): 8 x AttentionHead(
                (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
                (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
                (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
              )
            )
            (linear): Linear(in_features=256, out_features=128, bias=True)
          )
          (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (feed_forward): Residual(
          (sublayer): MoeLayer(
            (experts): ModuleList(
              (0-7): 8 x FeedForward(
                (w1): Linear(in_features=128, out_features=512, bias=False)
                (w2): Linear(in_features=512, out_features=128, bias=False)
                (w3): Linear(in_features=128, out_features=512, bias=False)
              )
            )
            (gate): Linear(in_features=128, out_features=8, bias=False)
          )
          (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      )
    )
    (positional_encoder): PositionalEncoder(
      (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    )
  )
  (out): Sequential(
    (0): Linear(in_features=128, out_features=128, bias=True)
    (1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    (2): Linear(in_features=128, out_features=14, bias=True)
  )
))
1 -> ('train_acc', MulticlassAccuracy())
2 -> ('valid_acc', MulticlassAccuracy())
3 -> ('test_acc', MulticlassAccuracy())
4 -> ('val_f1_score', MulticlassF1Score())
5 -> ('train_f1_score', MulticlassF1Score())
6 -> ('test_f1_score', MulticlassF1Score())
7 -> ('confusion_matrix', MulticlassConfusionMatrix())
8 -> ('gcn', SGCN(
  (conv_layers): ModuleList(
    (0): unit_gcn(
      (conv_list): ModuleList(
        (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))
      )
      (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act): Mish()
    )
    (1): unit_gcn(
      (conv_list): ModuleList(
        (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
      )
      (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act): Mish()
    )
    (2): unit_gcn(
      (conv_list): ModuleList(
        (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
      )
      (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act): Mish()
    )
  )
))
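Each unit_gcn above pairs one 1x1 Conv2d with each of the three adjacency subsets, then applies BatchNorm2d and Mish. A rough sketch of the usual ST-GCN-style forward pass for such a block on skeleton features of shape (N, C, T, V) with a stacked adjacency tensor A of shape (3, V, V); this is an assumption about the implementation, not a copy of it:

import torch
import torch.nn as nn

class unit_gcn(nn.Module):
    def __init__(self, in_channels, out_channels, num_subsets=3):
        super().__init__()
        self.conv_list = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=1)
            for _ in range(num_subsets)
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.Mish()

    def forward(self, x, A):
        # x: (N, C, T, V) joint features; A: (num_subsets, V, V) adjacency stack
        y = 0
        for conv, A_k in zip(self.conv_list, A):
            # project per-joint features, then aggregate over the neighbourhood A_k
            y = y + torch.einsum("nctv,vw->nctw", conv(x), A_k)
        return self.act(self.bn(y))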
9 -> ('gcn.conv_layers', ModuleList(
  (0): unit_gcn(
    (conv_list): ModuleList(
      (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))
    )
    (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act): Mish()
  )
  (1): unit_gcn(
    (conv_list): ModuleList(
      (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
    )
    (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act): Mish()
  )
  (2): unit_gcn(
    (conv_list): ModuleList(
      (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
    )
    (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act): Mish()
  )
))
10 -> ('gcn.conv_layers.0', unit_gcn(
  (conv_list): ModuleList(
    (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))
  )
  (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act): Mish()
))
11 -> ('gcn.conv_layers.0.conv_list', ModuleList(
  (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))
))
12 -> ('gcn.conv_layers.0.conv_list.0', Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1)))
13 -> ('gcn.conv_layers.0.conv_list.1', Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1)))
14 -> ('gcn.conv_layers.0.conv_list.2', Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1)))
15 -> ('gcn.conv_layers.0.bn', BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
16 -> ('gcn.conv_layers.0.act', Mish())
17 -> ('gcn.conv_layers.1', unit_gcn(
  (conv_list): ModuleList(
    (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
  )
  (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act): Mish()
))
18 -> ('gcn.conv_layers.1.conv_list', ModuleList(
  (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
))
19 -> ('gcn.conv_layers.1.conv_list.0', Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1)))
20 -> ('gcn.conv_layers.1.conv_list.1', Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1)))
21 -> ('gcn.conv_layers.1.conv_list.2', Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1)))
22 -> ('gcn.conv_layers.1.bn', BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
23 -> ('gcn.conv_layers.1.act', Mish())
24 -> ('gcn.conv_layers.2', unit_gcn(
  (conv_list): ModuleList(
    (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
  )
  (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act): Mish()
))
25 -> ('gcn.conv_layers.2.conv_list', ModuleList(
  (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
))
26 -> ('gcn.conv_layers.2.conv_list.0', Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1)))
27 -> ('gcn.conv_layers.2.conv_list.1', Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1)))
28 -> ('gcn.conv_layers.2.conv_list.2', Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1)))
29 -> ('gcn.conv_layers.2.bn', BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
30 -> ('gcn.conv_layers.2.act', Mish())
31 -> ('encoder', MoE_TransformerGraphEncoder(
  (layers): ModuleList(
    (0-3): 4 x MoE_TransformerGraphEncoderLayer(
      (attention): Residual(
        (sublayer): MultiHeadAttention(
          (heads): ModuleList(
            (0-7): 8 x AttentionHead(
              (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
              (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
              (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (linear): Linear(in_features=256, out_features=128, bias=True)
        )
        (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (feed_forward): Residual(
        (sublayer): MoeLayer(
          (experts): ModuleList(
            (0-7): 8 x FeedForward(
              (w1): Linear(in_features=128, out_features=512, bias=False)
              (w2): Linear(in_features=512, out_features=128, bias=False)
              (w3): Linear(in_features=128, out_features=512, bias=False)
            )
          )
          (gate): Linear(in_features=128, out_features=8, bias=False)
        )
        (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    )
  )
  (positional_encoder): PositionalEncoder(
    (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  )
))
32 -> ('encoder.layers', ModuleList(
  (0-3): 4 x MoE_TransformerGraphEncoderLayer(
    (attention): Residual(
      (sublayer): MultiHeadAttention(
        (heads): ModuleList(
          (0-7): 8 x AttentionHead(
            (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
            (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
            (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
          )
        )
        (linear): Linear(in_features=256, out_features=128, bias=True)
      )
      (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (feed_forward): Residual(
      (sublayer): MoeLayer(
        (experts): ModuleList(
          (0-7): 8 x FeedForward(
            (w1): Linear(in_features=128, out_features=512, bias=False)
            (w2): Linear(in_features=512, out_features=128, bias=False)
            (w3): Linear(in_features=128, out_features=512, bias=False)
          )
        )
        (gate): Linear(in_features=128, out_features=8, bias=False)
      )
      (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  )
))
33 -> ('encoder.layers.0', MoE_TransformerGraphEncoderLayer(
  (attention): Residual(
    (sublayer): MultiHeadAttention(
      (heads): ModuleList(
        (0-7): 8 x AttentionHead(
          (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
          (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
          (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        )
      )
      (linear): Linear(in_features=256, out_features=128, bias=True)
    )
    (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (feed_forward): Residual(
    (sublayer): MoeLayer(
      (experts): ModuleList(
        (0-7): 8 x FeedForward(
          (w1): Linear(in_features=128, out_features=512, bias=False)
          (w2): Linear(in_features=512, out_features=128, bias=False)
          (w3): Linear(in_features=128, out_features=512, bias=False)
        )
      )
      (gate): Linear(in_features=128, out_features=8, bias=False)
    )
    (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
))
34 -> ('encoder.layers.0.attention', Residual(
  (sublayer): MultiHeadAttention(
    (heads): ModuleList(
      (0-7): 8 x AttentionHead(
        (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (linear): Linear(in_features=256, out_features=128, bias=True)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (dropout): Dropout(p=0.1, inplace=False)
))
35 -> ('encoder.layers.0.attention.sublayer', MultiHeadAttention(
  (heads): ModuleList(
    (0-7): 8 x AttentionHead(
      (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (linear): Linear(in_features=256, out_features=128, bias=True)
))
36 -> ('encoder.layers.0.attention.sublayer.heads', ModuleList(
  (0-7): 8 x AttentionHead(
    (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  )
))
37 -> ('encoder.layers.0.attention.sublayer.heads.0', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
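Each AttentionHead above projects the 128-channel graph features to 32-dimensional queries, keys and values with 1x1 convolutions; the eight heads are concatenated (8 x 32 = 256) and mapped back to 128 by the Linear(256, 128) layer. A minimal sketch of what one head plausibly computes; scaled dot-product attention over the V graph nodes at each time step is an assumption:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHead(nn.Module):
    def __init__(self, dim=128, head_dim=32):
        super().__init__()
        self.q_conv = nn.Conv2d(dim, head_dim, kernel_size=1)
        self.k_conv = nn.Conv2d(dim, head_dim, kernel_size=1)
        self.v_conv = nn.Conv2d(dim, head_dim, kernel_size=1)

    def forward(self, x):
        # x: (N, C=128, T, V); attend across the V nodes at every time step
        q, k, v = self.q_conv(x), self.k_conv(x), self.v_conv(x)
        scores = torch.einsum("nctv,nctw->ntvw", q, k) / math.sqrt(q.size(1))
        attn = F.softmax(scores, dim=-1)
        return torch.einsum("ntvw,nctw->nctv", attn, v)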
38 -> ('encoder.layers.0.attention.sublayer.heads.0.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
39 -> ('encoder.layers.0.attention.sublayer.heads.0.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
40 -> ('encoder.layers.0.attention.sublayer.heads.0.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
41 -> ('encoder.layers.0.attention.sublayer.heads.1', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
42 -> ('encoder.layers.0.attention.sublayer.heads.1.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
43 -> ('encoder.layers.0.attention.sublayer.heads.1.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
44 -> ('encoder.layers.0.attention.sublayer.heads.1.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
45 -> ('encoder.layers.0.attention.sublayer.heads.2', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
46 -> ('encoder.layers.0.attention.sublayer.heads.2.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
47 -> ('encoder.layers.0.attention.sublayer.heads.2.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
48 -> ('encoder.layers.0.attention.sublayer.heads.2.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
49 -> ('encoder.layers.0.attention.sublayer.heads.3', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
50 -> ('encoder.layers.0.attention.sublayer.heads.3.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
51 -> ('encoder.layers.0.attention.sublayer.heads.3.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
52 -> ('encoder.layers.0.attention.sublayer.heads.3.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
53 -> ('encoder.layers.0.attention.sublayer.heads.4', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
54 -> ('encoder.layers.0.attention.sublayer.heads.4.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
55 -> ('encoder.layers.0.attention.sublayer.heads.4.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
56 -> ('encoder.layers.0.attention.sublayer.heads.4.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
57 -> ('encoder.layers.0.attention.sublayer.heads.5', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
58 -> ('encoder.layers.0.attention.sublayer.heads.5.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
59 -> ('encoder.layers.0.attention.sublayer.heads.5.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
60 -> ('encoder.layers.0.attention.sublayer.heads.5.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
61 -> ('encoder.layers.0.attention.sublayer.heads.6', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
62 -> ('encoder.layers.0.attention.sublayer.heads.6.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
63 -> ('encoder.layers.0.attention.sublayer.heads.6.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
64 -> ('encoder.layers.0.attention.sublayer.heads.6.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
65 -> ('encoder.layers.0.attention.sublayer.heads.7', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
66 -> ('encoder.layers.0.attention.sublayer.heads.7.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
67 -> ('encoder.layers.0.attention.sublayer.heads.7.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
68 -> ('encoder.layers.0.attention.sublayer.heads.7.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
69 -> ('encoder.layers.0.attention.sublayer.linear', Linear(in_features=256, out_features=128, bias=True))
70 -> ('encoder.layers.0.attention.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
71 -> ('encoder.layers.0.attention.dropout', Dropout(p=0.1, inplace=False))
72 -> ('encoder.layers.0.feed_forward', Residual(
  (sublayer): MoeLayer(
    (experts): ModuleList(
      (0-7): 8 x FeedForward(
        (w1): Linear(in_features=128, out_features=512, bias=False)
        (w2): Linear(in_features=512, out_features=128, bias=False)
        (w3): Linear(in_features=128, out_features=512, bias=False)
      )
    )
    (gate): Linear(in_features=128, out_features=8, bias=False)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (dropout): Dropout(p=0.1, inplace=False)
))
73 -> ('encoder.layers.0.feed_forward.sublayer', MoeLayer(
  (experts): ModuleList(
    (0-7): 8 x FeedForward(
      (w1): Linear(in_features=128, out_features=512, bias=False)
      (w2): Linear(in_features=512, out_features=128, bias=False)
      (w3): Linear(in_features=128, out_features=512, bias=False)
    )
  )
  (gate): Linear(in_features=128, out_features=8, bias=False)
))
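The MoeLayer gate above is a bias-free Linear(128, 8) that scores the eight experts; with num_experts_per_tok=2 from the ModelArgs printout, each token is routed to its two highest-scoring experts, and each expert is presumably a SwiGLU-style FeedForward (w2(silu(w1(x)) * w3(x)), 128 -> 512 -> 128). A minimal sketch under those assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForward(nn.Module):
    def __init__(self, dim=128, hidden_dim=512):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)

    def forward(self, x):
        # SwiGLU: gate the up-projection with SiLU before projecting back down
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class MoeLayer(nn.Module):
    def __init__(self, dim=128, num_experts=8, num_experts_per_tok=2):
        super().__init__()
        self.experts = nn.ModuleList(FeedForward(dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.k = num_experts_per_tok

    def forward(self, x):
        # x: (tokens, dim); pick the top-k experts per token and mix their outputs
        weights, idx = torch.topk(self.gate(x), self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = torch.where(idx == e)
            if token_idx.numel():
                out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out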
74 -> ('encoder.layers.0.feed_forward.sublayer.experts', ModuleList(
  (0-7): 8 x FeedForward(
    (w1): Linear(in_features=128, out_features=512, bias=False)
    (w2): Linear(in_features=512, out_features=128, bias=False)
    (w3): Linear(in_features=128, out_features=512, bias=False)
  )
))
75 -> ('encoder.layers.0.feed_forward.sublayer.experts.0', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
76 -> ('encoder.layers.0.feed_forward.sublayer.experts.0.w1', Linear(in_features=128, out_features=512, bias=False))
77 -> ('encoder.layers.0.feed_forward.sublayer.experts.0.w2', Linear(in_features=512, out_features=128, bias=False))
78 -> ('encoder.layers.0.feed_forward.sublayer.experts.0.w3', Linear(in_features=128, out_features=512, bias=False))
79 -> ('encoder.layers.0.feed_forward.sublayer.experts.1', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
80 -> ('encoder.layers.0.feed_forward.sublayer.experts.1.w1', Linear(in_features=128, out_features=512, bias=False))
81 -> ('encoder.layers.0.feed_forward.sublayer.experts.1.w2', Linear(in_features=512, out_features=128, bias=False))
82 -> ('encoder.layers.0.feed_forward.sublayer.experts.1.w3', Linear(in_features=128, out_features=512, bias=False))
83 -> ('encoder.layers.0.feed_forward.sublayer.experts.2', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
84 -> ('encoder.layers.0.feed_forward.sublayer.experts.2.w1', Linear(in_features=128, out_features=512, bias=False))
85 -> ('encoder.layers.0.feed_forward.sublayer.experts.2.w2', Linear(in_features=512, out_features=128, bias=False))
86 -> ('encoder.layers.0.feed_forward.sublayer.experts.2.w3', Linear(in_features=128, out_features=512, bias=False))
87 -> ('encoder.layers.0.feed_forward.sublayer.experts.3', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
88 -> ('encoder.layers.0.feed_forward.sublayer.experts.3.w1', Linear(in_features=128, out_features=512, bias=False))
89 -> ('encoder.layers.0.feed_forward.sublayer.experts.3.w2', Linear(in_features=512, out_features=128, bias=False))
90 -> ('encoder.layers.0.feed_forward.sublayer.experts.3.w3', Linear(in_features=128, out_features=512, bias=False))
91 -> ('encoder.layers.0.feed_forward.sublayer.experts.4', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
92 -> ('encoder.layers.0.feed_forward.sublayer.experts.4.w1', Linear(in_features=128, out_features=512, bias=False))
93 -> ('encoder.layers.0.feed_forward.sublayer.experts.4.w2', Linear(in_features=512, out_features=128, bias=False))
94 -> ('encoder.layers.0.feed_forward.sublayer.experts.4.w3', Linear(in_features=128, out_features=512, bias=False))
95 -> ('encoder.layers.0.feed_forward.sublayer.experts.5', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
96 -> ('encoder.layers.0.feed_forward.sublayer.experts.5.w1', Linear(in_features=128, out_features=512, bias=False))
97 -> ('encoder.layers.0.feed_forward.sublayer.experts.5.w2', Linear(in_features=512, out_features=128, bias=False))
98 -> ('encoder.layers.0.feed_forward.sublayer.experts.5.w3', Linear(in_features=128, out_features=512, bias=False))
99 -> ('encoder.layers.0.feed_forward.sublayer.experts.6', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
100 -> ('encoder.layers.0.feed_forward.sublayer.experts.6.w1', Linear(in_features=128, out_features=512, bias=False))
101 -> ('encoder.layers.0.feed_forward.sublayer.experts.6.w2', Linear(in_features=512, out_features=128, bias=False))
102 -> ('encoder.layers.0.feed_forward.sublayer.experts.6.w3', Linear(in_features=128, out_features=512, bias=False))
103 -> ('encoder.layers.0.feed_forward.sublayer.experts.7', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
104 -> ('encoder.layers.0.feed_forward.sublayer.experts.7.w1', Linear(in_features=128, out_features=512, bias=False))
105 -> ('encoder.layers.0.feed_forward.sublayer.experts.7.w2', Linear(in_features=512, out_features=128, bias=False))
106 -> ('encoder.layers.0.feed_forward.sublayer.experts.7.w3', Linear(in_features=128, out_features=512, bias=False))
107 -> ('encoder.layers.0.feed_forward.sublayer.gate', Linear(in_features=128, out_features=8, bias=False))
108 -> ('encoder.layers.0.feed_forward.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
109 -> ('encoder.layers.0.feed_forward.dropout', Dropout(p=0.1, inplace=False))
110 -> ('encoder.layers.0.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
111 -> ('encoder.layers.1', MoE_TransformerGraphEncoderLayer(
  (attention): Residual(
    (sublayer): MultiHeadAttention(
      (heads): ModuleList(
        (0-7): 8 x AttentionHead(
          (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
          (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
          (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        )
      )
      (linear): Linear(in_features=256, out_features=128, bias=True)
    )
    (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (feed_forward): Residual(
    (sublayer): MoeLayer(
      (experts): ModuleList(
        (0-7): 8 x FeedForward(
          (w1): Linear(in_features=128, out_features=512, bias=False)
          (w2): Linear(in_features=512, out_features=128, bias=False)
          (w3): Linear(in_features=128, out_features=512, bias=False)
        )
      )
      (gate): Linear(in_features=128, out_features=8, bias=False)
    )
    (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
))
112 -> ('encoder.layers.1.attention', Residual(
  (sublayer): MultiHeadAttention(
    (heads): ModuleList(
      (0-7): 8 x AttentionHead(
        (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (linear): Linear(in_features=256, out_features=128, bias=True)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (dropout): Dropout(p=0.1, inplace=False)
))
113 -> ('encoder.layers.1.attention.sublayer', MultiHeadAttention(
  (heads): ModuleList(
    (0-7): 8 x AttentionHead(
      (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (linear): Linear(in_features=256, out_features=128, bias=True)
))
114 -> ('encoder.layers.1.attention.sublayer.heads', ModuleList(
  (0-7): 8 x AttentionHead(
    (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  )
))
115 -> ('encoder.layers.1.attention.sublayer.heads.0', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
116 -> ('encoder.layers.1.attention.sublayer.heads.0.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
117 -> ('encoder.layers.1.attention.sublayer.heads.0.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
118 -> ('encoder.layers.1.attention.sublayer.heads.0.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
119 -> ('encoder.layers.1.attention.sublayer.heads.1', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
120 -> ('encoder.layers.1.attention.sublayer.heads.1.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
121 -> ('encoder.layers.1.attention.sublayer.heads.1.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
122 -> ('encoder.layers.1.attention.sublayer.heads.1.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
123 -> ('encoder.layers.1.attention.sublayer.heads.2', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
124 -> ('encoder.layers.1.attention.sublayer.heads.2.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
125 -> ('encoder.layers.1.attention.sublayer.heads.2.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
126 -> ('encoder.layers.1.attention.sublayer.heads.2.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
127 -> ('encoder.layers.1.attention.sublayer.heads.3', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
128 -> ('encoder.layers.1.attention.sublayer.heads.3.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
129 -> ('encoder.layers.1.attention.sublayer.heads.3.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
130 -> ('encoder.layers.1.attention.sublayer.heads.3.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
131 -> ('encoder.layers.1.attention.sublayer.heads.4', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
132 -> ('encoder.layers.1.attention.sublayer.heads.4.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
133 -> ('encoder.layers.1.attention.sublayer.heads.4.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
134 -> ('encoder.layers.1.attention.sublayer.heads.4.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
135 -> ('encoder.layers.1.attention.sublayer.heads.5', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
136 -> ('encoder.layers.1.attention.sublayer.heads.5.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
137 -> ('encoder.layers.1.attention.sublayer.heads.5.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
138 -> ('encoder.layers.1.attention.sublayer.heads.5.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
139 -> ('encoder.layers.1.attention.sublayer.heads.6', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
140 -> ('encoder.layers.1.attention.sublayer.heads.6.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
141 -> ('encoder.layers.1.attention.sublayer.heads.6.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
142 -> ('encoder.layers.1.attention.sublayer.heads.6.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
143 -> ('encoder.layers.1.attention.sublayer.heads.7', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
144 -> ('encoder.layers.1.attention.sublayer.heads.7.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
145 -> ('encoder.layers.1.attention.sublayer.heads.7.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
146 -> ('encoder.layers.1.attention.sublayer.heads.7.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
147 -> ('encoder.layers.1.attention.sublayer.linear', Linear(in_features=256, out_features=128, bias=True))
148 -> ('encoder.layers.1.attention.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
149 -> ('encoder.layers.1.attention.dropout', Dropout(p=0.1, inplace=False))
150 -> ('encoder.layers.1.feed_forward', Residual(
  (sublayer): MoeLayer(
    (experts): ModuleList(
      (0-7): 8 x FeedForward(
        (w1): Linear(in_features=128, out_features=512, bias=False)
        (w2): Linear(in_features=512, out_features=128, bias=False)
        (w3): Linear(in_features=128, out_features=512, bias=False)
      )
    )
    (gate): Linear(in_features=128, out_features=8, bias=False)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (dropout): Dropout(p=0.1, inplace=False)
))
151 -> ('encoder.layers.1.feed_forward.sublayer', MoeLayer(
  (experts): ModuleList(
    (0-7): 8 x FeedForward(
      (w1): Linear(in_features=128, out_features=512, bias=False)
      (w2): Linear(in_features=512, out_features=128, bias=False)
      (w3): Linear(in_features=128, out_features=512, bias=False)
    )
  )
  (gate): Linear(in_features=128, out_features=8, bias=False)
))
152 -> ('encoder.layers.1.feed_forward.sublayer.experts', ModuleList(
  (0-7): 8 x FeedForward(
    (w1): Linear(in_features=128, out_features=512, bias=False)
    (w2): Linear(in_features=512, out_features=128, bias=False)
    (w3): Linear(in_features=128, out_features=512, bias=False)
  )
))
153 -> ('encoder.layers.1.feed_forward.sublayer.experts.0', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
154 -> ('encoder.layers.1.feed_forward.sublayer.experts.0.w1', Linear(in_features=128, out_features=512, bias=False))
155 -> ('encoder.layers.1.feed_forward.sublayer.experts.0.w2', Linear(in_features=512, out_features=128, bias=False))
156 -> ('encoder.layers.1.feed_forward.sublayer.experts.0.w3', Linear(in_features=128, out_features=512, bias=False))
157 -> ('encoder.layers.1.feed_forward.sublayer.experts.1', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
158 -> ('encoder.layers.1.feed_forward.sublayer.experts.1.w1', Linear(in_features=128, out_features=512, bias=False))
159 -> ('encoder.layers.1.feed_forward.sublayer.experts.1.w2', Linear(in_features=512, out_features=128, bias=False))
160 -> ('encoder.layers.1.feed_forward.sublayer.experts.1.w3', Linear(in_features=128, out_features=512, bias=False))
161 -> ('encoder.layers.1.feed_forward.sublayer.experts.2', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
162 -> ('encoder.layers.1.feed_forward.sublayer.experts.2.w1', Linear(in_features=128, out_features=512, bias=False))
163 -> ('encoder.layers.1.feed_forward.sublayer.experts.2.w2', Linear(in_features=512, out_features=128, bias=False))
164 -> ('encoder.layers.1.feed_forward.sublayer.experts.2.w3', Linear(in_features=128, out_features=512, bias=False))
165 -> ('encoder.layers.1.feed_forward.sublayer.experts.3', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
166 -> ('encoder.layers.1.feed_forward.sublayer.experts.3.w1', Linear(in_features=128, out_features=512, bias=False))
167 -> ('encoder.layers.1.feed_forward.sublayer.experts.3.w2', Linear(in_features=512, out_features=128, bias=False))
168 -> ('encoder.layers.1.feed_forward.sublayer.experts.3.w3', Linear(in_features=128, out_features=512, bias=False))
169 -> ('encoder.layers.1.feed_forward.sublayer.experts.4', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
170 -> ('encoder.layers.1.feed_forward.sublayer.experts.4.w1', Linear(in_features=128, out_features=512, bias=False))
171 -> ('encoder.layers.1.feed_forward.sublayer.experts.4.w2', Linear(in_features=512, out_features=128, bias=False))
172 -> ('encoder.layers.1.feed_forward.sublayer.experts.4.w3', Linear(in_features=128, out_features=512, bias=False))
173 -> ('encoder.layers.1.feed_forward.sublayer.experts.5', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
174 -> ('encoder.layers.1.feed_forward.sublayer.experts.5.w1', Linear(in_features=128, out_features=512, bias=False))
175 -> ('encoder.layers.1.feed_forward.sublayer.experts.5.w2', Linear(in_features=512, out_features=128, bias=False))
176 -> ('encoder.layers.1.feed_forward.sublayer.experts.5.w3', Linear(in_features=128, out_features=512, bias=False))
177 -> ('encoder.layers.1.feed_forward.sublayer.experts.6', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
178 -> ('encoder.layers.1.feed_forward.sublayer.experts.6.w1', Linear(in_features=128, out_features=512, bias=False))
179 -> ('encoder.layers.1.feed_forward.sublayer.experts.6.w2', Linear(in_features=512, out_features=128, bias=False))
180 -> ('encoder.layers.1.feed_forward.sublayer.experts.6.w3', Linear(in_features=128, out_features=512, bias=False))
181 -> ('encoder.layers.1.feed_forward.sublayer.experts.7', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
182 -> ('encoder.layers.1.feed_forward.sublayer.experts.7.w1', Linear(in_features=128, out_features=512, bias=False))
183 -> ('encoder.layers.1.feed_forward.sublayer.experts.7.w2', Linear(in_features=512, out_features=128, bias=False))
184 -> ('encoder.layers.1.feed_forward.sublayer.experts.7.w3', Linear(in_features=128, out_features=512, bias=False))
185 -> ('encoder.layers.1.feed_forward.sublayer.gate', Linear(in_features=128, out_features=8, bias=False))
186 -> ('encoder.layers.1.feed_forward.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
187 -> ('encoder.layers.1.feed_forward.dropout', Dropout(p=0.1, inplace=False))
188 -> ('encoder.layers.1.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
189 -> ('encoder.layers.2', MoE_TransformerGraphEncoderLayer(
  (attention): Residual(
    (sublayer): MultiHeadAttention(
      (heads): ModuleList(
        (0-7): 8 x AttentionHead(
          (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
          (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
          (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        )
      )
      (linear): Linear(in_features=256, out_features=128, bias=True)
    )
    (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (feed_forward): Residual(
    (sublayer): MoeLayer(
      (experts): ModuleList(
        (0-7): 8 x FeedForward(
          (w1): Linear(in_features=128, out_features=512, bias=False)
          (w2): Linear(in_features=512, out_features=128, bias=False)
          (w3): Linear(in_features=128, out_features=512, bias=False)
        )
      )
      (gate): Linear(in_features=128, out_features=8, bias=False)
    )
    (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
))
190 -> ('encoder.layers.2.attention', Residual(
  (sublayer): MultiHeadAttention(
    (heads): ModuleList(
      (0-7): 8 x AttentionHead(
        (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (linear): Linear(in_features=256, out_features=128, bias=True)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (dropout): Dropout(p=0.1, inplace=False)
))
191 -> ('encoder.layers.2.attention.sublayer', MultiHeadAttention(
  (heads): ModuleList(
    (0-7): 8 x AttentionHead(
      (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (linear): Linear(in_features=256, out_features=128, bias=True)
))
192 -> ('encoder.layers.2.attention.sublayer.heads', ModuleList(
  (0-7): 8 x AttentionHead(
    (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  )
))
193 -> ('encoder.layers.2.attention.sublayer.heads.0', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
194 -> ('encoder.layers.2.attention.sublayer.heads.0.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
195 -> ('encoder.layers.2.attention.sublayer.heads.0.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
196 -> ('encoder.layers.2.attention.sublayer.heads.0.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
197 -> ('encoder.layers.2.attention.sublayer.heads.1', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
198 -> ('encoder.layers.2.attention.sublayer.heads.1.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
199 -> ('encoder.layers.2.attention.sublayer.heads.1.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
200 -> ('encoder.layers.2.attention.sublayer.heads.1.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
201 -> ('encoder.layers.2.attention.sublayer.heads.2', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
202 -> ('encoder.layers.2.attention.sublayer.heads.2.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
203 -> ('encoder.layers.2.attention.sublayer.heads.2.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
204 -> ('encoder.layers.2.attention.sublayer.heads.2.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
205 -> ('encoder.layers.2.attention.sublayer.heads.3', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
206 -> ('encoder.layers.2.attention.sublayer.heads.3.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
207 -> ('encoder.layers.2.attention.sublayer.heads.3.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
208 -> ('encoder.layers.2.attention.sublayer.heads.3.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
209 -> ('encoder.layers.2.attention.sublayer.heads.4', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
210 -> ('encoder.layers.2.attention.sublayer.heads.4.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
211 -> ('encoder.layers.2.attention.sublayer.heads.4.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
212 -> ('encoder.layers.2.attention.sublayer.heads.4.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
213 -> ('encoder.layers.2.attention.sublayer.heads.5', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
214 -> ('encoder.layers.2.attention.sublayer.heads.5.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
215 -> ('encoder.layers.2.attention.sublayer.heads.5.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
216 -> ('encoder.layers.2.attention.sublayer.heads.5.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
217 -> ('encoder.layers.2.attention.sublayer.heads.6', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
218 -> ('encoder.layers.2.attention.sublayer.heads.6.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
219 -> ('encoder.layers.2.attention.sublayer.heads.6.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
220 -> ('encoder.layers.2.attention.sublayer.heads.6.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
221 -> ('encoder.layers.2.attention.sublayer.heads.7', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
222 -> ('encoder.layers.2.attention.sublayer.heads.7.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
223 -> ('encoder.layers.2.attention.sublayer.heads.7.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
224 -> ('encoder.layers.2.attention.sublayer.heads.7.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
225 -> ('encoder.layers.2.attention.sublayer.linear', Linear(in_features=256, out_features=128, bias=True))
226 -> ('encoder.layers.2.attention.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
227 -> ('encoder.layers.2.attention.dropout', Dropout(p=0.1, inplace=False))
228 -> ('encoder.layers.2.feed_forward', Residual(
  (sublayer): MoeLayer(
    (experts): ModuleList(
      (0-7): 8 x FeedForward(
        (w1): Linear(in_features=128, out_features=512, bias=False)
        (w2): Linear(in_features=512, out_features=128, bias=False)
        (w3): Linear(in_features=128, out_features=512, bias=False)
      )
    )
    (gate): Linear(in_features=128, out_features=8, bias=False)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (dropout): Dropout(p=0.1, inplace=False)
))
229 -> ('encoder.layers.2.feed_forward.sublayer', MoeLayer(
  (experts): ModuleList(
    (0-7): 8 x FeedForward(
      (w1): Linear(in_features=128, out_features=512, bias=False)
      (w2): Linear(in_features=512, out_features=128, bias=False)
      (w3): Linear(in_features=128, out_features=512, bias=False)
    )
  )
  (gate): Linear(in_features=128, out_features=8, bias=False)
))
230 -> ('encoder.layers.2.feed_forward.sublayer.experts', ModuleList(
  (0-7): 8 x FeedForward(
    (w1): Linear(in_features=128, out_features=512, bias=False)
    (w2): Linear(in_features=512, out_features=128, bias=False)
    (w3): Linear(in_features=128, out_features=512, bias=False)
  )
))
231 -> ('encoder.layers.2.feed_forward.sublayer.experts.0', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
232 -> ('encoder.layers.2.feed_forward.sublayer.experts.0.w1', Linear(in_features=128, out_features=512, bias=False))
233 -> ('encoder.layers.2.feed_forward.sublayer.experts.0.w2', Linear(in_features=512, out_features=128, bias=False))
234 -> ('encoder.layers.2.feed_forward.sublayer.experts.0.w3', Linear(in_features=128, out_features=512, bias=False))
235 -> ('encoder.layers.2.feed_forward.sublayer.experts.1', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
236 -> ('encoder.layers.2.feed_forward.sublayer.experts.1.w1', Linear(in_features=128, out_features=512, bias=False))
237 -> ('encoder.layers.2.feed_forward.sublayer.experts.1.w2', Linear(in_features=512, out_features=128, bias=False))
238 -> ('encoder.layers.2.feed_forward.sublayer.experts.1.w3', Linear(in_features=128, out_features=512, bias=False))
239 -> ('encoder.layers.2.feed_forward.sublayer.experts.2', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
240 -> ('encoder.layers.2.feed_forward.sublayer.experts.2.w1', Linear(in_features=128, out_features=512, bias=False))
241 -> ('encoder.layers.2.feed_forward.sublayer.experts.2.w2', Linear(in_features=512, out_features=128, bias=False))
242 -> ('encoder.layers.2.feed_forward.sublayer.experts.2.w3', Linear(in_features=128, out_features=512, bias=False))
243 -> ('encoder.layers.2.feed_forward.sublayer.experts.3', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
244 -> ('encoder.layers.2.feed_forward.sublayer.experts.3.w1', Linear(in_features=128, out_features=512, bias=False))
245 -> ('encoder.layers.2.feed_forward.sublayer.experts.3.w2', Linear(in_features=512, out_features=128, bias=False))
246 -> ('encoder.layers.2.feed_forward.sublayer.experts.3.w3', Linear(in_features=128, out_features=512, bias=False))
247 -> ('encoder.layers.2.feed_forward.sublayer.experts.4', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
248 -> ('encoder.layers.2.feed_forward.sublayer.experts.4.w1', Linear(in_features=128, out_features=512, bias=False))
249 -> ('encoder.layers.2.feed_forward.sublayer.experts.4.w2', Linear(in_features=512, out_features=128, bias=False))
250 -> ('encoder.layers.2.feed_forward.sublayer.experts.4.w3', Linear(in_features=128, out_features=512, bias=False))
251 -> ('encoder.layers.2.feed_forward.sublayer.experts.5', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
252 -> ('encoder.layers.2.feed_forward.sublayer.experts.5.w1', Linear(in_features=128, out_features=512, bias=False))
253 -> ('encoder.layers.2.feed_forward.sublayer.experts.5.w2', Linear(in_features=512, out_features=128, bias=False))
254 -> ('encoder.layers.2.feed_forward.sublayer.experts.5.w3', Linear(in_features=128, out_features=512, bias=False))
255 -> ('encoder.layers.2.feed_forward.sublayer.experts.6', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
256 -> ('encoder.layers.2.feed_forward.sublayer.experts.6.w1', Linear(in_features=128, out_features=512, bias=False))
257 -> ('encoder.layers.2.feed_forward.sublayer.experts.6.w2', Linear(in_features=512, out_features=128, bias=False))
258 -> ('encoder.layers.2.feed_forward.sublayer.experts.6.w3', Linear(in_features=128, out_features=512, bias=False))
259 -> ('encoder.layers.2.feed_forward.sublayer.experts.7', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
260 -> ('encoder.layers.2.feed_forward.sublayer.experts.7.w1', Linear(in_features=128, out_features=512, bias=False))
261 -> ('encoder.layers.2.feed_forward.sublayer.experts.7.w2', Linear(in_features=512, out_features=128, bias=False))
262 -> ('encoder.layers.2.feed_forward.sublayer.experts.7.w3', Linear(in_features=128, out_features=512, bias=False))
263 -> ('encoder.layers.2.feed_forward.sublayer.gate', Linear(in_features=128, out_features=8, bias=False))
264 -> ('encoder.layers.2.feed_forward.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
265 -> ('encoder.layers.2.feed_forward.dropout', Dropout(p=0.1, inplace=False))
266 -> ('encoder.layers.2.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
267 -> ('encoder.layers.3', MoE_TransformerGraphEncoderLayer(
  (attention): Residual(
    (sublayer): MultiHeadAttention(
      (heads): ModuleList(
        (0-7): 8 x AttentionHead(
          (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
          (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
          (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        )
      )
      (linear): Linear(in_features=256, out_features=128, bias=True)
    )
    (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (feed_forward): Residual(
    (sublayer): MoeLayer(
      (experts): ModuleList(
        (0-7): 8 x FeedForward(
          (w1): Linear(in_features=128, out_features=512, bias=False)
          (w2): Linear(in_features=512, out_features=128, bias=False)
          (w3): Linear(in_features=128, out_features=512, bias=False)
        )
      )
      (gate): Linear(in_features=128, out_features=8, bias=False)
    )
    (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
))
268 -> ('encoder.layers.3.attention', Residual(
  (sublayer): MultiHeadAttention(
    (heads): ModuleList(
      (0-7): 8 x AttentionHead(
        (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
        (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (linear): Linear(in_features=256, out_features=128, bias=True)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (dropout): Dropout(p=0.1, inplace=False)
))
269 -> ('encoder.layers.3.attention.sublayer', MultiHeadAttention(
  (heads): ModuleList(
    (0-7): 8 x AttentionHead(
      (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (linear): Linear(in_features=256, out_features=128, bias=True)
))
270 -> ('encoder.layers.3.attention.sublayer.heads', ModuleList(
  (0-7): 8 x AttentionHead(
    (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
    (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  )
))
271 -> ('encoder.layers.3.attention.sublayer.heads.0', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
272 -> ('encoder.layers.3.attention.sublayer.heads.0.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
273 -> ('encoder.layers.3.attention.sublayer.heads.0.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
274 -> ('encoder.layers.3.attention.sublayer.heads.0.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
275 -> ('encoder.layers.3.attention.sublayer.heads.1', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
276 -> ('encoder.layers.3.attention.sublayer.heads.1.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
277 -> ('encoder.layers.3.attention.sublayer.heads.1.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
278 -> ('encoder.layers.3.attention.sublayer.heads.1.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
279 -> ('encoder.layers.3.attention.sublayer.heads.2', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
280 -> ('encoder.layers.3.attention.sublayer.heads.2.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
281 -> ('encoder.layers.3.attention.sublayer.heads.2.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
282 -> ('encoder.layers.3.attention.sublayer.heads.2.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
283 -> ('encoder.layers.3.attention.sublayer.heads.3', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
284 -> ('encoder.layers.3.attention.sublayer.heads.3.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
285 -> ('encoder.layers.3.attention.sublayer.heads.3.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
286 -> ('encoder.layers.3.attention.sublayer.heads.3.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
287 -> ('encoder.layers.3.attention.sublayer.heads.4', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
288 -> ('encoder.layers.3.attention.sublayer.heads.4.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
289 -> ('encoder.layers.3.attention.sublayer.heads.4.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
290 -> ('encoder.layers.3.attention.sublayer.heads.4.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
291 -> ('encoder.layers.3.attention.sublayer.heads.5', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
292 -> ('encoder.layers.3.attention.sublayer.heads.5.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
293 -> ('encoder.layers.3.attention.sublayer.heads.5.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
294 -> ('encoder.layers.3.attention.sublayer.heads.5.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
295 -> ('encoder.layers.3.attention.sublayer.heads.6', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
296 -> ('encoder.layers.3.attention.sublayer.heads.6.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
297 -> ('encoder.layers.3.attention.sublayer.heads.6.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
298 -> ('encoder.layers.3.attention.sublayer.heads.6.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
299 -> ('encoder.layers.3.attention.sublayer.heads.7', AttentionHead(
  (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
  (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
))
300 -> ('encoder.layers.3.attention.sublayer.heads.7.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
301 -> ('encoder.layers.3.attention.sublayer.heads.7.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
302 -> ('encoder.layers.3.attention.sublayer.heads.7.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)))
303 -> ('encoder.layers.3.attention.sublayer.linear', Linear(in_features=256, out_features=128, bias=True))
304 -> ('encoder.layers.3.attention.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
305 -> ('encoder.layers.3.attention.dropout', Dropout(p=0.1, inplace=False))
306 -> ('encoder.layers.3.feed_forward', Residual(
  (sublayer): MoeLayer(
    (experts): ModuleList(
      (0-7): 8 x FeedForward(
        (w1): Linear(in_features=128, out_features=512, bias=False)
        (w2): Linear(in_features=512, out_features=128, bias=False)
        (w3): Linear(in_features=128, out_features=512, bias=False)
      )
    )
    (gate): Linear(in_features=128, out_features=8, bias=False)
  )
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (dropout): Dropout(p=0.1, inplace=False)
))
307 -> ('encoder.layers.3.feed_forward.sublayer', MoeLayer(
  (experts): ModuleList(
    (0-7): 8 x FeedForward(
      (w1): Linear(in_features=128, out_features=512, bias=False)
      (w2): Linear(in_features=512, out_features=128, bias=False)
      (w3): Linear(in_features=128, out_features=512, bias=False)
    )
  )
  (gate): Linear(in_features=128, out_features=8, bias=False)
))
308 -> ('encoder.layers.3.feed_forward.sublayer.experts', ModuleList(
  (0-7): 8 x FeedForward(
    (w1): Linear(in_features=128, out_features=512, bias=False)
    (w2): Linear(in_features=512, out_features=128, bias=False)
    (w3): Linear(in_features=128, out_features=512, bias=False)
  )
))
309 -> ('encoder.layers.3.feed_forward.sublayer.experts.0', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
310 -> ('encoder.layers.3.feed_forward.sublayer.experts.0.w1', Linear(in_features=128, out_features=512, bias=False))
311 -> ('encoder.layers.3.feed_forward.sublayer.experts.0.w2', Linear(in_features=512, out_features=128, bias=False))
312 -> ('encoder.layers.3.feed_forward.sublayer.experts.0.w3', Linear(in_features=128, out_features=512, bias=False))
313 -> ('encoder.layers.3.feed_forward.sublayer.experts.1', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
314 -> ('encoder.layers.3.feed_forward.sublayer.experts.1.w1', Linear(in_features=128, out_features=512, bias=False))
315 -> ('encoder.layers.3.feed_forward.sublayer.experts.1.w2', Linear(in_features=512, out_features=128, bias=False))
316 -> ('encoder.layers.3.feed_forward.sublayer.experts.1.w3', Linear(in_features=128, out_features=512, bias=False))
317 -> ('encoder.layers.3.feed_forward.sublayer.experts.2', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
318 -> ('encoder.layers.3.feed_forward.sublayer.experts.2.w1', Linear(in_features=128, out_features=512, bias=False))
319 -> ('encoder.layers.3.feed_forward.sublayer.experts.2.w2', Linear(in_features=512, out_features=128, bias=False))
320 -> ('encoder.layers.3.feed_forward.sublayer.experts.2.w3', Linear(in_features=128, out_features=512, bias=False))
321 -> ('encoder.layers.3.feed_forward.sublayer.experts.3', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
322 -> ('encoder.layers.3.feed_forward.sublayer.experts.3.w1', Linear(in_features=128, out_features=512, bias=False))
323 -> ('encoder.layers.3.feed_forward.sublayer.experts.3.w2', Linear(in_features=512, out_features=128, bias=False))
324 -> ('encoder.layers.3.feed_forward.sublayer.experts.3.w3', Linear(in_features=128, out_features=512, bias=False))
325 -> ('encoder.layers.3.feed_forward.sublayer.experts.4', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
326 -> ('encoder.layers.3.feed_forward.sublayer.experts.4.w1', Linear(in_features=128, out_features=512, bias=False))
327 -> ('encoder.layers.3.feed_forward.sublayer.experts.4.w2', Linear(in_features=512, out_features=128, bias=False))
328 -> ('encoder.layers.3.feed_forward.sublayer.experts.4.w3', Linear(in_features=128, out_features=512, bias=False))
329 -> ('encoder.layers.3.feed_forward.sublayer.experts.5', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
330 -> ('encoder.layers.3.feed_forward.sublayer.experts.5.w1', Linear(in_features=128, out_features=512, bias=False))
331 -> ('encoder.layers.3.feed_forward.sublayer.experts.5.w2', Linear(in_features=512, out_features=128, bias=False))
332 -> ('encoder.layers.3.feed_forward.sublayer.experts.5.w3', Linear(in_features=128, out_features=512, bias=False))
333 -> ('encoder.layers.3.feed_forward.sublayer.experts.6', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
334 -> ('encoder.layers.3.feed_forward.sublayer.experts.6.w1', Linear(in_features=128, out_features=512, bias=False))
335 -> ('encoder.layers.3.feed_forward.sublayer.experts.6.w2', Linear(in_features=512, out_features=128, bias=False))
336 -> ('encoder.layers.3.feed_forward.sublayer.experts.6.w3', Linear(in_features=128, out_features=512, bias=False))
337 -> ('encoder.layers.3.feed_forward.sublayer.experts.7', FeedForward(
  (w1): Linear(in_features=128, out_features=512, bias=False)
  (w2): Linear(in_features=512, out_features=128, bias=False)
  (w3): Linear(in_features=128, out_features=512, bias=False)
))
338 -> ('encoder.layers.3.feed_forward.sublayer.experts.7.w1', Linear(in_features=128, out_features=512, bias=False))
339 -> ('encoder.layers.3.feed_forward.sublayer.experts.7.w2', Linear(in_features=512, out_features=128, bias=False))
340 -> ('encoder.layers.3.feed_forward.sublayer.experts.7.w3', Linear(in_features=128, out_features=512, bias=False))
341 -> ('encoder.layers.3.feed_forward.sublayer.gate', Linear(in_features=128, out_features=8, bias=False))
342 -> ('encoder.layers.3.feed_forward.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
343 -> ('encoder.layers.3.feed_forward.dropout', Dropout(p=0.1, inplace=False))
344 -> ('encoder.layers.3.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
345 -> ('encoder.positional_encoder', PositionalEncoder(
  (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
))
346 -> ('encoder.positional_encoder.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
347 -> ('out', Sequential(
  (0): Linear(in_features=128, out_features=128, bias=True)
  (1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (2): Linear(in_features=128, out_features=14, bias=True)
))
348 -> ('out.0', Linear(in_features=128, out_features=128, bias=True))
349 -> ('out.1', LayerNorm((128,), eps=1e-05, elementwise_affine=True))
350 -> ('out.2', Linear(in_features=128, out_features=14, bias=True))
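The numbered module listing above can be reproduced by enumerating the model's submodules. Below is a minimal sketch, assuming a standard torch.nn.Module instance named model; the exact print format used by the original script is an assumption.

# Minimal sketch (assumption): print every named submodule with a running index,
# matching the "idx -> (name, module)" listing above. Index 0 is the model itself.
import torch.nn as nn

def list_named_modules(model: nn.Module) -> None:
    for idx, name_module in enumerate(model.named_modules()):
        # name_module is a tuple of (qualified name, module)
        print(idx, '->', name_module)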
Counting the model summary and the number of parameters of the MoE_GCN model
model_summary :
model_summary
Layer_name							Number of Parameters
====================================================================================================
										
MulticlassAccuracy()			1548
MulticlassAccuracy()			128
MulticlassAccuracy()			128
MulticlassF1Score()			64
MulticlassF1Score()			1484
MulticlassF1Score()			2112
MulticlassConfusionMatrix()			2112
SGCN(
  (conv_layers): ModuleList(
    (0): unit_gcn(
      (conv_list): ModuleList(
        (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))
      )
      (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act): Mish()
    )
    (1): unit_gcn(
      (conv_list): ModuleList(
        (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
      )
      (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act): Mish()
    )
    (2): unit_gcn(
      (conv_list): ModuleList(
        (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
      )
      (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act): Mish()
    )
  )
)			2112
MoE_TransformerGraphEncoder(
  (layers): ModuleList(
    (0-3): 4 x MoE_TransformerGraphEncoderLayer(
      (attention): Residual(
        (sublayer): MultiHeadAttention(
          (heads): ModuleList(
            (0-7): 8 x AttentionHead(
              (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
              (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
              (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (linear): Linear(in_features=256, out_features=128, bias=True)
        )
        (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (feed_forward): Residual(
        (sublayer): MoeLayer(
          (experts): ModuleList(
            (0-7): 8 x FeedForward(
              (w1): Linear(in_features=128, out_features=512, bias=False)
              (w2): Linear(in_features=512, out_features=128, bias=False)
              (w3): Linear(in_features=128, out_features=512, bias=False)
            )
          )
          (gate): Linear(in_features=128, out_features=8, bias=False)
        )
        (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    )
  )
  (positional_encoder): PositionalEncoder(
    (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  )
)			128
Sequential(
  (0): Linear(in_features=128, out_features=128, bias=True)
  (1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (2): Linear(in_features=128, out_features=14, bias=True)
)			9644
====================================================================================================
Total Params:19460
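The per-child summary above and the per-parameter table below can be produced with helpers along the following lines. This is a minimal sketch, not the project's exact code: the helper names (model_summary, count_parameters) and formatting are assumptions, the third-party prettytable package is used for the second table, and because the printed summary above pairs layer names and counts differently (and reports a much smaller total than the parameter table), this sketch will not reproduce those exact numbers.

# Minimal sketch (assumption, not the project's exact helpers).
import torch.nn as nn
from prettytable import PrettyTable  # pip install prettytable

def model_summary(model: nn.Module) -> int:
    # One row per top-level child module with its trainable parameter count.
    print("model_summary")
    print("Layer_name\t\t\t\t\t\t\tNumber of Parameters")
    print("=" * 100)
    total = 0
    for child in model.children():
        n_params = sum(p.numel() for p in child.parameters() if p.requires_grad)
        total += n_params
        print(f"{child}\t\t\t{n_params}")
    print("=" * 100)
    print(f"Total Params:{total}")
    return total

def count_parameters(model: nn.Module) -> int:
    # One row per named trainable parameter, as in the table below.
    table = PrettyTable(["Modules", "Parameters"])
    total = 0
    for name, parameter in model.named_parameters():
        if not parameter.requires_grad:
            continue
        table.add_row([name, parameter.numel()])
        total += parameter.numel()
    print(table)
    return total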
Counting the parameters of the MoE_GCN model
+------------------------------------------------------------+------------+
|                          Modules                           | Parameters |
+------------------------------------------------------------+------------+
|                   gcn.conv_layers.0.mask                   |    1452    |
|            gcn.conv_layers.0.conv_list.0.weight            |     96     |
|             gcn.conv_layers.0.conv_list.0.bias             |     32     |
|            gcn.conv_layers.0.conv_list.1.weight            |     96     |
|             gcn.conv_layers.0.conv_list.1.bias             |     32     |
|            gcn.conv_layers.0.conv_list.2.weight            |     96     |
|             gcn.conv_layers.0.conv_list.2.bias             |     32     |
|                gcn.conv_layers.0.bn.weight                 |     32     |
|                 gcn.conv_layers.0.bn.bias                  |     32     |
|                   gcn.conv_layers.1.mask                   |    1452    |
|            gcn.conv_layers.1.conv_list.0.weight            |    2048    |
|             gcn.conv_layers.1.conv_list.0.bias             |     64     |
|            gcn.conv_layers.1.conv_list.1.weight            |    2048    |
|             gcn.conv_layers.1.conv_list.1.bias             |     64     |
|            gcn.conv_layers.1.conv_list.2.weight            |    2048    |
|             gcn.conv_layers.1.conv_list.2.bias             |     64     |
|                gcn.conv_layers.1.bn.weight                 |     64     |
|                 gcn.conv_layers.1.bn.bias                  |     64     |
|                   gcn.conv_layers.2.mask                   |    1452    |
|            gcn.conv_layers.2.conv_list.0.weight            |    8192    |
|             gcn.conv_layers.2.conv_list.0.bias             |    128     |
|            gcn.conv_layers.2.conv_list.1.weight            |    8192    |
|             gcn.conv_layers.2.conv_list.1.bias             |    128     |
|            gcn.conv_layers.2.conv_list.2.weight            |    8192    |
|             gcn.conv_layers.2.conv_list.2.bias             |    128     |
|                gcn.conv_layers.2.bn.weight                 |    128     |
|                 gcn.conv_layers.2.bn.bias                  |    128     |
| encoder.layers.0.attention.sublayer.heads.0.q_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.0.q_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.0.k_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.0.k_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.0.v_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.0.v_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.1.q_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.1.q_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.1.k_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.1.k_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.1.v_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.1.v_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.2.q_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.2.q_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.2.k_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.2.k_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.2.v_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.2.v_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.3.q_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.3.q_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.3.k_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.3.k_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.3.v_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.3.v_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.4.q_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.4.q_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.4.k_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.4.k_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.4.v_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.4.v_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.5.q_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.5.q_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.5.k_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.5.k_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.5.v_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.5.v_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.6.q_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.6.q_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.6.k_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.6.k_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.6.v_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.6.v_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.7.q_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.7.q_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.7.k_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.7.k_conv.bias   |     32     |
| encoder.layers.0.attention.sublayer.heads.7.v_conv.weight  |    4096    |
|  encoder.layers.0.attention.sublayer.heads.7.v_conv.bias   |     32     |
|     encoder.layers.0.attention.sublayer.linear.weight      |   32768    |
|      encoder.layers.0.attention.sublayer.linear.bias       |    128     |
|           encoder.layers.0.attention.norm.weight           |    128     |
|            encoder.layers.0.attention.norm.bias            |    128     |
| encoder.layers.0.feed_forward.sublayer.experts.0.w1.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.0.w2.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.0.w3.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.1.w1.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.1.w2.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.1.w3.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.2.w1.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.2.w2.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.2.w3.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.3.w1.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.3.w2.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.3.w3.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.4.w1.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.4.w2.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.4.w3.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.5.w1.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.5.w2.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.5.w3.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.6.w1.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.6.w2.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.6.w3.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.7.w1.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.7.w2.weight |   65536    |
| encoder.layers.0.feed_forward.sublayer.experts.7.w3.weight |   65536    |
|     encoder.layers.0.feed_forward.sublayer.gate.weight     |    1024    |
|         encoder.layers.0.feed_forward.norm.weight          |    128     |
|          encoder.layers.0.feed_forward.norm.bias           |    128     |
|                encoder.layers.0.norm.weight                |    128     |
|                 encoder.layers.0.norm.bias                 |    128     |
| encoder.layers.1.attention.sublayer.heads.0.q_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.0.q_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.0.k_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.0.k_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.0.v_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.0.v_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.1.q_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.1.q_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.1.k_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.1.k_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.1.v_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.1.v_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.2.q_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.2.q_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.2.k_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.2.k_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.2.v_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.2.v_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.3.q_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.3.q_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.3.k_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.3.k_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.3.v_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.3.v_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.4.q_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.4.q_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.4.k_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.4.k_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.4.v_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.4.v_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.5.q_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.5.q_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.5.k_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.5.k_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.5.v_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.5.v_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.6.q_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.6.q_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.6.k_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.6.k_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.6.v_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.6.v_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.7.q_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.7.q_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.7.k_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.7.k_conv.bias   |     32     |
| encoder.layers.1.attention.sublayer.heads.7.v_conv.weight  |    4096    |
|  encoder.layers.1.attention.sublayer.heads.7.v_conv.bias   |     32     |
|     encoder.layers.1.attention.sublayer.linear.weight      |   32768    |
|      encoder.layers.1.attention.sublayer.linear.bias       |    128     |
|           encoder.layers.1.attention.norm.weight           |    128     |
|            encoder.layers.1.attention.norm.bias            |    128     |
| encoder.layers.1.feed_forward.sublayer.experts.0.w1.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.0.w2.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.0.w3.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.1.w1.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.1.w2.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.1.w3.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.2.w1.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.2.w2.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.2.w3.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.3.w1.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.3.w2.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.3.w3.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.4.w1.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.4.w2.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.4.w3.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.5.w1.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.5.w2.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.5.w3.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.6.w1.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.6.w2.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.6.w3.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.7.w1.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.7.w2.weight |   65536    |
| encoder.layers.1.feed_forward.sublayer.experts.7.w3.weight |   65536    |
|     encoder.layers.1.feed_forward.sublayer.gate.weight     |    1024    |
|         encoder.layers.1.feed_forward.norm.weight          |    128     |
|          encoder.layers.1.feed_forward.norm.bias           |    128     |
|                encoder.layers.1.norm.weight                |    128     |
|                 encoder.layers.1.norm.bias                 |    128     |
| encoder.layers.2.attention.sublayer.heads.0.q_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.0.q_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.0.k_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.0.k_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.0.v_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.0.v_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.1.q_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.1.q_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.1.k_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.1.k_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.1.v_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.1.v_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.2.q_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.2.q_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.2.k_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.2.k_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.2.v_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.2.v_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.3.q_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.3.q_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.3.k_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.3.k_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.3.v_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.3.v_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.4.q_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.4.q_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.4.k_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.4.k_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.4.v_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.4.v_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.5.q_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.5.q_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.5.k_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.5.k_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.5.v_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.5.v_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.6.q_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.6.q_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.6.k_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.6.k_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.6.v_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.6.v_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.7.q_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.7.q_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.7.k_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.7.k_conv.bias   |     32     |
| encoder.layers.2.attention.sublayer.heads.7.v_conv.weight  |    4096    |
|  encoder.layers.2.attention.sublayer.heads.7.v_conv.bias   |     32     |
|     encoder.layers.2.attention.sublayer.linear.weight      |   32768    |
|      encoder.layers.2.attention.sublayer.linear.bias       |    128     |
|           encoder.layers.2.attention.norm.weight           |    128     |
|            encoder.layers.2.attention.norm.bias            |    128     |
| encoder.layers.2.feed_forward.sublayer.experts.0.w1.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.0.w2.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.0.w3.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.1.w1.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.1.w2.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.1.w3.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.2.w1.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.2.w2.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.2.w3.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.3.w1.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.3.w2.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.3.w3.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.4.w1.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.4.w2.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.4.w3.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.5.w1.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.5.w2.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.5.w3.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.6.w1.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.6.w2.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.6.w3.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.7.w1.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.7.w2.weight |   65536    |
| encoder.layers.2.feed_forward.sublayer.experts.7.w3.weight |   65536    |
|     encoder.layers.2.feed_forward.sublayer.gate.weight     |    1024    |
|         encoder.layers.2.feed_forward.norm.weight          |    128     |
|          encoder.layers.2.feed_forward.norm.bias           |    128     |
|                encoder.layers.2.norm.weight                |    128     |
|                 encoder.layers.2.norm.bias                 |    128     |
| encoder.layers.3.attention.sublayer.heads.0.q_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.0.q_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.0.k_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.0.k_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.0.v_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.0.v_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.1.q_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.1.q_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.1.k_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.1.k_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.1.v_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.1.v_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.2.q_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.2.q_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.2.k_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.2.k_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.2.v_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.2.v_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.3.q_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.3.q_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.3.k_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.3.k_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.3.v_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.3.v_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.4.q_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.4.q_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.4.k_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.4.k_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.4.v_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.4.v_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.5.q_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.5.q_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.5.k_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.5.k_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.5.v_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.5.v_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.6.q_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.6.q_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.6.k_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.6.k_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.6.v_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.6.v_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.7.q_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.7.q_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.7.k_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.7.k_conv.bias   |     32     |
| encoder.layers.3.attention.sublayer.heads.7.v_conv.weight  |    4096    |
|  encoder.layers.3.attention.sublayer.heads.7.v_conv.bias   |     32     |
|     encoder.layers.3.attention.sublayer.linear.weight      |   32768    |
|      encoder.layers.3.attention.sublayer.linear.bias       |    128     |
|           encoder.layers.3.attention.norm.weight           |    128     |
|            encoder.layers.3.attention.norm.bias            |    128     |
| encoder.layers.3.feed_forward.sublayer.experts.0.w1.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.0.w2.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.0.w3.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.1.w1.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.1.w2.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.1.w3.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.2.w1.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.2.w2.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.2.w3.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.3.w1.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.3.w2.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.3.w3.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.4.w1.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.4.w2.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.4.w3.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.5.w1.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.5.w2.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.5.w3.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.6.w1.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.6.w2.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.6.w3.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.7.w1.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.7.w2.weight |   65536    |
| encoder.layers.3.feed_forward.sublayer.experts.7.w3.weight |   65536    |
|     encoder.layers.3.feed_forward.sublayer.gate.weight     |    1024    |
|         encoder.layers.3.feed_forward.norm.weight          |    128     |
|          encoder.layers.3.feed_forward.norm.bias           |    128     |
|                encoder.layers.3.norm.weight                |    128     |
|                 encoder.layers.3.norm.bias                 |    128     |
|           encoder.positional_encoder.norm.weight           |    128     |
|            encoder.positional_encoder.norm.bias            |    128     |
|                        out.0.weight                        |   16384    |
|                         out.0.bias                         |    128     |
|                        out.1.weight                        |    128     |
|                         out.1.bias                         |    128     |
|                        out.2.weight                        |    1792    |
|                         out.2.bias                         |     14     |
+------------------------------------------------------------+------------+
Total Trainable Params: 6881810
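Note: the summary above matches the output of the widely used PrettyTable parameter-counting helper; the sketch below shows that common pattern. It is an assumption about how the table was produced, not the notebook's exact code.

from prettytable import PrettyTable

def count_parameters(model):
    # Walk the model's named parameters, list each trainable tensor and its
    # element count, and print the grand total, as in the table above.
    table = PrettyTable(["Modules", "Parameters"])
    total_params = 0
    for name, parameter in model.named_parameters():
        if not parameter.requires_grad:
            continue  # frozen tensors are not counted
        n = parameter.numel()
        table.add_row([name, n])
        total_params += n
    print(table)
    print(f"Total Trainable Params: {total_params}")
    return total_params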
FLOPs of the MoE_GCN model using OpenAI_flops = 2083328 FLOPs
FLOPs of the MoE_GCN model using DeepMind = 20748288 FLOPs
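For context, "OpenAI_flops" and "DeepMind" usually refer to the two standard FLOP-estimation conventions from the scaling-laws literature. The sketch below shows only those textbook per-token rules of thumb; the notebook's own counters evidently include additional terms (the printed values are not simply 2N or 6N of the 6,881,810 parameters), so treat this as illustrative naming, not the exact formulas used above.

def openai_forward_flops_per_token(n_params: int) -> int:
    # Kaplan et al. style estimate: roughly 2 FLOPs (one multiply, one add)
    # per parameter touched in a forward pass.
    return 2 * n_params

def deepmind_train_flops_per_token(n_params: int) -> int:
    # Chinchilla-style estimate: roughly 6 FLOPs per parameter per token
    # for a full forward + backward training step.
    return 6 * n_params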
Collecting torchstat
  Downloading torchstat-0.0.7-py3-none-any.whl (11 kB)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from torchstat) (2.1.0+cu121)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from torchstat) (1.23.5)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from torchstat) (1.5.3)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas->torchstat) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->torchstat) (2023.3.post1)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (3.13.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (4.5.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (3.2.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (2023.6.0)
Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (2.1.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->torchstat) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->torchstat) (2.1.4)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->torchstat) (1.3.0)
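torchstat is normally driven through its stat() helper; a hedged sketch of that call pattern follows. The stand-in model and the (C, H, W)-style input size are assumptions for illustration only, not the actual MoE_GCN invocation (torchstat 0.0.7 is written for image-shaped per-sample inputs).

import torch.nn as nn
from torchstat import stat

# Minimal stand-in built from modules torchstat supports (Conv2d, ReLU, MaxPool2d).
model = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2))
stat(model, (3, 22, 8))  # per-sample input size; torchstat prepends a batch dim of 1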
+------------------------------------------------------------+------------+
|                          Modules                           | Parameters |
+------------------------------------------------------------+------------+
|                   gcn.conv_layers.0.mask                   |    1452    |
|            gcn.conv_layers.0.conv_list.0.weight            |     96     |
|             gcn.conv_layers.0.conv_list.0.bias             |     32     |
|            gcn.conv_layers.0.conv_list.1.weight            |     96     |
|             gcn.conv_layers.0.conv_list.1.bias             |     32     |
|            gcn.conv_layers.0.conv_list.2.weight            |     96     |
|             gcn.conv_layers.0.conv_list.2.bias             |     32     |
|                gcn.conv_layers.0.bn.weight                 |     32     |
|                 gcn.conv_layers.0.bn.bias                  |     32     |
          [ 0.4739, -0.4411,  0.5949],
          ...,
          [ 0.4923, -0.3621,  0.5645],
          [ 0.5081, -0.3883,  0.5798],
          [ 0.5182, -0.3990,  0.5934]]],
        [[[ 0.4553, -0.4093,  0.5347],
          [ 0.4465, -0.3465,  0.5095],
          [ 0.4286, -0.3852,  0.5241],
          ...,
          [ 0.4509, -0.3077,  0.4728],
          [ 0.4479, -0.3254,  0.4838],
          [ 0.4530, -0.3408,  0.4978]],
         [[ 0.4037, -0.3051,  0.4236],
          [ 0.3883, -0.2474,  0.4012],
          [ 0.3734, -0.2841,  0.4104],
          ...,
          [ 0.4092, -0.2127,  0.3748],
          [ 0.4185, -0.2356,  0.3855],
          [ 0.4252, -0.2515,  0.3991]],
         [[ 0.3537, -0.2618,  0.3555],
          [ 0.3258, -0.2090,  0.3282],
          [ 0.3217, -0.2452,  0.3408],
          ...,
          [ 0.3415, -0.1797,  0.2958],
          [ 0.3506, -0.2016,  0.3046],
          [ 0.3611, -0.2148,  0.3186]],
         ...,
         [[ 0.4537, -0.3549,  0.5033],
          [ 0.4227, -0.3063,  0.4904],
          [ 0.4194, -0.3419,  0.4946],
          ...,
          [ 0.4334, -0.2780,  0.4736],
          [ 0.4372, -0.3001,  0.4873],
          [ 0.4425, -0.3181,  0.4995]],
         [[ 0.4640, -0.3862,  0.5401],
          [ 0.4427, -0.3413,  0.5315],
          [ 0.4335, -0.3730,  0.5314],
          ...,
          [ 0.4738, -0.3113,  0.5228],
          [ 0.4929, -0.3358,  0.5369],
          [ 0.5051, -0.3468,  0.5482]],
         [[ 0.4655, -0.4041,  0.5552],
          [ 0.4422, -0.3530,  0.5404],
          [ 0.4380, -0.3914,  0.5487],
          ...,
          [ 0.4567, -0.3171,  0.5157],
          [ 0.4776, -0.3411,  0.5297],
          [ 0.4967, -0.3589,  0.5422]]],
        ...,
        [[[ 0.4761, -0.3570,  0.5141],
          [ 0.4648, -0.3141,  0.5161],
          [ 0.4576, -0.3426,  0.5166],
          ...,
          [ 0.5142, -0.2463,  0.5095],
          [ 0.5202, -0.2300,  0.5076],
          [ 0.5223, -0.2150,  0.5060]],
         [[ 0.3927, -0.2960,  0.4408],
          [ 0.3717, -0.2445,  0.4256],
          [ 0.3685, -0.2817,  0.4376],
          ...,
          [ 0.3918, -0.1666,  0.3913],
          [ 0.3900, -0.1506,  0.3827],
          [ 0.3875, -0.1388,  0.3751]],
         [[ 0.3311, -0.2876,  0.3770],
          [ 0.3134, -0.2340,  0.3532],
          [ 0.3020, -0.2676,  0.3640],
          ...,
          [ 0.3366, -0.1689,  0.3178],
          [ 0.3309, -0.1637,  0.3046],
          [ 0.3255, -0.1634,  0.2930]],
         ...,
         [[ 0.3970, -0.3313,  0.4459],
          [ 0.3739, -0.2776,  0.4274],
          [ 0.3668, -0.3141,  0.4362],
          ...,
          [ 0.3846, -0.2333,  0.3969],
          [ 0.3707, -0.2444,  0.3857],
          [ 0.3743, -0.2543,  0.3905]],
         [[ 0.4111, -0.3530,  0.4816],
          [ 0.3957, -0.3066,  0.4776],
          [ 0.3829, -0.3379,  0.4756],
          ...,
          [ 0.4256, -0.2702,  0.4716],
          [ 0.4197, -0.2851,  0.4655],
          [ 0.4262, -0.3008,  0.4746]],
         [[ 0.4676, -0.4057,  0.5600],
          [ 0.4560, -0.3730,  0.5605],
          [ 0.4467, -0.3971,  0.5617],
          ...,
          [ 0.4902, -0.2986,  0.5512],
          [ 0.4911, -0.2836,  0.5483],
          [ 0.5034, -0.2781,  0.5599]]],
        [[[ 0.4721, -0.4069,  0.5961],
          [ 0.4707, -0.3673,  0.6005],
          [ 0.4602, -0.3962,  0.6026],
          ...,
          [ 0.4986, -0.2846,  0.5858],
          [ 0.5057, -0.2652,  0.5839],
          [ 0.5095, -0.2495,  0.5822]],
         [[ 0.4048, -0.3119,  0.4769],
          [ 0.3951, -0.2639,  0.4670],
          [ 0.3832, -0.2988,  0.4758],
          ...,
          [ 0.4350, -0.1855,  0.4479],
          [ 0.4395, -0.1692,  0.4425],
          [ 0.4435, -0.1551,  0.4377]],
         [[ 0.3602, -0.2710,  0.3874],
          [ 0.3429, -0.2231,  0.3582],
          [ 0.3350, -0.2542,  0.3804],
          ...,
          [ 0.3495, -0.1652,  0.3141],
          [ 0.3383, -0.1565,  0.3000],
          [ 0.3286, -0.1488,  0.2876]],
         ...,
         [[ 0.4653, -0.3848,  0.5433],
          [ 0.4530, -0.3342,  0.5315],
          [ 0.4418, -0.3660,  0.5397],
          ...,
          [ 0.4616, -0.3031,  0.5096],
          [ 0.4674, -0.3284,  0.5250],
          [ 0.4782, -0.3437,  0.5373]],
         [[ 0.4766, -0.4024,  0.5700],
          [ 0.4672, -0.3583,  0.5674],
          [ 0.4549, -0.3893,  0.5701],
          ...,
          [ 0.4827, -0.3129,  0.5544],
          [ 0.4868, -0.3336,  0.5710],
          [ 0.4939, -0.3441,  0.5810]],
         [[ 0.4893, -0.4281,  0.5998],
          [ 0.4710, -0.3846,  0.5879],
          [ 0.4665, -0.4158,  0.5996],
          ...,
          [ 0.4830, -0.3421,  0.5596],
          [ 0.4835, -0.3640,  0.5749],
          [ 0.4894, -0.3746,  0.5868]]],
        [[[ 0.4661, -0.4360,  0.6053],
          [ 0.4644, -0.3942,  0.6059],
          [ 0.4469, -0.4223,  0.6035],
          ...,
          [ 0.5182, -0.3283,  0.6059],
          [ 0.5178, -0.3333,  0.6037],
          [ 0.5292, -0.3455,  0.6152]],
         [[ 0.4357, -0.3562,  0.5250],
          [ 0.4243, -0.3055,  0.5140],
          [ 0.4129, -0.3416,  0.5214],
          ...,
          [ 0.4613, -0.2225,  0.4948],
          [ 0.4654, -0.2058,  0.4900],
          [ 0.4691, -0.1910,  0.4858]],
         [[ 0.3920, -0.3078,  0.4274],
          [ 0.3592, -0.2533,  0.4066],
          [ 0.3542, -0.2891,  0.4083],
          ...,
          [ 0.3804, -0.2067,  0.3905],
          [ 0.3754, -0.2042,  0.3783],
          [ 0.3706, -0.2030,  0.3676]],
         ...,
         [[ 0.4513, -0.4002,  0.5616],
          [ 0.4358, -0.3517,  0.5431],
          [ 0.4258, -0.3837,  0.5538],
          ...,
          [ 0.4320, -0.3158,  0.5094],
          [ 0.4378, -0.3351,  0.5229],
          [ 0.4490, -0.3474,  0.5371]],
         [[ 0.4701, -0.4148,  0.5681],
          [ 0.4479, -0.3735,  0.5515],
          [ 0.4469, -0.4041,  0.5673],
          ...,
          [ 0.4323, -0.2901,  0.5100],
          [ 0.4419, -0.3081,  0.5235],
          [ 0.4596, -0.3278,  0.5353]],
         [[ 0.4701, -0.4338,  0.5783],
          [ 0.4548, -0.3844,  0.5665],
          [ 0.4445, -0.4182,  0.5731],
          ...,
          [ 0.4607, -0.3052,  0.5426],
          [ 0.4555, -0.3033,  0.5344],
          [ 0.4608, -0.3103,  0.5418]]]]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0])])
2
dict_keys(['skeleton', 'label'])
tensor([[[[ 0.4939, -0.4103,  0.5915],
          [ 0.4737, -0.3553,  0.5707],
          [ 0.4599, -0.3896,  0.5775],
          ...,
          [ 0.4801, -0.2952,  0.5397],
          [ 0.4706, -0.3054,  0.5279],
          [ 0.4703, -0.3167,  0.5310]],
         [[ 0.4864, -0.3711,  0.5602],
          [ 0.4609, -0.3071,  0.5350],
          [ 0.4513, -0.3446,  0.5424],
          ...,
          [ 0.4731, -0.2591,  0.5014],
          [ 0.4591, -0.2642,  0.4880],
          [ 0.4564, -0.2782,  0.4883]],
         [[ 0.4212, -0.2986,  0.4492],
          [ 0.3987, -0.2458,  0.4205],
          [ 0.3858, -0.2760,  0.4282],
          ...,
          [ 0.4163, -0.2065,  0.3856],
          [ 0.3996, -0.2124,  0.3711],
          [ 0.3905, -0.2104,  0.3684]],
         ...,
         [[ 0.4576, -0.3808,  0.5104],
          [ 0.4329, -0.3245,  0.4925],
          [ 0.4201, -0.3548,  0.4903],
          ...,
          [ 0.4615, -0.2862,  0.4808],
          [ 0.4557, -0.3022,  0.4882],
          [ 0.4561, -0.3109,  0.4997]],
         [[ 0.4845, -0.4035,  0.5531],
          [ 0.4633, -0.3530,  0.5352],
          [ 0.4509, -0.3827,  0.5375],
          ...,
          [ 0.4786, -0.3131,  0.5140],
          [ 0.4782, -0.3320,  0.5252],
          [ 0.4785, -0.3434,  0.5350]],
         [[ 0.5020, -0.4166,  0.5738],
          [ 0.4699, -0.3629,  0.5480],
          [ 0.4667, -0.3985,  0.5586],
          ...,
          [ 0.4755, -0.3057,  0.5077],
          [ 0.4767, -0.3267,  0.5179],
          [ 0.4811, -0.3479,  0.5321]]],
        [[[ 0.4939, -0.4563,  0.5789],
          [ 0.4638, -0.3953,  0.5506],
          [ 0.4620, -0.4382,  0.5657],
          ...,
          [ 0.4632, -0.3589,  0.5063],
          [ 0.4661, -0.3812,  0.5163],
          [ 0.4745, -0.4018,  0.5311]],
         [[ 0.4456, -0.3599,  0.5136],
          [ 0.4279, -0.2952,  0.4838],
          [ 0.4181, -0.3321,  0.5008],
          ...,
          [ 0.4411, -0.2501,  0.4356],
          [ 0.4369, -0.2676,  0.4452],
          [ 0.4389, -0.2807,  0.4601]],
         [[ 0.3839, -0.2790,  0.4051],
          [ 0.3551, -0.2230,  0.3712],
          [ 0.3479, -0.2560,  0.3870],
          ...,
          [ 0.3680, -0.1768,  0.3253],
          [ 0.3622, -0.1935,  0.3308],
          [ 0.3616, -0.2029,  0.3452]],
         ...,
         [[ 0.4551, -0.3857,  0.5213],
          [ 0.4337, -0.3419,  0.5154],
          [ 0.4222, -0.3709,  0.5111],
          ...,
          [ 0.4649, -0.3003,  0.5134],
          [ 0.4720, -0.3259,  0.5276],
          [ 0.4770, -0.3510,  0.5386]],
         [[ 0.4612, -0.4040,  0.5458],
          [ 0.4535, -0.3696,  0.5527],
          [ 0.4339, -0.3905,  0.5403],
          ...,
          [ 0.5015, -0.3442,  0.5705],
          [ 0.5089, -0.3708,  0.5856],
          [ 0.5125, -0.3900,  0.5944]],
         [[ 0.5000, -0.4536,  0.5998],
          [ 0.4767, -0.4069,  0.5885],
          [ 0.4739, -0.4411,  0.5949],
          ...,
          [ 0.4923, -0.3621,  0.5645],
          [ 0.5081, -0.3883,  0.5798],
          [ 0.5182, -0.3990,  0.5934]]],
        [[[ 0.4553, -0.4093,  0.5347],
          [ 0.4465, -0.3465,  0.5095],
          [ 0.4286, -0.3852,  0.5241],
          ...,
          [ 0.4509, -0.3077,  0.4728],
          [ 0.4479, -0.3254,  0.4838],
          [ 0.4530, -0.3408,  0.4978]],
         [[ 0.4037, -0.3051,  0.4236],
          [ 0.3883, -0.2474,  0.4012],
          [ 0.3734, -0.2841,  0.4104],
          ...,
          [ 0.4092, -0.2127,  0.3748],
          [ 0.4185, -0.2356,  0.3855],
          [ 0.4252, -0.2515,  0.3991]],
         [[ 0.3537, -0.2618,  0.3555],
          [ 0.3258, -0.2090,  0.3282],
          [ 0.3217, -0.2452,  0.3408],
          ...,
          [ 0.3415, -0.1797,  0.2958],
          [ 0.3506, -0.2016,  0.3046],
          [ 0.3611, -0.2148,  0.3186]],
         ...,
         [[ 0.4537, -0.3549,  0.5033],
          [ 0.4227, -0.3063,  0.4904],
          [ 0.4194, -0.3419,  0.4946],
          ...,
          [ 0.4334, -0.2780,  0.4736],
          [ 0.4372, -0.3001,  0.4873],
          [ 0.4425, -0.3181,  0.4995]],
         [[ 0.4640, -0.3862,  0.5401],
          [ 0.4427, -0.3413,  0.5315],
          [ 0.4335, -0.3730,  0.5314],
          ...,
          [ 0.4738, -0.3113,  0.5228],
          [ 0.4929, -0.3358,  0.5369],
          [ 0.5051, -0.3468,  0.5482]],
         [[ 0.4655, -0.4041,  0.5552],
          [ 0.4422, -0.3530,  0.5404],
          [ 0.4380, -0.3914,  0.5487],
          ...,
          [ 0.4567, -0.3171,  0.5157],
          [ 0.4776, -0.3411,  0.5297],
          [ 0.4967, -0.3589,  0.5422]]],
        ...,
        [[[ 0.4761, -0.3570,  0.5141],
          [ 0.4648, -0.3141,  0.5161],
          [ 0.4576, -0.3426,  0.5166],
          ...,
          [ 0.5142, -0.2463,  0.5095],
          [ 0.5202, -0.2300,  0.5076],
          [ 0.5223, -0.2150,  0.5060]],
         [[ 0.3927, -0.2960,  0.4408],
          [ 0.3717, -0.2445,  0.4256],
          [ 0.3685, -0.2817,  0.4376],
          ...,
          [ 0.3918, -0.1666,  0.3913],
          [ 0.3900, -0.1506,  0.3827],
          [ 0.3875, -0.1388,  0.3751]],
         [[ 0.3311, -0.2876,  0.3770],
          [ 0.3134, -0.2340,  0.3532],
          [ 0.3020, -0.2676,  0.3640],
          ...,
          [ 0.3366, -0.1689,  0.3178],
          [ 0.3309, -0.1637,  0.3046],
          [ 0.3255, -0.1634,  0.2930]],
         ...,
         [[ 0.3970, -0.3313,  0.4459],
          [ 0.3739, -0.2776,  0.4274],
          [ 0.3668, -0.3141,  0.4362],
          ...,
          [ 0.3846, -0.2333,  0.3969],
          [ 0.3707, -0.2444,  0.3857],
          [ 0.3743, -0.2543,  0.3905]],
         [[ 0.4111, -0.3530,  0.4816],
          [ 0.3957, -0.3066,  0.4776],
          [ 0.3829, -0.3379,  0.4756],
          ...,
          [ 0.4256, -0.2702,  0.4716],
          [ 0.4197, -0.2851,  0.4655],
          [ 0.4262, -0.3008,  0.4746]],
         [[ 0.4676, -0.4057,  0.5600],
          [ 0.4560, -0.3730,  0.5605],
          [ 0.4467, -0.3971,  0.5617],
          ...,
          [ 0.4902, -0.2986,  0.5512],
          [ 0.4911, -0.2836,  0.5483],
          [ 0.5034, -0.2781,  0.5599]]],
        [[[ 0.4721, -0.4069,  0.5961],
          [ 0.4707, -0.3673,  0.6005],
          [ 0.4602, -0.3962,  0.6026],
          ...,
          [ 0.4986, -0.2846,  0.5858],
          [ 0.5057, -0.2652,  0.5839],
          [ 0.5095, -0.2495,  0.5822]],
         [[ 0.4048, -0.3119,  0.4769],
          [ 0.3951, -0.2639,  0.4670],
          [ 0.3832, -0.2988,  0.4758],
          ...,
          [ 0.4350, -0.1855,  0.4479],
          [ 0.4395, -0.1692,  0.4425],
          [ 0.4435, -0.1551,  0.4377]],
         [[ 0.3602, -0.2710,  0.3874],
          [ 0.3429, -0.2231,  0.3582],
          [ 0.3350, -0.2542,  0.3804],
          ...,
          [ 0.3495, -0.1652,  0.3141],
          [ 0.3383, -0.1565,  0.3000],
          [ 0.3286, -0.1488,  0.2876]],
         ...,
         [[ 0.4653, -0.3848,  0.5433],
          [ 0.4530, -0.3342,  0.5315],
          [ 0.4418, -0.3660,  0.5397],
          ...,
          [ 0.4616, -0.3031,  0.5096],
          [ 0.4674, -0.3284,  0.5250],
          [ 0.4782, -0.3437,  0.5373]],
         [[ 0.4766, -0.4024,  0.5700],
          [ 0.4672, -0.3583,  0.5674],
          [ 0.4549, -0.3893,  0.5701],
          ...,
          [ 0.4827, -0.3129,  0.5544],
          [ 0.4868, -0.3336,  0.5710],
          [ 0.4939, -0.3441,  0.5810]],
         [[ 0.4893, -0.4281,  0.5998],
          [ 0.4710, -0.3846,  0.5879],
          [ 0.4665, -0.4158,  0.5996],
          ...,
          [ 0.4830, -0.3421,  0.5596],
          [ 0.4835, -0.3640,  0.5749],
          [ 0.4894, -0.3746,  0.5868]]],
        [[[ 0.4661, -0.4360,  0.6053],
          [ 0.4644, -0.3942,  0.6059],
          [ 0.4469, -0.4223,  0.6035],
          ...,
          [ 0.5182, -0.3283,  0.6059],
          [ 0.5178, -0.3333,  0.6037],
          [ 0.5292, -0.3455,  0.6152]],
         [[ 0.4357, -0.3562,  0.5250],
          [ 0.4243, -0.3055,  0.5140],
          [ 0.4129, -0.3416,  0.5214],
          ...,
          [ 0.4613, -0.2225,  0.4948],
          [ 0.4654, -0.2058,  0.4900],
          [ 0.4691, -0.1910,  0.4858]],
         [[ 0.3920, -0.3078,  0.4274],
          [ 0.3592, -0.2533,  0.4066],
          [ 0.3542, -0.2891,  0.4083],
          ...,
          [ 0.3804, -0.2067,  0.3905],
          [ 0.3754, -0.2042,  0.3783],
          [ 0.3706, -0.2030,  0.3676]],
         ...,
         [[ 0.4513, -0.4002,  0.5616],
          [ 0.4358, -0.3517,  0.5431],
          [ 0.4258, -0.3837,  0.5538],
          ...,
          [ 0.4320, -0.3158,  0.5094],
          [ 0.4378, -0.3351,  0.5229],
          [ 0.4490, -0.3474,  0.5371]],
         [[ 0.4701, -0.4148,  0.5681],
          [ 0.4479, -0.3735,  0.5515],
          [ 0.4469, -0.4041,  0.5673],
          ...,
          [ 0.4323, -0.2901,  0.5100],
          [ 0.4419, -0.3081,  0.5235],
          [ 0.4596, -0.3278,  0.5353]],
         [[ 0.4701, -0.4338,  0.5783],
          [ 0.4548, -0.3844,  0.5665],
          [ 0.4445, -0.4182,  0.5731],
          ...,
          [ 0.4607, -0.3052,  0.5426],
          [ 0.4555, -0.3033,  0.5344],
          [ 0.4608, -0.3103,  0.5418]]]])
skeleton
label
Tensor_dataT.size() =  torch.Size([32, 8, 22, 3])
Tensor_dataT [[[ 0.45649411 -0.44376922  0.64408398]
   ... (remaining 22-row blocks of 3-value joint coordinates omitted; the printed output breaks off mid-array)
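A minimal sketch of how a batch shaped like the printout above could be unpacked. Reading the four dimensions of torch.Size([32, 8, 22, 3]) as (clips, frames, joints, xyz) is an assumption made here for illustration only; the stand-in tensor below is random and is not the notebook's data.

```python
import torch

# Stand-in for dataT['skeleton']; the real batch is not reproduced here.
skeleton = torch.rand(32, 8, 22, 3)
clips, frames, joints, coords = skeleton.shape
print(clips, frames, joints, coords)             # 32 8 22 3

first_frame = skeleton[0, 0]                     # all joints of the first frame of the first clip
one_joint = first_frame[5]                       # a single 3-value row, as in the dump above
print(first_frame.shape, one_joint.shape)        # torch.Size([22, 3]) torch.Size([3])
```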
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
<ipython-input-186-f54b70cf0824>:8: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  Tensor_dataT = torch.tensor(dataT['skeleton']);
<ipython-input-186-f54b70cf0824>:9: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  Tensor_labelsT = torch.tensor(dataT['label']);
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
<ipython-input-187-dfd265fbff9e>:8: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  Tensor_dataT = torch.tensor(dataT['skeleton']);
<ipython-input-187-dfd265fbff9e>:9: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  Tensor_labelsT = torch.tensor(dataT['label']);
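The UserWarning above is raised because torch.tensor() is being called on objects that are already tensors, which forces a fresh copy. A minimal sketch of the change the warning suggests follows; `dataT` here is a stand-in dict mimicking the batch (a 'skeleton' tensor and a 'label' tensor), not the notebook's actual variable.

```python
import torch

# Stand-in batch; the real dataT is not reproduced here.
dataT = {'skeleton': torch.rand(32, 8, 22, 3), 'label': torch.randint(0, 5, (32,))}

# clone().detach() is the spelling the warning recommends for an explicit copy:
Tensor_dataT   = dataT['skeleton'].clone().detach()
Tensor_labelsT = dataT['label'].clone().detach()

# torch.as_tensor() also silences the warning and skips the copy entirely
# when the input is already a tensor:
Tensor_dataT   = torch.as_tensor(dataT['skeleton'])
Tensor_labelsT = torch.as_tensor(dataT['label'])
```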
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
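The repeated matplotlib warning fires whenever float RGB data passed to imshow() falls outside [0, 1]. One way to avoid it is to min-max rescale the array before plotting, as in the sketch below; `img` is a random placeholder, not the array the notebook actually plots.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data with values outside [0, 1], which would trigger the clipping warning.
img = np.random.uniform(-0.5, 0.7, size=(8, 22, 3))

# Min-max rescale to [0, 1]; the small epsilon guards against a constant image.
img_norm = (img - img.min()) / (img.max() - img.min() + 1e-8)

plt.imshow(img_norm)   # no clipping warning: all values are now in [0, 1]
plt.show()
```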