Differences between Lightning Attention1 and Lightning Attention2 code implementations

#2
by hanshifan - opened

Hello, I have two questions:

  1. In this repository, I noticed that the implementations of lightning attention1 and lightning attention2 appear to be identical.
  2. The implementation of lightning attention2 in this repository differs from the code at https://github.com/OpenNLPLab/lightning-attention. When I benchmarked the two implementations, I found that this repository's version of lightning attention2 is less computationally efficient than the one from that GitHub repository.
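For context on point 2, a minimal sketch of the kind of timing harness one might use for such a comparison (warmup runs, then averaging over several iterations). The functions `attn_repo` and `attn_github` below are hypothetical placeholders, not the actual implementations from either codebase:

```python
import time

def benchmark(fn, *args, warmup=3, iters=10):
    """Return average seconds per call, after warmup runs."""
    for _ in range(warmup):
        fn(*args)  # warmup: exclude one-time setup/compile costs
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

# Hypothetical stand-ins for the two lightning attention2 implementations;
# in a real test these would be the actual forward passes on identical inputs.
def attn_repo(x):
    return [v * 2 for v in x]

def attn_github(x):
    return [v + v for v in x]

x = list(range(1024))
t_repo = benchmark(attn_repo, x)
t_github = benchmark(attn_github, x)
print(f"repo impl: {t_repo * 1e6:.1f} us/iter, github impl: {t_github * 1e6:.1f} us/iter")
```

When comparing GPU kernels specifically, one would also need to synchronize the device before reading the clock, since kernel launches are asynchronous.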
