Any plans for bigger models, such as 30B?
Thanks to the developers' efforts, I have now experienced the RWKV model's powerful performance.
Will there be bigger models like 30B, 65B, or even 130B in the near future? That would allow the relationship between performance and model size to be fully tested.
I hope an emergent phenomenon appears as the model size increases, that is, that performance improves greatly on bigger models.
Thanks to everyone for your work.
Are there plans for larger models in the future, such as 30B, 65B, or even 130B?
A larger model should bring some improvement, but what I am really looking forward to is emergence, that is, when the model grows past a certain size it starts to display new abilities...
the plan is 24B -> 50B -> 100B this year :)
Let's make sure, as a community, that we can run all of those models on normal desktop hardware! We need some good runtimes 👀
If it really works, the 100B model with 4-bit quantization would probably be possible to run on a desktop with 128GB of RAM :-) That would be so amazing... but it is unknown whether it would keep nearly the same accuracy. ChatGLM did that to their model, and the decrease in accuracy is absolutely minimal on the 130B model, but they were only able to quantize the weights, not the activations (https://github.com/THUDM/GLM-130B/blob/main/docs/quantization.md). It would be interesting to see whether RWKV will face the same problem on a big model.
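Some back-of-the-envelope math supports this: 100B parameters at 4 bits per weight is roughly 50 GB of raw weights, which leaves headroom in 128 GB of RAM for activations kept in full precision. Here is a minimal sketch (in PyTorch, with made-up shapes and function names; this is not the GLM-130B or RWKV implementation) of the weight-only absmax quantization idea being discussed, i.e. quantizing only the weight matrix while activations stay in floating point:

```python
# Illustrative sketch of weight-only 4-bit absmax quantization.
# All names and shapes here are invented for the example.
import torch

def quantize_weights_int4(w: torch.Tensor):
    """Per-output-row absmax quantization of a weight matrix to int4.

    Quantized values are stored in int8 for simplicity; a real runtime
    would pack two 4-bit values per byte.
    """
    # One scale per output row, so an outlier in one row does not
    # inflate the quantization error of every other row.
    scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# Toy check: only the weights are quantized; the activation x stays fp.
w = torch.randn(4096, 4096)
x = torch.randn(1, 4096)
q, s = quantize_weights_int4(w)
y_ref = x @ w.t()                   # full-precision matmul
y_q = x @ dequantize(q, s).t()      # weight-only quantized matmul
print((y_ref - y_q).abs().mean())   # small if absmax quantization fits
```

Activations are harder to treat this way because their outliers are data-dependent, which is presumably why the GLM-130B write-up linked above only quantizes the weights.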
@BlinkDL I really want to say awesome work, dude. You aren't a fork or a mod of someone else's models; you're putting out the original, and I'm amazed. Out of curiosity, what are you using to train your models? Do you have the hardware just lying around to get these done so fast?
I'm asking because I'm wondering if there is anything I might be able to do to help that effort, even if it's just a couple of bucks over PayPal or whatever.
Have any signs of CoT emergence been observed in recent RWKV models? I don't see any in the Q8 7B model.