Any plans for bigger models, such as 30B?
Thanks to the developers' efforts, I have now experienced the RWKV model's powerful performance.
Will there be bigger models like 30B, 65B, or even 130B in the near future? That would allow the relationship between performance and model size to be fully tested.
I hope an emergent phenomenon appears as the model size increases, that is, that performance improves greatly on bigger models.
Thanks to everyone for your work.
Are there plans for larger models in the future, such as 30B, 65B, or even 130B?
A larger model should bring some improvement, but what I am really looking forward to is emergence, that is, when the model grows past a certain size it starts to display new abilities...
the plan is 24B -> 50B -> 100B this year :)
Let's make sure, as a community, that we can run all of those models on normal desktop hardware! We need some good runtimes 👀
If it really works, the 100B model with 4-bit quantization would probably be possible to run on a desktop with 128GB of RAM :-) That would be so amazing... but it is unknown whether it would keep nearly the same accuracy. ChatGLM did that to their model, and the decrease in accuracy is absolutely minimal on the 130B model, but they were only able to quantize the weights, not the activations (https://github.com/THUDM/GLM-130B/blob/main/docs/quantization.md). It would be interesting to see whether RWKV will face the same problem on a big model.
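Some back-of-the-envelope math supports this: 100B parameters at 4 bits per weight is roughly 50 GB of raw weights, which leaves headroom in 128 GB of RAM for activations kept in full precision. Here is a minimal sketch (in PyTorch, with made-up shapes and function names; this is not the GLM-130B or RWKV implementation) of the weight-only absmax quantization idea being discussed, i.e. quantizing only the weight matrix while activations stay in floating point:

```python
# Illustrative sketch of weight-only 4-bit absmax quantization.
# All names and shapes here are invented for the example.
import torch

def quantize_weights_int4(w: torch.Tensor):
    """Per-output-row absmax quantization of a weight matrix to int4.

    Quantized values are stored in int8 for simplicity; a real runtime
    would pack two 4-bit values per byte.
    """
    # One scale per output row, so an outlier in one row does not
    # inflate the quantization error of every other row.
    scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# Toy check: only the weights are quantized; the activation x stays fp.
w = torch.randn(4096, 4096)
x = torch.randn(1, 4096)
q, s = quantize_weights_int4(w)
y_ref = x @ w.t()                   # full-precision matmul
y_q = x @ dequantize(q, s).t()      # weight-only quantized matmul
print((y_ref - y_q).abs().mean())   # small if absmax quantization fits
```

Activations are harder to treat this way because their outliers are data-dependent, which is presumably why the GLM-130B write-up linked above only quantizes the weights.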
@BlinkDL I really want to say awesome work, dude. You aren't a fork or a mod of someone else's models; you're putting out the original, and I'm amazed. Out of curiosity, what are you using to train your models? Do you have the hardware just lying around to get these done so fast?
I'm asking because I'm wondering if there is anything I might be able to do to help that effort, even if it's just a couple of bucks over PayPal or whatever.
Have any signs of CoT emergence been observed in recent RWKV models? I don't see any in the Q8 7B model.