Thank you for briefly explaining the difference between dense and MoE as well as active parameters. Of all these AI channels, I think you might be the best one. Love your content and explanations!
"Since it's open source everyone can fix it" > illegal to use in the EU
Fireship's video about llama 4 was so disappointing, lazy, and half of it was an ad. I'm starting to like bycloud a lot more, he really does go in-depth and explains things better. (Although he should stop using outdated memes 😭)
As much as I like the fact that a big company keeps supporting the open source LLM landscape, I just gotta be honest that I've never been impressed with the Llama models. They're bloated and just never performed well for anything I tried on them. And now they've gotten so huge that if you manage to get them to run at all, you'll definitely be able to run something like DeepSeek comfortably as well, and I'd rather use that.
A 1.5B model outperforming a 109B one is funny
You've been the best in AI news and overviews for years now, and with great humor. Glad I came across you
Well, at 6:03 the reason Llama 4 400B is bad at instruction following is that it isn't instruction fine-tuned yet; it is still in pre-training. Llama 4 109B was. Also, as far as I know, Llama 4 400B was a teacher model for Maverick and Scout only during the pre-training process.
It feels like there's a secret sauce for making models better at coding. I have a small piece of JavaScript (originally created by an older LLM, and faulty) that I use to measure LLM bug-fix performance. There have been a lot of open weight models since around the beginning of last year that manage to produce the fixed code just fine, but they're usually 8B or above and not fully open source. I even tested some 12B-14B reasoning models, and they mostly failed spectacularly after thousands of tokens. Either these models have never seen a piece of JavaScript, are overtrained on Python, or there is a secret training set the open source community doesn't have.
Llama-4 is already asking for vacation days?
Technically, 17B active parameters could be runnable on RAM at relatively OK speeds. This could work for the Digits/Ryzen AI/Mac.
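A quick back-of-envelope sketch of why that could be plausible, assuming decoding is memory-bandwidth-bound and that roughly the 17B active parameters have to be streamed from RAM for every token. The bandwidth figures and the 4-bit quantization below are assumptions for illustration, not measured values for any specific machine.

```python
# Rough upper bound on decode speed for an MoE model on a unified-memory machine.
# Assumption: each decoded token streams roughly the active parameter set from RAM,
# so tokens/sec <= memory_bandwidth / bytes_per_token.
# Bandwidth numbers are ballpark assumptions, not measurements.

ACTIVE_PARAMS = 17e9      # ~17B active parameters per token (Llama 4 Scout/Maverick)
BYTES_PER_PARAM = 0.5     # assume ~4-bit quantization (0.5 bytes per weight)

machines_gbps = {         # assumed memory bandwidth in GB/s
    "Digits-class box": 273,
    "Ryzen AI-class APU": 256,
    "Mac (M-series Max)": 546,
}

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM  # ~8.5 GB streamed per token

for name, bw in machines_gbps.items():
    tps = (bw * 1e9) / bytes_per_token
    print(f"{name}: <= {tps:.0f} tokens/s (bandwidth-bound estimate)")
```

Real throughput would land below these numbers once prompt processing and KV-cache reads are factored in, but it suggests "relatively OK speeds" is in the right ballpark.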
I'm loving these - keep it up!
I found something funny:
- It seems to be excellent at answering/explaining questions.
- But it is atrocious at completing any slightly complex task.
I agree it is odd. And where are the smaller models? What's going on?
Having context is good, but if it makes mistakes about what was in that context, it's kind of pointless, isn't it?
4:03 I thought only Gemini-2.0-flash could output images?
Hello bycloud. It would be great if you could share your sources with us. Your videos are pretty detailed and really good, and getting access to some of your sources would be a nice bonus.
Maybe we've reached the peak use cases of the transformer? Maybe newer, improved architectures like Mamba??
I can recommend them some fanfiction that'll crack the 1M token limit.
I've found all of the Llama models/generations to be generally really underwhelming tbf, so not surprised by this.