They're using 8-bit floating-point (FP8) precision instead of the standard 16-bit (FP16) precision most labs use. That significantly cuts down VRAM usage and training time, which is what allowed them to get by with less hardware.
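To see why halving the bytes per parameter matters, here's a back-of-envelope sketch of weight-storage cost at the two precisions. The 671B parameter count is DeepSeek-V3's reported total; the function name is mine, and this counts only raw weight storage (not activations, gradients, or optimizer state):

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed just to store the model weights, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

PARAMS = 671_000_000_000  # DeepSeek-V3's reported total parameter count

fp16 = weight_memory_gb(PARAMS, 2)  # 16-bit floats: 2 bytes per parameter
fp8 = weight_memory_gb(PARAMS, 1)   # 8-bit floats: 1 byte per parameter

print(f"FP16 weights: {fp16:.0f} GB")  # → 1342 GB
print(f"FP8 weights:  {fp8:.0f} GB")   # → 671 GB
```

Training costs scale with more than weight storage, but the same 2× factor shows up in activation and communication traffic, which is where the real savings come from.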
DeepSeek was originally developed for the Tiangong space station from older Russian software; Russia's top AI, Argon-16, ran on the Mir space station for 25 years without failure under the code name "The Entity".
Inference-time compute scaling has nothing to do with how many experts are activated during each forward pass. Each forward pass activates 8 experts out of 256 routed experts per MoE layer; each expert is a separate MLP block at a given layer (attention is shared), and the model is 61 layers deep. Routing/selection of experts happens for every token generated, and the redundancy across experts improves accuracy, since it compensates for routing errors on out-of-distribution sequences. Inference-time compute scaling, however, is literally just the thinking section of the model's response: you can simply disallow the end-of-thinking token to make the model think longer. If you get into more exotic research, ITCS will usually mean things like PRMs / reward search, but that is not what DeepSeek is doing, so it's irrelevant here.
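The top-8-of-256 selection described above can be sketched as a generic top-k gating step. This is a minimal illustration, not DeepSeek's exact gating function: I'm assuming a plain softmax-style gate and made-up shapes, and `route` is a hypothetical helper name:

```python
import numpy as np

# Minimal sketch of top-k expert routing in one MoE layer:
# score all experts, keep the k best, renormalize their weights.
NUM_EXPERTS, TOP_K, D = 256, 8, 16  # expert counts from the comment; D is illustrative

def route(token: np.ndarray, gate_w: np.ndarray, top_k: int = TOP_K):
    """Return indices of the top-k experts and their renormalized weights."""
    logits = gate_w @ token                    # one affinity score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the k highest scores
    scores = np.exp(logits[top] - logits[top].max())  # stable softmax over the winners
    return top, scores / scores.sum()          # weights sum to 1 over chosen experts

rng = np.random.default_rng(0)
gate_w = rng.standard_normal((NUM_EXPERTS, D))
token = rng.standard_normal(D)
experts, weights = route(token, gate_w)
print(len(experts), round(float(weights.sum()), 6))  # 8 experts, weights summing to 1.0
```

The token's output is then the weighted sum of only those 8 experts' MLP outputs, which is why a 671B-parameter model only pays for a small fraction of its parameters per token.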
Good. Corrupt Wall Street should be terrified. Monopoly in technology is scary, so this is a good development.
DeepSeek is now powered by Huawei Ascend 910C chips, indicating China no longer needs Nvidia chips. The Ascend 910C is cheaper, more energy-efficient, and slightly more powerful than Nvidia's H100. Nvidia will lose a huge chunk of its global market.
That's the moment to regret the defunding of the USA's public education system. It was always about being clever and efficient.
Nvidia deserves it; they force consumers to upgrade hardware.
They are using far more computing power than they are saying, and it's all powered by Nvidia. They have purchased through third parties in other countries to get around the export ban. So even DeepSeek is, in fact, a customer of Nvidia to the tune of 50,000 H100 chips. Keep an eye on institutional ownership: individual investors sold, the institutions gobbled it up, and now it's recovering.
I laughed so hard when I saw it drop 😂😂😂😂
Deepseek is the way❤❤❤❤❤❤❤
Optimization is key, but the model is still kinda bad. And since it's Chinese-made, the censorship is bad.
😊 DeepSeek is available on Nvidia now
This will pass... Wall St is always skittish and is failing to see the beneficial side this brings. Big tech isn't wasting money: efficiency saves on running costs or increases speed for the same number of requests.