Will NVIDIA Survive The Era of 1-Bit LLMs?

Did you read the paper "The Era of 1-Bit LLMs"? In this video, I'll explain how this groundbreaking research changes the game for NVIDIA, which is currently focused on floating-point model training.
Finxter is about disruptive innovation, AI, and technology. I founded Finxter to help students be on the right side of change. With all the disruptions, we need to work together to stay ahead. Join us!
👇👇👇
♥️ Join my free email newsletter to stay on the right side of change:
👉 blog.finxter.com/email-academy/
Also, make sure to check out the AI and prompt engineering courses on the Finxter Academy:
👉 academy.finxter.com
🚀 Prompt engineers can scale their reach, success, and impact by orders of magnitude!
You can get access to all courses by becoming a channel member here:
👉 / @finxter

Comments: 146

  • @anthonyrepetto3474 · 20 days ago

    IMPORTANT CONCEPT: PHOTONIC chips for inference are MUCH simpler to engineer if they only perform masking and addition operations, compared to matrix multiply in photonics! And those photonic chips have been stuck in the lab *because of the difficulty of engineering the matrix multiply* - so 1.58-bit photonic networks could be built sooner now. Most importantly, photonic chips are hundreds of times FASTER, so using them for inference is ideal. "If 1.58-bit networks ==> Then photonic chips for inference"

  • @walkerjian · 19 days ago

    yup, silicon is dead, fab lines are dead, NVIDIA is dead. Make the inference fast enough and you have general computation on optical AI, the CPU is dead, the PC is dead, long live the optical AGI ...

  • @default2826 · 18 days ago

    Where can I learn more about this?

  • @MCA0090 · 18 days ago

    The chip is fast, but memory speed is still an issue. DRAM chips are slow, and on-chip SRAM is way faster but very expensive (that's why Groq chips are so fast at inference: the model is loaded entirely into SRAM). The biggest bottleneck in LLM inference is memory speed, unlike diffusion models for images/videos, which use more processing power and less memory bandwidth than LLMs.
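
    A rough back-of-envelope illustration of why memory bandwidth dominates token-by-token inference (my own numbers for illustration: an assumed 70B-parameter model and roughly 3 TB/s of memory bandwidth; real figures vary by chip and model):

        # Sketch: every generated token requires (roughly) one full read of the weights.
        params = 70e9                      # assumed model size: 70B parameters
        hbm_bandwidth = 3e12               # assumed memory bandwidth: ~3 TB/s

        for name, bytes_per_weight in [("FP16", 2), ("1.58-bit", 1.58 / 8)]:
            bytes_per_token = params * bytes_per_weight
            tokens_per_sec = hbm_bandwidth / bytes_per_token
            print(f"{name:>8}: ~{bytes_per_token / 1e9:.0f} GB read per token "
                  f"-> ~{tokens_per_sec:.0f} tokens/s upper bound")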

  • @anthonyrepetto3474 · 18 days ago

    @@default2826 I wrote "1-bit Neural Networks & Photonic Chip Inference" on Medium, to explain the idea in more detail :)

  • @delight163 · 18 days ago

    Anyone know of any publicly traded companies working on photonic chips or the infrastructure around them?

  • @gunsarrus7836 · 19 days ago

    This could allow far larger models on local systems paving the way for wife bots

  • @ineffige · 19 days ago

    just keep physical mute button please

  • @finxter · 18 days ago

    No comment

  • @john_blues · 20 days ago

    I wasn't expecting an Nvidia diss track. 😂 I'm waiting for an Nvidia Digital Human to respond.

  • @finxter · 20 days ago

    Haha AI doing the dissing and defending sounds like a reasonable prediction of the future

  • @AnthonyGoubard · 18 days ago

    I think something similar can be done with {2, 1, 0, -1, -2}, as multiplication by 2 is very cheap on a CPU, maybe even cheaper than addition - it's just a 1-bit shift. Another possibility would be quantization with {4, 2, 1, 0, -1, -2, -4}. This would use more bits but still wouldn't require matmul.
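
    A minimal sketch of this shift-based idea (an illustration assuming integer activations, not code from the paper):

        # Multiply-free dot product for weights restricted to {-4, -2, -1, 0, 1, 2, 4}:
        # each weight is a sign and a power of two, so w * x becomes a shift plus a sign flip.

        def encode_weight(w: int) -> tuple[int, int]:
            """Return (sign, shift) with w == sign * (1 << shift); w must be 0 or +-2^k."""
            if w == 0:
                return (0, 0)
            return (1 if w > 0 else -1, abs(w).bit_length() - 1)   # 1 -> 0, 2 -> 1, 4 -> 2

        def dot_shift_add(weights: list[int], xs: list[int]) -> int:
            """Dot product using only shifts, additions and subtractions."""
            acc = 0
            for w, x in zip(weights, xs):
                sign, shift = encode_weight(w)
                if sign == 0:
                    continue                   # weight 0: skip the term entirely
                term = x << shift              # multiply by 2^shift via bit shift
                acc += term if sign > 0 else -term
            return acc

        # Same result as the ordinary dot product, but with no multiply instruction.
        assert dot_shift_add([4, -2, 0, 1], [3, 5, 7, -2]) == 4*3 + (-2)*5 + 0*7 + 1*(-2)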

  • @PaulSpades · 18 days ago

    I really like this for current systems, but I don't know how to store your sets in hardware. Are you assuming BCD? An 8-bit signed integer is too wasteful for a set of 7 values, and even a signed 4-bit int stores 16 values (-8 to +7). I'm not sure, because I don't have the hardware, but supposedly Nvidia has implemented a 4-bit datatype. I assume it encodes some flavour of a balanced value.

  • @finxter · 18 days ago

    This sounds interesting. Can you elaborate a bit on this? Maybe write a blog or so and share it here. Who knows - it might kick off some research in this direction for the benefit of humanity.

  • @vincewestin · 21 days ago

    It seems a real jump to decide that 1, 0, and -1 have equal probability. Given the data set for a given LLM, it is highly unlikely that the values will be used at the same rate. Most CPUs can perform any basic (non-floating-point) operation in a single cycle, so one integer multiplication is faster than doing the compares of case logic. Doing this 2-bit math (since we can't do 1.58-bit math directly) will also have the overhead of masking those bits out of the larger memory structures that the data will be stored in.

  • @kazedcat · 20 days ago

    Nope, using 2 bits is very easy. The key is to create two 1-bit matrix data structures: one bit matrix represents the significand (either 0 or 1) and the other represents the sign (+ or -). With the weights stored in a bit-matrix structure, processing can be done with simple bitwise operations. Converting a floating-point matrix into a bit matrix is also very easy.
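
    A small sketch of this sign/significand bit-plane idea (my own illustration, assuming NumPy and integer activations; not code from the paper):

        import numpy as np

        # Store ternary weights {-1, 0, +1} as two packed bit planes:
        #   value bit: 1 if the weight is non-zero; sign bit: 1 if the weight is negative.

        def pack_ternary(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
            value = (w != 0).astype(np.uint8)
            sign = (w < 0).astype(np.uint8)
            return np.packbits(value, axis=-1), np.packbits(sign, axis=-1)

        def ternary_matvec(value_bits, sign_bits, x, n):
            """y = W @ x using only unpacking, sign flips and additions (no multiplies)."""
            value = np.unpackbits(value_bits, axis=-1, count=n).astype(np.int32)
            sign = np.unpackbits(sign_bits, axis=-1, count=n).astype(np.int32)
            contrib = np.where(value == 1, np.where(sign == 1, -x, x), 0)
            return contrib.sum(axis=-1)

        rng = np.random.default_rng(0)
        W = rng.integers(-1, 2, size=(4, 16))      # ternary weight matrix
        x = rng.integers(-8, 8, size=16)           # integer activations
        vb, sb = pack_ternary(W)
        assert np.array_equal(ternary_matvec(vb, sb, x, 16), W @ x)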

  • @finxter · 20 days ago

    I'd love to check out some comparisons between 1-bit and 1.58-bit LLMs performance-wise. The overhead and technical complexity of this approach doesn't seem optimal yet. For instance, we could double the matrix size for 1-bit LLMs compared to this 2-bit solution. Will 2 bits still be better than that? The last word hasn't been spoken.

  • @larsnystrom6698 · 17 days ago

    Not as easy as one-bit math would be, I guess. One bit and a bias input, perhaps, could replace that sign bit. Just an idea! But I didn't watch the video - that music part put me off. I will read the paper, though, so thanks for the video anyway.

  • @CharlesVanNoland · 15 days ago

    You're thinking about this in terms of conventional Von Neumann compute hardware. Even on conventional hardware though, memory bandwidth AND capacity are the current limitations of DNNs.

  • @DavidSaintloth · 20 days ago

    What will NVIDIA do? Simple, design a dedicated 1.5bit optimized neural core for their next generation family. This is not even an issue for them. Not in the slightest.

  • @finxter · 20 days ago

    Agree. There will be more competition because it's easier to do, and I'm not confident they will maintain their insane 50% net margin, but they will likely thrive.

  • @ckmichael8 · 20 days ago

    @@finxter I think if we are doing 1.58-bit networks, which would require many more parameters to achieve the same accuracy, then it is actually an advantage for NVIDIA. NVIDIA's moat is not really in inference; these days all of the big clouds + AMD + Intel + Qualcomm can easily make chips that do inferencing, and even the largest LLM available (GPT-4) can easily be run on 8 MI300X. The moat of NVIDIA is its CUDA ecosystem and its NVLink support for distributed training, which is required if anyone wants to train bigger networks. Given that training still has to be done in 16-bit and just quantized to 1-bit, their training moat is not really affected.

  • @fifty6737 · 16 days ago

    It will prevent Nvidia from monopolizing AI hardware; their inflated $3T valuation will scale back.

  • @dc2778 · 20 days ago

    I dropped out in 9th grade, and while I hardly understand a word you're saying, I unequivocally say, with full certainty and peak Dunning-Kruger Mt. Stupid level of conviction: the answer is yes! I think.

  • @finxter · 20 days ago

    That's actually a very intelligent comment. Dropping out in 9th grade seems to have played out to your benefit. I'm not very bullish on the education taught in the traditional system. That's why I founded the Finxter Academy in the first place. Keep up the great work!

  • @ineffige · 19 days ago

    Knowing that you don't know something requires intelligence, so you're not stupid :) There are plenty of videos on YT explaining this paper simply, even for dummies like me, so you will probably find one that fits you too. That's why I love the internet: you can always find something explained in a way that suits you, and then just go deeper from there.

  • @esra_erimez · 18 days ago

    I don't believe you. I think you're very smart

  • @stephenkolostyak4087 · 15 days ago

    I may as well have done the same because "sure?"

  • @entwine · 15 days ago

    OMG, that song is 🔥

  • @finxter · 15 days ago

    😂🙏

  • @PL_chochlikman · 21 days ago

    It will increase hotness just like that - but with use of this mostly we already add it - so it's not a bug but a feature!

  • @sebbbi2 · 18 days ago

    Nvidia could just add special ternary tensor units, like they added the current tensor units to the existing GPU design. Minimal changes. Current GPUs already have fixed-function hardware for non-binary number decoding, for example for ASTC texture compression. If there's a need, Nvidia will add hardware for it. Also, if you don't need 100% storage efficiency, you can just store ternary in 2 bits: sign + value (0 or 1). Sign bits are combined with XOR and value bits with AND. That implements the multiply operation.
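
    A tiny sketch of the sign/value encoding described above (my own illustration of the XOR/AND trick, not code from the paper):

        # A ternary value w in {-1, 0, +1} is stored as (sign, value) bits.
        # Multiplying two such values needs only XOR (signs) and AND (values).

        def encode(w: int) -> tuple[int, int]:
            """-1 -> (1, 1), 0 -> (0, 0), +1 -> (0, 1)"""
            return (1 if w < 0 else 0, 1 if w != 0 else 0)

        def decode(sign: int, value: int) -> int:
            return 0 if value == 0 else (-1 if sign else 1)

        def ternary_mul(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
            return (a[0] ^ b[0], a[1] & b[1])   # XOR the sign bits, AND the value bits

        # Check the trick against ordinary multiplication for all 9 combinations.
        for wa in (-1, 0, 1):
            for wb in (-1, 0, 1):
                assert decode(*ternary_mul(encode(wa), encode(wb))) == wa * wb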

  • @finxter · 18 days ago

    Agree. 2-bit training and inference precision for weights will effectively take care of it.

  • @tomcraver9659 · 20 days ago

    Why use 1.58 bits? In theory, ANY function can be composed from arbitrarily complex XOR operations on single bits. Not even addition - just a sea of bits and XORs. As a bonus, it doesn't need to be at all synchronous any more - you can let the whole thing run continuously until the function settles out.

  • @finxter · 19 days ago

    Can you elaborate on this idea? It sounds to me like some kind of pipeline parallelism. One needs to make sure that there is convergence in the first place. Not sure we can guarantee this for any function - but your input would be valuable in this regard!

  • @juandesalgado · 19 days ago

    If it were only about inference, you'd be right. The question is, how would you train these weights?

  • @tomcraver9659 · 19 days ago

    @@finxter Caveat - I certainly haven't worked this all out, I'm just thinking this way: The code that 'runs' an LLM is essentially a 'function', transforming digital inputs (the model parameters and token inputs) into digital outputs (new tokens). Any digital function can - in theory - be implemented as a bunch of fundamental boolean operations. NOR operations (I mistakenly said XOR before) are sufficient to implement any boolean operation, though they may not be the most efficient way to do that (e.g. no point using a NOR gate to implement a NOT operation).

    Large language models are essentially the same function applied over and over with different parameters and inputs, so the whole model can be considered a very large and complex function itself, with the embedded parameter values simply being constants. So if you had enough resources, you could theoretically build a huge 'sea of gates' that implements that huge function. The constant parameters in such an implementation would all be 0 or 1 - basically hard-wiring an input to a NOR gate low or high, and connecting in layer input bits as needed.

    Of course, that's a LOT of gates (maybe a handful for each parameter in your 1.58-bit model). So more likely you'd instead implement a generalized model of one layer, actually feed in the 1-bit parameter values of the 1-bit model instead of hard-wiring them, as well as 1-bit values from the previous layer - then apply that hardware to layer after layer in sequence. Not saying I have any idea how one would 'train' such a model - or convert a trained model efficiently to this form. There's a good chance it won't be more compact than the 1.58-bit model, and even one layer of hardware would be pretty big. But it might be faster and lower power, as it could be made close to ideally minimal operations.
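
    A quick sketch of the claim that NOR alone is enough (my own illustration):

        # NOR is functionally complete: NOT, OR and AND can all be built from it.
        def nor(a: bool, b: bool) -> bool:
            return not (a or b)

        def not_(a: bool) -> bool:
            return nor(a, a)

        def or_(a: bool, b: bool) -> bool:
            return not_(nor(a, b))

        def and_(a: bool, b: bool) -> bool:
            return nor(not_(a), not_(b))

        # Verify against Python's built-in boolean operators for all input combinations.
        for a in (False, True):
            for b in (False, True):
                assert not_(a) == (not a)
                assert or_(a, b) == (a or b)
                assert and_(a, b) == (a and b)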

  • @idiomaxiom · 17 days ago

    @@tomcraver9659 It's a much smaller sea of gates if they're trits. Also, LLMs capture normal attractors and strange attractors in their nets during training; the "function" being activated changes chaotically based on the entire sequence and its order. A MOV operation is Turing complete, and there are compilers that will compile to x86 using only the MOV instruction. You would be nuts to actually use it though.

  • @patrickmchargue7122 · 19 days ago

    Anyone know how I can run a 1-bit model in LM Studio? What I tried only outputs garbage. I'm looking for a model and which settings to use.

  • @dreamphoenix · 20 days ago

    Thank you.

  • @finxter · 20 days ago

    You're welcome!

  • @vokuh · 19 days ago

    2:30 - banger :D

  • @finxter · 18 days ago

    Thanks NVIDIA for creating the hardware to generate that song. ;)

  • @entwine · 15 days ago

    11:35 "Most time and effort in traditional LLM training is spent on matmul" -- I think for training we still need to use longer floats. I'm pretty sure backpropagation won't work with just -1, 0, 1. And yes, in inference this will be super-cheap.

  • @finxter · 15 days ago

    Yeah, currently training is still done in floating-point space. But the mega-trend in AI training is to use lower and lower precision for training weights. For instance, the new Blackwell now supports FP4 as well, coming down from FP8, FP16 and FP32. The big-picture prediction is that training algorithms will now be developed that work for 1-bit LLMs as well. Why not? There's no first-principles reason why it won't work.
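
    One way this is commonly described for BitNet-style models (a sketch under my own assumptions, not code from the paper or the video): keep full-precision "latent" weights, quantize them to {-1, 0, +1} on the forward pass, and let gradients pass straight through the rounding step.

        import torch

        class TernaryLinear(torch.nn.Module):
            """Linear layer with ternary forward weights and full-precision latent weights."""
            def __init__(self, in_features: int, out_features: int):
                super().__init__()
                self.weight = torch.nn.Parameter(torch.randn(out_features, in_features) * 0.02)

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                w = self.weight
                scale = w.abs().mean().clamp(min=1e-5)               # absmean scaling
                w_q = torch.clamp(torch.round(w / scale), -1, 1) * scale
                w_ste = w + (w_q - w).detach()                       # straight-through estimator
                return torch.nn.functional.linear(x, w_ste)

        layer = TernaryLinear(8, 4)
        loss = layer(torch.randn(2, 8)).pow(2).mean()
        loss.backward()                                              # gradients reach the latent weights
        print(layer.weight.grad.shape)                               # torch.Size([4, 8])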

  • @test5095 · 18 days ago

    WHAT A BANGER OMG

  • @TheZEN2011 · 20 days ago

    Last I looked, about a month ago, development of the 1.58b model was going kind of slowly. But yeah, I see the potential.

  • @finxter · 20 days ago

    So you mean the research on top of 1.58b was slow or the development of new hardware? At this point thousands of researchers worldwide are working in parallel to improve on 1.58b. And I bet we'll see new hardware very soon, given the potential (trillion (!) USD market).

  • @blengi · 10 days ago

    What do biological examples of neural nets, like the human brain and the encoding variances across the cortex, imply about optimizing AI bit encoding, given that nature surely converged toward some analogically preferred requirements?

  • @finxter · 9 days ago

    Great question! In nature, everything is more gradual. So, I suppose nature would use higher precision neurons (=weights) instead of lower precision neurons. Nature would not use 1-bit LLMs. ;)

  • @NdxtremePro · 17 days ago

    Looking at the first slide, it almost seems like the sign is more important than the numbers, by a factor of 2/3 importance. I wonder if we could reduce it to just the sign?

  • @idiomaxiom · 17 days ago

    It is just the sign. -/0/+

  • @finxter · 17 days ago

    Yeah, the original research was just about the sign (1-bit).

  • @BangkokBubonaglia · 12 days ago

    This is actually a perfect application for the old Russian ternary computers. It will be interesting if we start to see a resurgence in that discarded technology. In ternary logic, these are called trits. So not 1.58 bits. 1 trit.

  • @finxter · 12 days ago

    Agree - nothing new under the sun. Yet I don't think there's a first-principles reason to actually create a new 1.58-bit architecture. The performance gains mostly don't come from the trit architecture per se but from the reduced precision in general. That can be accomplished with a bit architecture as well.

  • @cem_kaya · 19 days ago

    I mean, this is not that far away from FP4, which Nvidia already has. This might be better, but FP4 has the momentum. All this reminds me of posits: yeah, they are better, but who has the tera-ops for these?

  • @finxter · 19 days ago

    Yeah, agree partly. The thing is that the whole trend has been moving towards fewer and fewer bits per weight. 1-bit LLM is just taking it to the extreme it seems to converge to anyways.

  • @alexeykulikov5661 · 15 days ago

    It is not very far away in terms of memory, but it opens up ways to design hardware with potentially 100-1000X AI inference performance per watt while at the same time being simpler. And inference speed can be traded for inference quality; there are many approaches already existing and being developed, but they are all limited by inference speed (or rather, the cost of serving it to users at large scale).

  • @cem_kaya · 15 days ago

    @@alexeykulikov5661 This is not 100-1000X faster than FP4. You should take a look at what operations are possible with FP4; it is very restricted.

  • @larsnystrom6698 · 17 days ago

    If this is viable, I guess Nvidia can do it too! So the title was a bit off, unnecessarily.

  • @finxter · 17 days ago

    Yes. Guilty of clickbait.

  • @perceptron-1 · 18 days ago

    This 'new computation paradigm' is already hundreds of years old; the first mechanical computers worked like this, but not a single piece of them is left, because they were made of wood - only a glass painting preserves the fact that they existed.

  • @finxter · 18 days ago

    Haha, when in doubt, zoom out. Interesting perspective. Yes, I tend to agree that it's not as revolutionary. It is, however, an interesting little trick for implementing AI models in many practical scenarios that were previously impossible.

  • @perceptron-1 · 18 days ago

    @@finxter I have no doubt that single-bit LLMs work, since I already did this in the 80s, when computers only had a few kilobytes of memory, megabytes at most, and you had to save on bits; today you don't have to. In the case of LLMs, however, they once again realized that it is necessary to save on the number of bits each link (weight) is represented with. 1-2-3 bits are enough; it can be analog weighting anyway, which is even better (I'm working on that now). More than 40 years ago I made such neural systems at a practical level, which was not public, and which is now served up as world news by the paper writers.

  • @Clammer999 · 18 days ago

    That was a pretty cool song. Can you point me to where I can get it?

  • @elivegba8186 · 18 days ago

    AI-generated

  • @finxter · 18 days ago

    Here's the link - I used Suno to generate this with AI. suno.com/song/519b9099-21bf-4598-a48a-ad11d6b3295e

  • @Clammer999 · 18 days ago

    @@finxter Thanks! Not against Nvidia or anything but really love the lyrics and the tune😁

  • @macaquinhopequeno · 20 days ago

    I'm sure a trillion-dollar company has enough money to invest in whatever works to stay competitive. They already have chip engineers, so it should be very easy for them to make their own; if they are 100% sure that it is better, then this will just be their new "Grace Hopper" architecture.

  • @finxter · 20 days ago

    Agree.

  • @jon9103 · 19 days ago

    The question isn't whether Nvidia is able to adapt, it's whether this will open the door for competition and therefore reduce their market share.

  • @PaulSpades · 18 days ago

    -1, 0, 1 is *Balanced Ternary*. Why in the holy heavens must you use such awkward naming for a well-known set of values in computing? In electronics, a floating gate can output negative, positive, and floating/not-connected if the inputs are negative and positive. Floating gates are used in all logic circuits that talk through a shared bus or line, including registers, memory, and all bus controllers for serial and parallel communication. This is not new technology, it's just not used for other purposes. Proper ternary gates (that connect to ground rather than stay floating) are not more complicated. Obviously, a ternary memory cell needs 4 states: neg, zero, pos, and floating.

  • @finxter · 18 days ago

    Haha, yeah, "1.58b" is a bit of an awkward name. I'm also not sure it has been sufficiently proven that ternary is better than, say, 4 states on a traditional 2-bit architecture, which would not incur the significant overhead cost of creating new compute hardware, etc.

  • @qiyuzhong4287 · 10 days ago

    It might be true when it comes to the light of quantization.

  • @qiyuzhong4287 · 10 days ago

    Suno gives the rap.

  • @finxter · 9 days ago

    Haha, yeah.

  • @Anders01 · 18 days ago

    Interesting. I have been thinking that Nvidia is already in a shaky position even with ordinary microchips because of Chinese companies such as Huawei coming to release AI solutions with much better price-performance.

  • @finxter · 18 days ago

    Yeah but compute remains the scarce resource of our time: kzread.info/dash/bejne/lZeArceFk9qsqLQ.htmlsi=3F_gznEEdV3z93lb

  • @Anders01 · 18 days ago

    @@finxter I saw a presentation of how a 1-bit architecture (or rather ternary) performed as well as a 16-bit architecture for LLMs! Then no multiplication is needed and a much simpler microchip solution can be used. I don't know if it will work in practice, but it could make Nvidia's current solutions obsolete. Also, things like photonic and graphene microchips may be something that Chinese companies will soon release (the Chinese tech industry has massive momentum, scale, and speed).

  • @Raoden01 · 19 days ago

    I don't know what the authors of this paper smoked, but you can't just take a bit and assign three values to it - not even theoretically. There is a physical device underneath that simply does not work that way. You can either use a 2-bit LLM or a 1-trit LLM; trits are used in ternary computers, IIRC last built in the 60s in Russia.

  • @perceptron-1 · 18 days ago

    Right.🤫 SETUN-70🤐😛

  • @finxter · 18 days ago

    Maybe they smoked the same as the Russians in the 60s.

  • @telluricscout · 20 days ago

    I think the real question is whether chips that can simply change a sign or replace a number with zero, at a large parallel scale, can be made cheaply by many other companies. I don't see why not. They made special hardware for mining Bitcoin, so why not for 1-bit quantized language models? In which case Nvidia would just go back to being what it was a couple of years ago.

  • @finxter · 19 days ago

    This is the threat model, yes, but I don't see why using floating points instead of 1-bits should be the one thing that stands between NVIDIA and all the other players. There's also the developer ecosystem, innovation culture, partnerships with all major cloud providers, brand, software ecosystems (Omniverse, Gr00t), CUDA, ... Even with 1-bit LLMs much of this moat will remain. And we'd still need AI Factories.

  • @alexeykulikov5661 · 15 days ago

    They will keep their superiority in AI training though, as these models are, at least for now, still trained in high precision and with matrix multiplications, if I get it correctly. And they will likely specialize in training as much as possible, while there will be many competitors for inference I guess, chips made cheaper, on older process nodes.

  • @anthonyrepetto3474 · 20 days ago

    4:06 - "...times x0, as you know from school." Americans, who don't pass Trigonometry, let alone Linear Algebra and Discrete Mathematics: "because SOH, CAH, TOA, right?!"

  • @finxter · 20 days ago

    Haha 😂

  • @Ianochez · 19 days ago

    I believe FP16 needs 16 bits.

  • @finxter · 18 days ago

    💯

  • @ai._m · 20 days ago

    Why did you make us listen to wrap? Not cool!

  • @finxter · 20 days ago

    You don't like it?

  • @ArunKumarSah · 20 days ago

    @@finxter It is cool...

  • @wolfydoes · 18 days ago

    Silicon Valley TV show moment

  • @DugganSean · 18 days ago

    Wrap.. Rap 🤔

  • @FirstNullLast · 14 days ago

    Is your gpu dying at the end lol???

  • @finxter · 12 days ago

    lol, yeah my GPU is garbage.

  • @UltraK420 · 19 days ago

    Do you not think Nvidia engineers are aware of this? They're also working on quantum processors. They're doing everything and they're buying out more companies. -ARM is owned by Nvidia now, that's a big deal.-

  • @finxter · 18 days ago

    Yeah, they're 100% aware of it. They are all over it. They will provide the best hardware requested by AI researchers. It's their bread and butter. Interesting, didn't know about NVIDIA's stake in Arm.

  • @vncstudio · 13 days ago

    ARM is still owned by Softbank

  • @UltraK420 · 13 days ago

    @@vncstudio You're right, I'm not sure where I got that idea from but it is indeed false. Apparently Softbank has 90% ownership and Nvidia _tried_ to buy it from them without success. Thanks.

  • @RonMar · 19 days ago

    It's not copy, negate, or skip. It's copy, negate, or set to zero.

  • @finxter · 18 days ago

    It's the same, no?

  • @CapsAdmin · 19 days ago

    5:40 The answer is right there in front of your cursor: floating point, 16 bits! 😄 Also known as "half", meaning it's half the size of the ordinary float/FP32, which is (or was) more common on GPUs.

  • @finxter · 18 days ago

    Yes 😅

  • @wanfuse · 18 days ago

    So storage-wise: (3 states x a group of 3, in 2 bits), or 4 states (0, -1, 1, plus null for a 2nd value), or just don't use the 4th, or the 4th is a 3-set flip, for example -1,0,1 becomes 1,0,-1 or 0,1,-1??? 1.58 bits seems wasteful; there are even better ones, will share later.

  • @finxter · 18 days ago

    Yeah, there's no first-principles reason I'm aware of that would necessitate the use of 3 states over 2 or 4 states. Evaluation results on a single model using a single quantization technique are not enough to warrant a complete disruption of our computing hardware.

  • @wanfuse · 18 days ago

    @@finxter Changing to a 3-, 4-, 5-, 6-, 7-... state machine would be fantastic in a way, but it is likely a task for AGI, which could do the billion man-hours in a single month. Then again, choosing analog with infinite states would probably be its first choice.

  • @dan-cj1rr · 9 days ago

    They've got gazillions of dollars and engineers; pretty sure they can surpass anyone if this becomes the norm lol

  • @finxter · 9 days ago

    Yeah but don't forget about the innovator's dilemma. It's always easy (in theory) to avoid getting disrupted as the mega corporation. But they still get disrupted because they cannot afford to lose their revenue streams in the old paradigm.

  • @falklumo · 20 days ago

    Another video showing a lack of knowledge. First, a 1.58-bit architecture can only be used for inference, not training. Llama 70B runs just fine on an M3 Max anyway; most Nvidia stuff goes into training clusters. Second, Nvidia was always first to embrace fewer-bit cores. Third, the number of operations is only halved, not reduced by an order of magnitude. Fourth, the actual "1-bit matmul unit" would still need to be word-length to avoid buffer overruns. Nvidia has already perfected this down to "4-bit matmul units".

  • @glitchedpixelscriticaldamage · 20 days ago

    and fifth : get over it.

  • @handsanitizer2457 · 19 days ago

    So, 70B definitely doesn't run that well on M3 - it's OK... but I agree with the rest of your points.

  • @Alice_Fumo · 19 days ago

    Wow.. Damn. I was under the horrible misassumption that the thing they figured out with BitNet was how to train such a network. If they didn't, then that explains a lot of what I was previously confused about and I'm back to being not very hyped. Unfortunate.

  • @finxter · 19 days ago

    (1) Disagree. A 1.58-bit architecture can also be used for training. You need to think beyond what's literally in the paper. With current hardware it wouldn't make a lot of sense due to FP4 weights, but that's kind of the point of the paper (new hardware needed!). (2) As I said in the video, using fewer bits for weights has been a long-term trend, even before, with Google's TPU, etc. (3) Not true at all. You ignore communication overhead in your argument; energy use has been improved by not just one but almost two orders of magnitude. And operations have been reduced by more than half, given that you cannot count one "multiplication" op as a single "addition" op. That's the point! (4) ... and they will keep going down. The trend of reducing weight memory overhead won't stop at 4 bits.

  • @finxter · 19 days ago

    The paper specifically takes a trained matrix and uses a process called quantization to reduce it to 1.58-bit weights. However, it's research - future work (as proposed in the paper) would be to train 1.58-bit networks on AI ASICs, in which case we don't even need quantization. People need to look beyond this paper. It opened up a whole new world of research on the hardware and theory of AI training and inference. It is a big deal.

  • @webgpu · 12 days ago

    MULTIPLYING "METRICS" ? ( "METRICS.... METRICS..... METRICS..... METRICS.... " ) FIRST TIME I SEE SOMEONE MULTIPLYING "METRICS" 🤣🤣

  • @finxter · 12 days ago

    Not sure what you're referring to?

  • @marat61 · 15 days ago

    I really don't like that the presenter spends the start of the video talking about the benefits of moving to a lower representation. The benefits are OBVIOUS to anyone with at least one brain cell. He should instead explain why it is possible, in terms of LLM performance, to move to a 1.58-bit representation.

  • @finxter · 15 days ago

    I can think of two reasons. First because the model representation was more powerful than the "knowledge" encoded in it. That's why reducing the model representation complexity didn't show much change. Second because a slight benefit in performance costs a lot in terms of model complexity and data size (=scaling laws). The other side of the coin is that reducing the model complexity doesn't have a lot of negative impact on performance. Roughly speaking.

  • @marat61 · 15 days ago

    @@finxter I am old enough to remember the "binarization" hype started by the YOLO guys. This time I want to see strong arguments for the transition to lower-bit representations, like benchmarks on well-known datasets.

  • @ringpolitiet · 19 days ago

    Thank you for bringing the report to my attention. Work on the presentation next time; this was atrocious. The "music", the unfinished slides, the unmotivated Mythbusters clip. It took me about 2 minutes to arrow through this half-hour video; there was almost nothing here.

  • @finxter · 18 days ago

    Thanks for the feedback.

  • @HamguyBacon · 19 days ago

    It's not a 1-bit LLM, it's a 1-trit LLM.

  • @perceptron-1 · 18 days ago

    Right, but others don't know that.🤐🤫🤪

  • @finxter · 18 days ago

    Okay - this paper was actually built on the 1-bit LLM paper. That's why it's called "The Era of 1-Bit LLMs". The idea is definitely publication-worthy, given that AI practitioners have been reducing floating-point precision for ages (e.g., Google's TPU many years ago).

  • @perceptron-1 · 18 days ago

    @@finxter Because the paper writers cannot even imagine that it is possible to implement a computer not only on binary bases but also with three-valued logic at the hardware level. The theoretical mathematicians (who think in terms of logarithms, log2(3) = 1.58) also have no idea that what they describe with this formula, which is incomprehensible to almost everyone else, can in reality be implemented at the hardware level with only 2 transistors, and in the end everyone would still understand it in that case.