Creel
A day ago
455,799
1

Top 10 Craziest Assembly Language Instructions

Support What's a Creel? on Patreon: / whatsacreel
Office merch store: whats-a-creel-3.creator-sprin...
FaceBook: / whatsacreel
In this video we’ll look at some of the most complex instructions available in x86/64 Assembly language.
I have checked against the manuals from Intel and AMD and results from hardware, but it is almost impossible to create a video without any mistakes. Please test the instructions yourself before you apply them in code.
Happy programming :)
References/Sources:
W. A. Mozart, Rondo from “Eine Kleine Nachtmusik”, Vienna Philharmonic, conducted by Bruno Walter: publicdomain4u.com/wolfgang-a...
Instruction timings: www.agner.org/optimize/#manuals
AMD programmer’s manuals: developer.amd.com/resources/d...
Intel programmer’s manuals: software.intel.com/content/ww...
Jim Morrison image from Wikipedia: en.wikipedia.org/wiki/Jim_Mor...
RDSEED object is based on “The Object” from Led Zeppelin’s Presence album.
Background images from animations from HDRI Haven: hdrihaven.com/
Software used to make this vid:
Visual Studio 2019 Community: www.visualstudio.com/downloads/
Blender: www.blender.org/
Audacity: www.audacityteam.org/
Davinci Resolve 16: www.blackmagicdesign.com/prod...
OpenOffice: www.openoffice.org/
Gimp: www.gimp.org/

Comments: 1,300

@NotDwight
3 years ago
TIL I learned there's an audience for top 10 videos about assembly instructions. Cool.
@thegrandnil764
2 years ago
I'm surprised our community is so large
@TheActualDP
2 years ago
I'm surprised this has > 10^5 views.
@icedragon769
2 years ago
having only ever worked with RISC assembly like MIPS in school, seeing the extremes of what you poor poor x86 driver authors have to deal with is entertaining and enlightening.
@jimviau327
2 years ago
Sojit, in this case it doesn't appear that this video content will ever be of service to the quality of life you are seeking. Did I just write that? I'm not even sure I understand myself. :)
2 years ago
@@TheActualDP It has 2#10_1111_0010_1010_0010# views (I love Ada's based integers :D)
@electroflame6188
3 years ago
Dot product of packed singles in your area
@Rudxain
3 years ago
I would like it in my boot sector
@TheLightningStalker
3 years ago
The probability of finding a project worth uploading commits of my sus code is very low.
@molybd3num823
2 years ago
@@TheLightningStalker but never zero
@dubbynelson
2 years ago
dot product of deez nuts packed on your chin
@sumuduranathunga
2 years ago
I think 🤔 it must be cross product
@davidjohnston4240
3 years ago
RdSeed - It's not always slow. There's a FIFO on the output of the RNG, and RdSeed pulls from that FIFO. If you haven't just pulled a bunch of values from the FIFO, the value will be available immediately because the FIFO is not empty. If you try to continuously pull from RdSeed and measure the average time per instruction, it will appear slower because you are limited to the physical rate of generation of full-entropy numbers from the RNG, which requires a whole lot of computation: generate 512 bits from the entropy source, AES-CBC-MAC them together to get 128 bits (that's two RdRand results' worth), XOR it with an output from the DRBG (another 3 AES operations, just like SP800-90C describes), and stuff the two 64-bit numbers from the 128-bit result into the output FIFO. How do I know all that? I designed it.
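The FIFO behaviour described in this comment can be illustrated with a toy Python model. This is only a sketch under invented assumptions: the class name `ToySeedSource`, the FIFO depth of 4, and the refill cost of 100 "cycles" per value are all hypothetical, and the values are placeholder integers rather than random data. The one detail taken from the real instruction is that RDSEED reports success or failure (via the carry flag), so callers retry on failure.

```python
from collections import deque


class ToySeedSource:
    """Toy model of a FIFO-backed seed source. Not real hardware behaviour."""

    def __init__(self, fifo_depth=4, cycles_per_value=100):
        self.fifo_depth = fifo_depth              # hypothetical FIFO size
        self.cycles_per_value = cycles_per_value  # hypothetical refill cost
        self.fifo = deque(range(fifo_depth))      # starts full; placeholder values
        self.next_value = fifo_depth
        self.progress = 0                         # cycles spent on the next value

    def tick(self):
        """Advance one cycle; the generator only works while the FIFO has room."""
        if len(self.fifo) < self.fifo_depth:
            self.progress += 1
            if self.progress == self.cycles_per_value:
                self.fifo.append(self.next_value)
                self.next_value += 1
                self.progress = 0

    def rdseed(self):
        """Return (ok, value), mirroring RDSEED signalling success in the carry flag."""
        self.tick()
        if self.fifo:
            return True, self.fifo.popleft()
        return False, 0


def rdseed_retry(source, max_tries=1000):
    """Retry until a value is available; return the value and the attempt count."""
    for attempt in range(1, max_tries + 1):
        ok, value = source.rdseed()
        if ok:
            return value, attempt
    raise RuntimeError("no seed available")
```

In this model the first four reads succeed on the first attempt because the FIFO starts full, while a fifth back-to-back read has to wait for the generator, which is the "fast if you have not just drained it" effect the comment describes.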
@xelaxander
2 years ago
The true gold is down in the comments
@LKRaider
2 years ago
Oh cool. When did you design it? Care to share some history?
@davidjohnston4240
2 years ago
@@LKRaider It was around 2009 when I started. It ended up first in the Ivy Bridge processors with the RdRand instruction. I had been working on writing cryptographic protocols in standards committees (802.11i, 802.16 etc.) and they all needed cryptographically secure random numbers, and when I looked at the SP800-90 specification back then, it was not sufficient. It described DRBGs (aka PRNGs) but not entropy extraction or physical entropy sources. A small team of 4 people was assembled: myself, a mathematician, an analog designer and a corporate cat herder. The math guy came up with some of the mathematical principles and identified the best papers describing how to quantify the entropy, the analog guy did the physical entropy source, the cat herder got it into silicon, and I designed the digital logic that takes the partially random bits, turns them into fully random bits with an entropy extractor, and seeds a PRNG/DRBG with that full-entropy data to make the resulting stream of random numbers fast enough. Since then the other three have left (2 retired and one died) and I've been the main owner of the RNGs since. RdSeed, which gives full-entropy output as per SP800-90C and X9.82, was added with Broadwell. This was so you could make arbitrarily large keys from it. Faster and slower versions were created (fast for servers, slower for energy-efficient chips). I've also designed a few other types of RNG for specific needs, like super small ones, non-uniform ones and floating-point ones. I contributed to the development of SP800-90B and SP800-90C and the revision of SP800-90A, which now cover most of what you need in a secure RNG. A couple of years ago I finished a book on random numbers which was published (Random Number Generators, Principles and Practices). So getting involved to solve my problem of where to get random numbers has turned into the defining part of my career. The standards are still changing. 
Certification requirements are still evolving and the need for new RNGs that fit in different contexts keeps up apace, so it has become a full-time job for myself and a small number of colleagues.
@luxsomething
2 years ago
Wow that's amazing
@ohchristusername
2 years ago
@@davidjohnston4240 What a lovely comment chain to stumble upon, great read! May your random continue to prosper!
@luck3949
3 years ago
Wow, so the task I was given in a job interview was actually an assembler one-liner. Good to know.
@DOSeater
3 years ago
If you'd said that in the job interview you'd get instantly hired
@luck3949
3 years ago
@@DOSeater I wish I knew this 2 months ago. I got that job anyway, but it took a few more interview iterations. Now I'm a happy developer of a delivery robot :)
@DOSeater
3 years ago
@@luck3949 Nice! I'm happy it worked out for you
@guywithknife
3 years ago
"Oh, that's easy, you can do it in one cycle using the PSCMPXCHGFMADDRABCXYZUW instruction"
@mika2666
3 years ago
Which one was it?
@ChildOfTheLie96
3 years ago
Lol, this guy has that kind of voice that makes it sound like he's constantly on the brink of laughter
@KanaalMTS
3 years ago
The way you write sounds very British 😂😂
@douwehuysmans5959
3 years ago
He sounds like BuzzFeed's IT guy
@julian-xy7gh
3 years ago
I have the same feeling with Tim from the Unmade Podcast. Maybe it's the Australian accent haha
@bakedbeings
2 years ago
@@julian-xy7gh Australian here: it's not universal for Aussies, he's just a gem 💎
@2112jonr
2 years ago
More like madness. Assembly language has that effect...
@zrebbesh
3 years ago
"HCF" -- Halt and Catch Fire. On a lot of early CPUs (1970s/1980s, yes damnit I am old) the manual gave the bit pattern for each instruction, and the rest of the bit patterns did undocumented things. Some were just a different way to spell NOP, some did deeply bizarre unintended things that happened because the bits randomly activated chunks of the CPU circuitry, mixing and matching chunks that were used in different combinations for other commands, and some did things that were only ever intended to be done in the factory, during QA testing. We used to hunt through these "undocumented instructions" looking for anything interesting or cool that we could then figure out uses for. But this was a bit risky. A fair number of CPUs had at least one undocumented instruction that would immediately cause the machine to lock up and, a few seconds later, destroy the CPU. Sometimes they caught fire, sometimes they melted through the PCB. Sometimes they desoldered themselves from the board and fell out. Whenever we found it we called it a "Halt And Catch Fire" instruction and patched the name 'HCF' into our macro assembler for that bit pattern, in order to avoid accidentally finding it again. Naturally when I saw the title of this video I figured HCF would be at the top of the list. Finding an HCF usually meant a new version of the chip as soon as the company could mask it off. We thought of ourselves as contributing to their QA efforts, although very few of them thanked us for it.
@ducksonplays4190
2 years ago
That is ridiculous, thank you for this comment.
@rty1955
2 years ago
Write while rewind Eject disc Read & write while ripping tape Disable console active emergency power off Electrocute operator Sense card deck on printer and open cover Write past EOT Read and scramble data I have a huge list of them along with my green cards
@Safyire_
2 years ago
Can you give some examples of interesting undocumented instructions you came across?
@zrebbesh
2 years ago
@@Safyire_ We found things like 'compare while swapping' that swapped the values in two registers while writing 1 to the comparison bit if the first was higher than the second. That was actually a little bit useful. We found a lot of things that tried to do two or three things at once but did them in a random-ish order because of race conditions. One of those was useful because it consistently did xor before swap if the CPU was hot and swap before xor if the CPU was cold, so we could write code that monitored the CPU and shut things down if it got too hot. We found instructions that connected multiple registers to the bus for output, meaning the result of the instruction would be written to four different registers at once. We also found instructions that connected multiple registers to the bus for input, which was useless and sometimes damaged the CPU. It was a real crapshoot. Also a very expensive hobby if you damaged the machine and your professor wasn't ready to write it off to "research." CPUs were not cheap.
@morgwai667
2 years ago
@@rty1955 @Zrebbesh you crazy old hackers! ;-) you are legends! :)
@Requiem100500
2 years ago
I love how hyped this guy is about CPU instructions. Really fun to listen to.
@tkeleth2931
2 years ago
This dude could describe paint drying on a wall and I'd be entertained. I've never seen an assembly instruction before this video lol
@ChristopherGray00
2 years ago
i don't know why but for me it's quite annoying.
@HuntingKingYT
A year ago
I'm also hyped when I learn something truly revolutionary
@MichaelMantion
10 months ago
I am surprised he wasn't more excited.
@____________________________.x
9 months ago
Are you kidding me? I hate his voice with every fibre of my being. I've subbed only because he has subtitles and the other videos look interesting. That first 30 seconds was excruciating, I may need a lie down in a dark room
@1111757
3 years ago
I can't get over this presentation. That's the kind of nerdy content you expect to find in a recording of a 10 year old talk that was given to 50 people in a tent :D
@ethanpayne4116
2 years ago
make that a 20 year old talk
@francoisloriot2674
2 years ago
what were you expecting with this title??
@MartinMurray1966
10 months ago
@@ethanpayne4116 make that 40, i was there :)
@Andrath
3 years ago
You'd almost think silicon makers like to mess with compiler writers.
@kestasjk
3 years ago
I doubt these instructions were aimed at people writing compilers, they'd be aimed at people doing things with encryption, low-level synchronization, multimedia.. I think these days people would first try and come up with a GPU based way to tackle these large data-processing problems, but before GPUs were general purpose parallel computers you had to do these single instruction multiple data things on the CPU
@toboterxp8155
3 years ago
@@kestasjk Also, doing stuff with a good CPU instruction is generally more efficient than doing it on the GPU, simply because you have to send across the data and get the result back on a GPU.
@kestasjk
3 years ago
@@toboterxp8155 Sort of.. The thing is if you’ve got enough data the GPU is so much faster it’s worth the overhead (and the memory space is getting more integrated / unified all the time), and if you’ve not got enough data to make sending to the GPU worthwhile the speed up for processing a small amount of data on the CPU more efficiently probably isn’t worth it. Perhaps for certain encryption or compression tasks where it can’t be parallelised very well on the GPU but it still needs lots of processing power they may still be useful, but I doubt these sorts of instructions are used in modern software very often
@toboterxp8155
3 years ago
@@kestasjk You're generally correct, but those instructions are a standard way of making programs faster, used to this day. If your task isn't easily converted to the GPU, you don't want the extra work, or you don't want the program to require a GPU, using some complex instructions is an easy, fast and simple way to optimize for some extra speed when needed.
@kestasjk
3 years ago
@@toboterxp8155 True.. but I think you can probably attribute ARM/NVIDIA’s ability to keep improving by leaps and bounds while Intel is reaching a plateau to its need to maintain a library of instructions that aren’t really necessary in modern software. If it gets rid of them old software breaks, if it keeps them any improvement it wants to make to the architecture needs to work with all these. Intel went for making the fastest possible CPU, but we now know a single thread can only go so fast (and the tricks like branch prediction have exposed gaping security holes in CPUs, forcing users to choose a pretence of security or turning branch prediction off and getting a huge performance hit). So parallelism is the future: In the 00s this meant multi-core CPUs, today this means offloading massive jobs to the GPU, but the breakthrough will come with CPUs and GPUs merging into one. Not to an SoC, like we already have, but with GPU-like programmable shaders as a part of the CPU instruction set and compiler chain, so that talking about CPU/GPU will be like talking about CPU/ALU. You’ll be able to do the operations like these instructions do in a single cycle, but by setting up a “CUDA-core” with general purpose instructions that can access the same memory.
@flowerpt
3 years ago
Intel: One cycle
Bioinformaticists: lemme reimplement that in Python and take 300,000 cycles to compute the same thing.
@kestasjk
3 years ago
Don't worry; as long as computer time remains far more valuable than developer time, and no alternative graphics-based technology appears for custom parallel processing operations, Intel will be just fine
@SimonBuchanNz
3 years ago
@@kestasjk eh, emulation of x86 on ARM on both Windows and Mac is apparently good enough now that I'd be seriously worried if I was Intel. AMD at least have their GPUs...
@JayOhm
3 years ago
@@SimonBuchanNz I think AMD wouldn't mind going ARM too much, if they have to. Maybe even will design dual-instruction-set chips for the transition period. Good thing that China won't let Nvidia buy ARM. In general, nowadays there is a tendency towards "crossplatform" software design practices, so the question of "Can it run widespread software fast?" would soon become irrelevant. For example, Adobe Lightroom already works on ARM on Windows and their other products will follow soon. Itanium might not have flopped if it happened a few years from now, at least not for the reason it did, which was poor x86 emulation performance.
@codycast
3 years ago
@@JayOhm how exactly can China stop a US company from buying a UK company? Should we find out what Italy and Argentina think too?
@JayOhm
3 years ago
@@codycast The short answer is Qualcomm. They are banned by US so if ARM becomes US-owned, Qualcomm will no longer be able to legally produce ARM chips. Possible political implications of that are just too painful to risk so regulators almost certainly won't allow it.
@0xABADCAFE
3 years ago
So the most amazing thing about these instructions to me is the fact so many of them run in single digit cycles. You have to marvel at the engineering effort that has gone into it. Also, a compiler has to basically be sentient to know when and how to use some of these.
@MrHaggyy
3 years ago
Yes, millions of hours of engineering went into getting to the point where you could write Hello World in Python etc.
@altaroffire56
3 years ago
No. If the compiler was sentient, it would kill itself.
@swarnavasamanta2628
3 years ago
@@altaroffire56 LOL
@swarnavasamanta2628
3 years ago
@@MrHaggyy And billions of hours for a JavaScript hello world. I think capable computer engineers brought this upon themselves by providing layers and layers of abstraction and burying the need for the internal concepts necessary to get something done. No wonder developers now are too shallow in their concepts; it's probably not their fault if they get hired after only 6 months of Python for data structures (they have no incentive to learn the deeper internals if they get paid a shitload for sitting at a desk). Hell, I would say most people choose programming or development for making bucks; learning and interest come later. There are only a few people now who are truly interested and curious about the core of things, and it might just be that after 10 years understanding these would be a luxury and not a necessity. Also no wonder most programmers hate their jobs and want to die after getting one.
@MrHaggyy
3 years ago
@@swarnavasamanta2628 Hmm, I think the horizon of programmer/developer/engineer in this field got much broader. Yes, there are many abstraction layers we have invented and standardized over the years. I have a mechatronics degree with a microsystem-technology specialization. Most of my field works on improving the hardware for existing assembly code. But we also introduce new things in hardware which we map to assembly or C/C++ code. On that layer, you have the guys who are building assemblers, linkers, and compilers. These are the programs you need to actually execute code on a machine. On top of that, you have the Microsoft, Android, Apple, Linux, etc. guys who write an operating system that provides usability with that stuff. And on that foundation, you can start building languages, IDEs, or any program you can open on your computer. And if we finally have these higher-level languages and programs we can start building frameworks or things like Python. That field can write very powerful applications that millions of people can use, or that run on many machines at the same time, or all the things these cloud-native guys are doing. The interest in these fields is widely different. I personally love hardware, and the guys I work with love building hardware or building systems with hardware. Systems can range from the new Intel i3-i7, to a Raspberry Pi or smartphone processor, to small controllers like an STM32 which are used in smartwatches, cars, microwaves and freezers, down to something like an Arduino which is easy to learn. There are a lot of people working on those layers. Many of them fit the stereotype of the white European/North American older man. But this field is one of the most global out there, with Korea, Taiwan, Japan and China being the "most" impactful. The amount of things you could learn about computer and software layers is way beyond one's reach. 
99.99% of all programmers don't have a clue how transistors are formed into bit logic, scaled to 16-32-64-86-128bit wide memory, how this memory became a register with a specific purpose and how you address this register so you can call it. But you don't need to know it in order to write a program. :-) We have you covered on that one :-) So even though assembly can teach you a lot about how a computer works, you don't need to write it. In fact you shouldn't write it for any used code. Use a compiler and write it in a higher-level language. All the smart people from the compiler department will cover you there. And so on and so on. Until the hip young Facebook star engineer can write his PHP or Python code for his next new feature. And if we do something amazing down the layers he will get a new version that will make his software even better than before. And the only thing he needs to do is trust the work of other people. The unpleasant truth about why so many programmers want to die or really do it is a mismatch between management, expectations and skills, paired with bad working environments. Coding and engineering computers is a mentally very hard and demanding task. You have to know your tools, get to know the problem, which I like to call a puzzle, identify the pieces of your puzzle, sometimes create a new piece that fits, and solve the puzzle. This takes time. A good session is anything from 2 to 4 hours. Less is only sufficient for really easy tasks; longer is better, but you need to train for it and you need to go to the toilet, move, eat, sleep etc. In most companies, this deep focus session gets corrupted by meetings, telephone calls, angry managers, or people that think they are important to the problem. These interruptions drain a lot of willpower, and unless you are a (senior) engineer and prepared for this kind of stuff it will depress you. You need to get your routines in place in order to sustain. 
The other part is that once you have solved the puzzle your company needs to give you a reward for this. If your management doesn't like your result and lets you feel their dislike, you need someone holding you on the bridge. That's why many companies in this field like Facebook and Intel don't have 9 to 5 jobs. You get paid to work for them. There are recommendations on how you should set up your routines and there are people helping you. But you can come and go as you like. But you get certain tasks and a timeframe. Once the timeframe is over, people all over the world are counting on you getting the job done in time. So it's a very wide, very different, and very interesting domain. And it's very rewarding if you know that you did something that all of mankind will use and benefit from a few months after you finished your work.
@zactron1997
3 years ago
Good lord that poor silicon. I can't even begin to imagine how you'd design chips to implement some of these instructions. I'd love to see a followup video showing some examples of using these instructions, and if they're superseded, what should be used instead!
@Noctew
3 years ago
They committed the cardinal sin in the 1970s with REP MOVx and it went downhill from there.
@fake12396
3 years ago
microcode, lots of microcode
@shinyhappyrem8728
3 years ago
I'd think that there are massive groups of "one circuit per operation", and they all work in parallel. From all the results only the specified one is selected.
@Lukas-er4nd
3 years ago
Microcode. Lots and lots of microcode.
@polypolyman
3 years ago
A long time ago, they actually gave up on x86, and have been making much simpler chips that convert x86 to that simpler system using "microcode"
@DukePaprikar
3 years ago
Yeah, watch-mojo really dropped the ball by not covering this one.
@redsmith9953
3 years ago
I just remembered porting the Torque game engine to PSP. Out of all that work, I remember the CMPXCHG instruction for the mutex; I implemented it with some native PSP intrinsic. Good memories. The best optimization trick, too: the game was doing 10 fps at best, and the problem was matrix transposition between the engine and the PSP "opengl". So I made the transposition happen on the fly by changing the order of reading and writing the registers in the VFPU instructions, kicking the Sony engineers' 'axe' ;), and getting 30 fps, enough to pass their performance standards.
@KangJangkrik
3 years ago
Wow you made PSP games?
@redsmith9953
3 years ago
@@KangJangkrik I made the Torque game engine port, and on top of that another team was developing games using it.
@DiThi
3 years ago
Nice, but wouldn't it have been better to change which indices of matrices are used in vector and matrix functions? E.g. using m[4] instead of m[1] and vice versa.
@kyrylmelekhin2667
3 years ago
Matrix transpose is the dumbest operation ever, you shouldn't be doing that, ever.
@redsmith9953
3 years ago
@@DiThi That implementation costs 20 fps on that platform: you need to swap the entire matrix operations for every calculation. Sounds trivial, but it was not for a 333 MHz processor with slow RAM.
Before:
matrix.transpose(); // bloated operation
vector.mul(matrix);
After optimization:
vector.mul(matrix); // due to the trick no transpose needed
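The idea in this reply, getting the effect of a transpose by changing the order in which matrix elements are read, can be sketched in plain Python. This is not the PSP VFPU code from the comment (which is not shown in the thread); the function names below are hypothetical and only illustrate the index swap.

```python
def transpose(m):
    """Build a transposed copy (the extra work the trick avoids)."""
    return [list(row) for row in zip(*m)]


def vec_mul(v, m):
    """Row vector times matrix: result[j] = sum over i of v[i] * m[i][j]."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def vec_mul_transposed(v, m):
    """Same result as vec_mul(v, transpose(m)), but without building the
    transposed matrix: the row and column indices are simply swapped on read."""
    return [sum(v[i] * m[j][i] for i in range(len(v))) for j in range(len(m))]


m = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
v = [1, 0, 2]
print(vec_mul_transposed(v, m))        # [7, 16, 27]
print(vec_mul(v, transpose(m)))        # [7, 16, 27]
```

Both calls give the same vector; the second form just pays for an extra pass over the matrix first.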
@ZILtoid1991
3 years ago
PMADDWD is quite useful for fast affine transformation functions. On SSE2, I can even calculate two pixels at once
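As a rough illustration of why PMADDWD suits affine transforms: the instruction multiplies packed signed 16-bit words lane by lane and adds each adjacent pair of products into one 32-bit result. The pure-Python model below follows that definition (ignoring the single overflow corner case); the lane layout in `transform_two_pixels` is an assumption for illustration, not necessarily how the commenter arranges the data.

```python
def pmaddwd(a, b):
    """Model of PMADDWD: multiply signed 16-bit lanes, then add adjacent pairs.
    (The one hardware wrap-around case, all four inputs equal to -32768,
    is not modelled here.)"""
    assert len(a) == len(b) and len(a) % 2 == 0
    for word in list(a) + list(b):
        assert -32768 <= word <= 32767, "lanes are signed 16-bit words"
    return [a[i] * b[i] + a[i + 1] * b[i + 1] for i in range(0, len(a), 2)]


def transform_two_pixels(p0, p1, a, b, c, d, tx, ty):
    """Apply x' = a*x + b*y + tx, y' = c*x + d*y + ty to two points, using one
    8-lane pmaddwd (the width of a 128-bit SSE2 register) for the linear part."""
    xy = [p0[0], p0[1], p0[0], p0[1], p1[0], p1[1], p1[0], p1[1]]
    coeff = [a, b, c, d, a, b, c, d]
    x0, y0, x1, y1 = pmaddwd(xy, coeff)
    return (x0 + tx, y0 + ty), (x1 + tx, y1 + ty)


# 90-degree rotation plus a translation of (10, 20)
print(transform_two_pixels((3, 4), (5, 6), 0, -1, 1, 0, 10, 20))
# ((6, 23), (4, 25))
```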
@bobbymorelli9763
3 years ago
alright guys lets brainstorm what kind of algorithm could benefit from all 10...maybe search for a specific font in an image by comparing each glyphs bitmap to the image using MPSADBW and search for words within identified glyphs using the last instruction?
@AlexanderBukh
2 years ago
careful, or you might end up creating another awfully named megainstruction
@bakedbeings
2 years ago
@@AlexanderBukh ALRTGYSBSTRM
@nyanpasu64
2 years ago
Needs moar threads.
@abebuckingham8198
2 years ago
MPSADBW can be used for all sorts of optimization problems as the sum of absolute differences is a metric. It's often faster than using the Euclidean metric which requires a square root and you can substitute one for the other in many situations.
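For readers who have not met the sum of absolute differences (SAD), here is a small pure-Python sketch. `mpsadbw_model` is a simplified model of the instruction's core behaviour: eight SADs of one 4-byte block against eight overlapping 4-byte windows. The immediate operand that selects block and window offsets in the real instruction is left out, so treat this as an illustration rather than a full emulation.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length byte sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mpsadbw_model(window, block):
    """Simplified MPSADBW model: slide a 4-byte block across 11 bytes and
    return the 8 SAD scores (offset selection via imm8 is not modelled)."""
    assert len(window) == 11 and len(block) == 4
    return [sad(window[i:i + 4], block) for i in range(8)]


scores = mpsadbw_model(list(range(11)), [2, 3, 4, 5])
print(scores)                      # [8, 4, 0, 4, 8, 12, 16, 20]
print(scores.index(min(scores)))   # 2, the best-matching offset
```

The lowest score marks the best match, which is how this kind of metric gets used in block matching, without any squaring or square root.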
@gazehound
3 months ago
you could feasibly use a good chunk of these by implementing a fancy video encoding
@soranuareane
3 years ago
CMPXCHG is how mutual-exclusion, locks, and semaphores are implemented in systems like QEMU. I remember having to fix a bug with a race condition in the QEMU Sparc interpreter by adding judicious use of CMPXCHG locking. It's an amazing instruction and, with its guaranteed atomic behavior, can be used to trivialize mutexes.
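A minimal sketch of how a lock can be built on compare-and-exchange, written in Python purely to show the logic. Python has no hardware CAS, so the atomicity is emulated here with an internal `threading.Lock`; on x86 that role is played by CMPXCHG with a LOCK prefix. The names `Cell` and `SpinLock` are hypothetical, and this is not QEMU's implementation.

```python
import threading
import time


class Cell:
    """A memory word with an emulated atomic compare-and-exchange."""

    def __init__(self, value=0):
        self.value = value
        self._guard = threading.Lock()  # stands in for the hardware's atomicity

    def compare_exchange(self, expected, new):
        """If value == expected, store new. Always return the value that was
        seen, like CMPXCHG leaving the old value in the accumulator."""
        with self._guard:
            old = self.value
            if old == expected:
                self.value = new
            return old


class SpinLock:
    """Mutex built on compare-and-exchange: 0 = free, 1 = held."""

    def __init__(self):
        self._state = Cell(0)

    def acquire(self):
        # Only the thread that swaps 0 -> 1 sees 0 come back and owns the lock.
        while self._state.compare_exchange(0, 1) != 0:
            time.sleep(0)  # yield to other threads instead of burning the CPU

    def release(self):
        self._state.value = 0
```

Every thread tries to swap 0 for 1, and exactly one of them sees 0 returned, which is the whole trick behind a CAS-based mutex.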
@FinaISpartan
3 years ago
Can't wait till you remake this vid in 10 years with all the custom RISC-V extension instructions. Gonna be pretty wild to see what people come up with.
@ritteradam
3 years ago
The big mistake Intel made is to create fixed width vector instructions. The V in RISC-V points to the importance of the variable width vector instructions where the assembly code doesn’t need to know the vector register size (V extension), and a similar matrix extension is coming for machine learning I think (though V is already a great improvement)
@canaDavid1
3 years ago
@@ritteradam The V in risc-v is a roman numeral standing for 5, as it is the 5th iteration of risc from Berkeley (i think).
@ritteradam
3 years ago
@@canaDavid1 Officially yes, but you can find videos of the people who developed RISC-V on KZread, and they mentioned that they originally developed it because they wanted to get the vector extension right, and that's why they called it RISC-V at the start.
@bFix
3 years ago
Also it's a reduced instruction set (RISC) and not a complex instruction set (CISC) like x86. So why should RISC-V even get some of these? Just do them in software and let the compiler do its magic.
@TheMixedupstuff
3 years ago
The point of risc-v is to have a common set of instructions understood by many cpus and to be extended with application specific extensions where needed. So you can be 100% sure there will be many wild instruction extensions.
@icarvs_vivit
3 years ago
#1 is the definition of insane and incredibly useful. Thank you for translating the Enginese into English. Now I can delete my string comparison macros forever.
@ishdx9374
2 years ago
the last one seems so damn complex it's unbelievable it takes 3-4 cycles
@Kyrelel
2 years ago
Bear in mind that some instructions were not designed; they are a by-product of the design process. In essence, take any bit-pattern that is not assigned to an instruction and look at what the processor will do. Most often it will do nothing (which is why there are so many NOPs in instruction sets) or it may crash, but sometimes it will do something weird and wonderful and be included as an "official" instruction while the designers pretend it was intentional.
@Rudxain
A year ago
That's like exploiting hardware-level undefined-behavior
@lPlanetarizado
10 months ago
There is a comment that mentions HCF (Halt and Catch Fire), an "undocumented instruction" that could sometimes catch fire... damn, that's amazing lol
@appelnonsurtaxe
10 months ago
@@lPlanetarizado That wouldn't happen today on your PC's x86. Or this would be a terrible security issue. On modern systems userspace processes should be able to (try to) run any instruction they want without the CPU melting down.
@NormanVN
9 months ago
All of the instructions in this video were quite intentional, but niche. Well, only some are niche. cmpxchg is a _foundational_ instruction whose importance cannot be overstated, while pshufb is going to be in pretty much every vector codebase. dpps is pretty well known: parallel dot product. Not a fan of dpps tbh.
@ukyoize
3 years ago
The string instructions seem like half of a grep implementation.
@islandfireballkill
3 years ago
I wonder how complicated it would be to try to formulate compiler autorecognition for instruction selection for these. That last one is easily a couple hundred lines of C code.
@FinaISpartan
3 years ago
Very complicated. Most of these optimizations are often missed by c compilers and have to be manually implemented in assembly. In some cases (video de/encoding) up to 50% of the codebase has to be rewritten in asm for these reasons.
@Abu_Shawarib
3 years ago
Your only hope is to use a library that already has fast paths coded in assembly to do this for you.
@jfwfreo
3 years ago
The best way to do this would be to implement these as compiler intrinsics that would then be substituted with the correct ASM instructions.
@bootmii98
3 years ago
@@jfwfreo what if some other arch doesn't have them? most compiler suites support at least one other architecture.
@jfwfreo
3 years ago
@@bootmii98 Most compilers for x86/x64 (including GCC and Microsoft) already support a boatload of compiler intrinsics for SSE and all sorts of things.
@simracing8055
3 years ago
I feel bad for the CPU engineers who will need to add compatibility for this stuff in 20 years Edit: finished watching the video. This was pretty fascinating, and the 3D text made it very nice to watch. I hope you gain more subscribers!
@vylbird8014
3 years ago
They'll do it in microcode, I imagine. Apart from the RNG, they can all be done purely in heaps of microcode if you don't care about performance, no dedicated hardware needed.
@gorilladisco9108
3 years ago
If you ever learn about microprocessors, it's all about microcode. Every assembly instruction is a function call to microcode. The design will be basically the same, with microcode printed in ROM inside the chip. You just have to be creative using that microcode to come up with a new instruction.
@johnbrown9181
3 years ago
@@gorilladisco9108 There's definitely a lot more to it than just microcode. Things that are both easy and compact in hardware - such as a linear-list search or swizzling - and microcode won't get you there. Also I'm not aware of any major RISC implementations that use a significant amount of microcode, very much unlike x86.
@gorilladisco9108
3 years ago
@@johnbrown9181 And that's why you won't see any instruction like the ones listed in this video on any RISC microprocessors. The thing about x86 and other CISC microprocessors is they use microcode liberally. Microcode is how a microprocessor works. All you have to do is to have imagination.
@Waccoon
3 years ago
Depends on how fast it needs to be. Optimizing complex instructions to use all of a core's hardware is difficult, but just getting older instructions to work for the sake of compatibility isn't that hard. Hence, x86 code from a couple decades ago will work fine on a modern x64 chip, while ARM, PowerPC, and other RISC designs have suffered mountains of compatibility issues over time.
@MjuMeli2 жыл бұрын
This getting recommended to people is almost as oddly specific as the sound of sorting algorithms
@dkosmari2 жыл бұрын
The carryless multiplication is polynomial multiplication modulo 2. It's used to implement things like CRC computation, and Reed-Solomon error correction codes.
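For anyone who wants to poke at this without writing assembly, here is a rough Python sketch of the idea. It is only the math, not how PCLMULQDQ is used in real CRC code (which, as far as I know, relies on folding tricks). The function names `clmul` and `gf2_mod` are made up for this example.

```python
def clmul(a: int, b: int) -> int:
    """Carryless multiply: treat a and b as polynomials with
    coefficients modulo 2 and multiply them (XOR instead of add)."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            # Add (XOR) a shifted copy of a; no carries propagate.
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


def gf2_mod(value: int, poly: int) -> int:
    """Remainder of value divided by poly, both read as polynomials
    with coefficients modulo 2. This is the core step of a CRC."""
    plen = poly.bit_length()
    while value.bit_length() >= plen:
        # Cancel the top bit by XORing in an aligned copy of poly.
        value ^= poly << (value.bit_length() - plen)
    return value


# (x + 1) * (x + 1) = x^2 + 1 modulo 2, so 3 "times" 3 is 5, not 9.
print(clmul(0b11, 0b11))
```

Multiplying any value by the polynomial and reducing again leaves a remainder of zero, which is the property CRC checks lean on.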
@jgunther3398
9 ай бұрын
i was disturbed to find any mul instruction. i loved my homemade multiplication and division routines
@gazehound
3 ай бұрын
Yes, it's useful for all kinds of codes. It's a direct implementation of a field theory concept
@GaryBickford3 жыл бұрын
Don't forget the Motorola 6800 "Halt and catch fire" instruction. It was an unpublished byte code that caused a branch to itself until the chip overheated.
@BrianG61UK
3 жыл бұрын
No. en.wikipedia.org/wiki/Halt_and_Catch_Fire_(computing)
@GaryBickford
3 жыл бұрын
@@BrianG61UK Long ago a computer center I worked in had a list created by IBMers in the 1960s of amusing opcodes, including HCF. But I didn't want to complicate the text, and the MC6800 item is there in the Wikipedia description, though I did have the details incorrect😊.
@tomysshadow
2 жыл бұрын
This video is about x86 though. Given, it does have the HLT instruction, and if you use it in your user mode application it will catch fire (if by catching fire you mean cause a privileged instruction exception) :0)
@rty1955
2 жыл бұрын
HCF was around in the 60s way before the 6800
@GaryBickford
2 жыл бұрын
@@rty1955 yes, I recall on the wall of a data center I worked at, a paper list of spoof IBM machine instructions that included this HCF instruction. Iirc there was also BAH, Branch And Hang😂. The only CPU that actually did this that I'm aware of was the early 6800, but it's possible there were others. On the 6800 it was an "unimplemented" instruction bit pattern that unbeknownst to Motorola effectively branched to itself immediately and repeatedly until the heat built up enough to burn the logic. I also personally experienced the results of two amusing (to me) episodes - at a college I was attending, a kid running a canned BASIC business program that managed somehow to overwrite the entire disk map, effectively erasing everything, and a kid looking for a job used social engineering to get the guy running jobs to dive and hit the Big Red Halt button. Each of those events caused the Computer Center to be offline for more than a week. And an entire computer center at a company where I worked got completely fried including three mainframes due to a lightning strike right at the pole outside the Center. The senior manager had resisted spending the $5 million required for a motor generator to isolate the computers from the world. We had 400 engineers twiddling thumbs for two weeks. He got a new job.
@rockercas3 жыл бұрын
wow, those were 1010 assembly language instructions, not a mere 10!
@i_am_aladeen
3 жыл бұрын
I actually crunched these numbers in my head before I realized what you did. I feel ashamed. +1
@bbq1423
3 жыл бұрын
There are 10 kinds of people in this world. Those who know binary, and those who do not.
@threepointonefour607
3 жыл бұрын
@@bbq1423 there are 10 kinds of people in the world: those who understand hexadecimal and F the rest
@skilz8098
3 жыл бұрын
@@threepointonefour607 0000 0000b - 1111 1111b == 0x00 - 0xFF since log2(x) is a factor of log16(x)! If you are doing simple programming, then 90% of the time you'll only need hexadecimal. If you are actually building and designing hardware and implementing its data paths, control lines and control bits... You are not going to get very far without binary and Boolean Algebra! If you get into Cryptography, or Signal Analysis you might want to know binary as you'll end up performing a lot of bit manipulation!
@alg3n320
3 жыл бұрын
@@bbq1423 and those who didn't expect a trinary joke
@WarpRulez3 жыл бұрын
You know that an instruction is complex if implementing it in a higher-level programming language would take literally hundreds of lines of code.
@quadroninja27089 ай бұрын
This video has such unique editing. The topic isn't any less obscure, and it's really cool to hear the author being so enthusiastic about those instructions. It's a really interesting experience
@helleye3112 жыл бұрын
I just did an assembly presentation for uni. Super basic, the most advanced thing I covered were loops. I did it with mips because simple easy setup and stuff. I knew "proper" CISC assembly had more instructions but holy moly I do not understand half of these. And apparently some of those work in one clock cycle? I'm currently questioning everything I ever knew. These madmen made sand do regex.
@neolordie3 жыл бұрын
when the recommendations are about instruction sets you know you are on another nerd level
@quickstartprojects21623 жыл бұрын
Finally SSE 4.2 string compare is understandable. I wish we had the Australian version, Creel version, of the intel instruction set manuals.
@deppy2165
3 жыл бұрын
if you're struggling with the intel manuals I personally find the amd manuals more comprehensible
@SaHaRaSquad3 жыл бұрын
Not gonna lie, string comparison on the instruction set level actually sounds pretty useful. Not a fan of the absolutely insane arguments though.
@WhatsACreel
3 жыл бұрын
Yes, they are magnificent instructions!! Assembly can be super fiddly to code, but very powerful if you have the time to make sure it is correct.
@Gulleization
2 жыл бұрын
Yeah, as an accountant by profession I still wonder how mathematical reconciliation of bank statements and checking accounts can be so complicated to program and usually buggy. I guess that last instruction combined with machine learning techniques really could speed up the process.
@SaHaRaSquad
2 жыл бұрын
@@Gulleization You absolutely don't want machine learning near anything that requires accurate numbers. ML has its place but it isn't nearly as useful or reliable as the hype often makes it appear.
@somdudewillson
10 ай бұрын
@@SaHaRaSquad It depends on the type of ML. Neural networks are generally fuzzy, but there are lots and lots of other kinds of machine learning implementations, and some of them work very well for accurate numbers.
@jgunther3398
9 ай бұрын
it would only be four or five instructions in a loop. but if it was four or five times faster and all you did was compare strings, very valuable!
@sasas8453 жыл бұрын
I've worked with or in close proximity of most of these. If you do high performance number crunching or data crunching, the value logistics (i.e. which value needs to be in what operand in which SIMD position) very quickly becomes a major issue and for that all these shuffle/rotate/select instructions are a godsend, especially since they tend to be just rewiring of existing ALU functionality so AFAIK should be easy to implement in silicon. Number 1 on the list is the only instruction family I'd put into "space magic" territory, but I might just not have seen its use case yet.
@galier23 жыл бұрын
TMS-9900 also has a very unique instruction: X Rn . Execute the instruction in register n. It's the only CPU I know of that has the equivalent of an eval() function (as the registers are stored in external RAM, it's clear that it's not difficult to implement in that case).
@Rudxain
3 жыл бұрын
It has SEVERE security issues. But hey, at least it can be used for self-modifying programs
@galier2
3 жыл бұрын
@@Rudxain for a CPU that doesn't have privilege levels or memory protection, I don't think that security is an issue with the X instruction.
@peterfireflylund
11 ай бұрын
S/360 had the EX instruction for that. The instruction wasn’t in a register but in memory (S/360 was variable length, 2/4/6 bytes). This kind of instruction was fairly common in the 50’s and 60’s.
@galier2
11 ай бұрын
@@peterfireflylund interesting. Btw in the TMS-9900 the instruction is also in memory because the register window is in memory.
@superblaubeere272 жыл бұрын
7:30 Btw, the carryless multiply is extremely useful when making parsers
@mohammedjawahri5726
2 жыл бұрын
:o, can u elaborate pls xD
@superblaubeere27
2 жыл бұрын
@@mohammedjawahri5726 here is a video about it, you will need the context: kzread.info/dash/bejne/qaCqraONZ7bAebQ.html
@mohammedjawahri5726
2 жыл бұрын
@@superblaubeere27 thanks!
@0MoTheG
2 жыл бұрын
@@superblaubeere27 You mean at 35:00 ?
@superblaubeere27
2 жыл бұрын
@@0MoTheG exactly.
@adamengelhart51593 жыл бұрын
The other day I learned about the POLY instruction on the VAX. That's POLY as in polynomial, so when I heard of it I thought "well, I guess there could be a use for it in numerical apps, maybe? It's not like it's going to be more than a few coefficients. Maybe a cubic; that's only four." I was only off by twenty-eight! That's right--the VAX can, with a single terrible opcode, compute the value of up to a thirty-first degree polynomial, to either float or double precision.
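As far as I understand it, POLY walks a coefficient table using Horner's method, so the whole thing boils down to one multiply and one add per coefficient. A minimal Python sketch of that idea follows. The name `poly_eval` and the highest-degree-first ordering are my own choices for the example, not the VAX operand format.

```python
def poly_eval(x: float, coeffs: list[float]) -> float:
    """Evaluate a polynomial with Horner's method.

    coeffs is ordered from the highest-degree coefficient down to
    the constant term (an assumption made for this sketch).
    """
    result = 0.0
    for c in coeffs:
        # One multiply and one add per coefficient.
        result = result * x + c
    return result


# x^3 - 3x + 5 at x = 2
print(poly_eval(2.0, [1.0, 0.0, -3.0, 5.0]))
```

A degree-31 polynomial is just the same loop run over 32 coefficients, which is why it fits in a single (very slow) instruction.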
@TotalImmort7l
3 жыл бұрын
Isn't assembly strangely awesome?
@romannasuti25
3 жыл бұрын
...wouldn't a 31 degree polynomial just smash the value to negative infinity, positive infinity, or zero? What the hell is even the use of that lol
@juanthehorse420
3 жыл бұрын
@@romannasuti25 nope, if you need to do some crazy ass Taylor series or something and just look at a certain portion
@meneldal
3 жыл бұрын
@@juanthehorse420 Outside of bragging about computing Pi faster, is there any use for 10+ long Taylor series in practice?
@tanszism
2 жыл бұрын
@@meneldal approximating any function with nicer ones and then being able to calculate that fast on the fly can be useful, though most of those often-used functions have fast instructions themselves at this point.
@elietheprof56783 жыл бұрын
Excellent visualizations btw. Way more straightforward than instruction manuals that try to explain everything with just words.
@KazeN648 ай бұрын
I've used MIPS excessively and never looked at X86 much. This feels like when you were playing yugioh in 1999 and you were summoning and setting 1 card every turn and then you get teleported to 2023 where people play their entire deck in one turn and have cards with effects that are 7 paragraphs
@jhgvvetyjj6589
4 ай бұрын
Even when cutting off all SSE and up instructions (making it useful for legacy x86 device targeting) there is still a lot of complexity, including very precise x87 floating point and MMX vectorization. What makes it especially fascinating is how compatible it has become; a 640×480 60fps renderer on a very old x86 processor with MMX might very well be the exact same program that does 3840×2160 60fps on a modern PC.
@jimviau3273 жыл бұрын
I'm no programmer but it appears to me that programming these instructions into a CPU is just about as complicated and fascinating as quantum physics.
@BrightBlueJim
2 жыл бұрын
Then you don't know much about quantum physics. The point is that these instructions were added because doing these operations (which are needed in very specific cases) in software is otherwise very inefficient. In fact, in a microcoded CPU, they aren't that difficult to implement. If you really had to do these things in "hardware" (i.e., dedicated logic gates), that would be a whole lot of square microns.
@mage3690
2 жыл бұрын
@@BrightBlueJim man, what a day and age we live in, to have real estate measured in microns! I'm only 20 years old, and I'm already living in the future. Imagine what the _actual_ future holds!
@Roxor128
10 ай бұрын
@@mage3690 Microns? If you want an actual comparison to real estate in terms of cost for high-end parts, you're going to want something a little bigger. Your unit will be the nanohectare (10mm^2). Your typical big complicated chip will therefore be around 20-40 nanohectares in size and will have cost Intel or AMD the equivalent of buying 20-40 hectares of actual land to develop.
@jgunther3398
9 ай бұрын
half of the fun was all the bizarre "words" that mystified everybody else. it made you feel special. it's not as complicated as it looks. abstracting the problem into code is harder
@jimviau327
9 ай бұрын
@mage3690 here is a hint. In the future you will be a borg. With NeuralLink, all will be connected to the WEB and our reality will be online. Disconnecting from it would represent another phase of consciousness. Then, you will be able to experiment with 5 phases of consciousness , sleep, awake, dream, WEB and illumination. The latter being the most fantastic of all.
@salainen68503 жыл бұрын
PEXT is so useful! I can finally get the correct bits from a 4X 1R 1G 1B 1I 8-bit color buffer to the "layers" in mode 12h easily!
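For readers who haven't met PEXT, here is a small Python emulation of it (and of its inverse, PDEP), plus the bit-plane trick described above. The pixel layout is an assumption for the example: one pixel per byte, with bit 3 taken as the R bit. The names `pext` and `pdep` are just plain functions here, not real intrinsics.

```python
def pext(src: int, mask: int) -> int:
    """Gather the bits of src selected by mask into the low bits
    of the result, preserving their order."""
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask      # isolate lowest set bit of mask
        if src & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1           # clear that bit
    return result


def pdep(src: int, mask: int) -> int:
    """Scatter the low bits of src to the positions selected by mask."""
    result = 0
    while mask:
        lowest = mask & -mask
        if src & 1:
            result |= lowest
        src >>= 1
        mask &= mask - 1
    return result


# Eight pixels packed one per byte into a 64-bit word (hypothetical layout).
pixels = [0b1000, 0b0000, 0b1000, 0b1111, 0b0000, 0b0000, 0b0100, 0b1001]
word = sum(p << (8 * i) for i, p in enumerate(pixels))

# Pull bit 3 of every byte out into one 8-bit plane value.
red_plane = pext(word, 0x0808080808080808)
print(bin(red_plane))
```

On real hardware the same gather is a single instruction, which is where the win over a shift-and-mask loop comes from.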
@WhatsACreel
3 жыл бұрын
Mode 12h? Are you coding EGA? That's awesome!
@salainen6850
3 жыл бұрын
@@WhatsACreel Yup! I think I should also do something on UEFI though, as it gives higher resolutions.
@ivanbrezina7632
3 жыл бұрын
Also DES, RC4 and other cyphers based on Feistel's schema would be ridiculously slow without this.
@glikar13 жыл бұрын
Exciting! Love your enthusiasm. Almost makes c redundant. There is something about machine code that feels right.
@bootmii98
3 жыл бұрын
did you know that ++ and -- were VAX intrinsics?
@seneca983
3 жыл бұрын
"There is something about machine code that feels right." I dunno. I've not done any actual assembly programming so maybe my opinion doesn't matter but x86 just seems so bloated and inelegant.
@swarnavasamanta2628
3 жыл бұрын
@@seneca983 you would be partially right. Bloated or not depends on the way of implementation, if these instructions were to be implemented by microcode, yes absolutely, better let the programmer handle them. But if they are direct on chip Hardware implementation of these instructions then it's a different story, it takes the opposite route of bloat. Takes 1 instruction instead of writing a 100 line function in C and hoping compiler would get the translation right. Also x86 being firmly established the engineers have to make sure they are compatible all the way. Support for languages will drop eventually, while x86 is going to stay.
@seneca983
3 жыл бұрын
@@swarnavasamanta2628 One advantage of a simpler and smaller instruction set is that microcoding might not then be necessary and the chip could be simpler. Indeed x86 would be rather difficult to supplant. However, it seems possible that ARM could do it though it's uncertain and would probably take a long time if it happened.
@swarnavasamanta2628
3 жыл бұрын
@@seneca983 ARM is definitely a beast, and their methodology is completely different from other CISC approaches. It began first as a project to see if a computer really needs large complex instructions, they thought they would come at a halt problem but nothing really came up and they could make everything work with 1 cycle simple instructions (although with a bit of microcode). At this point hard to tell what the future holds, maybe there will be standardization when one architecture has so many advantages that renders other architectures almost useless or unworthy of learning curve. Who knows what the future holds but up until that the architecture land of computers is like wild wild west and i kind of love it that way.
@realhet3 жыл бұрын
PUNPCKLDQD is sad and disappointed not being able to get on the list ;D
@juliankandlhofer7553
3 жыл бұрын
Gesundheit.
@WhatsACreel
3 жыл бұрын
I am sorry, PUNPCKLQDQ... :( if we do a follow-up video, I will be sure to include the unpacking instructions in that :)
@realhet
3 жыл бұрын
@@WhatsACreel I remember doing a 8x8 16bit matrix transpose for a jpeg decoder with only 8 sse regs and 2 memory temp 'regs' with these crazy-named instructions. It was so satisfying when it finally started working correctly. :D
@WhatsACreel
3 жыл бұрын
@@realhet Wow!! Things were certainly tough when we only had 8 regs :)
@SuperSmashDolls
3 жыл бұрын
At some point that stopped being an x86 instruction and started being a DooM cheatcode.
@OzoneGrif3 жыл бұрын
I wonder which language compilers are able to detect these patterns and use the ASM operand instead of doing the slow imperative way.
@WhatsACreel
3 жыл бұрын
I love Clang! It does a lot of optimisations. You might have to use intrinsics, but these things are available in C++. Best way to know if the compiler is using decent instructions is to disassemble and check what it's doing. Or use the ‘Godbolt Compiler Explorer’ website. I don't think there's any compilers that are better at applying these instructions than humans. The gap is narrowing, and maybe one day, we'll get AI compilers that can do these things better.
@OzoneGrif
3 жыл бұрын
@@WhatsACreel Right, I guess the best bet would be to use/create libraries providing these functions as interfaced tooling; the libraries making use of ASM internally if possible (since it depends on the CPU type)
@Winnetou17
3 жыл бұрын
@@WhatsACreel AI compilers that can do things better than humans! NEVER! Maybe just faster... (insecure human signing off)
@mthf5839
3 жыл бұрын
@@Winnetou17 I might be wooshing rn, but there are quite a few examples of AI doing better than humans. Google has some wild stuff for recognizing numbers from blurred photos for its street view stuff.
@swarnavasamanta2628
3 жыл бұрын
@@OzoneGrif please no more abstraction by library interfaces at low level. It is a nightmare, I say let good programmers handle this.
@davannaleah2 жыл бұрын
I remember the old Intel 8085 had some hidden instructions we used in our projects. We knew they would not be changed because the instructions were used in some of the development tools for the MDS (Microprocessor Development System). There were instructions like LDHLISP with an 8-bit offset parameter. Basically it was "Load the HL register Indirectly with the Stack Pointer with the offset added". It was essential for writing re-entrant code (in 8085 assembler!). BTW this was way back in 1980!
@boptillyouflop3 жыл бұрын
All of them are SIMD/vector instructions except CMPXCHG (which you need for doing atomics and is arguably less complex than alternatives such as load-lock + store-unlock). Plus, aside from CMPXCHG, none of them write to more than one register (except for flags) or even interact with memory (aside from the normal x86 thing of being available in load-modify and load-modify-store forms). None of them can cause special exceptions (aside from the normal page faults from memory addressing). Compared to stuff like 286 protected mode instructions, x87 register stack insanity, decimal instructions, arguably block operations, these are tame (even more so compared to Itanium ALAT insanity and load checked and reg file rotation).
@T33K3SS3LCH3N3 жыл бұрын
My little brother is doing a similar major as I did and will have a course with some practical work in assembly next year. Your video just gave me the inspiration to help him find some more "creative" solution to those assignments.
@educate99463 жыл бұрын
I love this presentation, it fits the weirdness of the ops! Great job!
@kingtutthefirst2 жыл бұрын
I've always loved the absurdity of the PA-RISC2 instructions SET/RESET/MOVE TO SYSTEM MASK and the PSW E-bit. By changing it, you change the endianness of the entire CPU... And, because of pipelining, the instruction has to be followed by 7 palindromic NOP instructions. That's just always cracked me up.
@MoosesValley8 ай бұрын
Appreciate the tour. Did quite a lot of Assembly coding in my earlier years, and quickly grew to love it - it's a lot of fun when you get up and running, but you need to keep so much more information in your brain / at your finger tips compared to higher level languages.
@FlorianEagox2 жыл бұрын
I love that I can tell how much fun you were having with this!
@xymaryai82832 жыл бұрын
god, not even cryptographers would bother figuring these instructions out nowadays. no wonder RISC instruction sets are so much faster for the same electrons, they don't need to snake around the dark winding alleys of the ALU
@mduckernz
2 жыл бұрын
They absolutely do, though. Crypto nearly exclusively is written in assembler, and prioritises code that always takes the same amount of time to execute (to prevent timing attacks), and code that also otherwise doesn't leak state (the amount of time something takes to execute is a leak, but if it's always the same you can't extract any data from it)
@SvetlinAnkov3 жыл бұрын
@Creel, I love how you slipped in DNA nucleotide bases in the string match example 😃
@allmycircuits8850
3 жыл бұрын
As soon as genetic scientists move from Excel to ASM, we are DOOMED!
@Molotom9 ай бұрын
You have great energy and enthusiasm in this video! Keep it up :)
@furyzenblade35583 жыл бұрын
Woa, high quality video, I love it! And the 3d visuals really help to represent the instructions
@kippers12isOG3 жыл бұрын
I love your vids mate. You’re such a god dam likeable character
@lx2222x3 жыл бұрын
Very cool video with very good animations, pls continue making these videos 👍, I just love ur channel
@first-thoughtgiver-of-will24562 жыл бұрын
This and 2 minute papers are the most important channels on my KZread thank you for your service.
@gazehound3 ай бұрын
Wow that carryless multiplication instruction took me straight back to my Information & Coding Theory class.
@Chrisuan2 жыл бұрын
Found this randomly in my suggestions. Insane content, great stuff. As a C++ programmer this assembly stuff scares me lol
@GogiRegion
2 жыл бұрын
I’ve never done programming in assembly on any newer hardware, so to me I always thought of assembly operations as stuff like move this to there, add, subtract, compare two registers, so even as someone who’s used assembly this is absurd to me.
@bbq14233 жыл бұрын
Wouldn’t it be better to call them functions instead of instructions at this point?
@jjoonathan7178
3 жыл бұрын
Needs a RUNDOOM instruction.
@allmycircuits8850
3 жыл бұрын
@@jjoonathan7178 At least IDDQD seems plausible, integer divide quads by double, store results as double :)
@oldxuyoutube1
3 жыл бұрын
They have their own implementation circuitry, therefore they should be called instructions, and this is also one of the most important features of the x86 ISA: we make a complex operation into an instruction to shorten the execution time and make the program smaller.
@yadt
3 жыл бұрын
@@oldxuyoutube1 well, there is microcode...
@microcolonel
3 жыл бұрын
No, because they are not functions; maybe you could call them routines but not functions.
@CrittingOut9 ай бұрын
one of the assembly instruction video's of all time.
@NogCube3 жыл бұрын
I love your style bro! This is a great one. 👌 Back to 2000.
@JohnCLiberte3 жыл бұрын
Just imagine pitch meetings to decide which instructions should go in the set :D. I'm surprised they don't have a 'calculate your taxes and clean the house' instruction
@boptillyouflop
3 жыл бұрын
These instructions do have a couple really solid selling points: (1) they don't write to multiple registers (2) they don't do special memory accesses (3) they don't cause any weird special interrupts.
@jerradn3 жыл бұрын
I felt like I had to clean my glasses several times during this video, haha.
@intvnut3 жыл бұрын
Carryless multiplication also comes up in error correcting codes and checksums. And, of course, it can implement INTERCAL's unary bitwise XOR if you multiply by 3.
@intvnut
3 жыл бұрын
Hmm... my other comment about PEXT got deleted, probably because I included a link. PEXT implements INTERCAL's _select_ operator. And I believe PDEP can implement INTERCAL's _mingle_ operator. It's good to see Intel catching up with the amazing INTERCAL language!
@louistournas1209 ай бұрын
It is great having a visual of these operations. Intel had once made an app that showed how each SSE instruction worked. I used that to learn and to write assembly code.
@colinstu3 жыл бұрын
that glow around the bright text on dark background is driving my eyeballs crazy.
@WhatsACreel
3 жыл бұрын
Noted! Thanks for letting me know and cheers for watching :)
@colinstu
3 жыл бұрын
@@WhatsACreel interesting vid / instructions nonetheless. but yeah, the glow reminds me of when my eyes are wet from crying, I kept having to pause and rub my eyes to "dry" them only to see it's still foggy looking lol.
@WhatsACreel
3 жыл бұрын
@@colinstu Ha! I felt the same way while making it! I toned down the glow from 6 to 2.5. It was still hard to look at, but I’d already rendered half the animations, so had to settle. I’m hoping to use animations resembling construction paper in the future. They are very easy to look at, but more time consuming to create. We will have to see how we go.
@snoozy355
3 жыл бұрын
@@WhatsACreel what software did you create your animations in?
@soonts3 жыл бұрын
addsubps was probably made for complex numbers packed into these vectors. mpsadbw and similar psadbw indeed were made for video codecs, to estimate errors. You should avoid mpsadbw because it is too slow, but psadbw is good. I think the craziest of them are for cryptography, like aeskeygenassist or sha1rnds4. Good luck explaining what they do. Other notable mentions are insertps (SSE 4.1; inserts a lane into a vector + selectively zeroes out lanes; I used it for lots of things), pmulhrsw (SSSE3; hard to explain what it does but I used it to apply volume to 16-bit PCM audio), and all of the FMA3 set (easy to explain what they do, that’s ±(a*b)±c in one instruction for float numbers, but the throughput is so good).
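Since pmulhrsw is the hard one to put into words, here is my best attempt at a per-lane Python model of it, based on my reading of Intel's description: a signed 16-bit multiply, with the product rounded and scaled back down by 15 bits. Treat it as a sketch. The helper names (`to_int16`, `pmulhrsw_lane`, `apply_volume`) and the Q15 gain format are assumptions for the example, not anything from the original comment.

```python
def to_int16(v: int) -> int:
    """Wrap an integer into the signed 16-bit range."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def pmulhrsw_lane(a: int, b: int) -> int:
    """One 16-bit lane: multiply, round, and keep bits 16..1
    of the intermediate, i.e. roughly round(a * b / 32768)."""
    temp = ((a * b) >> 14) + 1   # Python's >> is arithmetic for negatives
    return to_int16(temp >> 1)


def apply_volume(samples: list[int], gain_q15: int) -> list[int]:
    """Scale 16-bit PCM samples by a gain in Q15 fixed point
    (16384 means 0.5, 32767 is just under 1.0)."""
    return [pmulhrsw_lane(s, gain_q15) for s in samples]


print(apply_volume([10000, -10000, 3], 16384))
```

The real instruction does this for 8 or 16 lanes at once, so one instruction replaces the multiply, the rounding add and the shift for a whole block of samples.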
@WhatsACreel
3 жыл бұрын
Great points mate! Cheers for watching :)
@TomStorey963 жыл бұрын
Would love to see more of these!
@patrickpinholt5030 Жыл бұрын
Fun to hear about the rarely seen instructions 🎉🎉🎉
@SimGunther3 жыл бұрын
EIEIO I know it's a PPC instruction, but still... Seriously, the craziest ASM instructions are the ones not documented in any of the instruction manuals, but are only found by the sandsifter program (written by xoreaxeaxeax)
@sebastiaanpeters2971
3 жыл бұрын
Any proof for your second claim?
@danyildiabin4953
3 жыл бұрын
@@sebastiaanpeters2971 kzread.info/dash/bejne/kZmHo6iYobfFdrw.html kzread.info/dash/bejne/k56XxbxwfMfcn7Q.html This guy had a few talks about undocumented instructions or whole undocumented cpu hardware blocks
@SimGunther
3 жыл бұрын
@@sebastiaanpeters2971 Any of Chris Domas' talks around unlocking God Mode or breaking x86 should suffice
@StannyObelisk
3 жыл бұрын
Old McDonald had an assembler, EIEIO.
@desmond-hawkins10 ай бұрын
About *CMPXCHG* being "absolutely bizarre" (6:22), this is not only used for mutexes and semaphores as explained, but is also the most common primitive used for "lock-free" concurrent data structures (see for example Doug Lea's amazing ConcurrentSkipListMap implementation). It is so useful that many languages export it in some core library, like <atomic> in C++ or java.util.concurrent in Java. Most programs you use every day likely rely on it or its equivalent in another architecture, unlike some of the other weird instructions listed in this video.
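To make the compare-and-swap pattern concrete, here is a small Python model of it. Python has no real CMPXCHG, so a `threading.Lock` stands in for the hardware's atomicity; the `AtomicCell` class and `atomic_add` function are hypothetical names invented for this sketch. The retry loop in `atomic_add` is the part that carries over to real lock-free code.

```python
import threading


class AtomicCell:
    """Toy model of a memory word that supports compare-and-exchange."""

    def __init__(self, value: int = 0):
        self._value = value
        self._lock = threading.Lock()  # stand-in for hardware atomicity

    def load(self) -> int:
        with self._lock:
            return self._value

    def compare_exchange(self, expected: int, new: int):
        """If the cell holds expected, store new.
        Returns (success, value_seen), loosely mirroring how CMPXCHG
        reports the value it found."""
        with self._lock:
            seen = self._value
            if seen == expected:
                self._value = new
                return True, seen
            return False, seen


def atomic_add(cell: AtomicCell, delta: int) -> int:
    """Classic CAS loop: read, compute, try to publish, retry on conflict."""
    while True:
        old = cell.load()
        ok, _ = cell.compare_exchange(old, old + delta)
        if ok:
            return old + delta
```

If another thread changes the cell between the load and the exchange, the exchange fails and the loop simply tries again with the fresh value, so no update is lost.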
@michaelcederberg7937
9 ай бұрын
And it is not very useful as presented where all operands were registers. You want to execute this on a piece of memory.
@SmoothCode3 жыл бұрын
lol you sound so excited about this. I caught your enthusiasm for the subject through the video too. I hope you make more intros to understanding wth assembly language is and how it works in relation to microprocessors/bitwise operands; that would be really helpful for struggling CS students.
@scorch8553 жыл бұрын
Very cool video, loved all the animations!
@spacewolfjr3 жыл бұрын
Creel, you are most excellent!
@alienrenders3 жыл бұрын
Is it bad that I've used most of these and consider them perfectly normal? Glad you didn't get into OS level instructions that set up descriptors and gates. Now those are weird.
@keokawasaki7833
2 жыл бұрын
bruh that shit fucks with my head, i tried getting into it but then the whole GDT, protected mode, gates and shit just knocked the air out of me by punching my brain in the balls (figuratively)
@ethanpayne4116
2 жыл бұрын
considering these instructions normal is like knowing the difference between the ruddy northeastern gray-banded ant and the ruddy northeastern gray-striped ant. The world of CISC is truly a jungle
@gabrote422 жыл бұрын
I haven't watched a video like this ever. Saving it for arguments. Thanks!
@HowieDue4169 ай бұрын
This video makes no sense to me, but my uncles used to code in assembly language. It just truly gives me awe and appreciation for the pioneers who used this language (WITHOUT DEBUGGING) and makes me see them in a new light as men of math. Thanks for humbling me and thanking god that there are higher level languages
@monad_tcp3 жыл бұрын
This made me realize that X86 is more abstract than the C language, each of those instructions are like 4 or 5 lines of C.
@andersjjensen
3 жыл бұрын
Now imagine having to teach a compiler to take your 5 lines of C code.... and figuring out which of the five thousand different x86 instructions is the perfect fit :P
@codahighland
3 жыл бұрын
That's the opposite of more abstract. Being more abstract means you have tools that are more general-purpose in order to handle a variety of different uses. These instructions are not abstract; they are intended for specific purposes and aren't especially useful at all otherwise. Consider that these instructions are actually implemented as microcode inside the CPU -- miniature programs built out of primitive building blocks.
@sunnohh
3 жыл бұрын
@@codahighland i guess what he is really trying to say is that x86 is so bloated you can implement the same thing a billion different ways
@codahighland
3 жыл бұрын
@@davestephens3246 Was the ad hominem even necessary? I wasn't judging. I was just giving information.
@monad_tcp
3 жыл бұрын
@@codahighland "they are more general-purpose in order to handle a variety of different uses" that's why I said what I said. "X86 is more abstract than C" x86 has lots and lots of complexity, the instruction set has lots of arguments and things that happen in some state and not in others, the instruction is variable length. So, the instructions can be used for lots of different purposes, with different modes, different registers, and so on, and so forth. That the instructions are actually implemented as microcode should be more than enough evidence that assembly is more abstract than the machine itself. Assembly is much more complex than the abstract machine that defines C and which you program to. C is basically a macro-assembler for the PDP11, X86 is a monster near it, it can do a lot, much more things, you can fine control memory load/store ordering, lots of abstract things that you can't even do in C, like barriers, for example. One practical example: there are SIMD instructions where a single instruction will do an entire for loop with a sum and a comparison to a variable, but in a register. Like 4 or 5 lines of C is just a single asm instruction in x86, and the compilers know how to translate that. Because you can't even declare data-parallelism in C, the compilers have to pretty much guess, otherwise the CPU would be idling because C programs are sequential. But what we care about is how data relates to itself, not the control-flow of the program, the CPU couldn't care less about it (speculative execution for the win!), all because C has less abstraction power than the machine itself. C is really, really outdated.
@DogsRNice3 жыл бұрын
Of all the thousands of videos I’ve watched this is the one that went farthest over my head
@GeneralKenobi69420
2 жыл бұрын
Furry cringe
@DogsRNice
10 ай бұрын
@@GeneralKenobi69420 you have 69420 in your username
@Erizo_9 ай бұрын
I never knew i needed this, until now.
@reirei_tk3 жыл бұрын
Honestly it's amazing how much work PCMPxSTRx can do in 3 or 4 clock cycles.
@overcritical3043 жыл бұрын
Honestly, 2 days ago I was trying to figure out what the hell MPSADBW does! Love you Creel, I hope you will make videos with in-depth explanations of these instructions.
@WhatsACreel
3 жыл бұрын
Hahaha, that's awesome! Thank you for watching :)
@diegonayalazo2 жыл бұрын
Thanks. Amazing. Even the background classical music :)
@apreviousseagle836 Жыл бұрын
This is the list I needed in my life!!!
@ProjectPhysX 3 years ago
Fantastic video! Such exotic instructions can insanely speed up / shorten certain algorithms. Back when I did MPASM (which has only 35ish instructions), there were some rarely used ones that magically do exactly what you would otherwise emulate in 10 more common instructions. Of the instructions in the video, so far I have only used cmpxchg, to emulate floating-point atomic addition in OpenCL.
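The cmpxchg trick described in this comment can be sketched in portable C11 instead of OpenCL. This is only a minimal illustration, not the commenter's code: `atomic_add_float`, `store_float` and `load_float` are hypothetical helper names, and the sketch assumes `float` is 32 bits wide. On x86, compilers typically lower the compare-exchange call to a `lock cmpxchg` loop.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: atomically adds `value` to the float whose bit
   pattern is stored in *target and returns the previous value.
   Assumes sizeof(float) == sizeof(uint32_t). */
static float atomic_add_float(_Atomic uint32_t *target, float value)
{
    uint32_t old_bits = atomic_load(target);
    for (;;) {
        float old_val, new_val;
        uint32_t new_bits;
        memcpy(&old_val, &old_bits, sizeof old_val);
        new_val = old_val + value;
        memcpy(&new_bits, &new_val, sizeof new_bits);
        /* On failure, old_bits is refreshed with the current contents
           of *target and the loop retries with that newer value. */
        if (atomic_compare_exchange_weak(target, &old_bits, new_bits))
            return old_val;
    }
}

/* Hypothetical helpers to move a float in and out of the bit container. */
static void store_float(_Atomic uint32_t *target, float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    atomic_store(target, bits);
}

static float load_float(_Atomic uint32_t *target)
{
    uint32_t bits = atomic_load(target);
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The float is kept as raw bits because the compare-exchange compares bit patterns, not floating-point values; the retry loop is what turns a plain read-modify-write into an atomic one.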
@Nesetalis 3 years ago
Very interesting, though I don't speak Assembly I recognized a lot of the terms (C/C++ here). However, that glow effect you're using drove my eyes batty.
@swoopskee 3 years ago
Whoa, this is some premium content right here, thank you! Subbed and notifications on
@SamB-gn7fw 3 years ago
This is very interesting, I'd love to hear more
@elietheprof5678 3 years ago
I have to admit, when CPUs changed from 32 bit to 64 bit, I was skeptical. Like how often do you really need to count beyond 2 billion anyway? But now I see why 64-bit instruction sets can be useful as fuck, and faster for the same clock speed.
@haraldsbaumanis 3 years ago
It would be very interesting to talk to the people who designed these chips
@GogiRegion
2 years ago
I’m just imagining that the entire design team for #1 probably goes into extreme PTSD flashbacks any time they see the letters PCMP anywhere near STR. I just can’t imagine what the proposal idea was like that led to the instruction being considered.
@adamwieckowski6082 3 years ago
pmaddwd is my all time favorite instruction. Totally priceless for video coding!
@distrologic2925 2 years ago
Love how excited he is constantly
2 years ago
It's like watching golden globes for nerds
@mojeimja 3 years ago
I cannot imagine a compiler that utilizes these fully! Use asm, optimize by hand!
@soonts
3 years ago
I agree it’s borderline impossible for compilers to emit them automatically. I saw clang’s auto-vectorizer emitting vpshufb but that was very simple code. I disagree about ASM. All these instructions can be used in C or C++ as compiler intrinsics, way more practical.
@mojeimja
3 years ago
@@soonts Yes, but if one can understand and use intrinsics properly, then he or she can just write the entire function in ASM too (right there inside the C code). So it's not about how exactly to use them, it's about using them efficiently at all.
@soonts
3 years ago
@@mojeimja The code I write often has both SIMD and scalar parts, interleaved tightly. Modern compilers are quite good at scalar stuff, they abuse the LEA instruction for integer math because it’s faster, and do many more non-obvious things. Just because they suck at automatic vectorization doesn’t mean they suck generally. For SIMD code, manually allocating registers, and conforming to the ABI (i.e. which registers to backup/restore when doing function calls) is not fun. With intrinsics, the compiler takes care of these boring pieces.
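The intrinsics approach argued for in this thread can be sketched with `_mm_madd_epi16`, the SSE2 intrinsic for the PMADDWD instruction. This is an illustrative example only: `dot_i16` is a hypothetical function name, the code assumes an x86 or x86-64 target with SSE2, and the 32-bit accumulators can overflow on large inputs. It mixes a SIMD loop with a scalar tail and leaves register allocation and ABI details to the compiler.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86 / x86-64 only */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of two int16 arrays of length n.
   Results are accumulated in 32 bits, so very large inputs can overflow. */
static int32_t dot_i16(const int16_t *a, const int16_t *b, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;

    /* SIMD part: PMADDWD multiplies 8 pairs of int16 values and adds
       adjacent products, yielding 4 int32 partial sums per step. */
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        acc = _mm_add_epi32(acc, _mm_madd_epi16(va, vb));
    }

    /* Horizontal sum of the 4 lanes. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    int32_t sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for the remaining (n % 8) elements. */
    for (; i < n; ++i)
        sum += (int32_t)a[i] * (int32_t)b[i];

    return sum;
}
```

No registers are named anywhere in the function; the compiler picks the XMM registers and handles any saves and restores the calling convention requires.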
@Cthulch 9 months ago
gosh what a great vid. Mate you rock :)
@lt3880 9 months ago
This was in my recommendations dozens of times in the last year. I finally watched, and I don't know what to do with this information
@n33to 3 years ago
x86 Assembly is one of the coolest things ever