The Evolution Of CPU Processing Power Part 4: The 32-Bit Processor - Pipelines and Caches

SERIES LINK - • Computing Technology
The rapid expansion of software from simple text-based tools to massively complex, feature-rich, highly visual products came to dominate the mass-market computing world during the 1980s and 90s. With this push came higher demands on processors to both use memory more efficiently and grow in computing power, all while keeping costs at consumer-accessible levels.
RISE OF 32-BIT
During the mid-1980s, in response to the growing demands of software, the opening moves toward the mainstream adoption of 32-bit processor architecture began. While 32-bit architectures had existed in various forms as far back as 1948, particularly in mainframe use, at the desktop level only a few processors had full 32-bit capabilities. Among the first was Motorola's 68020, introduced in 1984. Produced in speeds ranging from 12 MHz to 33 MHz, the 68020 had 32-bit internal and external data buses as well as a 32-bit address bus. Its arithmetic logic unit was also now natively 32-bit, allowing for single-clock-cycle 32-bit operations.
One year later, in 1985, Intel would introduce its own true 32-bit processor family, the 80386. Not only did it offer a new set of 32-bit registers and a 32-bit internal architecture, but also built-in debugging capabilities, as well as a far more powerful memory management unit that addressed many of the criticisms of the 80286.
The 386's operating modes allowed most of the instruction set to target either the newer 32-bit architecture or perform older 16-bit operations. With 32-bit architecture, the potential to directly address and manage roughly 4.3 GB (2^32 bytes) of memory proved to be promising. This new scale of memory addressing capacity would develop into the predominant architecture of software for the next 15 years.
On top of this, protected mode could also be used in conjunction with a paging unit, combining segmentation and paging memory management. The 386's ability to effectively disable segmentation, by using one large segment, allowed it to present a flat memory model in protected mode. This flat memory model, combined with the power of virtual addressing and paging, is arguably the most important feature change for the x86 processor family.
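As a rough numeric sketch of what the paging unit does (a hedged toy model: real page tables live in memory and carry permission bits, while here they are plain dictionaries), the 386 splits a 32-bit linear address into a 10-bit page-directory index, a 10-bit page-table index, and a 12-bit offset into a 4 KB page:

```python
# Toy model of the 386's two-level page translation.
# page_directory maps a 10-bit directory index to a page table;
# a page table maps a 10-bit table index to a physical frame base address.

def translate(linear, page_directory):
    dir_index = (linear >> 22) & 0x3FF    # top 10 bits select the page table
    table_index = (linear >> 12) & 0x3FF  # next 10 bits select the page
    offset = linear & 0xFFF               # low 12 bits: offset in a 4 KB page
    frame_base = page_directory[dir_index][table_index]
    return frame_base + offset

# Map linear page 1 (addresses 0x1000-0x1FFF) onto physical frame 0x400000:
directory = {0: {1: 0x400000}}
print(hex(translate(0x1234, directory)))  # -> 0x400234
```

Because segmentation can be reduced to one large segment, software sees only the flat 32-bit linear address on the left, while paging quietly relocates it to the physical address on the right.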
PIPELINING
Pipelining splits instruction processing into a sequence of simpler stages, classically fetch, decode, and execute, so that multiple instructions can be in flight at once. CPUs designed around pipelining can also generally run at higher clock speeds, because the simpler logic of each pipeline stage introduces fewer delays. Instruction data is usually passed from one stage to the next in pipeline registers, via control logic for each stage.
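The overlap can be sketched with a toy model (stage names and the instruction stream are illustrative, not taken from the video): with no hazards, instruction i simply occupies stage s during cycle i + s.

```python
# Minimal sketch of a 3-stage instruction pipeline (fetch/decode/execute).
# Shows which instruction occupies each stage on every clock cycle.

def simulate_pipeline(program, stages=("IF", "ID", "EX")):
    """Return, per clock cycle, which instruction occupies each stage."""
    depth = len(stages)
    timeline = []
    # With no hazards, instruction i occupies stage s during cycle i + s.
    for cycle in range(len(program) + depth - 1):
        in_flight = {}
        for s in range(depth):
            i = cycle - s
            if 0 <= i < len(program):
                in_flight[stages[s]] = program[i]
        timeline.append(in_flight)
    return timeline

program = ["ADD", "SUB", "LOAD", "STORE"]
for cycle, occupancy in enumerate(simulate_pipeline(program)):
    print(cycle, occupancy)
```

Four instructions complete in six cycles instead of twelve; once the pipeline is full, one instruction retires every cycle.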
A data inconsistency that disrupts the flow of a pipeline is referred to as a data hazard. A control hazard occurs when a conditional branch instruction is still in the process of executing within the pipeline while new instructions, potentially from the incorrect branch path, are being loaded in behind it.
One common technique for handling data hazards is known as pipeline bubbling, in which the pipeline is stalled with no-op "bubbles" until the needed data is ready. Operand forwarding is another technique, in which a result is passed directly to a later pipeline stage before it has even been written back to the CPU's registers. In some processor pipelines, out-of-order execution is used to help reduce underutilization of the pipeline during data hazard events.
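Pipeline bubbling can be sketched under a deliberately simplified model (a toy pipeline with a 1-cycle producer latency and no forwarding; the tuple encoding (dest, src1, src2) is invented for illustration): when an instruction reads a register written by the instruction immediately before it, a NOP bubble is inserted to delay it.

```python
# Hedged sketch of pipeline bubbling for read-after-write (RAW) hazards.
# Instructions are toy tuples: (destination, source1, source2).

def insert_bubbles(instructions, producer_latency=1):
    """Insert NOPs so no instruction reads a register written by the
    instruction immediately before it (1-cycle toy model, no forwarding)."""
    scheduled = []
    for instr in instructions:
        dest, *sources = instr
        if scheduled and scheduled[-1] != "NOP":
            prev_dest = scheduled[-1][0]
            if prev_dest in sources:          # RAW hazard with previous instr
                scheduled.extend(["NOP"] * producer_latency)
        scheduled.append(instr)
    return scheduled

prog = [("r1", "r2", "r3"),   # r1 = r2 + r3
        ("r4", "r1", "r5")]   # r4 = r1 + r5  -> reads r1: hazard
print(insert_bubbles(prog))   # -> [('r1', 'r2', 'r3'), 'NOP', ('r4', 'r1', 'r5')]
```

With operand forwarding, the same hazard would need no bubble, since the ALU result would be routed straight to the dependent instruction's input.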
Control hazards are generally managed by branch prediction: attempting to choose the most likely path a conditional branch will take, in order to avoid the need to flush and refill the pipeline.
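One classic embodiment of this idea is the 2-bit saturating-counter branch predictor. The sketch below is illustrative (the table layout, default state, and branch history are assumptions, not details from the video): each branch gets a counter, and two wrong guesses in a row are needed to flip the prediction.

```python
# Sketch of a 2-bit saturating-counter branch predictor.
# Counter states 0-1 predict not-taken; states 2-3 predict taken.

class TwoBitPredictor:
    def __init__(self):
        self.counters = {}                       # branch address -> counter

    def predict(self, addr):
        return self.counters.get(addr, 2) >= 2   # weakly-taken by default

    def update(self, addr, taken):
        c = self.counters.get(addr, 2)
        c = min(3, c + 1) if taken else max(0, c - 1)
        self.counters[addr] = c

pred = TwoBitPredictor()
outcomes = [True, True, False, True]   # hypothetical branch history
hits = 0
for taken in outcomes:
    hits += pred.predict(0x400) == taken
    pred.update(0x400, taken)
print(hits, "of", len(outcomes), "predicted correctly")  # -> 3 of 4
```

The single mispredicted iteration is exactly the case where the pipeline would have to be flushed and refilled from the correct path.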
CACHING
In caching, a small amount of high-speed static memory (SRAM) is used to buffer access to a larger amount of lower-speed but less expensive dynamic memory (DRAM).
In the simplest arrangement, direct mapping, each region of memory can be cached in exactly one cache block. A derived identifier, called a tag, indicating which of all the possible mapped memory regions the block currently represents, is also stored within the cache block. While simple to implement, direct mapping creates an issue when two needed memory regions compete for the same mapped cache block.
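The tag and the mapped block fall out of simple address arithmetic. In this hedged sketch (the block and cache sizes are arbitrary choices, not from any particular processor), two addresses exactly one cache span apart map to the same block with different tags, which is precisely the competition described above:

```python
# Sketch of how a direct-mapped cache splits an address into
# offset, index (which cache block), and tag (which memory region).

BLOCK_SIZE = 16     # bytes per cache block (illustrative)
NUM_BLOCKS = 256    # blocks in the cache (illustrative)

def split_address(addr):
    offset = addr % BLOCK_SIZE
    block_number = addr // BLOCK_SIZE
    index = block_number % NUM_BLOCKS    # the one cache block it maps to
    tag = block_number // NUM_BLOCKS     # identifies the memory region
    return tag, index, offset

# Two addresses exactly one cache-size apart collide on the same index:
a = 0x1234
b = 0x1234 + BLOCK_SIZE * NUM_BLOCKS
print(split_address(a), split_address(b))  # same index, different tags
```

Since both addresses demand the same block, a direct-mapped cache must evict one to hold the other, even if the rest of the cache is empty.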
When an instruction invokes a memory access, the cache controller calculates the block set the address will reside in and the tag to look for within that set. If the block is found, and it is marked as valid, then the requested data is read from the cache. This is known as a cache hit, and it is the ideal path of memory access due to its speed. If the address cannot be found within the cache, then it must be fetched from slower system memory. This is known as a cache miss, and it comes with a huge performance penalty, as it can potentially stall an instruction cycle while a cache update is performed.
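That lookup sequence can be sketched as a toy set-associative cache. LRU replacement is an assumption here, since the text does not name a replacement policy:

```python
# Toy set-associative cache mirroring the steps in the text: compute the
# set from the address, search the set for a matching valid tag, and
# count a hit or a miss (loading the block on a miss).

from collections import OrderedDict

class SetAssociativeCache:
    def __init__(self, num_sets=64, ways=2, block_size=16):
        self.num_sets, self.ways, self.block_size = num_sets, ways, block_size
        self.sets = [OrderedDict() for _ in range(num_sets)]  # tag -> present
        self.hits = self.misses = 0

    def access(self, addr):
        block = addr // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        s = self.sets[index]
        if tag in s:                     # cache hit: the fast path
            s.move_to_end(tag)           # refresh LRU position
            self.hits += 1
        else:                            # cache miss: fetch from memory
            if len(s) >= self.ways:
                s.popitem(last=False)    # evict the least recently used tag
            s[tag] = True
            self.misses += 1

cache = SetAssociativeCache()
for addr in [0x100, 0x104, 0x100, 0x2000, 0x100]:
    cache.access(addr)
print(cache.hits, cache.misses)  # -> 3 2 (same-block accesses hit)
```

Addresses falling in an already-loaded block hit; the first touch of each new block misses and pulls the block in.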
Writing data to a memory location introduces its own complication, as the cache must now synchronize any changes made to it with system memory. The simplest policy is known as a write-through cache, where data written to the cache is immediately written to system memory as well. Another approach, known as a write-back or copy-back cache, tracks written blocks and only updates system memory when the block is evicted from the cache by replacement.
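The two policies can be contrasted in a few lines (a hedged sketch; the class names and the dirty-set representation are invented for illustration):

```python
# Sketch contrasting write-through and write-back policies. The dirty set
# is how a write-back cache "tracks written blocks".

class WriteThroughCache:
    def __init__(self, memory):
        self.memory, self.cache = memory, {}

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value        # every write goes straight through

class WriteBackCache:
    def __init__(self, memory):
        self.memory, self.cache, self.dirty = memory, {}, set()

    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)             # defer the memory update

    def evict(self, addr):
        if addr in self.dirty:           # write back only on eviction
            self.memory[addr] = self.cache[addr]
            self.dirty.discard(addr)
        self.cache.pop(addr, None)

mem = {}
wb = WriteBackCache(mem)
wb.write(0x10, 99)
print(0x10 in mem)   # -> False: memory not yet updated
wb.evict(0x10)
print(mem[0x10])     # -> 99 after eviction
```

Write-back trades simplicity for fewer memory writes: repeated stores to a hot block cost one memory update at eviction instead of one per store.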

Comments: 240

  • @alexandersundukov3196
    3 years ago

    06:02 Memory Management Unit 07:12 Pipeline 07:51 Instruction Pipelining 08:30 Pipeline Controller 10:57 Superscalar Pipelining 12:20 Cache 14:37 TAG 14:53 Fully Associative Cache 15:11 Set Associative Cache 15:17 Set 15:28 Set Size 15:45 Cache Hit vs Cache Miss


  • @benjamintostesen6802
    3 years ago

    Parts 1-4 of this documentary have more information than most textbooks. Several times I had to pause, replay, and think through some of the concepts/designs/methods explained. I salute you for the huge work effort put into the topic - this is truly great work.

  • @codycall6513

    2 years ago

    I know right?! I went through high school in 2000-04, got my a+ networking cert. What a pain. But eventually learned, visual, Java, c++. Web design and development. Learning this way would have been so much easier. Wish I stayed in this field. I was good, real good. Even had the school tech guy got let go and my buddy and me took over. Let it go for 4 years and tried going back but was so far behind.

  • @vivianvaldi7871

    2 years ago

    As non CS student, his work looks a bit schematic & lacks some existing software illustration. Maybe that's why u have to pause and go back again. For me it lacks some realist & speedy approach. It's much less like he's enjoying the thing, than reading a resume & trying to make it short


  • @DanhNguyen-hh8kt

    1 year ago

    1-4=-3

  • @migmardi
    3 years ago

    As a CS grad, the amount of information and detail in this video amazes me

  • @varunkoganti9067

    3 years ago

    As an electronic grad I like to shit on CS grad due to their lack of skill. :)

  • @migmardi

    3 years ago

    @@varunkoganti9067 xoxo

  • @ed81ny

    3 years ago

    Agreed. Very high quality content.

  • @VeritasEtAequitas

    1 year ago

    @@varunkoganti9067 ECE master race

  • @emanuell6672
    3 years ago

    The amount of images and work that went into this, congrats

  • @AlokMeshram
    3 years ago

    Oh man I just landed on this series a couple weeks ago and when I saw there was no part 4 for a long while, I was upset. And here I am. This is great content, please keep it up! You inspired me to take an MITx course on digital architectures

  • @Sk1erDev
    3 years ago

    I'm a CS student and I learned about parts of how some of the hardware works. This video really put the complexity involved into scale

  • @blitz8229

    3 years ago

    Yeah, I agree, it reminded me of my microprocessor class in college, they really put a lot of work in their videos, I appreciate that.

  • @hi_tech_reptiles

    3 years ago

    Hey I'm going back for IT/CS myself, YT has taught me an incredible amount, especially along with just diving in with old PCs, building and tinkering. The history is incredible tho. World changing, obv

  • @ByWire-yk8eh
    3 years ago

    I enjoyed this video. As a commercial computer designer who started working in the late 1960's, I've seen the features and capabilities of the large mainframes (advanced architectures, 32/64 bit, caches, virtual memory, I/O channels, SMP, RISC, emulation, etc.) migrate to the consumer market as microelectronics dramatically reduced cost and increased density. As I see it, advances in microelectronics allowed consumer software to develop the capabilities we see today.

  • @VolodymyrPavlyuk
    3 years ago

    Finally someone explained how the CPU cache works with the right amount of detail! Usually explanations are either too abstract or too detailed.

  • @TomStorey96

    3 years ago

    Check out a video from What's a Creel, he goes into even further detail.

  • @VeritasEtAequitas

    1 year ago

    That's subjective

  • @vxqr2788
    3 years ago

    You got me instantly as a subscriber when I saw for the first time CPU series. Amazing! Thank you so much!

  • @electronichaircut8801

    3 years ago

    Same

  • @JohannY2
    3 years ago

    This was like a technical blast from the past, and very interesting.

  • @JP-tf4tb
    3 years ago

    LOVE your CPU series. Are good.

  • @pedror598
    3 years ago

    This is such a fascinating topic

  • @nathanielkirchoff311

    3 years ago

    It goes way over my head compared to his other series.

  • @pedror598

    3 years ago

    @@nathanielkirchoff311 Yeah, it's really easy to get lost

  • @antoniomaglione4101
    3 years ago

    I went through all of this in the '90s - the hard way. It has been fascinating to watch it all over again at impossible speed...

  • @pardeepchhikara2170
    26 days ago

    Thank you very much for sharing the information through such an awesome and easy-to-digest video series. I thank and congratulate everyone who contributed to the creation of this. I appreciate the amount of time and effort you have invested in this 4-part series. Loved it.

  • @ankitdubey6345
    3 years ago

    waiting for part 5 for 64 bit processors, simultaneous multithreading and low power risc processors. i enjoy ur videos.

  • @richardirvine2220
    3 years ago

    I have waited so long for this part 4, thank you New Mind! Like a good year went by and I thought he had abandoned this series. Thank you for a new entry! Love your work!

  • @ExplosiveAnyThing
    3 years ago

    This channel is awesome !!! I love it keep it up!!! Best channel for me! Talks for all the topic that I am interested in!

  • @AjinkyaMahajan
    3 years ago

    Wonderful explanation! Can't wait to see the next part of the series. Thanks for sharing Cheers

  • @GuildOfCalamity
    3 years ago

    This is one of my favorite channels by far. Excellent video... as usual.

  • @conroypawgmail
    3 years ago

    Oh man... I just found your channel and binged this series. Now, I can't wait until Part 5!

  • @RandomNullpointer
    3 years ago

    I'm so happy you've finally made it to part 4 ^_^ Looking out for the rest!

  • @KabelkowyJoe
    3 years ago

    13:30 Someone benchmarked an i386 at 33 MHz against an Intel Pentium M at 1600 MHz, both with the L1 and L2 caches disabled. It turns out that in DOS the Pentium wasn't much faster - maybe 2-3x at most, not 50x. DDR1 at 200 MHz was almost as fast as ~100 ns EDO RAM, which shouldn't be a surprise. But it shows that a CPU wouldn't be any faster without internal SRAM, and how DRAM became the bottleneck some 20 years ago. Up until modern times, 50-100 ns has remained the typical time needed to access a particular cell in DRAM. Most modern DDR4 needs 15 ns, thanks to faster clock speeds, but a modern CPU wastes 30-70 ns before it even begins, so the total time is still around 100 ns on average. First the CPU needs to search for the content in its caches, then it sends a request to the RAM controller, then the RAM controller grabs a whole 64 bytes into the cache, and only then is it available. Worth mentioning that modern DRAM also wastes 50-80% of its time refreshing its contents. It has become so large compared to its internal speed that, behind all the internal banking, it uses almost all of its time on refresh. And depending on how deep the associativity is (how large the association tables to search for addresses are), it may take the CPU 2 cycles for L1, 4-16 cycles for L2, and 32 cycles for L3 - it all takes a lot of time before it's even able to execute anything. The success of Intel's Core architecture in 2010, Sandy Bridge, was caused mostly if not only by the reduced time needed to access RAM, ~45 ns compared to ~100 ns for Bulldozer's three-layer cache hierarchy. The success of AMD Zen was caused by multiple factors, but wouldn't have been possible if they hadn't redesigned the RAM controller and cache hierarchy. Modern DRAM, including DDR5, is built on the very simple principle of adding yet another layer of multiplexers and banks, which causes random access time to remain the same as RAM made 20 years ago; what increases is only "streaming" large data transfers. So in the end, even the most modern CPU wouldn't be much faster than an i386 without its internal SRAM cache subsystem.

  • @rabbitcreative

    3 years ago

    As a budding DBA, I call this insightful. Thank-you.

  • @TheDude50447

    3 years ago

    The Pentium actually is the much faster CPU. This test was basically comparing RAM access speeds, not the processing power of the chip. If you starve any CPU of data to process, it just can't process anything, and suddenly all CPUs are the same :D

  • @KabelkowyJoe

    3 years ago

    @@TheDude50447 Well... yes and no. It was benchmarked using a popular DOS 3D benchmark, so it wasn't just a memory benchmark. Despite its superscalar architecture and huge clock speed advantage, the Pentium wasn't much faster.

  • @KabelkowyJoe

    3 years ago

    @@TheDude50447 I forgot the name of a leading C++ programmer, a co-author of the C11 standard. He showed how RAM became the problem in the current era of multithreading. Shared RAM resources are a huge bottleneck; a cache may fix single-core problems, but it won't fix anything if you deal with tables that won't fit in cache, plus shared resources. He showed how a 1-core system could be faster than an 18-core Skylake, if the programmer were aware that he doesn't have GBs of RAM at GB/s, but at most 16 kB, and that RAM is slower than a 1.44 floppy - just like 40 years ago. Can you imagine: it would take 1 s to transfer 20 MB of cache back and forth to registers if you access each cell in random order.

  • @KabelkowyJoe

    3 years ago

    @@TheDude50447 You are correct, the Pentium is much faster, and not just because of RAM/cache, since the Pentium began the era of superscalar CPUs. By dividing instructions into smaller pieces and running those pieces in parallel, Intel was able to outrun the 68k and also end the CISC/RISC war. Obviously the i386 has to be much slower, since it wasn't superscalar, and that's not just caused by more sophisticated caching. But even if you have multiple ALU and AGU units, each with its own pipeline, running multiple instructions in parallel, what a program does is still based on data. Huge data still has to fit in RAM and won't fit in cache. Spectre and Meltdown caused the OS to invalidate caches every 20 ms, back and forth. I was improvising when I wrote the comment you replied to; I was trying to convince myself either way. Maybe it's not entirely true, but it's still better to feel the weakness of RAM. That's why I wrote it, so people may realise there is a lot to fix - we still have one leg stuck in the past.

  • @joshbassett
    3 years ago

    I have recently been designing a cache for a MiSTer core I am working on. This video is incredible, it manages to compress so many chapters from dense computer science textbooks into a mere 19 minutes. Amazing work.

  • @corwin881
    3 years ago

    So much thanks and Kudos! 🙏 I am always delighted, when you post a new video.

  • @soraaoixxthebluesky
    3 years ago

    Man I’ve waited for this “part 4” for like ages. Thanks for the upload.

  • @KuraIthys
    3 years ago

    It's interesting to note that one of the things that made the 6502 so fast relative to other contemporary CPU designs of the time is that it performs memory access and instruction execution in parallel. This makes it a precursor to pipelined designs that came along later. (The 6502 however is a register + memory design, which became increasingly problematic as CPU speeds increased faster than memory speeds.) You'll note however that many instructions still take multiple cycles to execute. In essence however you can predict the execution time based on how many memory accesses are performed, keeping in mind that the instruction opcode itself also has to be fetched. For instance, an ADC (add with carry - note that the nature of this instruction means you get incorrect results if you don't precede it with a clear carry instruction, but it does mean multi-byte addition is faster for additional bytes) takes between 2 and 6 cycles (with a whole bunch of extra conditions adding a cycle which only apply to the 16-bit 65816). Since this adds the 8-bit accumulator to an 8-bit value, it's worth considering the time taken for different variants of the instruction. There are 15 variants of the instruction that apply to the 65816 (including one that takes 7 cycles), and 8 that apply to the original 6502 (and 65C02). The 2-cycle version is the immediate addressing version. This means the instruction sequence itself contains the data. Thus, 2 bytes of memory are read, the first being the instruction, the second being the second operand (the first operand is the value in the accumulator). A more typical 4-cycle add happens when using absolute addressing. In this case the second operand is in a memory location, but the instruction contains a pointer to the memory location. 
Since there is 64k of address space (2 bytes) on the original 6502, we read one byte for the opcode, two bytes to get the address, then finally we have to read the operand from the location we just got the address for, totalling 4 bytes read, and 4 cycles taken. By contrast, the zero page version of the instruction works with the same logic, but since only a single byte is needed to specify a zero page location, we need 3 reads, and 3 cycles. A full 6 cycles is taken by using a Zero page indexed Indirect, X addressing mode. This is as complex as the name sounds, and it uses the X register (which would have to have a value before you call such an instruction for this to make sense). The instruction, as always, has an opcode, as well as a single byte that specifies a base memory location in the zero page (hence why it can be a single byte). However, the CPU takes this base value that is specified in the instruction, adds the value in the X register, then reads the byte at this location in the zero page, as well as the next byte, before using that as an absolute address in memory to read a byte from and perform the addition. You'll notice this is one case where the total number of bytes read is 5, while the instruction takes 6 cycles to complete. However, notice that two additions are performed along the way; the first to determine where to find the memory address, the second to actually perform the addition. One instruction byte, one memory index, two memory address bytes, and finally the operand for the addition. Point is, the execution time is, with rare exceptions, tied directly to how many memory accesses are required for an instruction to complete, and for rather obvious reasons, something like an addition cannot be performed until its second operand (the value you are trying to add) has been loaded into the CPU, but you can still do things like instruction decoding as soon as you know what your instruction actually is. 
Internally this is done by doing different parts of the instruction in parallel; decoding the instruction before the operand is available to perform the calculation, and fetching the next instruction the moment the previous one is ready to execute. It might only be a 2 stage pipeline (and even then it only barely qualifies), but it means the only bottleneck is memory access speed...

  • @Databus-yh9zv

    3 years ago

    well done ;)...

  • @michaelmccarthy4615

    3 years ago

    A few charts and graphs and there is a lesson ready to go.

  • @deedewald1707

    2 years ago

    Well put & described !

  • @DannyCodePlays
    3 years ago

    This video is absolutely awesome! I would have loved a video like this back in my college days for sure!

  • @elultimopujilense
    3 years ago

    One of the best YouTube channels in terms of quality of content and presentation. Bravo!

  • @MKeehlify
    3 years ago

    I love this content. Your visualization of the concepts is extremely good!

  • @rdub2444
    3 years ago

    It's finally here! Been waiting for part 4 for a while.

  • @rohithill
    3 years ago

    I have waited for it for far too long!

  • @TheTwick
    3 years ago

    Just sat through all 4 parts! Wow! Great job! I’m a new subscriber now.

  • @blitz8229
    3 years ago

    Wow, a lot of complex topics simply explained and introduced, I enjoyed this video really much!

  • @gazzacroy
    3 years ago

    a brilliant set of videos, I really enjoyed watching all parts.. top stuff :)

  • @anonymous_coward
    3 years ago

    I've been reading Computer Architecture by Patterson and Hennessy and I am amazed by how complete and correct your discussion of caching was. The only thing I think isn't quite right was the statement "virtually every aspect of a cache's design is made to achieve a maximal hit rate", which should have probably replaced the word "is" with "was" since access latency ,currently, is an issue that has to be considered and forces trade offs against maximizing hit rate. However in the context of the late 80's and early 90's I don't think the access latency of a cache was as much of a problem since it seems that the access latency only becomes an issue once caches reach a large enough size which they weren't in back when 32bit processors were first introduced. Hence why I think the quoted statement should have been more past tensey.

  • @abelashenafi6291
    2 years ago

    Thanks for all 4 videos.

  • @michaelkemp7782
    3 years ago

    This series is an excellent intro to computer architecture. Nice job.

  • @boio_
    3 years ago

    Yet another episode of rich research and concise narrative, godspeed!

  • @dewiz9596
    3 years ago

    Wow! I’ve always considered all aspects of the cache as “magic”. . . The key to removing all that complexity is faster memory and memory buses. . . much faster. . . Great stuff. . . even for my own slow memory. Pipelining, predictive branching. . . Yup.

  • @andersjjensen

    3 years ago

    The problem with faster memory is that electrical signals travel at the speed of light. Now take a ruler and measure the length between the far edge of the CPU socket and the furthest DIMM slot and solve for time. Now compare that to the fact that a typical modern CPU running on 3200MHz CL14 memory has a memory latency of 70-75ns... Not much of a margin there, right? We've been at this point for quite some time, and hence we invented Double Data Rate memory in its various iterations. This doesn't help the latency but it improves the overall throughput by an ever greater factor with each generation.... So we need caches. Because caches help you when you need low latency and system memory helps you when you need overall high throughput. The same system is used in storage solutions: You have the memory on the disk controller itself which is wicked fast, then you have a few high speed NVMe drives on the controller for "ultra hot data", then you have some fast SATA SSDs for "hot data" and lastly you have a ton of spinning hard drives to take up the bulk of the data. This is the "L1, L2, L3 caches + system memory" approach all over again. And it is very fast and very cost effective.

  • @kiseitai2

    3 years ago

    Anders Juel Jensen Then comes HBM into the picture, which makes the concept of cache and main memory almost one and the same by bringing the main memory onto the die. That decreases the reliance on higher layers of cache.

  • @andersjjensen

    3 years ago

    @@kiseitai2 HBM doesn't have vastly better latencies than DRAM. They're better. A lot. But not to the point that it fundamentally changes anything. So you're still going to need read-ahead/pre-fetch, branch prediction etc, etc, so you still need caches. Just not as big caches. Which saves money. But HBM needs to be mounted on silicon interposers because you need 1024 lanes/traces to each HBM module. Interposers are made from regular silicon wafers, and are produced with (currently) 10-12nm class lithography. This is not cost effective. To the point that, outside of specialized equipment, a big honking cache and slower memory still gives you the optimal price/performance ratio. However, chiplets and chip stacking are dawning technologies, and eventually the yield and binning gains from producing lots of little chips and putting them together on an interposer will eventually outweigh the extra cost of the interposer itself. So eventually putting main memory on the same substrate as the computing silicon will become cost effective. What remains to be seen is whatever the combination of normal DRAM + big caches on the same interposer then just push the HBM + smaller caches back yet again.

  • @tylerdurden3722

    3 years ago

    @@kiseitai2 DRAM, like HBM, has other things that add latency, like refreshing. Bits are stored in capacitors that experience leakage, so these capacitors need to be refreshed, which can result in delays. SRAM, the memory typically used as CPU cache, uses several transistors to store a bit. It doesn't need to be refreshed, so there's no such refresh delay. But because it requires several transistors per bit, it has much lower density.

  • @MrArunraja08
    1 year ago

    Wow! Students who take computer architecture courses, this series gives a total overview: pipelining, ISA, caches, out-of-order execution, branch prediction, paging, etc. This could have easily been a $50 course. Thank you for this.

  • @stachowi
    3 years ago

    This channel is INSANELY awesome!

  • @LordDecapo
    3 years ago

    This is easily one of the best series on CPUs ever. I was hoping this p4 would come out. Gonna link this to a lot of ppl

  • @Drumsgoon
    3 years ago

    Very nice series! I rewatched part 3 before this one, which was useful.:)

  • @evilgamer0143
    3 years ago

    Nice work, a fan of this series specifically

  • @listtamaru
    3 years ago

    Another fantastic production.

  • @conorkerin5277
    3 years ago

    Great series man keep it up!!

  • @orangejoe54
    3 years ago

    This might be my favorite series on yt

  • @tommozzarella2671
    3 years ago

    appreciate your clear ways of explaining all topics discussed, 👌👋

  • @nancywangeci8189
    2 years ago

    Very Good Video Tutorials... THANKS.

  • @ankitdubey6345
    3 years ago

    I was waiting for this video so badly❤️

  • @TheMcSebi
    2 years ago

    Wow, this video sums up an entire university course I attended in the second semester. Good job explaining!

  • @Minzkraut
    3 years ago

    Finally Part 4 ❤️❤️ it's been so long

  • @GADGETSCOGNOSCENTE
    3 years ago

    Finally a new video 💓💓 GREAT as usual ☺️

  • @virenramchandani6113
    3 years ago

    This video series can literally turn anyone into a computer architect. Kudos to the makers 🙌🙌

  • @adrianstanciu3988
    3 years ago

    Great series! Thank you!

  • @AndrewMellor-darkphoton
    3 years ago

    Can you talk about other types of processors, like GPUs, FPGAs, and ASICs?

  • @ultrapetey

    3 years ago

    Technically speaking a cpu and a gpu are both ASICs...

  • @AndrewMellor-darkphoton

    3 years ago

    @@ultrapetey Yeah, but that's a bad way to think about it

  • @ultrapetey

    3 years ago

    @@AndrewMellor-darkphoton Why so? :(

  • @NationalSecessionistForces

    3 years ago

    @@ultrapetey A GPU is an ASIC, but a CPU is a general purpose device, not so much "application specific", now is it? Also modern GPUs (if you ignore TMUs, Tensor Cores and Ray Tracing bullshit) are general purpose as well. If the thing is fully programmable for any task, is it really application specific?

  • @harrkev

    3 years ago

    Easy version. A processor is "general purpose." It can do anything. But it comes with the downside that it never knows what it is doing next, so it constantly has to get an instruction to know what the next step is. Each instruction is one step, so a CPU does one thing at a time. --- --- --- If you custom-design the logic, then the logic knows what it has to do, so no instruction fetching is needed. Plus, you can pack more logic in there. A CPU can add two numbers at a time. With custom logic, you could have 100 or 1000 adders all running at the same time. --- --- --- ASIC vs. FPGA. Both run custom logic. But the ASIC lets you physically place gates and transistors on silicon specific to your purpose. This is the ideal case: fastest performance, lowest power. But the downside is the extreme cost associated with designing an ASIC. The chips are cheap, individually, but it can run into the millions to design it. --- --- --- An FPGA has logic structures that can assume any desired function. Combinatorial logic is modeled with tiny RAMs. Muxes let you use built-in registers or bypass them. So, an FPGA can take on ANY design that will fit into it. All you have to do is buy one and start coding for it, so the cost of entry is VERY low. But the downside is that the flexibility of an FPGA costs silicon area, speed, and power. So an FPGA is great for a smaller run of a few chips (one to thousands). If you want to ship a product in the millions, consider an ASIC. --- --- --- A GPU is basically sort of a hybrid. There are certain types of math and operations that are common in 3D graphics. Plus, you tend to do the same operation on a LOT of data at once (look up "SIMD" on Wikipedia). A GPU is programmable, but optimized for the types of operations needed for graphics.

  • @TheTruthSentMe
    3 years ago

    Dude, this is awesome!

  • @edm4617
    3 years ago

    Thanks for amazing content.

  • @druviya9208
    3 years ago

    Great job, really good explanation.

  • @peppa1492
    3 years ago

    Can't wait to see more episodes!

  • @HasanAmmori
    2 years ago

    You are a great man, Mr. New Mind

  • @jonsmith1271
    3 years ago

    As a CS teacher, this is a video I'd recommend to my students. Very good structure and development of the story. Excellent video.

  • @paolotofani4069
    3 years ago

    The whole series is very fascinating.

  • @lxm2600
    3 years ago

    Thank you for the awesome series! waiting for Part 5: 64-bit CPUs and multi-cores, Part 6: GPUs ! :)

  • @mikejones-vd3fg
    3 years ago

    Very cool, and sort of off topic, but this reminds me of how a CPU engineer described how the 1's and 0's are actually analog: it takes an activation threshold to decide whether a signal counts as "on" yet. It's a somewhat smooth curve on a graph, and there are lots of points on it where you could say it's "on," but they pick a certain spot, and it has to do with electron bleed and whatnot. Far from a strictly on-and-off process, which I think leaves lots of room for advancement at the transistor switch level before quantum computers are finally practical.
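The threshold idea in the comment above can be sketched as a toy decision function; the 0.7 V switching point is an assumed value for illustration only, since real logic families define noise margins rather than a single cutoff:

```python
V_THRESHOLD = 0.7  # volts; hypothetical switching point, assumed for illustration

def read_bit(voltage):
    """A gate turns a continuous analog voltage into a discrete 0 or 1."""
    return 1 if voltage >= V_THRESHOLD else 0
```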

  • @kingfisherdxb
    3 years ago

    great work NEW MIND

  • @MrZapper1960
    3 years ago

    This is a fantastic video

  • @jmac1099
    3 years ago

    How the heck do you only have 240k subs? All your work is so good, bravo.

  • @superpayaseria
    3 years ago

    Simply one of the most wonderful things I've ever seen!!!!! Any robotic engineers want to help me build a robot let me know.

  • @hamletgauran3004
    3 years ago

    What kind of robot

  • @johnsavard7583
    3 years ago

    A pipeline treating fetch, decode, and execute as the three steps dates back to the IBM 7094 II, and the 6502 did this as well. Processors like the Pentium were considered notable because they had pipelines which split the execute phase into multiple parts.
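The three-stage fetch/decode/execute overlap mentioned above can be sketched as a timeline: instruction i enters the pipe at cycle i, so n instructions finish in n + 2 cycles instead of 3n. This is a minimal model under the assumption of no stalls or branches:

```python
def pipeline_timeline(n_instructions, stages=("fetch", "decode", "execute")):
    """Map each clock cycle to the instruction occupying each stage."""
    depth = len(stages)
    timeline = []
    for cycle in range(n_instructions + depth - 1):
        row = {}
        for s, stage in enumerate(stages):
            instr = cycle - s             # instruction i is in stage s at cycle i + s
            if 0 <= instr < n_instructions:
                row[stage] = instr
        timeline.append(row)
    return timeline
```

For 5 instructions the timeline is 7 cycles long; by cycle 2 all three stages are busy at once, which is the whole point of pipelining.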

  • @PrinceKumar-hh6yn
    9 months ago

    Thanks for the historical touch, presented in a scientific manner.

  • @avrilianbriliansyah5290
    3 years ago

    Finally! Part 4!

  • @grossersalat578
    3 years ago

    Wow. I need to rewatch this after a cup of coffee.

  • @gua_s
    3 years ago

    Love these videos

  • @tomkusmierz
    3 years ago

    Good job buddy.

  • @cannonfodder4376
    3 years ago

    Computers and such remain alien to me, but these are informative videos as always.

  • @TheMadMagician87
    3 years ago

    This was a great series, very well done! I wonder if you would be interested in expanding on it by explaining which of the mechanisms you outlined, the ones that historically increased processor throughput, were exploited by Spectre and Meltdown, leading to security risks (i.e., speculative execution, etc.)?

  • @abhisekashirbad5649
    3 years ago

    Your content is so good and so rewatchable for a science enthusiast like me. Keep up the good work. Love from 🇮🇳

  • @jessefurlan5585
    3 years ago

    0:23 spooky face at the back of the boat

  • @Klayperson
    3 years ago

    came here to post this. that's some deep-dream looking shit right there

  • @marielmartinez4930
    3 years ago

    THANK YOU.

  • @gpuwu
    2 years ago

    I love and hate how you make it sound so simple

  • @stefangrambach9519
    3 years ago

    this is great content

  • @gacsizclickon
    3 years ago

    This is GOLD.

  • @mrflamewars
    3 years ago

    A talk about memory speed should include a mention of things like the integrated memory controller and the accompanying removal of the front-side bus / north bridge. That was a big deal, and RAM access got way faster once the FSB was done away with.

  • @gregorymalchuk272
    3 years ago

    And now proprietary on-chip bridges and FSBs have made it impossible to manufacture socket-compatible processors for modern Intel and AMD motherboards.

  • @Alyosha15
    3 years ago

    Are you making a 5th episode? Just got through the series and I love it.

  • @polontang7909
    3 years ago

    Very good CPU series. May I suggest that the next part cover multi-level caches, microcode, hyper-threading, multi-core (including big.LITTLE), non-uniform memory access, and more, please?

  • @kalidsherefuddin
    3 years ago

    Thanks

  • @AboveEmAllProduction
    2 years ago

    I just learned how to put together resistors, transistors, and capacitors, and now from these videos I can learn the engineering of CPUs. I know everything from the basics to assembly, like LDA and STO.

  • @MiVidaLoca1024
    3 years ago

    Nice video. Are you working on one that explains the rise of CPU cores? Gone are the good old days when MIPS were easy to understand.

  • @pmgodfrey
    3 years ago

    I have one device that uses a 386 processor. My 42" scanner that Contex made for HP. Bundled with the printer, it is known as the HP 815MFP. The processor just does the background work stitching the image together from the two cameras the scanner has. It's a nifty device.

  • @RodrigoVzq
    3 years ago

    I love this

  • @thrasher7090
    3 years ago

    yes yes yes thank you so much

  • @c128stuff
    2 years ago

    One note about CPU and memory speed... there was a time when memory was faster than typical CPUs. This can be seen in many late-1970s and early-1980s home computer designs, where memory was used in alternating cycles between video and CPU. In the more advanced designs, this was done using a dual-phase clock, where one edge would be used as the CPU cycle and the other edge as the "video cycle." For this, RAM had to run at twice the rate of the CPU. Of course, this approach became unviable with the introduction of 16-bit and later 32-bit CPUs and higher CPU clock rates.
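The dual-phase scheme described above can be sketched as a bus schedule, with RAM servicing two owners per CPU clock. The owner names and phase ordering here are illustrative assumptions, not any specific machine's design:

```python
def interleaved_bus_schedule(n_cpu_cycles):
    """RAM runs at twice the CPU rate: video circuitry gets one clock phase,
    the CPU gets the other, so neither ever has to wait for the memory bus."""
    schedule = []
    for cycle in range(n_cpu_cycles):
        schedule.append((cycle, "video"))  # first phase: video fetch
        schedule.append((cycle, "cpu"))    # second phase: CPU access
    return schedule
```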

  • @ppugalia9000
    3 years ago

    Make a series on quantum computing....

  • @Stravant
    3 years ago

    Backwards cursor for line select at 0:44! Didn't know that was already a thing back as far as the very first version of Word.

  • @theosib
    3 months ago

    These are good videos, but I'd like to offer a minor correction. Superscalar refers to the general ability to fetch, decode, and dispatch multiple instructions per clock. This also relies on sufficient backend compute logic so that multiple issue can mostly keep up. I'm not aware of any commercial processor that can split its resources to follow multiple branch paths at once. Also, keep in mind that this would grow exponentially with each encountered branch.

  • @rjones6219
    2 years ago

    I hate it when people like this present the historical evolution of IT. Makes me feel so old! :)

  • @bigpantsbobnuggets5051
    3 years ago

    I'll never view computer magic as simple again.

  • @scottfranco1962
    3 years ago

    The import of the flat memory model for the 80386 was that it got rid of the segmented model for programs larger than 64 KB. In that amazingly stupid model, it was necessary to custom-design even C programs just to run on the 8086-80286 CPUs, in the form of segment declaration and management. This prevented, for example, a port of Unix to the 8086-80286 CPUs. If Intel hadn't introduced the flat model, other CPUs would have taken over from Intel's CPU line, since microprocessors were taking over from minicomputers and workstations. Intel initially embraced the segmented model because it prevented customers from easily migrating to other CPUs, since the programs would have to be redesigned; but with the rise of workstations based on RISC and 68k CPUs, it became clear that it would prevent a lot of programs from being ported TO the 80x86 family just as it prevented porting FROM the 80x86 family. It worked! 80x86 CPUs eventually killed off the workstation (and later server) market just based on cost. And this is long before the age of ARM.
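The segmented-versus-flat contrast above can be sketched with the 8086's real-mode address arithmetic (physical = segment × 16 + offset, wrapping at 20 bits) next to the 386's flat 32-bit linear address:

```python
def real_mode_address(segment, offset):
    """8086 real mode: each 16-bit segment register reaches only 64 KB,
    which is why large programs needed explicit segment management."""
    return ((segment << 4) + offset) & 0xFFFFF   # 20 address lines, wraps

def flat_address(address):
    """386 flat model: one 32-bit linear address, no segment arithmetic."""
    return address & 0xFFFFFFFF
```

For example, real_mode_address(0x1000, 0x0010) yields physical address 0x10010; with segmentation effectively disabled via one large segment, a 386 program instead uses flat 32-bit addresses directly.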

  • @sirvapalot
    3 years ago

    I watched all four parts, but this level of information is over my head. Frankly, I'm surprised I'm intelligent enough to use a MacBook, lol.