CppCon 2016: Timur Doumler “Want fast C++? Know your hardware!”

CppCon.org
-
Presentation Slides, PDFs, Source Code and other presenter materials are available at: github.com/cppcon/cppcon2016
-
As C++ evolves, it provides us with better and more powerful tools for optimal performance. But often, knowing the language very well is not enough. It is just as important to know your hardware. Modern computer architectures have many properties that can impact the performance of C++ code, such as cache locality, cache associativity, true and false sharing between cores, memory alignment, the branch predictor, the instruction pipeline, denormals, and SIMD. In this talk, I will give an overview of these properties, using C++ code. I will present a series of code examples, highlighting different effects, and benchmark their performance on different machines with different compilers, sometimes with surprising results. The talk will draw a picture of what every C++ developer needs to know about hardware architecture, provide guidelines on how to write modern C++ code that is cache-friendly, pipeline-friendly, and well-vectorisable, and highlight what to look for when profiling it.
-
Timur Doumler
ROLI Ltd.
JUCE Senior Software Engineer
London, UK
Timur Doumler is Senior Software Developer at London-based technology company ROLI. He is currently working on JUCE, the leading cross-platform framework for creating audio applications used by hundreds of companies in the audio industry. After five years of writing high-performance code in Fortran, C, and C++ for numerical simulations of the cosmic structure formation, Timur became committed to audio and music production software. Before joining ROLI, he worked on various projects at market-leading company Native Instruments, such as KONTAKT, the industry standard sampling platform used by the majority of music producers and composers for film score, games, and contemporary popular music. Timur holds a PhD in astrophysics and is passionate about well-written code, modern C++ techniques, science-fiction, learning languages, and progressive rock music.
-
Videos Filmed & Edited by Bash Films: www.BashFilms.com

Comments: 119

  • @digitalconsciousness
    @digitalconsciousness 2 years ago

    This accessing of L1 cache is the reason game programmers are talking about Data Oriented Design (DOD) in the past few years. It is expensive to send an entire C++ object to L1 cache just to update an x and a y coordinate. You cannot put 100,000 ships on screen quickly. With DOD however, all the x's for all the ships are in a single array. Doing it this way, this array gets sent to the cache, and you rip through all 100,000 x values, then you do the same for the y array. Insanely fast.
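A minimal sketch of the two layouts described in the comment above, for illustration only. The `Ship` fields other than `x` and `y`, and all function names, are assumptions made up for this example; they do not come from the talk.

```cpp
#include <cassert>
#include <vector>

// Array-of-structs layout: each ship carries fields the position update never touches.
struct Ship {
    float x, y;
    float health;     // hypothetical extra fields, present only to make the object bigger
    char  name[32];
};

// Struct-of-arrays layout: all x values are contiguous, all y values are contiguous.
struct Ships {
    std::vector<float> x, y;
};

// AoS update: every iteration drags a whole Ship through the cache.
void move_aos(std::vector<Ship>& ships, float dx, float dy) {
    for (Ship& s : ships) { s.x += dx; s.y += dy; }
}

// SoA update: two tight loops over contiguous floats.
void move_soa(Ships& ships, float dx, float dy) {
    assert(ships.x.size() == ships.y.size());
    for (float& v : ships.x) v += dx;
    for (float& v : ships.y) v += dy;
}
```

Whether the SoA version is actually faster depends on object size, access pattern and compiler, so it is worth timing both on the target machine.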

  • @jeeperscreeperson8480

    @jeeperscreeperson8480

    1 year ago

    Game programmers talk about dod/dop because this style of programming reduces software complexity. And the simpler the software, i.e. the less generic it is, the faster it is generally. An array of floats is simpler than a vector of objects. Dereferencing a float pointer is simpler than calling a getter. Loading 4 floats from an array into an xmm register is simpler than gathering those floats from 4 separate objects, etc.

  • @dd-qz2rh

    @dd-qz2rh

    11 months ago

    @jeeperscreeperson8480 It seems like you are mixing up two things: software complexity, which usually describes how complex source code looks to the humans reading it, and "simple" code that generates fewer or faster instructions than non-DOD code. DOD code can be much harder for an average programmer to read and write. Its complexity actually increases, because a lot of stuff is no longer hidden behind abstractions, and you control and think about much more than when you simply put some logic together and abstract it away behind an interface, design pattern, etc. What you call "simple" code is in fact just "fast" and "optimized" code, which is what the previous comment already said.

  • @J.D-g8.1

    @J.D-g8.1

    10 days ago

    For new programmers playing around and eventually wanting to program a game of some sort: you often end up wanting something to represent a world map or game map, and often this is supposed to be made of tiles, from a simple Pac-Man to Warcraft. When you make that map, you think you need a 2D array instead of a std::map, which is slower. That 2D array is easy to implement as a class that actually uses a 1D array and finds each "tile" with the formula CurrentWidth + (MaxWidth * CurrentHeight). That way you just declare the array as, for example, char WorldMap[MaxWidth*MaxHeight]. This is just a very simple implementation; you can of course do the same with a vector, and a 1D vector is faster than a 2D vector. The compiler will probably lay out a 2D array like int WorldMap[100][100] as one contiguous 1D block anyway, and use the same CurrentWidth + (MaxWidth * CurrentHeight) arithmetic to index it.
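The flat-array idea in the comment above can be sketched as follows. `TileMap` and its members are hypothetical names chosen for this example; the only thing taken from the comment is the index formula, column plus width times row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile map stored row by row in one contiguous buffer.
class TileMap {
public:
    TileMap(std::size_t width, std::size_t height, char fill = '.')
        : width_(width), height_(height), tiles_(width * height, fill) {}

    // Tile (x, y) lives at index x + width * y.
    char& at(std::size_t x, std::size_t y) {
        assert(x < width_ && y < height_);
        return tiles_[x + width_ * y];
    }

    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }

private:
    std::size_t width_, height_;
    std::vector<char> tiles_;
};
```

Iterating with `y` in the outer loop and `x` in the inner loop walks the buffer in memory order, which is the cache-friendly direction for this layout.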

  • @teranyan
    @teranyan 2 years ago

    I once got rejected at a job interview because I told them exactly that thing you pointed out about the list/trees, that if you must use it at least it can be accelerated with a custom allocator (preallocating nodes) depending on your use case and if cache misses are a bottleneck. Apparently their "senior" programmer guy didn't agree with that one
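One way to preallocate list nodes, as suggested in the comment above, is the C++17 polymorphic allocator machinery. This is only a sketch of the general idea, not what the commenter or the talk used; the function name and the 64 KiB buffer size are arbitrary assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <list>
#include <memory_resource>
#include <numeric>

// Sums the integers 1..n after storing them in a std::pmr::list whose nodes are
// carved out of one preallocated buffer instead of separate heap allocations.
int sum_with_pooled_nodes(int n) {
    assert(n >= 0);
    std::array<std::byte, 64 * 1024> buffer;   // arbitrary size chosen for this sketch
    std::pmr::monotonic_buffer_resource pool(buffer.data(), buffer.size());
    std::pmr::list<int> values(&pool);         // falls back to the default resource if the buffer runs out
    for (int i = 1; i <= n; ++i) values.push_back(i);
    return std::accumulate(values.begin(), values.end(), 0);
}
```

Because the nodes come from one buffer in allocation order, neighbouring list elements tend to sit close together in memory, which is the effect the comment is after. Erasing and reinserting elements over time would weaken that property.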

  • @puppergump4117

    @puppergump4117

    2 years ago

    Anyone who listens to the people next to them without doing research is dumb.

  • @heavygm9623

    @heavygm9623

    2 years ago

    lol

  • @ShaunYCheng

    @ShaunYCheng

    1 year ago

    What company was that?

  • @samhadi7972

    @samhadi7972

    9 months ago

    Ya when you talk like that to your everyday devs they think you are from a different universe or making stuff up lol

  • @KilgoreTroutAsf
    @KilgoreTroutAsf 5 years ago

    No. 1 tip for high performance: MEASURE EVERYTHING. It doesn't matter how strongly you THINK something is better. The technology stack is complex enough to interact in unforeseeable ways.
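A very small timing helper in the spirit of the advice above, using only `std::chrono`. The name `best_time_ns` and the default of five repeats are assumptions for this sketch; a dedicated benchmarking library or a profiler will give more trustworthy numbers.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` several times and returns the fastest wall-clock time in nanoseconds.
// Taking the minimum reduces noise from the OS scheduler and cold caches.
template <typename F>
long long best_time_ns(F&& work, int repeats = 5) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    long long best = -1;
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        work();
        const long long elapsed =
            std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start).count();
        if (best < 0 || elapsed < best) best = elapsed;
    }
    return best;
}
```

The code being measured should have an observable side effect (for example, writing to a `volatile` variable), otherwise the optimizer may remove it and the measurement becomes meaningless.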

  • @aorusaki

    @aorusaki

    4 years ago

    You want both. Especially since some things are hard to debug or measure accurately, being able to deterministically understand SOME part of what's going on will give you an invaluable heuristic that statistics alone can't. (E.g., being able to look at assembly code alongside the datasheet for your microcontroller will give you very exact performance characteristics of instruction X versus Y.)

  • @aorusaki

    @aorusaki

    4 years ago

    In addition, it's STILL important to understand WHY X or Y is slower than the other, so you can work around it and/or get hardware vendors to improve.

  • @seditt5146

    @seditt5146

    4 years ago

    Dude you have no idea how right you are. I have been trying my hardest to benchmark multithreaded code and get accurate results esp comparing it against linear algorithms... Due to compiler optimizations, CPU optimizations, the overall difficulty of Benchmarking MT code in the first place this entire ordeal has become a nightmare and no matter what results I get and how close they are to what I believe they should be there is still that little voice in my head telling me I just can not trust the output of the Measurements since there is just so damn much going on under the hood idk what to believe about it all.

  • @ChrisM541

    @ChrisM541

    2 years ago

    You NEED to have significant expertise in assembly language. No point in having that debugger if you don't FULLY understand that core component...!!!

  • @samhadi7972

    @samhadi7972

    9 months ago

    I wasted so many hours of my life optimising based on theory, only for the result to end up being slower… definitely benchmark, and do it often, if you care about the results.

  • @andreytimokhin936
    @andreytimokhin936 4 years ago

    Well-organized talk with excellent explanations of the causes of performance degradation, using easy-to-understand examples. Would be very helpful even for quite experienced developers.

  • @NehadHirmiz
    @NehadHirmiz 7 years ago

    Thank you very much. I really like your choice of presenting very intuitive examples.

  • @minastaros
    @minastaros 3 years ago

    Quite dry stuff, and not really what I need for my daily work. But gave me a huge amount of new knowledge. Very well presented, really good examples, and in general a clear and entertaining talk. I have really enjoyed watching it!

  • @CppCon

    @CppCon

    3 years ago

    Glad it was helpful!

  • @user-zp3nd6ht8v
    @user-zp3nd6ht8v 4 years ago

    Very intuitive example! Down-to-earth knowledge!

  • @TrueWodzu
    @TrueWodzu 5 years ago

    Thank you, some stuff I knew, but never heard of denormals. Would have to check them out :)

  • @prafulsrivastava8942
    @prafulsrivastava8942 2 years ago

    This is exactly what I was looking for. Great work! Thanks for the info!

  • @CppCon

    @CppCon

    2 years ago

    Glad it was helpful!

  • @aorusaki
    @aorusaki 4 years ago

    These conventions NEVER fail to provide useful knowledge and springboard ideas to help inspire newer developments and technologies!

  • @marec023
    @marec023 4 years ago

    What a great talk, thanks!

  • @renzocoppola4664
    @renzocoppola4664 7 years ago

    He knows his stuff.

  • @suryatn
    @suryatn 4 years ago

    Nice explanation with code samples and supporting data analysis.

  • @TheGokhansimsek35
    @TheGokhansimsek35 3 years ago

    Excellent talk, full of great information and tips!

  • @CppCon

    @CppCon

    3 years ago

    Glad it was helpful!

  • @Altekameraden79
    @Altekameraden79 8 months ago

    I am just a mechanical engineer doing vibration and signal analysis, but this presentation was very informative, even for me barely starting out in C++ a few days ago.

  • @dvlduvall
    @dvlduvall 5 years ago

    Nailed, great examples.

  • @ciCCapROSTi
    @ciCCapROSTi 9 months ago

    Good presentation, knew about all of it except the FPU stuff, but got way more useful details than I had.

  • @michaelmorris2300
    @michaelmorris2300 7 years ago

    Good talk. People need to remember that performance in general-purpose computing has its limits, beyond which FPGAs could take over. There is only so much performance that can be squeezed out of general-purpose computing, notwithstanding the versatility and fast development of that paradigm. The talk was very good.

  • @danielnitzan8582
    @danielnitzan8582 3 years ago

    Excellent talk!

  • @deckarep
    @deckarep 2 months ago

    I know this talk is quite a few years old and it has some solid points to consider but I did want to comment on a few things. One of them is that he compares audio software engineering to game development saying that while game devs worry about code having to be performant at 60 fps, on the other hand audio software development has to worry about 44k sps (samples per second) and this is not a fair comparison. Yes, a game loop may need to run at 60 fps, but within the context of a single frame, many computations are being done on the CPU, and likely way more than a few hundred thousand data-points. The second thing: At one point he says that in general you want to structure your Array of structs next to each other for locality which is true in general but if you utilize the Struct of Arrays (SOA) paradigm, which I don't think he mentions at all you may likely get even better performance out of your code due to the memory being more tightly compacted and also due to not having to fetch certain objects for computations that you don't need.

  • @NekonyaNyaNya1
    @NekonyaNyaNya1 5 years ago

    Great stuff, at least for someone without a degree in CS, thanks!

  • @sundareswaransenthilvel2759
    @sundareswaransenthilvel2759 5 years ago

    Thanks a lot! Nice explanation!

  • @coder2k
    @coder2k 4 years ago

    Great talk! Thank you! :)

  • @Peregringlk
    @Peregringlk 4 years ago

    15:30 hash for integers is usually the identity function which is a no-op.

  • @praveenalapati5234
    @praveenalapati5234 7 years ago

    Good explanation

  • @jaime7295
    @jaime7295 11 months ago

    This was really good!!!

  • @ambushedraccoon6408
    @ambushedraccoon6408 11 months ago

    Hello! Great content. Is there a link for a video mentioned at 19:31 ?

  • @dineshdevaraju6698
    @dineshdevaraju6698 6 years ago

    Awesome Thanks...

  • @kapyonik2424
    @kapyonik2424 4 years ago

    52:08: here I see a different reason than the one mentioned. In the code on the left, 4 different values are accessed (a[i], b[i], b[i+1] and c[i]), while in the code on the right only 3 values are accessed (b[i+1], c[i] and a[i+1]). So is it really about vectorization? I don't understand this part; can someone explain, please?

  • @BGFutureBG

    @BGFutureBG

    4 years ago

    The same thing stood out to me initially, but I'm pretty confident it's insignificant here. b[i] and b[i+1] are extremely likely to be on the same cache line.

  • @qm3ster
    @qm3ster 2 years ago

    Summary of what is most surprising: 1. cache associativity hole on round values 2. some modern cpus are good at unaligned access (I was sure we are moving away from that instead?) 3. false sharing

  • @Luix
    @Luix 4 years ago

    Very interesting

  • @azadalishah2966
    @azadalishah2966 3 years ago

    🤔How will you find size of cache line [before using alignas(cachelineSize) ]?
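One possible answer to the question above, sketched with standard C++17 facilities. The library constant `std::hardware_destructive_interference_size` is a compile-time hint rather than a measurement of the machine the program runs on, and not every standard library provides it, so the 64-byte fallback below is an assumption (common on x86-64, not guaranteed). `cache_line_size` and `PaddedCounter` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Use the standard constant when the library provides it; otherwise assume 64 bytes.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t cache_line_size = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t cache_line_size = 64;
#endif

// Each counter occupies its own cache line, so two threads updating
// neighbouring counters do not invalidate each other's line (false sharing).
struct alignas(cache_line_size) PaddedCounter {
    long value = 0;
};

static_assert(alignof(PaddedCounter) == cache_line_size);
static_assert(sizeof(PaddedCounter) % cache_line_size == 0);
```

An array of `PaddedCounter` then places every element on a separate line, at the cost of the padding bytes.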

  • @RajeshPandianM
    @RajeshPandianM 1 year ago

    At 53:20, is there a typo on left side code at line .. a[i] +=b[i+1] instead of a[i] +=b[i]?

  • @marktaylor6052
    @marktaylor6052 3 years ago

    33 mins 40 secs, I found this out myself trying to optimise a virtual machine. I thought aligning instructions and arguments on 4 or 8 byte boundaries would help. In the end I saw I was only getting a 1% increase in performance.

  • @KeithMakank3
    @KeithMakank3 2 years ago

    What are we supposed to do about problems that specifically require random traversal of arrays?

  • @Shardic
    @Shardic 5 years ago

    I loved this talk. Never knew that a simple contiguous set of variables can be SO MUCH faster than randomly situated ones.

  • @masheroz
    @masheroz 1 year ago

    What is the talk he refers to at 49:45?

  • @sirnawaz
    @sirnawaz 3 years ago

    The example at 56:20 is interesting. I didn't quite understand the alignment issues with the offset. Is it because of different offset for both buffers? If I use same offset, say `+2`, for both, would it still have the alignment issue?

  • @sirnawaz

    @sirnawaz

    2 years ago

    Watched this talk again a year later. I still have the same doubt. Haha.

  • @ChristianBrugger

    @ChristianBrugger

    1 year ago

    The problem is that SIMD instructions only work for aligned data. If both buffers have the same offset, the compiler can special-case the first 2 elements and then take aligned groups of 4.
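The "special-case the first elements" idea in the reply above can be made concrete with a little pointer arithmetic. This is a sketch under stated assumptions: 16-byte SSE-style vectors, 4-byte floats, and helper names invented for this example. It does not show what any particular compiler actually emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if p sits on a multiple of `alignment` bytes (alignment must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Number of leading floats to handle one by one before p reaches a 16-byte boundary.
inline std::size_t scalar_prologue(const float* p) {
    const std::uintptr_t misalignment = reinterpret_cast<std::uintptr_t>(p) % 16;
    return misalignment == 0 ? 0 : (16 - misalignment) / sizeof(float);
}
```

With a 16-byte-aligned buffer `buf`, `scalar_prologue(buf + 2)` is 2, matching the "first 2" in the reply. If two buffers have different offsets, their prologues differ, so no single loop start leaves both pointers aligned; the compiler then needs unaligned loads or a scalar fallback for at least one of them.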

  • @jackofnotrades15
    @jackofnotrades15 4 years ago

    Just curious: isn't this why data-oriented design is preferred? This seems like a good example for data-oriented design. Am I wrong?

  • @7Denial7
    @7Denial7 7 months ago

    I think there's a mistake at 27:00. An 8-way associative cache means there must be 8 blocks (cache lines) per set. So if we have a 128 KB 8-way cache with 64 bytes per block, this means 128 * 1024 / (64 * 8) = 256 sets, 512 bytes per set, 8 blocks per set. Not 8 sets of 16 KB each, like you show in the presentation.
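The arithmetic in the comment above can be checked at compile time. `CacheGeometry` and its member names are hypothetical, and the numbers are the ones quoted in the comment, not measurements of a real CPU.

```cpp
#include <cassert>
#include <cstddef>

// Geometry of a set-associative cache, using the numbers from the comment above.
struct CacheGeometry {
    std::size_t total_bytes;
    std::size_t line_bytes;
    std::size_t ways;   // cache lines per set

    constexpr std::size_t set_bytes() const { return line_bytes * ways; }
    constexpr std::size_t sets() const { return total_bytes / set_bytes(); }
    // Addresses this many bytes apart map to the same set and compete for its `ways` lines.
    constexpr std::size_t conflict_stride() const { return sets() * line_bytes; }
};

constexpr CacheGeometry example{128 * 1024, 64, 8};
static_assert(example.sets() == 256);
static_assert(example.set_bytes() == 512);
static_assert(example.conflict_stride() == 16 * 1024);
```

The 16 KB figure does reappear here, as the stride at which addresses collide in the same set (equivalently, the size of one way). That may be what the slide was depicting, though this is only a guess.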

  • @elliott8175
    @elliott8175 3 years ago

    To vectorise the loop (at 52:00) why not just do this: for (int i = 1; i a[i] += b[i] for (int i = 1; i b[i+1] += c[i] Not only does this get rid of dependencies, but it results in requiring 2 positions in the cache (for each of the arrays) rather than 3, so better overall practice to reduce crowding. Or does this actually screw up the SIMD vectorisation? I don't know much about it.

  • @ChrisM541

    @ChrisM541

    2 years ago

    Make certain you confirm - and fully understand - what the compiler is doing by critically examining the assembly version of your routine.

  • @Wayne-wo1wc

    @Wayne-wo1wc

    2 years ago

    This could take double the time. A CPU can sometimes fit two instructions in one cycle in the same core. Therefore it is better to do it in one loop.

  • @bencenagy2542
    @bencenagy2542 1 year ago

    Actually, frame rendering has to provide many more samples than 24-60. You have to render screenwidth*screenheight samples per frame. So for a 1080p resolution that is about 2*10^6 samples.

  • 1 year ago

    Maybe, but real-time sound programming is a little different. In a game, for example, you can render the same frame twice and nothing bad will happen. In sound, the membrane of a speaker is a physical object with kinetic energy, and if you send the same signal 2 samples in a row, that means the magnet should apply infinite braking power to stop it for a moment. It's like pulling the emergency brake on a locomotive, the listener will hear an audible click.

  • @seditt5146
    @seditt5146 4 years ago

    At 41:00 He stated no branch predictor can work with Random Numbers but I was wondering if they can detect patterns and the BP is capable of looking ahead why is it not possible for it to generate the Random number in advance and make predictions using that pregenerated Random number

  • @mrlithium69

    @mrlithium69

    4 years ago

    Not an expert, but random is usually directly affected by the CPU clock as the source of entropy, so picking a number in advance does not give the same number as picking it normally. Also, picking extra random numbers and then throwing them away because of invalid branches would have an effect: some would be missing, and the following set of random numbers would not match the original pattern of distribution.

  • @flutterwind7686

    @flutterwind7686

    4 years ago

    Depending on the CPU architecture, branch prediction can look ahead and precalculate based on different possibilities only up to a certain point before running out of cache. If it invests entirely in a certain possibility and that guess is wrong, there is a huge penalty, since it has to start calculating the branch all over again from the branching point in the code (e.g. an if statement). This limits how far the branch predictor can look ahead when dealing with random numbers, as the number of possibilities quickly inflates, for example in a nested if structure.

  • @AshishNegi1618
    @AshishNegi1618 4 years ago

    if `rand()` is inside the for loop, then `random loop on 2-d array` would be slow because of cost of `rand()` as well.. [ of course, random 2 d access will be slow.. but should not be that slow compared to column wise access ]

  • @teekwick
    @teekwick 7 years ago

    Really great talk, tho next time maybe try to reduce the number of sub-topics or ask for more time to present them since it looked like you tried to rush in the second half to get everything out.

  • @ivailomanolov6908
    @ivailomanolov6908 3 years ago

    Can someone explain the example at 54:19 to me, please? I have a hard time understanding it.

  • @antonfernando8409

    @antonfernando8409

    2 years ago

    Something to do with SIMD vectorisation.

  • @severemaisjuste
    @severemaisjuste 4 years ago

    Not considering any oversampling in either domain, isn't it more like ? gfx : 1920*1080*30 sfx : 44100 * 2 samples per second ?

  • @shinyhappyrem8728

    @shinyhappyrem8728

    4 years ago

    Why 30?

  • @jamessandham1941

    @jamessandham1941

    4 years ago

    Ya, I don't really know why he made that statement. Kind of needlessly provocative IMHO. It's even more skewed in games than you indicate, because in games you are not just sending colors to the screen but also dealing with physics systems, multiplayer, tons of custom tools, fast compile times, and a general complexity of gameplay that is only limited by a designer's creativity, etc. Otherwise good talk, though.

  • @TernaryM01

    @TernaryM01

    4 years ago

    It's harder to write a program that does some computation in 1/44100 seconds than 44100 such computations in 1 second. It's about bandwidth vs latency. His point is that not only do you need to make computations as efficient as possible, but they also have to have low latency. You don't worry so much about latency in programming a game renderer, and the GPU itself is not optimized for latency, unlike the CPU. Of course he is not an idiot who thinks that the computation he's doing in audio is 200 times faster than game engines. Game engines are written in C++ and C, not Python.

  • @BGFutureBG

    @BGFutureBG

    4 years ago

    I too think processing 60 frames (or even 250 with modern monitors these days) in a video game, is way way way more expensive than sampling 44kHz audio...

  • @teranyan

    @teranyan

    2 years ago

    The CPU (which you use to process audio) usually has 8 strong cores while modern GPUs have 3000+ "stupid" cores. It's not the same type of sample and you can't really compare that.

  • @stevenclark2188
    @stevenclark2188 3 years ago

    I suspect in a case with only a few columns, column major might even be faster, because it could use a cache line for each column thus using more of the cache. If you don't line them up so they step on each other's cache slots of course. Most of this is covered in that computer architecture course of a CS bachelors.

  • @childhood1888
    @childhood1888 2 years ago

    Perfect talk except for the ums

  • @quoc-minhton-that6391
    @quoc-minhton-that6391 5 years ago

    Can someone explain why at 56:14 , calling multiplyAdd with the array pointers offset cancels vectorisation? What data is fetched into the SSE register and how? Thank you.

  • @CBaggers

    @CBaggers

    5 years ago

    SIMD expects the data that will be loaded to be aligned a certain way. When the pointers are offset slightly, the compiler can either 1. use unaligned SIMD load instructions, if the CPU supports them, or 2. fall back to a non-vectorized loop. On older x86 CPUs unaligned loads were significantly slower, but nowadays it's not so bad. So option 1 may be significantly slower, and option 2 (almost) certainly will be.

  • @quoc-minhton-that6391

    @quoc-minhton-that6391

    5 years ago

    @CBaggers Thank you!!

  • @TheOnlyAndreySotnikov
    @TheOnlyAndreySotnikov 3 years ago

    16:03, not true. VTune shows you "these things".

  • @DanielJamesCollier
    @DanielJamesCollier 7 years ago

    Wouldn't it be much better if you generated the random numbers beforehand?

  • @DanielJamesCollier

    @DanielJamesCollier

    7 years ago

    talking about 21:40

  • @dascandy
    @dascandy 7 years ago

    You should've included the SSD / harddisk in your memory speed / latency discussion.

  • @valetprivet2232

    @valetprivet2232

    7 years ago

    What for? If you're interacting with the hard disk, you know it will be slow.

  • @Slithermotion

    @Slithermotion

    5 years ago

    I'm angry too, still wondering if a floppy disk is faster than internal RAM... haha xD But I think it's common knowledge at this point; if you are going into such detail about coding performance, then this should be common knowledge. If you are watching a video where a guy explains that getting data from L1 or L2 cache is time critical, you had better already know that ROM memory is slow as fuck on that scale.

  • @flutterwind7686

    @flutterwind7686

    4 years ago

    That's a rabbit hole, he can only cover the basics given the time. HashMaps -> Self Balancing Trees -> BTrees -> Esoteric Data Structures to model databases (improve disk read performance when searching) take up an entire lecture on their own.

  • @Vermilicious
    @Vermilicious 2 years ago

    Cache lines, eh. Thanks!

  • @elbozo5723
    @elbozo5723 2 years ago

    “want fast python? well tough shit”

  • @avashurov
    @avashurov 3 years ago

    I hope you’re not benchmarking memory random access speed with rand function modulus operator on every access at 20:44 🤦‍♂️

  • @iamlalala1995

    @iamlalala1995

    2 years ago

    Lol yeah i was thinking that

  • @expurple

    @expurple

    2 years ago

    So, the correct thing to do is to add that rand()%n calculation into the other two loops as well? Or maybe precompute these random indexes?
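The precomputing option raised in the question above could look roughly like this. It is a sketch, not the benchmark from the talk: the function names, the fixed seed and the use of `std::mt19937` are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Builds the access pattern up front so the timed loop contains no RNG or modulus work.
std::vector<std::size_t> make_random_indices(std::size_t count, std::size_t n, unsigned seed = 42) {
    assert(n > 0);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<std::size_t> dist(0, n - 1);
    std::vector<std::size_t> indices(count);
    for (std::size_t& i : indices) i = dist(gen);
    return indices;
}

// The part one would actually time: one load from `data` per precomputed index.
long long sum_at(const std::vector<int>& data, const std::vector<std::size_t>& indices) {
    long long sum = 0;
    for (std::size_t i : indices) sum += data[i];
    return sum;
}
```

Reading the index vector is itself sequential memory traffic, so this does not isolate the random accesses perfectly, but it does take the cost of generating the numbers out of the measured loop.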

  • 2 years ago

    Of course, replacing %m with &(m-1) is much faster if m is a power of 2 (and if m-1 comes from a const variable), but he might have kept it for readability instead.
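The bit-mask trick mentioned above, as a small sketch. The function names are invented for this example. The identity holds only for unsigned values and a power-of-two modulus, and compilers typically apply the same transformation themselves when the modulus is a compile-time constant.

```cpp
#include <cassert>
#include <cstdint>

// True if m has exactly one bit set.
constexpr bool is_power_of_two(std::uint32_t m) {
    return m != 0 && (m & (m - 1)) == 0;
}

// Equivalent to x % m, but only valid when m is a power of two.
inline std::uint32_t fast_mod(std::uint32_t x, std::uint32_t m) {
    assert(is_power_of_two(m));
    return x & (m - 1);
}
```

For example, `fast_mod(70, 64)` keeps only the low six bits of 70 and yields 6, the same as `70 % 64`.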

  • @dat_21

    @dat_21

    1 year ago

    Yeah. Easily 100 unnecessary cycles just from these two.

  • @aritzh
    @aritzh 4 years ago

    In the example at 20:44, wouldn't the rand() be very expensive, since it (at least on linux) uses a system call, with the context switch and everything?

  • @TheElectromatter

    @TheElectromatter

    4 years ago

    It is not a syscall, it's an LCG in glibc.

  • @blanamaxima
    @blanamaxima 4 years ago

    Well, hard because some are really bad programmers.

  • @KeithMakank3
    @KeithMakank3 2 years ago

    I'm not sure knowing the hardware helps us tremendously. We are bound by data access times whatever we do, and if their impact is significant, it means the hardware is slow. Until we actually design hardware, it's not our problem.

  • @dupersuper1000

    @dupersuper1000

    2 years ago

    It’s really only “our problem” when high performance is a business requirement. The people who really stress out about this stuff typically work in gaming, high-frequency trading, low-level networking, etc. The other thing to point out is that we are hitting the physical limits of processor speed and transmission of data between caches. It’s no longer possible for us to just “wait until hardware gets faster.” Either you figure out how to optimize your code and parallelize what you can on multiple threads, or you just eat dirt until some unforeseeable scientific breakthrough allows us to make faster hardware. Those are your options.

  • @qqqqqqqqqqqqqqq67

    @qqqqqqqqqqqqqqq67

    1 year ago

    Dude, do you think slack running slow on ryzens is because the hardware is slow?

  • @zhulikkulik

    @zhulikkulik

    1 year ago

    You can't make one thing fit all purposes so it is “our” problem. You can't bend physics and force electricity to run faster, but you can bend your approach to data and code and make it easier for the computer to do stuff. If someone managed to send Voyager to space in 1977 and it still operates to this day - we definitely can make a videogame/social network app/whatever to run on a hardware that's a zillion times faster than what people from NASA had 40+ years ago.

  • @Antagon666
    @Antagon666 1 year ago

    The comparison at the beginning is kind of lame... You don't render just 60 frames per second, those frames often consist of 2 million or more samples. In comparison 44000 samples is nearly nothing, even slowest of CPUs can emulate that in real time nowadays.

  • @wpavada3247
    @wpavada3247 7 years ago

    so we have another thing to worry about while coding in c++, i sometimes believe it is getting too over complicated!

  • @NikoKauppi

    @NikoKauppi

    7 years ago

    It applies to both C and C++. To some extent other languages too depending on how they handle things internally. And this is something that has been present for years already. For as long as you keep your data aligned and your array accesses linear you should be good to go. But it's good to know about possible pitfalls in case you need to squeeze some performance out of some of your programs.

  • @wpavada3247

    @wpavada3247

    7 years ago

    Can you offer me a job?!

  • 7 years ago

    Wp Avada, nobody said it was gonna be easy 😀 Just remember that C++ lets you work efficiently, but other higher-level languages can't provide that.

  • @ChrisM541
    @ChrisM541 2 years ago

    37:24 It's incredibly shocking how, in the relatively short space of around 30 years, the vast majority of today's programmers - even C++ programmers - have bypassed so much critical expertise in understanding assembly language. I mean it's truly...almost criminally shocking, especially given that... 1) Assembly language is at the absolute core of C++ debugging and, as demonstrated in this video 2) Different compilers can frequently produce different machine code, running at different speeds Higher level languages may well have a more valid reason, but C/C++ programmers have - literally - no excuse.

  • @abhinavraghunandankannan3546
    @abhinavraghunandankannan3546 7 years ago

    The material covered is decent but the way things are explained is quite bad ! It will be safe to say that the people who understand this talk would already have a decent idea of the content.

  • @ahmeterdem9312

    @ahmeterdem9312

    7 years ago

    I was gonna disagree with you, but after watching until the end, I now agree. I think the purpose of the talk is more to raise some questions and curiosity about the actual hardware that your C++ code is running on.

  • @Boneamps

    @Boneamps

    7 years ago

    I actually thought this talk was really good. It reminds you that if you want to do proper software development of high-performance applications, you MUST know about computer architecture. And this guy Timur seems to really know his stuff. For some reason, SIMD and VLIW and all that fancy stuff that processors do nowadays are quite ignored in the community. But the bottom line is, yeah, this talk is not for explaining things to someone who is learning. The people who attend this talk are industry experts, who probably studied all of these things but don't have years of hands-on experience in the particular topics they are listening to in the talks.

  • @rg3412
    @rg3412 6 years ago

    hmm counter is exploding. Speaker should take a Toastmaster course, that would make his presentation clearer and more engaging.

  • @mariogalindoq
    @mariogalindoq 5 years ago

    Good content but I feel very distracted with hammmm aaaaaa before each sentence. He really needs to learn how to speak in public. His speech is so stressing so I can't understand a lot of sentences. Aaaam.

  • @Superlokkus5
    @Superlokkus5 5 years ago

    "You found books from computer science", yeah the fact he didn't know about and also didn't say "computer architecture" suggest he never attempted to learn a bit of formal computer science. "C++ books never talk about this stuff" LOL are you kidding me? *Skipping thru* Oh wow branch prediction, data alignment, array access i.e. cache locality, false sharing I never heard of it... NOT This is the nth average talk with stuff already heard dozen times.

  • @gamingsportz3390

    @gamingsportz3390

    5 years ago

    Did all of this in my high performance class at my university

  • @aorusaki

    @aorusaki

    4 years ago

    Well some of us dont know and now we do.