cpponsea.uk
We use them every day, but how often do we stop to think about the kinds of amazing things our compilers do for us? Modern compilers are a feat of engineering and in this talk Matt will demonstrate just a few of the very cunning things they do for you.
Matt will concentrate on the output of the compiler: the tricks they use to generate efficient, optimized assembler code.
Writing clear, readable code that's also efficient hinges on being able to trust your compiler's code generator. By the end of this talk, you'll be be able to read assembly well enough to be able to appreciate your compiler, and have an understanding of what it can - and can't - optimize for you.
---
Matt is a C++ programmer and occasional verb. He loves writing efficient code and sharing his passion about how computers work under the hood. An engineer at Coinbase, he has previously worked at a trading firm, on mobile apps at Google, run a C++ tools company and spent more than a decade making console games. When not tinkering on Compiler Explorer, Matt enjoys working on emulators for old 8-bit computer hardware.
cpponsea.uk/2019/sessions/key...
Filmed and Edited by Digital Medium Ltd: events.digital-medium.co.uk
Enquiries: events@digital-medium.co.uk

Пікірлер: 61

@Wyvernnnn3 жыл бұрын
Go on, funny verb man, blow my mind.
@seditt51463 жыл бұрын
"Oh Golly I'm running out of time"- Dude, never enough time for ya, great speaker no doubt up there with Herb, Chandler and Andrea as well as some others.
@KabelkowyJoe3 жыл бұрын
17:00 Not only one instruction, LEA is using AGU instead of ALU unit for its calculations, when two instructions LEA and ADD are close each other both will be done same time as if it was 0.5 instruction for each. Of course if (and this is common) CPU have 3-4 ALU units 3-4 ADD, SUB will run same time (if independent) but if AGU used you can increase this pararelism up to 8
@KabelkowyJoe
3 жыл бұрын
18:30 Reason why GCC still using LEA even for multiply cause it will be done in just one cycle, without LEA it would be faster to use SHL still than multiply witch takes 4 cycles usualy or SHR instead division witch uses 12 cycles usually
@BryonLape3 жыл бұрын
I learned enough in my compiler class in university 30 years ago to know I didn't ever want to make one.
@aoeu256
3 жыл бұрын
The easiest compilers to make are Lisp macros if you think about it...
@colinmaharaj4 жыл бұрын
I am in fact, using this to help write an interpreter.
@logiciananimal4 жыл бұрын
What a wonderful website and introduction! Thanks for the resource.
@bzboii Жыл бұрын
52:34 “GCC has something called speculative virtualization…” **crowd member giggles** “yeah!”
@movax20h4 жыл бұрын
Actually at 58:36, compiler is perfectly free to replace this implementation of bogo-sort with call to qsort or something. Programs that don't terminate are ill formed. So compiler can assume there will be a progress, and the function will finish. There is no other observable state (mt is a local, and there is no other global state touched). So it must terminate and exit to ensure progress. So it knows that there is no other side effects, and on the exit the table is sorted. So sort it. :) It is kind of allowed optimization, but no sane compiler will actually do it, because it is essentially impossible to deduce the above logic by looking at this code, including stability of sort used, or which version is actually faster (it might depend on the size of the array possibly, and even the data values in the array itself).
@Alexander-yk
4 жыл бұрын
This is partly why we need (more-)high-level concepts. They will allow to distinguish RNG from PRNG (which is handy for crypto) and qualify what is mean for array to pass `is_sorted`. With this replacing bogosort with `std::sort` is reasonable. So, yeah. We can make a lot of fantastic decisions and optmizations on compiler front end.
@EllipticGeometry
4 жыл бұрын
Programs that don’t terminate are perfectly legal. But suppose for the sake of argument that not terminating is undefined. That would just lead to the principle of explosion, of which the desired outcome is an infinitesimal subset. To be more explicit in a way that still incorporates some of the usual logic, why wouldn’t is_sorted be considered the culprit? Forcing it to be true in a case where the MT shuffle would never hit the right permutation is no more ridiculous than magically sorting the list (or setting your aunt’s hair on fire while still not terminating) in that same case. A theoretical super smart compiler might recognize that the defined case always results in a sorted list, and a simple way to omit the undefined case is to run the sorting algorithm there as well. Maybe an even smarter compiler could optimize the sort based on some property of bogosortable lists, and make even a charitable interpretation of undefined behavior break things again. Just don’t ever go there.
@styleisaweapon
7 ай бұрын
That highlights the two principled kinds of optimization: those that transform, and those that calculate (or deduce), cache, and leverage runtime results at compile time - the second kind is always problematic at the edge cases because the compiler is dictating that it has perfect information when it in practice doesnt (it is assuming the state of runtime memory within an abstract machine that has multiple actors that can change it) - there are entire keywords dedicated to making the second type of optimization less problematic but they dont solve the problem they only disable those problematic optimizations (ex: volatile)@@EllipticGeometry
@apojoga4 жыл бұрын
Great talk.
@mockingbird38093 жыл бұрын
3:21 is really a good joke :D
@Badcrow77134 жыл бұрын
His name is GODBOLT
@xrafter3 жыл бұрын
Assembly is scary? Nah x86 is .
@ominous-omnipresent-they
3 жыл бұрын
Your avatar reminded me of how much I've been wanting to try Arch Linux on my desktop. Currently running openSUSE Tumbleweed on my laptop, and honestly, I freaking love it.
@xrafter
3 жыл бұрын
@@ominous-omnipresent-they Happy to hear that .
@jamesflames69874 жыл бұрын
Wow!
@jonathangriffin34864 жыл бұрын
Can one do fixed point arithmetic with 256 bit registers, I ended up using fixed point for a ray-trace sqrt algorithm for 32 bit code in masm. Probably around 2007.
@horowitzhill64803 жыл бұрын
28:45 he meant to say 256 bit wide registers, right? 32 bytes are 256 bits
@proxy10354 жыл бұрын
"Compilers are awesome" me writing mostly with the 65C02: :(
@rabidbigdog
4 жыл бұрын
But the 65C02 is awesome.
@baruchben-david4196
4 жыл бұрын
@@rabidbigdog My first CPU.
@xrafter
3 жыл бұрын
Assembly language is language
@BryonLape3 жыл бұрын
All this time I thought int was defined as the machine word size. I guess not.
@adrianomachado56293 жыл бұрын
we should have a T-shirt: Compilers are awesome
@RachelMant4 жыл бұрын
github.com/bad-alloc-heavy-industries/substrate/blob/master/substrate/fd#L192 - my answer to the insanity that is "just use the netcode library to conditionally 'fix' the endian-ness of your data once it's already too late".. and the great thing is, this compiles out entirely on a CPU with matching endian.. and compiles to an efficient swap instruction otherwise
@Ryan-xq3kl4 жыл бұрын
Terry Davis has entered the chat.
@RealDukeOfEarl
4 жыл бұрын
RIP Terry, those glowies can't get to you now.
@xrafter
3 жыл бұрын
Who is terry ?
@lucianoosinaga2980
3 жыл бұрын
@@xrafter best programmer to ever live
@xrafter
3 жыл бұрын
@@lucianoosinaga2980 Wow.
@Yupppi7 ай бұрын
I've been listening to these talks by Kevlin while trying to learn Rust, and I'm just thinking trying to write in Rust is like playing that crazy create password online game where any time you satisfy a condition, you get a new requirement until it agrees to compile. So basically at the end you're putting out fires protecting the egg and losing the workings of your already existing code and it never compiles. Especially if you're trying to combine features.
@ThePlacehole3 жыл бұрын
Yeah, compile-time speculative inlining of virtual functions sounds cool and all... Until you compare it to any JIT ever.
@loli42
3 жыл бұрын
yeah, compiling all your code on the user's machine at program startup under extreme time constraints using a 100 megabyte runtime that ships with your program just so you can inline virtual functions 25% more frequently sounds cool and all... until you compare it with something that generates better code in every other case, once, ahead of time, and doesn't require a runtime.
@ThePlacehole
3 жыл бұрын
That does not sound cool to me, but to each their own I guess... But, just talk me through your reasoning here. So you have a program which needs to respond quickly, and you've picked a JIT which compiles on startup, but then you have also decided to start it up every time you need it? Why? Either way, you can't fanboy over C++ in those "extreme" situation either, because that is C territory.
@loli42
3 жыл бұрын
@@ThePlacehole what are you even saying? my entire comment was sarcastic. i \*don't\* like JIT. i think it's a stupid idea. how does it follow that if you want your program to "respond quickly" that you would use a JIT? also yes, every time the program itself STARTS, the JIT has to recompile it. unless .NET or whomever has some stupid bytecode cache thing. i wouldn't know, i'm not a pajeet, and i'm most certainly not "fanboying" over sepples. my profile picture is literally yuki nagato holding up the k&r book.
@ThePlacehole
3 жыл бұрын
@@loli42 I am trying to say that your hypothetical situation does not really detract from JITs, because using a JIT in that way is retarded. Also that the only place where technologies have no negatives is the Magical Fanboy-land.
@loli42
3 жыл бұрын
@@ThePlacehole using a JIT in what way?? the only way to use a JIT is to compile just-in-time, i.e. when the program starts up! in your hypothetical situation, does a JIT just never fucking compile the program? you know it takes TIME TO COMPILE A PROGRAM RIGHT??? it's like i'm talking to a fucking serial killer.
@aoeu2563 жыл бұрын
Lol Haskell is actually closer to what the CPU does than C with out-of-order execution.
@raymundhofmann76615 жыл бұрын
Wouldn't it be more appropriate if the CPU itself does most vectorization? With ever changing architectures and instruction set inflation it becomes increasingly pointless to target a specific subset. Also we are stuck since over 10 years in the time of saturating single core performance, it is time for the CPU itself to be able to auto-vectorize and auto-paralellize to give more performance. Maybe this job of mapping the code to the available hardware of GPU, CPU cores, co-processor, etc. will be pioneered in a virtual machine like WASM before a CPU manufacturer manages to deliver something.
@compuholic82
4 жыл бұрын
I am not sure that is the way to go. (1) It is really hard to do in hardware and it would mean that you would need to devote lots of transistors on the chip to this kind of optimization. And that means that CPUs would become even more inefficient than they are now. Even today only a tiny fraction of the transistors on the CPU are dedicated to the actual computation. Pretty much all transistors are used to keep the processor pipeline moving: caches and cache-management, out-of-order execution, speculative execution and stuff like that. (2) Such optimizations cross the boundary to algorithms. IMHO Software should not rely on the hardware to fix stupid algorithmic design. There is another way to stay machine-independent while still using machine-dependent optimizations: Acceleration libraries. I am thinking about libraries like Intel IPP, MKL. You can write architecture-independent code and at program startup the libraries will automatically detect the CPUs capabilities and select the appropiate implementation.
@edcatchpole4737
4 жыл бұрын
@@compuholic82 Don't forget about ISPC!
@ped7g
4 жыл бұрын
One of the obstacles is, that the original C++ source target C++ abstract machine. After you compile it to the machine code of some target architecture like x64, you are throwing away all the original constraints and contracts defined by the C++ source and introducing the new set of constraints and contracts defined by the target-architecture. The machine code constraints are of course result-wise compatible with the original constraints to ensure that the result of machine code is correct under the C++ abstract machine considerations, but you can't reconstruct the original set of constraints from the machine code, you would end with lot more complex (more strict) requirements. But the CPU does see only the machine code, so it would have to vectorize against the lot more harsh contract, often ending with less optimal code than the compiler could produce, as compiler understands better the original intent of programmer (I mean the intent encoded in the C++ source, not the true intent of programmer, that will be revealed later over multiple debugging sessions after the programmer will realize how to write his original intent correctly in C++).
@raymundhofmann7661
4 жыл бұрын
@@compuholic82 I didn't say it has to be done with specific CPU logic circuitry, just by the CPU from the code perspective. Doing it with hardwired logic would be a rather stupid idea for such a complicated and constantly changing problem. I implied by mentioning virtual machines that it might just be that, a virtual machine running software doing the machine specific mapping of a abstract machine to the actual computing resources. The CPU HW only has to allow for this virtualization, as it is already to some degree, the rest is SW. And the computation for this can be partly pooled, put into a cloud, as there is significant homogeneity in computational resource architectures.
@mattr22384 жыл бұрын
Friends don't let friends use Windows for serious work.
@_DATA_EXPUNGED_4 жыл бұрын
The laughter is super creepy
4 жыл бұрын
-funsafe-laughter
@mohammedj2941
3 жыл бұрын
it's not, WTH
@trisinogy4 жыл бұрын
Big congrats for the hideous audio level. The guy is perhaps a C++ guru, but he doesn't know how to speak properly
@trisinogy
4 жыл бұрын
Peter Mortensen first and foremost: either the mic or the audio editing of the video are far from being ok. The “volume” (i.e. the level) is far too low. Secondly, the speaker occasionally bursts into loud sentences or laughter and then all of a sudden starts murmuring something to himself as if he wasn’t addressing an audience. That’s extremely disturbing and spoils the otherwise enjoyable and insightful content. I spent most of the time riding the volume control.
@Dorumin
4 жыл бұрын
I listened to most of it with half my attention, but honestly the sound is fine for me, granted I usually listen to things at a comfortable mild level, never loud, so the only times I complain about sound is when something is too low to be actually discernable
@trisinogy
4 жыл бұрын
Doru Min very glad it works for you. You clearly have different standards when it comes to audio quality.
@magfal
4 жыл бұрын
@@Dorumin The audio was fine for me as well.
@alexismandelias
4 жыл бұрын
I see what you mean, I wouldn't necessarily complain but it is indeed an inconvenience

KEYNOTE: What Everyone Should Know About How Amazing Compilers Are - Matt Godbolt [C++ on Sea 2019]

Ғылым және технология

Пікірлер: 61

@KabelkowyJoe

3 жыл бұрын

@aoeu256

3 жыл бұрын

@Alexander-yk

4 жыл бұрын

@EllipticGeometry

4 жыл бұрын

@styleisaweapon

7 ай бұрын

@ominous-omnipresent-they

3 жыл бұрын

@xrafter

3 жыл бұрын

@rabidbigdog

4 жыл бұрын

@baruchben-david4196

4 жыл бұрын

@xrafter

3 жыл бұрын

@RealDukeOfEarl

4 жыл бұрын

@xrafter

3 жыл бұрын

@lucianoosinaga2980

3 жыл бұрын

@xrafter

3 жыл бұрын

@loli42

3 жыл бұрын

@ThePlacehole

3 жыл бұрын

@loli42

3 жыл бұрын

@ThePlacehole

3 жыл бұрын

@loli42

3 жыл бұрын

@compuholic82

4 жыл бұрын

@edcatchpole4737

4 жыл бұрын

@ped7g

4 жыл бұрын

@raymundhofmann7661

4 жыл бұрын

4 жыл бұрын

@mohammedj2941

3 жыл бұрын

@trisinogy

4 жыл бұрын

@Dorumin

4 жыл бұрын

@trisinogy

4 жыл бұрын

@magfal

4 жыл бұрын

@alexismandelias

4 жыл бұрын

Келесі