Rust multi-threading code review

Ғылым және технология

my 3d cellular automata simulation written in rust with multi threading was sub-optimal. Using benchmarks with criterion and the help from my community, I learnt what the bottlenecks of this application are.
Today I share the journey I went through to optimize my multi-threading code!
Let's talk about the code... ;)
original video 3D cellular automata video:
• 3D Cellular Automata -...
3D Cellular automata project links:
Source code: github.com/TanTanDev/3d_cellu...
IsseW gpu implementation: github.com/IsseW/cas
Dstaatz atomics implementation: github.com/dstaatz/3d_celluar...
BENCHMARKS I made for this video:
github.com/TanTanDev/cellular...
My discord group:
/ discord
Want to support me?
Patreon: / tantandev
Monero: 43Ktj1Bd4Nkaj4fdx6nPvBZkJewcPjxPB9nafnepM7SdGtcU6rhpxyLiV9w3k92rE1UqHTr4BNqe2ScsK1eEENvZDC3W1ur
Credits:
Music Prototype by Wintergatan Build Tracks
Music Valentine by Wintergatan
Music The Rocket by Wintergatan
Music Non-Euclidian Geometry by Punch deck
Music Brazilian Street Fight by Punch deck
Music See The Light by Punch deck
Wintergatan link: wintergatan.net/
License can be found on website
Punch deck link: / punch-deck
Punch deck license: creativecommons.org/licenses/...
#rust #multithreading #review

Пікірлер: 243

  • @ItsMeChillTyme
    @ItsMeChillTyme2 жыл бұрын

    Very valuable video. Some youtubers and bloggers keep on insisting that efficiency doesn't matter cause "new compooter phast" and have a considerable influence on new learners who get stressed and look to validate if what they are learning is worth it or not. And, at that point if they come across people like those, it could completely misdirect them from becoming better engineers. While what has been done here to compare is not a usual application, it goes to demonstrate how much efficiency matters and how all those little nanoseconds and microseconds add up quick.

  • @thoriqadillah7780

    @thoriqadillah7780

    2 жыл бұрын

    I bet the people you meant are javascript developer. No offense lol

  • @ItsMeChillTyme

    @ItsMeChillTyme

    2 жыл бұрын

    @@thoriqadillah7780 Mostly but I've seen it come from new developers for everything, even C++. Though JS and Python have it most since they are beginner friendly.

  • @nagoranerides3150

    @nagoranerides3150

    2 жыл бұрын

    @@thoriqadillah7780 I first saw the idea that efficiency isn't important because "new compooter phast" in the early '80s. The truth is that efficiency of coding is sometimes more important than efficiency of execution. I write 90% of my code in Perl, because I write a lot of small utility scripts for solving specific issues at work and quite often the code is single-use. Shaving a day off development and testing time is far more useful than even shaving an hour off run-time. If I have a task that will take weeks to run, then I write it in something fast so the return outweighs the time lost in coding in a more formal language. And something that requires realtime doesn't leave you much option.

  • @thoriqadillah7780

    @thoriqadillah7780

    2 жыл бұрын

    @@nagoranerides3150 but we're talking about game dev here. And the "javascript dev" is about frontend in web, or even mobile. I think the game dev part the need for speed is critical because we're dealing with customer pc. Remember "but can it run crysis?" meme? Yeah i remember that. And the frontend web dev i think speed is somewhat important, because we're dealing with user internet speed and data

  • @dealloc

    @dealloc

    2 жыл бұрын

    You have to set constraints on what hardware you aim to support, otherwise we'd never move forward. For example, Valve has done a tremendous job at creating an engine that is versatile across low and high-end hardware-and Source 2 is highly targeting VR applications, so speed and varied hardware support is of utmost importance-but it is also very limited in its capabilities when compared to other modern engines that may not be suitable for low-end hardware. Of course, that doesn't mean you should max out everything and only aim for high-end hardware, since that can also have drawbacks in and of itself.

  • @quantumdeveloper2733
    @quantumdeveloper27332 жыл бұрын

    For the boolean array you could also try putting it all into a single 32 bit integer, where each bit represents one value. The check would then be something like (rule & 1

  • @Tantandev

    @Tantandev

    2 жыл бұрын

    Solid idea! Faster or not... we still save some bytes haha

  • @anderdrache8504

    @anderdrache8504

    2 жыл бұрын

    The compiler can't do memory layout optimizations like that. As a rule of thumb: it can only reorder struct fields, nothing else.

  • @zyansheep

    @zyansheep

    2 жыл бұрын

    Or you could use bitvec :)

  • @nilstrieb

    @nilstrieb

    2 жыл бұрын

    It will be slower, not faster, since now there are several instructions to get the value. While the array will get slightly smaller, it doesn't really matter, since it's quite small anyways, so you won't get any benefit from the smaller cache usage. But it would be interesting to benchmark it

  • @quantumdeveloper2733

    @quantumdeveloper2733

    2 жыл бұрын

    ​@@nilstrieb Trying to prove you wrong I tried to verify my hypothesises by googling: Originally I thought that the processor couldn't adress anything smaller than a word and therefor would have to do bitshifts to adress a byte anyways, but that doesn't seem to be the case for x86 anymore. And I also thought that my variant would allow the compiler to do more SIMD optimization because there was no SIMD memory lookup, but in avx2 there is an instruction for that as well. So I guess on modern hardware you are right. I also made a benchmark to be sure, and indeed on my avx2-capable desktop pc I get a 15% performance decrease. But on the raspberry pi 3b, which has an ARM processor, I get a 30% performance increase. It might be interesting to check more modern ARM processors as well, but I don't own one.

  • @Leonardo-nf5jc
    @Leonardo-nf5jc2 жыл бұрын

    This video reminded me the mantra "make it work, make it right, make it fast". As always interesting and entertaining content, thank you.

  • @Leonardo-nf5jc

    @Leonardo-nf5jc

    2 жыл бұрын

    @JM Coulon you may have not implemented error handling for example, it work but it's not the right way to code

  • @demetriuslewis6750
    @demetriuslewis67502 жыл бұрын

    This is an amazing video and I'm super glad you opened up about the issues that were made and how it was fixed. Super healthy for the community and understanding programming in general.

  • @pigworts2
    @pigworts22 жыл бұрын

    The nested vector is not as messy as you made it look. For initializing you can do: vec![vec![vec![0u8; SIZE]; SIZE]; SIZE]. For indexing, you can do it just like the array: v[0][4][3]. The flat vector is definitely a better idea tho.

  • @biglexica7339

    @biglexica7339

    Жыл бұрын

    even if it was messy, just write a macro

  • @shambhav9534

    @shambhav9534

    Жыл бұрын

    Yep, a flat vector would be better. I think that's how it is intended to be used.

  • @erikwg3814

    @erikwg3814

    Жыл бұрын

    Ye, especially the .get(0).unwrap().get(4).unwrap() and so on felt quite dishonest. If you aren't doing any checking anyway there is absolutely no point in using the get function

  • @jesperhustad
    @jesperhustad2 жыл бұрын

    This is rare great content as most KZreadrs only cover the fronted. Thanks for this great resource on low level performant code.

  • @PenguinjitsuX
    @PenguinjitsuX2 жыл бұрын

    It's just so interesting to see you explain and talk about different concepts! I love these videos.

  • @andrewgr144
    @andrewgr144 Жыл бұрын

    Mad props for publicly sharing your mistakes for the rest of us to learn from. It speaks to your depth of character. Thank you!

  • @elliotf3029
    @elliotf30292 жыл бұрын

    You've quickly become one of my favorite Rust centered channels. Well done!

  • @niomeda
    @niomeda2 жыл бұрын

    That's why i love coding. No matter how good you think you've made something, it's always room for improvement and learning new things.

  • @danielmitre
    @danielmitre2 жыл бұрын

    You can count the number of neighboors with convolution. With 2D it would mean to slide a 3x3 matrix [[1, 1, 1], [1, 0, 1], [1, 1, 1]] in every position the cell matrix, multiplying the values in the same position, sum them and put in the center position. There are very fast algorithms to run this on CPU and even faster on GPU

  • @brdrnda3805

    @brdrnda3805

    2 жыл бұрын

    Can you please elaborate a little bit more? I really would like to understand.

  • @RottenFishbone

    @RottenFishbone

    Жыл бұрын

    @@brdrnda3805 Convolution is a pretty common technique for running some kind of matrix-defined algorithm in parallel on a large dataset, usually images. Each value of the matrix is basically an offset of a target cell holding a value to multiply by. So, given that the matrix is 3x3, the top row of the matrix would multiply [above-left, above, above-right] by 1's and sum them. The middle row would be [left, self, right], the value of 'self' is ignored, because the matrix is 0 at that position. The end result is you add 1 for each active neighbour i.e. sum of neighbours. You do this on each cell to build a new 2D array with the value of each cell being the sum of its neighbours. The benefit of convolution is that no cell's calculation depends on the calculation of any other cell. As such, you can have a kernel executing on each cell in parallel, potentially ALL at the same time. This is trivial to do in GPU API like CUDA or Vulkan.

  • @trash_aly
    @trash_aly2 жыл бұрын

    This is my favourite video of yours! Diving into code, showing alternatives, and explaining exactly why one is better for a certain goal. That's such a great way to actually learn and level up as a programmer!

  • @m4rt_
    @m4rt_2 жыл бұрын

    0:00 Eyyy It's me!

  • @DrIngo1980
    @DrIngo19802 жыл бұрын

    This was a great video. Great at explaining how one should go about optimizations and performance tuning. A bit light on how to actual measure performance, but hey, that could be another video 🙂 Anyway, the message you wanted to convey is pretty clear and that's cool. It's also really cool how you handle feedback it seems. I suck at handling feedback, especially feedback that criticizes my code. I need to become a lot better at that.... Again, thanks for a great video!

  • @blitzventium7237
    @blitzventium72372 жыл бұрын

    Really loved that you took time to explain where you could have done better. Not a lot of youtube content where they dissect a cool project to look for optimizations. Great job:)

  • @OfficialMrRATpHace
    @OfficialMrRATpHace2 жыл бұрын

    Just found your channel about a week ago. Cube World was my jam, full support for this game of yours! Keep up the awesome work, you're keeping the hopes of many fans alive.

  • @scoringdigitsson.5194

    @scoringdigitsson.5194

    2 жыл бұрын

    checkout veloren

  • @Skeffles
    @Skeffles2 жыл бұрын

    Fantastic to see how much can be optimised! It seems like you've learnt some great tips here and I can't wait to see how it applies to your voxel game.

  • @soupnoodles
    @soupnoodles Жыл бұрын

    To actually see an influencer not get big headed and reject all possible advice by saying they don't need it, is really good. Usually you'd have so many people tell you how good you are that you'd stop caring enough for those who might help you improve. I think it's great that you're open to feedback and took it in the right way. Great stuff, easily one of my favourite channels

  • @hermannpaschulke1583
    @hermannpaschulke15832 жыл бұрын

    Hmm, I wonder how this would look with Rayon. For testing, I did a simple mandelbrot program, and then simply plugged in Rayon, and it scaled really well. Maybe a project for next weekend :D

  • @joshxwho

    @joshxwho

    2 жыл бұрын

    I love rayon, for my CLI apps I build it makes multiprocessing super easy.

  • @m.sierra5258

    @m.sierra5258

    2 жыл бұрын

    Mandelbrot is embarrassingly parallel though, it's a whole different story if you have a border exchange like in cellular automata. That's where the real difficulty of efficient multithreading begins :)

  • @SbF6H

    @SbF6H

    Жыл бұрын

    Rayon? That shit which is for toy programs? Sure.

  • @adamhenriksson6007

    @adamhenriksson6007

    Жыл бұрын

    Rayon feels kinda like using go func in go. So simple and essy

  • @Andrew90046zero

    @Andrew90046zero

    Жыл бұрын

    I've been trying to use Rayon to multithread another type of simulation I'm making. And I used it on one part to see if there was any improvement. But It seems like it only reduced the performance of the simulation! I think there is still a lot I have to figure out...

  • @micowata
    @micowata Жыл бұрын

    A good story with a proper lesson, you're a great storyteller Tantan! Thank you for the quality entertainment.

  • @KilgoreTroutAsf
    @KilgoreTroutAsf2 жыл бұрын

    6:11 2.05us = 2050ns , not 205000 ns, so it is a 7.5x improvement, which is still pretty worthy

  • @JackEnneking

    @JackEnneking

    3 ай бұрын

    ​@rishabhkhemu He's correct on hashmaps at 3:10 : 35 us = 35,000 ns.

  • @CountJamesMcGuigan
    @CountJamesMcGuigan Жыл бұрын

    This code review style video is a really insightful way of understanding code. More please.

  • @atalocke
    @atalocke2 жыл бұрын

    0:01 my typo will live forever! Great video! I read through the whole repo after the last one. Wish I could have contributed but I came after all the easy work was taken 😂. I’d love to see more videos like this! ❤

  • @DunceHouse
    @DunceHouse2 жыл бұрын

    I really enjoyed this breakdown. Also nice way of looking at the code with editing.

  • @michaelganzer3684
    @michaelganzer36842 жыл бұрын

    Let's summarize what I've seen: You had an idea and shared your approach on your YT channel. You managed to write functional code and you managed to deliver a result. Others (in your team, bc c'mon, we're all happy coder buddys) felt the urge of improving your well intended path to success with a more robust and reliable solution. I'm impressed to see you entertaining us with your learning, thank you for that! Because that story would otherwise be a fantastic and totally realistic representation of why 80% of all software projects fail: skipping non-functional requirements, skipping testing, skipping error handling, skipping the path from knowledge to understanding, skipping quality at all, keeping the total bus factor below 1. So congrats, you just got a new subscriber who's speaking about such topics on community events. :)

  • @GPandzik
    @GPandzik2 жыл бұрын

    Along the lines of your "Measure before you optimize" insight, Gene Amdahl (one of the guys who created the IBM mainframe Assembler language in the 1960s & 70s for the IBM S/360 & S/370 mainframes) expressed Amdahl's Law (paraphrased): "The maximum value of a speedup (1/(ratio of the runtimes before & after optimizing)) from optimizing a part of a system is bounded by the amount of time a system spends in that part of the system." (For a speedup, if you reduce your runtime from ten seconds (t1) to five seconds (t2), the speedup is t1/t2 = 10/5, for a speedup of 2.0. Reducing from ten seconds to one second has a speedup of 10/1 = 10x.) What Amdahl's Law shows us is that the maximum speedup from optimizing part of a system is bounded by the amount of time a system spends in that part. If you have a choice between optimizing, say, disk IO performance by half or CPU performance by 90%, and your system spends 80 seconds out of 100 in IO, but only 10 seconds out of 100 in CPU processing, with another 10 seconds in "Miscellaneous processing," it actually makes far more sense to optimize the IO subsystem, since a 50% optimization would result in a speedup of around 1.67x (100 / (half of 80 (40) for new IO) + 10 (CPU) + 10 (Misc. time, like memory or something)) -> 100 / 60 -> 1.66666... 100 / 91 -> 1.09890... These are only things you'd know -- where your program is spending its time, and doing what in which inefficient ways -- if you've properly benchmarked or otherwise measured your application's performance characteristics. Using an insight like Amdahl's Law also gives you a place to start to look for optimization opportunities, regardless of if you never calculate actual speedups -- if your program spends 90% of its time in one subroutine, then you know that it makes little sense to start anywhere else in your search for optimization targets, since you could eliminate every other part of the program and only get a

  • @superblaubeere27
    @superblaubeere272 жыл бұрын

    5:30 You could also just use a 27-bit bitset stored in a 32-bit value. Since moving 27 booleans (27 bytes) around is not efficient and such large values can easily lead to cache misses which may take microseconds to be resolved.

  • @DanDeebster

    @DanDeebster

    2 жыл бұрын

    Yup, all you need to do it decide whether to write something yourself or use bitvec or bit-vec or bitvector or bv or bitvec_simd or bitsvec or fixedbitset or smallbitvec or awint or bitvec-rs or indexed_bitvec or ...

  • @user-dh8oi2mk4f

    @user-dh8oi2mk4f

    Жыл бұрын

    @@DanDeebster just use an u32?

  • @jamieforster9986
    @jamieforster99862 жыл бұрын

    i love these kinds of programming videos! they're so informative and entertaining

  • @valshaped
    @valshaped Жыл бұрын

    Glad you're taking advantage of the Wintergatan creator license

  • @seneketh
    @seneketh2 жыл бұрын

    Some awesome conclusions everyone can learn from :D . Cool!

  • @Nathan00at78Uuiu
    @Nathan00at78Uuiu Жыл бұрын

    duuuuuuuuude this video is awesome and made so many connections for me from my computer science education. super cool content and quality!!!

  • @Tantandev

    @Tantandev

    Жыл бұрын

    thank you thank you!

  • @JoeTaber
    @JoeTaber2 жыл бұрын

    Nice video! You could probably multithread the full calculation if you use a "double buffer" system: *two* copies of the simulation data, one is the "previous" (read-only from all threads) the other is "next" (each thread mutably borrows a section). For each step, swap the buffers, split_mut the write buffer, and fan out to N threads that each read from the read buffer and writes cell states to the write buffer for its borrowed section.

  • @LIRAY
    @LIRAY2 жыл бұрын

    I came across you while I was looking for an OpenGL manual. You're a very nice guy

  • @leonardovee
    @leonardovee2 жыл бұрын

    dude, your videos are crazy good!

  • @m3thid
    @m3thid Жыл бұрын

    I was just about to make the same mistakes as you when I saw this video! I was able to take this and apply it on my project. Took it from 18s to 450ms!

  • @Filly309
    @Filly309 Жыл бұрын

    great video! general when optimizing code I look for in order: 1) Can we do less work (e.g. iterations) 2) Can we avoid doing work (e.g. caching) 3) Can we spread the workload out (e.g. concurrency)

  • @camh2316
    @camh23162 жыл бұрын

    Microbenchmarking is useful for measuring if your optimisations actually helped, the other important tool to use is a profiler. Profiling your application will tell you where most time is being spent. Profiling your application for the first time almost always finds some things that could be more efficient.

  • @yhevrah
    @yhevrah Жыл бұрын

    I don't know what's about in this video, but this rejuvenates my love for programming. It made me remember why I feel in love in programming in the first place. Thanks!

  • @GRAYgauss
    @GRAYgauss2 жыл бұрын

    "What gets measured, gets managed." - Some body building quote I have applied to most aspects of my life.

  • @az-qf9ht
    @az-qf9ht2 жыл бұрын

    Arrays continue to blow my mind

  • @dabbopabblo
    @dabbopabblo Жыл бұрын

    Hash maps only see you a performance increase in the use case you need to do a fast search where there is no way of deriving the values location in memory from a previously accessed one, over the alternative of iterating through a an entire array just to find the location of a single key/value pair

  • @julienleroycomptefan1032
    @julienleroycomptefan1032 Жыл бұрын

    Sweet tutorial fam !

  • @imaginaoYouTubesoquecomarrobas
    @imaginaoYouTubesoquecomarrobas2 жыл бұрын

    Great video as always!

  • @leddoo
    @leddoo2 жыл бұрын

    that was quick, nice vid!

  • @Tantandev

    @Tantandev

    2 жыл бұрын

    My unofficial mentor. Thank you so much for the help!

  • @leddoo

    @leddoo

    2 жыл бұрын

    @@Tantandev any time, my man!

  • @leddoo

    @leddoo

    2 жыл бұрын

    looking forward to the new voxel stuff 👌

  • @aliancemd
    @aliancemd2 жыл бұрын

    The code doesn't look like "spaghetti" any more. That "optimized" method for example, looks quite beautiful. It's clear you've got a lot of experience now, even though we always keep learning how to do some things better.

  • @constantinhirsch7200
    @constantinhirsch72002 жыл бұрын

    The simple way of parallelizing is having two data buffers: read from one, write to the other. No mutexes or atomics needed, just par_iter on the mutable array. For next generation, swap the buffers.

  • @mikael808
    @mikael8082 жыл бұрын

    Loved the video, thank you! Also can't wait for the voxel game video, sounds super exciting! :D

  • @SuboptimalEng
    @SuboptimalEng2 жыл бұрын

    Good thing I never have to worry about writing the optimal solution 😎

  • @DanelonNicolas
    @DanelonNicolas Жыл бұрын

    this was awesome. learn a lot! thanks!

  • @johanrojassoderman5590
    @johanrojassoderman55906 күн бұрын

    Another optimization (which you kinda already touched on) is to avoid looping through the neighbours to count them and instead do that using constant offsets and adding them all together in one line. Saves branching and writes.

  • @Neph0
    @Neph02 жыл бұрын

    Love your videos Tantan, I laughed at the Vec, that was great

  • @Tantandev

    @Tantandev

    2 жыл бұрын

    Truly beautiful code

  • @amphibia95
    @amphibia952 жыл бұрын

    Awesome stuff. Subscribed!!

  • @b.o.p.6955
    @b.o.p.6955 Жыл бұрын

    Your video absolutely amazing entertaining and educational

  • @JoeTaber
    @JoeTaber2 жыл бұрын

    Speaking of addressing and layout strategies, I wonder if you could encode the data on a hilbert curve. The "lindel" crate has functions that can efficiently convert between points in the cartesian space to an index along the curve. This might be nice for cache coherency.

  • @simonfarre4907
    @simonfarre49072 жыл бұрын

    So many great lessons you've learned. I can *definitely* co-sign the last one at 11:00. This should be taught at Uni. Unfortunately their teaching is generally bad.

  • @crustydev5561
    @crustydev55612 жыл бұрын

    Found your channel yesterday and somehow it is perfect for my ADHD

  • @kennethbeal
    @kennethbeal Жыл бұрын

    Wisdom at 3:25: "Oh it's empty! I will get some water instead, then." Much better than too many cups!

  • @lassipulkkinen273
    @lassipulkkinen2732 жыл бұрын

    Definitely a nitpick, but the "slow" code at 2:40 uses HashMap::iter which is very different from indexing the elements individually in order, which is what you appear to have benchmarked (this is what was actually being done in the simulation code (where HashMap::iter (i.e. one-by-one iteration though cells in arbitrary order) definitely wouldn't have been applicable), which is why I said that this is a nitpick). I haven't done benchmarks to confirm this, but I would expect (map: HashMap).iter() to be about as fast as iterating through a slice of map.capacity()* elements, mem::size_of::() in size each. (The std hash map implementation appears to use a flat array of (K, V), so compared to a Vec the need to store the keys makes it a bit bigger and thus less cache-efficient, and that's about it.) Of course none of this applies to one-by-one indexing, where hashing and perhaps the chaotic memory access pattern are the main sources of overhead. *Actually, the number of buckets. Apparently these are not the same thing...

  • @zhaadd
    @zhaadd2 жыл бұрын

    i feel at home with these coding videos that i dont understand but know it will make sense after sometime

  • @aliengarden
    @aliengarden2 жыл бұрын

    What is your greenscreen? It looks well made and I'm looking for one!

  • @bytefu
    @bytefu2 жыл бұрын

    Haven't watched the whole video yet, but I have to address an issue. If you want to create a multidimensional Vec and initialize it with a specific default value, the code at 2:00 and 2:05 is unnecessary bloated. First, there is the "vec" macro that works exactly like array initialization. Second, you can index Vecs the same you index arrays. Given these facts, the code can be written like this: fn main() { const SIZE: usize = 10; let states: Vec = vec![vec![vec![0; SIZE]; SIZE]; SIZE]; let cell = states[0][4][3]; println!("{}", cell); } which is pretty much the same as with arrays.

  • @freezerain
    @freezerain2 жыл бұрын

    Wow changing wrapper around data can make code run faster!? Mindblowing

  • @vacantknight
    @vacantknight2 жыл бұрын

    Great Video

  • @martinbeltrandiaz
    @martinbeltrandiaz2 жыл бұрын

    I haven't tried Rust yet, but this videos is interesting as hell

  • @phillipneal8194
    @phillipneal81942 жыл бұрын

    Excellent !! Thank you. As Knuth said: "Premature Optimization is the work of the devil"

  • @thegame4027
    @thegame4027 Жыл бұрын

    One of the first things my first boss taught me about development was, "Make it work, make it fast, make it beautiful." It's just by far the best approach. Since then, my development workflow has been hacking together something that just does the job, then I optimize it until it does the job well. Then I throw it all away (refactor it) into an architecture that fits the solution.

  • @RevHardt
    @RevHardt7 ай бұрын

    Amazing attitude!

  • @wadecodez
    @wadecodez Жыл бұрын

    lol classic example of pre-optimization going horribly wrong! Good thing you got it and listened to people trying to help

  • @Palundrium
    @Palundrium2 жыл бұрын

    That Wintergatan background music... :)

  • @Tantandev

    @Tantandev

    2 жыл бұрын

    many people comment when I use his music. Happy to see other Wintergatan fans :)

  • @qtluna7917
    @qtluna79172 жыл бұрын

    So, funny thing, both the KZread algorithm and me both thought this was about the video game Rust. I thought weird topic, but okay, I'll bite. (It doesn't help that the thing in the thumbnail looks like the heightmap for a video game island)

  • @L4Vo5
    @L4Vo52 жыл бұрын

    hell yeah, he talked about the code!

  • @skejeton
    @skejeton2 жыл бұрын

    You can optimize it even more if you consider that it's safe to run threads on cells that are 1 cell apart (No colliding neighbors) without any locks

  • @Spiritusp
    @Spiritusp2 жыл бұрын

    Regarding how to store the data: I would use a hashmap of blocks (say 16x16x16). It would easily parallelize as well, as each block could be a task. And we could avoid atomics IMO, perhaps by presaving boundary data, which is an order of magnitude smaller than the actual data.

  • @zoeythebee1779
    @zoeythebee17792 жыл бұрын

    Look ma! I'm in a niche internet video!

  • @xandr1922
    @xandr19222 жыл бұрын

    Awesome video

  • @longbranchgooberdapple2238
    @longbranchgooberdapple22382 жыл бұрын

    man I love those boolean uses

  • @SolaireML
    @SolaireML Жыл бұрын

    nice video!

  • @teenageoperator7246
    @teenageoperator72462 жыл бұрын

    oh man, Wintergatan as BGM? I’m in.

  • @NetherFX
    @NetherFX2 жыл бұрын

    Are you planning on adding examples in the repo? :-)

  • @zitronenfresser7089
    @zitronenfresser70892 жыл бұрын

    Could you make a video about your vscode setup?

  • @ajnart_
    @ajnart_2 жыл бұрын

    Hello simple question but can't you just use one single array allocated with like 1x10^N slots and access the element like elem[203] for elem[2][0][3] ? It would be more memory consuming but not time consuming, right ?

  • @distrologic2925
    @distrologic29252 жыл бұрын

    A nested vector is no problem if you just put it in a wrapper type with a simplified Api... like get_cell(x, y, z).unwrap() or expect_cell(x, y, z) if you want to drop the unwrap

  • @GaryCraft
    @GaryCraft2 жыл бұрын

    ayyyy, Wintergatan Music...

  • @apidas
    @apidas2 жыл бұрын

    awesome. I learned a lot

  • @nexovec
    @nexovec Жыл бұрын

    I can tell you from looking at that that the cached vectors insanity is slower than just returning the values if you don't do something crazy.

  • @jauki2742
    @jauki27422 жыл бұрын

    nice video :)

  • @MrLeonnius
    @MrLeonnius2 жыл бұрын

    if the bounds are limited and if the maximum bound a user can select does not exceed an absurd amount of memory then you could still do the array method, just always allocate the maximum right?

  • @saxpy
    @saxpy Жыл бұрын

    you could store cells in a 3d matrix as a chunk, and have a hash map against those chunks

  • @monad_tcp
    @monad_tcp2 жыл бұрын

    All I was thinking when you used hashmap and arrays and double passes was. Convolution, use convolution kernels ! instant speed and threading.

  • @jonxslays
    @jonxslays2 жыл бұрын

    lmao the music when the multi threading code comes up.

  • @marksmod
    @marksmod2 жыл бұрын

    3:30 the bane of every coffee addict. Get a cup and it basically is empty before you even sit down and start programming

  • @diadetediotedio6918
    @diadetediotedio6918 Жыл бұрын

    Consider using a hashmap not to store the voxels itself, but to store chunks of voxels, this way you can have expansible chunks of area without dealing with all the performance costs of using the hashmap for every single voxel

  • @gh0stcloud499
    @gh0stcloud4992 жыл бұрын

    Very interesting vid. Thanks for sharing :) slightly random question: if one wanted to learn more about the various mathematics required for this kind of thing what resources would you recommend? Asking for a friend

  • @1over137
    @1over137 Жыл бұрын

    You are storing an int FFS. Just make the array as large as your potential maximum, alloc it at start up and move on. A single megabyte gives you 100*100*100 of 8 bit ints. We have done this in the real world, stock exchange gateway risk checker, processing 20,000 messages per second on each of up to 20 bi-directional sessions. Wire to Wire latency of less than 20 microseconds, sometimes less than 1 microsecond. (Non commodity hardware). We had a multi-dimensional array of financial instruments containing market data (pointer). We just allocated double the number of possible symbols and allocated a 15Gb Array on start up. If you want performance, use static memory allocation or face doom.

  • @1over137

    @1over137

    Жыл бұрын

    Also, remember, (while this might not work in rust).... you can allocate a flat single dimensional array of data and then create as many indexes as you want to access those pointers in whichever way is performant. Obviously this works best when you know the bounds of that data first though.

  • @user-ng8wh8to5o
    @user-ng8wh8to5o2 жыл бұрын

    It would be great if create blog with articles about optimisations

  • @giantnoah
    @giantnoah Жыл бұрын

    Instead of a hashmap you could use an array set to the maximum size you're planning to use, and then make a getter/setter that clamps your values to the bounds set in the program. Should certainly be faster to iterate over than a hashmap, and depending on the hash function could likely be less memory used.

  • @mfyoutube5003
    @mfyoutube5003 Жыл бұрын

    tantan, 2.05us is not 205000 ns, it's 2050 ns 1us = 10^-6s and 1ns = 10^-9s so 2.05us = 2.05·10^-6s = 2.05·10^3·10^-9s = 2050·10^9s = 2050ns I never comment on KZread but this triggered me sorry lol I love your videos

  • @mfyoutube5003

    @mfyoutube5003

    Жыл бұрын

    ok so this is why you finish the video before commenting friends.

  • @Jkauppa
    @Jkauppa2 жыл бұрын

    incremental iteration rocks :)

  • @VelocityTheory
    @VelocityTheory2 жыл бұрын

    2:00. Why instantiate an object to 0 on O(n), when there is calloc()?

  • @multiHappyHacker
    @multiHappyHacker2 жыл бұрын

    waiting for C++ '23 and '26 flat_map and flat_set until then we can probably use a map to map the key to an index into a vector or flat array, so you still have a flat array to iterate if you choose and constant time accessing.

  • @ronaldweasly561

    @ronaldweasly561

    2 жыл бұрын

    O(1)?

  • @multiHappyHacker

    @multiHappyHacker

    2 жыл бұрын

    @@ronaldweasly561 that sounds more like unordered_map territory