DIY 256-Core RISC-V super computer
Science and Technology
Free Assembly for 1-6 Layer PCBs at JLCPCB, 3D Printing from $0.3, Sign up to Get $60 Coupons here: jlcpcb.com/?from=bitluni (Sponsor)
This new cluster build escalated quickly, especially with the bugs I built in. Here are some specs:
256x RISC-V 48MHz
17x RISC-V 144MHz
640x GPIO
256x ADC
17x 8-Bit bus
The combined single-core clock rate would be 14.7 GHz: not that impressive, but also not too shabby.
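The 14.7 GHz figure can be reproduced by summing the per-core clocks from the spec list. This is only a sketch of that arithmetic; the variable names are illustrative, and "combined clock rate" says nothing about real throughput.

```python
# (core count, clock in MHz) pairs taken from the spec list above
core_groups = [(256, 48), (17, 144)]

# "Combined clock rate" here is simply the sum of all per-core clocks.
total_mhz = sum(count * mhz for count, mhz in core_groups)
total_ghz = total_mhz / 1000

print(f"{total_mhz} MHz = {total_ghz:.1f} GHz")
```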
0:00 Supercluster recap
0:41 Intro
1:41 PCB Design and BU
2:36 JLCPCB
3:30 Assembly
5:14 First tests
5:48 BUS protocol fix
8:28 BUS tests
9:53 Conclusion
Tools and parts (affiliate links):
Preheating Station: aliexpress.bitluni.net/heatin...
Flux: aliexpress.bitluni.net/flux
Syringe Pusher: aliexpress.bitluni.net/pusher
Low Temp Solder Paste: aliexpress.bitluni.net/lowTem...
Tweezers: aliexpress.bitluni.net/tweezers
Edge Connectors: aliexpress.bitluni.net/edgeConn
CH32V003: aliexpress.bitluni.net/ch32v003
CH32V203: aliexpress.bitluni.net/ch32v203
Scope Siglent SDS1104-E: amazon.bitluni.net/siglent4
Digital Probe Siglent SLA1016: amazon.bitluni.net/siglent16
Github Sponsors: github.com/sponsors/bitluni
Patreon: / bitluni
Channel membership: / @bitlunislab
Paypal: paypal.me/bitluni
bitluni live: / @bitlunilive
Twitch: / bitluni
Mastodon: chaos.social/@bitluni
Twitter: @bitluni
Discord: link.bitluni.net/discord
Comments: 253
Dude - use a foot-operated vacuum pen - much quicker & easier than tweezers!
@johboh
1 month ago
I would like one! Any recommendations?
@ProtonOne11
1 month ago
@@johboh I guess the Pixel Pump might be a good candidate. I have not used one myself, but the fact that it's an open project is a good thing. Of course there are cheaper and less capable options, but if you do regular board assemblies, buying a decent and a bit more expensive tool once will save you a lot of time and money over time.
I think your decision to not put everything on one big shared bus was the smart approach. Each input pin on the bus has a small amount of parasitic capacitance, which increases bus loading and requires additional drive current from the output pin driving the bus. That increases dI/dt which means more radiative EMI and crosstalk, and distorts the edges. This is less of a problem with an open drain setup, but still causes slower edge transitions and ringing. The long traces will have a lot of inductance which, left undamped, also tends to cause a lot of ringing. Longer traces also mean you're getting to the point where you're having to model them as transmission lines, since the Nyquist frequency of the design is set by the rise/fall time (not the clock!) and that's very fast on modern ICs - it's pretty common to see frequency components in the 300-800MHz range during transitions, so if you're running traces further than about 9cm you can no longer treat them as lumped lines. Once you get to this sort of scale you typically want to be using bus redrivers to break the bus up into smaller segments to avoid SI/EMI problems. If you start finding that you have SI issues once you add all the boards, two things you can do are reducing the pullup resistor value and adding a small resistor in series with each IO line. Right now with 5.1kΩ pullups you've got that classic sharkfin shaped clock, where the pullup resistor takes a while to overcome all the parasitic capacitance on the board. You can speed that rising edge up by reducing that pullup resistance - bodging a second 5.1kΩ resistor on top will do that. The falling edge is very fast because the IO pins are actively pulling the bus to ground. This causes big dI/dt spikes at the falling edge, while all that charge stored in the parasitic capacitances rushes through the low impedance path created by the active low-side FET. You can moderate that dI/dt with a small value resistor (e.g. 22Ω) in series with each of the IOs, so the bus is still strongly pulled down but the current isn't controlled only by the Rds(on) of the low-side FET in the IO. Since you've already spun the boards this might be kinda tricky to add - maybe something for a rev2/3? :)
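The effect of bodging a second 5.1kΩ pull-up in parallel can be estimated with a simple RC model. This is a rough sketch only: the 100 pF bus capacitance is a made-up placeholder (the real value depends on the boards and was not measured here), and the helper names are hypothetical.

```python
import math

def parallel(r1_ohm, r2_ohm):
    """Equivalent resistance of two resistors in parallel."""
    return r1_ohm * r2_ohm / (r1_ohm + r2_ohm)

def rise_time_10_90(r_ohm, c_farad):
    """10%-90% rise time of an RC-charged node: ln(9) * R * C."""
    return math.log(9) * r_ohm * c_farad

C_BUS = 100e-12   # ASSUMPTION: total parasitic bus capacitance, placeholder value
R_PULLUP = 5.1e3  # pull-up value quoted in the comment

r_bodged = parallel(R_PULLUP, R_PULLUP)       # second 5.1k stacked on top
t_before = rise_time_10_90(R_PULLUP, C_BUS)
t_after = rise_time_10_90(r_bodged, C_BUS)

print(f"pull-up {R_PULLUP:.0f} ohm -> rise time {t_before * 1e9:.0f} ns")
print(f"pull-up {r_bodged:.0f} ohm -> rise time {t_after * 1e9:.0f} ns")
```

Whatever the real capacitance is, halving the pull-up halves the rising-edge time in this model, at the cost of doubling the current each IO pin has to sink while holding the bus low.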
@rya3190
1 month ago
It also doesn't hurt that he left the "repetition" and modularity to the board copies. Kind of made me think of repeated code where a loop should be implemented. It is easier to maintain and rid of bugs, and it leaves the mind-numbing repetition to the manufacturing. Not to mention he can expand the cluster as needed.
@modernsolutions6631
1 month ago
It was fun meeting you in person during CCC last year. Strange to see you pop up in a comment section though.
@gsuberland
1 month ago
@@modernsolutions6631 I've never been to CCC! EMF Camp 2020, maybe?
@modernsolutions6631
6 days ago
@@gsuberland My bad. Then I must have confused you with someone else.
I love the random clock variations on the blink sketch. Fun source of entropy.
@hellsing56666
1 month ago
It's one of the nightmares of electrical designers. It's very hard to synchronize different components at high speed.
@gsuberland
1 month ago
It's also sensitive to temperature, so if you have a thermal gradient across the ICs you'll find that some drift faster than others.
@king_james_official
29 days ago
@@gsuberland heating up half of the boards sounds like a cool idea
@siz1700
24 days ago
@@king_james_official hot* idea
@king_james_official
24 days ago
@@siz1700 ha ha ha!!! (with long pauses in between)
Makes me wish I'd done electrical engineering at university. This level of dev is beyond my capability of simple analog electronics, I'm like a monkey with a spanner. Not enough time in the day now to reskill but your work is inspiring and why I'm subscribed.
@thek3743
1 month ago
Want an easy start? Watch Ben Eater videos! Start with the breadboard series, then the 6502!
@curtheisler1200
28 days ago
@@thek3743 Ben Eater is the GOAT. 100% great series. His 12(?) part networking series is also great.
@theRPGmaster
21 days ago
Same here. I'm a software developer, so I don't have much time, but I've always been interested in electrical.
So awesome. IMO Fiasco would be a cool code name for a project or chip.
@ted_van_loon
1 month ago
fiasco 256, that way there can also be a fiasco 10000
@jercos
1 month ago
The L4Re Microkernel is named Fiasco.
At 10 free pins per 48 MHz CPU, you could connect 23,040 LEDs (or 6 million if the pins are combined). Enough to make a small terminal screen... or play Bad Apple. With each chip handling 90 LEDs at 48 MHz, this thing would push pixels like a monster. Just need the timing to be perfect...
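The LED counts in this comment look like charlieplexing arithmetic, where n tri-state pins can address n × (n − 1) LEDs. That reading is an assumption, as is counting only the 256 smaller cores; the sketch below just works the numbers under those assumptions.

```python
def charlieplex_leds(pins):
    """Maximum LEDs addressable by charlieplexing n tri-state pins: n * (n - 1)."""
    return pins * (pins - 1)

FREE_PINS_PER_CHIP = 10  # figure from the comment
CHIPS = 256              # ASSUMPTION: only the 48 MHz cores drive LEDs

leds_per_chip = charlieplex_leds(FREE_PINS_PER_CHIP)
leds_separate = CHIPS * leds_per_chip                          # each chip drives its own matrix
leds_combined = charlieplex_leds(CHIPS * FREE_PINS_PER_CHIP)   # all pins in one giant matrix

print(leds_per_chip, leds_separate, leds_combined)
```

The combined case is purely theoretical: one matrix across 2,560 pins would need all chips to coordinate every pin state, and each LED would be lit for only a tiny fraction of the scan.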
Not sure if you've done this already, but it might make sense for you to have a separate "subnet" for each blade and then only send transmitted data on the inter-blade bus if the destination is outside of that subnet.
@uis246
1 month ago
Dude, design GPU already
@monad_tcp
28 days ago
oh my god, you're reinventing the ethernet
@tophyr
27 days ago
@@monad_tcp that sort of sub-networked interconnect is common in CPU design as well
@cabbose2552
24 days ago
@@tophyr mfw everything is just ethernet
Watching your pick and place makes me want to both go into electronics and stay the heck away from it.
@kefsound
1 month ago
"your"??
Ok, but can it run Crisis?
@pazsion
14 days ago
in theory, yes... i guess we will see. first doom, then quake, then crisys then half life 🤓
@pazsion
14 days ago
doom first quake halflife crisys. it should run it. but without any gpu it may be animated gif gameplay...
@sharma_harsh
7 days ago
Crysis*
I am a software person, and 15 years ago my electronics partner and I built cards that each had three microchip processors communicating with each other on the card over fast serial links on pulled-up lines. These cards communicated with other similar cards over ranges of 10 km on a pair of wires that also carried the power, for agricultural use in the field.
Can it run Doom?
@Meskalin_
20 days ago
sounds like a reasonable end goal
Ever heard of the "transputer", a 1980s commercial computer made of a collection of thousands of tiny weak processors working in parallel for advanced scientific tasks? Your cluster reminds me of it. The Retrobytes channel made a video on it several months ago.
@timsoft3
1 month ago
We did basic programming on them in the '90s. They were used for FFT audio processing.
@destiny_02
1 month ago
that sounds pretty much like a gpu with its shader units
@cryptocsguy9282
1 month ago
Yep, I was reading an article on the chips & cheese blog the other day about a Qualcomm mobile GPU & that's what I was thinking @destiny_02
@laurensweyn
1 month ago
Reminds me of TIS-100
@king_james_official
29 days ago
that's how a modern video card works!!! they have thousands of units (they're called differently among gpu manufacturers) that run in parallel executing small programs called shaders, which (oversimplifying now) all determine the color of EVERY pixel on your screen tens of times a second
First time I've seen tape and tray of parts being used, kudos. I did inkdot for a year because I loved the simplicity and focus it required. They moved me to pin refurbishing when they found out I could do it easily.
wow, this is incredible to see the idea from start. You're awesome!
As always, an amazing project. The funky music for hand SMD assembly *almost* made it look enjoyable 😂
Wow just discovered. Awesome. Can't wait for the next!!
Many kudos for attempting such a "mega-project". No pain no gain...
Amazing work, what a project! 😮👍
You are very close to the original Ethernet CSMA/CD protocol. The XOR checksum has the problem that two colliders can cancel each other - two single bit errors could result in a correct checksum - making a packet "appear" good. As such Ethernet uses a CRC. Further, if you detect a collision you "jam" the whole packet with alternating ones and zeros to really mess it up and then do your randomized backoff. What you will find, and you are not the first, is that as you scale the collisions will increase and the bandwidth will be insufficient. The cores will be data starved. This was the case with the Intel MICs (Knights Corner). They used PCI-E but the issue is the same: multidrop and star topologies oversubscribe easily. You will note datacenters (home of enormous clusters) use leaf-spine (and other) interconnects to mitigate this. But fun nonetheless. So you have a huge number of cores - what will you do with it? What would others in the comments run?
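The XOR weakness described here is easy to demonstrate. The sketch below is not the protocol from the video; the packet bytes are arbitrary, and Python's `binascii.crc32` (which uses the same CRC-32 polynomial as Ethernet's frame check) stands in for "a CRC".

```python
import binascii
from functools import reduce

def xor_checksum(data: bytes) -> int:
    """XOR of all bytes: the kind of checksum the comment warns about."""
    return reduce(lambda acc, byte: acc ^ byte, data, 0)

packet = bytes([0x12, 0x34, 0x56, 0x78])  # arbitrary example payload

# Flip the same bit position in two different bytes (two single-bit errors).
corrupted = bytearray(packet)
corrupted[0] ^= 0x08
corrupted[2] ^= 0x08
corrupted = bytes(corrupted)

# The two flips cancel in the XOR sum, so the damaged packet looks valid.
xor_fooled = xor_checksum(packet) == xor_checksum(corrupted)

# A CRC-32 detects any error burst of 32 bits or fewer, so it catches this one.
crc_fooled = binascii.crc32(packet) == binascii.crc32(corrupted)

print(f"XOR checksum fooled: {xor_fooled}, CRC-32 fooled: {crc_fooled}")
```

On a small microcontroller a CRC-8 or CRC-16 would be the more realistic choice; CRC-32 is used here only because it ships with the standard library.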
Your message collision scheme is remarkably similar to how CAN works. It seems you've independently discovered an excellent system. Very impressive.
@masterofx32
1 month ago
Kind of, but CAN has a priority system and allows the message of the highest priority transmitter to go through. This is especially important in automotive applications.
@curtheisler1200
28 days ago
It's CSMA-CD. Used most commonly in 802.3 (commonly ethernet) communications.
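The difference between CAN-style arbitration and CSMA/CD mentioned in this thread can be sketched on a simulated open-drain (wired-AND) bus. This is a simplified model of the CAN idea, not the protocol used in the video; the function name and the example IDs are made up.

```python
def arbitrate(ids, id_bits=11):
    """Return the ID that wins bitwise arbitration on a wired-AND bus.

    Every contender sends its ID MSB-first. A 0 (dominant) overrides a 1
    (recessive), so a node that sends 1 but reads back 0 stops transmitting.
    """
    contenders = set(ids)
    for bit in range(id_bits - 1, -1, -1):
        sent = {node: (node >> bit) & 1 for node in contenders}
        bus_level = min(sent.values())  # any node driving 0 pulls the bus low
        contenders = {node for node in contenders if sent[node] == bus_level}
    # With unique IDs exactly one contender is left: the numerically lowest.
    return contenders.pop()

winner = arbitrate([0x65, 0x23, 0x41])
print(hex(winner))
```

Unlike CSMA/CD, nothing is destroyed here: the winning frame continues undisturbed and the losers retry later, which is why the lowest ID effectively has the highest priority.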
This is the coolest thing I've seen in a while!
Cool project, thanks for sharing.
I love it when blink goes out of sync... it looks like one of Big Clive's "supercomputers" except it really is a supercomputer!
@MichaelKingsfordGray
29 days ago
It reminds me of the Lost in Space equipment in the early 1960s! Great danger.
that's going to be fun to program
CSMA/CD reinvented :)
You're going to run Game of Life on that thing, aren't you?
@rya3190
1 month ago
That or Bad Apple.
@nathanadhitya
1 month ago
That or we get rickrolled.
"...actually 273 but okay" is the best subtitle for a video in the history of the platform.
this is nuts! , i love it!
Awesome work 😮
Amazing video! you are teaching a lot of stuff with this.
Waiting every day for a year for a new one was worth it 🥹🥹
This is so awesome!
I don’t know what it is, but I love it! More!
I would use an active pullup (constant current source) on the bus with so many devices on the bus. It could be a current mirror with two P MOSFET transistors (e.g. BSS84). With 5 mA current, it would probably speed up the communications a lot.
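A rough comparison shows why a constant-current pull-up would be faster than a resistor. The numbers are placeholders: 100 pF of bus capacitance and a 3.3 V supply are assumptions, the current source is treated as ideal (a real current mirror loses headroom near the rail), and 5.1kΩ is the pull-up value mentioned elsewhere in the comments.

```python
import math

C_BUS = 100e-12  # ASSUMPTION: total bus capacitance, placeholder value
VDD = 3.3        # ASSUMPTION: supply voltage

def time_to_fraction_resistor(r_ohm, c_farad, fraction):
    """Time for an RC pull-up to charge the bus to `fraction` of VDD."""
    return -r_ohm * c_farad * math.log(1 - fraction)

def time_to_fraction_current(i_amp, c_farad, vdd, fraction):
    """Time for an ideal current source to ramp the bus to `fraction` of VDD."""
    return c_farad * fraction * vdd / i_amp

t_resistor = time_to_fraction_resistor(5.1e3, C_BUS, 0.9)
t_current = time_to_fraction_current(5e-3, C_BUS, VDD, 0.9)

print(f"5.1k pull-up:  {t_resistor * 1e9:.0f} ns to reach 90% of VDD")
print(f"5 mA source:   {t_current * 1e9:.1f} ns to reach 90% of VDD")
```

The resistor's charging current falls off as the bus voltage rises, while the current source keeps pushing the full 5 mA, which gives a linear ramp instead of the sharkfin edge.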
8:36 Holy rise time Batman! The Signal Integrity engineer just started breaking out in a cold sweat
Need more!
That is a lot of CPU power for some random blinking LEDs :)
@Ral2O3_
9 days ago
That could have been achieved with much less resources and effort indeed
man I wish I had your skillset.
That's HUGE!!
just discovered your channel and this is super cool! what was your career path that got you into electronics? thanks!
I have to admit, watching them go out of sync is beautiful. Am I weird to like that more than synchronized blinking?
I know people who would go over the edge for your random parts placement :) "All values of similar resistors have to face the same directions"... LOL. Nice one. Wish I had more time to join the livestreams again ...
@valet_noir
1 month ago
how is your comment 2h old ? the video was uploaded 5min ago 🤔
@peter.stimpel
1 month ago
@@valet_noir Patrons get early access, even if this means only 2 hours in bitluni terms. Other YouTubers are a bit more generous here ;)
@edgeeffect
1 month ago
All values of all components must face the same direction!!! ;)
Heck of a great project! And custom CSMA!
This project reminded me of the Game of Life automaton, the KISS principle, and the CD part of CSMA/CD.
If you send a considerable amount of broadcast it makes sense to have a bit after the source address which is only set if it's a broadcast. So you can skip the target address completely. This only makes sense if you send a lot of broadcast messages, as every unicast message is then 1 bit longer
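The break-even point for this broadcast-flag idea can be worked out directly. The sketch assumes 8-bit source and target addresses, which is a guess rather than the video's actual frame format, and the function names are hypothetical.

```python
ADDR_BITS = 8  # ASSUMPTION: width of source and target addresses

def header_bits_plain():
    """Header without a flag: source address + target address, always."""
    return 2 * ADDR_BITS

def header_bits_flagged(is_broadcast):
    """Header with a broadcast flag: the target is omitted for broadcasts."""
    return ADDR_BITS + 1 + (0 if is_broadcast else ADDR_BITS)

def average_saving(broadcast_fraction):
    """Mean header bits saved per message by the flagged format."""
    flagged = (broadcast_fraction * header_bits_flagged(True)
               + (1 - broadcast_fraction) * header_bits_flagged(False))
    return header_bits_plain() - flagged

for fraction in (0.0, 0.125, 0.5):
    print(f"{fraction:.3f} broadcast -> {average_saving(fraction):+.2f} bits/message")
```

With these widths each broadcast saves 7 bits and each unicast costs 1, so the flag pays off only once more than one message in eight is a broadcast.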
Great stuff
I just discovered your channel and immediately subscribed. This project mesmerizes me. Keep on!
A true work of art! My hat's off to you! 🍻
Wow that's insane
You may be nuts, but that's much of the fun of watching. This project is a delightful sprawl, full of potential and hurdles. What do you want it to become, beyond the LED art? I mean, is there a target functionality or is the journey the goal? Well, I guess we'll find out.
Really nice project. What are you using for the top view shots? 4:39
Sounds insane in performance, but it would actually be roughly equal to 4 cores running at just over 3 GHz, due to the low clock speed. That said, it does show this is possible, and if this works it will also work with much faster RISC-V chips. In some ARM architectures the cores were actually designed to be used like this so you could just keep scaling them; there was a 1000-core ARM CPU somewhere around 2013 or so. Sadly it never took off, since back then multithreading didn't really exist in practice: basically no software used it, and handling large amounts of data at once wasn't a thing yet. That said, RISC-V is open source, so it should be possible to make a RISC-V CPU that directly combines tons of cores. If you plan to make something like that, I have a better approach for you to try than a single bus (or a few buses), since buses like that can work but can have problems. Some years ago I roughly designed a new experimental method for such multichip communication for the Raspberry Pi Foundation, to try to get them to make a board with many more cores. Essentially, it gives quite a bit of bandwidth along with large buffering, and chips can fetch the data when they are ready instead of having to accept it immediately. That said, in some cases direct buses might be more useful; luckily, in a full CPU design you can add many more buses. Both have advantages and weaknesses depending on the load.
after 2 minutes you already deserve a like!
You may want to decrease the resistance on your clock line. A slow rise time can cause one of the processors to miss a clock and become out of sync with the host.
crazy! in a good way!
You could use the now free command pin to sync all the clocks together
GAME OF LIFE on this would be insane
The blink looked like game of life 😂
Game of Life; Each Cell (group) has a finite time to check for- Food, Friend or Foe in adjacent blocks. Movement is turn based. Food, a limited, randomly placed resource, extends (life) up to 10 turns. Finding a Friend, adds a chance of 1-2 new Cells each turn. Each Foe can remove 1 adjacent Cell (not of its group) per turn, adding a chance for their group to grow next turn. *time is finite* for all Cells. Friend, Food or Foe. Meaning- the simulation ends, and you get to see a nice pattern of what groups fizzled, which ones flourished.
I think this is an amazing achievement. I would love to see you demonstrate its speed with some SHA-1 cracking, or comparison testing against a Raspberry Pi 5 and a mid-range PC over a long duration (24 hr minimum), to see how far 17 GHz can go in a day.
Super... So what is it for? What can you implement on it?
Could you also add a small FPGA to run or manage the cluster?
So you are the guy that created the brain of skynet! I knew it!
Did you just create a pretty good random number generator with those blinking leds? Looks much cooler than those lava lamps
Nice!
would be really cool to see this do something like a phone or computer software benchmark... with the lights it would be very satisfying... knowing the computer is actually computing... do the same for the hd/ssd and gpu 🤓🤓🤓 actual functional led display / rgb lighting
Hi, Is there a video on the tool chain for this uC ? Cheers !
he's gone mad!
this is really incredible. i'd love to see a collab between you and @beneater !!! Really great work.
Pretty project, thanks for sharing
3:30 LumenPnP when? My hand and eyes hurt just watching all that placement! ( I probably just have a low tolerance though lol)
how is software development with the ch32 ic? i have been thinking of trying them out, but the "sdk" (or examples) looked really scary...
let's game on it! :D
An absolutely awesome project with great prospects; the only limitation is the user's imagination. However, I have 2 questions: 1. Can it run Doom? 2. In what language was the code written? Was it C/C++?
Very interesting and good job, but what's the next step?
Your collision detection is very similar if not the same as CAN Bus collision detection (I need to check but I believe it’s at least close)
if the collision detection is waiting a random time using the ID as the seed so they're always different, why not just use the ID as the amount of wait time directly?
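One way to look at this question is to compare the two schemes on paper. The sketch below is a simplified slot model, not the video's firmware: the 256-node count matches the small cores, while the 16-slot random window is a hypothetical value picked for illustration.

```python
from fractions import Fraction

NODE_COUNT = 256  # number of small cores in the cluster
WINDOW = 16       # HYPOTHETICAL size of the random backoff window, in slots

def id_delay_slots(node_id):
    """Slots a node waits if its ID is used directly as the delay."""
    return node_id

def repeat_collision_probability(window_slots):
    """Chance that two nodes picking uniformly from the window pick the same slot."""
    return Fraction(1, window_slots)

worst_id_wait = id_delay_slots(NODE_COUNT - 1)
mean_random_wait = Fraction(WINDOW - 1, 2)  # mean of a uniform pick from 0..WINDOW-1
retry_risk = repeat_collision_probability(WINDOW)

print(f"ID as delay: never re-collides, but node 255 waits {worst_id_wait} slots")
print(f"Random backoff: waits {float(mean_random_wait)} slots on average, "
      f"re-collides with probability {retry_risk}")
```

Under this model, using the ID directly guarantees the colliding nodes never pick the same slot, but it turns the ID into a fixed priority and makes high-numbered nodes wait a long time even when only two nodes collided.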
5:18 > We can hear you laughing. I like your enthusiasm
I would mine so many moneroj with this
What might be some of the use cases for the Megacluster?
Despite missing half of the explanation because I have no idea what the terms used mean, I nevertheless found everything fascinating. For me, it's like our modern-day version of an art painting. Can you tell me which kind of university degree, knowledge, or skills are necessary for such a project? And good job! 👍
😅 This is the kind of project I really like watching, but I have a question: what can it really do besides some basic stuff? Anything like computing with a lot of cores (that may be too hard 😮)? In my opinion, this is an interesting project and I love it. Thank you for making the video, hope you have a great day 🎉🎉🎉
@sc0or
23 days ago
Exactly. I think that for blinking an LED, some FPGA will beat 10x RISC-V on number of I/Os, speed, and price. It would be interesting to have some idea of what this is for, how much it costs per flop, and what the alternatives are in terms of price, performance, and so on. It's like building a cluster from Raspberry Pis when you could take an i7, save money, and get much better performance.
Would you be able to upload those first streams in which you made the cluster and the protocol? They're not on Twitch or YouTube...
I have been planning to do this with sg2002s.
I've got two questions: 1) What could this be used for? 2) For the waiting time after a collision, couldn't you use the ID itself as a delay? Or maybe force them to report in order, maybe using a master or calling the next one in line?
Nuts!
very gud 👍
What an epic project! Subscribed. Probably a dumb question, but could the collision detection be replaced by a queueing system where an MCU can request the bus and then get serviced FIFO? Maybe that would be slower. Would be cool to see a map-reduce algorithm running on this beast.
@jercos
1 month ago
That becomes a latency/bandwidth tradeoff... If you can request large chunks of dedicated time, you can shift bytes out at full speed, while both collision detection and turnaround (the system settling after currents potentially change direction on the backplane, also peak EMI) inherently slow down the timing required. Many systems use a fast clock with added guard intervals: clock cycles where nobody drives the bus.
how many GBit/s is your outside communication? Do you use QSFP??
I would love to see a super cluster with the cluster and a 2040 I/O chip?
And then add some what are currently at the moment quite inexpensive RAM and Storage for BIOS?
now can this thing actually process data like a cluster?
What’s the music playing during assembly at 4:13?
You are integrating systems for the s100 bus.
I wonder how the Green Arrays chips handle communicating between CPU's.
How are the timers and clock speed? Can it run 60 Hz HDMI or VGA?
0:50 imagine the reveal of an ai being sentient by the blinking lights getting faster and faster and then it stops, and just starts showing text?
Could u check the flux link? it also refers to the syringe page! thx!!
Nice project, but I have one question. I want to try this CH32V003 chip, but can I use another SWD debugger or does it need to be the e-link debugger?
@0LoneTech
1 month ago
SWD is a 2 wire ARM variant of JTAG (normally 4 or 5 wires), unlikely to appear on non-ARM chips. CH32V003 uses a different 1 wire debug interface they call SWD or SDI.
Those timelapses are a true Luni Pick & Place