
Gaming on NVIDIA Tesla GPUs - Part 2 - NVIDIA Pascal

Thanks to Maximum Settings Cloud Gaming for sponsoring today's video. Get started with your very own Cloud Gaming VM or Bare Metal Machine at bit.ly/3wxfUuB!
Grab yourself a Pint Glass at craftcomputing...
My review of every Enterprise GPU in my collection continues, this time with NVIDIA's Pascal series. Today, we're going to take a look at the gaming performance of the Tesla flagships, the Tesla P40 and Tesla P100, as well as the low-powered Tesla P4.
But first... What am I drinking???
Iron Horse Brewery (Ellensburg, WA) Cookie Death Smooth Dark Ale (7.8%)
-- vGPU Installation --
Proxmox vGPU Installation Script: github.com/wvt...
PolloLoco's vGPU Installation Guide: gitlab.com/pol...
Manual vGPU Install Tutorial: • Proxmox GPU Virtualiza...
-- vGPU Memory and Frame Limiter Unlocks --
(Under 'Define GPU Profiles')
drive.google.c...
Links to items below may be affiliate links for which I may be compensated
-- GPUs --
Nvidia Tesla P40: ebay.us/vhT8pZ
Nvidia Tesla P100: ebay.us/lAVQK6
Nvidia Tesla P4 SFF: ebay.us/LXlTeJ
AliExpress: s.click.aliexp...
-- GPU Server --
Asus ESC4000 G3 GPU Server: ebay.us/ngCIDO or ebay.us/hAThwr
Asus ESC4000 G3 Sliding Rails: www.neobits.co...
Intel Xeon E5-2697 v4 18-Core: ebay.us/6nlzsD
8x32GB DDR4-2400 REG-ECC Memory: ebay.us/4DNJ00
1.92TB Patriot Burst Elite SATA SSD: amzn.to/4b8Q9jc
Follow me on Mastodon @Craftcomputing@hostux.social
Support me on Patreon and get access to my exclusive Discord server. Chat with myself and the other hosts on Talking Heads all week long.
/ craftcomputing
-- Timestamps --
0:00 - Intro + Sponsor
2:21 - Server and GPU Specs
7:15 - 3DMark Testing
10:00 - Games Testing
16:17 - Results
20:00 - Why No AI Tests?
22:32 - Beer Review

Comments: 214

  • @zeroforkgiven (a month ago)

    Price at the launch of this video for the Tesla P4 is ~$105 on eBay. Very curious what it will be tomorrow.

  • @CraftComputing (a month ago)

    Do I post a pre-emptive sorry, or wait until they're $250?

  • @logan_kes (a month ago)

    @CraftComputing Something has to have driven up P4 prices in the last month or so. In December last year I picked up a pair of P4s for $85 each (trending $75-$80 from China or $80-$90 from US sellers), and now they've shot up to close to $110 for US sellers and $95 for China sellers. I'd imagine your video will bump these up even more 😅 Glad I just got 2x Tesla P40s last week before those go up too 😂 Keep up the great content. As you get more and more popular, you'll start to mess with the used enterprise equipment market more and more, to the point you'll need to put a disclaimer in your videos stating *pre-video pricing* lol

  • @zeroforkgiven (a month ago)

    @CraftComputing LOL, it's the Craft Effect. I don't mind, as I already own 2 of them (the best Plex hardware card IMO), and the prices will fall back down in a few weeks.

  • @garthkey (a month ago)

    Yeah, after he posts videos they spike. I just bought the ASUS gaming server from a couple videos ago. Original price was $175. Then it spiked to $250.

  • @JPDuffy (a month ago)

    I bought one for $50 in February. It's excellent, but I don't think it's worth $100+ considering the extra work to set up. I have it running in a 4790K Dell and it plays 1080p games at max settings at 60+ FPS without breaking a sweat.

  • @BrinkGG (a month ago)

    I've been waiting for this one! Was holding out on buying a P40 or P100 until this came out. Thanks Jeff. :D

  • @ProjectPhysX (a month ago)

    Main difference between P100 and P40 is not the VRAM. The P100 has 1:2 FP64:FP32 ratio, for the P40 (and all other Pascal GPUs) it's 1:32, basically incapable of FP64. P100 is much better for certain computational physics workloads that need the extra precision, like molecular dynamics or orbital mechanics.
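
The ratios in the comment above translate directly into theoretical throughput. As a rough sketch (the FP32 TFLOPS figures are approximations of NVIDIA's published numbers for the PCIe variants, quoted from memory, so treat them as assumptions):

```python
# Derive rough theoretical FP64/FP16 throughput from an FP32 figure and
# the per-architecture ratios: GP100 is 1:2 FP64 and 2:1 FP16, while
# GP102 (P40) is 1:32 FP64 and a token 1:64 FP16 rate.

def throughput(fp32_tflops, fp64_ratio, fp16_ratio):
    """Map an FP32 TFLOPS figure to estimated FP64/FP16 TFLOPS."""
    return {
        "fp32": fp32_tflops,
        "fp64": fp32_tflops * fp64_ratio,
        "fp16": fp32_tflops * fp16_ratio,
    }

p100 = throughput(9.3, 1 / 2, 2)        # Tesla P100 PCIe, approximate
p40 = throughput(11.8, 1 / 32, 1 / 64)  # Tesla P40, approximate

print(f"P100 FP64: {p100['fp64']:.2f} TFLOPS, FP16: {p100['fp16']:.1f} TFLOPS")
print(f"P40  FP64: {p40['fp64']:.2f} TFLOPS, FP16: {p40['fp16']:.2f} TFLOPS")
```

The point of the arithmetic: even with a lower FP32 figure, the P100's FP64 rate is an order of magnitude ahead of the P40's, which is why it exists as a separate die.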

  • @OliverKr4ft (a month ago)

    All games use FP32 though, so the additional FP64 FPUs on the P100 make no difference

  • @sidichochase (a month ago)

    @OliverKr4ft For gaming, yes. But for people who want a nice cheap GPGPU, the P100 is the better choice.

  • @TheLibertyfarmer (a month ago)

    The P100 can do a 2:1 FP16:FP32 ratio too, which makes it much faster at half-precision training than the P40, as well as having NVLink support, and thus more efficient for training in general.

  • @gg-gn3re (a month ago)

    @OliverKr4ft Yeah, gaming isn't "molecular dynamics or orbital mechanics", if you didn't know.

  • @MiG82au (a month ago)

    @OliverKr4ft The claim in the video is that the unique chip is for HBM2, which is arguably wrong, and at best only half the story, because the biggest difference is the huge count of FP64 execution engines. Whether games use FP64 or not is irrelevant to why the GP100 chip exists.

  • @Americancosworth (a month ago)

    Hurray! More good ideas for my poor life decisions (building a cloud gaming server)

  • @edgecrush3r (a month ago)

    I've been running the P4 for almost a year, 24x7, and absolutely love this card. I have so many projects running on this thing now that my NAS doesn't classify as a NAS anymore 😅 it's more of a vGPU emulation server, enjoying Mario Kart on many connected devices with the whole family 😂 It's just so dang cheap now, it's impossible to beat, and great for inferencing LLMs (the P100 would be better due to faster memory). I'm now hoping the T4 will drop in price.

  • @samthedev32 (a month ago)

    I have been waiting for this video for so long! I was planning to get a P4, and now I want it even more :)

  • @SpoonHurler (a month ago)

    I agree with you on benchmarking LLMs and AI (or advanced logic generation). Many benchmarks will also be irrelevant in a year (my opinion, not a fact). I wouldn't waste time producing possibly bad results in such a chaotic environment unless I was very well equipped to do so. I do think a video of playing around with / learning LLMs could be interesting though... with no comparative numbers, just a journey episode.

  • @CraftComputing (a month ago)

    Yeah, I did a couple videos on Stable Diffusion last year, where I explored running it in my homelab.

  • @tylereyman5290 (a month ago)

    I somehow managed to snag a P100 last month for $40 off eBay. That may have been the greatest deal I have ever scored.

  • @dectoasd3644 (a month ago)

    1 minute in and I'm already excited with my 2 x P40

  • @Satori-Automotive (a month ago)

    How does a single one perform in rendering and editing compared to something like a 1080 Ti?

  • @DerrangedGadgeteer (a month ago)

    I'm so glad you ran these benchmarks! I'm elbows-deep building a multipurpose virtualization/AI server out of 2nd-gen Threadripper and P100s. It's good to know what to expect, and also that my expectations weren't way off base when I started.

  • @davidfarnham3548 (a month ago)

    Really curious to see how a T4 performs vs. a P4.

  • @KiraSlith (a month ago)

    Ehhh... As I understand it, the P4 has two working NVENC engines on-die, but the T4 is a custom compute-targeted die from the word go. You'll get more for your money from the P4 if you're using it for virtualization/transcode, especially since the P4 is still staying sub-$130, where the T4 is hovering around $600 at the moment. If, however, you're looking for FP16 compute specifically (like for AI tasks), the T4 is fast enough that it competes with a 3090 while staying at 75W. It's a spectacular monster within that specific arena only; its FP32 is pretty miserable for its price, however, which is what games make the most use of.

  • @Prophes0r (a month ago)

    I know it's comparing completely different families, but I'm interested in comparing the P4 to an A380 in straight passthrough. The A380 can be had new for $120-ish for the half-height cards. That's in the same ballpark. I know we will never get SR-IOV for the Arc cards (except maybe the A770 with major hacks), but I do think it has the possibility of being really interesting. Plus, there are other comparisons to be made. The Nvidia cards likely have a gaming performance advantage, but QuickSync is SO much better than NVENC that it might make a big difference when it comes to encoding those remote desktop video streams.

  • @DMS3TV (a month ago)

    My big takeaway from this is just how impressive integrated graphics are now. These dedicated GPUs were once the bee's knees, and now a 7840U can outpace them in games. Really cool times we live in!

  • @DustinShort (a month ago)

    I was really surprised by the P4. I may have to try it as an energy-efficient VDI solution. At two VMs per GPU it should be more than powerful enough for light CAD work, but I bet you could squeeze in 4 VMs if you aren't working with large assemblies.

  • @anthonyguerrero4612 (a month ago)

    Wow, I wasn't expecting this, thank you for further experimenting. 😊

  • @criostasis (14 days ago)

    I designed and developed a RAG-based LLM chatbot for my university using GPT4All, LangChain, and TorchServe. Testing with my 16GB RTX 5000 laptop, it performed on par with just my 13900K, producing answers with memory and chat context in about 40-50 seconds. On my server with an RTX 4080 it was blazing fast; answers came in about 5-10 seconds. I'm sure a 4090 would be a bit faster, but I didn't have one to test. Concurrency on a single GPU is where you can really hit a bottleneck. You have to set up a queue and locks to handle it, but with one GPU it gets slow. That's why OpenAI and others have thousands and thousands of GPUs to handle the concurrent workloads. That, and some magic code sauce I didn't get around to implementing in my time working on it before handing it off.
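
The "queue and locks" pattern described above can be sketched in a few lines: all requests funnel through a single worker thread, so the one GPU is never asked to run two generations at once. The `fake_generate` call here is a hypothetical stand-in for a real GPT4All/TorchServe invocation.

```python
# Minimal single-GPU request serialization: one consumer thread drains a
# shared queue, so concurrent callers are served strictly one at a time.
import queue
import threading

request_q: queue.Queue = queue.Queue()

def fake_generate(prompt: str) -> str:
    # Placeholder for the real, GPU-bound LLM call.
    return f"answer to: {prompt}"

def gpu_worker() -> None:
    # Single consumer: only one generation ever runs at once.
    while True:
        prompt, reply_q = request_q.get()
        reply_q.put(fake_generate(prompt))
        request_q.task_done()

threading.Thread(target=gpu_worker, daemon=True).start()

def ask(prompt: str) -> str:
    # Each caller gets its own reply queue and blocks until served.
    reply_q: queue.Queue = queue.Queue()
    request_q.put((prompt, reply_q))
    return reply_q.get()

print(ask("why is the sky blue?"))
```

With many concurrent callers this is exactly where the latency the comment describes comes from: requests are correct but strictly sequential, which is why production services scale out across many GPUs instead.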

  • @m4nc1n1 (a month ago)

    I still have my 1080TI on a shelf. Used it for years!

  • @pkt1213 (a month ago)

    I just put a P4 in my server. 4 transcodes were using ~25W. I may pick up a P40 or P100 if I want to run AI locally.

  • @pkt1213 (a month ago)

    I also took the front plate off the heatsink and zip-tied a 40mm Noctua fan over the die. Haven't seen it much over 50C.

  • @thedeester100 (a month ago)

    Been using a Quadro P4000 GPU for over a year. It was half the price of a 1080 Ti on eBay at the time. I don't game a great deal anymore, but it's never failed at anything I've thrown at it.

  • @clintsuperhero (10 days ago)

    I had seen Titan Xp's going cheap ($180), so I bought one out of childhood dreams. The performance I've gotten from it has been great over the 1080 I used to have: less stuttering in some games and overall better for 1440p, like my two main screens are.

  • @ShooterQ (a month ago)

    Just threw an unused Tesla P4 8GB into my Frigate NVR for video decoding. The thing shoots all the way up to 105C and crashes. Added an 80mm Arctic P8 with some custom ducting and have it working along at a constant 71C now. Doing great for the $100 price tag. It's in a Dell Optiplex 3080 SFF, so it's the biggest card I could fit, and it works well with the available power from that slim PSU.

  • @Seventeen76 (a month ago)

    Jeff, is it possible to run dual 30-series Nvidia cards for Stable Diffusion machine learning? I am currently using a 3060 12GB, and I am waiting on a 3080 10GB to come in the mail. Is it possible to run them at the same time? (I know you can't combine them and run them as one.) Is there somehow a way to make them both work for my desired use? Or is it just better to run it with the 3080 and leave the 3060 for something else? Edit: I have a 5950X, and I'm using an X570 Crosshair VIII Extreme motherboard, 64GB of G.Skill Trident 3600 MT/s, with a Seasonic Platinum 1000-watt power supply, in a Cooler Master Cosmos C700M case.

  • @KomradeMikhail (a month ago)

    I run into significant app crashes and issues when using an HBM2 graphics card through PCIe passthrough to a VM. Most noticeably with KiCad and Deep Rock Galactic. They run fine on the same hardware bare-metal. First encountered with a Radeon VII, then tested a Titan V to compare. Same results for team red and team green. Tested on a Broadwell Xeon workstation, slimmed down from what Jeff runs in this video. Anybody else have issues passing through HBM2?

  • @CraftComputing (a month ago)

    I've had no issues at all. I've done testing on the P100 and V100, and haven't had any problems.

  • @OliverKr4ft (a month ago)

    Have you stress tested the cards on bare metal? The memory type should not have any effect on stability when passed through

  • @buddybleeyes (a month ago)

    Let's goo! Love this cloud gaming series 😄

  • @novantha1 (a month ago)

    With regards to AI tests: it might be an unexpectedly sagacious decision to avoid jumping into it at the moment. We're at what is simultaneously a crossroads and a wild west, and I can only see it getting more crazy. In the simplest possible terms: raw FP16 compute sort of doesn't lie. Given a sufficient quantity of it (and memory bandwidth to feed it), it's pretty straightforward to multiply two matrices. But there's a problem: TOPS. Dedicated TOPS don't operate on the same principle as FP16 compute (and I'm giving companies' marketing divisions the credit of assuming they're talking about tensor operations when they talk about TOPS, which is not always true), so it can be hard to draw an equivalence between, for instance, the FP16 compute of a Pascal card and the tensor performance (which is often the majority of the AI performance) of a modern Nvidia GPU... To say nothing of extended instruction sets in the x86, ARM, or RISC-V space (I would love to start a YouTube channel talking about those at some point; a lot of people misunderstand CPU AI performance, including Ampere, and now Intel's Sierra Forest marketing department). And then it gets even harder. Do you compare the memory access patterns or the raw performance? If you do the raw performance, a Pascal GPU might hold up surprisingly well, because in the end, FP16 and memory bandwidth will get you most of the way there. On the other hand, something like a CPU with VNNI extensions (Zen 4, and I think Intel's server P-cores, but not consumer) might actually perform more efficiently for its memory bandwidth, in the sense that it can do lower-precision AVX compute at a faster rate per unit of bandwidth thanks to fused instructions, but it might have a slower absolute rate of operation. Which one is better? Well, it depends on your use case. Plus, all of this is ignoring more exotic things like Tenstorrent's lineup (very sexy), or things like Hailo M.2 accelerators (very accessible).
    So when you add it all together... At what precision do you evaluate? Some accelerators (notably CPUs, NPUs, and accelerators) will perform at an outsized rate at lower precision, particularly integer operations like INT8. Common high-performance AI models are not trained with those precisions in mind, so there is an accuracy loss at those precisions (and some of those losses only show up experientially, and not on standard benchmarks). Is it fair to compare an accelerator with block floating point 8 to the full FP16 of another accelerator? How much customization is allowed to the pipeline? Is it fair to compare image generation on Nvidia and AMD using the Automatic1111 webUI, when AMD is a second-class citizen there? Do you compare Automatic1111 Nvidia to nodeshark AMD? How do you compare an accelerator with more RAM at a slower speed to one that has a fast speed but little RAM? Some people favor accuracy/quality, while some people favor responsiveness, and some people have crazy workflows that depend on huge numbers of generations from models whose quality almost doesn't matter. In this case, the one accelerator would just be better because it can run the higher-quality model, but that might not be what everyone wants. Is the evaluation on training or inference? If training, with which framework? Are tensor cores used? Do you use "pure" primitives like ResNet, or off-the-shelf production-grade models and pipelines? Do you measure at large batch sizes indicative of peak performance, similar to how we do CPU evaluations in gaming benchmarks, or do we test single-user latency, which is reflective of end-user engagement with the product?
    Do you focus on objective, timeless evaluation, such as by looking at peak performance (people would have had a really bad time buying hardware for AI if they bought before quantization and flash attention changed the game pretty drastically), or do you take into account the current state and usability of the hardware (people would have had a really bad time buying an unsupported AMD GPU assuming that "oh, AI's a big deal, it'll all get supported eventually")? Honestly, at the moment it's a bit of a mess. There aren't really industry standards, every option you try to test could have default settings or customizations which vary by hardware, making it potentially not fair, and there's just not a lot of collective industry wisdom on how to do it right. To be honest, I'm not sure why I typed this out; I'm not sure this is going to be terribly useful for anyone, lol.

  • @CraftComputing (a month ago)

    LOL, I read it. I've done some research and talked to a number of colleagues about AI performance testing, and you summed up a number of points nicely. Every model is built a bit differently. Every GPU has its own strengths and weaknesses, with its own hardware, configuration, available features, etc. My 2¢: oftentimes, orgs that want to run a specific AI model will purchase the hardware that model was built for. Me running generic benchmarks isn't really an accurate assessment of performance, as each model will take advantage of specific GPU architecture features. As you mentioned, INT8, FP16, Tensor, etc. performance in GPUs will wildly affect the speed of running a specific model, but that's really down to software selection of what you WANT to run, not the hardware you're running it on. It's chicken-and-egg, but with LLM and GPU: you choose one and it decides the other.

  • @KeoniAzuara (a month ago)

    Still rocking the M40 with 12GB and the NZXT water cooling bracket.

  • @gustersongusterson4120 (a month ago)

    Great video and I love the series! Though it would be a lot easier to visualize the data in bar-graph form rather than just a matrix of values.

  • @TheRogueBro (a month ago)

    Random power-related question: if you were to run multiple P4s and you turn off a VM, does it "power down" the card?

  • @CraftComputing (a month ago)

    All of the GPUs have idle power draw, because they're still being used by the host. There is a host driver for monitoring and partitioning the GPU. The P40 and P100 were around 12-15W. The P4 was closer to 8-10W.
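
Idle figures like those can be watched from the Proxmox host with `nvidia-smi --query-gpu=name,power.draw --format=csv,noheader`. A small sketch parsing a captured sample of that CSV output (the sample readings here are made up for illustration, since actually running the query needs an NVIDIA GPU and driver present):

```python
# Parse the CSV form of nvidia-smi's power-draw query into name -> watts.
sample = """Tesla P40, 13.4 W
Tesla P100-PCIE-16GB, 14.0 W
Tesla P4, 9.1 W"""

def parse_power(csv_text: str) -> dict:
    """Map each GPU name to its reported power draw in watts."""
    draws = {}
    for line in csv_text.splitlines():
        name, draw = (field.strip() for field in line.split(","))
        draws[name] = float(draw.removesuffix(" W"))
    return draws

for name, watts in parse_power(sample).items():
    print(f"{name}: {watts:.1f} W idle")
```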

  • @calebgrefe8922 (a month ago)

    I get so excited thinking about an ITX gaming build with the P4 :)

  • @michaelstowe3675 (a month ago)

    Good choice on the beer! Local to me!

  • @CraftComputing (a month ago)

    Quilter's Irish Death is one of my top 20 beers. So good!!

  • @Leetauren (a month ago)

    AI benchmarks for home labs are relevant. Please include some.

  • @sjukfan (a month ago)

    Hm... is there an x16 to x8/x8 splitter with external power that can drive two 75W cards? Then you could run two P4s in a single x16 slot 😛

  • @gabrielramirezorihuela6935 (9 days ago)

    The tiny Tesla is hilarious.

  • @jamb312 (a month ago)

    Have a couple of Quadro T400s for Plex and VMs; glad I got a P4, as it's been a powerhouse for running LLMs, recognition, etc. By the way, an Epyc 7302 is what I'm running, and I love it, other than it being a little heater. Iron Horse Brewery has its main staples, like the Quilter's Irish Death, but they play with many others. I was up in their taproom last week, and the Cookie Death was only $3.

  • @Yuriel1981 (a month ago)

    I think the main problem with switching to an Epyc platform is finding a board that can accommodate the 8 GPUs. The best and most affordable option I see on eBay is a 7551P and an ASRock EPYCD8 board with 4 PCIe 3.0 x16 and 3 PCIe 3.0 x8 slots. Since the last slot is an x16, you could (if you can find a case big enough, or modify one) use a P100 or P40 that suffers the double-VM affliction. But the newer platform may make up some of the difference. Ads post around $450, though I'm not sure if that actually includes the CPU; most similar full Epyc boards with CPU and various RAM combos can range from $500-850. Might be more doable than you think.

  • @CraftComputing (a month ago)

    There are a couple servers with a very similar design to the ESC4000 that accommodate either 4 or 8 GPUs. They're just insanely expensive.

  • @Sunlight91 (a month ago)

    From what I've heard, machine learning is best at FP16, to halve the memory requirements and speed up computation. Some even do it in INT8. This means old architectures are not recommended, particularly pre-Turing.

  • @blendpinexus1416 (a month ago)

    Got a 12GB 2060, am happy with its performance, and thought about getting Tesla T4 GPUs (the Turing version of the P4), but the 12GB 3060 is also a runner-up for that. Similar efficiency too.

  • @win7best (a month ago)

    As someone who has owned a P100 and still owns a P40 (24GB), I can say that the P40 has the better experience. Also, the P100 only has 16GB, and I don't think the HBM2 memory will save it.

  • @al.waliiid (a month ago)

    What about rendering, 3D, editing times, smoothness, and Adobe Premiere Pro?

  • @montecorbit8280 (a month ago)

    At 25:55: "...better at 50 degrees Fahrenheit than 35 degrees Fahrenheit..." I remember reading somewhere that the optimum temperature for beer to be served was 40 degrees Fahrenheit; anything colder and you will "freeze out" the flavor. That information comes from a time before craft brews were a thing, though. I take it this is no longer correct... or was it ever correct?

  • @CraftComputing (a month ago)

    That's a very generic statement. Different flavors are better and worse depending on temperature. I enjoy IPAs starting at 35F, and letting them warm up to 50F while drinking, as you get a whole range of flavor and experience. Stouts and other dark beers are typically much better starting at 45F and letting them warm even up to room temp. Domestic Lagers and Pilsners, well, they're advertised ice cold because they're absolute garbage above 40F 😂

  • @montecorbit8280 (a month ago)

    @CraftComputing I have never particularly liked light beer, so I was curious. Thank you!!

  • @xmine08 (a month ago)

    LLMs are, in my opinion, becoming a huge thing in homelabs. For everyone? No, but then, many homelabbers have maybe two Raspberry Pis, and yet videos like yours exist where you have full-blown real server hardware (albeit old and thus affordable). I appreciate your honesty, however, that you don't want to produce numbers that you don't feel qualified for!

  • @LinHolcomb (a month ago)

    I'd still love to see tokens/sec running a few mainstream LLMs. I run 2 P40s in an AMD 5950X with 64GB RAM; truly, the processor is not used to its potential. Going to send you a pickle beer.
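
Tokens/sec is at least cheap to measure. A back-of-envelope sketch of the timing loop; `stub_generate` is a hypothetical stand-in for a real llama.cpp or `transformers` generate call, which you would time the same way:

```python
# Measure tokens/sec by timing a generation call and dividing by the
# number of tokens produced. Swap stub_generate for a real model call.
import time

def stub_generate(prompt: str, n_tokens: int) -> list:
    # Placeholder: a real call would run one forward pass per token.
    return [f"tok{i}" for i in range(n_tokens)]

start = time.perf_counter()
tokens = stub_generate("Tell me about the Tesla P40.", 128)
elapsed = time.perf_counter() - start
tokens_per_sec = len(tokens) / elapsed

print(f"{len(tokens)} tokens in {elapsed:.4f}s -> {tokens_per_sec:.0f} tok/s")
```

One caveat when benchmarking for real: prompt processing (prefill) and token generation run at very different rates, so they are usually reported separately.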

  • @Majesticwalker77 (a month ago)

    Thanks for keeping the info within your knowledge, I definitely appreciate it.

  • @ICanDoThatToo2 (a month ago)

    I've been learning LLMs on my R720, and found something interesting: my old 1050 Ti 4GB card runs AI about 3x faster than all 16 CPU cores together (2x E5-2667 v2 chips). While neither of those options is fast in absolute terms, and the RAM is very limiting, basically _any_ GPU is better for AI than CPU-only.

  • @user-pj3jn6vg3q (a month ago)

    I used a P4 on Proxmox for an AI VM. It's just good to "build" your VM, which is a very long journey: getting all the libs, drivers, and venvs. Once done, I quickly understood that self-hosted AI is all about trial and error, and waiting on the P4 became painful. Also, very important: vGPU builds are OK for gaming but NOT for AI. It's nearly impossible to get CUDA working while using vGPU, at least not with these homemade setups. Not to mention AI is all about VRAM, and the vGPU VRAM split has a direct, dramatic impact. I ended up removing the whole vGPU setup and stuck with PCIe passthrough; that was the only way to have a multi-purpose home server for both gaming VMs and AI VMs.

  • @VinnyG919 (a month ago)

    ExLlama runs fine on vGPU here, less than 10% overhead loss.

  • @KHITTutorials (a month ago)

    They have fewer cores, but the E5-2687W v4 comes quite close to desktop gaming chips. Most likely it will help with the "bottleneck", but will impact how many machines you can run. Would be interesting to see what improvements come from it.

  • @MiG82au (a month ago)

    Surely there's a mistake in the Fire Strike results? The P100 x2 and P4 physics and combined scores are higher than the P40 and single-VM P100.

  • @ccleorina (a month ago)

    I've been waiting for a P100 or P40 setup and vGPU guide, since I still can't get it running with Proxmox 7 or 8. Still waiting for a new vGPU guide.

  • @insu_na (a month ago)

    What problems are you experiencing? I've been running Proxmox with P100 vGPUs for a year and P40 vGPUs for months.

  • @drakkon_sol (a month ago)

    I have my P4 sitting in my PE T110 II as my decoder for Plex. (My PE T110 II is my NAS, Plex, MC, and BeamNG server. Total cost for this 32TB server: $200 CAD.)

  • @mrsrhardy (a month ago)

    The cards don't have video-out, so you need an onboard GPU (say, Intel's). So how do you get Windows 10/11 to use the GPU for gaming (assuming Steam)? I know Intel's QuickSync is good, but in apps like DaVinci Resolve, can the Nvidia GPU alternative be selected? I ask because I know you use these often in VM environments and do passthrough for HW/GPU support, so obviously it's selectable from a level-1 hypervisor, but what about plebs like us mere mortals with an SFF desktop with integrated Intel graphics? Is the P4 a nice affordable boost or more trouble than it's worth?

  • @daidaloscz (a month ago)

    Would love to see how you set up Sunshine + Moonlight next, especially on headless systems with no GPU output.

  • @hi-friaudioman (a month ago)

    Oh baby, he's dropping the E5 v4s! We gaming now, boys!

  • @masoudakbarzadeh8393 (a day ago)

    I bought a Tesla K80 and an RX 580. Can I power and use them at the same time?

  • @logan_kes (a month ago)

    I just got a pair of P40s in last week and have begun benchmarking them on my Dell 14G servers running Skylake and Cascade Lake Xeons. I might throw them in my old 13G with Broadwell Xeons to see if the performance of Scalable makes a noticeable jump for the massive price increase of the platform in a "cloud gaming" situation.

  • @dualbeardedtech (a month ago)

    In regards to your commentary on benchmarking AI... well said, my friend!

  • @AlexKidd4Fun (a month ago)

    I agree. Much respect for not presenting benchmarks for AI until you feel comfortable understanding what you're presenting. 👍👍

  • @mastermoarman (a month ago)

    I wonder how well the three work for transcoding with Plex/Jellyfin and running CodeProject.AI for security camera image recognition.

  • @OMGPOKEMON47 (a month ago)

    Was the P4 tested at PCIe x16 or x8? If I understand correctly, using 8 of the P4s (vs. 4 double-slot cards) on this server would result in each slot running at x8 bandwidth. Not sure if that would make a difference in gaming performance 🤷‍♂

  • @StevenWilliams-lb9tf (a month ago)

    Jeff, have you tried the RTX 4000? I've thought of getting one, as it claims to be close to an RTX 2080 Mobile on the Quadro wiki, but TechPowerUp claims it's more like an RX 6600. I'm thinking: do I save for the RTX 4000, or just get a P100? Single slot at 160W vs. half the price at 250W. Thanks.

  • @cgrosbeck (19 days ago)

    Do you have a how-to for setting up your hardware? Specifically the operating system, drivers, and network out to terminals like Raspberry Pis.

  • @k9man163 (a month ago)

    Would you be interested in testing these cards for local LLM performance? I'm curious what impact the HBM2 memory will have over the GDDR5.

  • @zr0dfx (a month ago)

    I'd like to see an update on the wee home server you did in that Jonsbo-style case! I made a very similar build but used an LSI 9300i and a 10Gb M.2 adapter with TrueNAS Scale (could not get PCIe passthrough to work either).

  • @carbongrip2108 (a month ago)

    How did a single Volta GPU perform when running 2x VMs? We know you tested it 😉

  • @CraftComputing (a month ago)

    Volta coming shortly ;-)

  • @SoftwareRat (a month ago)

    Old GeForce NOW instances used the Tesla P40, shared between two instances.

  • @aaronburns2858 (a month ago)

    Do you think LGA3647 machines are relevant? I just ended up with a Supermicro X10SPM-TF and a Xeon Gold 6232. I got it dirt cheap and am curious if you think it'd be good enough to run a couple machines for the kids to play Minecraft, and me to run a few other games (Fallout, Cyberpunk, Hogwarts Legacy, and mostly old titles).

  • @Seventeen76 (a month ago)

    Is a used second-gen Threadripper good for machine learning / AI? I was considering building a system with one, or trying to get an Epyc CPU. Are those CPUs any better than regular Ryzen for that intended purpose?

  • @JoshWolabaugh (a month ago)

    I might have to drop a P4 in my Dell R720 and give it a go. Thanks Jeff.

  • @matthewsan4594 (a month ago)

    As people may use the cards for other things like video editing, conversion, and animation, could you please do that sort of testing as well?

  • @forsaken1776 (a month ago)

    I've watched many of these types of videos, not to mention most of your other vids. What I'm not sure about is how your VMs are set up. Are your VMs just a VM of Windows with the game(s) installed, or is there a way to directly install the game in a VM without the overhead of a Windows or Linux OS?

  • @cyklondx (a month ago)

    You should disable ECC VRAM on either of those cards; on the P100 with ECC enabled, it loses some 30% of performance.

  • @frankenstein3163 (a month ago)

    A little off subject: how do you send the cloud gaming stream around 200 ft?

  • @DanielPersson (a month ago)

    I have benchmarks for newer cards. If you want to collab on a video about AI inference or training, I could help out.

  • @spicyandoriginal280 (a month ago)

    I know that you can't test everything, but I would love to know if 5C/10T makes a noticeable improvement. It opens up the possibility of a 6x P4 system with dual 16-core Xeons (2.6 GHz base clock).

  • @robe_p3857 (a month ago)

    Looking forward to AI benchmarks. Trying to decide whether to be creative or just grab a 5090.

  • @blehbop4268 (a month ago)

    Would you be able to test your store of GPUs, both gaming and professional, with BOINC GPU tasks, with power consumption and production in mind?

  • @haylspa
    @haylspa · a month ago

    Can you put Tesla P40s or P10s in SLI with a Titan Xp or X? I ask because I am building a Godlike MSI X99 platform.

  • @CraftComputing
    @CraftComputing · a month ago

    No

  • @haylspa
    @haylspa · a month ago

    @@CraftComputing Thank you! have a blessed day!!

  • @VinnyG919
    @VinnyG919 · a month ago

    You may be able to with DifferentSLIAuto.

  • @ronaldvanSluijs
    @ronaldvanSluijs · a month ago

    I have a Dell R730 with a recently bought GRID K2 card in it and have been struggling with it forever. It's recognized in Proxmox and in my Windows Server 2019 VM, and it even shows up in Plex as a transcoder option, but somehow Plex doesn't want to use the card and transcodes with the CPU instead. I see you have a lot of experience with this; did you find a solution to this on your previous build?

  • @Agent_Clark
    @Agent_Clark · a month ago

    Where and how might I get more information on a server like this? I'm interested in building one but only have experience with consumer hardware.

  • @kenzieduckmoo
    @kenzieduckmoo · a month ago

    I support your new channel Cookie Computing

  • @CraftComputing
    @CraftComputing · a month ago

    Today's show is brought to you by the letter "C"

  • @mrsittingmongoose
    @mrsittingmongoose · a month ago

    Is the stuttering in every single game just the video? Or are they actually that stuttery?

  • @theroyalaustralian
    @theroyalaustralian · a month ago

    The 1080Ti is THE GOOOOOOOAAATTTT... ITS THE GOOOOAAATTT.

  • @cyklondx
    @cyklondx · a month ago

    Hi, disable ECC memory on the P100.

  • @user-rk3uu5ge5y
    @user-rk3uu5ge5y · a month ago

    Feed me, Seymour, feed me.

  • @FaithyJo
    @FaithyJo · a month ago

    Feed me all night looooong!

  • @AwSomeNESSS
    @AwSomeNESSS · a month ago

    Now I'm wondering how these run on Chinese X99 with the Turbo Boost Unlock on Xeon V3 CPUs. E.g., a 2699 V3 runs at 3.2-3.4GHz under full load with TBU; its 18C/36T would cover four 4C/8T equivalent machines with 2C/4T to spare for the bare-metal OS, and 128GB of RAM splits into 28GB per VM plus 16GB for the host. You could have the base system up and running for ~$450ish plus the cost of GPUs.
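
The core/memory split proposed above adds up exactly; a quick sanity check, with the numbers taken straight from the comment:

```python
# Partitioning an 18C/36T Xeon E5-2699 v3 and 128GB of RAM into four
# gaming VMs plus a bare-metal host, per the split proposed above.
total_cores, total_ram_gb = 18, 128

vms = 4
cores_per_vm, ram_per_vm = 4, 28  # each VM gets 4C/8T and 28GB
host_cores, host_ram = 2, 16      # 2C/4T and 16GB reserved for the host

used_cores = vms * cores_per_vm + host_cores
used_ram = vms * ram_per_vm + host_ram

assert used_cores == total_cores  # 4*4 + 2 == 18
assert used_ram == total_ram_gb   # 4*28 + 16 == 128
print(used_cores, used_ram)       # → 18 128
```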

  • @CraftComputing
    @CraftComputing · a month ago

    kzread.info/dash/bejne/oJuLwaiCZLO2is4.html

  • @AwSomeNESSS
    @AwSomeNESSS · a month ago

    @@CraftComputing Man, that’s quite a throwback! Peak of when you were reviewing Chinese parts every few videos. Hopefully Turing comes down in price in the next couple of years, would be interesting doing a revisit with more GPU grunt down the road. Top-end Tesla/Quadro Turing is still ~$2000CAD.

  • @CraftComputing
    @CraftComputing · a month ago

    No idea why Turing GPUs are still so expensive. You can snag an A5000 for less than $1200 for 2x the performance of an RTX 6000.

  • @AwSomeNESSS
    @AwSomeNESSS · a month ago

    @@CraftComputing That is weird. Must be connected to contract pricing or the like (e.g., not enough Turing supply has hit the market yet). It'll probably bottom out as more companies move to Lovelace/Hopper/Blackwell. A single A5000 + Chinese X99 V3 setup would be an interesting proposition for an all-in-one server: two 8C/16T VMs with half an A5000 each, plus a 2C/4T Proxmox host. Add in a cheap A310 for Plex and you'd have a decent home lab started.

  • @CraftComputing
    @CraftComputing · a month ago

    I've got a pair of A5000s, and you'll be seeing them shortly here on the channel ;-)

  • @ewenchan1239
    @ewenchan1239 · a month ago

    There isn't a standard way of benchmarking GPUs for AI that's meaningful for homelabbers. You can run the HumanEval benchmark, for example, but the score is practically meaningless (it is used more for benchmarking the MODELS than the hardware the model runs on).

  • @PCsandEVs
    @PCsandEVs · a month ago

    Love your work Jeff thanks!

  • @rklauco
    @rklauco · a month ago

    Maybe a stupid question: when you calculated the price, did you include the Windows license? I'm not sure my information is correct, but I thought you need a special (and quite expensive) Windows 11 license to run it in a VM. But it's possible I'm wrong and there is some option to get it without the $100+ license...

  • @CraftComputing
    @CraftComputing · a month ago

    When I'm running tests like this, I often run Windows without a license key. No sense purchasing a Windows license for a VM that won't exist in two months. For long-term deployment, grab an OEM license key. They're possible to snag for $10-15.

  • @rklauco
    @rklauco · a month ago

    @@CraftComputing I thought those OEM keys are not in line with MS licensing; the license (while technically working) doesn't allow you to virtualize the machine and should only be used on bare metal. But again, I'm not a Windows licensing expert.

  • @spotopolis
    @spotopolis · a month ago

    With how old the P4 is at this point, how would an Intel Arc A310 stack up to it? It has half the VRAM, but its clock speeds are double those of the P4. Do you think the lower-powered card with a newer architecture would have a chance?

  • @CraftComputing
    @CraftComputing · a month ago

    Oof... The A310 and A380 don't hold up well for rasterization performance. They absolutely win when it comes to video encode/decode though. Depending on your needs, they're a solid option.

  • @elpanaqute
    @elpanaqute · a month ago

    Are you still using the v14.0 NVIDIA vGPU drivers like it says in your text file on Google Drive? Because I'm having trouble with this configuration: P40, Proxmox 8.2 (kernel 6.8.8), Linux NVIDIA driver 17.1 (550.54.16) patched with the XML replaced from 16.5 (as the PolloLoco manual says), mdev profile 52 (12Q). Up to that point everything is fine. The problem: in the Windows VM, the only driver that works is the RTX/Quadro 552.55, and it limits to 15fps after 20 minutes. What am I doing wrong?

  • @CraftComputing
    @CraftComputing · a month ago

    No, I'm using 16.4. Different versions of the GRID drivers will only compile on specific kernel versions. Check out the link in the description to the Proxmox vGPU install script; it'll set everything up automatically, including drivers.
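
For context, a rough sketch of the pieces such a script wires up on the host; the PCI address and profile number below are placeholders, not values from the video:

```shell
# Once the host GRID/vGPU driver is loaded, list the mdev (vGPU)
# types each physical GPU exposes, e.g. nvidia-52 (a 12Q profile)
mdevctl types

# In Proxmox, the chosen profile is attached in the VM config file
# /etc/pve/qemu-server/<vmid>.conf, roughly like:
#   hostpci0: 0000:82:00.0,mdev=nvidia-52
```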

  • @elpanaqute
    @elpanaqute · a month ago

    @@CraftComputing For some reason, the first time I tried the script two days ago it was a complete failure. But I tried again on a fresh install of PVE 8.2 and it went flawlessly. Thank you so much.

  • @TheAnoniemo
    @TheAnoniemo · a month ago

    How were the temperatures on the P4? I know they have some very specific airflow requirements due to the small restrictive heatsink.

  • @CraftComputing
    @CraftComputing · a month ago

    This server is specifically designed for passive GPUs. The P4 ran at ~45C. The P40 and P100 ran between 55-62C.

  • @TheAnoniemo
    @TheAnoniemo · a month ago

    @@CraftComputing Thanks for the reply. I was wondering because we had some issues at work when installing a single T4 with no other expansion cards: the perforated back of the chassis provided too little restriction, so all the air just went around the T4 instead of being forced through it. It would subsequently throttle like crazy...

  • @pidojaspdpaidipashdisao572
    @pidojaspdpaidipashdisao572 · a month ago

    I've always had one question for you: why do you drink beers (or whatever that is) from a glass? Why not the bottle, or the can in this case? I feel like less of a man when I drink out of a glass.

  • @CraftComputing
    @CraftComputing · a month ago

    Glossing over the strange identity crisis you seem to be having: a glass lets you smell the beer far better than a can or bottle. Second, pouring a beer with a head brings out more flavors, and a nucleated glass helps refresh the head, keeping your beer enjoyable longer. As for your latter comment, I think it's queer to let others' opinions of you define your identity. Next time you're at a bar, order that Cosmo you've always wanted.

  • @pidojaspdpaidipashdisao572
    @pidojaspdpaidipashdisao572 · a month ago

    @@CraftComputing Making a science of the orange juice you drink, mfw. Nobody defines me; we all know who drinks out of a glass. What is a Cosmo?

  • @CraftComputing
    @CraftComputing · a month ago

    Who drinks out of a glass?

  • @RedneckResto
    @RedneckResto · a month ago

    8 P4 FTW

  • @jasontechlord
    @jasontechlord · a month ago

    Looking at possible AMD solutions and it seems all those cards have just enough power to render the crickets of the AMD server card market.

  • @CraftComputing
    @CraftComputing · a month ago

    I've got some AMD GPUs, and will be covering them in an upcoming video as well.

  • @Adam130694
    @Adam130694 · 25 days ago

    Why not just put in two 2696 v3s (for $50-70 apiece), unlock them, and have ~3.6GHz-clocked CPUs with the same 72 threads?

  • @CraftComputing
    @CraftComputing · 25 days ago

    The unlock is still power limited. Under full load, the CPUs would still likely struggle to hit 2.8GHz or higher.

  • @Adam130694
    @Adam130694 · 25 days ago

    @@CraftComputing I've seen them hitting 3.5-3.6 in games quite frequently... but you have more experience, so I believe you tested that. Good job anyway, and as always!

  • @yokunz1837
    @yokunz1837 · a month ago

    Can I run a Tesla M40 with BlueStacks or LDPlayer?

  • @playeronthebeat
    @playeronthebeat · a month ago

    Will you do one more video for Turing/Volta cards (essentially the 20xx series) too, or are those still out of reach (budget-wise, etc.)? It would be interesting to me if they're not too expensive.

  • @CraftComputing
    @CraftComputing · a month ago

    Yep! I've got some V100 and A5000 GPUs lined up. Not sure if I'll cover Turing, as those are prohibitively expensive still.

  • @playeronthebeat
    @playeronthebeat · a month ago

    @CraftComputing Ah, that's unfortunate. I would still love to see it, honestly. The V100s don't seem too expensive on their own, but they'd definitely stretch the budget, going for ~€700 here for the 16GB SXM2 and roughly 1k more for the 32GB SXM3. For someone like me toying with the idea of having at most one or two systems on there, it'd be quite cool. But eight systems (4 GPUs) could be a bit harsh price-wise.

  • @LetsChess1
    @LetsChess1 · a month ago

    If you want to do AI benchmark stuff and want to learn how to benchmark them accurately, you can hit me up. It's what I do.

  • @0mnislash79
    @0mnislash79 · a month ago

    No fighting game test to also check input lag with 2 VMs 😟

  • @adamtoth9114
    @adamtoth9114 · a month ago

    Send me those cards and I'll give you some steps/sec results in TensorFlow training 😃. Same dataset, multiple runs with different batch sizes for each card. I used a K80 for this recently, and I have a well-established test environment in Docker for it.

  • @pachete.
    @pachete. · a month ago

    It's cool, but I don't need a Tesla GPU. I think my 4650G server is enough for me.

  • @UntouchedWagons
    @UntouchedWagons · a month ago

    Does anyone else lose track of which GPU he's talking about at times?

  • @CraftComputing
    @CraftComputing · a month ago

    I tried to go over benchmarks in the same order each time. P40 -> P100 -> Dual VM -> P4.

  • @HerrFreese
    @HerrFreese · a month ago

    My problem following along was that I forgot the main goal: best performance per PCIe slot, with a minimum of 8 VMs across 4 dual-slot PCIe bays. I had to rewatch some parts to understand why the P4 could be the best choice. The per-card performance numbers were informative, but they also led me to compare the wrong figures. Might well be my fault for not listening at the important part.