Two Minute Papers
2 жыл бұрын
318,762
1

NVIDIA’s New AI: Wow, Instant Neural Graphics! 🤖

Ғылым және технология

❤️ Check out Lambda here and sign up for their GPU Cloud: lambdalabs.com/papers
📝 #NVIDIA's paper "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding" (i.e., instant-ngp) is available here:
nvlabs.github.io/instant-ngp/
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bryan Learn, Christian Ahlin, Eric Martel, Gordon Child, Ivo Galic, Jace O'Brien, Javier Bustamante, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Steef, Taras Bobrovytsky, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: / twominutepapers
Thumbnail background design: Felícia Zsolnai-Fehér - felicia.hu
Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: discordapp.com/invite/hbcTJu2
Károly Zsolnai-Fehér's links:
Instagram: / twominutepapers
Twitter: / twominutepapers
Web: cg.tuwien.ac.at/~zsolnai/
#instantnerf

Пікірлер: 914

@dissonanceparadiddle2 жыл бұрын
This is going to make photogrammetry so much easier
@ET_AYY_LMAO
2 жыл бұрын
I hope so, imagine being able to take a few images of a place and instantly be able to walk around virtually in this space. You could couple it to image search services and just type a location to find third party images, and be there.
@Qubot
2 жыл бұрын
Google Earth can be very enhenced with this method.
@dissonanceparadiddle
2 жыл бұрын
@@ET_AYY_LMAO imagine if vr headsets used this to finally make ar truly be able to interact with the environment. It's so fast that just a quick walkthrough would make a 3d map. Plus you could have it run in the background if there's any new positional info that needs to be added as you move
@WangleLine
2 жыл бұрын
I really hope so!! I hate how long it always takes to process my inputs
@dissonanceparadiddle
2 жыл бұрын
In fact with a few cameras something like this could make a 3d video phone call finally work
@sqworm53972 жыл бұрын
Great video! It would be amazing if you had a second channel titled "Twenty Minute Papers," where you go more in depth on topics that interest you.
@TwoMinutePapers
2 жыл бұрын
You are too kind. Thank you so much! 🙏
@DonC876
2 жыл бұрын
Yeah i have been thinking the same lately, that i would love to dive deeper on some topics. @Two Minute Papers i think this is something you should seriously consider as a second channel - big fan of your work :)
@brightmatter
2 жыл бұрын
just saying, I'd subscribe to that.
@guillermojperea6355
2 жыл бұрын
@@TwoMinutePapers Karoly, we expected you to tell if you'd do it!
@v-sig2389
2 жыл бұрын
@@guillermojperea6355 a second channel with in-depth analysis of papers, which is a whole new huge project with hours of work for each episode ... yeah let's decide in a few seconds and announce it in a comment's reply 😂 Btw check the chanel's playlist, he has courses !
@draco63492 жыл бұрын
This is incredible. It's not real-time raytracing, it's an AI literally just eyeballing it. And it's MORE real-time than the regular methods. Can't wait for this kind of rendering to make it into games and simulations, it would outperform anything ever seen before.
@Dimension_eleven
2 жыл бұрын
Introducing ai vision to settings in fps games 😂
@draw4everyone
2 жыл бұрын
Imagine how much this will streamline workflows for 3D graphics designers! You’ll have updates to adjustments in SECONDS
@lorimillim9131
2 жыл бұрын
It also begs the question and possible reality or realities that if an AI can generate graphics on demand in this contextual sense how do you know what you see is actually what is represented before your eyes compared to something that's shown to another entity? Not taking any position just wondering what if?
@yellowblanka6058
2 жыл бұрын
@@lorimillim9131 Yeah, that has pretty frightening implications for media/the legal world.
@FunnyVidsIllustrated
2 жыл бұрын
@@lorimillim9131 I assume like we saw with deep fakes that a counter-ai trained specifically on the pitfalls of the technology in question gets rolled out at about the same time to counteract misinfo
@halko12 жыл бұрын
I’ve used this about ten years ago and … It was labour intensive, took ages to complete and results were sketchy at best. Seeing where we are today blows my mind.
@brexitgreens
2 жыл бұрын
That was what I've just elected to name "classic photogrammetry". Back then, there was _no other,_ and AI was still barely something more than academic curiosity. Unless you count stuff like OCR as AI. I'm talking about the distant past of four years ago.
@AlvaroALorite2 жыл бұрын
A lot of people are mentioning computer graphics as an interesting application, but are missing the bigger picture: This is using neural networks, and it's replicating a 3D environment from limited input, which is VERY similar to what our brain does (dreaming, for example)... This is amazing for neuroscience.
@jerchongkong5387
2 жыл бұрын
Neh, its amazing for rule 34 artist, imagine the possibilities. ( ͡° ͜ʖ ͡°)
@unintentionallydramatic
2 жыл бұрын
This is an incredibly important point. On that note. Let's also not forget that this means you can hyper-accelerate drug design from first principles based on receptor shape. There's already algorithms that can design molecules with a certain shape and algorithms that can search for a synthesis pathway. So you could theoretically feed the machine a series of images of a receptor and get a recipe for a drug targeting it out the other end.
@chrisray1567
2 жыл бұрын
It’s not just in our dreams, our brain creates a 3D environment while we are conscious too. Our eyes are 2D sensors. It’s our brains that combine that information into a 3D experience.
@stevenrogersfineart4224
2 жыл бұрын
100% FMRIs can already get enough data to discern if someone is thinking about a building/person/ animal etc. Until now the resolution was bad. If AI fills in the extra data reliably , we are not far from mind reading/projection :P
@rxtr664
2 жыл бұрын
" which is VERY similar to what our brain does" - Not really. Our brains probably don't need to "reconstruct" a 3d environment. It's already perceived to be a 3d environment, no need for "reconstruction"
@Rezmason2 жыл бұрын
This channel will live to see the day when the pace of progress in this field will exceed the speed of publishing and paper discovery. The format may have to switch to a statistical approach that samples results from multiple simultaneously published papers to depict the state of the art. aka "paper transport" Then Nvidia will publish work on a hardware accelerated paper transport resolver that produces "perf"s at a rate faster than this channel. The papers will move so fast we won't be able to hold onto them
@Supreme_Lobster
2 жыл бұрын
This is the paper singularity
@nowymail
2 жыл бұрын
There are already neural networks that can learn by reading papers. And other neural networks that can compose videos. Not much more is needed.
@NotASpyReally
2 жыл бұрын
This is funny but could also end up being true WHY NOT
@THEMATT222
2 жыл бұрын
Very Noice 👍
@Peter-ik9fz
2 жыл бұрын
Two Minute Papers: a better paper in every 2 minutes 😲
@koendos32 жыл бұрын
Imgine rendering only 10 frames of a 100 frame animation. And then feeding it into this new AI. You'll finish your render 10x faster. Thats amazing!
@zynius2 жыл бұрын
In a few years when this can run at 60hz+, all you need is a few cameras in a space and you'll be able to use VR to insert yourself in that location :O That will be completely bonkers!
@krajsyboys
2 жыл бұрын
As I understood it, it takes a couple of seconds to create the "render" of the scene but when it's done it is in fact running at 60fps. So I guess you can just have a loading screen or something before you get to see anything
@lorimillim9131
2 жыл бұрын
Imagine ditching the equipment too and having the AI embodied?
@XZYSquare
2 жыл бұрын
could even be next month lol
@michaelleue7594
2 жыл бұрын
This NOT creating a 3d environment, or even a single 3d object. This is creating a smooth track of 2d images. It's a very cool technique, but if you're imagining a game where you can do anything more than ride a roller coaster or something like that (and not change the direction of your camera) then this won't be applicable to that game.
@Mike7Lof
2 жыл бұрын
Yes, with one remark: it will be done with ONE camera.
@GierlangBhaktiPutra2 жыл бұрын
I am eager to see the practical application to be available. As an architectural historian, it would help documenting architectural heritage much easier with more simple equipment!
@Adhil_parammel
2 жыл бұрын
But there is better laser scanner now to scan whole building. I have seen an episode about that in net geo
@virutech32
2 жыл бұрын
@@Adhil_parammel lasers are cool. Getting a near-perfect render with a phone camera & 2 minutes is objectively better. Especially if the site is hard to get to or not very secure. even if the laser thing is higher quality the lower cost & higher accessibility of this technique would still be mighty usefull.
@stub42
2 жыл бұрын
Careful. Remember that what isn't in the source photos has to be made up. Great for many applications, but not so great for archival and historical research. Is that the actual gargoyle , or invented from training data from all periods of history?
@BHBalast
2 жыл бұрын
@@virutech32 Also one could use small lightweight drone to take photos of places where using laser scanning is impossible.
@BHBalast
2 жыл бұрын
@@stub42 It's a valid argument, but the same is true for a laser scanning, and it requires human clean-up. After photo scanning with this net there also would be validation process.
@danczer12 жыл бұрын
Is it possible to apply this on a video footage? It would be mind blowing to have this in a VR video player. Current VR video players play a video projected inside a sphere. Which not a real 3D, because once you start moving your head in 6DOF it breaks the immersion. But having a dynamic 3D mesh which would be used for the video projecton instead of a sphere, would be mind blowing!
@spyral00
2 жыл бұрын
the are so many applications to this, in VFX too
@cbuchner1
2 жыл бұрын
If that‘s the case, we‘ll soon be able to walk around in movie sets and possibly assume the role of a character, just like the book Ready Player One predicted.
@Danuxsy
2 жыл бұрын
@@cbuchner1 bruh Imagine being able to see boobs from other directions like that 🤯
@lucaspedrajas5622
2 жыл бұрын
@@Danuxsy and it it's gonna be the most rentable application of it
@georgri
2 жыл бұрын
I'd argue that machine being able to interpolate between photos is still far from understanding the actual geometry and synthesizing the scene from ANY point of view, as VR does require.
@WikiSnapper2 жыл бұрын
Can you imagine being able to apply this to table top gaming maps and have the ai fill in the game world as you zoom in!
@Robert_McGarry_Poems
2 жыл бұрын
And then turn it into a photo realistic image. What a time to be alive!
@Ginsu131
2 жыл бұрын
What exactly would be interesting about that?
@WikiSnapper
2 жыл бұрын
@@Ginsu131 A lot of time can go into making the details of a map in table top rpgs it would be awesome to have an AI be able to fill in that detail as it would speed up production and take a lot of mental energy off the GMs.
@ep1cn3ss22 жыл бұрын
This is unbelievable. Phenomenal work, can't wait to see the applications! Especially in photogrammetry.
@sigmata02 жыл бұрын
Absolutely extraordinary. My first thought was that "Now we know what those old CSI shows were using to zoom into their videos to find important details" Haha Yet also, does this mean from a couple of photos we can now create 3D models which could plausibly be printed? Effectively advanced Photogrammetry?
@EVPointMaster
2 жыл бұрын
As cool as this tech is, you'd have to be very careful if you wanted to use it like this. The AI doesn't know the truth, it's just guessing one possibility of what could be true, based on the limited amount of information it was given.
@sigmata0
2 жыл бұрын
@@EVPointMaster I assume you mean the CSI idea. Yes that is a joke. With regards to the 3D printing functionality I assume you'd still need to do some work to get a useful mesh. However it looks really close to something that would be useful.
@ge2719
2 жыл бұрын
@@EVPointMaster youd like to think that enhanced footage was never used in criminal cases, but just look at the kyle rittenhouse case, the prosecutor submitted a frame that was an interpolated and upscaled image from a video. in order to try to make a very specific claim of kyle rittenhouse doing a specific thing. for one frame. it was allowed as evidence and even enhanced it was a blurry mess that showed nothing specific, but the prosecutor was allowed to claim it showed him aiming a gun at someone.
@fuchsiebabe2 жыл бұрын
This has the potential to revolutionise filmmaking and visual effects. Awesome!
@markwood18552 жыл бұрын
Thank you for making Two Minute Papers. Your excitement and enthusiasm is infectious! And I always find myself getting more and more enthused as I watch your videos and see the pace of progress. It's nice to see a channel that just... Makes me feel that the future can be bright.
@J0R1AN2 жыл бұрын
NVIDIA has been going crazy with these AI papers
@rafqueraf2 жыл бұрын
Imagine what cheap GPUs will be able to do in the future
@SnrubSource
2 жыл бұрын
nothing because new cheap GPUs won't exist anymore, they're all going to keep costing $700+
@ChuckSploder
2 жыл бұрын
@@SnrubSource 3090's will be cheap
@victorius2975
2 жыл бұрын
@@ChuckSploder trust me it'll stay the same and newer gpu's only get more expensive
@wojciechbem8661
2 жыл бұрын
Mining bitcoins I suppose.
@silly_lil_guy
2 жыл бұрын
My gt 1030 can run tetris! In 10 FPS... I meant 10 Seconds Per Frame
@jonnyhifi2 жыл бұрын
Astounding - I can see 3d scanning apps / software will pretty soon become “trivial”on phones etc. … which itself is astounding never mind all the other stuff !
@hondajacka22 жыл бұрын
Crazy. Can’t wait to see applications of this come out.
@jamesabell94942 жыл бұрын
Amazing! Thanks for your videos, they really keep me up to date with visual AI.
@tiefensucht2 жыл бұрын
Making a game in the future: "Hello Siri, create me a game that plays like Doom 12, but with Disney-Characters that look like musicians that are currently in the top 20 charts and this all should happen in Tokio at daytime, raining. End boss should be a giant paper."
@magen6233
2 жыл бұрын
that's what OpenAi Codex and Github Copilot are doing (not at this level, but they are quite good)
@dilonardomultimediaproductions2 жыл бұрын
There are a lot of amazing two minutes papers, but only a few standalone software are available for this (like the ones from Topaz Labs). When will this technique be available easily for everyone?
@B0A0A
2 жыл бұрын
It can take several years to ten years. No matter how great the performance is, if it is a specialized single function, there is little motivation to offer it as an easy-to-use application. If this feature can be further developed and used for all kinds of surveying on construction sites, people's productivity will be visibly improved.
@brexitgreens
2 жыл бұрын
@@B0A0A Why the society cannot simply into Kickstarter some developer?
@GunwantBhambra
2 жыл бұрын
@@brexitgreens Nvidia don't need Kickstarter they will loan the tech to game devs to earn royalty
@mustardofdoom
2 жыл бұрын
The method is already available on GitHub. For commercial use, the authors say that Nvidia should be contacted. So it is a matter then of going through a sales process to generate interest between stakeholders, agreeing on pricing, and going through legal. Only then would a commercial agreement begin. And if the goal was a graphical interface you'd have to give ample time to develop this, bug test it, and run perception tests to ensure that it is user-friendly. It altogether takes a while and can explain the lag between the newest results and easy-to-use graphical programs like Topaz Labs. I work on a sales team for a commercial scientific image processing software. I suggest new ideas to our R&D regularly. Maybe 2%-5% of ideas are accepted and end up in the product. For these features (which are mostly low-effort high-reward due to cost considerations), it often takes 2-3 years. And that's when we already have an application and team to build the new feature in to.
@popcorny007
2 жыл бұрын
@@mustardofdoom Thanks for the detailed explanation, it really puts things into perspective
@eragon_argetlam2 жыл бұрын
This is insane. I've watched many of your videos, but this is the only one so far that *really* seems like straight up magic.
@theencore3982 жыл бұрын
Papers so fast, they won't even let ya hold them, damn. As always really informative and fun video sir.
@ollllj2 жыл бұрын
nvidia GPUs of 2021 are heavily optimized for matrix multiplication, with a mode for sparse matrices. It is mostly used for up-scaling, but can also be used as great noise-filter (audio and raytracing-denoiser) ray-tracing is also useful for more realistic audio. The more general applications of this are pretty much anything with linear-algebra, where ever you multiply 2 or more matrices, most likely rootSolvingInverse*projection.
@danielng1765
2 жыл бұрын
Sorry, havn't dive into the paper bcoz am totally noob in AI, any idea which graphic card they r using in this paper? Am bout to get a new rig with RTX 3070, but had put onhold due to the progress in AI for photogrammetry is too fast..
@technewseveryweek8332
2 жыл бұрын
@@danielng1765 rtx 3090
@ollllj
2 жыл бұрын
@@danielng1765 the RTX30xx series cards are for private house holds, nvidia also makes graphics cards for datacenters (commonly used to train ai models or for things such as sorting long prioity-lists for the internet). The server-rack architecture is similar, but the bandwidth and parallelization is much higher, and it scales the price to "millions per unit".
@ollllj
2 жыл бұрын
@@danielng1765 for the common high-tier-gamer (or indie gamedev and ai-code-learner), the playstation5 has a gpu, that compares pretty well to the GTX 2070, This is 2020 technology, significantly slower memory access, significantly worse for motion blur than 30xx cards of 2021. GTX 2070 card is currently still a relative good value (damm all the cryptocurrency scammers/thiefs) to put in a new pc, that costs up to 1100usd new. A GTX3070 is significantly better (>2,5x of a 2070), and commonly fround in new PCS that cost over 1700 usd. the "ti" suffix makes a significant difference and is not to be overlooked (commonly means: 24% faster memory, +20% power consumption +50% more cuda-cores.) This in general seems to appeal more for higher resolution, displays (up to 4k)
@danielng1765
2 жыл бұрын
@@ollllj thks for the advise, I initially decided to get RTX 3070 due to available performance comparison based on agisoft metashape. RTX 3070 has the good cost/performance balance compare to others based on their standard samples. Shall check 3070ti as well if the result is available.
@Hexcede2 жыл бұрын
This is astounding! It's so fast you could train it on the user's hardware, no need to include a trained model ;)
@Zoza152 жыл бұрын
Its always exciting to see new videos on innovations in A.I and software Karoly ✌🏽👍🏽.. Assets creation is made a lot easier and faster it seems.
@TeamJackassTV2 жыл бұрын
Man do I love everything you put out! Thanks for the time and effort you put into these videos!
@101perspective2 жыл бұрын
I wonder how long until we will have interactive movies? Where they just film from a few angles and then at home you feed that footage into VR and can move around within the scene as it unfolds.
@heliusuniverse74602 жыл бұрын
what caught my attention is the neural representation thing. Can it be used for image compression? I imagine there's lots of room for improvement from other current methods like jpeg, which doesn't really understand the image
@frenzscivola3099
2 жыл бұрын
great idea! It would also be easy to generate data. What is the state of the art on this?
@taktuscat4250
2 жыл бұрын
Nvidia maxine exist as neural video compression
@alihms
2 жыл бұрын
I believe this is what they are aiming for. Extreme compresstion for photos and video while preserving the important details. Say a video of a footballer kicking a ball inside a stadium. The important details such as the facial expression and the actual movement are preserved. Non-important details such as the field, the spectators, the roaring sound can be compressed. During playback, these non-important details are then procedurally generated. This is somewhat anolagous to how we stored information in our brain.
@eamonia2 жыл бұрын
What a time to be alive, indeed! And your documentation of these accomplishments will be preserved forever. Thanks Doc 😊
@DanielHJeffery2 жыл бұрын
Already have it downloaded! Going to use this!!!
@nemonomen33402 жыл бұрын
For the program that showed more detail when zooming in: was it an AI that filled in missing detail or was it an AI that made the high def image less “costly” by simplifying the image when zoomed out?
@Robert_McGarry_Poems
2 жыл бұрын
The AI created the detail from that one single grainy, almost black and white image. It is so good at creating the next layer that it can procedurally generate ever increasing depth of image. Pretty neat stuff.
@jessiejanson1528
2 жыл бұрын
From my understanding, and i dont think in this case he explained it vary clearly... An AI seems to take a large image and train another AI with it in order to reduce the size of the image since the new AI will then be able to fill in the details based on its training. So each image would contain its own AI or be paired with one specific to it. So long as the end result is smaller its a win. Though the explanation like i said is lacking and it would have been great to see the original file size VS the end results file size.
@outlander234
2 жыл бұрын
@@jessiejanson1528 Well thats dissapointing in a way. But it could be used for compressing data immensly.
@sharky2782 жыл бұрын
Impressive the time scaling in just one year ("O"). I'm expecting a generalized model with a "kind of 3d segmentation" to change parameters in materials or add physics for the future ...in any case this is the first step for the "synthetic" rendering. Amazing ♥️
@phmfthacim2 жыл бұрын
Mind blowing results
@gurglenurgle65392 жыл бұрын
Great video as always!
@Devoun2 жыл бұрын
2 months from now it'll be finished training a week before we even start.
@GD155552 жыл бұрын
If it can also convert it to clean quad poly it will be amazing
@J3R3MI6
2 жыл бұрын
Yes retopology and UV mapping is the most annoying part of 3D design. We are close though 🔮
@AdityaTripathi2 жыл бұрын
I can't wait for research like these to end up in game engines, truly incredible!
@Turruc2 жыл бұрын
Your passion is absolutely contagious. This is amazing!
@StolenPw2 жыл бұрын
When I was about 17 in 2011 I tried making my own version of what is essentually NERF to make buildings into 3D models really quickly just using photos
@DonC876
2 жыл бұрын
How did the results look in the end ? Would love to see that.
@URB4NR3CON
2 жыл бұрын
I remember Microsoft had a program that did something similar for popular tourist destinations, forgot the name
@StolenPw
2 жыл бұрын
@@DonC876 It kind of ended up looking a lot like how google maps does their 3D models just a lot more manual and a lot more buggy but it did kind of work
@ChuckSploder
2 жыл бұрын
@@StolenPw do you still have that program, and could you make a video of it?
@RainFox84
2 жыл бұрын
@@URB4NR3CON Photosynth
@anthonyrepetto34742 жыл бұрын
I remember hearing how, in 2019, AI was going to be in another "winter"... same thing said in 2020, 2021... but this really is just the dawning of it! :p That gigapixel compression, in particular, will be adapted to "memorizing" the input-output map of software, so that you can just use a look-up table, instead of computing values in code. It'll let us fix bugs by flipping a bit on the look-up table, without needing to find a way to "correct the software"!
@Connor3G2 жыл бұрын
Incredible stuff to say the least. Imagine having a few pictures of your old house that turn into a full VR space in a few seconds...
@Settiis2 жыл бұрын
Out of all the AI shown in this channel that has blown my mind, this is probably one of the most impressive.
@silentbob12362 жыл бұрын
I'm curious about how accurate the models are. Photogrammetry has not had good accuracy in the past, I wonder if this changes that.
@Khether00012 жыл бұрын
Would be interesting if there was a follow up showing us where these technologies are eventually available in actual commercial products But I know this isn't the scope of the channel.
@themore-you-know
2 жыл бұрын
I can already tell you: - Video games production will just be... wtf-level of workflow improvement ; - Google Earth, which already have bazillion pictures, will either be banned or transformed into a life-like simulation ; - Military analysis and transport will make use of the above (Google Earth) ; - Urbanism, real-estate construction and sales of real-estate ; - Back to Google Earth, the applications are just crazy: imagine a traffic application where you can get a life-like visualization of the traffic, making it seem so much more reliable to the viewer than a red line on the road? - I'm going nuts thinking about all the crazy ways this will affect us in our day lives. ... aaaaah, just the VR-Google Earth life-like simulation is utopian/dystopian enough.
@Instant_Nerf
2 жыл бұрын
@@themore-you-know Im working with google earth studio.. combine that with blender -and Video editing .. you can create a realistic 3D scene. The problem is close up... the textures/models are so broken.kzread.info/dash/bejne/h6aqrdWudLbHfqg.html
@uirwi91422 жыл бұрын
Utterly astonishing!
@AykutKlc2 жыл бұрын
Google Street View with this would be mindblowing.
@eelcohoogendoorn80442 жыл бұрын
Its interesting how you have completely moved away from any attempt at explaining the papers you present. I suppose that makes sense for the 2-minute format; I always scroll to the results section first anyway. But a tiny bit more depth and context wouldnt hurt. I was assuming this papers content must be all about how to leverage a whole datacenter full of GPUs in parallel; but its even more mindblowing to see their abstract mentions a single GPU... now thats a bit of detail that would really add to the presentation of this work.
@DarkSwordsman2 жыл бұрын
This has me excited for the future of video games and other simulations. I obviously an enthralled about the idea of a "Full Dive" video game like Sword Art Online. Seeing things like this, Unreal Engine 5 with Lumen and Nanite, AI in general, as well as what Gaben and Elon have been doing for neural interfaces, as he excited for a future where we can be fully immersed in whatever scenario that we want. It's definitely a pipe dream of sorts. But I can imagine a future where we have insanely detailed, low cost simulations, as well as the ability to dive into these worlds with all of our senses. It is a driving factor for me to learn more about ML, AI, and video games.
@jimj2683
2 жыл бұрын
same here. Imagine how good GTA Earth could be!
@jageshkano2 жыл бұрын
Which Two Minutue Paper video does the clip at 0:30 come from? I remember seeing it in his videos before. I was interesting in looking into it further. Thanks to whoever can help!
@sabrango2 жыл бұрын
Wow, this shows deep learning is great for large data if we teach they properly!
@hellfiresiayan2 жыл бұрын
We're at the point where it has all become magic to me. One program does all this?? How???
@laurent-minimalisme
2 жыл бұрын
Deep learning abstraction... first layers do the stuff, just plug your application on the top.
@Wertsir
2 жыл бұрын
The same way your brain does, evolution.
@Andytlp
2 жыл бұрын
Yeah they give descriptions of a scene or draw primitive zones like this blob is water, this is land, this is trees, this zone is sky etc and a.i paints a picture. That is magic indeed. One upping that is telling a.i to do a task or write a program and it does it in a matter of minutes or hours.
@prestow2 жыл бұрын
Its becoming easier to accept we live in a simulation.
@simian.friends2 жыл бұрын
I love it whenever he says ''yes''
@bnutfilloyev2 жыл бұрын
wow! Great video!
@MarkEichin2 жыл бұрын
I'm not completely clear on what the gigapixel-image one is actually doing - taking a gigapixel image, building a model, and then keeping only the model (which is smaller? how much smaller?)
@mynameisal72 жыл бұрын
So could we use something like this to create a real time driving simulator? The AI could use the input data from something like google map's street view and edit it into an interactive 3D environment.
@noobcaekk2 жыл бұрын
Wow this is INCREDIBLE! I mean, there's just no comparison between the 2-month old and 1-month old papers. Unbelievable how crisp and smooth everything comes out to be.
@1MarkKeller2 жыл бұрын
This is incredible!!! The potentials in VR, AR alone ...
@Sven_vh2 жыл бұрын
Hey, so I'm kinda new to this whole AI enviorment but this looks amazing! Is this public? Like can I upload some pictures and the program maken an 3d object out of it?
@maythesciencebewithyou
2 жыл бұрын
There is a link in the description to it.
@brightmatter2 жыл бұрын
So I am curious; can the AI transition past context knowledge to a current context knowledge, and retrain on the fly? For instance, if you see a painting of a woman and contextualize it as 'old painting of woman with no background'. Then, as you zoom in switch the context to the current frame and re contextualize 'old painting of woman's head'. So portrait -> head -> face -> nose and eye -> eye -> pupil -> retina -> optic nerve -> light sensing cells -> cell wall -> DNA At some point you are only providing that last context to the AI to create the new image from pre-trained understanding. i.e. you reach a point where you are not caring about the reference material anymore.
@Robert_McGarry_Poems
2 жыл бұрын
I got that from this explanation. The layer that it produces is good enough to use as input. So, yeah at some level it is creating images based on pixels that it generated.
@ge2719
2 жыл бұрын
i imagine its trained based on images that exist, and learns what detail should exist in such an image, and then fills it in. so you can zoom in on a city because it knows what buildings, cars, roads, people look like. but it wont understand that zooming on on a persons face would reveal skin cells, unless it was given high resolution photos of skin, that transition to the microscopic level to reveal cells. also, if its given context that its a painting, at some point you would want it to assume its zooming on on oil, not assume that the artist was able to paint in individual cells in the painting because its a painting of a face.
@lelsewherelelsewhere94352 жыл бұрын
USE THIS ON THE PATTERSON BIGFOOT VIDEO! (The classic one for the 1970s)
@jmendezsj2 жыл бұрын
Another great paper. Amazing!
@wagnercs2 жыл бұрын
Great video! Thanks for all work! However…….. Is it possible for people outside the research team test it? Are all of these available for us mortals? Can we test this? Or these will only go to Nvidia? Sorry for the dumb questions…
@thegeekclub8810
2 жыл бұрын
The code is on GitHub! Dunno how hard it would be to actually get it running, and the GitHub page says you need a Nvida graphics card, but it is available to the public if you know what you’re doing!
@daanhoek1818
2 жыл бұрын
@@thegeekclub8810 I got it running just now. I'm on a GTX 1080 and it runs quite slow, but I can train the fox one and look around in low res. Works pretty well. Setting it up is quite impossible if you don't know what you're doing. But you can always try. I tried training one of my own datasets but it throws an error. Still working on that. edit: The GTX 1080 is probably much slower because it is not RTX . the Geforce RTX line of cards have tensor cores which are more optimized for this kind of job and my 1080 has none.
@Piyush10129
2 жыл бұрын
@@daanhoek1818 can you please share the link?
@wagnercs
2 жыл бұрын
@@thegeekclub8810 Thanks for the information!
@tobiascornille2 жыл бұрын
Seems super cool! Don't fully understand what the technique is doing exactly (in terms of which inputs and outputs), though.
@skarfie123
2 жыл бұрын
I find this with most of his videos
@georgri
2 жыл бұрын
It interpolates between given photos of a scene.
@Pixelarter
2 жыл бұрын
Basically they developed a multiresolution input encoding that simplifies and allows to highly parallelize the task, taking full advantage of the GPU. They apply it to different techniques (NeRF, Neural Gigapixel Image, Neural SDF, and Neural Volume ). From the paper website: *_Instant Neural Graphics Primitives with a Multiresolution Hash Encoding_* _We demonstrate near-instant training of neural graphics primitives on a single GPU for multiple tasks. In gigapixel image we represent an image by a neural network. SDF learns a signed distance function in 3D space whose zero level-set represents a 2D surface. NeRF [Mildenhall et al. 2020] uses 2D images and their camera poses to reconstruct a volumetric radiance-and-density field that is visualized using ray marching. Lastly, neural volume learns a denoised radiance and density field directly from a volumetric path tracer. In all tasks, our encoding and its efficient implementation provide clear benefits: instant training, high quality, and simplicity. Our encoding is task-agnostic: we use the same implementation and hyperparameters across all tasks and only vary the hash table size which trades off quality and performance._ _Abstract:_ _Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations. A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. The multiresolution structure allows the network to disambiguate hash collisions, making for a simple architecture that is trivial to parallelize on modern GPUs. We leverage this parallelism by implementing the whole system using fully-fused CUDA kernels with a focus on minimizing wasted bandwidth and compute operations. We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds, and rendering in tens of milliseconds at a resolution of 1920x1080._
@tobiascornille
2 жыл бұрын
@@Pixelarter Thanks! Are the inputs to all of these tasks the same (2D images?)? Cause the tasks sound quite different, so it's cool that their encoding works for all of them
@Pixelarter
2 жыл бұрын
@@tobiascornille No. Some are 3D, some are 2D. From what I glance, the hash they developed just encode the position of the input at different resolutions, concatenates them and apply some transform, and feed as input along regular information.
@dprezzz15612 жыл бұрын
Amazing work. I cannot wait for a smartphone implementation.
@calibaba27392 жыл бұрын
Wow this technology is amazing. I’m a boxing fan. I hope this can replay some classic fights viewing from angles. Thank you 🙏👍
@brexitgreens2 жыл бұрын
Sir, your hype is both awesome and fully appropriate in equal amounts. I learned about NERFs only a few days ago. You don't exaggerate when you say that this was science fiction only four years ago. Back then, I intuited the possibility of AI photogrammetry - in the same way Star Trek intuites warp drive and the holodeck. And now - it is here. The tech straight out of my dream. What a time to be alive!
@imveryangryitsnotbutter2 жыл бұрын
If it only takes 5 seconds to render a still object, and that same rendering speed is applied to each frame of an actor's performance filmed in 60 FPS from multiple angles, then each minute of footage should take about 5 hours to render. If you're a small game dev studio, that means that you can basically feed your dedicated workstation a few minutes of footage and leave it running overnight, and you'll get the final animated asset rendered in a day or two. What a boon this would be for Myst-like adventure games! EDIT: Actually, not even a day or two! If it takes 2 seconds per frame, then even five minutes of footage would be rendered in just 10 hours! You could leave it running overnight, and it would be done the very next morning!
@Lord2225
2 жыл бұрын
It takes 5 seconds to train nn, and evaluation (rendering) is real-time. Acording to paper this animation (5:02) can run 133fps in 1080 on rtx3090
@magen6233
2 жыл бұрын
that would be quite heavy to replace your model on every frame of animation.
@Lord2225
2 жыл бұрын
@@magen6233 Idk I did not red whole code yet. From what I read, I understand that synchronization is done with m_training_stream (for training) and m_inference_stream (for rendering) (these are cuda's streams and are used for runing kernels asynchronously). Whole magic happens in Testbed::NerfTracer::trace() function and train_nerf. I think that they are coping sth but for sure not every frame (update_nerf_transforms function, copy sth every training step).
@SlinkySlonkyWaffle2 жыл бұрын
I wonder if this could be used to make 3D scanning through photogrammetry incredibly fast, since it can detect the geometry of such few pictures SO fast.
@jenkem44642 жыл бұрын
This is flippin' amazing!
@SYBIOTE2 жыл бұрын
This is just amazing but, other than image compression, I fail to imagine how this technique can be integrated into existing software
@TheChenchen
2 жыл бұрын
Dude you can create 3d models from photo alone
@skunko1871
2 жыл бұрын
I'd love to use this on Google Earth. Imagine you can slowly walk down the street instead of zipping down a few meters at a time
@Hexcede
2 жыл бұрын
Photogrammetry, self driving vehicles, and, an interesting and maybe not so feasible one, pre-rendering a complex scene and simplifying it to be displayed on lower end hardware using this technique. From what I read, it produces an SDF, which is super awesome because they're cheap to render and can offer a lot of mathematical meaning, e.g. with self driving vehicles
@SYBIOTE
2 жыл бұрын
@@Hexcede yes it creates 3d models, but how would you integrate it into existing software would be a bigger challenge. 3d models to be used in applications need to be highly optimised as well (topology , different maps and stuff) I get that this has amazing and varied applications but I fail to see how it can be seamlessly integrated into existing software say blender or reality capture, unity, etc
@ge2719
2 жыл бұрын
@@SYBIOTE well obviously it would need other processes added to it. for say if you wanted to use this to create cgi characters. youd probably start with a person in minimal clothes, then add the clothes on top after. if you want it for map making, then youd remove everything you want to be interactive and model those objects separately, so the level, the walls, floor, etc is created using this technique, and you dont have to cut things out of the model. combine this iwth teechniques for removing the specific lighting condition and being able to use an in engine lighting, which weve already seen in other papers. these things are all literally just research papers. how they get applied to software that is end user friendly, even for professionals using complex software, is years down he line, and will likely require companies like weta to create them.
@Vini-BR2 жыл бұрын
Can't wait to see that reconstruction applied to the Google Street View or similar
@tartarosnemesis6227
2 жыл бұрын
Yes, I also immediately thought of that when I saw the video. Like GTA in the real world.
@mikiqex
2 жыл бұрын
I'm wondering about the storage difference. Images are huge (there are a LOT of them), but this is probably way bigger. Or is it...?
@Vini-BR
2 жыл бұрын
@@mikiqex maybe the neural network could generate the intermediates in real time someday?
@mikeburns41152 жыл бұрын
Absolutely incredible!
@Toyversestore2 жыл бұрын
What a time to be alive.
@MrImperativeoz2 жыл бұрын
This channel latelly seems more about black magic than science.
@Zoza15
2 жыл бұрын
We are entering Dr Strange realms here 😂..
@colox972 жыл бұрын
with that scene in paris i had an epifany about this being used in google maps🤯 imagine how much better it could become
@Aliketie252 жыл бұрын
WHAT!!!!! This is amazing. What a time to be alive
@shabazzy2 жыл бұрын
Wow. Absolutely amazing. I could barely hold on to my papers!
@jorgemfgoncalves2 жыл бұрын
I am at a loss for words with these results. It's just astounding.
@PunmasterSTP Жыл бұрын
It’s incredible things have come this far. I’ve had fun playing around with Stable Diffusion, but I know that’s only a prelude to what we’ll see in the coming months and years.
@sieyk2 жыл бұрын
It's actually mindblowing how useful this will be for static environments.
@cks20206932 жыл бұрын
imagining importing all the google maps drive-by photos into this AI
@LucGendrot2 жыл бұрын
I couldn't hold on to my papers for this one. Incredible!
@PrinceWesterburg2 жыл бұрын
Sci-Fi "Zoom and enhance" on some blurry image is now reality!
@DreckbobBratpfanne2 жыл бұрын
I wonder what the next big thing in AI after deep learning will be. The pace is already brutally insane. How fast will it be with even better methods
@stoef2 жыл бұрын
This is truly incredible. Unbelievable! What a time to be alive!
@epiczeven63782 жыл бұрын
2 seconds??!! It´s here, it´s now, let´s digitize the entire world! :D
@attitudego2 жыл бұрын
So realistic avatars for VR gaming using your phone? Damn, the meta universe will be salivating over this paper.
@sneirox Жыл бұрын
What a time to be alive!
@sc0rpi0n02 жыл бұрын
This is a massive breakthrough in 3D photogrammetry for sure!
@emperorsascharoni95772 жыл бұрын
Wow this is so cool. What a time to be alive.
@Leryud512 жыл бұрын
Incredible, as always! Quick question though, I don’t understand what you mean by training, when examining the giga pixel picture generator ?
@camusalpha
2 жыл бұрын
The AI could read the huge gigapixel image and have a fully trained neural network (the mapping of 2D coordinates with RGB) of the whole image in less than 2 seconds.
@RamenPoweredShitFactory2 жыл бұрын
Amazing, this is so close to being real time.
@getbigwalt96042 жыл бұрын
This would be amazing for face scans into games!
@jenkem44642 жыл бұрын
Imagine movie making with this, or stage performances played after the fact in VR. It sounds like if you have enough cameras, like the original matrix setup, you'd be able to process possibly 1 second of film or movie in about 1-2 minutes...that's amazing! To have that kind of viewer angle independent data sounds like the dream of a VR holodeck style experience is closer than we think!
@MCSteve_2 жыл бұрын
Wait so this is like a ray marches solution or something to that caliber, absolutely brilliant, just wow.
@randomrfkov2 жыл бұрын
My mind and my papers are blown away.
@rickkay95482 жыл бұрын
phenomenal!