Positional encoding was like black magic to me that just works. Now you've introduced integrated positional encoding, which is black magic on top of black magic. How do you guys understand what is happening here?
@michaelcopeman9577 7 months ago
Simply brilliant.
@user-sc5xk6vy8j 9 months ago
I don't understand why the distortion loss can be written like this. Doesn't a cubed term appear? Does anyone know?
@PunxTV123 9 months ago
how to use this?
@kristoferkrus 9 months ago
Sweet, looks like it works well and it's simple to use! Have you tried using a mixture of Gaussians as output and KL-divergence as loss function? It would be interesting to see how well that performs against your method. Granted, even a mixture of Gaussians will be sensitive to outliers unless you include some Gaussian with extremely large standard deviation; we just need some way to enable the network to be able to easily output values that can be interpreted as very large standard deviations.
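For anyone curious what this suggestion might look like concretely, below is a minimal sketch of a per-pixel mixture-of-Gaussians negative log-likelihood (with a single observed value per pixel, a KL-style objective against the predicted density reduces to the NLL up to a constant). This is an illustration of the commenter's proposal, not the method from the video, and all names are hypothetical.

```python
import numpy as np

def gmm_nll(y, weights, means, log_sigmas, eps=1e-12):
    """Negative log-likelihood of targets y under a per-sample mixture of
    Gaussians. y: (N,); weights, means, log_sigmas: (N, K) for K components,
    with each row of weights summing to 1. Uses log-sum-exp over components
    for numerical stability."""
    sigmas = np.exp(log_sigmas)
    # Log-density of each target under each Gaussian component: (N, K).
    log_comp = (-0.5 * np.log(2.0 * np.pi) - log_sigmas
                - 0.5 * ((y[:, None] - means) / sigmas) ** 2)
    a = np.log(weights + eps) + log_comp
    m = a.max(axis=1, keepdims=True)  # log-sum-exp trick
    log_mix = m[:, 0] + np.log(np.exp(a - m).sum(axis=1))
    return -log_mix.mean()
```

A component with a very large predicted sigma acts like the commenter's outlier escape hatch: it contributes an almost-flat density, so outliers stop dominating the loss.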
@patricksullivan3372 10 months ago
This is unreal. So impressive. Thank you for your work!
@MikeBarron 10 months ago
Crazy impressive! Approximately how many input images were needed to render the movies at the end?
@jon_barron 10 months ago
Hey thanks Mike! Those results are around a thousand images: someone walking through the space holding down the shutter button on a DSLR, waving it around. It's a lot of images, but it's surprisingly fast.
@mene2172 3 months ago
@@jon_barron So these are not frames grabbed from a video? Are they hi-res images?
@TheMazyProduction 10 months ago
We’re so back
@greg.skvortsov 10 months ago
That's a damn nice graphical explanation!
@iratemusic4575 a year ago
AI getting smarter, deep genius, and faster. Can AI or ChatGPT help autism get smarter?
@sinlife484 a year ago
Amazing! I'm curious if it works for indoor scene reconstruction; could you please tell me?
@saemranian a year ago
Awesome 😶🌫
@uttamg911 a year ago
Great work!! Are there any accepted submissions that didn’t make the cut for the highlight reel?
@TyroneHilpert a year ago
"PromoSM" ❗
@jeffreyalidochair a year ago
a practical question: how do people figure out the viewing angle and position for a scene that's been captured without that dome of cameras? the dome of cameras makes it easy to know the exact viewing angle and position, but what about just a dude with one camera walking around the scene taking photos of it from arbitrary positions? how do you get theta and phi in practice?
@addincui2617 a year ago
It's a great honor to share some work at CVPR as a non-scientist. Best wishes to CVPR.
@ratdreamsai6324 a year ago
I love all of the great AI creative works highlighted here, and I feel honored to be featured alongside these other creative minds! 🐀❤
@ratdreamsai6324 a year ago
I love both the ai & non-ai work! It's the blend of tools, craft, and creative vision that can make these pieces extra special.
@StormOrMelody a year ago
@@ratdreamsai6324 we're entering a new era of unbounded human creativity <3
@ratdreamsai6324 a year ago
@@StormOrMelody Yes - what a time to be alive! People of all capabilities will be able to bring their visions to life in new ways that were never before possible!
@abenedict85 a year ago
zoom. enhance. Zoom. Enhance. ZOOM! ENHANCE!!!!
@user-hi2bz8ds2h a year ago
Hi Jon, I saw the newest project from you and your colleague Ben Mildenhall. I want to say that it is very impressive and simultaneously useful for plenty of new ideas and projects. We are working on some interesting projects where we think Anti-Aliased Grid-Based Neural Radiance Fields would be the best option to increase effectiveness and productivity. Hence, could we have a direct conversation about the project? I am really looking forward to hearing from you.
@user-by6fr4dj4k a year ago
Which lab did you cooperate with at Harvard?
@magicnifties a year ago
The views of this vid will blow up very soon! 🙌 Great explanation on an advanced topic of AI. 🤓
@shubhashish7090 a year ago
Can we extract the mesh from this to be used in any traditional game engine?
@Instant_Nerf a year ago
Is video playback possible with NeRF?
@jon_barron a year ago
Yeah check out kzread.info/dash/bejne/rHaHqo-kaarIhpc.html
@mahmoodmasarwa3374 a year ago
Any way we can test it?
@jon_barron a year ago
Sure! github.com/google-research/multinerf
@frenchmarty7446 2 years ago
Insanely cool
@ncmasters 2 years ago
Is the code available yet?
@jon_barron 2 years ago
Yep, here you go: github.com/google-research/multinerf
@ncmasters 2 years ago
@@jon_barron Thanks
@changgongzhang6641 2 years ago
For the regularizer at 6:40, the minimum of Loss_dist is achieved by setting w(u) = 0 everywhere, right? Wondering how it can become a delta function?
@jon_barron 2 years ago
You are correct; the reason w(u) doesn't get set to zero everywhere is that doing so would cause the data term of the loss to be extremely high. In that animation I normalized w to sum to 1, which in practice is what happens during training because of the data term.
@changgongzhang6641 2 years ago
@@jon_barron Thanks a lot for your explanation!
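For reference, here's a small NumPy sketch of the distortion loss discussed in this thread, written directly from its two terms (a pairwise term over interval midpoints plus a per-interval self-term). It's an illustration with my own variable names, not the official implementation.

```python
import numpy as np

def distortion_loss(s, w):
    """Distortion loss for one ray.
    s: (N+1,) normalized endpoints of the N ray intervals.
    w: (N,) rendering weights of those intervals.
    The first term penalizes weight spread across intervals; the second
    penalizes weight spread within each interval, pushing w toward a delta."""
    mid = 0.5 * (s[:-1] + s[1:])                # interval midpoints, (N,)
    gaps = np.abs(mid[:, None] - mid[None, :])  # pairwise midpoint distances
    loss_inter = np.sum(w[:, None] * w[None, :] * gaps)
    loss_intra = np.sum(w ** 2 * (s[1:] - s[:-1])) / 3.0
    return loss_inter + loss_intra
```

As the reply above notes, this term alone is minimized by w = 0 everywhere, so it only shapes w into a spike when combined with a data term that keeps the weights from vanishing.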
@Askejm 2 years ago
Will it be publicly available? Is someone working on an implementation?
@jon_barron 2 years ago
Yes, we'll be releasing code soon.
@Askejm 2 years ago
@@jon_barron Does it run on Windows, and will it have a pretrained model?
@saltygamer8435 10 months ago
@@jon_barron Have you released the code?
@bharatsingh430 2 years ago
Looks pretty amazing! To import this into a graphics engine and be able to render new objects in this scene, we would also need the light sources, material properties (diffuse/specular, etc.), and normals (unless the depth maps are super accurate) - wonder if those could also be recovered by modifying this technique...
@timsousa3860 a year ago
Seems quite the challenge
@superatomic9761 2 years ago
this is amazing. does it work with images of people?
@willhart2188 2 years ago
I'd like to try this in VR.
@legoworks-cg5hk 2 years ago
What a time to be alive
@BAYqg a year ago
dear fellow scholars
@Neptutron 2 years ago
This is amazing! Given that the precision of the depth maps is greater than that of SVS's input, could this be used for more accurate photogrammetry?
@legoworks-cg5hk 2 years ago
Exactly what I was wondering
2 years ago
@@legoworks-cg5hk hopefully RealityCapture or Agisoft adopts this fast
@legoworks-cg5hk 2 years ago
@ Is there a way to use RealityCapture without Nvidia?
2 years ago
@@legoworks-cg5hk sadly no.... Agisoft can be used with any GPU, but is slow as hell without CUDA... one solution is cloud processing
@legoworks-cg5hk 2 years ago
@ The only problem with Agisoft is that you have to pay for it to export models.
@HaroldR 2 years ago
This is great!
@Halcyox 2 years ago
Quite excellent work being done here!
@Padoinky 2 years ago
Have to google this to even understand what it is about.
@GmZorZ 2 years ago
Quite an interesting way of representing depth; how is it done? It's about time black-and-white maps became a thing of the past!
@jon_barron 2 years ago
This is the "turbo" color map: ai.googleblog.com/2019/08/turbo-improved-rainbow-colormap-for.html
@GmZorZ 2 years ago
Thank you so much! Very informative; I'd have guessed more bit variation in an image presents much more detail. For me it's really all about what you can fit in standardized 32-bit formats! I will incorporate this in my own projects to further normalize the standard.
@marknadal9622 2 years ago
@@jon_barron Is color/density at a depth position determined by the training? In all the videos it seems to be implied that it's known as input, but that clearly isn't possible from a 2D image without prior object-size training. Sorry for the dumb Q.
@kwea123 2 years ago
@@jon_barron Thanks for the great article! I always used jet, and was wondering how turbo is different. Now turbo looks better to me!
@pretzelboi64 2 years ago
If you don't even know what color mapping is, I don't think you're qualified to talk about what should be things of the past lol
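On the visualization question earlier in this thread: matplotlib ships the same turbo colormap, so this style of depth rendering is easy to reproduce. A quick sketch (the depth array here is a synthetic stand-in):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in; in practice this would be a depth map from the model.
xx, yy = np.meshgrid(np.linspace(-1, 1, 256), np.linspace(-1, 1, 256))
depth = np.sqrt(xx ** 2 + yy ** 2)

# Normalize to [0, 1] and render with the turbo colormap.
d = (depth - depth.min()) / (depth.max() - depth.min())
plt.imshow(d, cmap="turbo")
plt.axis("off")
plt.savefig("depth_turbo.png", bbox_inches="tight")
```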
@ThetaPhiPsi 2 years ago
The depth map is amazing, wow!
@khaledbouzaiene3959 2 years ago
Can we extract 3D model maps?
@HA-cy4vx 2 years ago
wowwwwwww
@exhibitscotland360 2 years ago
Very Interesting.
@cailihpocisuM 2 years ago
Late to the party, but incredible work!
@cailihpocisuM 2 years ago
Amazing results, truly compelling!
@yasserothman4023 3 years ago
@1:42 what is x?
@Q_20 3 years ago
amazing
@luke2642 3 years ago
Does changing the activation function to SIREN help at all in larger NeRF networks? In my little Colab experiments it seems to train fast, but it might not scale. Also, I'm interested to see if a modern Hopfield network could represent the NeRF data well too.
@jon_barron 3 years ago
So far I haven't seen any results where SIREN improves NeRF's test-set performance. SIREN primarily targets quickly minimizing training loss (and does a great job at it!) but doesn't really focus on generalization, and performance in NeRF is largely determined by how well the model generalizes to new views.
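For readers who haven't seen SIREN, here's a minimal sketch of the layer being discussed: a sine nonlinearity applied to a scaled affine map, with the uniform weight initialization proposed in the SIREN paper. It's an illustration, not the code behind these experiments.

```python
import numpy as np

class SirenLayer:
    """One SIREN layer: y = sin(w0 * (W @ x + b)).
    The init bounds follow the SIREN paper so pre-activations stay
    well-distributed as layers are stacked."""
    def __init__(self, d_in, d_out, w0=30.0, is_first=False, seed=0):
        rng = np.random.default_rng(seed)
        bound = 1.0 / d_in if is_first else np.sqrt(6.0 / d_in) / w0
        self.W = rng.uniform(-bound, bound, size=(d_out, d_in))
        self.b = rng.uniform(-bound, bound, size=d_out)
        self.w0 = w0

    def __call__(self, x):
        return np.sin(self.w0 * (self.W @ x + self.b))
```

The high-frequency sine activations are what let SIREN fit training signals so quickly, and plausibly also why it tends to memorize rather than generalize to held-out views, as the reply above suggests.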
@JoshuaKolden 3 years ago
I don't want to be disparaging this or any of the other work on NeRF, but I don't understand where the innovation is. I get the very strong impression it is just rehashing work that has already been done in graphics, and long ago at that. What is the compelling improvement that neural networks bring over prior work, other than novelty? We've been able to generate new camera angles and volumetric reconstructions from still photography for literally decades.
@jon_barron 3 years ago
I think an important distinction is that NeRF lets you construct models from images.
@eelcohoogendoorn8044 3 years ago
'We've been able to generate new camera angles, and volumetric reconstructions from still photography for literally decades.' Yes, there is prior work in this area. Have you actually looked at any of the comparisons in the NeRF papers? Do you feel they make an unfair comparison? They seem like very compelling improvements over the state of the art to me.
@kwea123 3 years ago
I just found that the positional encoding has some similarity to a traditional 2nd-order differential equation used in physics (e.g. the harmonic oscillator, en.wikipedia.org/wiki/Harmonic_oscillator ).
Original positional encoding: PE(x) = (sin(x), cos(x), sin(2x), cos(2x), ..., sin(2^L x), cos(2^L x)).
Harmonic oscillator w/o damping: x'' + w^2 x = 0, whose solution is A sin(wt) + B cos(wt). So PE(x) is actually the solution evaluated at t = 1, 2, ..., 2^L with different initial conditions (that lead to A=1, B=0 or A=0, B=1).
IPE here: IPE(x, u, s) = (sin(u)e^(-s^2/2), cos(u)e^(-s^2/2), sin(2u)e^(-2s^2), cos(2u)e^(-2s^2), ...).
Harmonic oscillator w/ damping: x'' + 2kwx' + w^2 x = 0 (with k < 1), whose solution is A e^(-kwt) sin(w1 t) + B e^(-kwt) cos(w1 t), where w1 = w*sqrt(1-k^2). Again, IPE(x, u, s) corresponds to this solution evaluated at different t's with different initial conditions. Is this just a coincidence?
@jon_barron 3 years ago
Great insight! I can't tell if it's a coincidence or a meaningful connection at first glance, but I'll investigate further.
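To make the parallel concrete, here is a small NumPy sketch of both encodings as written in the comment above, where u and s are the mean and standard deviation of the coordinate being encoded (sin/cos are grouped rather than interleaved, which doesn't matter for the comparison):

```python
import numpy as np

def pe(x, L):
    """NeRF positional encoding at frequencies 2^0 ... 2^(L-1)."""
    freqs = 2.0 ** np.arange(L)
    return np.concatenate([np.sin(freqs * x), np.cos(freqs * x)])

def ipe(u, s, L):
    """mip-NeRF integrated positional encoding: the expected sin/cos of a
    Gaussian with mean u and std s, i.e. PE damped by exp(-(2^l * s)^2 / 2).
    The damping plays the same role as the e^(-kwt) envelope above."""
    freqs = 2.0 ** np.arange(L)
    damp = np.exp(-0.5 * (freqs * s) ** 2)
    return np.concatenate([damp * np.sin(freqs * u), damp * np.cos(freqs * u)])
```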
@sheetalborar6813 3 years ago
Would this loss work in classification tasks as well? It doesn't match the shape of the cross-entropy loss function.
Please apply this to Google Street View.