Inside a Neural Network - Computerphile
Just what is happening inside a Convolutional Neural Network? Dr Mike Pound shows us the images in between the input and the result.
How Blurs & Filters Work (Kernel Convolutions): • How Blurs & Filters Wo...
Cookie Stealing: • Cookie Stealing - Comp...
Rob Miles on Game Playing AI: • AI's Game Playing Chal...
Secure Web Browsing: • Secure Web Browsing - ...
Deep Learning: • Deep Learning - Comput...
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com
Comments: 312
Dr Pound is the best lecturer here. Very clear, intelligently funny, interesting topics. Would deserve his own channel
The pictures he printed of the layers helped me grasp the concept so much better than other videos, so thank you
@drhasnainsikandar
2 years ago
Me too
Loving these videos with Dr. Pound, keep it up!
This is the best explanation of what is going on inside a neural net! Now I can imagine it more clearly. Thanks a lot!
I've been studying neural networks for the last couple of months and haven't come across any resource that explains them with this perfection. You have made it so easy with the visualization. I'd really appreciate more videos on topics like RNNs and how to set the number of layers, filters, etc. (hyperparameters).
It'd be really interesting to take a network trained to detect random objects as seen by a camera, then give it the live feed from a camera and watch the activation of each neuron in realtime as the object moves about in the camera's view, or rotates around the object, etc. I guess the earlier layers would change a lot, while the deeper layers (which have a better idea of what constitutes an object) would change less.
@Vulcapyro
8 years ago
Projects like this have been done, but only in the sense that they usually just output the most probable class(es) because that's usually the only real way to deal with the amount of information. For modern networks you should be able to visualize activations of a single layer in real-time, but the number of pixels you'd need for a given layer can range from thousands to millions. So doable, but probably not easy to visually parse just by looking at it.
@smileyball
8 years ago
Consider looking into visualization approaches (like saliency/heat maps/deconvolutional neural network) and approaches that focus on maximal activation (like Google's DeepDream)
@MadJDMTurboBoost
6 years ago
SomethingUnreal I'd imagine if it was programmed properly and trained long enough, it may look similar to an fMRI.
Computerphile, you single-handedly helped me regain my interest in computer science. Thank you very much for all your videos (:
In the last video I asked what the images in these various convolutions looked like. I knew that they wouldn't look anything like the input image, but I was very curious to see the process anyway. And now you make a video answering exactly what I wanted! Thank you so much! :)
So useful. As a CS student, this was more helpful than a ton of other DLNN stuff I've seen online. Thank you!
What a fantastic explanation! I loved the digit convolution representation. Hope to see more videos about this (RNNs)!
I love this man.
Would love to have someone like him as my professor in my life!
This guy is my second favourite on computerphile. Lovin these demos
Massively interesting and well presented, even for my aging neural network!
Mike and Rob, the stars of computerphile. Great content and nice puns. Keep it up guys
The best tutorial ever! Cheers, Mike!
Oh wow, this video made me understand neural networks in an insanely deep way. Thank you!
Excellent video! Visually seeing the neurons light up blew me away... It was like looking at an artificial, scaled-down brain being imaged...
Wow... I didn't expect to understand any of that, but it was all explained perfectly. It made sense. Awesome video.
Fantastic video! Interesting to see "inside the mind" of a neural network
love this series about machine learning
Thank you Mike, and thank you Sean, this video is really helping me in my quest! I'm making a small game in which I'm trying to make an AI using the TensorFlow library.
Please do a video on the maths of forward and back propagation and how they are implemented
Love the visualization!
Thank you so much. This was very helpful.
Doesn't google use those captchas as a crowd sourced labeling technique for their own deep learning stuff?
@emosp0ngebob
8 years ago
There's a Computerphile video on that too somewhere, again a Google project. You get shown two words, and the computer knows one of them and not the other, so when you type the two words in, the computer learns what a word is. That's for transcribing libraries and things... I can't remember which Computerphile video it was though.
@zubirhusein
8 years ago
They do
@daniellewilson8527
4 years ago
I wonder if there's a website that someone can go to to do the image tasks and help train the deep learning systems?
Great Video!
Dr Mike Pound, please make a tutorial series on q learning! In depth
it was very enjoyable, thanks for the video.
Who needs dual monitors when you have dual PCs! Great video btw
This really clarified the previous video :)
Brilliant video.
Thank you very much! Very helpful video!
Great video!
Could you maybe link to the actual code? Would be interesting to look at the implementation
That was really interesting! Thanks!
Very enjoyable, thank you!
Enjoying the neural net videos. Looks like ANNs are coming back in after not really seeing much of them since the 90s. I remember my first exposure to the math and theory behind this was an assembly program on my 8-bit C64 back in the late 80s, creating a 3-layer back-propagation network.
At 5:58 I got the point I was searching for. Thank you very much.
How do you even visualize the output of the NN? Crazy, this perspective is so insightful.
Amazing, I didn't know you could visualize the high-rank features.
i just love this guy
please do more on this
How are the outputs of the multiple kernels at each layer managed? Are they somehow merged so that the kernels of the next layer all process the same input? Or do the 20 kernels of layer 2 operate on the 20 outputs of the layer 1 kernels respectively? And if the latter, then what happens when moving from a 20 kernel layer to a 50 kernel layer? Would some of the 20 kernels of the previous layer be duplicated twice, and others duplicated three times to make up the inputs to the 50 kernels in the new layer?
As always, a splendid video! However, every single clip taken from the angle where the pictures of the convolutions are visible is out of focus. Pity.
Now I see it, too... For some reason, YT gave me the video at a lower resolution (not watching it in full screen mode, I hadn't noticed), and I was thinking "I don't understand all the people complaining about the video being "wobbly", the video looks fine to me"... Then I saw I wasn't watching it at 50 fps, so I changed the quality. I, too, find it a bit weird. I guess stabilization doesn't quite kick in as hard when there are more frames to interpolate between?
Very cool! @5:24 Grayscale is quite a few bits deep; 1-bit depth would be black & white (which is not the case in your images, looks like you have at least 8-bit, i.e. standard 256-level grayscale, if not 16-bit images).
I'm taking a two credit course in deep learning next week!
With all the edge detection going on, would it be harder to recognize a 4 if some versions had the top parts join at an angle, like the 4 in this font, versus the open version as in the video? Likewise a 7 with or without the strike through it? I mean, does it remember some kind of average of all the objects in a class or all of them / all of the sufficiently different ones (which might be hard for a large database)?
13:16 What he wants to say is that if the images are segmented, then it's much easier; the segmentation problem is the hard part. That's why Google captchas are all mushed up on each other. Google apparently fixed the segmentation problem by just training it to recognize multiple pairs of letters.
Love the out of focus shots on the pictures...
It would have been way more interesting to see different examples of the same number and how they translate into the same output.
A thought I've had when thinking about this and the previous episode: would it be possible to "reverse" the order of the convolutional neural network, getting a sort of idealized result? Probably not extremely useful in most cases, but likely somewhat usable for seeing what extra data could be used to train it for more accurate results, or perhaps for some sort of data generation. Doing the same for a standard neural network would not result in any usable data, I know, but it seems like it might be possible with the convolutional one.
After watching this, the one thing I don't feel is completely explained is where the convolution kernel values come from. At first he says they are things "like Sobel edge detectors", but later says they are not manually entered, but rather learned values. That leaves the obvious question of how are they initialized? Do they start as just matrices with random entries? During the training, how are they adjusted? Is the "training" some kind of iterative search for kernel values that give the strongest response (e.g. the values that most consistently uniquely identify the one digit being learned and most strongly reject the other 8 digits?) I could use a bit more explanation on what the training process looks like and how it adjusts all the kernels.
I realize this isn't likely to get a reply this late, but I'm trying to replicate the configuration of this network. What activation function are you using for the first fully connected layer? Is it dotplus with a renormalization? I'm assuming FC2 is a softmax layer, so maybe they are both softmax.
Wrote Python convolution algos on bitmaps around this time just to self-learn Python; filters and convolutions are amazing to see in action. It's a little scary to see how far we are now in 2021. Covid hasn't stopped SW engineers.
very helpful, thanks
Very interesting! I wonder if this gives some insight into how neurones in our brains work on a very basic level?
@cmptrn825
8 years ago
Points for making me look at my screen with my head turned 90 degrees to the left until I realize I look like a crazy person
It's a really good lecture to understand what is going on inside a NN. I am using NNs for target classification in thermal images. Is a NN a good approach for that, or should I go for another option?
I never knew YT even supported 50FPS. :O Also, cool computer learning. Today is a day of new smarts.
So how do the neural networks do this? Are there speed advantages to this network vs just regular processing?
I am one of those strange people who draws a horizontal bar through the number 7. How would you deal with that? Would you need a separate set of 7+bar training digits (in effect an 11th character) and then map both 7 and 7+bar back to 7?
The person who interviews this guy doesn’t ask enough questions.
14:36 The Google captcha API morphs the filter the more samples you get, until the training data is useless. Also, those image-based captchas are being broken after all the success of ImageNet.
people interested in this experiment, you can actually do it in the Machine Learning course (Stanford) on Coursera
Can you look at your last but one fully connected layer and calculate the typical "distance" between different digits? E.g. just euclidean distance on the normalized terms in FC1. Would those distances depend on your neural network you're using or would they be similar across all successful neural networks? That is could you say something like a 1 and a 7 are typically closer than a 0 and a 4.
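A toy sketch of the measurement proposed in the comment above; the 500-unit FC1 vectors here are random stand-ins (purely hypothetical), just to show the normalization and distance computation one would apply to real activations:

```python
import numpy as np

# Hypothetical FC1 activations: one 500-dimensional vector per digit class.
rng = np.random.default_rng(1)
fc1 = {digit: rng.standard_normal(500) for digit in range(10)}

def distance(a, b):
    """Euclidean distance between unit-normalized FC1 vectors."""
    va = fc1[a] / np.linalg.norm(fc1[a])
    vb = fc1[b] / np.linalg.norm(fc1[b])
    return float(np.linalg.norm(va - vb))

# With real activations one could then test whether d(1, 7) < d(0, 4),
# and whether that ordering holds across independently trained networks.
print(distance(1, 7), distance(0, 4))
```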
>using windows as a host machine and linux as a guest
@ebolapie
8 years ago
Why anyone would use HyperV over KVM is beyond me.
@box8250
8 years ago
It might be university property
@michaelpound9891
8 years ago
Almost. Actually this was TeamViewer - accessing a machine on another campus that is significantly more powerful than the one I have in this office. Deep learning doesn't really work in a guest due to needing the graphics card, which is tricky to get. The machine to the left is also Linux, and is used for deep learning, but someone else was using it at the time of this video!
@iroxudont
8 years ago
Michael Pound Why not SSH
I imagined that each layer uses all its kernels on all the images of the previous layer. But that can't be right, hearing that the last convolutional layer here only outputs in a size of 50*4*4. Does that mean that there essentially are "kernel pipelines"? So kernel0 of layer1 will only be fed with the output of kernel0 of layer0?
Seeing that a lot of people are confused by this video being 50fps, I'd want to clear that up. 50fps is a standard frame rate for television and video in general. 60fps is a standard for animated and generated images, like animations, or games. Sure, you can do either with both, but it's generally so that high-frame-rate TV broadcasts are always 50, not 60 fps. The scale for TV: 25, 50, 100, 200 Hz. The scale for computers: 30, 60, 120 Hz (Hz = fps).
You look like a smart guy. Well done.
Can you do an episode on histograms of oriented gradients?
Really interesting! I would be interested to see if it is possible to start from the final convolution and see which image fits it best, as in 'what looks the most like a 2'.
@ruben307
8 years ago
It would be interesting to know if there can be totally different pictures that would just get the same number. Similar to hashing collisions.
@black_platypus
8 years ago
Well sure, that's basically the same concept (irreversible / one-way transformations giving you an abstract result)
@quakquak6141
8 years ago
I saw somewhere a neural network that was trained to fool convolutional neural networks; sometimes it produced normal images (in this case it would have produced a 2), other times it produced something that looked almost like pure noise but was still able to fool the networks.
Shouldn't one be able to generate characters(letters, whatever) by going the other way around? I'm thinking what if you tell it to generate a picture from a fully connected layer?
Can you share the Caffe scripts you used, please?
excellent..
amazing
at one point binary flashed on the screen as a background, here's a part of the translation: E¤ªÉü
Would I be wrong in thinking that if you gave a convolutional neural network the ability to control where to click and what to type, gave it enough convolutions and kernels (perhaps beyond what current computers can handle), and trained it enough, then it would be able to solve any captcha, even a new one with a different interface that still used the same basic principles?
The best explanation of CNNs, thanx
It would be nice, if you talked a bit about how much data is needed for a CNN to be any kind of useful. The datasets in this video seem extremely big. Specifically it would be nice to have an idea on how well it works on many "categories" with a low amount of data.
Excuse me if I have missed something obvious, but I'm not sure I understand what the input of, say, C2 is. Is it a sort of average of all of the images produced by C1?
If the first convolution layer has 20 filters and the second one has 20, does this mean that each C2 filter processes all 20 images from C1? That would make 400 images for the C2 output.
i am so fascinated by artificial neural networks .
@theepicguy6575
4 years ago
Not so much if you start doing the actual math and understand exactly why it extracts such specific features from an image.
How do you decide what the convolution kernels should be? Is that important, or could they be defined randomly at the beginning?
@rory4987
4 years ago
Neural network weights are set randomly and then learnt
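A sketch of that idea in numpy, assuming the 20 5x5 first-layer kernels described in the video; the scaling used here is the common "He" initialization, one of several schemes in use, not necessarily what this particular network did:

```python
import numpy as np

# Kernels start as small random numbers scaled by fan-in; training then
# nudges them via backpropagation toward useful filters (some of which
# end up resembling hand-designed edge detectors like Sobel).
rng = np.random.default_rng(42)
in_channels, n_filters, k = 1, 20, 5          # grayscale input, 20 filters of 5x5
fan_in = k * k * in_channels
kernels = rng.standard_normal((n_filters, in_channels, k, k)) * np.sqrt(2.0 / fan_in)

print(kernels.shape)  # (20, 1, 5, 5)
```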
Let's see if someone can help me out here. The first layer here outputs 20 24×24 images (or a 20-channel image) after performing all the convolutions. The second layer will output 20 20×20 images. But how are they constructed? How do they combine the 20 channels from the previous layer? I mean, they are not applying all 20 filters to each of the 20 channels, that'd be a 400-channel output. Do they simply add the convolutions for each channel up? So channel 1 of layer 2 is the sum of the convolutions of kernel 1 of layer 2 with each of the 20 channels of layer 1?
The first layer looks like something Andy Warhol would do.
MOOOOAAARRRR !!!!
How do you replicate the learned connections to other systems? How is the "knowledge" abstracted for transport, backup, and further improvement? With discrete programming, the instructions are compact and finite and are easily copied.
I wonder, can you work backwards somewhat to get a general idea of what the original image looked like from the convolution layers?
@RoySchl
8 years ago
I don't think so; that would be like asking "what 5 numbers did I multiply to get 3600?" There is only one possibility if you do the multiplication, but many possibilities when you try to guess backwards, and with those convolutions it's the same thing, just exponentially worse. Basically you drop a lot of information.
@DagarCoH
8 years ago
Well, I'd say yes, as if you have all the information every layer puts out, you just have to reverse the process the first layer did on the data it gave. Since you have many processes on the one image, there should be much redundancy and therefore a high certainty. If you however only have the output of the sixth convolution layer, I highly doubt that you could get much out of it.
@compuholic82
8 years ago
Partially. The problem is that (in general) a convolution is not a reversible operation. However, you can apply something that is known as a "matched filter", which is basically a convolution with the transposed filter kernel. If you go backwards through the network you can (to some degree) reconstruct the input signal. If you look at this paper you can see what the reconstructions look like: arxiv.org/pdf/1311.2901v3.pdf And just to prevent confusion: the author calls it "Deconvolution". But he isn't doing a "deconvolution" as he describes in his paper. He is applying a "matched filter".
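A 1-D toy sketch of the matched-filter idea from this reply (made-up signal and kernel, not the paper's method): convolving the output with the flipped kernel gives only a rough reconstruction, not an exact inverse, because the forward convolution discards information.

```python
import numpy as np

signal = np.array([0., 0., 1., 0., 0., 1., 1., 0.])  # toy input with impulses
kernel = np.array([1., 2., 1.])                      # smoothing kernel

forward = np.convolve(signal, kernel, mode="same")          # the "layer" output
backward = np.convolve(forward, kernel[::-1], mode="same")  # matched filter

# The reconstruction peaks near where the input had energy, but the exact
# impulse positions and heights are no longer recoverable.
print(backward)
```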
how do you know how many kernels, layers, etc. are best suited for your needs?
@m3n4lyf
5 years ago
That is an excellent question! Unfortunately it requires at least a moderate amount of knowledge in the subject matter to answer, so I doubt you'll be getting a satisfactory response from this resource any time soon.
So would it be possible to use convolutional neural networks for something general like arbitrary image matching or are they limited to narrowly trained applications like the one here?
@2Cerealbox
8 years ago
You can. I think google uses a neural net for their "visually similar images" feature.
@paulhendrix8599
8 years ago
nice picture.
Hey I have a question. After the first conv layer, we are left with 20 images of 24*24 pixels. Do these 20 images transform into one 24*24 sized image, to be given as an input to the next conv. layer?
@TheAbdelwahab83
2 years ago
No, after the first conv layer you have a volume (24*24*20), and it is the input to the next conv layer, whose kernels are of size (5*5*20). If you apply one such kernel to that input volume you get one image of size 20*20, and because you have 50 filters of (5*5*20), your output will be 20*20*50.
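A minimal numpy sketch of the shape arithmetic in this reply; the random values are stand-ins for the network in the video, not its actual weights:

```python
import numpy as np

# C1 output as described above: a 20-channel volume of 24x24 images.
rng = np.random.default_rng(0)
volume = rng.standard_normal((20, 24, 24))

# 50 second-layer filters, each 5x5x20, i.e. spanning all 20 input channels.
kernels = rng.standard_normal((50, 20, 5, 5))

# Valid convolution: 24 - 5 + 1 = 20, so the output volume is 50x20x20.
out = np.zeros((50, 20, 20))
for f in range(50):
    for i in range(20):
        for j in range(20):
            # Each output pixel sums a 20x5x5 patch against the whole filter,
            # which is how 20 input channels collapse into one output channel.
            out[f, i, j] = np.sum(volume[:, i:i + 5, j:j + 5] * kernels[f])

print(out.shape)  # (50, 20, 20)
```

This is why the channel counts don't multiply layer over layer: each filter consumes the whole input depth at once.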
Why are there 4x4x50 neurons after the last conv-layer? I get 4x4x(20^2)x(50^4) neurons, if every 5x5 kernel runs over every image from the previous layer. I'm confused.. maybe the kernels in the following layers are 3-dimensional? Like 20 5x5x20 kernels in the second layer?
@feliceserra106
8 years ago
Now I understand. Thanks a lot!
@feliceserra106
8 years ago
Got it, thank you!
@tenalexandr1991
8 years ago
I have the same puzzle. Could you enlighten me?
@feliceserra106
8 years ago
In short: "Each kernel is 5x5xD, where D is the number of features in the previous layer." I don't know why their answers are not showing up on YouTube. Maybe a Google+ thing.
Is there any chance you could upload a copy of the source code for the CNN some where? (or even pseudo code) I am sure many people would greatly appreciate it :D
The 50fps makes the wobble more noticeable.
+1 for running GNU/Linux :)
What if the training images have digits drawn at different scales?
I am curious, would it be possible to run this sort of neural network in reverse in order to produce the sort of "Deep Dream" images that you can see on the Internet? For instance, instead of asking the network 'what digit does this image resemble?', ask 'what does a 2 look like?'
@gpt-jcommentbot4759
A year ago
Yes, that's what Deep Dream is.
how were the kernels generated for this one?
I always do a *very* firm two. 06:55