Inside a Neural Network - Computerphile
Just what is happening inside a Convolutional Neural Network? Dr Mike Pound shows us the images in between the input and the result.
How Blurs & Filters Work (Kernel Convolutions): • How Blurs & Filters Wo...
Cookie Stealing: • Cookie Stealing - Comp...
Rob Miles on Game Playing AI: • AI's Game Playing Chal...
Secure Web Browsing: • Secure Web Browsing - ...
Deep Learning: • Deep Learning - Comput...
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com
Comments: 312
Dr Pound is the best lecturer here. Very clear, intelligently funny, interesting topics. Would deserve his own channel
The pictures he printed of the layers helped me grasp the concept so much better than other videos, so thank you
@drhasnainsikandar
2 years ago
Me too
Loving these videos with Dr. Pound, keep it up!
This is the best explanation of what is going on inside a neural net! Now I can imagine it more clearly. Thanks a lot!
I've been studying neural networks for the last couple of months and haven't come across any resource that explains them with this perfection. You have made it so easy with the visualization. I'd really appreciate more videos on topics like RNNs and how to set the number of layers, filters, etc. (hyperparameters).
It'd be really interesting to take a network trained to detect random objects as seen by a camera, then give it the live feed from a camera and watch the activation of each neuron in realtime as the object moves about in the camera's view, or rotates around the object, etc. I guess the earlier layers would change a lot, while the deeper layers (which have a better idea of what constitutes an object) would change less.
@Vulcapyro
8 years ago
Projects like this have been done, but only in the sense that they usually just output the most probable class(es) because that's usually the only real way to deal with the amount of information. For modern networks you should be able to visualize activations of a single layer in real-time, but the number of pixels you'd need for a given layer can range from thousands to millions. So doable, but probably not easy to visually parse just by looking at it.
@smileyball
8 years ago
Consider looking into visualization approaches (like saliency/heat maps/deconvolutional neural network) and approaches that focus on maximal activation (like Google's DeepDream)
@MadJDMTurboBoost
6 years ago
SomethingUnreal I'd imagine if it was programmed properly and trained long enough, it may look similar to an fMRI.
Computerphile, you single-handedly helped me regain my interest in computer science. Thank you very much for all your videos (:
In the last video I asked what the images in these various convolutions looked like. I knew that they wouldn't look anything like the input image, but I was very curious to see the process anyway. And now you make a video answering exactly what I wanted! Thank you so much! :)
So useful. As a CS student, this was more helpful than a ton of other DLNN stuff I've seen online. Thank you!
What a fantastic explanation! I loved the digit convolution representation. Hope to see more videos about this (RNNs)!
I love this man.
Would love to have someone like him as my professor in my life!
This guy is my second favourite on computerphile. Lovin these demos
Massively interesting and well presented, even for my aging neural network!
Mike and Rob, the stars of computerphile. Great content and nice puns. Keep it up guys
The best tutorial ever! Cheers, Mike!
Oh wow, this video made me understand neural networks in an insanely deep way. Thank you!
Excellent video! Visually seeing the neurons light up blew me away... It was like looking at an artificial, scaled-down brain being imaged...
Wow... I didn't expect to understand any of that, but it was all explained perfectly. It made sense. Awesome video.
Fantastic video! Interesting to see "inside the mind" of a neural network
love this series about machine learning
Thank you Mike, and thank you Sean, this video is really helping me in my quest! I'm making a small game in which I'm trying to make an AI using the TensorFlow library.
Please do a video on the maths of forward and back propagation and how they are implemented
Love the visualization!
Thank you so much. This was very helpful.
Doesn't google use those captchas as a crowd sourced labeling technique for their own deep learning stuff?
@emosp0ngebob
8 years ago
There's a Computerphile video on that too somewhere, again a Google project. You get shown two words, and the computer knows one of them and not the other, so when you type the two words in, the computer learns what a word is. That's for transcribing libraries and things... I can't remember which Computerphile video it was though.
@zubirhusein
8 years ago
They do
@daniellewilson8527
4 years ago
I wonder if there's a website that someone can go to to do the image tasks and help train the deep learning systems?
Great Video!
Dr Mike Pound, please make a tutorial series on q learning! In depth
it was very enjoyable, thanks for the video.
Who needs dual monitors when you have dual PCs! Great video btw
This really clarified the previous video :)
Brilliant video.
Thank you very much! Very helpful video!
Great video!
Could you maybe link to the actual code? Would be interesting to look at the implementation
That was really interesting! Thanks!
Very enjoyable, thank you!
Enjoying the neural net videos. Looks like ANNs are coming back in after not really seeing much of them since the 90s. I remember my first exposure to the math and theory behind this was an assembly program on my 8-bit C64 back in the late 80s, creating a 3-layer back-propagation network.
At 5:58 I got the point I was searching for. Thank you very much.
How do you even visualize the output of the NN? Crazy, this perspective is so insightful.
Amazing, I didn't know you could visualize the high-rank features.
i just love this guy
please do more on this
How are the outputs of the multiple kernels at each layer managed? Are they somehow merged so that the kernels of the next layer all process the same input? Or do the 20 kernels of layer 2 operate on the 20 outputs of the layer 1 kernels respectively? And if the latter, then what happens when moving from a 20 kernel layer to a 50 kernel layer? Would some of the 20 kernels of the previous layer be duplicated twice, and others duplicated three times to make up the inputs to the 50 kernels in the new layer?
As always, a splendid video! However, every single clip taken from the angle where the pictures of the convolutions are visible is out of focus. Pity.
Now I see it, too... For some reason, YT gave me the video at a lower resolution (not watching it in full screen mode, I hadn't noticed), and I was thinking "I don't understand all the people complaining about the video being "wobbly", the video looks fine to me"... Then I saw I wasn't watching it at 50 fps, so I changed the quality. I, too, find it a bit weird. I guess stabilization doesn't quite kick in as hard when there are more frames to interpolate between?
Very cool! @5:24 Grayscale is quite a few bits deep; 1-bit depth would be black & white (which is not the case in your images, looks like you have at least 8-bit, i.e. standard 256-level grayscale, if not 16-bit images).
I'm taking a two credit course in deep learning next week!
With all the edge detection going on, would it be harder to recognize a 4 if some versions had the top parts join at an angle, like the 4 in this font, versus the open version as in the video? Likewise a 7 with or without the strike through it? I mean, does it remember some kind of average of all the objects in a class or all of them / all of the sufficiently different ones (which might be hard for a large database)?
13:16 What he wants to say is that if the images are segmented, then it's much easier; the segmentation problem is the hard part. That's why Google captchas are all mushed up on each other. Google apparently fixed the segmentation problem by just training it to recognize multiple pairs of letters.
Love the out of focus shots on the pictures...
It would have been way more interesting to see different examples of the same number and how they translate into the same output.
A thought I've had when thinking about this and the previous episode: would it be possible to "reverse" the order of the convolutional neural network, getting a sort of idealized result? Probably not extremely useful in most cases, but likely somewhat usable for seeing what extra data could be used to train it for more accurate results, or perhaps for some sort of data generation. Doing the same for a standard neural network would not result in any usable data, I know, but it seems like it might be possible with the convolutional one.
After watching this, the one thing I don't feel is completely explained is where the convolution kernel values come from. At first he says they are things "like Sobel edge detectors", but later says they are not manually entered, but rather learned values. That leaves the obvious question of how are they initialized? Do they start as just matrices with random entries? During the training, how are they adjusted? Is the "training" some kind of iterative search for kernel values that give the strongest response (e.g. the values that most consistently uniquely identify the one digit being learned and most strongly reject the other 8 digits?) I could use a bit more explanation on what the training process looks like and how it adjusts all the kernels.
I realize this isn't likely to get a reply this late, but I'm trying to replicate the configuration of this network. What activation function are you using for the first fully connected layer? Is it dotplus with a renormalization? I'm assuming FC2 is a softmax layer, so maybe they are both softmax.
Wrote Python convolution algos on bitmaps around this time just to self-learn Python; filters and convolutions are amazing to see in action. It's a little scary to see how far we are now in 2021. Covid hasn't stopped SW engineers.
very helpful, thanks
Very interesting! I wonder if this gives some insight into how neurones in our brains work on a very basic level?
@cmptrn825
8 years ago
Points for making me look at my screen with my head turned 90 degrees to the left until I realize I look like a crazy person
It's a really good lecture to understand what is going on inside a NN. I am using NNs for target classification in thermal images. Is a NN a good approach for that, or should I go for another option?
I never knew YT even supported 50FPS. :O Also, cool computer learning. Today is a day of new smarts.
So how do the neural networks do this? Are there speed advantages to this network vs just regular processing?
I am one of those strange people who draws a horizontal bar through the number 7. How would you deal with that? Would you need a separate set of 7+bar training digits (in effect an 11th character) and then map both 7 and 7+bar back to 7?
The person who interviews this guy doesn’t ask enough questions.
14:36 The Google captcha API morphs the filter the more samples you get, until the training data is useless. Also, those image-based captchas are being broken after all the success of ImageNet.
people interested in this experiment, you can actually do it in the Machine Learning course (Stanford) on Coursera
Can you look at your last but one fully connected layer and calculate the typical "distance" between different digits? E.g. just euclidean distance on the normalized terms in FC1. Would those distances depend on your neural network you're using or would they be similar across all successful neural networks? That is could you say something like a 1 and a 7 are typically closer than a 0 and a 4.
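A toy sketch of the measurement proposed in the comment above; the 500-unit FC1 vectors here are random stand-ins (purely hypothetical), just to show the normalization and distance computation one would apply to real activations:

```python
import numpy as np

# Hypothetical FC1 activations: one 500-dimensional vector per digit class.
rng = np.random.default_rng(1)
fc1 = {digit: rng.standard_normal(500) for digit in range(10)}

def distance(a, b):
    """Euclidean distance between unit-normalized FC1 vectors."""
    va = fc1[a] / np.linalg.norm(fc1[a])
    vb = fc1[b] / np.linalg.norm(fc1[b])
    return float(np.linalg.norm(va - vb))

# With real activations one could then test whether d(1, 7) < d(0, 4),
# and whether that ordering holds across independently trained networks.
print(distance(1, 7), distance(0, 4))
```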
>using windows as a host machine and linux as a guest
@ebolapie
8 years ago
Why anyone would use HyperV over KVM is beyond me.
@box8250
8 years ago
It might be university property
@michaelpound9891
8 years ago
Almost. Actually this was TeamViewer - accessing a machine on another campus that is significantly more powerful than the one I have in this office. Deep learning doesn't really work in a guest due to needing the graphics card, which is tricky to get. The machine to the left is also Linux, and is used for deep learning, but someone else was using it at the time of this video!
@iroxudont
8 years ago
Michael Pound Why not SSH
I imagined that each layer uses all its kernels on all the images of the previous layer. But that can't be right, hearing that the last convolutional layer here only outputs in a size of 50*4*4. Does that mean that there essentially are "kernel pipelines"? So kernel0 of layer1 will only be fed with the output of kernel0 of layer0?
Seeing that a lot of people are confused by this video being 50fps, I'd want to clear that up. 50fps is a standard frame rate for television and video in general. 60fps is a standard for animated and generated images, like animations, or games. Sure, you can do either with both, but it's generally so that high-frame-rate TV broadcasts are always 50, not 60 fps. The scale for TV: 25, 50, 100, 200 Hz. The scale for computers: 30, 60, 120 Hz (Hz = fps).
You look like a smart guy. Well done.
Can you do an episode on histograms of oriented gradients?
Really interesting! I would be interested to see if it is possible to start from the final convolution and see which image fits it best, as in 'what looks the most like a 2'.
@ruben307
8 years ago
It would be interesting to know if there can be totally different pictures that would just get the same number. Similar to hashing collisions.
@black_platypus
8 years ago
Well sure, that's basically the same concept (irreversible / one-way transformations giving you an abstract result)
@quakquak6141
8 years ago
I saw somewhere a neural network that was trained to fool convolutional neural networks; sometimes it produced normal images (in this case it would have produced a 2), other times it produced something that looked almost like pure noise but was still able to fool the networks.
Shouldn't one be able to generate characters(letters, whatever) by going the other way around? I'm thinking what if you tell it to generate a picture from a fully connected layer?
Can you share the Caffe scripts you used, please?
excellent..
amazing
at one point binary flashed on the screen as a background, here's a part of the translation: E¤ªÉü
Would I be wrong in thinking that if you gave a convolutional neural network the ability to control where to click and what to type, gave it enough convolutions and kernels (perhaps beyond what current computers can handle), and trained it enough, then it would be able to solve any captcha, even a new one with a different interface that still used the same basic principles?
The best explanation of CNNs, thanx
It would be nice, if you talked a bit about how much data is needed for a CNN to be any kind of useful. The datasets in this video seem extremely big. Specifically it would be nice to have an idea on how well it works on many "categories" with a low amount of data.
Excuse me if I have missed something obvious, but I'm not sure I understand what the input of, say, C2 is. Is it a sort of average of all of the images produced by C1?
If the first convolution layer has 20 filters and the second one has 20, does this mean that each C2 filter processes all 20 images from C1? That would make 400 images for the C2 output.
i am so fascinated by artificial neural networks .
@theepicguy6575
4 years ago
Not so much if you start doing the actual math and understand exactly why it extracts such specific features from an image.
How do you decide what the convolution kernels should be? Is that important, or could they be defined randomly at the beginning?
@rory4987
4 years ago
Neural network weights are set randomly and then learnt
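A sketch of that idea in numpy, assuming the 20 5x5 first-layer kernels described in the video; the scaling used here is the common "He" initialization, one of several schemes in use, not necessarily what this particular network did:

```python
import numpy as np

# Kernels start as small random numbers scaled by fan-in; training then
# nudges them via backpropagation toward useful filters (some of which
# end up resembling hand-designed edge detectors like Sobel).
rng = np.random.default_rng(42)
in_channels, n_filters, k = 1, 20, 5          # grayscale input, 20 filters of 5x5
fan_in = k * k * in_channels
kernels = rng.standard_normal((n_filters, in_channels, k, k)) * np.sqrt(2.0 / fan_in)

print(kernels.shape)  # (20, 1, 5, 5)
```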
Let's see if someone can help me out here. The first layer here outputs 20 24×24 images (or a 20-channel image) after performing all the convolutions. The second layer will output 20 20×20 images. But how are they constructed? How do they combine the 20 channels from the previous layer? I mean, they are not applying all 20 filters to each of the 20 channels, that'd be a 400-channel output. Do they simply add the convolutions for each channel up? So channel 1 of layer 2 is the sum of the convolutions of kernel 1 of layer 2 with each of the 20 channels of layer 1?
The first layer looks like something Andy Warhol would do.
MOOOOAAARRRR !!!!
How do you replicate the learned connections to other systems? How is the "knowledge" abstracted for transport, backup, and further improvement? With discrete programming, the instructions are compact and finite and are easily copied.
I wonder, can you work backwards somewhat to get a general idea of what the original image looked like from the convolution layers?
@RoySchl
8 years ago
I don't think so; that would be like asking "what 5 numbers did I multiply to get 3600?" There is only one possibility if you do the multiplication, but many possibilities when you try to guess backwards, and with those convolutions it's the same thing, just exponentially worse. Basically you drop a lot of information.
@DagarCoH
8 years ago
Well, I'd say yes, as if you have all the information every layer puts out, you just have to reverse the process the first layer did on the data it gave. Since you have many processes on the one image, there should be much redundancy and therefore a high certainty. If you however only have the output of the sixth convolution layer, I highly doubt that you could get much out of it.
@compuholic82
8 years ago
Partially. The problem is that (in general) a convolution is not a reversible operation. However, you can apply something that is known as a "matched filter", which is basically a convolution with the transposed filter kernel. If you go backwards through the network you can (to some degree) reconstruct the input signal. If you look at this paper you can see what the reconstructions look like: arxiv.org/pdf/1311.2901v3.pdf And just to prevent confusion: the author calls it "Deconvolution". But he isn't doing a "deconvolution" as he describes in his paper. He is applying a "matched filter".
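A 1-D toy sketch of the matched-filter idea from this reply (made-up signal and kernel, not the paper's method): convolving the output with the flipped kernel gives only a rough reconstruction, not an exact inverse, because the forward convolution discards information.

```python
import numpy as np

signal = np.array([0., 0., 1., 0., 0., 1., 1., 0.])  # toy input with impulses
kernel = np.array([1., 2., 1.])                      # smoothing kernel

forward = np.convolve(signal, kernel, mode="same")          # the "layer" output
backward = np.convolve(forward, kernel[::-1], mode="same")  # matched filter

# The reconstruction peaks near where the input had energy, but the exact
# impulse positions and heights are no longer recoverable.
print(backward)
```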
how do you know how many kernels, layers, etc. are best suited for your needs?
@m3n4lyf
5 years ago
That is an excellent question! Unfortunately it requires at least a moderate amount of knowledge in the subject matter to answer, so I doubt you'll be getting a satisfactory response from this resource any time soon.
So would it be possible to use convolutional neural networks for something general like arbitrary image matching or are they limited to narrowly trained applications like the one here?
@2Cerealbox
8 years ago
You can. I think google uses a neural net for their "visually similar images" feature.
@paulhendrix8599
8 years ago
nice picture.
Hey I have a question. After the first conv layer, we are left with 20 images of 24*24 pixels. Do these 20 images transform into one 24*24 sized image, to be given as an input to the next conv. layer?
@TheAbdelwahab83
2 years ago
No, after the first conv layer you have a volume (24*24*20), and it is the input to the next conv layer, whose kernels are of size (5*5*20). If you apply one such kernel to that input volume you get one image of size 20*20, and because you have 50 filters of (5*5*20), your output will be 20*20*50.
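A minimal numpy sketch of the shape arithmetic in this reply; the random values are stand-ins for the network in the video, not its actual weights:

```python
import numpy as np

# C1 output as described above: a 20-channel volume of 24x24 images.
rng = np.random.default_rng(0)
volume = rng.standard_normal((20, 24, 24))

# 50 second-layer filters, each 5x5x20, i.e. spanning all 20 input channels.
kernels = rng.standard_normal((50, 20, 5, 5))

# Valid convolution: 24 - 5 + 1 = 20, so the output volume is 50x20x20.
out = np.zeros((50, 20, 20))
for f in range(50):
    for i in range(20):
        for j in range(20):
            # Each output pixel sums a 20x5x5 patch against the whole filter,
            # which is how 20 input channels collapse into one output channel.
            out[f, i, j] = np.sum(volume[:, i:i + 5, j:j + 5] * kernels[f])

print(out.shape)  # (50, 20, 20)
```

This is why the channel counts don't multiply layer over layer: each filter consumes the whole input depth at once.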
Why are there 4x4x50 neurons after the last conv-layer? I get 4x4x(20^2)x(50^4) neurons, if every 5x5 kernel runs over every image from the previous layer. I'm confused.. maybe the kernels in the following layers are 3-dimensional? Like 20 5x5x20 kernels in the second layer?
@feliceserra106
8 years ago
Now I understand. Thanks a lot!
@feliceserra106
8 years ago
Got it, thank you!
@tenalexandr1991
8 years ago
I have the same puzzle. Could you enlighten me?
@feliceserra106
8 years ago
In short: "Each kernel is 5x5xD, where D is the number of features in the previous layer." I don't know why their answers are not showing up on YouTube. Maybe a Google+ thing.
Is there any chance you could upload a copy of the source code for the CNN some where? (or even pseudo code) I am sure many people would greatly appreciate it :D
The 50fps makes the wobble more noticeable.
+1 for running GNU/Linux :)
What if the training images have digits drawn at different scales?
I am curious, would it be possible to run this sort of neural network in reverse in order to produce the sort of "Deep Dream" images that you can see on the Internet? For instance, instead of asking the network 'what digit does this image resemble?', ask 'what does a 2 look like?'
@gpt-jcommentbot4759
A year ago
Yes, that's what Deep Dream is.
how were the kernels generated for this one?
I always do a *very* firm two. 06:55