Data Analysis 8: Classifying Data

Data Analysis 8: Classifying Data - Computerphile

For your eyes only! Classifying data isn't a spy trick. Dr Mike Pound creates a decision tree automatically from a data set. This is part 8 of the Data Analysis Learning Playlist: • Data Analysis with Dr ...
This Learning Playlist was designed by Dr Mercedes Torres-Torres & Dr Michael Pound of the University of Nottingham Computer Science Department. Find out more about Computer Science at Nottingham here: bit.ly/2IqwtNg
This series was made possible by sponsorship from Google.
The Credit approval dataset can be found here: archive.ics.uci.edu/ml/datase...
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments: 38

@Computerphile 5 years ago
Check out the full Data Analysis Learning Playlist: kzread.info/head/PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba
@zerokelvin3626 5 years ago
Great video! This training, validation and testing is relevant for modeling and simulation in general, and you would be surprised how many scientists and practitioners get this wrong.
@randomnessgameful 5 years ago
Love this series!
@jurietheron 4 years ago
What a fantastic series! Will definitely rewatch it. I would love a video about image classification and validating results, confusion matrices, etc.
@potatoMaster-wr3jz 7 months ago
You explained so many machine learning concepts easily within the 15 minutes of this video. But this video ain't as popular as your cryptography and cybersecurity stuff, which shows what the general audience likes.
@andresg3110 1 year ago
You are absolutely handsome and brilliant! I'm so happy to learn from you, such a smart and kind soul. Thank you for sharing your talent with the world.
@DerDieDasBoB 5 years ago
Love the videos! He is really a good teacher, thanks for all the good explanations. But when I see the paper he draws on, it reminds me of '80s printer paper... Is it still in use, or what is it for?
@WhompingWalrus
5 years ago
Idk if it's true or not, but I've heard that some universities bought a quinjabillion metric clucktonnes of that paper way back when it was expected to be used massively for a long time, so they hand it out gladly to whoever has a use for it now.
@jasonspence
5 years ago
That's exactly what it is, and it's the standard Computerphile paper in all of the videos
@veeek8
2 years ago
Yeah nice touch isn't it, makes me feel like it's the 80s again 😂
@heyandy889 4 years ago
that's pretty wild that you can automatically create a reasonable decision tree to classify arbitrary data towards an arbitrary target attribute. likewise one could imagine targeting the decision tree towards gender, or income; it sounds like the algorithm doesn't care, it just uses clustering techniques to best group the data to predict the target attribute.
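To illustrate the point that the algorithm does not care which attribute is the target: decision-tree learners such as ID3 and C4.5 typically pick each split with an impurity measure like information gain rather than with clustering. The sketch below assumes that criterion (the thread does not say which one the video's tool uses), and the rows and attribute names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows, target):
    """Attribute (other than the target) with the highest information gain."""
    candidates = [name for name in rows[0] if name != target]
    return max(candidates, key=lambda name: information_gain(rows, name, target))


# Hypothetical toy data, invented for illustration only.
rows = [
    {"employed": "yes", "owns_home": "yes", "approved": "yes"},
    {"employed": "yes", "owns_home": "no", "approved": "yes"},
    {"employed": "no", "owns_home": "yes", "approved": "no"},
    {"employed": "no", "owns_home": "no", "approved": "no"},
]

root_attribute = best_split(rows, "approved")
print(root_attribute)  # employed
```

Passing a different column name as `target` reuses the same code unchanged; a full tree would repeat `best_split` recursively on each subset.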
@Fractus 4 years ago
The use of 'precision' here sounds more like 'accuracy' in a truly scientific sense, that being how well it reflects a 'true' or correct outcome. In this vein 'precision' would be more like the ability of the system to repeatedly classify similar data, or the same sets, to the same outcome.
@jlopezg8
4 years ago
In classification, the definitions for precision and accuracy differ from those commonly used in science. Precision is defined as the proportion of instances correctly classified as positive (true positives) among all the instances classified as positive (true positives + false positives). Accuracy, on the other hand, is defined as the proportion of instances classified correctly (true positives + true negatives) among all instances. So, for example, imagine 100 people take a medical test. 20 are diagnosed with a disease, and among those, 15 do have the disease. Furthermore, of the 80 people not diagnosed with the disease, 5 do have the disease, so 75 people are correctly classified as not having the disease. As a result, the precision of the test is 15/20 = 75%, while the accuracy of the test is (15+75)/100 = 90%.
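The arithmetic in the medical-test example can be checked with a few lines of Python. This is only a sketch of the two definitions given in the reply; the function names are made up for illustration, and the counts are the ones from the example (15 true positives, 5 false positives, 5 false negatives, 75 true negatives).

```python
def precision(tp: int, fp: int) -> float:
    """Correct positive predictions divided by all positive predictions."""
    return tp / (tp + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts taken from the 100-person example above.
tp, fp, fn, tn = 15, 5, 5, 75

test_precision = precision(tp, fp)
test_accuracy = accuracy(tp, tn, fp, fn)

print(f"precision = {test_precision:.0%}")  # precision = 75%
print(f"accuracy  = {test_accuracy:.0%}")   # accuracy  = 90%
```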
@jorgefontenlagonzalez8412 2 years ago
I loved the series, but I got a bit lost with this video. How does the content of video #8 relate to what was explained up to now? Does video #8 continue where video #7 left off, or does it take its output as an input in some way?
@4.0.4 4 years ago
I really want a video just on Support Vector Machines! (Example: why would a traditional neural network outperform it?)
@onuktav 5 years ago
Computer says no 😁
@ramixnudles7958 5 years ago
How is "validation" different from "testing"?
@MusicBent
5 years ago
Ramix Nudles, here is how I imagine it. The training data was used for training your model (obviously), so running the model on training data will always show 100% accuracy. The testing data is used by the model developer to analyze the performance. The developer can look into the results, see any obvious mistakes, and try to correct for them. The validation data would remain invisible to the developer and would represent ‘new’ data points that the model would see in the real world after it has been developed and deployed. It should also perform well on this with zero developer interaction or knowledge of the data.
@MusicBent
5 years ago
Also, nice profile pic 👌🏻
@ramixnudles7958
5 years ago
@@MusicBent :-D
@jlopezg8
4 years ago
@@MusicBent Pretty much, but you mixed up test and validation data. Validation data is used to evaluate the model after training, or even while it's training on the training data, and see if it needs tweaking to improve its performance. But to make sure we ourselves don't overfit the model to the validation data, we evaluate the model on data unseen by the model (test data) to give a final unbiased assessment of its performance.
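The three roles described in this reply can be sketched as a simple shuffle-and-slice split. This is a minimal illustration, not the procedure from the video: the function name, the 60/20/20 proportions, and the fixed seed are all assumptions chosen for the example.

```python
import random


def three_way_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows and cut them into train, validation and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is repeatable
    n_test = int(len(rows) * test_fraction)
    n_val = int(len(rows) * val_fraction)
    test = rows[:n_test]                      # held back for the final, unbiased check
    validation = rows[n_test:n_test + n_val]  # used while tuning the model
    train = rows[n_test + n_val:]             # used to fit the model
    return train, validation, test


# Stand-in data: 100 row indices instead of real records.
train, validation, test = three_way_split(range(100))
print(len(train), len(validation), len(test))  # 60 20 20
```

The test list is touched only once, after all tuning against the validation list is finished, which is what keeps the final score unbiased.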
@synchro-dentally1965 3 years ago
I'm not sure what the majority of medical doctors would have to say, but I do hear apprehension on the use of AI to aid in diagnosing patients. Which is interesting, because wouldn't it just be another useful tool at their disposal, such as a stethoscope?
@leantide7880 5 years ago
So if the data set contains such attributes as gender, race, religion, languages spoken, etc., the machine learning could make modeling decisions on loan approvals for instance heavily based on such factors. Interesting.
@SiddharthPrabhu1983
5 years ago
Yes. That's precisely why ethics in AI is such a growing concern. Many organizations are working to ensure that these kinds of biases do not inadvertently (or intentionally) make their way into ML-driven decision engines.
@snippletrap
4 years ago
Only if those attributes are positively correlated with, say, debt default.
@abhishektyagi4428 5 years ago
Sir, could you please make a video explaining the resources you use to learn or enhance your programming skills?
@heyandy889
4 years ago
have a look at reddit.com/r/learnprogramming
@abhishektyagi4428
4 years ago
@@heyandy889 thanks a lot
@Acampandoconfrikis 3 years ago
I'm passing this exam thanks to you lol
@grainfrizz 5 years ago
Neural network is gonna beat KNN, Tree, and SVM. But, no, I don't watch Siraj Raval anymore.
@hammad8707 5 years ago
lol ok
@KilgoreTroutAsf 5 years ago
So data classifiers are a new way of building uncompromising bureaucratic rules that escape peer-review and public oversight and not even their creators understand. Got it.
@4.0.4
4 years ago
And that can be demonstrably (statistically) fairer (more likely to predict if you'll pay back your debt or not) than any human who decides based on emotion.
@KilgoreTroutAsf
4 years ago
@@4.0.4 What a wonderfully naive response.
@clarkkentglasses6443
4 years ago
@@4.0.4 who says the training data isn't biased?
@quillaja
3 years ago
I love this comment.