How to implement PCA (Principal Component Analysis) from scratch with Python

In the 7th lesson of the Machine Learning from Scratch course, we will learn how to implement the PCA (Principal Component Analysis) algorithm.
You can find the code here: github.com/AssemblyAI-Example...
Previous lesson: • How to implement Naive...
Next lesson: • How to implement Perce...
Welcome to the Machine Learning from Scratch course by AssemblyAI.
Thanks to libraries like Scikit-learn we can use most ML algorithms with a couple of lines of code. But knowing how these algorithms work inside is very important. Implementing them hands-on is a great way to achieve this.
And mostly, they are easier than you’d think to implement.
In this course, we will learn how to implement these 10 algorithms.
We will quickly go through how the algorithms work and then implement them in Python with the help of NumPy.
#MachineLearning #DeepLearning

Comments: 23

  • @luis96xd
    @luis96xd 1 year ago

    Wow, amazing video of the course! I liked the theory part and how it is implemented with numpy 😄👍 It was all well explained, thanks! 😄👏💯😁

  • @martinemond1207
    @martinemond1207 5 months ago

    How would you go about reconstructing the original data from the X_projected based on PC1 and PC2, which kept only 2 dimensions from the original 4 dimensions?
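A minimal sketch of one common way to do this, not taken from the video's code: multiply the projected data by the kept components and add the mean back. The random data and the names `components` and `X_reconstructed` are assumptions made for illustration. Because two of the four dimensions were discarded, the result is only an approximation of the original data.

```python
import numpy as np

# Stand-in for a 4-feature dataset (an assumption; any (n_samples, 4) array works).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))

# Fit: center the data and take the top-2 eigenvectors of the covariance matrix.
mean = X.mean(axis=0)
Xc = X - mean
cov = np.cov(Xc.T)
# eigh is meant for symmetric matrices and returns (eigenvalues, eigenvectors).
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]          # largest variance first
components = eigenvectors[:, order[:2]].T      # shape (2, 4), one component per row

# Project down to 2 dimensions.
X_projected = Xc @ components.T                # shape (150, 2)

# Reconstruct: map back to 4 dimensions and undo the centering.
X_reconstructed = X_projected @ components + mean   # shape (150, 4)

# The information in the two discarded components is lost.
error = np.mean((X - X_reconstructed) ** 2)
print(X_reconstructed.shape, error)
```

Keeping all four components would make the reconstruction exact up to floating-point error, since the eigenvectors form an orthonormal basis.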

  • @michelebersani7294
    @michelebersani7294 2 months ago

    Good morning, this playlist is amazing and I had been searching for it for several weeks. I have a question about the interpretation of the eigenvectors. Why do the eigenvectors of the covariance matrix point in the direction of maximum variance?
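A brief sketch of the standard argument, with notation introduced here rather than taken from the video: let $\Sigma$ be the covariance matrix of the centered data $X$ and $w$ a unit-length direction. The variance of the data projected onto $w$ is $w^\top \Sigma w$, and maximizing it under the unit-length constraint leads to the eigenvalue equation.

```latex
\begin{aligned}
&\max_{w}\; w^\top \Sigma w \quad \text{subject to } w^\top w = 1 \\
&\mathcal{L}(w, \lambda) = w^\top \Sigma w - \lambda\,(w^\top w - 1) \\
&\nabla_w \mathcal{L} = 2\Sigma w - 2\lambda w = 0
  \;\Longrightarrow\; \Sigma w = \lambda w \\
&\operatorname{Var}(Xw) = w^\top \Sigma w = \lambda\, w^\top w = \lambda
\end{aligned}
```

Every stationary point is therefore an eigenvector of $\Sigma$, and the variance along it equals its eigenvalue, so the direction of maximum variance is the eigenvector with the largest eigenvalue.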

  • @kachunpang7543
    @kachunpang7543 1 year ago

    Hi, I am wondering about the output of 'np.linalg.eig(cov)' in line 20. According to the NumPy documentation, the first output is the eigenvalues and the second should be the set of eigenvectors stored inside a matrix. However, in line 20 you swap the names between eigenvector and eigenvalues but still get a pleasant plot after PCA. Could someone explain this part to me? Thanks.

  • @dylansavoia5755

    @dylansavoia5755

    1 year ago

    Great observation and I think you're right, in fact. I've run the code swapping the two variables - i.e. eigenvalues, eigenvectors = np.linalg.eig(cov) - and you get a different plot. This wouldn't make sense as you cannot multiply a matrix and a vector if the dimensions aren't appropriate, but for how NumPy works, I suspect there is an implicit "broadcasting" happening at np.dot in the transform() method (line 35) making the operation possible. TL;DR: NumPy doesn't raise an error, but the result you get is in fact wrong.
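The documented return order can be checked on a small example. This is a sketch on a made-up 2x2 matrix, not the video's code: `np.linalg.eig` returns the eigenvalues first (a 1-D array) and the eigenvectors second (a 2-D array whose columns are the eigenvectors).

```python
import numpy as np

# A small symmetric matrix standing in for a covariance matrix (an assumption).
cov = np.array([[2.0, 0.0],
                [0.0, 0.5]])

# Documented order: eigenvalues first, eigenvectors second.
eigenvalues, eigenvectors = np.linalg.eig(cov)

print(eigenvalues.shape)    # (2,)
print(eigenvectors.shape)   # (2, 2)

# Column i of `eigenvectors` pairs with eigenvalues[i]: cov @ v == lambda * v.
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    assert np.allclose(cov @ v, eigenvalues[i] * v)
```

Since the eigenvectors are stored as columns, code that wants one eigenvector per row has to transpose the second return value, not the first.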

  • @yusmanisleidissotolongo4433
    @yusmanisleidissotolongo4433 3 months ago

    Thanks so much for sharing.

  • @pranavgandhiprojects
    @pranavgandhiprojects 8 months ago

    Loved the video... thanks man

  • @ASdASd-kr1ft
    @ASdASd-kr1ft 1 year ago

    Nice video! But I have one doubt: why do you have more variance in principal component 2 than in principal component 1? Is it because of the scale?

  • @MinhNguyen-cl9pq
    @MinhNguyen-cl9pq 11 months ago

    Line 19 seems to have a bug, as return values should be swapped based on Numpy documentation

  • @ernestbonat2440
    @ernestbonat2440 1 year ago

    You should implement PCA with NumPy only. In fact, you need to use NumPy everywhere possible. NumPy is the fastest Python numerical library today. We should not teach based on some definition of student understanding. We should teach students only with real Python production code so that they can find a job. Everyone needs to pass the job interviews.

  • @iDenyTalent

    @iDenyTalent

    1 year ago

    stop talking grandpa

  • @leoai0

    @leoai0

    1 year ago

    @@iDenyTalent

  • @eugenmalatov5470
    @eugenmalatov5470 1 year ago

    Sorry, the theory part did not explain anything to me

  • @thejll
    @thejll 1 year ago

    Could you show how to do pca with gpu?

  • @gokul.sankar29

    @gokul.sankar29

    1 year ago

    You could try PyTorch: replace the NumPy arrays with PyTorch tensors and, similarly, replace the NumPy functions with their PyTorch counterparts. You will have to read up a bit on how to use a GPU with PyTorch.

  • @business_central
    @business_central 1 year ago

    all the ones explained by the girl are very clearly explained and walked through, this guy seems he just wants to be done and he is not really explaining much at all.

  • @igordemetriusalencar5861
    @igordemetriusalencar5861 1 year ago

    Excellent video and beautiful OOP Python programming, clean and easy to understand for a programmer, but OOP in data analysis is terribly ugly and not productive, with a lot of unnecessary abstraction in classes and methods. The functional paradigm is way, way better for data analysis due to its easy (initial) concepts of data flow and functions that transform the data. This way, anyone who has learned "general system theory" could understand it (managers, biologists, physicists, psychologists...). If you could do the same in a functional way, it would be amazing! (In Python, R, or Julia.)

  • @0MVR_0
    @0MVR_0 3 months ago

    > states 'from scratch'
    > proceeds to import numpy

  • @prithvimarwadi345

    @prithvimarwadi345

    3 months ago

    Well, NumPy is just a mathematical computation tool; you are using it to make your life simpler. "From scratch" means you are not using models already made by other people.

  • @0MVR_0

    @0MVR_0

    3 months ago

    @@prithvimarwadi345 proceeds to import numpy.cov and numpy.linalg.eig and calls the method 'from scratch'

  • @user-ns3ip9ub1c

    @user-ns3ip9ub1c

    1 month ago

    Are you asking to code from an assembly language standpoint?

  • @0MVR_0

    @0MVR_0

    1 month ago

    @@prithvimarwadi345 I would argue that "from scratch" means translating all relevant mathematical equations into plain Python algorithms. Principal Component Analysis can be shown through eigenvectors and linear algebra. Relying on imports is honestly lazy when exemplifying the process. I am going to refuse to acknowledge the comment on assembly language.

  • @chyldstudios
    @chyldstudios Жыл бұрын

    You should implement PCA without using Numpy, just vanilla python (no external libraries). It's more pedagogically rigorous and leads to a deeper understanding.
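For readers curious what the no-library version suggested in this thread might look like, here is a rough sketch in plain Python. It is not the course's implementation: it swaps `np.cov` for a hand-written covariance function and `np.linalg.eig` for power iteration, which finds only the first principal component. The function names and the toy data are made up for illustration.

```python
import math

def covariance_matrix(rows):
    """Sample covariance (divides by n - 1) of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def power_iteration(matrix, steps=500):
    """Approximate the largest eigenvalue and its unit eigenvector."""
    d = len(matrix)
    # Start vector is an arbitrary choice; it must not be orthogonal
    # to the top eigenvector, or the iteration cannot find it.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(steps):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for a unit vector v.
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * mv[i] for i in range(d))
    return eigenvalue, v

# Toy 2-feature dataset (made up for illustration).
data = [[2.0, 1.0], [3.0, 2.0], [4.0, 3.0], [5.0, 5.0], [6.0, 4.0]]
cov = covariance_matrix(data)            # [[2.5, 2.25], [2.25, 2.5]]
top_value, top_vector = power_iteration(cov)
print(top_value, top_vector)             # close to 4.75 and [0.7071, 0.7071]
```

Further components could be obtained by deflation, that is, subtracting the found component from the covariance matrix and repeating, though at that point the code is reimplementing what the linear algebra library already provides.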