Easy Web Scraping in Python using Pandas for Data Science

Science & Technology

In this video, I show you how to easily scrape data from websites in Python using the pandas library. In particular, the read_html() function (called as pd.read_html()) is used to extract tables of National Basketball Association (NBA) player stats from www.basketball-reference.com/. We then do some cleanup to produce the final data in the form of a dataframe. Finally, we do a quick exploratory data analysis by making histogram plots.
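The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the inline HTML stands in for a downloaded stats page, and the column names (Player, Age, PTS) and the helper scrape_first_table are assumptions. It also assumes pandas and an HTML parser such as lxml are installed.

```python
import io

import pandas as pd

# Stand-in for a downloaded stats page. The header row is repeated inside the
# table body on purpose, which is why a cleanup step is needed.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Age</th><th>PTS</th></tr>
  <tr><td>Player A</td><td>25</td><td>20.1</td></tr>
  <tr><td>Player</td><td>Age</td><td>PTS</td></tr>
  <tr><td>Player B</td><td>31</td><td>11.4</td></tr>
</table>
"""


def scrape_first_table(html: str) -> pd.DataFrame:
    """Parse the first <table> in `html` and drop repeated header rows."""
    tables = pd.read_html(io.StringIO(html), header=0)  # a list of DataFrames
    raw = tables[0]
    cleaned = raw[raw["Age"] != "Age"].reset_index(drop=True)
    # The stray header rows force these columns to text; convert them back.
    for column in ("Age", "PTS"):
        cleaned[column] = pd.to_numeric(cleaned[column])
    return cleaned


df = scrape_first_table(SAMPLE_HTML)
print(df)
# With matplotlib installed, df["PTS"].plot.hist() would draw a histogram.
```

For a live page, pd.read_html() can also be given the URL directly instead of an in-memory string.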
🌟 Buy me a coffee: www.buymeacoffee.com/dataprof...
📎CODE: github.com/dataprofessor/code...
⭕ Playlist:
Check out our other videos in the following playlists.
✅ Data Science 101: bit.ly/dataprofessor-ds101
✅ Data Science YouTuber Podcast: bit.ly/datascience-youtuber-p...
✅ Data Science Virtual Internship: bit.ly/dataprofessor-internship
✅ Bioinformatics: bit.ly/dataprofessor-bioinform...
✅ Data Science Toolbox: bit.ly/dataprofessor-datascie...
✅ Streamlit (Web App in Python): bit.ly/dataprofessor-streamlit
✅ Shiny (Web App in R): bit.ly/dataprofessor-shiny
✅ Google Colab Tips and Tricks: bit.ly/dataprofessor-google-c...
✅ Pandas Tips and Tricks: bit.ly/dataprofessor-pandas
✅ Python Data Science Project: bit.ly/dataprofessor-python-ds
✅ R Data Science Project: bit.ly/dataprofessor-r-ds
⭕ Subscribe:
If you're new here, it would mean the world to me if you would consider subscribing to this channel.
✅ Subscribe: kzread.info...
⭕ Recommended Tools:
Kite is a FREE AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite and I love it!
✅ Check out Kite: www.kite.com/get-kite/?...
⭕ Recommended Books:
✅ Hands-On Machine Learning with Scikit-Learn : amzn.to/3hTKuTt
✅ Data Science from Scratch : amzn.to/3fO0JiZ
✅ Python Data Science Handbook : amzn.to/37Tvf8n
✅ R for Data Science : amzn.to/2YCPcgW
✅ Artificial Intelligence: The Insights You Need from Harvard Business Review: amzn.to/33jTdcv
✅ AI Superpowers: China, Silicon Valley, and the New World Order: amzn.to/3nghGrd
⭕ Stock photos, graphics and videos used on this channel:
✅ 1.envato.market/c/2346717/628...
⭕ Follow us:
✅ Medium: bit.ly/chanin-medium
✅ FaceBook: / dataprofessor
✅ Website: dataprofessor.org/ (Under construction)
✅ Twitter: / thedataprof
✅ Instagram: / data.professor
✅ LinkedIn: / chanin-nantasenamat
✅ GitHub 1: github.com/dataprofessor/
✅ GitHub 2: github.com/chaninlab/
⭕ Disclaimer:
Recommended books and tools are affiliate links that give me a portion of sales at no cost to you, which will contribute to the improvement of this channel's content.
#dataprofessor #pandas #scraping #web #pd #webscraping #readhtml #scrape #webscrape #datascrape #datascraping #scrapingdata #scrapedata #dataframe #dataframes #jupyternotebook #jupyter #googlecolab #colaboratory #notebook #machinelearning #datascienceproject #randomforest #decisiontree #svm #neuralnet #neuralnetwork #supportvectormachine #python #learnpython #pythonprogramming #datascience #datamining #bigdata #datascienceworkshop #dataminingworkshop #dataminingtutorial #datasciencetutorial #ai #artificialintelligence #tutorial #dataanalytics #dataanalysis #machinelearningmodel

Comments: 123

  • @KenJee_ds
    @KenJee_ds 4 years ago

    I didn't know about this pandas functionality! Great video!

  • @DataProfessor

    @DataProfessor

    4 years ago

    Wow, it's Ken Jee! Thanks for the comment and kind words! I also subscribe to your channel, great content by the way, especially the 6-part DS project from scratch series.

  • @KenJee_ds

    @KenJee_ds

    4 years ago

    @@DataProfessor Thanks! I am loving your stuff as well. I need to start using colab more. Keep up the good work, the tutorials are very helpful!

  • @karthiavenger4577

    @karthiavenger4577

    3 years ago

    You great bro Down to earth

  • @muhammadjamalahmed8664
    @muhammadjamalahmed8664 4 years ago

    Please don't stop making videos. These videos really help a lot.

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thank you, glad it was helpful!

  • @TcRiverrat18
    @TcRiverrat18 1 year ago

    Excellent work breaking this down. I have only used R, but this seemed incredibly intuitive. Thank you!

  • @HVjugo
    @HVjugo 3 years ago

    I used this before, but I didn't know that you can select the table using the brackets, awesome! Thanks for the video!

  • @DataProfessor

    @DataProfessor

    3 years ago

    Glad it's helpful, thanks for watching!

  • @monicadesai7928
    @monicadesai7928 3 years ago

    Great Explanation of each step....right from opening file to end....because sometimes as a newbie we find difficult to which file to use from github also.....Thank you ....Great Video!

  • @DataProfessor

    @DataProfessor

    3 years ago

    Wow thanks for the encouraging words, glad you’ve found the video helpful 😊

  • @da_ta
    @da_ta 4 years ago

    Great well explained clear and excellent quality of sound. Thanks for doing this keep it up!

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks for the encouragement 😃

  • @melshae8630
    @melshae8630 2 months ago

    Wow your video is the best , it took me forever to run this .This video helped me in 5 min. Thank you !!!

  • @nickolaisimmons4638
    @nickolaisimmons4638 2 years ago

    Wow this is a great video! Very well organised!

  • @givansot4581
    @givansot4581 2 years ago

    thanks a lot. I am doing a machine learning project and do web scraping in the same code...thanks this is better

  • @prashant381
    @prashant381 2 years ago

    A query, in row 12 , why are we using .index along with df.drop ? why wouldn't df.drop work without it ?
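A small sketch of why .index shows up in that step, using a made-up three-row frame (the names raw and header_rows are illustrative, not from the video). DataFrame.drop() removes rows by label, so it needs the index labels of the unwanted rows rather than the rows themselves.

```python
import pandas as pd

# Hypothetical miniature of the scraped table, including a repeated header row.
raw = pd.DataFrame(
    {"Player": ["Player A", "Player", "Player B"], "Age": ["25", "Age", "31"]}
)

header_rows = raw[raw["Age"] == "Age"]   # a DataFrame holding the unwanted rows
labels_to_drop = header_rows.index       # the row labels of those rows

df = raw.drop(labels_to_drop)            # drop() looks rows up by label
print(df)
```

Without .index, drop() would receive a whole DataFrame instead of row labels, which is not what it expects.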

  • @soufianelamsiah4337
    @soufianelamsiah4337 3 years ago

    what would be best for comparing prices between competitors?

  • @Moonlight-jx2sj
    @Moonlight-jx2sj 3 years ago

    Amazing! your video helped me with my 1st homework in Data Mining. And also thinking to jump into data science, so Thank you so much! Like and Subscription!

  • @DataProfessor

    @DataProfessor

    3 years ago

    Glad I could help! And welcome to Data science!

  • @legacylifey182
    @legacylifey182 3 years ago

    Thank you so much for this concept it was really helpful respect !

  • @randyluong6275
    @randyluong6275 4 years ago

    this tutorial gets my subscription. Thank you Professor. :)

  • @DataProfessor

    @DataProfessor

    4 years ago

    Wow, glad to hear that, welcome aboard 😃

  • @vyacheslavgorkunov3790
    @vyacheslavgorkunov3790 4 years ago

    Thx for the video, was really helpful. I wish u more subscribers, man ;)

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks for the support! 😃

  • @manishabheemanpelly3580
    @manishabheemanpelly3580 3 years ago

    Thank you so much for this concept it was really time saving one!

  • @DataProfessor

    @DataProfessor

    3 years ago

    Glad it was helpful!

  • @usmanafridi9668
    @usmanafridi9668 3 years ago

    Amazing! I am totally new to web scraping. I tried to scrape the website using beautiful soup library for 4 days now, but I can't get past the basics. You have extremely simplified it for me. For instance, I just scraped data from Wikipedia about the list of countries and their population and got the whole table in the first attempt. Thank you so much! I wonder if this can be used for other pages like LinkedIn, Glassdoor data collection? Because there are no tables there. Professor, thank you so much once again!

  • @DataProfessor

    @DataProfessor

    3 years ago

    Glad to hear that the video was helpful! For non-tabular pages you may have to use beautifulsoup and/or selenium

  • @rogerwprice
    @rogerwprice 4 years ago

    Fabulous - it's soooo easy when you know how!

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks for watching Roger, absolutely agreed with that 😃

  • @blankmedia01
    @blankmedia01 3 years ago

    Hey, I tried using the code to scrape tables on Wikipedia. When a page holds loads of other data and I just want to pull one table, is there a method for that? With the current code I'm pulling the whole page, and I just want the playoff stats. I think I'm supposed to create a dictionary and then assign it to a dataframe, but I don't know how when it comes to URLs and websites.
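One way to pull a single table out of a page with several is the match argument of pd.read_html(), which keeps only tables whose text matches a string or regular expression. The sketch below uses an invented two-table page; the team names and column headers are assumptions for illustration.

```python
import io

import pandas as pd

# Stand-in for a page that holds several tables; all content is invented.
PAGE = """
<p>Some unrelated text on the page.</p>
<table>
  <tr><th>Team</th><th>Regular-season wins</th></tr>
  <tr><td>Team A</td><td>50</td></tr>
</table>
<table>
  <tr><th>Team</th><th>Playoff wins</th></tr>
  <tr><td>Team A</td><td>12</td></tr>
</table>
"""

all_tables = pd.read_html(io.StringIO(PAGE))
print(len(all_tables))  # one DataFrame per <table> on the page

# match= keeps only the tables that contain the given text.
playoff_tables = pd.read_html(io.StringIO(PAGE), match="Playoff wins")
playoffs = playoff_tables[0]
print(playoffs)
```

The result is still a list, so the single matching table is taken with [0]. No dictionary is needed; each list element is already a dataframe.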

  • @priyalshah8869
    @priyalshah8869 2 years ago

    How do I keep the url that the coloum tm has in my dataframe?
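A possible approach, assuming pandas 1.5 or newer: the extract_links argument of pd.read_html() returns each cell as a (text, href) pair, so the link behind a column such as Tm can be kept in its own column. The HTML and the column name Tm_url below are invented for illustration.

```python
import io

import pandas as pd

# Invented stand-in for a stats table whose team column contains links.
PAGE = """
<table>
  <tr><th>Player</th><th>Tm</th></tr>
  <tr><td>Player A</td><td><a href="/teams/AAA/2019.html">AAA</a></td></tr>
  <tr><td>Player B</td><td><a href="/teams/BBB/2019.html">BBB</a></td></tr>
</table>
"""

# extract_links="body" turns every body cell into a (text, href) tuple;
# cells without a link carry None as the second element.
df = pd.read_html(io.StringIO(PAGE), extract_links="body")[0]

df["Tm_url"] = df["Tm"].map(lambda cell: cell[1])   # keep the link
for column in ("Player", "Tm"):
    df[column] = df[column].map(lambda cell: cell[0])  # keep the visible text
print(df)
```

The extracted links are whatever the page contains, so relative paths like the ones above still need the site's base address prepended.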

  • @engr.inigo.silva2000
    @engr.inigo.silva2000 1 year ago

    Bravo Data Professor, nice lecture!

  • @AmitKumar-hm4gx
    @AmitKumar-hm4gx 2 years ago

    Do you know if we can use this to scrape sites built with dynamic JS, and how do we do this if we have to login ?

  • @kalyanprasad4069
    @kalyanprasad4069 3 years ago

    How do we deal when we encounter the error "HTTP Error 403: Forbidden" while reading url with Pandas? How should we proceed in this case? Kindly advise.
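A 403 often means the site rejects the default Python user agent. One common workaround, sketched here with the standard library, is to download the page yourself with browser-like headers and pass the HTML text to pd.read_html(). The header value and the helper names build_request and read_tables are assumptions, and this will not help if the site blocks scraping in other ways, so check its terms of use first.

```python
import io
import urllib.request

import pandas as pd

# Hypothetical header value; any realistic browser string could be used.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; table-scraper-demo)"}


def build_request(url: str) -> urllib.request.Request:
    """Attach browser-like headers to a request for `url`."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def read_tables(url: str) -> list[pd.DataFrame]:
    """Download the page ourselves, then hand the HTML text to pandas."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return pd.read_html(io.StringIO(html))
```

Usage would look like tables = read_tables(url), after which tables[0] is the first table on the page.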

  • @shankaricharan510
    @shankaricharan510 4 months ago

    Thanks a lot - this helped a lot.

  • @cllim80
    @cllim80 3 years ago

    Thank you for the clear explanation !

  • @DataProfessor

    @DataProfessor

    3 years ago

    A pleasure! Thanks for watching 😃

  • @luciferkhusrao
    @luciferkhusrao 4 years ago

    Awesome work by the hero! Keep teaching like this

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks for the encouragement 😃

  • @RyanLoh
    @RyanLoh 2 years ago

    Can you also use df2019(df2019[‘Age’] == ‘Age’) to find the ages containing the word ‘Age’?
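Selecting rows with a condition uses square brackets rather than parentheses, since a DataFrame is not callable. A minimal sketch with an invented frame (only the name df2019 and the Age column come from the question):

```python
import pandas as pd

# Invented miniature of the scraped table with one repeated header row.
df2019 = pd.DataFrame(
    {"Player": ["Player A", "Player", "Player B"], "Age": ["25", "Age", "31"]}
)

is_header = df2019["Age"] == "Age"   # boolean Series, one value per row

header_rows = df2019[is_header]      # rows whose Age cell is the text "Age"
player_rows = df2019[~is_header]     # every other row

print(header_rows)
print(player_rows)
```

The ~ operator inverts the mask, which gives a second way to discard the repeated header rows.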

  • @ekoatm1914
    @ekoatm1914 3 years ago

    Thank you very much, brother....

  • @pauloreis8868
    @pauloreis8868 4 years ago

    Hi, Professor! Thank you for the content you bring to us, it really helps! \o/ Lately, I've been asking myself: How important is web scraping for a data scientist? How often do you web scrape? I just started learning it, I'll keep going and I wanted to know your thoughts about its relevance.

  • @DataProfessor

    @DataProfessor

    4 years ago

    Hi Paulo, webscraping comes in handy when you want to create your own dataset from available data on the internet. For example, you want to analyze the salary of data scientists from glassdoor database then you can do that with webscraping. Hope this helps 😃

  • @amoahs7779
    @amoahs7779 3 years ago

    Hi professor I truly enjoy your videos and have learnt a lot may God keep you successful in life. A question that's been on my mind is what laptop do you use as I really like the keyboard sound when you type unless you are using a external keyboard. Is it possible for you to show us a set-up of your desk ? Kind regards

  • @DataProfessor

    @DataProfessor

    3 years ago

    Hi, I'm using a MacBook Pro (2016) and yes the keyboard feel is good on this laptop although being a bit flat which is a good thing as it allows minimal effort in moving from one button to the next.

  • @vaasudhfp2874
    @vaasudhfp2874 3 years ago

    not working for other sites i did it for tripadvisor nothing came

  • @fazlaynur4509
    @fazlaynur4509 3 years ago

    Thanks bro, for your nice tutorials

  • @DataProfessor

    @DataProfessor

    3 years ago

    It's my pleasure

  • @spacebird9430
    @spacebird9430 3 years ago

    Hey professor, thank you for the content. But I was wondering: when we are scraping by just passing the link, how does it know to only read data from the table and not any other information?

  • @DataProfessor

    @DataProfessor

    3 years ago

    Hi, the function looks at the HTML markup. Tables in HTML are written with the <table> tag, and the read_html() function finds these tags to figure out which parts of the page are tables and extracts their data.
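To make the idea concrete, here is a deliberately simplified sketch of tag-based table detection using only the standard library. It is not how pandas is implemented internally (pandas delegates parsing to lxml or BeautifulSoup); the class name TableCollector and the sample page are invented.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect cell text from each <table>; text outside tables is ignored.

    Simplified on purpose: nested tables and colspan/rowspan are not handled.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[list[list[str]]] = []
        self._in_table = False
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])          # start a new table
            self._in_table = True
        elif tag == "tr" and self._in_table:
            self.tables[-1].append([])      # start a new row
        elif tag in ("td", "th") and self._in_table and self.tables[-1]:
            self.tables[-1][-1].append("")  # start a new cell
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "table":
            self._in_table = False
        elif tag in ("td", "th") and self._in_cell:
            self.tables[-1][-1][-1] = self.tables[-1][-1][-1].strip()
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:                   # only text inside a cell is kept
            self.tables[-1][-1][-1] += data


PAGE = (
    "<h1>Stats</h1><p>Intro text</p>"
    "<table><tr><th>Player</th><th>PTS</th></tr>"
    "<tr><td>Player A</td><td>20.1</td></tr></table>"
    "<p>Footer</p>"
)

collector = TableCollector()
collector.feed(PAGE)
collector.close()
print(collector.tables)
```

The heading, intro and footer text never reach the result because they sit outside any table tag.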

  • @wisjnujudho3152
    @wisjnujudho3152 2 years ago

    this is exciting. i love pandas

  • @tannyamishra9291
    @tannyamishra9291 2 years ago

    Can you please explain how to read all the retrieved urls

  • @danniliu2544
    @danniliu2544 2 years ago

    Hi Data Professor, thanks for this video. It's very helpful. I'm a newbie starting out in data science and web scraping. Just wondering can you use pandas functionality for scraping data that are not laid out in table? and how would you do that? could you perhaps create a video on scraping non tabular data if you haven't already?

  • @DataProfessor

    @DataProfessor

    2 years ago

    Great question, to web scrape non-tabular data you can look into using beautiful soup and also selenium libraries for Python
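As a rough idea of what that looks like with Beautiful Soup, the sketch below pulls fields out of repeated div blocks instead of a table. It assumes the beautifulsoup4 package is installed; the HTML, class names and field names are all invented, and real sites will need their own selectors.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Invented stand-in for a page that lists items in <div> cards, not a table.
PAGE = """
<div class="job"><h2>Data Analyst</h2><span class="city">City A</span></div>
<div class="job"><h2>Data Scientist</h2><span class="city">City B</span></div>
"""

soup = BeautifulSoup(PAGE, "html.parser")

# Build one dict per card, then turn the list of dicts into a dataframe.
jobs = [
    {
        "title": card.h2.get_text(strip=True),
        "city": card.find("span", class_="city").get_text(strip=True),
    }
    for card in soup.find_all("div", class_="job")
]

df = pd.DataFrame(jobs)
print(df)
```

Pages that build their content with JavaScript usually need a browser automation tool such as Selenium to produce the HTML first.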

  • @danniliu2544

    @danniliu2544

    2 years ago

    @@DataProfessor thank you for the pointer, much appreciated!

  • @Panucci75

    @Panucci75

    2 years ago

    Exactly the question I was gonna ask. Thanks.

  • @nourarifi2642
    @nourarifi2642 4 years ago

    thank you for your video my question if there are many tables in so many pages (20000 page) what should I do ???

  • @DataProfessor

    @DataProfessor

    4 years ago

    The pandas read_html function is suitable for a simple webpage with relatively few tables. For more complex pages or a large volume of pages, I would recommend looking into beautifulsoup and selenium.
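For a moderate number of similar pages, a plain loop around pd.read_html() can still work. The sketch below is one possible shape for it, assuming each page has its table of interest first; the function name scrape_many, the source_url column and the one-second delay are assumptions, and the reader is passed in as a parameter so the loop can be tried without network access.

```python
import time
from typing import Callable, Iterable

import pandas as pd


def scrape_many(
    urls: Iterable[str],
    read_tables: Callable[[str], list[pd.DataFrame]] = pd.read_html,
    delay_seconds: float = 1.0,
) -> pd.DataFrame:
    """Read the first table from each URL and stack the results."""
    frames = []
    for url in urls:
        try:
            table = read_tables(url)[0]
        except (ValueError, OSError) as error:  # no tables found, or network error
            print(f"skipping {url}: {error}")
            continue
        frames.append(table.assign(source_url=url))  # remember where rows came from
        time.sleep(delay_seconds)                    # pause between requests
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With tens of thousands of pages, the total run time and the load on the site become the real constraints, which is where dedicated scraping tools start to pay off.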

  • @piyushyadav7162
    @piyushyadav7162 1 year ago

    Hi Ken Jee! I tried your web scraping code on Kaggle but I'm getting "RLError: error". I tried to solve it but could not resolve it... please give me your suggestions.

  • @DataProfessor

    @DataProfessor

    1 year ago

    Hi Piyush, The pandas library allows scraping webpages that have tabular data such as from Wikipedia. It is really limited to those with a predefined table format. To scrape webpages I'd recommend looking into selenium and beautifulsoup

  • @XoreLP
    @XoreLP 3 years ago

    Why did you use string.format instead of String concatination

  • @sanjj_1
    @sanjj_1 2 years ago

    f strings are more readable compared to the .format() method
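For comparison, here are the three ways of building a per-year URL side by side. The exact path pattern is an assumption for illustration; only the site address comes from the video description.

```python
year = 2019
base = "https://www.basketball-reference.com/leagues/NBA_"  # assumed path pattern

# 1. Concatenation: the number must be converted to text by hand.
by_concat = base + str(year) + "_per_game.html"

# 2. str.format(): the placeholder {} is filled in by format().
template = "https://www.basketball-reference.com/leagues/NBA_{}_per_game.html"
by_format = template.format(year)

# 3. f-string: the variable is written directly inside the braces.
by_fstring = f"{base}{year}_per_game.html"

print(by_concat)
```

All three produce the same string, so the choice is mostly about readability. A format() template is handy when the same pattern is reused for many years.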

  • @kwanpakshing
    @kwanpakshing 3 years ago

    The video is great. But the screen text is way too small to read. I suggest that you enlarge the font or reduce the white space on the screen to make the video more readable.

  • @DataProfessor

    @DataProfessor

    3 years ago

    Thanks for the suggestion, greatly appreciate it, yes in recent videos I have increased the font size.

  • @markslima1557
    @markslima1557 1 year ago

    very cool thanks!

  • @badraboufirasse433
    @badraboufirasse433 4 years ago

    Very helpful thank you!

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks Badr for the kind words!

  • @sangpark7656
    @sangpark7656 3 years ago

    Hi Professor does the original data need to be a html file to start with? Does the original data always need to have a table to extract data?

  • @DataProfessor

    @DataProfessor

    3 years ago

    Yes to both questions, that’s the limitation of this approach. Other than that selenium + beautifulsoup is a good combo to look into.

  • @sangpark7656

    @sangpark7656

    3 years ago

    I see. Thank you very much for the guidance!!@@DataProfessor

  • @narongtumsri-ubol1737
    @narongtumsri-ubol1737 3 years ago

    thank for knowledge

  • @DataProfessor

    @DataProfessor

    3 years ago

    A pleasure, thanks for watching

  • @harshitsharma8131
    @harshitsharma8131 2 years ago

    what if there is no table on a web page ??

  • @lucianodomingues2290
    @lucianodomingues2290 3 years ago

    Great video Professor!

  • @DataProfessor

    @DataProfessor

    3 years ago

    Glad you liked it!

  • @salikmalik7631
    @salikmalik7631 4 years ago

    Really awesome.. Data Professor

  • @DataProfessor

    @DataProfessor

    4 years ago

    Salik, Thanks!

  • @kennykern6292
    @kennykern6292 4 years ago

    This helped thanks!

  • @DataProfessor

    @DataProfessor

    4 years ago

    Glad it helped!

  • @sameermehdi3143
    @sameermehdi3143 2 years ago

    Thankyou so much sir

  • @raphaellutz2693
    @raphaellutz2693 2 years ago

    Very nice video

  • @DataProfessor

    @DataProfessor

    2 years ago

    Thanks :)

  • @Troglodyte2021
    @Troglodyte2021 3 years ago

    A great tutorial!

  • @DataProfessor

    @DataProfessor

    3 years ago

    Thank you!

  • @lyhuutai3339
    @lyhuutai3339 3 years ago

    how to save df to excel ? please
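A dataframe can be written out with its to_excel() or to_csv() methods. The helper below is a small sketch that picks one based on the file extension; the name save_table and the file names are assumptions, and writing .xlsx files requires an Excel engine such as openpyxl to be installed.

```python
from pathlib import Path

import pandas as pd


def save_table(df: pd.DataFrame, path: str) -> Path:
    """Write `df` to an Excel file for .xlsx paths, otherwise to CSV."""
    target = Path(path)
    if target.suffix == ".xlsx":
        df.to_excel(target, index=False)  # needs an engine such as openpyxl
    else:
        df.to_csv(target, index=False)
    return target
```

Calling save_table(df, "players.xlsx") would produce a spreadsheet, while save_table(df, "players.csv") gives a plain-text file that Excel can also open. index=False leaves out the row numbers.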

  • @nowdevoted1649
    @nowdevoted1649 3 years ago

    Superb, let me bring you some more guys to your channel

  • @DataProfessor

    @DataProfessor

    3 years ago

    Awesome, welcome to the channel!

  • @nowdevoted1649

    @nowdevoted1649

    3 years ago

    @@DataProfessor 🙏

  • @aniwahidaabdulrahim2538
    @aniwahidaabdulrahim2538 3 years ago

    Hello Professor, I would like to suggest that you publish a video about RSelenium, which is used with Selenium WebDriver for automated system testing :D I hope it may benefit others. This is just my humble suggestion.

  • @DataProfessor

    @DataProfessor

    3 years ago

    Great suggestion! I have played around with Selenium for Python and have found it pretty powerful. What I made so far was a short script that can take screenshots of my youtube channel's page (or any webpage).

  • @argiepoul7457
    @argiepoul7457 2 years ago

    What are the prerequisites to watch this tutorial? I know some python, is this ok?

  • @DataProfessor

    @DataProfessor

    2 years ago

    Yes, beginner’s level of Python is sufficient to follow along.

  • @mj7146
    @mj7146 4 years ago

    Great content ! Any idea on how I can scrape data for example from linkedin Jobs Postings. I found Octoparse for this, any ideas?

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks Mert for the kind comment. pandas works only for tabular data from webpages. For linkedin posts, we'll probably have to use beautiful soup for that. I might make a future video about that, will put it into the to-do list.

  • @mj7146

    @mj7146

    4 years ago

    Data Professor thank you 🙏

  • @DataProfessor

    @DataProfessor

    4 years ago

    @@mj7146 A pleasure!

  • @oguguaonyinyechi4980

    @oguguaonyinyechi4980

    4 years ago

    @@DataProfessor Hi Data Professor, we are still expecting this 😁

  • @moatasimashraf6818
    @moatasimashraf6818 3 years ago

    (ImportError: lxml not found, please install it) I got this error. what is the solution?

  • @DataProfessor

    @DataProfessor

    3 years ago

    Hi, you can install lxml via pip install lxml

  • @moatasimashraf6818

    @moatasimashraf6818

    3 years ago

    @@DataProfessor Done it, thank U

  • @saulo_foot
    @saulo_foot 3 years ago

    Every link turns into a df. How can I concatenate all the dfs?

  • @DataProfessor

    @DataProfessor

    3 years ago

    Hi, dfs can be concatenated using the pd.concat() function. You can play around with axis=0 or axis=1 depending on how you want to combine the dfs (axis=0 stacks them on top of each other, axis=1 places them side by side).
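A short sketch of both directions with two invented one-row frames (the names df2018 and df2019 and their values are made up for illustration):

```python
import pandas as pd

# Two invented per-season tables with the same columns.
df2018 = pd.DataFrame({"Player": ["Player A"], "PTS": [18.2]})
df2019 = pd.DataFrame({"Player": ["Player A"], "PTS": [20.1]})

# axis=0 stacks the rows; ignore_index renumbers them 0, 1, ...
stacked = pd.concat([df2018, df2019], axis=0, ignore_index=True)

# axis=1 places the frames side by side; keys= labels each block of columns.
side_by_side = pd.concat([df2018, df2019], axis=1, keys=["2018", "2019"])

print(stacked)
print(side_by_side)
```

For tables scraped from many links, collecting the frames in a list and calling pd.concat() once at the end is usually the simplest route.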

  • @shwetaredkar734
    @shwetaredkar734 4 years ago

    Informative.

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks Shweta for the kind comment!

  • @mootaz3944
    @mootaz3944 2 years ago

    i try it on ur channel ( just for testing lol )

  • @Papiii_benz
    @Papiii_benz 3 years ago

    Thanks !

  • @DataProfessor

    @DataProfessor

    3 years ago

    Thanks for watching!

  • @jojushaji3010
    @jojushaji3010 3 years ago

    Ure awesome sr

  • @DataProfessor

    @DataProfessor

    3 years ago

    Thanks for the kind words

  • @qi8983
    @qi8983 2 years ago

    Awesome

  • @tareqmahmud3902
    @tareqmahmud3902 2 years ago

    You look like jomatech's big brother :O

  • @DataProfessor

    @DataProfessor

    2 years ago

    Haha, I get that a lot. Joma and I should do a collab video 😆

  • @tareqmahmud3902

    @tareqmahmud3902

    2 years ago

    @@DataProfessor But Sir I learned a week's lesson from one of your 10 minute video. I can't be more grateful to you. Thank you.

  • @DataProfessor

    @DataProfessor

    2 years ago

    @@tareqmahmud3902Thanks, glad to hear that they’re helpful! 😊

  • @alexwatson6370
    @alexwatson6370 3 years ago

    Don't name your variables str or you will shadow the string builtin

  • @DataProfessor

    @DataProfessor

    3 years ago

    You're right, many thanks for pointing that out, why did I do that. I've changed it to url_link now.
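To illustrate the problem being pointed out, here is a small sketch. The shadowing is confined to a function so that the rest of the script keeps the builtin; the function name shadowing_demo is made up, while url_link is the replacement name mentioned above.

```python
url_link = "https://www.basketball-reference.com/"  # descriptive name, builtin untouched
print(str(2019))                                     # the builtin str() still works


def shadowing_demo() -> str:
    """Show what happens when a variable named `str` hides the builtin."""
    str = "https://www.basketball-reference.com/"    # shadows str in this scope
    try:
        return str(2019)                             # tries to call a string object
    except TypeError as error:
        return f"TypeError: {error}"


print(shadowing_demo())
```

In a notebook the effect is worse, because a top-level str = ... stays in place for every later cell until the kernel is restarted or the name is deleted with del str.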

  • @ishpandey7886
    @ishpandey7886 3 years ago

    Is this useful for every situation? I am trying to fetch data from glassdoor but this method is not working Link: "www.glassdoor.co.in/Job/bengaluru-data-analyst-jobs-SRCH_IL.0,9_IC2940587_KO10,22.htm"

  • @lolsucks3599
    @lolsucks3599 2 years ago

    Is there an api for sports results? or you have to do it via web scraping?
