Easy Web Scraping in Python using Pandas for Data Science
Science & Technology
In this video, I show you how to easily web scrape data from websites in Python using the pandas library. In particular, the read_html() function (in practice, pd.read_html()) is used to extract table data of National Basketball Association (NBA) player stats from www.basketball-reference.com/. We then do some cleanup to produce the final data in the form of a dataframe. Finally, we do a quick exploratory data analysis by making histogram plots.
🌟 Buy me a coffee: www.buymeacoffee.com/dataprof...
📎CODE: github.com/dataprofessor/code...
⭕ Playlist:
Check out our other videos in the following playlists.
✅ Data Science 101: bit.ly/dataprofessor-ds101
✅ Data Science YouTuber Podcast: bit.ly/datascience-youtuber-p...
✅ Data Science Virtual Internship: bit.ly/dataprofessor-internship
✅ Bioinformatics: bit.ly/dataprofessor-bioinform...
✅ Data Science Toolbox: bit.ly/dataprofessor-datascie...
✅ Streamlit (Web App in Python): bit.ly/dataprofessor-streamlit
✅ Shiny (Web App in R): bit.ly/dataprofessor-shiny
✅ Google Colab Tips and Tricks: bit.ly/dataprofessor-google-c...
✅ Pandas Tips and Tricks: bit.ly/dataprofessor-pandas
✅ Python Data Science Project: bit.ly/dataprofessor-python-ds
✅ R Data Science Project: bit.ly/dataprofessor-r-ds
⭕ Subscribe:
If you're new here, it would mean the world to me if you would consider subscribing to this channel.
✅ Subscribe: kzread.info...
⭕ Recommended Tools:
Kite is a FREE AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite and I love it!
✅ Check out Kite: www.kite.com/get-kite/?...
⭕ Recommended Books:
✅ Hands-On Machine Learning with Scikit-Learn : amzn.to/3hTKuTt
✅ Data Science from Scratch : amzn.to/3fO0JiZ
✅ Python Data Science Handbook : amzn.to/37Tvf8n
✅ R for Data Science : amzn.to/2YCPcgW
✅ Artificial Intelligence: The Insights You Need from Harvard Business Review: amzn.to/33jTdcv
✅ AI Superpowers: China, Silicon Valley, and the New World Order: amzn.to/3nghGrd
⭕ Stock photos, graphics and videos used on this channel:
✅ 1.envato.market/c/2346717/628...
⭕ Follow us:
✅ Medium: bit.ly/chanin-medium
✅ FaceBook: / dataprofessor
✅ Website: dataprofessor.org/ (Under construction)
✅ Twitter: / thedataprof
✅ Instagram: / data.professor
✅ LinkedIn: / chanin-nantasenamat
✅ GitHub 1: github.com/dataprofessor/
✅ GitHub 2: github.com/chaninlab/
⭕ Disclaimer:
Recommended books and tools are affiliate links that give me a portion of sales at no cost to you, which contributes to improving this channel's content.
#dataprofessor #pandas #scraping #web #pd #webscraping #readhtml #scrape #webscrape #datascrape #datascraping #scrapingdata #scrapedata #dataframe #dataframes #jupyternotebook #jupyter #googlecolab #colaboratory #notebook #machinelearning #datascienceproject #randomforest #decisiontree #svm #neuralnet #neuralnetwork #supportvectormachine #python #learnpython #pythonprogramming #datascience #datamining #bigdata #datascienceworkshop #dataminingworkshop #dataminingtutorial #datasciencetutorial #ai #artificialintelligence #tutorial #dataanalytics #dataanalysis #machinelearningmodel
Comments: 123
I didn't know about this pandas functionality! Great video!
@DataProfessor
4 years ago
Wow, it's Ken Jee! Thanks for the comment and kind words! I also subscribe to your channel, great content by the way, especially the 6-part DS project from scratch series.
@KenJee_ds
4 years ago
@@DataProfessor Thanks! I am loving your stuff as well. I need to start using colab more. Keep up the good work, the tutorials are very helpful!
@karthiavenger4577
3 years ago
You're great, bro. Down to earth.
Please don't stop making videos. These videos really help a lot.
@DataProfessor
4 years ago
Thank you, glad it was helpful!
Excellent work breaking this down. I have only used R, but this seemed incredibly intuitive. Thank you!
I used this before, but I didn't know that you can select the table using the brackets, awesome! Thanks for the video!
@DataProfessor
3 years ago
Glad it's helpful, thanks for watching!
Great explanation of each step, right from opening the file to the end, because as newbies we sometimes find it difficult to know which file to use from GitHub. Thank you. Great video!
@DataProfessor
3 years ago
Wow thanks for the encouraging words, glad you’ve found the video helpful 😊
Great well explained clear and excellent quality of sound. Thanks for doing this keep it up!
@DataProfessor
4 years ago
Thanks for the encouragement 😃
Wow your video is the best , it took me forever to run this .This video helped me in 5 min. Thank you !!!
Wow this is a great video! Very well organised!
thanks a lot. I am doing a machine learning project and do web scraping in the same code...thanks this is better
A query, in row 12 , why are we using .index along with df.drop ? why wouldn't df.drop work without it ?
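A minimal sketch of why `.index` appears alongside `df.drop`, assuming the cleanup step removes header rows that repeat inside the scraped table. The toy data and the name `repeated_headers` are made up for illustration and are not taken from the video's notebook. Filtering gives back the matching rows as a DataFrame, while `drop()` wants the row labels, so the index of the filtered rows is what gets passed in.

```python
import pandas as pd

# Toy stand-in for the scraped table: the header row repeats inside the data.
df = pd.DataFrame(
    {
        "Player": ["A. Alpha", "Player", "B. Beta"],
        "Age": ["24", "Age", "31"],
    }
)

# Boolean filtering returns the matching rows as a DataFrame...
repeated_headers = df[df["Age"] == "Age"]

# ...but drop() expects row labels, so we hand it the index of those rows.
cleaned = df.drop(repeated_headers.index)

print(cleaned)
```

Note the square brackets in `df[df["Age"] == "Age"]`: a DataFrame is indexed with brackets, not called with parentheses.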
what would be best for comparing prices between competitors?
Amazing! your video helped me with my 1st homework in Data Mining. And also thinking to jump into data science, so Thank you so much! Like and Subscription!
@DataProfessor
3 years ago
Glad I could help! And welcome to Data science!
Thank you so much for this concept it was really helpful respect !
this tutorial gets my subscription. Thank you Professor. :)
@DataProfessor
4 years ago
Wow, glad to hear that, welcome aboard 😃
Thx for the video, was really helpful. I wish u more subscribers, man ;)
@DataProfessor
4 years ago
Thanks for the support! 😃
Thank you so much for this concept it was really time saving one!
@DataProfessor
3 years ago
Glad it was helpful!
Amazing! I am totally new to web scraping. I tried to scrape the website using beautiful soup library for 4 days now, but I can't get past the basics. You have extremely simplified it for me. For instance, I just scraped data from Wikipedia about the list of countries and their population and got the whole table in the first attempt. Thank you so much! I wonder if this can be used for other pages like LinkedIn, Glassdoor data collection? Because there are no tables there. Professor, thank you so much once again!
@DataProfessor
3 years ago
Glad to hear that the video was helpful! For non-tabular pages you may have to use beautifulsoup and/or selenium
Fabulous - it's soooo easy when you know how!
@DataProfessor
4 years ago
Thanks for watching Roger, absolutely agreed with that 😃
Hey, I tried using the code to scrape tables on Wikipedia. When scraping a page with loads of other data, is there a way to pull just the table I want? With the current code I'm pulling the whole page, and I just want the playoff stats. I think I'm supposed to create a dictionary and then assign it to a dataframe, but I don't know how when it comes to URLs and websites.
How do I keep the URL that the Tm column has in my dataframe?
Bravo Data Professor, nice lecture!
Do you know if we can use this to scrape sites built with dynamic JS, and how do we do this if we have to login ?
How do we deal when we encounter the error "HTTP Error 403: Forbidden" while reading url with Pandas? How should we proceed in this case? Kindly advise.
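One thing that is sometimes worth trying for a 403 is to download the page yourself with an explicit User-Agent header and then hand the HTML text to pandas. This is a sketch under assumptions, not a guaranteed fix: the header value and the helper names `build_request` and `read_tables` are hypothetical, and a site may still refuse the request.

```python
import io
import urllib.request

import pandas as pd

# Hypothetical header value; any common browser string can be tried.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def read_tables(url):
    """Download the page ourselves, then let pandas parse the HTML text."""
    with urllib.request.urlopen(build_request(url)) as response:
        html = response.read().decode("utf-8", errors="replace")
    # read_html accepts a file-like object as well as a URL.
    return pd.read_html(io.StringIO(html))
```

Usage would look like `tables = read_tables(url)` followed by `tables[0]` to pick the first table.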
Thanks a lot - this helped a lot.
Thank you for the clear explanation !
@DataProfessor
3 years ago
A pleasure! Thanks for watching 😃
Awesome work by the hero! Keep teaching like this
@DataProfessor
4 years ago
Thanks for the encouragement 😃
Can you also use df2019(df2019[‘Age’] == ‘Age’) to find the ages containing the word ‘Age’?
Thank you very much, brother....
Hi, Professor! Thank you for the content you bring us, it really helps! \o/ Lately, I've been asking myself: How important is web scraping for a data scientist? How often do you web scrape? I just started learning it, I'll keep going, and I wanted to know your thoughts about its relevance.
@DataProfessor
4 years ago
Hi Paulo, webscraping comes in handy when you want to create your own dataset from available data on the internet. For example, you want to analyze the salary of data scientists from glassdoor database then you can do that with webscraping. Hope this helps 😃
Hi professor I truly enjoy your videos and have learnt a lot may God keep you successful in life. A question that's been on my mind is what laptop do you use as I really like the keyboard sound when you type unless you are using a external keyboard. Is it possible for you to show us a set-up of your desk ? Kind regards
@DataProfessor
3 years ago
Hi, I'm using a MacBook Pro (2016) and yes the keyboard feel is good on this laptop although being a bit flat which is a good thing as it allows minimal effort in moving from one button to the next.
Not working for other sites. I tried it on TripAdvisor and nothing came back.
Thanks bro, for your nice tutorials
@DataProfessor
3 years ago
It's my pleasure
Hey professor, thank you for the content. I was wondering: when we scrape by just passing the link, how does it know to read only the data from the table and not any other information?
@DataProfessor
3 years ago
Hi, the function detects HTML syntax. Tables in HTML are marked up with the <table> tag; the read_html() function looks for these tags to identify the tables and extract their data.
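A small sketch of that behaviour, using a made-up page held in a string so no website is needed. It assumes an HTML parser such as lxml is installed. Only the table ends up in the result; the paragraph is ignored.

```python
import io

import pandas as pd

# Made-up page: one paragraph and one table.
html = """
<html><body>
  <p>Some text that is not part of any table.</p>
  <table>
    <tr><th>Player</th><th>PTS</th></tr>
    <tr><td>A. Alpha</td><td>20.5</td></tr>
    <tr><td>B. Beta</td><td>11.0</td></tr>
  </table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(io.StringIO(html))
print(len(tables))

# Square brackets pick one table out of that list.
df = tables[0]
print(df)
```

On a page with several tables, `tables[1]`, `tables[2]` and so on select the others.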
this is exciting. i love pandas
Can you please explain how to read all the retrieved urls
Hi Data Professor, thanks for this video. It's very helpful. I'm a newbie starting out in data science and web scraping. Just wondering can you use pandas functionality for scraping data that are not laid out in table? and how would you do that? could you perhaps create a video on scraping non tabular data if you haven't already?
@DataProfessor
2 years ago
Great question, to web scrape non-tabular data you can look into using beautiful soup and also selenium libraries for Python
@danniliu2544
2 years ago
@@DataProfessor thank you for the pointer, much appreciated!
@Panucci75
2 years ago
Exactly the question I was gonna ask. Thanks.
Thank you for your video. My question: if there are many tables across very many pages (20,000 pages), what should I do?
@DataProfessor
4 years ago
The pandas read_html function is suitable for a simple webpage with relatively few tables. For more complex pages or a large volume of pages, I would recommend looking into beautifulsoup and selenium.
Hi! Ken Jee, I tried your web scraping code on Kaggle but I'm getting a URLError. I tried to fix it but could not resolve it... please give me your suggestions.
@DataProfessor
1 year ago
Hi Piyush, The pandas library allows scraping webpages that have tabular data such as from Wikipedia. It is really limited to those with a predefined table format. To scrape webpages I'd recommend looking into selenium and beautifulsoup
Why did you use str.format() instead of string concatenation?
f strings are more readable compared to the .format() method
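For comparison, here are the three ways of building a URL from a year side by side. The URL pattern is a hypothetical placeholder, not the one used in the video.

```python
year = 2019

# Hypothetical URL pattern, used only to illustrate the formatting styles.
template = "https://www.example.com/stats/{}_per_game.html"

with_format = template.format(year)
with_fstring = f"https://www.example.com/stats/{year}_per_game.html"
with_concat = "https://www.example.com/stats/" + str(year) + "_per_game.html"

# All three produce the same string.
print(with_format == with_fstring == with_concat)  # prints True
```

Concatenation needs the explicit `str(year)` conversion, which is one reason `.format()` or an f-string tends to read more cleanly.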
The video is great, but the on-screen text is way too small to read. I suggest enlarging the font or reducing the white space on the screen to make the video more readable.
@DataProfessor
3 years ago
Thanks for the suggestion, greatly appreciate it, yes in recent videos I have increased the font size.
very cool thanks!
Very helpful thank you!
@DataProfessor
4 years ago
Thanks Badr for the kind words!
Hi Professor does the original data need to be a html file to start with? Does the original data always need to have a table to extract data?
@DataProfessor
3 years ago
Yes to both questions, that’s the limitation of this approach. Other than that selenium + beautifulsoup is a good combo to look into.
@sangpark7656
3 years ago
@@DataProfessor I see. Thank you very much for the guidance!!
Thanks for the knowledge
@DataProfessor
3 years ago
A pleasure, thanks for watching
what if there is no table on a web page ??
Great video Professor!
@DataProfessor
3 years ago
Glad you liked it!
Really awesome.. Data Professor
@DataProfessor
4 years ago
Salik, Thanks!
This helped thanks!
@DataProfessor
4 years ago
Glad it helped!
Thank you so much, sir
Very nice video
@DataProfessor
2 years ago
Thanks :)
A great tutorial!
@DataProfessor
3 years ago
Thank you!
How do I save the df to Excel, please?
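One possible approach, sketched with made-up data: `DataFrame.to_excel()` writes an .xlsx file, assuming an Excel engine such as openpyxl is installed, and `DataFrame.to_csv()` is a dependency-free alternative that Excel can also open. The helper names `save_to_excel` and `to_csv_text` are hypothetical.

```python
import io

import pandas as pd


def save_to_excel(df, path):
    """Write df to an .xlsx file (needs an Excel engine such as openpyxl)."""
    df.to_excel(path, index=False)


def to_csv_text(df):
    """Return df as CSV text; no extra packages required."""
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    return buffer.getvalue()


# Made-up data standing in for the scraped table.
df = pd.DataFrame({"Player": ["A. Alpha", "B. Beta"], "PTS": [20.5, 11.0]})
csv_text = to_csv_text(df)
print(csv_text)
```

Calling `save_to_excel(df, "player_stats.xlsx")` would then write the file; `index=False` leaves out the row numbers.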
Superb, let me bring you some more guys to your channel
@DataProfessor
3 years ago
Awesome, welcome to the channel!
@nowdevoted1649
3 years ago
@@DataProfessor 🙏
Hello Professor, I would like to suggest publishing a video about RSelenium, which is used with Selenium WebDriver for automated system testing :D I hope it may benefit others. This is just my humble suggestion.
@DataProfessor
3 years ago
Great suggestion! I have played around with Selenium for Python and have found it pretty powerful. What I made so far was a short script that can take screenshots of my youtube channel's page (or any webpage).
What are the prerequisites to watch this tutorial? I know some python, is this ok?
@DataProfessor
2 years ago
Yes, beginner’s level of Python is sufficient to follow along.
Great content ! Any idea on how I can scrape data for example from linkedin Jobs Postings. I found Octoparse for this, any ideas?
@DataProfessor
4 years ago
Thanks Mert for the kind comment. pandas works only for tabular data from webpages. For linkedin posts, we'll probably have to use beautiful soup for that. I might make a future video about that, will put it into the to-do list.
@mj7146
4 years ago
Data Professor thank you 🙏
@DataProfessor
4 years ago
@@mj7146 A pleasure!
@oguguaonyinyechi4980
4 years ago
@@DataProfessor Hi Data Professor, we are still expecting this 😁
(ImportError: lxml not found, please install it) I got this error. what is the solution?
@DataProfessor
3 years ago
Hi, you can install lxml via pip install lxml
@moatasimashraf6818
3 years ago
@@DataProfessor Done it, thank U
Every link turns into a df. How can I concatenate all the dfs?
@DataProfessor
3 years ago
Hi, dfs can be concatenated using the pd.concat() function, you can play around with axis=0 or axis=1 depending on how you want to combine the dfs (side by side or stacked on top of the other)
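A quick sketch of the two directions, using two tiny made-up dataframes in place of the scraped ones:

```python
import pandas as pd

# Made-up stand-ins for two scraped tables (for example, one per season).
df2018 = pd.DataFrame({"Player": ["A. Alpha"], "PTS": [18.2]})
df2019 = pd.DataFrame({"Player": ["A. Alpha"], "PTS": [20.5]})

# axis=0 stacks the rows on top of each other; ignore_index renumbers them.
stacked = pd.concat([df2018, df2019], axis=0, ignore_index=True)

# axis=1 places the columns side by side.
side_by_side = pd.concat([df2018, df2019], axis=1)

print(stacked.shape)
print(side_by_side.shape)
```

When every link produces a dataframe with the same columns, collecting them in a list and stacking with `axis=0` is usually the direction you want.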
Informative.
@DataProfessor
4 years ago
Thanks Shweta for the kind comment!
i try it on ur channel ( just for testing lol )
Thanks !
@DataProfessor
3 years ago
Thanks for watching!
Ure awesome sr
@DataProfessor
3 years ago
Thanks for the kind words
Awesome
You look like jomatech's big brother :O
@DataProfessor
2 years ago
Haha, I get that a lot. Joma and I should do a collab video 😆
@tareqmahmud3902
2 years ago
@@DataProfessor But Sir I learned a week's lesson from one of your 10 minute video. I can't be more grateful to you. Thank you.
@DataProfessor
2 years ago
@@tareqmahmud3902 Thanks, glad to hear that they’re helpful! 😊
Don't name your variables str or you will shadow the string builtin
@DataProfessor
3 years ago
You're right, many thanks for pointing that out, why did I do that. I've changed it to url_link now.
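To illustrate the shadowing problem, here is a small sketch with a hypothetical URL prefix. Once a variable is named `str`, the builtin `str()` is no longer reachable under that name in the same scope.

```python
def build_url_shadowed(year):
    # The local name str now refers to this text, not to the builtin type.
    str = "https://www.example.com/stats/"
    return str + str(year)  # fails: a string object is not callable


def build_url(year):
    # A descriptive name such as url_link leaves the builtin untouched.
    url_link = "https://www.example.com/stats/"
    return url_link + str(year)


try:
    build_url_shadowed(2019)
    shadow_failed = False
except TypeError:
    shadow_failed = True

print(shadow_failed)
print(build_url(2019))
```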
Is this useful for every situation? I am trying to fetch data from glassdoor but this method is not working Link: "www.glassdoor.co.in/Job/bengaluru-data-analyst-jobs-SRCH_IL.0,9_IC2940587_KO10,22.htm"
Is there an api for sports results? or you have to do it via web scraping?