Clean Excel Data with Python and Pandas - 5 Minute Python Scripts - Full Code Along Walkthrough

Science and Technology

In this video we'll cover the basics of how to clean your Excel data using Python.
We'll cover how to load Excel files, modify their cells to meet your requirements, and then write the result to a new Excel file.
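
A minimal sketch of that load, modify, write workflow. The `City` column and the specific cleanup steps are assumptions made for illustration and are not taken from the video's code; reading and writing `.xlsx` files also needs an Excel engine such as openpyxl installed.

```python
import pandas as pd


def clean_sheet(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: trim whitespace and normalise capitalisation."""
    out = df.copy()
    # "City" is a hypothetical column name used only for this sketch.
    out["City"] = out["City"].str.strip().str.title()
    return out


def clean_file(in_path: str, out_path: str) -> None:
    """Load an Excel file, clean it, and write the result to a new file."""
    sheet1 = pd.read_excel(in_path)
    # index=False keeps the 0..N row numbers out of the output file.
    clean_sheet(sheet1).to_excel(out_path, index=False)
```

Keeping the cleanup in its own function means it can be tried on a small in-memory DataFrame before any file is touched.
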
Kite helps fund the channel, thanks for checking them out and supporting me --
⭐ Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. www.kite.com/get-kite/?...
Thanks so much for all the support!! You all are far too good to me. 330+ Subscribers!!! Thank you all so much.
Here's the GitHub link:
github.com/Derrick-Sherrill/D...
If you have any suggestions for the next video please let me know!
Until next time!
*****************************************************************
Code from this tutorial and all my others can be found on my GitHub:
github.com/Derrick-Sherrill/D...
Check out my website:
www.derricksherrill.com/
If you liked the video - please hit the like button. It means more than you know. Thanks for watching!!
Useful Links
-----------------------------------------------------------------------------------------------------------------
Python Download:
www.python.org/downloads/
(Remember Python 3 is the future!)
I use Atom Text Editor for all my tutorials
Atom Text Editor:
atom.io/
Packages I often use in Python tutorials:
-Pandas
pandas.pydata.org/pandas-docs...
-Numpy
www.numpy.org/
-xlrd
xlrd.readthedocs.io/en/latest/
-TensorFlow
www.tensorflow.org/api_docs/p...
-Matplotlib
matplotlib.org/
-Django Framework
www.djangoproject.com/
-Beautiful Soup
www.crummy.com/software/Beaut...
(Install through Terminal $pip3 install ....)
Other Useful Services sometimes featured:
-Amazon Web Services (AWS)
aws.amazon.com/
-Microsoft Azure
azure.microsoft.com/en-us/
-Google Cloud
cloud.google.com/
-Jupyter Notebooks
jupyter.org/
Always looking for suggestions on what video to make next -- leave me a comment with your project! Happy Coding!

Comments: 95

  • @davestark3261
    5 years ago

    Such a huge fan of the 'bitesize' format of these videos. Clear instructions, excellent explanations. Keep it up!!

  • @dlutherc
    4 years ago

    It's so much easier to learn the content when the important information is not separated by a lot of talk. These videos have helped tremendously, you do a great job!

  • @mimichui1
    5 years ago

    Stumbled onto your channel and found these 5-minute tutorials. Watched almost every single one. Thank you so much! Looking forward to watching more 5-minute videos.

  • @JR-ub3yv
    4 years ago

    These tutorials are some of the best I have ever seen! Your ability to clearly and concisely explain the concepts is exceptional. Looking forward to seeing more.

  • @MacTheDJCom
    4 years ago

    These 5 minute tutorials are life!

  • @anthonyrojas9989
    1 year ago

    This is my new mentor man, the simplicity and clear explanation is on point.

  • @Regc10
    4 years ago

    what a legend! Love your tutorials :)

  • @big_cheese2162
    4 years ago

    Thanks very much for all the great content, has been fantastic both for my work & coding in my spare time. Keep it up!

  • @josephtran1500
    5 years ago

    sheet1['First Name'] = sheet1['First Name, Last Name'].map(lambda x: x.split(',')[0])
    sheet1['Last Name'] = sheet1['First Name, Last Name'].map(lambda x: x.split(' ')[1])
    pandas is built on top of NumPy, which supports vectorized operations, so there is no need to write the for loop. You can just call .map() on the column with a lambda expression inside.

  • @CodeWithDerrick

    5 years ago

    Totally agree! Lambda functions are just difficult to teach effectively to the level of audience I’m targeting. 😬 without a doubt though yours is less code and faster than the for loop.
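
A minimal sketch of the loop-free approach discussed in this thread, using the pandas string accessor instead of a lambda. The column name comes from the comment above; the sample names are made up for illustration.

```python
import pandas as pd

# Made-up sample rows; the column name is the one quoted in the comment.
sheet1 = pd.DataFrame({"First Name, Last Name": ["Ada, Lovelace", "Alan, Turing"]})

# Split once on the comma; expand=True returns one column per piece.
parts = sheet1["First Name, Last Name"].str.split(",", n=1, expand=True)
sheet1["First Name"] = parts[0].str.strip()
sheet1["Last Name"] = parts[1].str.strip()
```

Splitting once and stripping both halves avoids depending on whether a space follows the comma.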

  • @johndunn6253
    5 years ago

    Fantastic, thanks for making these!

  • @koontzrob22
    5 years ago

    Really Awesome Video! It would be pretty cool if you did like a part 2 where you clean excel data (data with way more errors, random empty rows, wrong data types, misspellings etc) with python.

  • @zayamadin
    2 years ago

    Thank You! That's AWESOME!!!!

  • @udaracperera
    4 years ago

    Huge fan, brother, keep it up!!! Thanks

  • @vincenzo3292
    5 years ago

    Great tutorial, thanks. Can use this at work. Like the Milan entry - definitely go to Italy.

  • @skyblue021
    5 years ago

    You rock Dude, thanks for your great work!

  • @TeverRus
    2 years ago

    My man, you are a genius! Thank you so much! I'm going to use it at work on Monday :) Cheers!

  • @createyourlifestylenow
    5 years ago

    Great, the way you explain it is very easy for beginners to follow

  • @rverm1000
    2 years ago

    Cool, I'm taking machine learning now. This really helps

  • @keagankemp6275
    2 years ago

    Wow starting my python journey and came across your channel, needless to say all this is a good find.

  • @dantedt3931
    5 years ago

    Awesome videos! Thanks!

  • @matony19
    3 years ago

    awesome video! hope you continue

  • @dagudelo88
    4 years ago

    Very useful and also a very clear explanation. Keep it up :)

  • @quantum7401
    4 years ago

    Very nice, reminds me of a good lecture.

  • @alessandroformiconi6242
    3 years ago

    Hi Derrick thank you, very good work and ALL IN 7 MINUTES, that's great for learning ... i have programmed in Java for years, but Python is so funny!

  • @xujerry3762
    4 years ago

    Your videos are good lessons; they help me learn more about Python. Nice to keep in touch with you.

  • @vilw4739
    1 year ago

    Thanks so much for this. This might seem simple, but it can trouble you when dealing with a huge dataset. I was stuck for a few days; after watching the video I got an idea of what to do and what I was doing wrong!!

  • @santoshgujar5237
    2 years ago

    Thank you, Sir

  • @nethsz
    3 years ago

    Thanks, I am a beginner and it's really useful. You just need to install xlrd and xlsxwriter first.

  • @nathanliu4018
    4 years ago

    Love it!!

  • @osmankhaled4565
    4 years ago

    Excellent

  • @dhananjaykansal8097
    4 years ago

    Lovely!

  • @AshokKumar-eu4dd
    4 years ago

    Hi Sherrill, I want to learn Python. While searching for videos on KZread, I found your channel, and it feels like the right channel for a beginner. I am basically an Excel user and have no experience with Python or any other programming. Could you please advise me on how to start my career in Python? I would also ask you to post basic videos that start from scratch for Excel users. I would appreciate any help you can give.

  • @meryemgazanayi5665
    1 year ago

    thank u sooooo much

  • @martin-xq7te
    4 years ago

    Great tutorial Derrick, how about a frequency tutorial showing how to display the count of names or numbers in an excel sheet.

  • @CodeWithDerrick

    4 years ago

    Hey Martin, thanks for the kind words! Would the groupby method work for what you're thinking, or are you thinking of displaying the counts of items across the entire worksheet (not just the count of a name in a single column but all columns)?
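
A small sketch of both readings of the question: counts within one column and counts across the whole sheet. The column names and rows are made-up sample data.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"Name": ["Ann", "Bob", "Ann"], "City": ["Milan", "Milan", "Rome"]})

# Count of each value within a single column.
name_counts = df["Name"].value_counts()

# The same counts via groupby, as suggested in the reply.
name_counts_grouped = df.groupby("Name").size()

# Count of each value across every column: flatten the cells, then count.
all_counts = pd.Series(df.to_numpy().ravel()).value_counts()
```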

  • @stephang5671
    3 years ago

    Great video, on the spot and no 'supervised typing' for us. At the moment I'm fighting with date-time fields. You could add a column 'Birth date' with dates in different formats ('200-12-30', '1998-11-28 05:25:59', '01.01.1985' (European)) and clean it up in a way that lets me apply conditions to it (e.g. find the over-18-year-olds). Just if you need ideas :-)
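
One way to approach mixed date formats, sketched with only the standard library: try a list of known formats in order and return `None` when nothing matches. The format list and the function names are assumptions for illustration; values that match none of the formats are left for manual review rather than guessed.

```python
from datetime import date, datetime

# Formats to try, in order; extend this list for other layouts in the data.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d.%m.%Y")


def parse_birth_date(text):
    """Return a date for the first matching format, or None if none match."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def is_adult(birth, today):
    """True if the person is 18 or older on `today`."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)
    return age >= 18
```

With pandas, the same function could be applied to a column with `.map(parse_birth_date)` before filtering.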

  • @robertcliffort2354
    2 years ago

    great.

  • @nicolaimartin7279
    5 years ago

    great thx

  • @RS-el7iu
    4 years ago

    thanks 4d very nice explanation .... how can we write the sheet onto the same excel file but in a different sheet?

  • @josephtortolano786
    2 years ago

    Hey, nice video. I am working a lot with VBA at work and I was wondering why you would do this in Python if you could easily do it in VBA? Is it faster or more reliable with Python? Also, can you show cases where Python might answer problems that can't be solved in VBA? That would be a very nice video in order to see the differences and limits of both languages. Keep up the good work 👏

  • @vincentsvlog1761
    2 years ago

    Hello Derrick, Thanks for the fantastic video. I'm curious that is there any AI function to do so?

  • @ajith.p481
    5 years ago

    You are good. How do I create a pivot table and delimit the column content?

  • @deepakgiya
    5 years ago

    Do you have any tutorials on search and replace for column values in Excel? Requirements:
    1) Search for a pattern in rows and then delete the row
    2) Search for a pattern in rows and then replace it with a new pattern
    3) Search for a pattern in rows, remove that pattern, and leave the rest in the rows
    4) Delete a column based on a search pattern
    5) Save to a new file
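
A hedged sketch of how those five requirements might map onto pandas string methods. The column names, the patterns, and the output file name are all made up for illustration.

```python
import pandas as pd

# Made-up sample data; "Item" and "Tmp_Note" are hypothetical column names.
df = pd.DataFrame({"Item": ["TEST-apple", "banana", "TEST-cherry"],
                   "Tmp_Note": ["a", "b", "c"]})

# 1) Drop rows whose value matches a pattern.
no_banana = df[~df["Item"].str.contains(r"^banana$", regex=True)]

# 2) and 3) Replace a pattern, or remove it by replacing it with "".
stripped = df["Item"].str.replace(r"^TEST-", "", regex=True)

# 4) Drop columns whose name matches a pattern.
no_tmp = df.loc[:, ~df.columns.str.contains(r"^Tmp_")]

# 5) Saving would be: no_tmp.to_excel("new_file.xlsx", index=False)
```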

  • @everydayhappy965
    2 years ago

    Hi, I am wondering whether there is a good way to import a number of Excel sheets without having to type the import name many times. Thanks
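
If the sheets live in one workbook, `pd.read_excel(path, sheet_name=None)` returns a dict of DataFrames keyed by sheet name, so nothing has to be typed per sheet. A minimal sketch under that assumption; the function names and the added `Sheet` column are illustrative.

```python
import pandas as pd


def combine_sheets(sheets):
    """Stack a {sheet_name: DataFrame} mapping into one DataFrame."""
    # Tag each row with the sheet it came from before stacking.
    tagged = [df.assign(Sheet=name) for name, df in sheets.items()]
    return pd.concat(tagged, ignore_index=True)


def load_all_sheets(path):
    """Read every sheet of one workbook and combine them."""
    return combine_sheets(pd.read_excel(path, sheet_name=None))
```

For many separate files, the same `combine_sheets` idea works with a dict built by looping over the file paths in a folder.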

  • @thenickrodriquez
    3 years ago

    at the 4:01 timestamp, what is the 1 for in the For loop?

  • @liestyaq
    3 years ago

    How can I do preprocessing for text mining using an Excel file? I am a beginner.

  • @Achu.3.31
    3 months ago

    Bro please make a full course python in excel

  • @adityacodz3121
    3 years ago

    Which IDE are you using?

  • @leechinheng7908
    4 years ago

    Derrick, I am curious why you don't install "scripts" package in atom? It seems troublesome to run the script with the command "python3 xxx".

  • @jakeg9711
    4 years ago

    Is it possible to remove time stamps and change date formats (both US and Europe date formats in same column) of my excel data within python?

  • @belaidmabrouk29
    5 years ago

    Hi Derrick, thank you so much for your effort. I have one question: can the pandas library create charts or graphical representations of data? If not, which Python library is the most suitable?

  • @CodeWithDerrick

    5 years ago

    Hey Belaid, Thanks for your kind words! There are a couple of useful ones. pandas already has data visualization built in (see the Visualization section of its docs). It's built on top of the Matplotlib package. Plotly is another cool one to check out too if you need interactive graphing!

  • @belaidmabrouk29

    5 years ago

    @@CodeWithDerrick thank you so much Derrick, I will check it soon and give you my feedback. Thanks and have a nice day

  • @youssefahmad9112
    5 years ago

    Great video.. thanks.. I have a question please.. How can add (or append) rows to an existing Excel ?? Do you have a video about this ??

  • @CodeWithDerrick

    5 years ago

    Thanks for the kind words!! Where is the data that you’re adding to the sheet? We can append how we did in this video, merge the two together, and several other things. I’m happy to do an example with more specifics! 😀

  • @youssefahmad9112

    5 years ago

    @@CodeWithDerrick Well.. the data are generated at the same code, like the Average of some values.

  • @youssefahmad9112

    5 years ago

    @@CodeWithDerrick And to be more specific, I'm trying to create an Excel sheet (using Python, of course) that contains each student's name and their result in an exam taken in the Python program. So far I can create one sheet for each student, but I need one sheet for all the students. 😀 Hope I didn't talk too much 😅, and I would be grateful if you could help me. 💛
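
One possible shape for this, sketched under assumptions: collect each student's result in a list while the program runs, then build a single table and write it once at the end. The function names, column names, and output file name are hypothetical.

```python
import pandas as pd

results = []  # one dict per student, collected while the program runs


def record_result(name, score):
    """Remember one student's result instead of writing a file per student."""
    results.append({"Student": name, "Score": score})


def results_frame():
    """Build a single table from everything recorded so far."""
    return pd.DataFrame(results, columns=["Student", "Score"])


# After the last student: results_frame().to_excel("results.xlsx", index=False)
```

Values computed in the same program, such as an average, can be taken from the frame with `results_frame()["Score"].mean()`.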

  • @marc10uae
    5 years ago

    Great tutorial - thanks for it.. but what is the advantage of doing this in python, vs direct in excel itself with the upper command and just adding the words with comma to a new column

  • @BiancaAguglia

    5 years ago

    I think Python is the better choice when your clean-up tasks are more complicated than the one Derrick showed (and when you're more comfortable using Python than you are using Excel. 😁) For example, if the Excel spreadsheet has messy text that needs to be cleaned up using regular expressions. Another example is when you have to apply more complex functions to certain entries in the spreadsheet. Python is very powerful and it's worth learning, but I've seen Excel experts who can automate many spreadsheet operations simply by using Excel (and VBA). So, if you're already a pro in Excel, you might not see an improvement in your workflow by using Python. Personally, I recommend learning Python because it can help you far beyond cleaning up Excel spreadsheets. 😊

  • @wendzbrand

    2 years ago

    the advantage of doing this in python is when you are cleaning a big set of data that excel could not handle.

  • @ThanhTruong-sf3pc
    3 years ago

    What's the fuck ? This video is really clear with few minutes ♥ Thanks god bring me to here

  • @skytell
    3 years ago

    After you outputted new data, column A in the excel display 0 thru 10 as row number - how do you get rid of that row number on the output of the excel file?

  • @CodingIsFun

    3 years ago

    sheet1.to_excel('output.xlsx', index=False)

  • @andrewc2174
    4 years ago

    I'm trying to do the command prompt in your video but when I enter what you wrote it doesn't do anything. I'm on windows, is there another step?

  • @limitless4766
    4 years ago

    I am getting an error each time I try to import pandas, but I have already installed the pandas module

  • @the_randomguy7989
    2 years ago

    Where does the output file get saved? My previous one is not updated either. Please help

  • @alinajaved2165
    2 years ago

    how to automate the data cleanup with python?

  • @kavankailey506
    1 month ago

    how can do the next given question from my assignment, if anyone can suggest please • The size of these data sets is quite large. The weather data is provided in xlsx format and will need to be cleaned up and converted to a suitable format before you can use it in your program - you should discard any data that you don’t need to reduce the amount of time it will take to train your models. • The data set includes data from 2015 - 2021 inclusive, but 2021 does not contain the full year. Your predictions should be for the year 2022.

  • @mcnamarachiwaye6359
    4 years ago

    Hi, how do I sum up an Excel dataframe with values like ($320), $350?
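
A minimal sketch for summing currency strings like these. It assumes the parentheses mark a negative amount (accounting style), which the comment does not actually state; the function name is illustrative.

```python
def parse_money(text):
    """Convert '$350' to 350.0 and '($320)' to -320.0."""
    cleaned = text.strip()
    # Assumption: parentheses mark a negative amount (accounting style).
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    number = float(cleaned.strip("()").replace("$", "").replace(",", ""))
    return -number if negative else number


total = sum(parse_money(value) for value in ["($320)", "$350"])
```

On a DataFrame column, the same function could be applied with `.map(parse_money)` before calling `.sum()`.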

  • @patalpunuoma
    3 years ago

    Thank you for the video, really helpful. But it works only if every cell has a first name and a last name. If there isn't a last name in the cell, the script breaks (ValueError: not enough values to unpack (expected 2, got 1)). I tried to write an if statement but it still doesn't work properly. What would be the solution? :)

  • @Frankenstein786

    1 year ago

    You should look up exception handling, try except pass
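
As an alternative to try/except, `str.partition` always returns three parts, so the unpacking cannot fail when the last name is missing. A small sketch, assuming cells of the form 'First, Last'; the function name is illustrative.

```python
def split_name(cell):
    """Split 'First, Last' on the first comma; a missing part becomes ''."""
    # partition() returns (before, separator, after) even without a comma.
    first, _, last = str(cell).partition(",")
    return first.strip(), last.strip()
```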

  • @fishtheory7529
    3 years ago

    Having trouble pulling up the Excel file in the command prompt. It states that there is no such file or directory. I have tried using the full path as well as setting the folder holding my excel files as the working directory. Unsure what the problem is.

  • @marcosfilho1815
    4 years ago

    Which IDE is it?

  • @RalphMartinez007
    4 years ago

    How do you export a finished clean data to an existing sheet?

  • @CodingIsFun

    3 years ago

    This answer on Stackoverflow might help you out: stackoverflow.com/questions/42370977/how-to-save-a-new-sheet-in-an-existing-excel-file-using-pandas/42371251
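
A hedged sketch of one way to add or overwrite a sheet in an existing workbook with pandas. Append mode of `pd.ExcelWriter` requires the openpyxl engine, and `if_sheet_exists` needs pandas 1.3 or newer; the function name is illustrative.

```python
import pandas as pd


def add_sheet(df, path, sheet_name):
    """Write df as a sheet in an existing .xlsx workbook."""
    # mode="a" opens the existing file instead of overwriting it.
    # if_sheet_exists="replace" overwrites a sheet with the same name.
    with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
```

The other sheets in the workbook are left in place; only the named sheet is written.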

  • @nishchitjain1
    5 years ago

    How do I get Pandas ?

  • @sean7258
    2 years ago

    What if you had over 1000 rows of unknown names and multiple names contained either (Prof, Mr, Ms, Fr) at the start or (Jr, Bsc) at the end. Because that's my problem at the moment and I'm stumped......
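
One possible approach, sketched with the titles and suffixes listed in the comment: strip known words from the start and end of each name before splitting it. The function name and the sets are assumptions and would need extending for real data.

```python
TITLES = {"prof", "mr", "ms", "fr"}   # prefixes listed in the comment
SUFFIXES = {"jr", "bsc"}              # suffixes listed in the comment


def strip_titles(full_name):
    """Drop known titles from the start and known suffixes from the end."""
    words = full_name.replace(",", " ").split()
    # Compare case-insensitively and ignore a trailing full stop ("Prof.").
    while words and words[0].rstrip(".").lower() in TITLES:
        words.pop(0)
    while words and words[-1].rstrip(".").lower() in SUFFIXES:
        words.pop()
    return " ".join(words)
```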

  • @lilycheong3832
    4 years ago

    My question is: if I want to split the file data by ',', e.g. '1234567890, FOOD, 10/UNIT, QTY 300' into '1234567890', 'FOOD', '10/UNIT', 'QTY 300', how can we do this?
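
A minimal sketch for a single record, using the example string from the comment; the function name is illustrative.

```python
def split_fields(record):
    """Split a comma-separated record and trim the spaces around each field."""
    return [field.strip() for field in record.split(",")]


fields = split_fields("1234567890, FOOD, 10/UNIT, QTY 300")
```

For a whole pandas column, `.str.split(",", expand=True)` gives one column per field.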

  • @manishsrivastava5611
    3 years ago

    I am getting this error .. 'ValueError: Neither the `x` nor `y` variable appears to be numeric". help

  • @comptegmail273
    2 years ago

    Hello sir, thank you so much for the tutorial. I'm actually stuck since my source is a CSV file. Sadly, the file I'm working with is extremely complex, with an indefinite number of columns, since my main columns are repeated every day based on the date. I've been stuck on this problem for over a week. Is there a way I could reach out to you by mail to maybe help solve this problem? Thanks a lot in advance.

  • @Frankenstein786

    1 year ago

    You could try transposing your data frame when you initialize the program. That should swap the rows and columns and then you could index the date.

  • @vipin_optimistic179
    3 years ago

    How do you extract data from an Excel sheet?

  • @johnbrady2930
    2 years ago

    What would happen if somebody had a middle name (Joe Pat Murphy) or the surname is double barrelled (Moran Dylan)
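
Any code that unpacks a split name into exactly two variables raises a ValueError when there are three words. One hedged workaround is to treat the last word as the surname; this is a convention and not a rule, since a two-word surname cannot be told apart from a middle name by position alone. The function name is illustrative.

```python
def first_and_last(full_name):
    """Treat the last word as the surname and the rest as given names."""
    given, _, surname = full_name.strip().rpartition(" ")
    # With a single word, rpartition puts it in the last slot.
    return (given, surname) if given else (surname, "")
```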

  • @BAL31m89
    3 years ago

    Hey, I have a query. Suppose we have 10 columns and hundreds of rows. The columns are Date, Customer, City Name, and then Sales, Price, Items, and a bunch of others. Now suppose I have some abnormal value in the Sales, Price, or any other cell in my Excel data. Here is my query:
    1- I want to remove or replace that specific cell's data but not the whole row, as the values in the other columns are correct. For example, I may have an issue in my 'Item' column for a specific date, or the wrong value may be for one customer on that date but fine for the others.
    2- I want to delete the whole row where I find any abnormal value in any specific column.
    3- How do I get my final output sheet for both cases?
    Please make a video for this case.
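
A small sketch of both cases with a boolean mask. The rule "negative sales are abnormal" and the sample data are assumptions for illustration; the real rule depends on the data.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"Customer": ["A", "B", "C"], "Sales": [120.0, -5.0, 90.0]})

# Hypothetical rule: negative sales count as "abnormal".
abnormal = df["Sales"] < 0

# 1) Blank out only the bad cell; the rest of the row is kept.
cell_fixed = df.copy()
cell_fixed.loc[abnormal, "Sales"] = float("nan")

# 2) Drop every row that contains an abnormal value.
rows_dropped = df[~abnormal]

# 3) Either result can be saved with .to_excel("output.xlsx", index=False)
```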

  • @markross7231
    3 years ago

    Is that Excel in 365 you're using on the Mac??

  • @djsanell
    3 years ago

    @Derrick - I have a little problem. Whenever I export to Excel, it is not working: it exports in the same state, with First Name and Last Name in one cell. In Terminal the split between the two works fine, but when I export to xlsx format this function no longer works. To export I use df.to_excel("Clean data.xlsx") ---- Could anyone here help me with this problem? Many thanks all!

  • @AI_CANISTER

    3 years ago

    please install openpyxl. it will work

  • @pomicsaviox9971
    4 years ago

    How do I split the column value below?
    Column: 058-10-1312 The Little Rascals
    Split as: Column1 - 058-10-1312, Column2 - The Little Rascals
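
A minimal sketch that splits at the first space only, so spaces inside the title are preserved. It uses the example value from the comment; the function name is illustrative.

```python
def split_id_and_title(cell):
    """Split at the first space: the ID before it, the title after it."""
    record_id, _, title = cell.strip().partition(" ")
    return record_id, title


parts = split_id_and_title("058-10-1312 The Little Rascals")
```

For a pandas column, `.str.split(" ", n=1, expand=True)` does the same for every row.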

  • @vipin_optimistic179
    3 years ago

    My very big problem

  • @barrowmusics
    4 years ago

    Please Derrick, can I get your mail? I need to send you a file

  • @thehardikbhatia
    4 years ago

    Contact me, I want to work on a project with you

  • @Bozon_Higgsa
    2 years ago

    ...
