Data Cleaning in Pandas | Python Pandas Tutorials

Take my Full Python Course Here: www.analystbuilder.com/course...
In this series we will be walking through everything you need to know to get started in Pandas! In this video, we learn about Data Cleaning in Pandas.
Datasets in GitHub:
github.com/AlexTheAnalyst/Pan...
Code in GitHub: github.com/AlexTheAnalyst/Pan...
Favorite Pandas Course:
Data Analysis with Pandas and Python - bit.ly/3KHMLlu
0:00 Intro
0:41 First Look at Data
2:34 Removing Duplicates
3:41 Dropping Columns
5:10 Strip
12:15 Cleaning/Standardizing Phone Numbers
21:29 Splitting Columns
24:58 Standardizing Column Values using Replace
28:40 Fill Null Values
29:42 Filtering Down Rows of Data
36:42 Outro
All opinions or statements in this video are my own and do not reflect the opinion of the company I work for or have ever worked for

Comments: 332

  • @fede77
    @fede77 5 months ago

    For those struggling with the regular expression at 14:57, you might need to set regex=True explicitly (based on the FutureWarning displayed in the video). That is: df['Phone_Number'] = df['Phone_Number'].str.replace('[^a-zA-Z0-9]', '', regex=True)
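
A minimal sketch of the fix described in this comment. The sample phone numbers are made up for illustration; only the pattern and the regex=True argument come from the comment. As far as I know, pandas 2.0 and later default to regex=False for Series.str.replace, which would explain why the pattern is treated as literal text unless the flag is set.

```python
import pandas as pd

# Hypothetical sample values; the real column comes from the tutorial's Excel file.
phones = pd.Series(["123-545-5421", "123/643/9775", "876|678|3469"])

# regex=True tells pandas to treat the pattern as a regular expression,
# so every character that is not a letter or a digit is removed.
cleaned = phones.str.replace(r"[^a-zA-Z0-9]", "", regex=True)

print(cleaned.tolist())
```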

  • @wenkanglee9596

    @wenkanglee9596

    5 months ago

    gosh you're observant

  • @ronnelsupnet9850

    @ronnelsupnet9850

    5 months ago

    Thank you!

  • @rhodaime79

    @rhodaime79

    5 months ago

    My goodness. You saved me. I’ve been at this for about an hour. Thank you 🙏 thank you 🙏

  • @DevanshAsawa

    @DevanshAsawa

    5 months ago

    Thanks a lot dude !!!!!! Helped a lot !!!!!!!

  • @rnjesus9950

    @rnjesus9950

    4 months ago

    Legend.

  • @rahulraj3855
    @rahulraj3855 1 year ago

    Fan from India I just got 2 offers from very good companies thanks to your videos and it helped me transition from a customer success support to Data Analyst

  • @rozakhan2811

    @rozakhan2811

    1 year ago

    Hey tell me how can I do it too ri8 now I'm working as a customer support executive please help me to grow..

  • @dywa_varaprasad

    @dywa_varaprasad

    1 year ago

    hey Rahul, how do you learn DA ? Can you share your experience it will be helpful for us!!

  • @sandeepthukral3018

    @sandeepthukral3018

    11 months ago

    Hi bro is this course sufficient for beginner to land a job

  • @abdullahalmahfuz6700

    @abdullahalmahfuz6700

    10 months ago

    Is this a spam comment?

  • @TamilTigers001

    @TamilTigers001

    7 months ago

    ​@rozakhan2811 skills need is a basic thing...what you want..in that be strong..And way of Alex Teach Videos are Effective..

  • @tomaronson4419
    @tomaronson4419 4 months ago

    For splitting the address at 21:29, you may want to pass the 2 as a named parameter, n=2: df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(',', n=2, expand=True)
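
A short sketch of the keyword form suggested here. The addresses are invented sample rows; the column names and the split call follow the comment. My understanding is that pandas 2.0 made every split argument except the separator keyword-only, which is why a bare positional 2 raises an error there.

```python
import pandas as pd

# Hypothetical addresses with zero, one and two commas.
df = pd.DataFrame({
    "Address": [
        "123 Shire Lane, Shire",
        "93 West Main Street",
        "298 Drugs Driveway, Texas, 73301",
    ]
})

# n=2 allows at most two splits, which yields at most three parts;
# expand=True returns the parts as separate columns.
df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(",", n=2, expand=True)

print(df)
```

Rows with fewer commas get missing values in the trailing columns, and the parts keep the space that followed each comma, so a later .str.strip() may still be needed.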

  • @user-tm7uw4os1n

    @user-tm7uw4os1n

    3 months ago

    This helps! Thank you so much!

  • @nataliarobinson5671

    @nataliarobinson5671

    3 months ago

    Thank you very much

  • @OmarRabeh

    @OmarRabeh

    3 months ago

    thank you very much

  • @OkallTheAnalyst

    @OkallTheAnalyst

    3 months ago

    Thank you!

  • @janrelleelam3628

    @janrelleelam3628

    1 month ago

    OMG! Thank you so very much. I have been trying to figure this out for about four days now. I figured out the phone number issue and then how to split the address, but for the life of me splitting the address into named columns with the changes committed the df was not working. THANK YOU!

  • @sj1795
    @sj1795 5 months ago

    Found this REALLY helpful! I love how you walk us through mistakes as well as explain WHY you do what you do throughout your videos. It adds so much value to each video. As always, THANK YOU ALEX!!

  • @DreaSimply21
    @DreaSimply21 7 months ago

    I like how in some of your videos you show us the long way and then the short cut, instead of just showing the short cut. I think that way gives the person who is learning a better breakdown of what they are doing.

  • @jeanaimegakwerere8591
    @jeanaimegakwerere8591 9 months ago

    Thank you sir, you can't imagine how confident I feel in cleaning data after completing this video with real data practice. Thank you once again.

  • @iinph
    @iinph 5 months ago

    thank you for your work Alex! I went through the entire video 1 by 1 twice and I can tell I learned a lot from this video , finally understanding why we need to learn Loops etc. and how simple cleaning methods work on Jupyter.

  • @farahandini3799
    @farahandini3799 1 year ago

    I really like when you make mistakes, because it shows that no one is perfect. I sometimes get anxious when I watch tutorials and they seem to be so good. You also show that the struggles you experience throughout the process are real. Thanks for the tutorial Alex.

  • @emmanuelnwachukwu6071
    @emmanuelnwachukwu6071 9 months ago

    This is the best video I have ever watched on data cleaning using pandas.. even the mistakes were good to learn from.

  • @morris9973
    @morris9973 4 months ago

    I've been struggling with Pandas a bit and this video cleared some things for me! what frustrates me from the way my teachers would teach Pandas, their solutions are sometimes too efficient, in the sense that a student that started from zero who's taking an exam, will never be able to come up with these hyper efficient and elegant one-liners in their code. what I appreciate in your video is how you achieve the same results, but in a way that a beginner can easily remember and apply on an exam. thank you! I'll be checking out more of your videos.

  • @ashwanikumarkaushik2531
    @ashwanikumarkaushik2531 1 year ago

    This is one of the best videos regarding data cleaning I have ever watched. Really crisp and covers almost all the important steps. It also dives deep into concepts that are really important, but you rarely see anybody applying them. Must watch for everybody, who is looking to get into data field or are already in the field.

  • @AlexTheAnalyst

    @AlexTheAnalyst

    1 year ago

    Glad to hear it!

  • @margotonik
    @margotonik 3 months ago

    I enjoyed working on this project. Thank you Alex and a huge thank you to those guys who helped in the struggling minutes!

  • @danielblum5691
    @danielblum5691 1 year ago

    Thank you for this video. I just finished this part of the data analytics course and I definitely learned something new and helpful.

  • @khaibaromari8178
    @khaibaromari8178 7 months ago

    Simply amazing! Well-explained and comprehensive. Loved it!

  • @yashjohngaming2928
    @yashjohngaming2928 8 months ago

    Best video available on internet so far for data cleaning in Pandas. Best explanation. 😇😇

  • @menyajasper4940
    @menyajasper4940 5 months ago

    This is really very important to both the beginners and pro. Kudos!!

  • @bharatsaraswat
    @bharatsaraswat 5 months ago

    Very well done! Great video. I am working on analyzing and cleaning scraped data from web and this guide is helpful, especially where you mentioned the mistakes.

  • @A4O_TSL
    @A4O_TSL 1 year ago

    Alex, you are the GOAT! For real, thank you for all the tutorials and your help for everyone who wants to become a data analyst!

  • @AlexTheAnalyst

    @AlexTheAnalyst

    1 year ago

    Glad to do it! :D

  • @millenniumkitten4107
    @millenniumkitten4107 10 months ago

    Some of the phone numbers are removed while doing the formatting. If you look in the Excel file, you'll see that some of the numbers are strings and some are integers. When you run the string method during the formatting, it replaces the numeric values with NaN and they are later removed completely. If you want to avoid losing that data you'll need to use df["Phone_Number"] = df["Phone_Number"].astype(str) before formatting. You also won't need to convert to string in the lambda after doing this.
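
The behaviour described here can be reproduced with a tiny mixed-type column. This is only a sketch with invented values, assuming the column holds both strings and integers as the comment reports.

```python
import pandas as pd

# Hypothetical mixed column: one string and one integer, as can happen
# when a spreadsheet stores some phone numbers as text and others as numbers.
phones = pd.Series(["123-545-5421", 1236439775])

# The .str accessor only handles string cells; the integer cell comes back as NaN.
direct = phones.str.replace("-", "", regex=False)

# Casting the whole column to str first keeps the integer's digits.
cleaned = phones.astype(str).str.replace("-", "", regex=False)

print(direct.tolist())
print(cleaned.tolist())
```

One caveat worth checking on real data: cells that are already missing may turn into placeholder text after the cast, depending on the pandas version, so they can need separate handling.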

  • @millenniumkitten4107

    @millenniumkitten4107

    10 months ago

    If you want to replace the empty values in Do_Not_Contact you'll need to use df["Do_Not_Contact"].astype(str).replace("","N"). Technically those values are not empty; they are NaNs, which is why replace is giving them 'NNN' instead of just the one 'N'. It's treating it as if NaN equals three blank spaces.

  • @atomicafk8704

    @atomicafk8704

    9 months ago

    that's what i've noticed too, great work

  • @jameslindsay4705

    @jameslindsay4705

    6 months ago

    You are a genius, thanks :)

  • @jaldaamol46

    @jaldaamol46

    2 months ago

    Thanks man, this worked.

  • @enyinnayajaja
    @enyinnayajaja 1 year ago

    Thank you Alex for this video on data cleaning with pandas. It is very detailed and explanatory

  • @MrValleMilton
    @MrValleMilton 8 months ago

    Great Pandas data cleaning video. Thank you very much for sharing your knowledge.

  • @MegaDave8520
    @MegaDave8520 1 year ago

    And I was already looking for some Pandas tutorial. Thank you, Alex, this was much needed. :)

  • @AlexTheAnalyst

    @AlexTheAnalyst

    1 year ago

    Glad to help!

  • @villjack
    @villjack 1 year ago

    My fav thing to do in pandas, thanks for making tutorial.

  • @50cent10891
    @50cent10891 8 months ago

    Great video! I enjoyed learning from you! Thanks for making things easier to understand

  • @HunzaFolk
    @HunzaFolk 3 months ago

    I am studying Data Collection and Data Visualization at Kings College; your channel is recommended by our lecturers for understanding data cleaning.

  • @bennet5467
    @bennet5467 6 months ago

    Thanks for this content, this was so helpful!! I think I have some optimizations, correct me if I'm wrong :D 27:04 Instead of calling the replace function multiple times, you can create a mapping like replace_mapping = {'Yes': 'Y', 'No': 'N'} and call df = df.replace(replace_mapping), so you don't have to specify the mapping for each column and only need to call .replace() once. 34:16 Instead of the for loop that drops rows one by one, you can use .loc, as in df = df.loc[df["Do_Not_Contact"] == "N"], to filter the rows by a criterion.
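
The mapping idea from 27:04 can be sketched as follows. The two column names appear elsewhere in this thread; the cell values are invented. Note that DataFrame.replace applies the mapping to every column, which is convenient here but may be too broad on a wider table.

```python
import pandas as pd

# Hypothetical yes/no columns with inconsistent spellings.
df = pd.DataFrame({
    "Paying Customer": ["Yes", "N", "No", "Y"],
    "Do_Not_Contact": ["No", "Yes", "Y", "N"],
})

# One dictionary describes every whole-cell substitution.
replace_mapping = {"Yes": "Y", "No": "N"}

# A single call standardizes all columns at once.
df = df.replace(replace_mapping)

print(df)
```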

  • @ivanovalle9764

    @ivanovalle9764

    3 months ago

    Where did you learn that you could use a dictionary format to replace multiple values in one line? this is really useful, thanks!

  • @yanpaucon1043

    @yanpaucon1043

    26 days ago

    Thank You. 34:16 is really helpful. I appreciate your kindness.

  • @jtmoleleki3604
    @jtmoleleki3604 3 months ago

    Thank you Alex. Your videos are very helpful. Now I can resume cleaning my data.

  • @georgekalathoor
    @georgekalathoor 4 months ago

    Instead of applying a lambda function to convert the Phone_Number column elements to strings, we can also use df['Phone_Number'] = df['Phone_Number'].astype(str). And to avoid Yes becoming YYes, we can pass a dictionary to the replace method: df['Paying Customer'] = df['Paying Customer'].replace({'Y':'Yes','N':'No'})
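
A possible way to combine the astype(str) suggestion with the phone formatting step, written without a lambda. The sample values and the 3-3-4 dash layout are assumptions for illustration, not taken from this comment.

```python
import pandas as pd

# Hypothetical cleaned digits, one stored as text and one as an integer.
phones = pd.Series(["1235455421", 1236439775])

# Cast once so every cell is a string.
as_text = phones.astype(str)

# Vectorized string slicing replaces the per-row lambda.
formatted = as_text.str[0:3] + "-" + as_text.str[3:6] + "-" + as_text.str[6:10]

print(formatted.tolist())
```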

  • @JK-tk2do
    @JK-tk2do 8 months ago

    Oh my.. I am going to watch every single video you created..

  • @rnjesus9950
    @rnjesus9950 4 months ago

    After making it this far through the course over the last 2 months, looking at these last 4 videos I'm getting strong final exam vibes. Python has not felt intuitive to me at all, but I recognize its value. I guess it feels like taking Spanish 1 and having Spanish 2 tests. I'm definitely looking forward to applying what I've learned here to solidify the lessons more. I'm contracting for a company already and writing a proposal for them to transition to My SQL Server. I guess the fact that I feel overwhelmed with all the info means I'm actually learning how little I actually know, which is a good thing for growth in the long run. Rambling here, but I am incredibly thankful for the course, Alex.

  • @hamzaabdullahmoh
    @hamzaabdullahmoh 9 months ago

    A Glorious Thank You!! Please Keep This UP!!!!

  • @L3GAT0Dantes
    @L3GAT0Dantes 11 months ago

    If you're getting an error when trying to split the address, this is what worked for me; I had to remove the number of values to look for. df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(',', expand=True)

  • @arpandebnath6115

    @arpandebnath6115

    10 months ago

    df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(pat=',', n=2, expand=True) Use this; you have to include pat.

  • @toni_munoz

    @toni_munoz

    6 months ago

    thank you!

  • @warinside7831

    @warinside7831

    5 months ago

    what does that exactly?

  • @drumkick1397
    @drumkick1397 10 months ago

    I discovered why the plain replace() works here. Series.replace() (without .str) has a regex argument that defaults to False, and it only replaces cells whose whole value matches, so it changes 'Y' to 'Yes' but won't turn 'Yes' into 'Yeses'. We can write df["Paying Customer"].replace('Y', 'Yes', regex = False) and it will work as expected.
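
To make the difference concrete, here is a small comparison of the two replace flavours on invented values. It is a sketch of the general behaviour, not the exact cells from the video.

```python
import pandas as pd

# Hypothetical column mixing short and long spellings.
s = pd.Series(["Y", "Yes", "N", "No"])

# .str.replace works on substrings, so the "Y" inside "Yes" is replaced too.
substring = s.str.replace("Y", "Yes", regex=False)

# Series.replace compares whole cell values, so only the exact "Y" changes.
whole = s.replace("Y", "Yes")

print(substring.tolist())
print(whole.tolist())
```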

  • @uchindamiphiri1381

    @uchindamiphiri1381

    6 months ago

    mine didnt work lol

  • @chernobarry6035
    @chernobarry6035 4 months ago

    Your explanation was super cool

  • @Niranga.555
    @Niranga.555 1 year ago

    Hey Alex, Thanks for the super content ...!

  • @ramakrishnaraolakkaraju3750
    @ramakrishnaraolakkaraju3750 9 months ago

    Thanks for the video. Helped a lot in understanding Pandas.

  • @FarizDarari
    @FarizDarari 3 months ago

    Many thanks for the dataset+code+video!!! 🔥🔥

  • @modern_jacob
    @modern_jacob 11 months ago

    If df["Phone_Number"].replace('[^a-zA-Z0-9]', '') is not working for you, try df["Phone_Number"].replace('[^a-zA-Z0-9]', '', regex=True)

  • @ahmadfadlanamin9286

    @ahmadfadlanamin9286

    11 months ago

    Thanks!

  • @vigneshwarsekar8351

    @vigneshwarsekar8351

    10 months ago

    Hi, Thanks, If I try this, Index 2 , 11 and 17 becomes NAN when originally they are in correct format, Kindly help

  • @vigneshwarsekar8351

    @vigneshwarsekar8351

    10 months ago

    Thanks a ton, been looking for it for almost a week

  • @manishaarya247

    @manishaarya247

    8 months ago

    Thanxxxxxsss aaa lotttt🙌

  • @bolajiogunfowote8603
    @bolajiogunfowote8603 8 months ago

    The video I needed to have a realistic practice in data cleaning.thanks

  • @dullfire8140
    @dullfire8140 1 year ago

    man lets go,you are our hero who can not afford paid courses

  • @selimc3347
    @selimc3347 1 year ago

    Your work are amazing. Thank you so Much

  • @alwaysbehappy1337
    @alwaysbehappy1337 1 year ago

    Thanks Alex, Please post more videos.

  • @aaspirant5392
    @aaspirant5392 9 months ago

    You are great, Alex. Your teaching skills excellent.

  • @AlexTheAnalyst

    @AlexTheAnalyst

    9 months ago

    Thanks! 😃

  • @traetrae11
    @traetrae11 1 year ago

    Thank you Alex. That Lambda example is going to be very useful.

  • @AlexTheAnalyst

    @AlexTheAnalyst

    1 year ago

    Glad to hear it! :D

  • @anikkantisikder2179
    @anikkantisikder2179 7 months ago

    For the address column: df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(",", n=2, expand = True). Passing just 2 was giving me an error, so I had to change it to n=2.

  • @DreaSimply21

    @DreaSimply21

    7 months ago

    This helped me, thank you! However, what does "n" mean?

  • @bobojonkasymov2279

    @bobojonkasymov2279

    6 months ago

    @DreaSimply21 The n=2 parameter indicates that the split should occur at most two times, producing up to three resulting parts.

  • @championsadiq7411

    @championsadiq7411

    4 months ago

    Thank you for this. It helped me a great deal

  • @gudiatoka
    @gudiatoka 8 months ago

    Great video mam, need more this type of tutorials

  • @nirmalpandey600
    @nirmalpandey600 1 month ago

    Amazing explanations!

  • @neildelacruz6059
    @neildelacruz6059 8 months ago

    Thanks for this absolutely great video.

  • @yvonnemukhono3566
    @yvonnemukhono3566 1 month ago

    Very helpful, and well explained.

  • @yanpaucon1043
    @yanpaucon1043 26 days ago

    Thank you so much, Alex. You are the Best

  • @user-to9vz6gh4b
    @user-to9vz6gh4b 7 months ago

    Alex, I loved the Video. It have Correct Explanation. Thank you so much for your Video. There is a Small Mistake while you are typing #Another Way to drop null value df.dropna(subset='Column_name',inplace = True). I hope you will notify the Error. Thank you. Have a Great day!
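
The comment does not say what the mistake was, so the following is only a sketch of how dropna with a subset is commonly written. The column names and rows are hypothetical, and assigning the result back is used here in place of inplace=True.

```python
import pandas as pd

# Hypothetical rows; one customer has no phone number.
df = pd.DataFrame({
    "First_Name": ["Frodo", "Abed", "Walter"],
    "Phone_Number": ["123-545-5421", None, "876-678-3469"],
})

# Drop only the rows where the listed column is missing.
df = df.dropna(subset=["Phone_Number"])

print(df)
```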

  • @md.shahriarabidswapnil604
    @md.shahriarabidswapnil604 7 months ago

    thank you very much. your video helped me a lot. good luck

  • @higiniofuentes2551
    @higiniofuentes2551 7 months ago

    Thank you for this very useful video!

  • @selvas5043
    @selvas5043 10 months ago

    Super Explanation Thanks

  • @17art3an
    @17art3an 6 months ago

    Thank you, great video!

  • @avinashparchake7935
    @avinashparchake7935 7 months ago

    In the Last_Name column we can use the replace function to remove unwanted characters such as . / _ with a regular expression. Code: df["Last_Name"]= df["Last_Name"].str.replace("[./_]","" ,regex= True)
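
A sketch comparing this regex approach with str.strip on invented surnames. The character set comes from the comment; the names, including the one with an interior dot, are hypothetical and only show where the two methods differ.

```python
import pandas as pd

# Hypothetical surnames with stray characters at the edges and one interior dot.
names = pd.Series(["Baggins", "/White", "...Potter", "Flenderson_", "O.Brien/"])

# The character class removes the listed characters wherever they occur.
anywhere = names.str.replace(r"[./_]", "", regex=True)

# str.strip with the same characters only trims the start and the end.
ends_only = names.str.strip("./_")

print(anywhere.tolist())
print(ends_only.tolist())
```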

  • @DreaSimply21

    @DreaSimply21

    7 months ago

    OMG Thank youuuu!!! I knew someone on here had to know the answer to how to use regex lol.

  • @bolajiawofuwa8116

    @bolajiawofuwa8116

    5 months ago

    Thanks

  • @Vikram_8621
    @Vikram_8621 1 year ago

    Thank you Alex! 🙏

  • @SurendraSingh-bd5wc
    @SurendraSingh-bd5wc 4 months ago

    Really enjoyed the video

  • @fitnessfreak984
    @fitnessfreak984 1 year ago

    Hey Alex, I just started your Pandas tutorial and was waiting for the Data Cleaning video. When I opened YouTube, your video was the first thing I saw. This is a boon for me 😇🥺 Thanks! I hope you will upload videos on Matplotlib, NumPy and many more libraries ❤🤗

  • @AlexTheAnalyst

    @AlexTheAnalyst

    1 year ago

    In the future, yes :)

  • @meryemOuyouss2002
    @meryemOuyouss2002 6 months ago

    Thank you soo much sir you're really a great professor 👏❤

  • @jamilsonedu917
    @jamilsonedu917 6 months ago

    Using regular expressions for manipulating data is beneficial because it allows you to change strings as needed, especially when dealing with different types of strings.

  • @TheWhiteboard2017
    @TheWhiteboard2017 1 year ago

    Alex, I have a question regarding the part at 18:50 where you change the phone number column into strings using str() inside the lambda. Can I get the same result by first calling df["Phone_Number"].astype(str) and then applying the lambda? Or is there a nuance and it only works using str()? Thanks for the great work!

  • @maryemmdini9408
    @maryemmdini9408 8 months ago

    very well explained video thank youuuu

  • @vasavipasumarthi9601
    @vasavipasumarthi9601 3 months ago

    Really, you have done a good job. I became a big fan of yours. Thank you so much for doing this.

  • @omkar8101
    @omkar8101 1 year ago

    Thanks a lot Alex for the video ! This was exactly what I was looking for. May I request you to try and upload video on how to write Python ETL code which uses table in a cloud database like snowflake, saves it in a csv format, transforms it and then again uploads it on snowflake. And all these steps are being captured in a log file which is in txt format !

  • @MehmoodAyazKhan

    @MehmoodAyazKhan

    10 months ago

    vouching for this @Alex. It'd be really appreciated TIA

  • @lukekulak7165
    @lukekulak7165 1 year ago

    Yesss love these vids

  • @Elly-we9uc
    @Elly-we9uc 6 months ago

    Also, to clean the Do_Not_Contact field, one can use: df['Do_Not_Contact'] = df['Do_Not_Contact'].replace({'N': 'No', 'Y': 'Yes'})

  • @pichpanha6993
    @pichpanha6993 8 months ago

    Thank you so much this awesome video

  • @ateebbinmuzaffar3136
    @ateebbinmuzaffar3136 1 year ago

    Thanks for the detailed tutorial Alex. I was wondering, if i wanted to become a data scientist instead of a data analyst, would you recommend any people in the industry who I should follow? F.e is there an Alex the Data Scientist out there?😄

  • @hishamafzal1999
    @hishamafzal1999 1 month ago

    Amazing Video

  • @pewolo_nyenh
    @pewolo_nyenh 5 months ago

    For explanation purposes, it is great. For getting the final result, I would have done differently though

  • @sumeetkajale3679
    @sumeetkajale3679 1 year ago

    Hey alex, we don't need to take any course because you are there 😉 I am doing your bootcamp of becoming a data analyst

  • @AlexTheAnalyst

    @AlexTheAnalyst

    1 year ago

    Do it! I try my best to bring the best free content I can :)

  • @nimrod4463
    @nimrod4463 10 months ago

    Hey alex, could you please expand in detail about the lambda function? thank you.

  • @ryuhayabusa3540
    @ryuhayabusa3540 1 year ago

    Thanks for this

  • @RaySpyder007
    @RaySpyder007 3 months ago

    Thanks brother!❤

  • @shotihoch
    @shotihoch 2 months ago

    Not an analyst (never wanted to be), but it was very interesting. Thanks!

  • @mastermatt6090
    @mastermatt6090 2 months ago

    I was intimidated by the Machine learning module but now I am not. Thanks a lot dude

  • @adeolaa.366
    @adeolaa.366 4 months ago

    great video thank you. when we did the first lambda, the reason was because lambda is faster. so why did we go against using a lambda when it was time to check if the customer can be called or not?

  • @W.xtar777
    @W.xtar777 1 year ago

    which one is better for data cleaning, Pandas or Excel ?

  • @alexandermackintosh1755
    @alexandermackintosh1755 11 months ago

    Great video thanks! Can’t help thinking that tools like chatGPT, github copilot al, GPT engineer can pretty much tell you how to/do this all for you so maybe I am wasting my time learning this 😅

  • @chiraggaba8671
    @chiraggaba8671 1 month ago

    really helpful

  • @YR-up8vk
    @YR-up8vk 1 year ago

    Thank you Alex for this detailed breakdown. Just a side note for those who don't like to use loops (e.g. for, while): for 31:00, you could use the following code: df.drop(df[df['Do_Not_Contact'] == 'Y'].index, inplace=True)

  • @LuisRivera-oc6xh

    @LuisRivera-oc6xh

    1 year ago

    I'd say that's complicating the code. You can simply do df = df[df['Do_Not_Contact'] != "Y"]

  • @vickygalih5571

    @vickygalih5571

    10 months ago

    @@LuisRivera-oc6xh i literally use this at the first time learning pandas myself

  • @ghanem87

    @ghanem87

    9 months ago

    df = df.drop(df[df['Do_Not_Contact'] == 'Y'].index)
    df = df.drop(df[df['Do_Not_Contact'] == ''].index)
    OR
    df = df[df['Do_Not_Contact'] == 'N']
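
The loop-free variants in this thread can be compared on a small invented table. The column name follows the comments; the rows, and the blank cell in particular, are assumptions meant to show that keeping 'N' and excluding 'Y' are not the same filter.

```python
import pandas as pd

# Hypothetical rows, including one blank Do_Not_Contact value.
df = pd.DataFrame({
    "First_Name": ["Frodo", "Abed", "Walter", "Dwight"],
    "Do_Not_Contact": ["N", "Y", "", "N"],
})

# Keep only customers explicitly marked "N"; blanks are dropped as well.
only_n = df[df["Do_Not_Contact"] == "N"].reset_index(drop=True)

# Exclude customers marked "Y"; blanks are kept.
not_y = df[df["Do_Not_Contact"] != "Y"].reset_index(drop=True)

print(only_n)
print(not_y)
```

reset_index(drop=True) renumbers the remaining rows from zero, which avoids gaps in the index after filtering.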

  • @ZeuSonRed
    @ZeuSonRed 9 months ago

    Still Helpful Thanks

  • @SearchingforScraps
    @SearchingforScraps 1 year ago

    Great stuff ! Do a collab with Rob Mulla !

  • @dawewatwese6301
    @dawewatwese6301 8 months ago

    Hi Alex, idk if you will see this comment. So I was doing the same codes, and I noticed when you eliminated the characters for the phone numbers at 14:57 you also deleted the phone numbers that did not have any characters in them. You can see that at index 3 for Walter White, before he had a phone number but after he had NaN. If you can tell me how to correct it, it would be very great. I also never commented on your videos, but i like them very much, they are very good, and helpful. Thanks for everything

  • @GlennLee-qz4st

    @GlennLee-qz4st

    5 months ago

    Not sure if you're still looking for a solution, but from some online searching I found a way to avoid deleting phone numbers that had no errors: add .astype(str) before .str.replace. This seems to fix the issue, and the code should look something like this: df["Phone_Number"] = df['Phone_Number'].astype(str).str.replace('[^a-zA-Z0-9]','',regex=True) Also note you'll have to add regex=True manually. Maybe it deletes them because it somehow interprets a whole number as non-numeric, not 100% sure though, still a beginner, and it might cause issues with other types of data.

  • @HarshKumar-ws3wv
    @HarshKumar-ws3wv 3 months ago

    Sir, in your opinion : Jupyter vs Pycharm? Which is better for data cleaning ?

  • @sachinmaroky4600
    @sachinmaroky4600 1 year ago

    thank you

  • @PlayerOne-GT
    @PlayerOne-GT 5 months ago

    Perfect 👍

  • @user-re4ip5ms9w
    @user-re4ip5ms9w 15 days ago

    Question is it bad that i dont specify the .str in this df["Do_Not_Contact"].replace("Y", "Yes")

  • @sdivi6881
    @sdivi6881 4 months ago

    If any one is getting an error on df['Address'].str.split(",",2, expand=True), you can omit 2 and use df["Address"].str.split(",", expand=True)

  • @internetgirl3099
    @internetgirl3099 3 months ago

    For phone number why don't you convert each record into str first, and then when you apply the reg expression, you can get rid of Nan and Na all together with other stuff?

  • @frybait0626
    @frybait0626 10 months ago

    Hi Alex! Want to ask whats the point of data cleaning and visualization on pandas if there is PowerBI? PBI is more of a click and drag interface and much more user friendly compared to pandas if its just Cleaning and Visualizing stuff. Is pandas much speedier in terms of raw performance compared to PBI?

  • @lazy_cat_traveller

    @lazy_cat_traveller

    9 months ago

    Not the author but your question is effectively - why should we use Python/R/other programming language vs Excel/PowerBI software for the aforementioned purpose. Several reasons, but the main one - python/r would be much more suitable when working with big amount of data (aka the size of which is measured in GB and not MB). Trying to use excel/powerbi in such cases would lead to pretty long and painful struggles of programs to try to execute what you want from them and can sometimes straight up lead to the loss of data. But if you just want to clean monthly sales report for instance, then yes Excel/Powerbi would be a more user-friendly option.

  • @OlasunkanmiOluwaseunBaba-jd7qm
    @OlasunkanmiOluwaseunBaba-jd7qm 1 month ago

    Thank you for the video. When trying to filter using the DNC column, couldn't we have done df = df[df['Do_Not_Contact'] != 'Y']?

  • @malakilikemokaaa1385
    @malakilikemokaaa1385 5 months ago

    Python is so fun

  • @G2Chanakya
    @G2Chanakya 9 months ago

    My only doubt is, you saw the first 20 rows and decide only \ or .. or _ could be preceding, or only "Nan" or "N/A" is only there in that row, while replacing it. What if the 50th row has "%Mike" as a name or what if "Null" is there one of the columns?? How do we deal with it. Great recap for me other than this. Thank you.

  • @jacobsanointed9981
    @jacobsanointed9981 7 months ago

    I have a lot of questions. How do we ask questions privately?

  • @user-up3fr8ke7g
    @user-up3fr8ke7g 1 year ago

    Nice one Alex. Don't forget to add comments to the code! 🙂

  • @AlexTheAnalyst

    @AlexTheAnalyst

    1 year ago

    lol for sure!

  • @jesustorralba2360
    @jesustorralba2360 1 month ago

    The pd.read_excel(r[filenamem])

  • @abhinavrastogi1699
    @abhinavrastogi1699 9 months ago

    Hi Nice explanation. But in this data cleaning you have simply remove NA values. But as per my understanding we need to fill NA values, I am not clear about the logic to fill in. If you can provide video on how to fill NA values it will help us a lot. Thanks Abhinav

  • @Skibadee99
    @Skibadee99 10 months ago

    Enjoying the video, and I appreciate this is a beginner video, but for 11:56 I have written a function that strips non-alphanumeric characters from the start and end of the column values: def remove_special_characters(text): if isinstance(text, str): return re.sub(r'^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$', '', text) else: return text df['Last_Name'] = df['Last_Name'].apply(lambda x: remove_special_characters(x))
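
The function from this comment, laid out as runnable code with the import it needs. The sample values are invented, and a plain list stands in for the DataFrame column so the sketch has no dependencies beyond the standard library.

```python
import re

def remove_special_characters(text):
    """Strip non-alphanumeric characters from both ends of a string."""
    if isinstance(text, str):
        # The first alternative matches a leading run, the second a trailing run.
        return re.sub(r"^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$", "", text)
    # Non-string cells (for example missing values) pass through unchanged.
    return text

# Hypothetical surnames; interior characters such as the apostrophe are kept.
samples = ["/White", "...Potter", "Flenderson_", "O'Brien", None]
cleaned = [remove_special_characters(value) for value in samples]
print(cleaned)
```

On a DataFrame the function can be passed to apply directly, as in df['Last_Name'].apply(remove_special_characters), so the wrapping lambda is optional.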