How To Compare CSV Files For Differences in Python

Science and Technology

Do you need to compare two CSV files for differences? In this video tutorial, we look at comparing CSV files with Python pandas. There are a number of ways to do this, and we show three different approaches.
⏲⏲⏲TIMESTAMPS⏲⏲⏲
Beginning 00:00
Problem overview 00:21
Reviewing output 01:23
An important thing to note 03:10
Code review 03:36

################ Let's be Social! ##################
Website - dataanalyticsireland.ie/
Twitter - / dataanalyticsi1
Facebook - / dataanalyticsirl
Linkedin - / data-analytics-ireland
Pinterest - www.pinterest.ie/dataanalytic...
#CSV #comparefiles #dataanalyticsireland #dataanalytics

Comments: 31

  • @Ricled100
    @Ricled100 2 years ago

    Great video! I am new to Python and really enjoyed you taking the time to explain your code and how it works! Looking forward to more videos.

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    2 years ago

    Thank you so much, I'll be posting more soon, just a bit tied up with something at the moment, probably be next week!

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    2 years ago

    Hi, I would welcome some feedback if you have a moment. Can you go to this link and tell me what you think? It would be really appreciated. Thanks! Joe DAI kzread.infocommunity

  • @finnmccool8671
    @finnmccool8671 2 years ago

    Great tips.

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    2 years ago

    Thank you, you're welcome!

  • @CMondi27
    @CMondi27 2 years ago

    What would be the best approach to find the differences between two Excel or CSV files if they contain duplicate IDs? For instance, Excel A has 123 as an ID, but it is repeated 5 times with different column values, whereas Excel B has 7 rows with ID 123, again with different column values. I'm trying to find the differences in this scenario. Thanks.

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    2 years ago

    I'd have to research this for you, but my initial thought would be to run a script to correct the IDs you want; then, once they are all unique, make that column a primary key so duplication will not happen going forward.

  • @CMondi27

    @CMondi27

    2 years ago

    @DataAnalyticsIreland That's right. I was able to create a script that works fine for unique IDs, but I haven't yet been able to crack the case where IDs are duplicated in large numbers. I would be really glad if you could offer any insight on it. Thanks.

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    2 years ago

    Are you able to supply your logic, with some made-up sample data, please? I will have a look.
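
One possible way to approach the duplicate-ID question in this thread is to stop treating the ID as a key and instead compare whole rows as a multiset, so that a row appearing twice in one file and once in the other still counts as a difference. The sketch below is not the method from the video; it uses only the standard library (`csv` and `collections.Counter`) rather than pandas, and the sample data, the `row_counts` helper, and the column names are all made up for illustration. It assumes both files list their columns in the same order.

```python
import csv
import io
from collections import Counter


def row_counts(csv_text):
    """Count how many times each full row appears (hypothetical helper)."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows)  # skip the header line
    return Counter(tuple(row) for row in rows)


# Made-up sample data: ID 123 is repeated in both files with different values.
file_a = "id,value\n123,a\n123,b\n123,b\n456,x\n"
file_b = "id,value\n123,a\n123,b\n123,c\n456,x\n"

counts_a = row_counts(file_a)
counts_b = row_counts(file_b)

# Counter subtraction keeps only positive counts, i.e. the surplus rows.
only_in_a = counts_a - counts_b
only_in_b = counts_b - counts_a

print("Rows (or extra copies) only in A:", dict(only_in_a))
print("Rows (or extra copies) only in B:", dict(only_in_b))
```

Because the counts are per full row, this reports rows that exist on one side only, but it does not try to pair up which row in A "became" which row in B. That pairing is ambiguous when IDs repeat.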

  • @findthetruth3021
    @findthetruth3021 2 years ago

    Can you please find the percentage of discrepancy/mismatch between the two databases? For example, so that I can say 30% of data1 (csv1) is different from data2 (csv2). Is it possible to do that?

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    2 years ago

    Thanks for your message! To confirm, this is for the files you are comparing, not after you load them into a database?

  • @findthetruth3021

    @findthetruth3021

    2 years ago

    @DataAnalyticsIreland Yes, let's say two CSV files, each with 10 columns and 300 rows. Once we are done with the comparison, we need to state the percentage difference between them. For example, I would say that the comparison showed the first CSV was 50% different from the second CSV, based on the comparison done beforehand. Thanks again for your prompt answer. If my message is still unclear, I am happy to get in touch with you on Skype and share my test questions with you. Have a great day.

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    2 years ago

    Hi, sorry for the delay. I have tweaked the code for this and will hopefully do a video on it tomorrow. Essentially, the output will show the percentage match as a number (i.e. 50, 10, 100, etc.) in a data frame, which can then be used as you please. Hope this works for you. DAI
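
The tweaked code mentioned in this reply is not shown here, so the following is only a minimal sketch of how a percentage could be computed with pandas, assuming both files have the same columns and the same number of rows in the same order. The sample data and the names `summary`, `percent_different`, and `percent_match` are illustrative, not taken from the video.

```python
import io

import pandas as pd

# Made-up sample data standing in for the two CSV files.
csv1 = io.StringIO("id,year,city\n1,2019,Dublin\n2,2020,Cork\n3,2021,Galway\n4,2022,Limerick\n")
csv2 = io.StringIO("id,year,city\n1,2019,Dublin\n2,2021,Cork\n3,2021,Galway\n4,2022,Sligo\n")

df1 = pd.read_csv(csv1)
df2 = pd.read_csv(csv2)

# Element-wise "not equal"; assumes identical shape, index and column labels.
# Note that missing values (NaN) count as different from each other here.
cells_differ = df1.ne(df2)

# A row counts as different if any of its cells differ.
rows_differ = cells_differ.any(axis=1)

percent_rows_different = float(rows_differ.mean() * 100)
percent_cells_different = float(cells_differ.to_numpy().mean() * 100)

# Collect the result in a one-row data frame.
summary = pd.DataFrame(
    {
        "rows_compared": [len(df1)],
        "rows_different": [int(rows_differ.sum())],
        "percent_different": [percent_rows_different],
        "percent_match": [100 - percent_rows_different],
    }
)
print(summary)
```

Whether "50% different" should mean rows or individual cells is a design choice; the sketch computes both, and only the row-level figure is placed in `summary`.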

  • @yeturuvenkataarunkumarredd297
    @yeturuvenkataarunkumarredd297 2 years ago

    How do we configure both CSV files when they are located in two different Unix paths?

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    2 years ago

    Hi, sorry for the delay, I was away and am only getting a chance to look at this now. I don't personally use any Unix system, but I found this and wonder whether it is useful to you: www.oreilly.com/library/view/python-standard-library/0596000960/ch13s04.html Data Analytics Ireland
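
For the question about two different Unix paths, one common approach is simply to build each file's full path separately and read the files independently. The sketch below is an assumption about what the commenter needs, not something from the video: it uses `pathlib` and the standard `csv` module, and two temporary directories stand in for the real locations, which are unknown. The file names and the `read_rows` helper are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(path):
    """Read a CSV file into a list of rows (hypothetical helper)."""
    with Path(path).open(newline="") as handle:
        return list(csv.reader(handle))


# Two temporary directories stand in for the two Unix locations.
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    path_a = Path(dir_a) / "file1.csv"
    path_b = Path(dir_b) / "file2.csv"

    # Write made-up sample data so the example is self-contained.
    path_a.write_text("id,year\n1,2019\n2,2020\n")
    path_b.write_text("id,year\n1,2019\n2,2021\n")

    rows_a = read_rows(path_a)
    rows_b = read_rows(path_b)

# Pair rows by position and keep the pairs that differ.
changed = [(a, b) for a, b in zip(rows_a, rows_b) if a != b]
print(changed)
```

In practice the temporary directories would be replaced by the real directories. `pandas.read_csv` also accepts a path string or a `Path` object, so the same two paths can be passed to it directly.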

  • @sarvansps
    @sarvansps 2 years ago

    Since we know the difference is in the year column, we have checked only that column. What if we have 100 columns in the two CSV files? How do we compare the column values?

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    2 years ago

    Hi! I was looking at this, and then realised that method 2 above might give you your answer. It shows you the differences between the two data frames and prints out only those rows that have differences. You can take those rows from each file and do a comparison. In my example I have compared the first file to the second; what you could do is:
    (A) Create the output from the first comparison and save it to a new data frame (say df_a_diff, for example).
    (B) Repeat step A in reverse, and call the second one df_b_diff.
    (C) Now compare these two data frames to see where your differences are.
    Does this help? Data Analytics Ireland

  • @sarvansps

    @sarvansps

    2 years ago

    @DataAnalyticsIreland Thanks! It works for me 👍🏻

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    2 years ago

    @sarvansps Excellent, good to hear!
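
Steps (A) to (C) from the reply in this thread could be sketched roughly as follows. The video's "method 2" is not reproduced here, so this uses a swapped-in technique: an outer `merge` with `indicator=True`, which marks each row as coming from the left file only, the right file only, or both. The sample data is made up, and the sketch assumes a column called `id` that uniquely identifies each row; `df_a_diff` and `df_b_diff` follow the names suggested in the reply.

```python
import io

import pandas as pd

# Made-up sample data; in practice these would come from the two CSV files.
df_a = pd.read_csv(io.StringIO("id,year,city\n1,2019,Dublin\n2,2020,Cork\n3,2021,Galway\n"))
df_b = pd.read_csv(io.StringIO("id,year,city\n1,2019,Dublin\n2,2021,Cork\n3,2021,Galway\n"))

# With no "on" argument, merge joins on every column the frames share,
# so only rows that are identical in all columns are marked "both".
merged = df_a.merge(df_b, how="outer", indicator=True)

# (A) rows found only in the first file, (B) rows found only in the second.
df_a_diff = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
df_b_diff = merged[merged["_merge"] == "right_only"].drop(columns="_merge")

# (C) line the two sets of differing rows up by id (assumed unique).
side_by_side = df_a_diff.merge(df_b_diff, on="id", suffixes=("_a", "_b"))

# Check every non-key column, however many there are, for changed values.
changed_columns = [
    col
    for col in df_a.columns
    if col != "id" and (side_by_side[f"{col}_a"] != side_by_side[f"{col}_b"]).any()
]
print(side_by_side)
print("Columns with differences:", changed_columns)
```

The final loop is what addresses the 100-column case: it does not need to know in advance which column changed. If the key column is not unique, step (C) would pair rows incorrectly and a different comparison would be needed.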

  • @Indrail4k
    @Indrail4k 1 year ago

    Method 3 gives a KeyError in get_loc: raise KeyError(key) from err

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    1 year ago

    Hi, I need to see the full code, if you can share it, so I can investigate further. Thanks, Data Analytics Ireland
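
Without the full code the cause cannot be confirmed, but in pandas a `KeyError` raised from `get_loc` generally means the requested label was not found, and for a column lookup that often comes down to the header not matching exactly. The sketch below shows one hypothetical way this can happen (stray spaces in the CSV header) and how to check for it. The data and the `year` column name are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical CSV whose header has stray spaces around a column name.
raw = "id, year \n1,2019\n2,2020\n"
df = pd.read_csv(io.StringIO(raw))

# Inspecting the actual labels is usually the quickest diagnostic.
print(list(df.columns))

try:
    df["year"]
    lookup_failed = False
except KeyError:
    # The stored label still has the spaces, so "year" is not found.
    lookup_failed = True

# Strip whitespace from every column label, then look the column up again.
df.columns = df.columns.str.strip()
years = df["year"].tolist()
print(years)
```

Differences in letter case or a column that exists in only one of the two files would produce the same error, so printing `df.columns` for both files before comparing is a reasonable first step.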

  • @hemant943
    @hemant943 1 year ago

    Can you please share your code?

  • @hemant943

    @hemant943

    1 year ago

    Please share.

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    1 year ago

    Hi, thanks for visiting the channel, have a look at this page, if you have any questions come back! dataanalyticsireland.ie/2021/08/07/how-to-compare-csv-files-for-differences/

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    1 year ago

    Just did, can you see it?!

  • @hemant943

    @hemant943

    1 year ago

    @DataAnalyticsIreland Thank you so much. It was really helpful for me.

  • @DataAnalyticsIreland

    @DataAnalyticsIreland

    1 year ago

    You're welcome, glad I could help you!
