gsub Function in R - Converting Revenue Data to Numeric


Revenue data is often tracked with special characters, namely dollar signs and commas. While this is great for visually representing the data, it's not so great when we try to run numeric calculations on that revenue data.
You might think we can just use the as.numeric() function in R, but if the variable you're trying to convert includes dollar signs or commas, you'll just be left with NA values.
In this video, I show the use of the handy gsub() function to remove characters from variables in R. After doing that, we can then run our as.numeric() function as usual and convert our revenue data to a numeric.
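As a minimal sketch of that idea (the vector name and values below are made up for illustration and are not the data set from the video), the steps might look like this in base R:

```r
# Hypothetical revenue values stored as text with dollar signs and commas
revenue <- c("$1,200", "$35,000.50", "$980")

# Converting directly fails: every element becomes NA (with a warning)
naive <- suppressWarnings(as.numeric(revenue))

# Strip the special characters first. Inside [ ] the $ is a literal dollar
# sign, so "[$,]" matches either a dollar sign or a comma.
cleaned <- gsub("[$,]", "", revenue)

# Now the conversion works as usual
revenue_num <- as.numeric(cleaned)
sum(revenue_num)  # 37180.5
```

The character class keeps both removals in a single gsub() call; two separate calls, one per character, would work just as well.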
While this example uses a simple revenue data set, it can be applied across the board, whenever you need to remove characters - even if you don't plan on converting to a numeric value later!
Finally, I combine it all into one statement inside of dplyr to help make our R code more compact and more readable.
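A rough sketch of what a combined dplyr statement could look like is shown here. The data frame `sales` and its columns `region` and `revenue` are assumptions for illustration, not the names used in the video:

```r
library(dplyr)

# Hypothetical data frame with revenue stored as text
sales <- data.frame(
  region  = c("East", "West", "North"),
  revenue = c("$1,200", "$35,000.50", "$980"),
  stringsAsFactors = FALSE
)

# Remove the dollar signs and commas, then convert, all in one mutate() step
sales <- sales %>%
  mutate(revenue = as.numeric(gsub("[$,]", "", revenue)))

str(sales$revenue)
```

Nesting gsub() inside as.numeric() avoids an intermediate column, at the cost of a slightly denser line.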

Comments: 6

  • @vanakornsirijongprasert1726
    @vanakornsirijongprasert1726 9 months ago

    Thank you, this tutorial is clear and so easy to follow. Big thumbs up!

  • @danieldbonneau
    @danieldbonneau 9 months ago

    Thanks for the feedback, glad I could help!

  • @CoachPegasus
    @CoachPegasus 1 year ago

    If we need to convert a date column from 6-1-2022 to 6/1/2022, how can we do it? With gsub it shows as NAN.

  • @danieldbonneau
    @danieldbonneau 1 year ago

    I'm not sure why gsub() isn't working for that. Make sure you don't have any conversion to numeric occurring after the code. It would be weird for a character substitution to produce an NA in this scenario. A roundabout alternative to restructure this would just be to paste together the individual components. After making sure it's a date column and not a character column, you can do something like: library(lubridate) # Use this to read in month, day, year functions new_date
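    The checks described in this reply can be sketched as follows. This is not the code from the reply itself, and it uses base R only; the vector names and values are hypothetical:

```r
# Hypothetical date strings in month-day-year order
date_chr <- c("6-1-2022", "12-25-2022")

# A plain character substitution returns text and does not create NA
date_slash <- gsub("-", "/", date_chr)   # "6/1/2022" "12/25/2022"

# NA only shows up if the result is later converted to numeric
as_number <- suppressWarnings(as.numeric(date_slash))  # NA NA

# If the column should be a real Date, parse it and format it for display
date_val <- as.Date(date_chr, format = "%m-%d-%Y")
date_fmt <- format(date_val, "%m/%d/%Y")  # "06/01/2022" "12/25/2022"
```

    Note that format() zero-pads the month and day, so the output differs slightly from the gsub() result.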

  • @CoachPegasus
    @CoachPegasus 1 year ago

    2012-06-04
    WM$Date = sub("\\-", "/", WM$Date)
    They turn into NAN NA NA NA "2012/06-04" NA NA
    Thanks

  • @danieldbonneau
    @danieldbonneau 1 year ago

    You'll want to make sure you're using gsub(), and not sub(). sub() will only replace the first occurrence, while gsub() will replace it "globally". That change will make sure the "2012/06-04" is properly written as "2012/06/04". As for the NAs, it's hard to say without seeing more of your data; there may be something else that's causing that to occur.

    *Edit: Make sure there's no conversion to numeric after you do the substitution. It would be strange for a character substitution to produce an NA in this case, so it may be something else if you're running additional code. Depending on what you need the dates for and why you're trying to make this change, you may want to look into the lubridate package and one of its functions. Happy to help further if you want to share more about your specific use case so I can help you get this all squared away.
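    To illustrate the difference between sub() and gsub() mentioned above, here is a small sketch with made-up date strings (not the commenter's actual WM data):

```r
# Hypothetical date strings with two dashes each
dates <- c("2012-06-04", "2012-07-15")

# sub() replaces only the first match in each element
first_only <- sub("-", "/", dates)    # "2012/06-04" "2012/07-15"

# gsub() replaces every match in each element
all_dashes <- gsub("-", "/", dates)   # "2012/06/04" "2012/07/15"
```

    A dash outside of square brackets has no special meaning in a regular expression, so the "\\-" escape from the question is harmless but not required.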
