Natural Language Processing | Bag of Words

Here is a detailed discussion of the Bag of Words document matrix. We will also cover how to implement it with the help of Python and NLTK.
GitHub link: github.com/krishnaik06/Natura...
For more videos, please check the URLs below:
Deep Learning: goo.gl/iwek57
Statistics in ML: goo.gl/x7mkUH
Feature Engineering: goo.gl/6wiaGt
Data Preprocessing Techniques: goo.gl/YfC9Kc
Machine Learning: goo.gl/XhHdCd

Comments: 90

  • @oshtontsen5428
    3 years ago

    Krish, thank you for investing so much time and effort into making all of these videos. I really appreciate it. These videos have greatly helped me jump-start my career in machine learning. I am now a full-time machine learning engineer at a startup and just wanted to mention that you were a huge help at the start of that journey. Cheers.

  • @pranayp1950

    3 years ago

    Congrats, mate. How did you apply to that startup?

  • @kunalkumar2717
    3 years ago

    This series has been so good. Sometimes, more than understanding the concepts, we need things in sequence for our minds to comprehend them. This series is in order. Thank you, Krish sir!

  • @glenn8781
    2 years ago

    Amazing. Love how you teach from the basic level

  • @nasaruddin36
    4 years ago

    Really great tutorial. This was very helpful for me. Thank you very much. Please keep posting quality videos like this. Love from BD.

  • @padhiyarkunalalk6342
    4 years ago

    Sir, you and your lectures are both great. Thanks for making videos for us.

  • @amosavi4730
    2 years ago

    Thank you very much, dear Krish. Well-made videos that explain complicated subjects simply.

  • @saurabhbadave6447
    4 years ago

    Really, sir, your way of teaching is nice and simple. Great work.

  • @ashishbomble8547
    4 years ago

    Guruji, the way you explain this is very good. Guruji, I am grateful to you. May God grant you a long life.

  • @vikrantnag86
    5 years ago

    Great work, Krish. Can you please make a video on text analytics using R? That would be a great help. Thanks.

  • @akhilvarma5708
    4 years ago

    Awesome explanation, sir. I'm a fan of your explanations. Hats off.

  • @nasreenbanu2245
    2 years ago

    Sir, hats off for your efforts. This is the best NLP tutorial.

  • @gh504
    2 years ago

    Amazing explanation of each and every line of code

  • @manuchowdary7848
    4 years ago

    Hi sir, your videos helped me a lot in understanding NLP basics. Thank you, sir. Please create more useful videos like this.

  • @balive053
    3 years ago

    Your videos are great! Thank you very much!

  • @sandipansarkar9211
    4 years ago

    Superb video for practice. Thanks.

  • @kafeelbutt
    4 years ago

    This channel enhances my skills.

  • @seyitahmetozturk721
    3 years ago

    perfect explanation. thanks for your effort :)

  • @chessketeer
    10 months ago

    Thank you. You are a great man.

  • @aadilraf
    3 years ago

    Thanks Krish! Super helpful!

  • @MechiShaky
    4 years ago

    It's a great video, Krish. Keep it going. But why don't you use spaCy for NLP? I feel it is faster than NLTK.

  • @rahuldey6369
    2 years ago

    At 7:46, if you look carefully, only the nouns are getting lemmatized, not the verbs. Won't that cause a generalization problem?

  • @harikrishnanm5109
    3 years ago

    It was really helpful. Can you make videos on grammar correction using rule-based methods, language models, and classifiers?

  • @rajeshwarsehdev2318
    4 years ago

    Well explained!!

  • @sandrasandji6620
    4 years ago

    It is okay, I resolved my problem. Thanks.

  • @datascience3008
    2 years ago

    Thank you so much, Krish.

  • @REDROSE-be3br
    2 years ago

    Could you please make a video on latent Dirichlet allocation (LDA) and how TF-IDF and LDA work together?

  • @gauravsahani2499
    4 years ago

    Thank you so much, sir!

  • @jinks6887
    2 years ago

    Thank you, sir.

  • @suvarnadeore8810
    3 years ago

    Thank you krish sir

  • @chetanmundhe8619
    4 years ago

    Very good explanation.

  • @debatradas9268
    2 years ago

    thank you so much

  • @amruthasankar3453
    1 year ago

    Thank you, sir ❤️🔥

  • @satishvavilapalli24
    5 years ago

    Nice explanation, bro.

  • @siddharthamahendra4980
    2 years ago

    Hi Krish, thanks for the informative video series. A quick question, though: if we remove stopwords and "not" gets removed, doesn't that completely change the meaning? For example, "We have not conquered anyone" becomes "conquered anyone", which means something very different. How do we tackle negation words?
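One common way to handle the negation problem raised above is to take negation words out of the stop-word list before filtering. The following is a minimal sketch under stated assumptions: `STOP_WORDS` and `NEGATIONS` are small hand-written sets chosen for illustration, not the list used in the video.

```python
# Hypothetical, hand-written stop-word list for illustration only.
STOP_WORDS = {"we", "have", "not", "the", "a", "is"}
# Words that flip the meaning of a sentence and should survive filtering.
NEGATIONS = {"not", "no", "never"}


def remove_stop_words(sentence, stop_words):
    """Lowercase, split on whitespace, and drop every token in stop_words."""
    return [tok for tok in sentence.lower().split() if tok not in stop_words]


sentence = "We have not conquered anyone"

# Plain filtering drops "not" and so inverts the meaning.
plain = remove_stop_words(sentence, STOP_WORDS)

# Excluding the negations from the stop-word list keeps "not".
negation_aware = remove_stop_words(sentence, STOP_WORDS - NEGATIONS)

print(plain)           # ['conquered', 'anyone']
print(negation_aware)  # ['not', 'conquered', 'anyone']
```

With NLTK the same idea would be `set(stopwords.words('english')) - NEGATIONS`, provided the stop-word corpus has been downloaded.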

  • @talharauf3111
    2 years ago

    Thanks a lot, sir.

  • @ayushsingh-qn8sb
    4 years ago

    Can you please make a video on the regular expression library?

  • @iEntertainmentFunShorts
    3 years ago

    BoW may also suffer from the curse of dimensionality, right? What can we do about that? Is there any improvement that mitigates the issue to some extent?

  • @ammarahemadkhan8570
    4 years ago

    CountVectorizer by default removes punctuation and lowercases the text. Then why are we doing it separately? Please respond.

  • @tejashshah5202

    3 years ago

    I believe it was just for demonstration purposes: to show that lowercasing and removing punctuation is a good habit, and to demonstrate the functionality of CountVectorizer().
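The default behavior discussed above can be checked directly. This is a small sketch with made-up sentences (`docs` is an illustrative corpus, not the one from the video), assuming scikit-learn is installed:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up sentences with mixed case and punctuation.
docs = ["Hello, World! Hello again.", "The WORLD is big."]

# Default settings: lowercase=True, and the default token pattern keeps
# only runs of two or more word characters, so punctuation is dropped.
cv = CountVectorizer()
X = cv.fit_transform(docs)

# vocabulary_ maps each learned term to its column index.
vocab = sorted(cv.vocabulary_)
print(vocab)  # ['again', 'big', 'hello', 'is', 'the', 'world']
```

Note that the defaults do not remove stop words, stem, or lemmatize, and they keep digits, so the manual cleaning steps are not fully redundant.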

  • @hariprasad1744
    4 years ago

    Can you please number the videos in the playlist? That would be useful to us.

  • @naderbouchnag3
    1 year ago

    👏👏👏👏👏👏👏👏

  • @rohandawar484
    4 years ago

    Hi Krish, I really enjoyed this playlist, could you also help in the concepts for syntactic processing? Thanks in advance!

  • @akash-lz2dq

    3 years ago

    Sir, can you tell me why we used the toarray function? We already get a sparse matrix from vectorization, and toarray seems to represent the same matrix. Is there any use for toarray here?
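The difference asked about above is one of storage, not of values: `fit_transform` returns a SciPy sparse matrix that stores only the non-zero counts, while `toarray()` produces a dense NumPy array that is easier to print or inspect. A minimal sketch with a made-up corpus:

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

# Made-up corpus for illustration.
docs = ["good movie", "bad movie", "good plot"]

X_sparse = CountVectorizer().fit_transform(docs)  # SciPy sparse matrix
X_dense = X_sparse.toarray()                      # NumPy array, zeros stored explicitly

print(sparse.issparse(X_sparse))        # True
print(isinstance(X_dense, np.ndarray))  # True
print(X_dense)
```

For large corpora it is usually better to keep the sparse form, since most scikit-learn estimators accept it directly and the dense array can be much larger in memory.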

  • @shindepratibha31
    4 years ago

    Very well explained. I still have a doubt: how do .lower() and .split() help clean the text? Can anyone please explain?

  • @mosart03

    4 years ago

    As I understand it, if you have words like "Good" and "good", it makes no sense to treat them as two different words. There is another tool under NLTK, VADER, for which lowercasing is not recommended, because for VADER "GREAT" and "great" carry different levels of excitement in a sentence.

  • @sathishkumar-kp4hk
    4 years ago

    I want more videos on NLP and deep learning.

  • @bhargavreddy588
    5 years ago

    Nice videos, Krish. Can you please make a video on how to get data from the web (Google Reviews, etc.) using Python?

  • @krishnaik06

    5 years ago

    I guess you have to use web scraping.

  • @navaneethansuresh8680
    4 years ago

    Hi Krish, great video. Can you please explain why we used fit_transform instead of fit, and what the difference is between fit, transform, and fit_transform?

  • @cristianovivk4935

    4 years ago

    It's actually very simple. I assume you know what fit and transform do separately. fit_transform performs both actions in a single call.
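To make the fit / transform / fit_transform distinction concrete, here is a short sketch with a made-up two-sentence corpus (`train` is illustrative). `fit` learns the vocabulary, `transform` encodes text with an already learned vocabulary, and `fit_transform` does both on the same data:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up training corpus.
train = ["good movie", "bad movie"]

# Two steps: learn the vocabulary, then encode the documents.
cv_two_step = CountVectorizer()
cv_two_step.fit(train)
X_two_step = cv_two_step.transform(train)

# One step: fit_transform does both on the same data.
cv_one_step = CountVectorizer()
X_one_step = cv_one_step.fit_transform(train)

same = (X_two_step.toarray() == X_one_step.toarray()).all()
print(same)  # True

# New text is encoded with transform only; words not seen during fit are ignored.
X_new = cv_one_step.transform(["good unseen film"])
print(X_new.toarray())  # [[0 1 0]]
```

The separate `transform` call matters for test data: it keeps the columns identical to those learned from the training set.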

  • @indirajithkv7793
    2 years ago

  • @RaviKumar-sw1wc
    4 years ago

    Hi Krish, as you explained at 11:00, how do we decide whether it is a positive or a negative sentence?

  • @cristianovivk4935

    4 years ago

    Bro, he meant that once we have the bag of words, we can train a model and then test it. The model will then tell us whether a sentence is positive or negative.
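The workflow described in the reply above can be sketched as follows. The sentences and labels are invented for illustration, and Multinomial Naive Bayes is just one classifier that works on count features; it is an assumption here, not necessarily the model used in the series:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data; labels are made up (1 = positive, 0 = negative).
train_texts = [
    "good great movie",
    "great acting good plot",
    "bad boring movie",
    "boring bad plot",
]
train_labels = [1, 1, 0, 0]

# Build the bag-of-words matrix from the training sentences.
cv = CountVectorizer()
X_train = cv.fit_transform(train_texts)

# Train a classifier on the word counts.
clf = MultinomialNB()
clf.fit(X_train, train_labels)

# Encode unseen sentences with the same vocabulary and predict their labels.
X_test = cv.transform(["good great plot", "bad boring movie"])
predictions = clf.predict(X_test)
print(predictions)  # [1 0]
```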

  • @lakshmisuvarchalasarva7942
    3 years ago

    Hi sir, I tried following the same steps. I am new to the Spyder IDE, and I could not see the NumPy array for the bag-of-words matrix after running the code. Can someone help? Thank you.

  • @vinaymn3602
    4 years ago

    Can you post a video on web scraping?

  • @saicharanreddyy.p6873
    4 years ago

    For the if condition, both "if word not in set" and "if not word in set" execute the same way. How does that work? Please explain.
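Both spellings asked about above are valid Python and give the same result, because `not word in s` is parsed as `not (word in s)`. A quick check with a made-up stop-word set:

```python
# Made-up stop-word set and token list for illustration.
stop_words = {"the", "is"}
words = ["the", "sky", "is", "blue"]

# Preferred spelling: the dedicated "not in" operator.
kept_a = [w for w in words if w not in stop_words]

# Equivalent spelling: "not" applied to the result of "in".
kept_b = [w for w in words if not w in stop_words]

print(kept_a)            # ['sky', 'blue']
print(kept_a == kept_b)  # True
```

The `not in` form is the one recommended by common Python style guides because it reads unambiguously.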

  • @arijitdiganto4166
    2 years ago

    I have a question about the last table. We saw the number of occurrences of each word in a sentence, but how can we know which column represents which word? Will the columns be ordered by descending occurrence frequency?

  • @Ajay-ku5fn

    2 years ago

    The column order has nothing to do with frequency. CountVectorizer creates a map of all the unique words in the corpus. In this example, 144 words are unique across 31 sentences, so the matrix size is 31*144. The map looks like {{1,word1},{2,word2},......,{144,word144}}. When creating the vector for a sentence, it builds an array of size 144, writing the word's frequency at its index if the word is present and 0 otherwise.

  • @RaviKumar-mu4ne

    1 year ago

    @Ajay-ku5fn 114*
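The word-to-column mapping described in this thread is exposed by the fitted vectorizer as `vocabulary_`. The sketch below uses a made-up two-sentence corpus; with default settings the columns come out in alphabetical order of the terms, not in frequency order:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up corpus for illustration.
docs = ["the cat sat", "the cat sat on the mat"]

cv = CountVectorizer()
X = cv.fit_transform(docs).toarray()

# vocabulary_ maps term -> column index; sort the terms by that index
# to list the column names in column order.
columns = sorted(cv.vocabulary_, key=cv.vocabulary_.get)

print(X.shape)   # (2, 5): 2 sentences, 5 unique words
print(columns)   # ['cat', 'mat', 'on', 'sat', 'the']
print(X[1])      # [1 1 1 1 2]
```

The shape follows the same rule as the (31, 114) matrix discussed in the comments: rows are sentences and columns are unique words.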

  • @ganeshsubramanian6217
    2 years ago

    This is really good. One question: if stemming turns "history" into "histori" and produces other meaningless words, why do we even do it? Lemmatization does the job anyway, so why can't we use it directly?

  • @ArshdeepSingh..

    1 year ago

    Because the meaning of the stemmed word doesn't matter for the implementation. "History" and "histories" are both stemmed to "histori" and are counted as one entity during vectorization.

  • @sandrasandji6620
    4 years ago

    I have a problem with this section: when I write review = re.sub('[a-zA-Z]',' ',sentences[i]), all the following steps contain only stopwords. Please, I need an explanation or help. Thanks.

  • @praja110

    3 years ago

    Use the "except" symbol ^ inside the brackets, i.e. '[^a-zA-Z]'.
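The effect of the `^` in the character class can be seen side by side. Without it, the pattern matches the letters themselves and wipes out the words; with it, the pattern matches everything except letters. The sample sentence below is made up:

```python
import re

# Made-up sentence with punctuation and digits.
sentence = "Hello, World! It's 2019."

# Without ^ the class matches letters, so every letter becomes a space.
wrong = re.sub('[a-zA-Z]', ' ', sentence)

# With ^ the class is negated: everything that is NOT a letter becomes a space.
right = re.sub('[^a-zA-Z]', ' ', sentence)

print(repr(wrong))
print(right.split())  # ['Hello', 'World', 'It', 's']
```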

  • @vijaysista3894
    3 years ago

    Is Spyder a better IDE than Jupyter?

  • @kirankumar-sn4db
    3 years ago

    Hi Krish Naik, how can we check your data on GitHub?

  • @learn-with-lee
    4 years ago

    Hello Krish! Thanks for uploading and explaining in great detail. I have a question: let's say we have around 500 text messages or paragraphs. How do we go about it? Is there any way? Please reply.

  • @08ae6013

    4 years ago

    www.kaggle.com/parulpandey/getting-started-with-nlp-a-general-intro ... Have a look at this link. I hope it helps you.

  • @monarchbaweja
    3 years ago

    Please use a better mic. The audio quality has become really bad. Meanwhile, the content is perfect. Keep going.

  • @mohakgangwani6109
    3 years ago

    Sir, you forgot to explain max_features in CountVectorizer.
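For reference, `max_features` limits the vocabulary to the most frequent terms across the corpus, which is also one simple way to cap the number of columns in a bag-of-words matrix. A minimal sketch with a made-up corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up corpus: "good" appears 4 times, "movie" twice, the rest once.
docs = ["good good movie", "good bad movie", "good plot"]

# Keep only the 2 most frequent terms across the whole corpus.
cv = CountVectorizer(max_features=2)
X = cv.fit_transform(docs)

kept_terms = sorted(cv.vocabulary_)
print(kept_terms)  # ['good', 'movie']
print(X.shape)     # (3, 2)
```

The value to use in practice depends on the corpus; rare words that are dropped this way no longer contribute to the features.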

  • @ganeshrajv130
    4 years ago

    How is the dimension (31,114)? Can you please explain?

  • @cristianovivk4935

    4 years ago

    31 is the total number of sentences and 114 is the number of unique words.

  • @chinmaya007
    4 years ago

    Nice explanation, sir, but when I implement your code it shows an error. Can anyone help me, please?

  • @BreakItGaming

    4 years ago

    What error are you getting?

  • @sandrasandji6620
    4 years ago

    I don't know if it is my NLTK version, but I don't see any difference between lemmatization and stemming; both return the same thing. Thanks.

  • @akhandpratap__
    3 years ago

    Lemmatization 6:40

  • @padhiyarkunalalk6342
    4 years ago

    🤝🤝🤝🤝🤝👌👌👌👌👌

  • @datasciencegyan5145
    4 years ago

    After creating X, why are 1, 0, and the other numbers shown in different colors?

  • @datasciencegyan5145

    4 years ago

    Is it just for representation?

  • @RejoicingKrishna

    4 years ago

    @datasciencegyan5145 I think that it is automatically coloured that way in Spyder.

  • @cristianovivk4935

    4 years ago

    It's just for representation, so that we can easily spot the 1s and 0s.

  • @bijaynayak6473
    4 years ago

    Hello Krish, if we convert words from upper case to lower case, "US" and "us" will be treated as the same word. How do we handle such situations?

  • @rajsinghmaan3095

    4 years ago

    My suggestion would be to create a list of words to be excluded from lowercasing.

  • @cristianovivk4935

    4 years ago

    As far as I know, it's better not to use short forms, so instead of "US", "USA" will make more sense. Also, the word "us" is less likely to have any impact, since it will be removed as a stopword.
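The exclusion-list idea suggested in this thread could look like the sketch below. `PROTECTED` and `selective_lower` are hypothetical names introduced here for illustration, and the function assumes the text has already been stripped of punctuation so that tokens match the list exactly:

```python
# Hypothetical list of tokens whose capitalization carries meaning.
PROTECTED = {"US"}


def selective_lower(text):
    """Lowercase every whitespace-separated token except those in PROTECTED."""
    return " ".join(
        tok if tok in PROTECTED else tok.lower()
        for tok in text.split()
    )


print(selective_lower("The US helped us"))  # the US helped us
```

If the result is then passed to CountVectorizer, `lowercase=False` would be needed so that the vectorizer does not lowercase the protected tokens again.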

  • @darsh727
    3 years ago

    You sound like MSD