Natural Language Processing | Bag of Words
Here is a detailed discussion of the Bag of Words document matrix. We will also cover how to implement it with Python and NLTK.
GitHub link: github.com/krishnaik06/Natura...
For more videos, please check the URLs below:
Deep Learning: goo.gl/iwek57
Statistics in ML: goo.gl/x7mkUH
Feature Engineering: goo.gl/6wiaGt
Data Preprocessing Techniques: goo.gl/YfC9Kc
Machine learning: goo.gl/XhHdCd
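To make the idea concrete, here is a minimal pure-Python sketch of a bag-of-words document matrix, assuming plain lowercase-and-split tokenization and a small made-up corpus. It is not the video's code, which uses NLTK for cleaning and scikit-learn's CountVectorizer for counting. Each row is one sentence and each column counts one vocabulary word.

```python
from collections import Counter


def bag_of_words(sentences):
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in sentences[i]."""
    # Naive tokenization (an assumption of this sketch): lowercase, split on whitespace.
    tokenized = [s.lower().split() for s in sentences]
    # One column per unique word in the whole corpus, sorted for a stable column order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Words missing from this sentence get a 0 in their column.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


corpus = ["He is a good boy", "She is a good girl", "Boy and girl are good"]
vocab, X = bag_of_words(corpus)
print(vocab)
print(len(X), "x", len(vocab))  # rows = sentences, columns = unique words
for row in X:
    print(row)
```

The matrix shape is (number of sentences, number of unique words), which is where dimensions like 31 x 114 come from.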
Comments: 90
Krish, thank you for investing so much time and effort into making all of these videos. I really appreciate it. These videos have greatly helped me jump-start my career in machine learning. I am now a full-time machine learning engineer at a startup and just wanted to mention that you were a huge help at the start of that journey. Cheers.
@pranayp1950
3 years ago
Congrats, mate. How did you apply to that startup?
This series has been so good. Sometimes, more than understanding the concepts, we need things in sequence so our minds can take them in. This is in order. Thank you, Krish sir!
Amazing. I love how you teach from the basic level.
Really great tutorial. This was very helpful for me. Thank you very much. Please keep posting quality videos like this. Love from BD.
Sir, you and your lectures are both great. Thanks for making videos for us.
Thank you very much, dear Krish. Well-made videos that explain complicated subjects simply.
Really, sir, your way of teaching is nice and simple. Great work.
Guruji, the way you explain this is excellent. I am grateful to you. May God grant you a long life.
Great work, Krish. Can you please make a video on text analytics using R? That would be a great help. Thanks.
Awesome explanation, sir. I'm a fan of your explanations. Hats off!
Sir, hats off for your efforts. This is the best NLP tutorial.
Amazing explanation of each and every line of code
Hi sir, your videos helped me a lot in understanding NLP basics. Thank you, sir. Please create more useful videos like this.
Your videos are great! Thank you very much!
Superb video for practice. Thanks!
This channel enhances my skills.
Perfect explanation. Thanks for your effort :)
Thank you. You are a great man.
Thanks Krish! Super helpful!
It's a great video, Krish. Keep it going. But why don't you use spaCy for NLP? I feel it is faster than NLTK.
At 7:46, if you look carefully, only the nouns are getting lemmatized; the verbs are not. Won't that cause a generalization problem?
It was really helpful. Can you make videos on grammar correction using rule-based methods, language models and classifiers?
Well explained!!
It is okay, I resolved my problem. Thanks.
Thank you so much, Krish.
Could you please make a video on latent Dirichlet allocation (LDA) and how TF-IDF and LDA work together?
Thank you so much, sir!
Thank you, sir.
Thank you, Krish sir.
Very good explanation.
Thank you so much.
Thank you, sir ❤️🔥
Nice explanation, bro.
Hi Krish, thanks for the informative video series. A quick question, though: if we do stopword removal and "not" gets removed, doesn't that completely change the meaning? For example, "We have not conquered anyone" becomes "conquered anyone", which means something very different. How do we tackle negation words?
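One common workaround for the negation question is to take negation words out of the stopword list before filtering. A minimal sketch, assuming a tiny hand-written stopword set (a hypothetical subset; NLTK's English list is much longer but does contain "not", "no" and "nor"):

```python
# Hypothetical small stopword set for illustration, not NLTK's full list.
STOPWORDS = {"we", "have", "not", "no", "nor", "a", "the", "is"}
NEGATIONS = {"not", "no", "nor"}


def remove_stopwords(text, keep_negations=True):
    """Lowercase, split on whitespace and drop stopwords, optionally keeping negations."""
    stop = STOPWORDS - NEGATIONS if keep_negations else STOPWORDS
    return [w for w in text.lower().split() if w not in stop]


sentence = "We have not conquered anyone"
print(remove_stopwords(sentence, keep_negations=False))  # ['conquered', 'anyone']
print(remove_stopwords(sentence))                        # ['not', 'conquered', 'anyone']
```

Bag of words still ignores word order, so "not" and "conquered" end up as separate columns; counting n-grams (for example CountVectorizer's ngram_range=(1, 2)) keeps pairs like "not conquered" together.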
Thanks a lot, sir.
Can you please make a video on the regular expression library?
Doesn't BoW also suffer from the curse of dimensionality? What can we do about that? Is there any improvement that mitigates the issue to some extent?
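One simple way to shrink the vocabulary is to drop rare words, since every unique word adds a column. The sketch below keeps only words that appear in at least min_df documents; the function name and corpus are illustrative assumptions, though scikit-learn's CountVectorizer does expose min_df, max_df and max_features parameters for the same purpose.

```python
from collections import Counter


def prune_vocabulary(tokenized_docs, min_df=2):
    """Keep only words that appear in at least `min_df` documents."""
    # Count each word once per document (document frequency, not raw frequency).
    df = Counter(word for doc in tokenized_docs for word in set(doc))
    return sorted(word for word, n in df.items() if n >= min_df)


docs = [s.split() for s in ["good boy", "good girl", "bad boy here"]]
full = sorted({w for d in docs for w in d})
print(len(full), full)           # 5 columns before pruning
print(prune_vocabulary(docs))    # ['boy', 'good']
```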
CountVectorizer by default removes punctuation and lowercases the text. Then why are we doing it separately? Please respond.
@tejashshah5202
3 years ago
I believe it was just for demonstration: lowercasing and removing punctuation is a good habit, and it also shows what CountVectorizer() does.
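The defaults the question refers to are lowercase=True and token_pattern=r"(?u)\b\w\w+\b" in scikit-learn's CountVectorizer. This sketch re-creates just that tokenization step with Python's re module; it is a simplified stand-in, not scikit-learn's actual analyzer.

```python
import re

# Same pattern as CountVectorizer's default token_pattern:
# runs of 2+ word characters, so punctuation and 1-letter tokens disappear.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def default_tokens(text):
    """Lowercase the text, then extract tokens the way the default pattern would."""
    return TOKEN_PATTERN.findall(text.lower())


print(default_tokens("Hello, World! I'm fine."))  # ['hello', 'world', 'fine']
print(default_tokens("Top 10 films"))             # ['top', '10', 'films']
```

Explicit cleaning still matters when you want something the defaults do not do: \w matches digits, so "10" survives here, whereas a [^a-zA-Z] substitution would remove it, and stemming or lemmatization has to happen before vectorizing anyway.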
Can you please number the videos in the playlist? That would be useful to us.
👏👏👏👏👏👏👏👏
Hi Krish, I really enjoyed this playlist. Could you also cover the concepts of syntactic processing? Thanks in advance!
@akash-lz2dq
3 years ago
Sir, can you tell me why we used the toarray function? We already get a sparse matrix from vectorization, and toarray seems to represent the same matrix. Is there any use for toarray here?
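CountVectorizer's fit_transform returns a SciPy sparse matrix that stores only the non-zero counts; printing it shows (row, column) value entries rather than a grid. toarray() expands it into a dense NumPy array with every zero filled in, which is easier to inspect and is required by some estimators. This sketch uses a plain dict as a stand-in for the sparse storage (not SciPy's actual CSR layout) to show what the conversion does.

```python
def to_dense(sparse, n_rows, n_cols):
    """Expand a {(row, col): value} mapping into a dense list of lists."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for (r, c), value in sparse.items():
        dense[r][c] = value
    return dense


# Only the non-zero counts are stored in the sparse form.
sparse_counts = {(0, 1): 2, (1, 3): 1}
print(to_dense(sparse_counts, 2, 4))  # [[0, 2, 0, 0], [0, 0, 0, 1]]
```

The trade-off is memory: the dense form needs rows x columns cells, while the sparse form grows only with the number of non-zero counts, which matters once the vocabulary gets large.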
Very well explained. I still have a doubt: how are .lower() and .split() helping to clean the text? Can anyone please explain?
@mosart03
4 years ago
I think, for example, if you have the words "Good" and "good", it makes no sense to treat them as two different words. There is one more tool in NLTK, VADER, for which you are not recommended to lowercase, because for VADER "GREAT" and "great" express different levels of excitement in a sentence.
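A small illustration of the reply above, assuming a made-up review string: split() breaks the text into words, and lower() lets "Good", "good" and "GOOD" collapse into one count. It also shows that split() alone leaves punctuation attached, which is why a regex cleaning step usually comes first.

```python
from collections import Counter

review = "Good movie. good acting, GOOD story"
raw_counts = Counter(review.split())              # case-sensitive: three different "good"s
lowered_counts = Counter(review.lower().split())  # one "good" with count 3
print(raw_counts["good"], lowered_counts["good"])  # 1 3
print(review.lower().split())  # ['good', 'movie.', 'good', 'acting,', 'good', 'story']
```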
I want more videos on NLP and deep learning.
Nice videos, Krish. Can you please make a video on how to get data from the web (Google Reviews, etc.) using Python?
@krishnaik06
5 years ago
I guess you have to use web scraping.
Hi Krish, great video. Can you please explain why we used fit_transform instead of fit, and what the difference is between fit, transform and fit_transform?
@cristianovivk4935
4 years ago
It's actually very simple. I assume you know what fit and transform do separately; fit_transform does both at the same time.
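To spell out what "fit" and "transform" each do for a vectorizer, here is a heavily simplified, hypothetical stand-in for CountVectorizer (the class name is invented for illustration). fit learns the vocabulary, transform counts words against that learned vocabulary, and fit_transform does both on the same data. The practical consequence: use fit_transform on the training data and only transform on new data, so both share the same columns.

```python
from collections import Counter


class TinyCountVectorizer:
    """Hypothetical, heavily simplified stand-in for scikit-learn's CountVectorizer."""

    def fit(self, docs):
        # fit: learn the vocabulary (word -> column index) from the training docs.
        words = sorted({w for d in docs for w in d.lower().split()})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, docs):
        # transform: count words using the already-learned vocabulary;
        # words never seen during fit are silently ignored.
        rows = []
        for d in docs:
            counts = Counter(d.lower().split())
            rows.append([counts[w] for w in self.vocabulary_])
        return rows

    def fit_transform(self, docs):
        # Same result as fit(docs).transform(docs), in one call.
        return self.fit(docs).transform(docs)


vec = TinyCountVectorizer()
X_train = vec.fit_transform(["good boy", "bad boy"])
X_test = vec.transform(["good girl"])  # "girl" was not seen during fit
print(vec.vocabulary_)  # {'bad': 0, 'boy': 1, 'good': 2}
print(X_train)          # [[0, 1, 1], [1, 1, 0]]
print(X_test)           # [[0, 0, 1]]
```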
❤
Hi Krish, as you explained @11:00, how do we decide whether a sentence is positive or negative?
@cristianovivk4935
4 years ago
Bro, he meant that once we have the bag of words, we can train a model and then test it, and the model will tell us whether a sentence is positive or negative.
Hi sir, I tried following the same steps. I'm new to the Spyder IDE, and I could not see the NumPy array for the bag-of-words matrix after running the code. Can someone help? Thank you.
Can you post a video on web scraping?
For the if condition, "if word not in set" versus "if not word in set": how does it work for both versions? Please explain.
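Both forms work because Python's not operator has lower precedence than the in membership test, so "not word in s" is parsed as "not (word in s)", which gives the same result as "word not in s". The "not in" form is generally considered more readable. A quick check with a made-up stopword set:

```python
stop_words = {"the", "is", "a"}
words = ["the", "movie", "is", "good"]

kept_1 = [w for w in words if w not in stop_words]
kept_2 = [w for w in words if not w in stop_words]  # parsed as: not (w in stop_words)
print(kept_1, kept_2, kept_1 == kept_2)  # ['movie', 'good'] ['movie', 'good'] True
```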
I have a question about the last table. We saw the number of occurrences of words in each sentence, but how can we know which column represents which word? Are the columns in descending order of occurrence frequency?
@Ajay-ku5fn
2 years ago
Columns have nothing to do with frequency order. CountVectorizer creates a map of all the unique words in the corpus. In the example, 144 words are unique across 31 sentences, so the matrix size is 31*144. The map looks like {{1,word1},{2,word2},......,{144,word144}}, and when creating the vector for a sentence, it creates an array of size 144 for that sentence: if the word at an index is present, it writes its frequency there, otherwise it writes 0.
@RaviKumar-mu4ne
1 year ago
@Ajay-ku5fn 114, not 144.
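In scikit-learn, the fitted CountVectorizer keeps this word-to-column map in its vocabulary_ attribute, and get_feature_names_out() (scikit-learn 1.0 and later) returns the column names in index order; the columns are sorted alphabetically, not by frequency. The pure-Python sketch below, using a made-up corpus, builds such a map, inverts it to label each column, and contrasts it with frequency order.

```python
from collections import Counter

corpus = ["the cat sat", "the dog sat on the mat"]
tokens = [s.split() for s in corpus]

# Columns ordered alphabetically, as CountVectorizer does for its learned vocabulary.
vocabulary = {w: i for i, w in enumerate(sorted({w for t in tokens for w in t}))}

# Invert the mapping so each column index can be labelled with its word.
column_names = [w for w, _ in sorted(vocabulary.items(), key=lambda kv: kv[1])]

# Frequency order, for comparison: a different ordering altogether.
totals = Counter(w for t in tokens for w in t)
by_frequency = [w for w, _ in totals.most_common()]

print(column_names)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(by_frequency)
```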
This is really good. One question: if stemming always turns "history" into "histori" and produces other meaningless words, why do we even do it? Lemmatization does the job anyway, so why can't we use it directly?
@ArshdeepSingh..
1 year ago
Because the meaning of the word doesn't make a difference in the implementation. "Historical" and "history" will both be stemmed to "histori" and counted as one entity while vectorizing.
I have a problem with this section: when I write review = re.sub('[a-zA-Z]',' ',sentences[i]), all the following steps contain only stopwords. Please, I need an explanation or help. Thanks.
@praja110
3 years ago
Use the "except" symbol ^, i.e. '[^a-zA-Z]'.
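The difference is the ^ at the start of the character class: '[a-zA-Z]' matches letters, so substituting it removes every letter and keeps the punctuation and digits, while '[^a-zA-Z]' matches everything except letters and so keeps only the words. A quick comparison on a made-up sentence:

```python
import re

sentence = "Wow!! The 2 movies were great :)"

# '[a-zA-Z]' replaces the letters, leaving only symbols and digits behind.
letters_removed = re.sub('[a-zA-Z]', ' ', sentence).split()
# '[^a-zA-Z]' replaces everything that is NOT a letter, leaving only words.
letters_kept = re.sub('[^a-zA-Z]', ' ', sentence).split()

print(letters_removed)  # ['!!', '2', ':)']
print(letters_kept)     # ['Wow', 'The', 'movies', 'were', 'great']
```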
Is Spyder a better IDE than Jupyter?
Hi Krish Naik, how do I check your data on GitHub?
Hello Krish! Thanks for uploading this and explaining it in great detail. I have a question: let's say we have around 500 text messages or paragraphs. How do we go about it? Is there any way? Please reply.
@08ae6013
4 years ago
www.kaggle.com/parulpandey/getting-started-with-nlp-a-general-intro ... Have a look at this link. I hope it helps you.
Please use a better mic; the audio quality has become really poor. Meanwhile, the content is perfect. Keep going.
Sir, you forgot to explain max_features in CountVectorizer.
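In scikit-learn's CountVectorizer, max_features=N keeps only the N terms with the highest total count across the corpus, which caps the number of columns. This pure-Python sketch imitates that idea; the function name is hypothetical, and breaking ties alphabetically is an assumption of the sketch, not a documented scikit-learn rule.

```python
from collections import Counter


def top_k_vocabulary(docs, max_features):
    """Keep the `max_features` words with the highest total count across the corpus."""
    totals = Counter(w for d in docs for w in d.lower().split())
    # Rank by count (descending); ties broken alphabetically (assumption of this sketch).
    ranked = sorted(totals, key=lambda w: (-totals[w], w))
    # Return the surviving words in alphabetical column order.
    return sorted(ranked[:max_features])


docs = ["good good movie", "bad movie", "good plot"]
print(top_k_vocabulary(docs, max_features=2))  # ['good', 'movie']
```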
How is the dimension (31, 114)? Can you please explain?
@cristianovivk4935
4 years ago
31 is the total number of sentences and 114 is the number of unique words.
Nice explanation, sir, but when I implement your code it shows an error. Can anyone help me, please?
@BreakItGaming
4 years ago
What error are you getting?
I don't know if it is my NLTK version, but I don't see any difference between lemmatization and stemming; both return the same thing for me. Thanks.
Lemmatization 6:40
🤝🤝🤝🤝🤝👌👌👌👌👌
After creating X, why are 1, 0 and the other numbers shown in different colors?
@datasciencegyan5145
4 years ago
Is it for representation?
@RejoicingKrishna
4 years ago
@datasciencegyan5145 I think Spyder colors it that way automatically.
@cristianovivk4935
4 years ago
It's just for representation, so that we can spot the 1s and 0s easily.
Hello Krish, if we convert words from upper to lower case, "US" and "us" will be treated as the same word. How do we handle such situations?
@rajsinghmaan3095
4 years ago
My suggestion would be to create a list of words to exclude from lowercasing.
@cristianovivk4935
4 years ago
As far as I know, it's better not to use short forms, so instead of "US", "USA" makes more sense. Also, the word "us" is unlikely to have much impact; it will be removed as a stopword.
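The first reply's suggestion can be sketched like this, assuming a hypothetical hand-maintained set of tokens that must keep their case. Everything else is lowercased as usual, so a later stopword step that compares against lowercase stopwords would drop "us" but keep "US".

```python
# Hypothetical list of tokens that must keep their original case.
PROTECTED = {"US", "UK"}


def selective_lower(text):
    """Lowercase every whitespace-separated token except the protected ones."""
    return [w if w in PROTECTED else w.lower() for w in text.split()]


print(selective_lower("Tell us about the US economy"))
# ['tell', 'us', 'about', 'the', 'US', 'economy']
```

This only works for exact matches; a sentence written entirely in capitals would still confuse it, which is one reason the second reply prefers unambiguous forms like "USA".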
You sound like MSD