Text Representation | NLP Lecture 4 | Bag of Words | Tf-Idf | N-grams, Bi-grams and Uni-grams

In natural language processing, text representation plays a vital role in capturing the meaning and structure of textual data. This video explores three fundamental text representation techniques: Bag of Words, Tf-Idf (Term Frequency-Inverse Document Frequency), and N-grams (Uni-grams and Bi-grams). Each method has its unique approach to encoding and extracting information from text, making it essential for data scientists and NLP enthusiasts to grasp these concepts.
Assignment - colab.research.google.com/dri...
============================
Do you want to learn from me?
Check my affordable mentorship program at : learnwith.campusx.in
============================
E-mail us at support@campusx.in
⌚Time Stamps⌚
00:00 - Intro
01:10 - Plan of Attack
02:56 - Introduction
03:25 - What is feature extraction from text?
04:49 - Why do we need feature extraction?
07:30 - Why is this difficult to do?
11:00 - What is the core idea behind this?
12:12 - What are the Techniques?
14:24 - Common Terms
18:00 - One Hot Encoding
33:25 - Bag of Words
57:45 - N-grams/Bi-grams/Tri-grams
01:13:45 - Benefits of N Grams
01:14:25 - Disadvantages of N Grams
01:16:34 - Tf-Idf
01:38:46 - Custom Features
01:41:45 - Assignment
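
The three representations named in the description can be sketched in plain Python. This is a minimal illustration, not the code from the video: the four-sentence corpus and the helper names (`tokenize`, `ngrams`, `bag_of_words`, `tf_idf`) are assumptions made for the example, and the Tf-Idf weights use the textbook form tf × ln(N / df), which differs from what some libraries compute.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; real pipelines usually clean more.
    return text.lower().split()


def ngrams(tokens, n):
    # Contiguous runs of n tokens, each joined into a single string.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, n=1):
    # Vocabulary = sorted distinct n-grams; each document becomes a count vector.
    grams = [ngrams(tokenize(d), n) for d in docs]
    vocab = sorted({g for doc in grams for g in doc})
    vectors = []
    for doc in grams:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    # Textbook weighting (an assumption): tf = count / doc length, idf = ln(N / df).
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    weights = []
    for row in counts:
        total = sum(row)
        weights.append(
            [(c / total) * math.log(n_docs / df[j]) for j, c in enumerate(row)]
        )
    return vocab, weights


# Hypothetical toy corpus, chosen only for illustration.
docs = [
    "people watch campusx",
    "campusx watch campusx",
    "people write comment",
    "campusx write comment",
]

vocab, bow = bag_of_words(docs)        # uni-grams
bi_vocab, _ = bag_of_words(docs, n=2)  # bi-grams
_, weights = tf_idf(docs)
```

On this toy corpus `bow[1]` is `[2, 0, 0, 1, 0]`: the word order is gone and only the counts per vocabulary term remain, which is the trade-off that n-grams partly address.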

Comments: 131

  • @sauravagarwal8928
    @sauravagarwal8928 · 1 month ago

    This is one of the legendary videos I have seen. I’m into SEO and trying to wrap my head around semantic SEO. Some experts in the semantic SEO industry use technical jargons and fail to explain how semantics engines like Google work. But your series helped me understand every single bit of it. I don’t know python coding, but now I understand how Google algorithms work to rank any document. I understand the type of computation they do behind the screen. The video is pure gold! I mean it! This helps me as a search engine optimiser and makes me better understand machine and human interaction. Thank you so much 🙏 ☺️

  • @raj4624
    @raj4624 · 2 years ago

    Oh brother, unbelievable... 2 hours of content. Genuinely, heartfelt thanks to you, sir.

  • @art4eigen93
    @art4eigen93 · 2 years ago

    This playlist is necessary for basic to advanced NLP engineers. Please do upload the complete series Sir. Your contribution is life saving.

  • @ashwanibhardwaj4930
    @ashwanibhardwaj4930 · 2 years ago

    Please carry on with this series. Once basic NLP is done, we would like to learn advanced NLP using deep learning, language models, and SOTA techniques.

  • @bishowlamsal7319
    @bishowlamsal7319 · 2 years ago

    Huge respect sir, You deserve more than million followers. Love from Nepal ❤️❤️❤️❤️

  • @naveedkaimkhami2695
    @naveedkaimkhami2695 · 2 months ago

    I was confused to select word embedding technique for my fyp project and found this video life saving. Thank youu soo muchh !!!

  • @shubhamgattani5357
    @shubhamgattani5357 · 3 months ago

    I cannot find any reason not to like this video. It's amazing!

  • @MuhammadAfzal-xl7wd
    @MuhammadAfzal-xl7wd · 2 months ago

    thank you so much. you explain the concept in a very very simple way. once again thank u so much 🙂🙂🙂🙂

  • @rachitsingh4913
    @rachitsingh4913 · 2 years ago

    For me You are the best data science teacher. ❤❤❤❤❤

  • @hvjmlops
    @hvjmlops · 2 years ago

    Respect for your hardwork

  • @amitkumar2005
    @amitkumar2005 · 3 months ago

    Superb explanations !

  • @siyays1868
    @siyays1868 · 1 year ago

    Thank you so much, sir, for a wonderful explanation. Hats off to you, always!

  • @harisumanth
    @harisumanth · 2 years ago

    Almost 2 hours...Respect!

  • @asifpervezpolok2243
    @asifpervezpolok2243 · 2 years ago

    the best tutorial i found from you.

  • @gauravverma4433
    @gauravverma4433 · 2 years ago

    It was awesome .. love you sir... thanx for your efforts

  • @somyarathee
    @somyarathee · 2 years ago

    Best series on NLP

  • @bhanu0925
    @bhanu0925 · 2 years ago

    Thank you for another great session

  • @BTStechnicalchannel
    @BTStechnicalchannel · 1 year ago

    Your explanation is so great!! And in Hindi too. Thanks a lot!!💙

  • @abhinavkr5131
    @abhinavkr5131 · 1 year ago

    I have watched a lot of tutorials, but you are the best, sir.

  • @piyushpathak7311
    @piyushpathak7311 · 2 years ago

    I am following your Ml playlist sir you have great explanation, sir please complete xgboost and DBSCAN algorithm in this playlist and please start series on Deep learning..

  • @AkashBhandwalkar

    @AkashBhandwalkar

    2 years ago

    I'm following it as well

  • @campusx-official

    @campusx-official

    2 years ago

    Will do it in January

  • @AkashBhandwalkar

    @AkashBhandwalkar

    2 years ago

    @@campusx-official woaahhh! Thank you sooooo much! This made my day! 🥳🥳🥳

  • @749srobin

    @749srobin

    2 years ago

    @@campusx-official which year january sir ?

  • @debojitmandal8670

    @debojitmandal8670

    1 year ago

    @@campusx-official Hi sir, in your example the vocabulary decreases to 5 with tri-grams, so I don't follow the part where you said the vocabulary increases as n increases.
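
On the n-gram question above: a minimal sketch, assuming two made-up corpora, of why both observations can hold. A three-word sentence contains only one tri-gram, so a corpus of very short sentences can end up with fewer distinct tri-grams than uni-grams, whereas longer text usually yields more distinct n-grams as n grows. The function names and both corpora are hypothetical.

```python
def ngrams(tokens, n):
    # A document with k tokens has only k - n + 1 n-grams.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vocab_sizes(docs, max_n=3):
    # Number of distinct n-grams in the corpus for n = 1 .. max_n.
    sizes = []
    for n in range(1, max_n + 1):
        vocab = set()
        for doc in docs:
            vocab.update(ngrams(doc.lower().split(), n))
        sizes.append(len(vocab))
    return sizes


# Hypothetical corpus of very short sentences (3 tokens each).
short_docs = [
    "people watch campusx",
    "campusx watch campusx",
    "people write comment",
    "campusx write comment",
]

# Hypothetical longer document (15 tokens).
long_docs = ["the cat sat on the mat and the cat ate the fish on the mat"]

print(vocab_sizes(short_docs))  # [5, 6, 4]
print(vocab_sizes(long_docs))   # [8, 11, 12]
```

The short corpus drops to 4 distinct tri-grams because each sentence contributes exactly one, while the longer document keeps growing with n.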

  • @machinelearningspace6977
    @machinelearningspace6977 · 2 years ago

    Teaching style awesome... Go ahead.

  • @deeptisingh93
    @deeptisingh93 · 2 years ago

    Thank you, sir, for explaining it in such an easy way.

  • @IRFANSAMS
    @IRFANSAMS · 2 years ago

    Sir..thank you for the wonderful video

  • @diwakargupta0
    @diwakargupta0 · 1 year ago

    Awesome content and explanation sir 👐

  • @uditsaurabh
    @uditsaurabh · 5 months ago

    awesome video

  • @shivamgarg3890
    @shivamgarg3890 · 1 year ago

    This channel is highly underrated...

  • @siddharth4251
    @siddharth4251 · 11 months ago

    Thank you very much Nitish sir!

  • @vivekathilkar6555
    @vivekathilkar6555 · 2 years ago

    Appreciate your efforts

  • @shahmuhammadraditrahaman9904
    @shahmuhammadraditrahaman9904 · 2 years ago

    Incredible ❤️

  • @mohaiminrahat4974
    @mohaiminrahat4974 · 2 years ago

    Congratulations sir for 10K Subscribers.

  • @vaibhavmoharkar2349
    @vaibhavmoharkar2349 · 5 months ago

    THANKYOU SIR

  • @rajeevranjan5007
    @rajeevranjan5007 · 2 years ago

    Great Video Sir.

  • @daljeetsinghranawat6359
    @daljeetsinghranawat6359 · 6 months ago

    KUDOS TO YOU SIR ..............loving this series

  • @HarshVardhan-jj9xh
    @HarshVardhan-jj9xh · 5 months ago

    Thanks a lot, sir. My PhD is on NLP. Your videos help me a lot in understanding the overall concepts. Your efforts are very sincere and dedicated 💯

  • @forgotabhi

    @forgotabhi

    5 months ago

    I am getting started with NLP :) I am still doing my UG can you tell me your experience in the field?

  • @HarshVardhan-jj9xh

    @HarshVardhan-jj9xh

    5 months ago

    @@forgotabhi It's an amazing field, and day by day you will come to know new models and architectures.

  • @hritikroshanmishra3630

    @hritikroshanmishra3630

    3 months ago

    @@forgotabhi which college?

  • @learnfromIITguy
    @learnfromIITguy · 1 year ago

    wow , after watching this video, I am confident on feature engineering

  • @Sara-fp1zw
    @Sara-fp1zw · 1 year ago

    Congratulations on 36K subs, soon we gonna cross 100K IA :)

  • @shahu6015
    @shahu6015 · 1 year ago

    Congratulation for 100K subscribers in advance.

  • @basit-qx7ys
    @basit-qx7ys · 1 month ago

    I love the way sir explains. I am able to grasp the fundamental concepts, but I cannot imagine coding for NLP on my own without guidance. Any suggestions on what other materials and sources I should follow?

  • @gajanankhapre2425
    @gajanankhapre2425 · 2 years ago

    Very good flow sir . Kindly upload next in NLP series

  • @hitinyadav3321
    @hitinyadav3321 · 2 years ago

    Amazing video

  • @shaiksalavuddin5976
    @shaiksalavuddin5976 · 2 years ago

    Thank you🌹

  • @nikhiljagtap1669
    @nikhiljagtap1669 · 2 years ago

    at 55:24 , BOW doesn't consider the sequence of sentence but since we gonna perform Tokenization before this, we gonna lose some words that'd mess the sequence anyway. isnt that right?

  • @sachi-4750
    @sachi-4750 · 2 years ago

    Thank you so much sir😊🙏

  • @ronylpatil
    @ronylpatil · 2 years ago

    Many Many Congratulations to you Sir for 10k Subs🥳🥳🥳

  • @hritikroshanmishra3630

    @hritikroshanmishra3630

    3 months ago

    😁😁😁😁 182 k

  • @gautampatadiya6096
    @gautampatadiya6096 · 3 months ago

    well done buddy #nlp #nlptuts #nlpeasytuts

  • @takeshrao733
    @takeshrao733 · 3 months ago

    Very nice and very good start point. Can you pls suggest which text representation algo suited for log analysis.

  • @maukaladka4100
    @maukaladka4100 · 1 year ago

    Hello sir, I had a doubt about how the conversion takes place. I watched lots of videos and read lots of blogs, but no one could make me understand it like you did. Hats off to you; keep up the great work.

  • @chauhanabhishek9593
    @chauhanabhishek9593 · 2 years ago

    Thank u sir .

  • @HimanshuSharma-we5li
    @HimanshuSharma-we5li · 2 years ago

    You are a 💎.

  • @avishinde2929
    @avishinde2929 · 1 year ago

    thank you sir ji

  • @mehulsuthar7554
    @mehulsuthar7554 · 2 months ago

    I have one doubt: can we normalize the engineered feature vectors? I think a normalized vector will still contain the information that was there previously, but on a lower scale, which reduces computation. Let me know if this is the correct approach.
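
On the normalization question: a minimal sketch of L2 normalization for a single count vector, assuming a hypothetical helper `l2_normalize`. Rescaling keeps the direction of the vector (the relative weights of the terms) but not its length, and it does not change the number of dimensions, so its main effect is making documents of different lengths comparable rather than reducing computation.

```python
import math


def l2_normalize(vec):
    # Divide every component by the Euclidean length of the vector.
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)  # an all-zero vector is returned unchanged
    return [x / norm for x in vec]


short_doc = [3, 4, 0]   # hypothetical count vector
long_doc = [6, 8, 0]    # same proportions, twice as many words

unit_short = l2_normalize(short_doc)
unit_long = l2_normalize(long_doc)
```

Both vectors normalize to the same unit vector, which shows what is kept (proportions) and what is discarded (overall document length).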

  • @user-dd3te4rh8j
    @user-dd3te4rh8j · 11 months ago

    Feature extraction from text / text representation/ text vectorization - changing text to numbers so that model can understand Bag of words -

  • @balrajprajesh6473
    @balrajprajesh6473 · 1 year ago

    2 hours of pure diamond mine.

  • @GhostRider....
    @GhostRider.... · 1 year ago

    very nice explanation sir

  • @richaaggarwal07
    @richaaggarwal07 · 2 years ago

    Please make more videos on NLP !!!

  • @ayushroy6208
    @ayushroy6208 · 2 years ago

    Sir, suppose the sentences are of unequal length. Is there then no option other than padding in the case of Tf-Idf, n-grams, etc.?

  • @gauravlochab9614
    @gauravlochab9614 · 1 year ago

    Can you add RNN, LSTMs, and modern NLP using transformers!? Loved the content. Huge respect. P.S. the lamp from Banjara Market! XD

  • @ronylpatil
    @ronylpatil · 2 years ago

    Sir NLP series is really amazing, please recommend me best book for NLP because in few days I have an interview which will totally on NLP.

  • @tusarmundhra5560
    @tusarmundhra5560 · 8 months ago

    awesome

  • @technicalhouse9820
    @technicalhouse9820 · 1 month ago

    I really enjoyed it, sir, I swear.

  • @SatyaIITI
    @SatyaIITI · 1 year ago

    Hi Nitish sir, where can we get these notes in PDF format? They would be helpful for revision.

  • @230489shraddha
    @230489shraddha · 2 years ago

    Thanks a lot sir .... Can you also upload a video on RNN & LSTM.

  • @ranjithkumar947
    @ranjithkumar947 · 4 months ago

    for tf idf, campusx term came 4 times but sir you considered it only thrice any reason for it? Anyway there we are getting +1 in realtime. Could you please reply me for this?

  • @solvinglife6658
    @solvinglife6658 · 1 year ago

    Sir please continue the playlist!!!!!

  • @ridoychandraray2413
    @ridoychandraray2413 · 1 year ago

    Thank you sir?

  • @whothefisyash
    @whothefisyash · 22 days ago

    For real, I absolutely loved it.

  • @749srobin
    @749srobin · 2 years ago

    Sir, removing stopwords took 3 hours 26 minutes; I am getting nervous about running tokenization.

  • @mihirnaik3383
    @mihirnaik3383 · 2 years ago

    Hi Buddy, Great content! This video cleared all my doubts regarding BoW and TF IDF🙌 Are you going to take any NLP projects in future based on Machine Learning models?

  • @campusx-official

    @campusx-official

    2 years ago

    Yes

  • @mihirnaik3383

    @mihirnaik3383

    2 years ago

    @@campusx-official Thank you!

  • @sidindian1982

    @sidindian1982

    1 year ago

    @@campusx-official Sir codes missing in the list ... BOW , TFIDF ..pls share

  • @Howto-ty4ru
    @Howto-ty4ru · 1 year ago

    cv.fit_transform(df['eng']) How can we apply fit_transform on text? I think I do not understand this part
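
On the `fit_transform` question: a minimal sketch of what fitting on text means, written as a hypothetical stand-in class (`TinyCountVectorizer`) rather than scikit-learn's actual implementation. The assumption is that `cv` in the comment is a `CountVectorizer`-style object: `fit` scans the documents and assigns a column index to each distinct token, `transform` counts tokens into those columns, and `fit_transform` does both in one call.

```python
class TinyCountVectorizer:
    """Hypothetical illustration of fit/transform; not scikit-learn's code."""

    def fit(self, texts):
        # "Fitting" on text: learn the vocabulary, one column index per token.
        tokens = sorted({tok for text in texts for tok in text.lower().split()})
        self.vocabulary_ = {tok: idx for idx, tok in enumerate(tokens)}
        return self

    def transform(self, texts):
        # Count each known token into its column; unseen tokens are ignored.
        rows = []
        for text in texts:
            row = [0] * len(self.vocabulary_)
            for tok in text.lower().split():
                idx = self.vocabulary_.get(tok)
                if idx is not None:
                    row[idx] += 1
            rows.append(row)
        return rows

    def fit_transform(self, texts):
        return self.fit(texts).transform(texts)


cv = TinyCountVectorizer()
matrix = cv.fit_transform(["people watch campusx", "campusx watch campusx"])
unseen = cv.transform(["people like campusx"])  # "like" was never seen in fit
```

Here `matrix` is `[[1, 1, 1], [2, 0, 1]]` for the columns campusx, people, watch, and the word "like" is dropped from `unseen` because it was not in the fitted vocabulary. The real scikit-learn vectorizer follows the same fit-then-transform pattern but returns a sparse matrix and applies its own tokenization rules.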

  • @hari8568
    @hari8568 · 4 months ago

    The example you gave for bigrams better than uni gram being able to differentiate the 2 sentences in vector space doesn't really make much sense to me, suppose instead of not I used a synonym of "very "instead like "extremely " then these 2 sentences should be similar in vector space but bigram model will say its different, so its actually not handling the word not rather just handling an unknown word differently

  • @jai40403
    @jai40403 · 5 months ago

    Where can I get these notes ?

  • @avinashbhardwaz5717
    @avinashbhardwaz5717 · 7 months ago

    Sir, I don't understand the idea of Tf-Idf at 1:20:09. You said it favors a word that is frequent in the document but rare in the corpus. I am confused about how that is possible, since the word's count in the corpus will always be greater than or equal to its count in the document. Kindly clarify, sir.

  • @manavahuja4418
    @manavahuja4418 · 2 years ago

    Sir will you make a video for nlp project....something good for resume..?

  • @vijayraghuwanshi4486
    @vijayraghuwanshi4486 · 11 months ago

    I have tried the assignment on kaggle if any one tried and want to discuss please let me know.

  • @saumyakumari3441
    @saumyakumari3441 · 2 years ago

    Many many congratulations for 10k sub. 🎊🎊🎊

  • @campusx-official

    @campusx-official

    2 years ago

    Thanks

  • @yashjain6372
    @yashjain6372 · 1 year ago

    best

  • @bananamaker4877
    @bananamaker4877 · 10 months ago

    Liked and shared your video. Subscribed your channel. What else can I do for you. You are doing a great job.

  • @furry2fun
    @furry2fun · 11 months ago

    Share the link for the Colab notebook.

  • @ananyakumari6807
    @ananyakumari6807 · 2 years ago

    Sir, can you please share your code notebook?

  • @gautamkushwaha8724
    @gautamkushwaha8724 · 8 months ago

    why don't you keep the resource in the description, like the code link..

  • @joyeetamallik5063
    @joyeetamallik5063 · 2 years ago

    Thank you so much for such a wonderful video. Sir, do you take any online classes as well?

  • @campusx-official

    @campusx-official

    2 years ago

    No, not right now

  • @MrKB_SSJ2
    @MrKB_SSJ2 · 1 year ago

    23:00

  • @datagyan5489
    @datagyan5489 · 2 years ago

    How to join Mentorship program

  • @sidindian1982
    @sidindian1982 · 1 year ago

    1:23:40 - Campusx - word in IDF is repeated 4 times sir , .. Loge( 4/4) = 0

  • @Tusharchitrakar

    @Tusharchitrakar

    3 months ago

    It's repeated only 3 times bro
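
On the 3-versus-4 discussion: a minimal sketch, assuming a hypothetical four-document corpus in which the term occurs four times but in only three documents. IDF is computed from the number of documents containing the term (document frequency), not from the total number of occurrences. The second formula is the smoothed variant that scikit-learn documents as its default, which is one place an extra +1 can come from.

```python
import math

# Hypothetical corpus: "campusx" occurs 4 times in total, in 3 of 4 documents.
docs = [
    "people watch campusx",
    "campusx watch campusx",
    "people write comment",
    "campusx write comment",
]
term = "campusx"

total_occurrences = sum(d.split().count(term) for d in docs)
doc_freq = sum(1 for d in docs if term in d.split())
n_docs = len(docs)

# Textbook IDF: ln(N / df)
idf_plain = math.log(n_docs / doc_freq)

# Smoothed IDF as documented for scikit-learn's default settings:
# ln((1 + N) / (1 + df)) + 1
idf_smooth = math.log((1 + n_docs) / (1 + doc_freq)) + 1

print(total_occurrences, doc_freq)  # 4 3
print(round(idf_plain, 4))          # 0.2877
print(round(idf_smooth, 4))         # 1.2231
```

With a document frequency of 3 the plain IDF is ln(4/3), not ln(4/4) = 0.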

  • @yashgaming827
    @yashgaming827 · 1 year ago

    sir please share the one note link

  • @backclover9651
    @backclover9651 · 2 years ago

    Bag of words minuets?

  • @MrKB_SSJ2
    @MrKB_SSJ2 · 1 year ago

    1:38:48

  • @Sara-fp1zw
    @Sara-fp1zw · 1 year ago

    Hi Nitish sir, I am facing a problem with the spell checker function def spell_correct(text): return TextBlob(text).correct().string. It is taking too long on the assignment dataframe. Is there a faster approach to check and correct spelling, ideally in log(n) time?

  • @sidindian1982

    @sidindian1982

    1 year ago

    Run the file in Google Colab; because of the GPU it runs faster.

  • @rushikeshmalpe3715
    @rushikeshmalpe3715 · 2 years ago

    Please start deep learning, sir 👍👍👍❤️

  • @IRFANSAMS
    @IRFANSAMS · 2 years ago

    Please teach us BERT ALGORITHM

  • @user-iv5fr9mr2n
    @user-iv5fr9mr2n · 11 months ago

    54:00

  • @nabinadhikari5426
    @nabinadhikari5426 · 1 year ago

    Please share this notebook source file to us !

  • @abdullahilawal3220
    @abdullahilawal3220 · 10 months ago

    Your teaching method is good, but you are making it accessible only to Indian students, not to an international audience. Please make a new version of all your NLP videos in English so everyone can learn from them. 🙏

  • @MrKB_SSJ2
    @MrKB_SSJ2 · 1 year ago

    40:34

  • @forgotabhi
    @forgotabhi · 5 months ago

    when i perform bagofwords method like the video in kaggle notebook on the imdb data it says memory exceeded and just restarts the notebook :( what to do?

  • @user-qq7qi5kk5u

    @user-qq7qi5kk5u

    2 months ago

    same issue i tried in my machine but it said memory exceeded it need 18.1Gib after applying ohe

  • @forgotabhi

    @forgotabhi

    2 months ago

    @@user-qq7qi5kk5u guess we're poor lol
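
On the memory errors in this thread: a minimal sketch of why a dense bag-of-words or one-hot matrix runs out of memory while a sparse one does not. The document and vocabulary counts below are illustrative assumptions, not measurements from the IMDB data, and both helper functions are hypothetical.

```python
def dense_bytes(n_docs, vocab_size, bytes_per_cell=8):
    # A dense matrix stores every cell, including all the zeros.
    return n_docs * vocab_size * bytes_per_cell


def sparse_row(text, vocabulary):
    # Store only the non-zero counts as {column index: count}.
    counts = {}
    for tok in text.lower().split():
        idx = vocabulary.get(tok)
        if idx is not None:
            counts[idx] = counts.get(idx, 0) + 1
    return counts


# Illustrative sizes (assumptions): 50,000 documents, 50,000 distinct tokens.
needed = dense_bytes(50_000, 50_000)
print(needed / 2**30)  # roughly 18.6 GiB for 8-byte cells

# A single review touches only a handful of columns.
vocabulary = {"good": 0, "movie": 1, "bad": 2, "plot": 3}
row = sparse_row("good movie good plot", vocabulary)  # {0: 2, 1: 1, 3: 1}
```

In scikit-learn terms, this usually means keeping the sparse matrix that `CountVectorizer` returns instead of calling `.toarray()` on it, and optionally capping the vocabulary with the `max_features` parameter.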

  • @SulemanKhan-nk4lc
    @SulemanKhan-nk4lc · 2 years ago

    Sir please recommend some ML books

  • @campusx-official

    @campusx-official

    2 years ago

    kzread.info/dash/bejne/pXWuusGmdpvdd9I.html

  • @rafibasha4145
    @rafibasha4145 · 2 years ago

    Please complete NLP,Interview series and ML series

  • @nikeshmali8506
    @nikeshmali8506 · 5 months ago

    how can i get OneNote notes

  • @ritwiksingh4937

    @ritwiksingh4937

    21 days ago

    by writing in ur notebook

  • @nikhiltiwari1616
    @nikhiltiwari1616 · 1 year ago

    Sir, please share the lecture's Python notebook file.

  • @kislaykrishna8918
    @kislaykrishna8918 · 2 years ago

    Sir, my question is: I have list of entities and a text.Like this: List=["Data Scientist", "Bihar", "Krishna"] Text=" I am Krishna. I am from Bihar . I want to be a Data Scientist" I want result like: "I am [Entity]Krishna[Entity]. I am from [Entity]Bihar[Entity] . I want to be a [Entity]Data Scientist[Entity]" Please help me with code to get this result.Thanx🙏

  • @priyaravind18

    @priyaravind18

    2 years ago

    Did you get the code?

  • @kislaykrishna8918

    @kislaykrishna8918

    2 years ago

    @@priyaravind18
    List = ["Data Scientist", "Bihar", "Krishna"]
    text = ' I am Krishna. I am from Bihar. I want to be a Data Scientist'
    for entity in List:
        if entity in text:  # only tag entities that actually occur in the text
            text = text.replace(entity, '[Entity]' + entity + '[Entity]')
    print(text)