Text Classification using Transformers | BERT | Custom Dataset | with code
Science & Technology
Hello friends...
#textclassification #transformers
In this video I will show you how easily you can train a Text Classification or Sentiment Analysis model using the Transformers Python package; the package is straightforward to use.
In this video, I will train a BERT Base Uncased model on the IMDB movie review dataset.
Transformers Python package
huggingface.co/docs/transform...
Github repository for the code
github.com/RajKKapadia/Transf...
You can contact me to build any Chatbot and AI/ML/DL work.
My Fiverr profile - www.fiverr.com/rajkkapadia
My Upwork profile - www.upwork.com/freelancers/~0...
My LinkedIn profile - / rajkkapadia
You can share your views on this video in the comment section.
If you like my work, subscribe to my channel for more new videos.
Enjoy the life, Feel the music.
Peace.
Comments: 45
Great video and Github repo. My question was "what format do I need to put my data in?" and you answered something that I couldn't find anywhere in the HuggingFace docs: simply, it depends on the model you're fine-tuning. So thank you for making a video that is still one of the most clear resources a year later!
Wow, this is exactly what I was hoping to find to get started with transformers! I noticed that the other tutorials didn't include the folder structure of the code, but yours does. Thank you so much for sharing!
Been looking for a video to get started with transformers. Thank you very much for this...
The information provided here is very useful for today's generation; a lot of hard work went into it by the maker. Kudos for such informative material.
Very interesting and informative material, great effort.
Very helpful video. If anyone else has problems with the torch sigmoid method in the get_prediction function (an error saying it requires two positional arguments), just create a static sigmoid method and apply autograd.Variable(method-input-variable) to that method's input.
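For reference, in recent PyTorch versions wrapping the input in autograd.Variable is no longer necessary (Variables and tensors were merged in PyTorch 0.4); torch.sigmoid takes a single tensor argument directly. A minimal sketch:

```python
import torch

# torch.sigmoid takes one tensor and applies the sigmoid element-wise.
# No autograd.Variable wrapper is needed in PyTorch >= 0.4.
logits = torch.tensor([0.0, 2.0, -2.0])
probs = torch.sigmoid(logits)  # sigmoid(0) == 0.5 exactly
```

If the error mentions two positional arguments, it usually means sigmoid was defined as an instance method and called through a class, so Python passed `self` as an extra argument; a static method or a plain call to torch.sigmoid avoids that.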
Very informative, thanks!
Sir, in my case I have multiple columns/attributes like id, timing, name, class, etc., and the task is to classify the reviews as fake or real. How can I use BERT in this case? What should I do for the pre-processing?
@rajkkapadia
27 days ago
Use only those columns that are useful...
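A minimal sketch of that advice, assuming a pandas DataFrame with made-up column names: drop everything except the review text and the fake/real label before tokenizing.

```python
import pandas as pd

# Hypothetical columns for illustration; only the text and the label
# are needed for fine-tuning a text classifier like BERT.
df = pd.DataFrame({
    "id": [1, 2],
    "timing": ["10:00", "11:00"],
    "name": ["a", "b"],
    "review": ["great product, works well", "never arrived, total scam"],
    "label": [0, 1],  # 0 = real, 1 = fake
})
df = df[["review", "label"]]  # keep only the useful columns
```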
ran into this problem: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select) on this line: get_prediction('I am not happy to see you.')
@rajkkapadia
A year ago
I will check and update the code if needed...
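For anyone hitting this: the "cpu and cuda:0" error usually means the model was moved to the GPU while the tokenized inputs stayed on the CPU. A sketch of the fix, using a tiny stand-in module so it runs anywhere (with the real model, move the tokenizer's output dict the same way):

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the fine-tuned BERT model.
model = nn.Linear(4, 2).to(device)

# Stand-in for tokenizer output: a dict of tensors created on the CPU.
inputs = {"input_ids": torch.randn(1, 4)}

# Move every input tensor to the same device as the model before the call.
inputs = {k: v.to(device) for k, v in inputs.items()}
logits = model(inputs["input_ids"])
```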
"RuntimeError: Placeholder storage has not been allocated on MPS device! " How about this error in the last?
@rajkkapadia
8 months ago
Check if you have a GPU device, sometimes it fails to run on CPU...
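On Apple Silicon this error typically means the model landed on the MPS backend while the inputs stayed on the CPU. A sketch: pick one device explicitly and move both the model and every input tensor to it.

```python
import torch

# Prefer MPS on Apple Silicon, then CUDA, then CPU. The getattr guard keeps
# this working on older PyTorch builds without the MPS backend.
if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
# Then: model.to(device) and every tokenized input tensor .to(device).
```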
Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! I am facing this error in the get_prediction
@rajkkapadia
A year ago
Hi, I am not sure about the error, you can start printing each variable one by one to get the actual error...
Amazing video! ... Thank you very much for sharing your knowledge
In the get_prediction function, you used "trainer". When I try to use my model in another ipynb file, I get an error because trainer is not initialized in the new ipynb file. What is the solution to this problem? To use my model, do I have to repeat these steps every time?
@rajkkapadia
A year ago
If you watch the video till the end, I have shown this as well...
@sadkchris9785
A year ago
@@rajkkapadia I have watched it, but I still can't use the get_prediction function in another ipynb file.
@sadkchris9785
A year ago
@@rajkkapadia Can I use the function without the trainer row?
@rajkkapadia
A year ago
@@sadkchris9785 you need to pass the path to the model in the ipynb file when you create a new instance of the model...
@sadkchris9785
A year ago
@@rajkkapadia I can reach my model from the other ipynb file; I did everything in your video. My error is NameError: name 'trainer' is not defined, because trainer was initialized in the main ipynb file.
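For anyone stuck here: the Trainer object is only needed for training. Once the model is saved with save_pretrained (or trainer.save_model), any other notebook can reload it with from_pretrained and run inference directly. A sketch using a tiny random config so it runs offline; with the real model you would load from your saved checkpoint path instead:

```python
import tempfile

import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny random BERT config (illustrative sizes) so no download is needed.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)
model = BertForSequenceClassification(config)

# "First notebook": save the model to disk.
path = tempfile.mkdtemp()
model.save_pretrained(path)

# "Second notebook": reload without any Trainer and predict directly.
reloaded = BertForSequenceClassification.from_pretrained(path)
out = reloaded(input_ids=torch.randint(0, 100, (1, 8)))
```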
Why are you doing binary classification? Please do multi-class and multi-label.
@rajkkapadia
A year ago
Hi, if you watch carefully, I have shown a way to do that as well...
Can I use this model to detect cyber attacks?
@rajkkapadia
A year ago
It is a text classification model...
I see that your model is just the default model for classification provided by the transformers library, AutoModelForSequenceClassification. Have you tried making a more complex model using Keras, for example: using a transformers layer as input followed by a number of hidden layers (ReLU + dropout)? What are the situations in which such a model (more complex) should provide better results than the more basic one (the default AutoModelForSequenceClassification)?
@rajkkapadia
A year ago
Hi, I have not tried that yet, but we can play around. One point though: the models here are built with PyTorch, while you want to use TensorFlow; I am not sure those will gel... But we can try this approach using PyTorch...
@WoWmastersonTuralyon
A year ago
@@rajkkapadia Thanks for the quick response, a PyTorch approach would be great as well! I am currently trying to solve the following task: classify emails into 6 classes. I want to use the email bodies (after carefully selecting the relevant parts of the body: ignoring links, automated messages, and so on) and the email subjects. How can you build a model that uses multiple inputs? I tried concatenating the strings into a single input, but I don't think this is the right approach, as they would lose their independence.
@rajkkapadia
A year ago
@@WoWmastersonTuralyon You can use different input layers for each input and then concatenate them, make sure the dimensions are right...
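A minimal PyTorch sketch of that idea, with stand-in vectors in place of the two BERT encodings (the 16/32 dimensions and the 6 email classes are illustrative): encode subject and body separately, then concatenate the vectors before a shared classifier.

```python
import torch
from torch import nn

# Stand-ins for the subject and body encodings (e.g. BERT pooled outputs).
subject_vec = torch.randn(1, 16)
body_vec = torch.randn(1, 32)

# Concatenate along the feature dimension; the classifier's input size must
# match the combined width (16 + 32 = 48).
combined = torch.cat([subject_vec, body_vec], dim=-1)
classifier = nn.Linear(48, 6)  # 6 email classes, per the comment above
logits = classifier(combined)
```

This keeps the two inputs independent up to the concatenation point, unlike joining the raw strings into one sequence.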
@Chuukwudi
A year ago
Hi @WoWmastersonTuralyon, I just thought of the same. I am trying BERT on a binary classification task. The solution provided here quickly overfits the data in less than 2 epochs; performance on the evaluation data degrades badly after 3 epochs. I think the best approach would be to freeze the weights of BERT and add a few layers, with a bit of regularisation as needed. Have you found a way of doing this?
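A sketch of that freezing idea, using a small module as a stand-in for the BERT encoder so the example runs offline; with the real model you would loop over model.bert.parameters() instead.

```python
import torch
from torch import nn

# Stand-in for the pretrained encoder.
encoder = nn.Linear(8, 8)
for p in encoder.parameters():
    p.requires_grad = False  # freeze the pretrained weights

# A small trainable head with some dropout as regularisation.
head = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(8, 2),
)

# Only the head's parameters go to the optimizer.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
```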
@smitm.1342
A year ago
I have a headline-text to body-text matching and classification task. I am not sure how to tokenise both columns; the body text contains 4-5 lines. What could be the solution?
Hi, I am not able to install the datasets module...
@rajkkapadia
A year ago
pypi.org/project/datasets/
@sayedabdulsamad1047
A year ago
@@rajkkapadia Thanks, I was able to install it. But while training the model with three labels I faced this problem: ValueError: Target size (torch.Size([8])) must be the same as input size (torch.Size([8, 3]))
@rajkkapadia
A year ago
@@sayedabdulsamad1047 I am not sure; make sure the sizes of the target and the features are as required by the model...
@sayedabdulsamad1047
A year ago
@@rajkkapadia Yeah, looking for some info on that. I tried both the one-hot encoded approach and the normal one.
@rajkkapadia
A year ago
@@sayedabdulsamad1047 Are you trying multi-class classification? Then you should look for that on Hugging Face...
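For reference, that particular ValueError comes from PyTorch's loss-function target shapes: with 3 labels the model emits logits of shape (8, 3) for a batch of 8, and the two loss functions expect differently shaped targets. A sketch:

```python
import torch
from torch import nn

# Batch of 8 examples, 3 labels, like the error message above.
logits = torch.randn(8, 3)

# Multi-class (one label per example): integer class indices of shape (8,)
# with CrossEntropyLoss.
class_targets = torch.randint(0, 3, (8,))
ce_loss = nn.CrossEntropyLoss()(logits, class_targets)

# Multi-label: float targets matching the logits' shape (8, 3) with
# BCEWithLogitsLoss. Passing shape-(8,) targets here raises exactly the
# "Target size ... must be the same as input size" error.
multilabel_targets = torch.zeros(8, 3)
bce_loss = nn.BCEWithLogitsLoss()(logits, multilabel_targets)
```

So the fix depends on the task: keep integer labels for multi-class, or one-hot float labels (and the matching loss / problem type) for multi-label.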