Sultan Al Awar - Generating Customer Insights with Topic Modelling and the HuggingFace SetFit Method

Science & Technology

PyData
Website: www.pydata.org
LinkedIn: / pydata-global
Twitter: / pydata
Stop data skimming and dive deep into your customers' voices! Are you working with a large volume of unstructured reviews and want to understand what customers are commenting on? This hands-on tutorial equips you with powerful text analysis techniques to unlock hidden insights and inform data-driven decisions. Whether you're an experienced data scientist or analyst or just starting out, this session will guide you through two text classification approaches:
1) Classic Topic Modelling: Uncover recurring themes and trends within customer comments using a generative probabilistic modelling approach such as LDA (Latent Dirichlet Allocation).
2) SetFit Few-Shot Learning: Fine-tune a HuggingFace (HF) Sentence Transformers model with a small amount of labelled data to automatically categorise and label reviews, offering deeper insight into key strengths as well as opportunities for improvement.
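The first approach can be illustrated without any libraries. The block below is a minimal, from-scratch sketch of LDA fitted with collapsed Gibbs sampling, preceded by a simple pre-processing step; it is not the tutorial notebook's code, which may well rely on a library such as gensim or scikit-learn instead. The stopword list, the example reviews, and the helper names `preprocess`, `lda_gibbs`, and `top_words` are all hypothetical.

```python
import random
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "and", "was", "very", "but", "for", "this", "with", "are", "not", "too"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and tokens shorter than 3 characters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns topic-word counts and the vocabulary."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]        # n_{d,k}: tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # n_{k,w}: times word w is assigned to topic k
    topic_total = [0] * K                      # n_k: total tokens assigned to topic k
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | rest), up to a normalising constant.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    return topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]] for row in topic_word]


reviews = [
    "Delivery was late and the parcel arrived damaged",
    "Late delivery again, courier lost the parcel",
    "Great price and excellent quality for the money",
    "Excellent quality, fair price, would buy again",
]
docs = [preprocess(r) for r in reviews]
topic_word, vocab = lda_gibbs(docs, num_topics=2, iterations=200)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

With real review data you would tune the number of topics and the priors `alpha` and `beta`, and inspect the top words of each topic to decide which themes they represent.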
Upon completing the tutorial, you will have hands-on experience from working through a Google Colab notebook provided beforehand, which will enable you to apply what you have learned and achieve the following outcomes:
- Apply topic modelling, with the necessary text pre-processing and feature engineering techniques, to discover the underlying topics in a collection of text.
- Fine-tune a HF transformer on a small labelled dataset using the SetFit few-shot learning method.
- Evaluate the performance of the fine-tuned transformer model.
- Use the fine-tuned model to label unlabelled data with classification themes.
- Develop a baseline evaluation mechanism to monitor the model in production.
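The fine-tuning, evaluation, and monitoring outcomes above can be sketched in plain Python under stated assumptions. `setfit_pairs` illustrates the idea behind SetFit's first stage: a handful of labelled reviews is turned into sentence pairs labelled 1.0 (same class) or 0.0 (different class) for contrastive fine-tuning of a sentence transformer. The embedding model itself and the classification head are left to the setfit library, which also has its own pair-sampling options; this sketch simply enumerates every pair, which is only practical for tiny few-shot sets. `macro_f1` scores predictions against held-out labels, and `label_psi` is one possible baseline monitor, a Population Stability Index over predicted labels. All three function names, the sample data, and the choice of PSI are assumptions, not the tutorial's method.

```python
import itertools
import math
import random
from collections import Counter


def setfit_pairs(texts, labels, seed=0):
    """SetFit-style pair generation: each pair of labelled examples becomes
    (text_a, text_b, 1.0) if their labels match, else (text_a, text_b, 0.0)."""
    pairs = [
        (t1, t2, 1.0 if l1 == l2 else 0.0)
        for (t1, l1), (t2, l2) in itertools.combinations(zip(texts, labels), 2)
    ]
    random.Random(seed).shuffle(pairs)
    return pairs


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)


def label_psi(baseline_preds, production_preds, eps=1e-6):
    """Population Stability Index between baseline and production predicted-label distributions."""
    base, prod = Counter(baseline_preds), Counter(production_preds)
    psi = 0.0
    for label in set(base) | set(prod):
        # eps avoids log(0) when a label is missing from one of the two samples.
        b = max(base[label] / len(baseline_preds), eps)
        p = max(prod[label] / len(production_preds), eps)
        psi += (p - b) * math.log(p / b)
    return psi


few_shot = ["Parcel arrived late", "Courier lost my order", "Great value for money", "Excellent build quality"]
few_shot_labels = ["delivery", "delivery", "product", "product"]
pairs = setfit_pairs(few_shot, few_shot_labels)
# 4 examples -> 6 pairs: 2 same-label (1.0) and 4 cross-label (0.0)

print(macro_f1(["delivery", "product", "product"], ["delivery", "product", "delivery"]))
print(label_psi(["delivery", "product"] * 50, ["delivery"] * 80 + ["product"] * 20))
```

In practice `label_psi` would be computed periodically on production predictions against a reference window; the threshold that should trigger an alert is a choice you would need to make for your own data.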
Please follow these steps to prepare for the tutorial:
1) Set up Google Colab.
2) Download the data and notebooks folders from this repository: rb.gy/ovru2m.
This will allow you to run the notebooks and follow along with the tutorial using Google Colab!
Ready to transform your understanding of multi-class text classification on customer data? Join me and unleash its power!
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.
Want to help add timestamps to our KZread videos to help with discoverability? Find out more here: github.com/numfocus/KZreadVi...

Comments: 1

  • @FeverBonus (10 days ago)

    What about LLMs?
