Evaluate LLMs with Language Model Evaluation Harness
Science & Technology
In this tutorial, I show how to evaluate large language models (LLMs) with EleutherAI's Language Model Evaluation Harness. Explore how to rigorously test LLMs across diverse benchmarks, including HellaSwag, TruthfulQA, Winogrande, and more. This video features Meta AI's LLaMA 3 model and demonstrates step by step how to run the evaluations directly in a Colab notebook, offering practical insights into AI model assessment.
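For viewers who want to try this before opening the notebook, here is a minimal sketch of such a run using the harness's Python API. The model ID (meta-llama/Meta-Llama-3-8B) and the exact task names are assumptions based on the description above, not taken from the video; check the task list shipped with your installed version of the harness.

```python
# Minimal sketch of an evaluation run with EleutherAI's lm-evaluation-harness.
# Assumptions: `pip install lm-eval` has been run, and you have access to the
# gated meta-llama/Meta-Llama-3-8B checkpoint on the Hugging Face Hub.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=meta-llama/Meta-Llama-3-8B",
    tasks=["hellaswag", "truthfulqa_mc2", "winogrande"],  # task names may vary by harness version
    num_fewshot=0,
    batch_size=8,
    device="cuda:0",
)

# Per-task metrics (accuracy, normalized accuracy, etc.) live under "results".
for task, metrics in results["results"].items():
    print(task, metrics)
```

The same run can also be launched from the command line with the `lm_eval` CLI that the package installs; the Python API above is simply more convenient inside a Colab notebook.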
Don't forget to like, comment, and subscribe for more insights into the world of AI!
GitHub Repo: github.com/AIAnytime/Eval-LLMs
Join this channel to get access to perks:
/ @aianytime
To further support the channel, you can contribute via the following methods:
Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
UPI: sonu1000raw@ybl
#openai #llm #ai
Comments: 12
I love you man, ❤ You are awesome, keep uploading 😊
Thanks, great LLM tips
If I want to evaluate the LLM using a custom dataset, is that possible with the GitHub repo you have provided here?
I LIKE THIS... nice job, man!
nice! thank you for the video!
nice work
do we have to add any dataset?
Can I do it on the LLaVA model?
PackageNotFoundError: No package metadata was found for bitsandbytes. I am getting this error even though bitsandbytes is installed and my CUDA version is 12.1. Please help me with this.
What about LangSmith? It does the same thing, right?
How do I do it on the whole MMLU?
I need the RAG chatbot part 2 video. Please release it; my exam is coming.