
ZERO Cost AI Agents: Are ELMs ready for your prompts? (Llama3, Ollama, Promptfoo, BUN)

Comments: 31

  • @kenchang3456 (3 months ago)

    And now I know why I subscribed with alerts on.

  • @thunken (3 months ago)

    would be cool if you had finger puppets :)

  • @drlordbasil (3 months ago)

    second.

  • @miikalewandowski7765 (3 months ago)

    😂😂

  • @indydevdan (3 months ago)

    lemao

  • @alew3tube (3 months ago)

    I would add to your list: tool/function calling as fundamental for an LLM.

  • @indydevdan (3 months ago)

    Yeah, great call-out. Definitely a fundamental requirement for LLMs, especially for agentic workflows.
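To make concrete what "fundamental for agentic workflows" means: the model must emit a well-formed, structured tool call that the host program can parse and dispatch. A minimal sketch (the tool name and shapes below are illustrative assumptions, not from the video):

```typescript
// Hypothetical sketch: the model emits a structured tool call; the host
// parses it and dispatches to a real function. An ELM that can't reliably
// produce this JSON shape is unusable in an agentic loop.
type ToolCall = { name: string; arguments: Record<string, unknown> };

// Host-side registry of callable tools (get_weather is a made-up example).
const tools: Record<string, (args: Record<string, unknown>) => string> = {
  get_weather: (args) => `Sunny in ${String(args.city)}`,
};

function dispatch(call: ToolCall): string {
  const fn = tools[call.name];
  if (!fn) throw new Error(`Unknown tool: ${call.name}`);
  return fn(call.arguments);
}

// e.g. the model's response contained:
// {"name": "get_weather", "arguments": {"city": "Paris"}}
console.log(dispatch({ name: "get_weather", arguments: { city: "Paris" } }));
// -> Sunny in Paris
```

The hard part is not the dispatch code but getting a small model to emit the JSON reliably, which is exactly what a prompt-test suite should measure.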

  • @bogdantanasa1374 (19 days ago)

    Thank you very much for sharing this. I hit a few bumps in the road getting it to work, but I managed, so thanks for the details. I find it interesting that when asked to choose from a list of options, models sometimes answer in sentence case instead of the exact lowercase instructed. No big deal, I'd think, since some people would do the same when just answering Yes or No. :) In more advanced tests I found the answers were ACCEPTABLE, though the asserts were not easy to describe. :( Besides obviously checking myself, I got a second opinion from a more expensive, more advanced model, and some of those answers were found acceptable because the reasoning makes sense (for example, in the SQL NQL test). It would be interesting if promptfoo included a 'get a second opinion' option with a different agent; after all, we're trying to automate everything, so why not the test evaluations themselves? :)
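For what it's worth, promptfoo's llm-rubric assertion is close to this "second opinion" idea: a grader model judges the answer instead of an exact-match assert. A hedged sketch only; the provider ids and the per-assert provider override should be checked against the promptfoo docs for your version:

```yaml
# Sketch: exact keys and provider ids may differ by promptfoo version.
prompts:
  - "Answer in lowercase, yes or no: {{question}}"
providers:
  - ollama:llama3
tests:
  - vars:
      question: "Is SQL a declarative language?"
    assert:
      # exact-match asserts are brittle, as noted above
      - type: icontains
        value: "yes"
      # "second opinion": a stronger model grades the answer
      - type: llm-rubric
        value: "The answer is correct and follows the lowercase instruction"
        provider: openai:gpt-4o
```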

  • @AGI-Bingo (3 months ago)

    I'm currently enjoying the Groq free era, and I don't mind using it for development, but for production I wouldn't want my or anyone else's private data going to any corporation, so going local is definitely the way to go.

  • @wellbishop (3 months ago)

    Pretty smart guy you are. Thanks for sharing your divinity with us poor mortals.

  • @reagansenoron6763 (3 months ago)

    Hi Dan, thanks a lot for sharing your knowledge. With around 700K open-source LLMs out there, it's really hard to pick a decent one. Usually we sort by most downloads or most likes, but that's not enough, so this benchmarking will really help. BTW, I followed the README, and running "bun elm" throws: error: Script not found "elm".
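For reference, "bun elm" only works if the repo's package.json defines an "elm" entry under "scripts", and "bun i" needs a package.json with dependencies to install. A minimal sketch of the shape (the entry filename and dependency are guesses, not the repo's actual contents):

```json
{
  "scripts": {
    "elm": "bun run elm.ts"
  },
  "dependencies": {
    "promptfoo": "latest"
  }
}
```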

  • @acllhes (3 months ago)

    Good stuff

  • @larsfaye292 (3 months ago)

    In my opinion, the LPU (such as what Groq is developing) is going to be built into future PCs, dedicated to the sole task of running local models.

  • @indydevdan (3 months ago)

    I'm betting on Apple doing this with the M4/M5 devices. Not the LPU exactly, but the 'Apple LPU' equivalent.

  • @fontenbleau (2 months ago)

    Any model needs huge amounts of memory. The chip matters less, since it only affects speed; without hundreds of GBs of SSD and RAM, it won't even start. In all of Apple's history, memory has played the smallest role. Again they put all the attention on CPUs, not on the important and really expensive part. Apple is stuck in its doctrine; it's hopeless.

  • @mylesholloway9223 (3 months ago)

    Left a comment under the GitHub repo; you may have forgotten to include the package.json. When running "bun i" I get an error because there are no dependencies in a package.json.

  • @kterry697 (3 months ago)

    Yes... the package.json was forgotten. When running "bun i" I get an error because there are no dependencies in a package.json.

  • @indydevdan (2 months ago)

    Wow, massive noob mistake. Fixed. Thank you.

  • @EternalKernel (3 months ago)

    You should test Claude 3 Haiku.

  • @6lack5ushi (3 months ago)

    I love your videos and posts, but even with ELMs the biggest issue I find (unless it's a nasty bug inherent to my system) is INSTRUCTION FOLLOWING! I would rather have the legacy GPT-4 than any 4-turbo model, because it follows commands WAY better. I have a terrible feeling MMLU and other benchmarks are hiding the fact that models may be getting more capable but less reliable, or "lazy". I thought it was bloated initial prompts (and human moderation creating illogical gaps where the model just omits things), but I think it's more sinister: we are optimizing for the benchmarks but do not benchmark instruction following in said benchmarks!

  • @robertmazurowski5974 (3 months ago)

    This is not a psychological phenomenon. I've used GPT-4 since the beginning, and I could see when they were dumbing down, maybe quantizing, their model. Something literally broke all my automations last weekend. I switched to the new GPT-4 Turbo model, which is supposed to be better than the previous ones according to benchmarks. Unfortunately, it's worse: it cannot catch instructions like the previous one used to.

  • @6lack5ushi (3 months ago)

    @@robertmazurowski5974 Same issue. I use the "GPT-4" endpoint, which points to GPT-4-0613 (I think): currently the best, and from before the super-massive context. But 100% the same thing happened to me. Try Llama 3; I had some success, but nowhere near legacy GPT-4.

  • @BangaloreYoutube (3 months ago)

    Now I'm sad my workflows didn't break. Are they not complex enough? 😅 I switched a few to Ollama through Groq, and nothing seems broken yet!!

  • @robertmazurowski5974 (3 months ago)

    @@BangaloreYoutube In my case it worked before swapping to the new GPT Turbo model. The model doesn't catch instructions properly. Before last weekend, GPT-4 Turbo was able to process 3-4 function calls based on a prompt, and then answer with another 3-4 function calls if needed. It cannot do that anymore.

  • @indydevdan (3 months ago)

    "Models may get more capable but less reliable": this is a great call-out and observation. I agree with you; instruction following is ULTRA important, especially as models improve. If they can't follow your instructions, the capabilities they have are essentially useless. Another interesting finding with MMLU and other benchmarks is that model providers have started TRAINING ON THE BENCHMARKS, which, if you've ever trained a model, you know is a HUGE problem (model contamination). Both of these call-outs highlight the importance of what we discuss in this video: having your own domain-specific prompt tests to validate the 'true' value of the LLM for your use cases.
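As a concrete illustration of "your own domain-specific prompt tests": a hedged promptfoo sketch that runs the same domain prompt against a local model and a hosted reference, so contaminated public benchmarks drop out of the picture entirely. The provider ids and sample ticket below are assumptions; check them against the promptfoo docs and your own data:

```yaml
# Sketch: run identical domain prompts across models and compare results.
prompts:
  - "Summarize this support ticket in one sentence: {{ticket}}"
providers:
  - ollama:llama3        # local ELM under evaluation
  - openai:gpt-4o-mini   # hosted reference model
tests:
  - vars:
      ticket: "Customer cannot log in after the 2.3.1 update."
    assert:
      - type: icontains
        value: "log"
```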

  • @fraugdib3834 (a month ago)

    Effin' righteous, man... Have a metric --> use it often --> know exactly where you stand against an ever-expanding whirlwind of clickbait and noise.

  • @fontenbleau (2 months ago)

    Apple's models didn't impress me at all; maybe they deliberately published only the smallest, least useful ones. A normal-quality LLM like Llama 3 70B in best quality (8-bit GGUF) needs 90GB of RAM just to start. None of these hardware makers provide that; everyone is showing off powerful CPUs that will be wasted in those laptops, since Microsoft required just 16GB of RAM. Using an SSD for this is impossible; it wears out fast. 128GB of DDR4 cost me exactly $400, which is half the price of a decent GPU or one of these fancy laptops.
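The 90GB figure is in the right ballpark: the weights alone for a 70B-parameter model at 8 bits come to about 70GB before GGUF metadata, KV cache, and runtime overhead. A quick back-of-the-envelope sketch (weights only; the overhead terms are deliberately ignored):

```typescript
// Rough weight-memory estimate: parameters x bytes per parameter.
// Ignores KV cache, activations, and file-format overhead, which push
// real-world requirements higher than this floor.
function weightMemGb(paramsBillion: number, bits: number): number {
  return (paramsBillion * 1e9 * (bits / 8)) / 1e9; // decimal GB
}

console.log(weightMemGb(70, 8)); // 70  (llama-3-70B at 8-bit)
console.log(weightMemGb(70, 4)); // 35  (4-bit quant halves it again)
```

This is also why 4-bit quantization is popular for local use: it brings a 70B model within reach of a 48-64GB machine at some quality cost.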

  • @xyster7 (3 months ago)

    Just remove that echo and invest in a better mic, and you'll be better than other channels of this type.