HunYuan DiT 1.0 - Open Source & Better Than Stable Diffusion 3?

Science & Technology

Overall, in HunYuan-DiT's tests, it scores 59.0% vs 56.7% for Stable Diffusion 3, making it objectively better. More than 50 professional evaluators performed the evaluations, so it must be true... or is it? For you to be able to judge for yourself, I pit HunYuan-DiT against the weaker (but freely available) SDXL in a battle for AI supremacy!
Want to support the channel?
/ nerdyrodent
/ hunyuan-dit-than-10582...
GitHub Repo:
github.com/Tencent/HunyuanDiT
== More Nerdy Stuff! ==
* Installing Anaconda for MS Windows Beginners - • Anaconda - Python Inst...
* Installing ComfyUI for Beginners - • How to Install ComfyUI...
* ComfyUI Workflows for Beginners - • ComfyUI Workflow Creat...
* Faster Stable Diffusions with Hyper SDXL - • Hyper-SD - Better than...
* Make A Consistent Character in ANY pose - • Reposer = Consistent S...
* Make an Animated, Talking Avatar - • Create your own animat...

Comments: 79

  • @ritpop (1 month ago)

    Isn't using the negatives in Chinese a problem for the SDXL version? Isn't that unfair?

  • @NerdyRodent (1 month ago)

    I suggest giving it a go and running your own tests too! You’d be surprised how many other languages do things… 😉

  • @MrSporf (1 month ago)

    Oh nice! We need a greater variety of models like this one. I also managed to get it running in just 6GB VRAM too - you don't need 11 now

  • @hakuhyo174 (1 month ago)

    Not convinced, but it's nice to have more options given the recent fiasco at Stability. The game changer would be an adapter that lets a new base model use the existing SD ecosystem, e.g. LoRAs.

  • @NerdyRodent (1 month ago)

    Fine tuning & loras are next 😊

  • @swannschilling474 (1 month ago)

    Thanks Nerdy! 😊

  • @NerdyRodent (1 month ago)

    No problem 😊

  • @timothywcrane (1 month ago)

    You said that HunYuan couldn't do text, but it did do Hanzi (I am not sure of the correctness; I would have to run it through EN-CN OCR). Glad to see this model. Models covering two or more locales have been coming out of China across many modalities, and they have been skyrocketing in quality and quantity lately.

  • @TheDoomerBlox (1 month ago)

    It does do text, and it does perform better when the English prompt is translated to Chinese by a high quality translation model first. Of course, it doesn't do ENGLISH text very well, but who's counting?

  • @TamalPlays (1 month ago)

    Thanks for making a video on this

  • @NerdyRodent (1 month ago)

    It's my pleasure

  • @CharlesLijt (1 month ago)

    China No.1, thanks Rodent

  • @MilesBellas (1 month ago)

    Fascinating comparison.

  • @drawmaster77 (1 month ago)

    that outro song is fire 🤣

  • @styrke9272 (1 month ago)

    Thanks for the video

  • @mithrillis (9 days ago)

    These new models do look pretty nice! I do find it odd that we end up dedicating more and more GBs to text understanding and a seemingly smaller and smaller fraction of GBs to actual images... I wonder if this impacts how much we would be able to finetune these models. I also would not mind seeing some models going the other direction: terrible natural language understanding but crammed with image data, like a PonyXXL...

  • @lex_darlog_fun (1 month ago)

    Dear Nerdy Rodent, could you please publish a link to your udio-generated outro? I'd like to extend it to a full track.

  • @NerdyRodent (1 month ago)

    I may do a short one day 😉

  • @lex_darlog_fun (1 month ago)

    @NerdyRodent Hope this day comes sooner rather than later. I've been hooked by the minimalistic yet vibing melody since your udio review.

  • @soloones7141 (1 month ago)

    When I try to queue prompt, it gives me this error "DiffusersPipelineLoader: - Value not in list: pipeline_folder_name: 'ckpts' not in [ ]". Any idea why, Mr. Nerdy Rodent?

  • @pon1 (27 days ago)

    Don't know for sure. When I had a similar problem, it was because I kept my models in a different directory from the regular one (I had set the models path to where I keep my A1111 models). Maybe it's expecting something else there, or expecting the ComfyUI default models path. I ended up installing ComfyUI on a different drive to use with models like SD3 (which is the model I had a similar problem with).

  • @DeathMasterofhell15 (28 days ago)

    Problem, mate: my loader does not have vae, clip or model, but instead "pipeline, auto encoder, scheduler". How do I change it?

  • @NerdyRodent (28 days ago)

    The HunYuan pipeline loader is detailed at github.com/Tencent/HunyuanDiT/tree/main/comfyui-hydit#hunyuan-pipeline-loader

  • @ElevatedKitten-sr6yi (1 month ago)

    If you look closely, isn't every iteration of SD actually not true open source? Open weights, yes, but not open source, since the training data, scripts, and exact methodologies are not published. I think for this reason, SAI always carefully refers to 'open release' and similar things regarding SD3, to avoid the term 'open source' as much as possible.

  • @TheDoomerBlox (25 days ago)

    The amount of webscraped Horrendous Atrocities in Picture Form present in SD1.5 would be quite a pill to swallow. :- ) and SD3 still responds to webscraped metadata tags, so it's still there

  • @zappazack (22 days ago)

    Hunyuan doesn't seem to work on Windows; a wheel build error occurs

  • @sandy66555 (1 month ago)

    Lots of handsome rodents, and a bunny? What happened to my badger? You promised (you didn't but I took it that way) 😜

  • @NerdyRodent (1 month ago)

    The badger is currently chilling 🦡

  • @TobyDeshane (1 month ago)

    Should probably be using stock SDXL to completely avoid wondering if a problem is with SDXL itself or not. :P (Even if those models do generate often better imagery in general.) Additionally, while I don't know what your actual workflow is before editing the video, I'd probably want to generate 3-4 images for a prompt (or maybe cherry picking the best out of 6 or more?), since it's quite common for a generated image to miss some aspect of the prompt. I know this isn't a strictly scientific review, but if we're going to be pitting A vs B even in a fluff intro video like this, it's not really doing anyone favors for either one of them to be at the mercy of a random number generator. (Maybe you do this behind the scenes and just edit that bit out -- that's fine, but I can't tell that from the end result! :D) Anyway, cheers -- interesting seeing alternative models coming out.
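The suggestion of generating several images per prompt, rather than one, can be sketched as a small run plan. This is only an illustration: `build_run_plan`, the prompt, the seed values, and the model labels are all hypothetical, and the actual image generation call is left out because it depends on the tool being used.

```python
from itertools import product


def build_run_plan(prompts, seeds, models):
    """List every (prompt, seed, model) combination.

    Giving each model the same number of attempts per prompt means
    neither side depends on a single random draw.
    """
    return [
        {"prompt": prompt, "seed": seed, "model": model}
        for prompt, seed, model in product(prompts, seeds, models)
    ]


# Hypothetical inputs, chosen only for illustration.
plan = build_run_plan(
    prompts=["a rodent wearing glasses"],
    seeds=[1, 2, 3, 4],
    models=["HunYuan-DiT", "SDXL base"],
)
print(len(plan))  # 1 prompt x 4 seeds x 2 models = 8 runs
```

Each entry in `plan` would then be handed to whatever generation workflow is in use, and the outputs compared side by side per prompt.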

  • @generalawareness101 (1 month ago)

    I am sure you are the type that wants it in an X/Y grid too. Am I right?

  • @suliao-lv6gj (1 month ago)

    Love your rigor and skepticism, but it's open source and you can totally test it once for yourself. I hope you keep that same rigor and skepticism when you see other videos and ask for proof yourself.

  • @CMak3r (27 days ago)

    Actually, this one might be better than the SD3 that we got. I should invest in a bigger GPU

  • @neonskimmer (29 days ago)

    That closing credits song reminds me a lot of a British band from the early 2000s called Fat Truckers. You'd like them I think.

  • @popecillo (1 month ago)

    Definitely review the models you use. They have been shown to be too biased: there are too many model mixes whose understanding is badly damaged. The base model contains ample knowledge, but it gets lost the more it is trained without a good balance. Most mixes now throw everything in, and some models improve certain concepts but ruin others. In my tests using your prompts, I didn't have problems with a bias towards women or terribly deformed hands.

  • @generalawareness101 (1 month ago)

    The SD1.5 plague has arrived at SDXL, and it plays hell on those of us who train LoRAs for it. It might work on yours, or might not, or might do weird stuff. Just too many mixes of mixes that were trained on a mix, a real fiasco.

  • @rsunghun (1 month ago)

    Nice

  • @leslieviljoen (25 days ago)

    In what world is Dall-e better than Midjourney?

  • @KlimovArtem1 (1 month ago)

    Can it do any NSFW, or is it heavily censored?

  • @joechip4822 (1 month ago)

    Whether it does NSFW is the first question any reviewer should ask and answer. Considering this is Chinese, the answer will be obvious though, won't it?

  • @KlimovArtem1 (1 month ago)

    @joechip4822 the level of censorship in any particular model is one of the most important aspects.

  • @lefourbe5596 (1 month ago)

    It's a non-problem. Sure, it's an arm cut off the model's integrity, but talented fine-tuners are able to add that back seamlessly.

  • @suliao-lv6gj (1 month ago)

    While my answer may not be entirely accurate, it's subject to Chinese law and NSFW content is prohibited.

  • @joechip4822 (1 month ago)

    @suliao-lv6gj very probably true, but also a good reason not to use and support it in its original form

  • @cowlevelcrypto2346 (16 days ago)

    Wait, I can't run ComfyUI in Linux?

  • @NerdyRodent (16 days ago)

    Other way around - ComfyUI runs best in Linux 😉

  • @Mp3Pintyo (1 month ago)

    I also tested it in a long video, but it really didn't convince me. kzread.info/dash/bejne/Yp9opMeEftLTo9Y.html

  • @gimperita3035 (27 days ago)

    The song is a hit 😂

  • @NerdyRodent (27 days ago)

    😎

  • @CMak3r (1 month ago)

    According to the Steam hardware survey, about 1% have a 4090 and 0.58% have a 3090

  • @lefourbe5596 (1 month ago)

    132 million active users on Steam. 1% of Steam gamers having a 4090 would be 1.32 million people. More than the views this video will ever have. (I have a 3090, btw)
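As a quick check of the head counts implied by those shares, here is a minimal sketch. The 132 million user figure and the two percentages are taken from the comments as stated and are not verified here; `users_with_share` is a hypothetical helper name.

```python
def users_with_share(total_users: int, share_hundredths_of_percent: int) -> int:
    """Return the head count for a share given in hundredths of a percent.

    Integer arithmetic avoids floating-point rounding noise.
    """
    return total_users * share_hundredths_of_percent // 10_000


STEAM_USERS = 132_000_000  # figure quoted in the comment, not verified

print(users_with_share(STEAM_USERS, 100))  # 1.00% -> 1320000
print(users_with_share(STEAM_USERS, 58))   # 0.58% -> 765600
```

On those numbers, 1% comes to 1.32 million people and 0.58% to roughly 766 thousand.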

  • @MilitantHitchhiker (1 month ago)

    Better than the unreleased weights of SD3? Huh amazing that they can compare their model with a model that isn't finished training and can definitively tell theirs is better. What a time to be alive, technology is marvelous.

  • @Afr0man4peace (1 month ago)

    It would be interesting to compare it with an actually well-trained model, like Colossus Project XL 10 for example.

  • @ArawnFR (1 month ago)

    Your maths are wrong, 2.3% better than sd 3 (56.7) is 58.0041, not 59
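The disagreement comes down to percentage points versus relative percent. A small sketch using the two scores from the video description (59.0% and 56.7%) shows both readings; the variable names are illustrative only.

```python
hunyuan = 59.0  # overall score from the video description, in percent
sd3 = 56.7      # overall score from the video description, in percent

point_gap = hunyuan - sd3              # difference in percentage points
relative_gain = point_gap / sd3 * 100  # relative improvement, in percent
scaled = sd3 * 1.023                   # "2.3% better" read as a relative gain

print(f"{point_gap:.1f} percentage points")  # 2.3 percentage points
print(f"{relative_gain:.2f}% relative")      # 4.06% relative
print(f"{scaled:.4f}")                       # 58.0041
```

So 59.0 is 2.3 percentage points above 56.7, which is a relative gain of about 4.06%, whereas a 2.3% relative gain on 56.7 would indeed give 58.0041.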

  • @v0idbyt3 (29 days ago)

    time to put my rtx 3060 12gb + 64gb ddr4-2400 ubuntu pc to use

  • @dayantargaryen4129 (27 days ago)

    it's much better

  • @TamalPlays (1 month ago)

    sadly it is only 1024x1024 :(

  • @financialfreedomfighters369 (1 month ago)

    According to this chart, HunYuan is supposed to be almost as good as Midjourney in terms of aesthetics, and Dalle-3 is supposed to be even better than Midjourney... The curator of this chart clearly needs glasses x-D

  • @Smirnoff67 (1 month ago)

    Who would have guessed: a model that doesn't obey some self-imposed restriction obtains better results

  • @openroomxyz (1 month ago)

    Well, can someone create safetensors versions of these? xD

  • @generalawareness101 (1 month ago)

    do it yourself as the tools are out there in kohya with lycoris.

  • @nunuarthas8680 (1 month ago)

    sir it's pronounced Hung-Yu-Whan, hunyuan

  • @NerdyRodent (1 month ago)

    Thank you! I’m not good with anything other than British pronunciations 😅

  • @li_tsz_fung (27 days ago)

    If that's supposed to be a Chinese word, @nerdyrodent's pronunciation is closer. But the name makes no sense to me anyway.

  • @jairuskersey8311 (23 days ago)

    Asking for a friend, can Hunyuan do NSFW stuff?

  • @JoernR (23 days ago)

    One major drawback - UncomfyUI. :/

  • @taucalm (1 month ago)

    SD3 ain't released yet, so how can he compare something that doesn't even exist yet?

  • @NerdyRodent (1 month ago)

    I imagine they used the api 😉

  • @j5545 (1 month ago)

    It doesn't exist? Lol

  • @TamalPlays (1 month ago)

    sd3 api has been released

  • @taucalm (1 month ago)

    @j5545 "Stable Diffusion 3 (SD3) was announced by Stability AI as their most advanced text-to-image model to date. The early preview of SD3 was made available in February 2024, with the general release of a more accessible version, known as SD3 Medium, scheduled for June 12, 2024. This model aims to improve photorealism and prompt adherence, making it suitable for both consumer and business applications (Stability AI, THE DECODER, Decrypt)." Where I live it is now the 9th of June, so it doesn't exist for me yet. You must come from the future.

  • @denisquarte7177 (14 days ago)

    nope, sd3 has better architecture but safety training as well as license are dogsh*. And you know these scores can be gamed. No way in hell dalle3 beat midjourney. I use all of them on a daily basis.

  • @generalawareness101 (1 month ago)

    This thing is too bloated, and my Comfy is set up to work in bfloat while this is set up for float, so I get mat1 and mat2 errors. When I go to 32-bit this model is slow: my 6.5 it/s is now 2 s/it. Not very good quality either.
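The two speeds are quoted in opposite units (iterations per second versus seconds per iteration), so the size of the slowdown is easy to misread. A minimal sketch of the conversion, using only the two figures reported in the comment; `seconds_per_iteration` is a hypothetical helper name.

```python
def seconds_per_iteration(iterations_per_second: float) -> float:
    """Convert a speed in it/s into the equivalent s/it."""
    return 1.0 / iterations_per_second


before_its_per_s = 6.5  # reported speed with the bfloat setup, in it/s
after_s_per_it = 2.0    # reported speed in 32-bit float, in s/it

before_s_per_it = seconds_per_iteration(before_its_per_s)
slowdown = after_s_per_it * before_its_per_s  # ratio of the two s/it values

print(f"{before_s_per_it:.3f} s/it before")  # 0.154 s/it before
print(f"{slowdown:.0f}x slower")             # 13x slower
```

On those figures, each step takes about 13 times longer in 32-bit float than it did before.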

  • @SimosFunk (1 month ago)

  • @NerdyRodent (1 month ago)

    😃
