Struggling with v50

#100 opened by jire94ur

Thank you so much for all of your hard work! I'm super excited to finally see the model reach v50. Each version has steadily improved my outputs, but across the board my v50 tests have been worse than v48: results no longer follow the prompt as well, are blurrier, and the model struggles to recognize both real people and fictional characters.

For some context, I stick to Q8 quants due to my relatively low VRAM, but otherwise keep the model at 30 steps with a CFG of 4, using euler + beta. Here's an example from a pretty simple prompt with the same seed.
(Positive: A candid photograph. Subject is Goodra the Pokemon. Its body is wet. It is at the beach. Warm midday exterior light bathes the scene. A candid photograph taken in the spur of the moment.)
(Negative: Cartoon. Drawing. Painting. Illustration. Anime. Digital Art.)

v481a.jpg
v502a.png

No matter what I tweak, v50 can't properly recognize the character in the prompt and spits out a best guess instead. Sometimes it gets the colors right, but it flubs other details in the process and struggles to produce a non-static pose. I haven't tested v49 yet, so I'm not sure whether it shows the same degradation. I'd be interested to hear whether anybody else is having similar issues.
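For anyone trying to reproduce this, my settings boil down to roughly the ComfyUI API-format fragment below. This is a minimal sketch: the node IDs and wiring are illustrative, UnetLoaderGGUF comes from the ComfyUI-GGUF custom node pack, and the filename is a placeholder for whichever Q8 quant you have.

```python
# Rough ComfyUI API-format fragment for the settings above.
# Node IDs, wiring, and the GGUF filename are illustrative placeholders.
workflow = {
    "1": {
        "class_type": "UnetLoaderGGUF",  # GGUF unet loader from ComfyUI-GGUF
        "inputs": {"unet_name": "chroma-v50-Q8_0.gguf"},  # placeholder Q8 quant
    },
    "2": {
        "class_type": "KSampler",
        "inputs": {
            "model": ["1", 0],
            "positive": ["3", 0],      # CLIPTextEncode, positive prompt (not shown)
            "negative": ["4", 0],      # CLIPTextEncode, negative prompt (not shown)
            "latent_image": ["5", 0],  # EmptyLatentImage (not shown)
            "seed": 123456789,         # same fixed seed for the v48/v50 comparison
            "steps": 30,
            "cfg": 4.0,
            "sampler_name": "euler",
            "scheduler": "beta",
            "denoise": 1.0,
        },
    },
}
```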

Your prompt works fine for me at CFG 4.0 and 30 steps, res_2m + bong tangent, min padding 1. I'm using fp16, though.

I'm looking for clarity as well. v48 seems to outperform v50 in all my tests.

> Your prompt works fine for me at CFG 4.0 and 30 steps, res_2m + bong tangent, min padding 1. I'm using fp16, though.

I actually gave fp16 a shot, but unfortunately did not have any luck with it either.
1b.jpg
I'm curious to know what the culprit could be here. Maybe the text encoder?

EDIT: Also just ran this seed through v49, and got Pikachu instead. It looks like whatever is going on isn't isolated to just v50.
2b.jpg

This might be a dumb comment, but... why would it know what a specific Pokémon is if it wasn't using a LoRA for it? Especially without tags? Like, I can't imagine they trained this on a lot of specific Pokémon. Or was this just something that Schnell knew?

> I actually gave fp16 a shot, but unfortunately did not have any luck with it either. I'm curious to know what the culprit could be here. Maybe the text encoder?
>
> EDIT: Also just ran this seed through v49, and got Pikachu instead. It looks like whatever is going on isn't isolated to just v50.

ComfyUI_00559_.png

What does your workflow look like? I'm generating at 832 x 1216: the standard workflow with the T5 options node setting min padding to 1, plus the res/bong sampler/scheduler. Drag my pic into Comfy to load the graph. Chroma HD is the same file as v50, so you can swap that.
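Concretely, the only things I change from the stock graph are roughly these. A sketch, not exact node names: res_2m and bong_tangent ship with the RES4LYF custom sampler pack, and the min-padding value is set on the T5 tokenizer options node, which may be named differently across ComfyUI versions.

```python
# Deltas from the stock Chroma workflow; parameter names approximate.
overrides = {
    "width": 832,
    "height": 1216,
    "sampler_name": "res_2m",     # sampler from the RES4LYF custom node pack
    "scheduler": "bong_tangent",  # scheduler, likewise from RES4LYF
    "t5_min_padding": 1,          # T5 tokenizer option: min padding = 1
}
```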

> This might be a dumb comment, but... why would it know what a specific Pokémon is if it wasn't using a LoRA for it? Especially without tags? Like, I can't imagine they trained this on a lot of specific Pokémon. Or was this just something that Schnell knew?

I'm guessing Chroma's dataset just had a lot of Goodra; it's a popular one for artists to draw. By contrast, I haven't been able to get Chroma to generate Onix.

> What does your workflow look like? I'm generating at 832 x 1216: the standard workflow with the T5 options node setting min padding to 1, plus the res/bong sampler/scheduler. Drag my pic into Comfy to load the graph. Chroma HD is the same file as v50, so you can swap that.

Thank you for the example, that looks much better! I'm not really sure what I'm doing wrong in comparison. I've just been using the stock workflow for my v50 testing. I generally render at 768x768 and keep it on euler + beta, since bong tangent is significantly slower on my system. Could it be the genr_t5_xxl I'm using?

@jire94ur I have not done any testing myself so far, so I am just spouting this randomly, but have you tried the v50-annealed version? Would it help?

From what I've heard on Reddit, the annealed version is the one meant for image generation, while the non-annealed version is for finetuning.

> @jire94ur I have not done any testing myself so far, so I am just spouting this randomly, but have you tried the v50-annealed version? Would it help?
>
> From what I've heard on Reddit, the annealed version is the one meant for image generation, while the non-annealed version is for finetuning.

I did, and didn't see any observable difference. Appreciate the idea though.

Did you try another T5? I can't think of what else it could be. I've never heard of genr_t5 until now.

> Did you try another T5? I can't think of what else it could be. I've never heard of genr_t5 until now.

Thanks for the reply! I was experimenting with the standard T5 earlier today, but I just can't get an output matching yours with v50. This banana was the most on-model result I got.
c1.jpg
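For reference, the swap itself was just pointing the text-encoder loader at a different file, roughly like this. A sketch only: the "chroma" CLIP type is what recent ComfyUI builds use for this model as far as I know, and both filenames are placeholders for whatever sits in models/text_encoders.

```python
# Text-encoder swap in ComfyUI API format; filenames are placeholders.
clip_loader = {
    "class_type": "CLIPLoader",
    "inputs": {
        "clip_name": "t5xxl_fp16.safetensors",  # standard T5, was the genr_t5_xxl variant
        "type": "chroma",  # CLIP type used for Chroma in recent builds (assumption)
    },
}
```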
