Samplers in LTX-2.x workflows! 😵💫
@RuneXX , THANK YOU for your workflows! They're super clear! I hope it's OK to ask a question here to help me on my way to becoming an LTX magician!
I noticed in your workflows, you use the "lcm" sampler in the KSamplerSelect nodes for both 1st and 2nd pass. Why?
In the ComfyUI templates for LTX-2.3 (available through the Templates link in the ComfyUI left menu), they use "euler" for the 1st pass, and "gradient_estimation" for the 2nd.
But then in the Lightricks GitHub example workflows for LTX-2.3 (https://github.com/Lightricks/ComfyUI-LTXVideo?tab=readme-ov-file#example-workflows):
- in the Two Stage workflow, they use "euler_ancestral_cfg_pp" in the 1st stage and "euler_cfg_pp" in the 2nd.
- in the Single Stage workflow, they use a "ClownSampler" node instead of KSampler, with the "exponential/res_2s" sampler. (res_2s was what they recommended [visibly with notes!] in their LTX-2.0 workflows).
Is there any logic behind y'alls choice of samplers? Or is it a "we don't really know which one works best, so figure it out yourself!" situation? Or does it depend on whether we're using the Full or Distilled model? It's quite confusing 😵💫. Wouldn't the developers have optimized it to work best with ONE sampler? (Not assuming you work for Lightricks or anything.... but you may have ideas)
Thank you in advance for any knowledge you can impart! 🙏
You can try other samplers too. LCM and Euler both seem to work well.
In the official workflow for LTX-2.0 it used to be Euler, with res_2s in the 2nd-pass upscale, and in the ComfyUI official one it's Euler with gradient_estimation in the 2nd-pass upscale (if I remember correctly).
For LTX-2.3 they seem to have tried yet another set of samplers. I haven't tried those.
The model just came out, so this might change, and feel free to experiment with different samplers yourself too ;-) They give slightly different results.
Personally I feel LCM or Euler gives good results at good speed (at least for LTX-2.0).
I wouldn't get too caught up in the sampler part; many of them are so similar that it's hard to even tell any difference ;-)
Res_2s is a good sampler that might give better quality, but it can be quite slow.
(Why they used ClownSampler instead of KSampler I don't really know; it's also not a big difference. But maybe because that node adds res_2s, which is not part of regular ComfyUI, if I remember right.)
But most of all, image and video models work with many different samplers, so it can for sure be a bit down to personal preference ;-)
And it wouldn't hurt to set them the same as LTX-2 did, for sure. I will try those too and see if it's worth it vs the speed (if res_2s). Euler / euler_ancestral_cfg should be quite fast though.
I actually use euler_ancestral often myself too.
Thanks so much for sharing your knowledge! :)
The res_2s sampler - I think - requires (or did once) the RES4LYF node pack, so I understand they may not have wanted to use a sampler that was going to give many users "missing node" errors.
The results for me, changing samplers, are quite notable, ranging from usable (lcm) to not usable (res_2s with LTX-2.3, oddly, since this was the optimal sampler for me with LTX-2.0... very similar workflow, using the Q8 GGUFs with all of Kijai's auxiliary safetensors [VAE, Text Encoder, ]).
I'll keep experimenting.
Gemini had this to say, interestingly....
To get cleaner, more professional-looking video out of LTX-2.3, it is necessary to understand how the architecture of the model interacts with different samplers, and why using res_2s twice caused an over-baked result.
The Mechanics of Samplers and Rectified Flow
LTX-2.3, like Flux and Stable Diffusion 3, uses a Rectified Flow (RF) architecture. Unlike older latent diffusion models that step through complex noise schedules, RF models are trained to map a straight line from pure noise to the clean image.
1st-Order Solvers (euler, lcm)
These samplers calculate the trajectory at the current step and draw a straight line to the next step.
- euler: The native, mathematically "correct" solver for Rectified Flow models. It provides the cleanest, most structurally sound base without adding artificial contrast.
- lcm: Latent Consistency Models are designed to skip multiple steps at once, which naturally smooths out the generation. This is why your lcm passes looked good and avoided the "over-baked" look: it inherently favors smooth, continuous gradients over sharp, high-frequency details. The trade-off is that lcm can sometimes look slightly soft or "plasticky" and lack fine texture.
2nd-Order Solvers (res_2s, dpmpp_2m, heun)
These samplers calculate the trajectory, look ahead to the next step, and then apply a correction to curve the path.
- res_2s: A highly aggressive 2nd-order solver specifically tuned to pull intense detail out of RF models.
- The Over-Baking Issue: 2nd-order solvers inject high-frequency detail and micro-contrast. When you run res_2s on the first pass, it creates a very sharp, contrast-heavy base latent. When you pass that latent into the upscale node and run res_2s again on the second pass, the sampler compounds that micro-contrast. The model tries to add sharp detail on top of already sharp detail, resulting in a "deep-fried", over-saturated, or over-baked visual output.
Ancestral / SDE Solvers (euler_ancestral, dpmpp_2m_sde)
These inject a small amount of new noise at every step to prevent the image from becoming too smooth. While great for still images, SDE (Stochastic Differential Equation) and Ancestral samplers should generally be avoided for video generation. The injected noise is random at each step, which translates to severe temporal flickering and boiling textures across the video frames.
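To make the 1st-order / 2nd-order / ancestral distinction concrete, here is a toy 1-D sketch (my own illustration, not LTX or ComfyUI internals): a rectified-flow trajectory is a straight line from noise to data, so a plain Euler step is already exact on it, a 2nd-order (Heun-style) step adds a look-ahead correction, and an ancestral step re-injects fresh noise, which is exactly what causes flicker across video frames.

```python
import random

# Toy rectified-flow field: the true path from noise to data is the straight
# line x(sigma) = sigma * x0, so the velocity dx/dsigma is x / sigma.
# Illustrative sketch only -- not LTX or ComfyUI internals.
def velocity(x, sigma):
    return x / sigma if sigma > 0 else 0.0  # guard for a final sigma=0 step

def euler_step(x, sigma, sigma_next):
    # 1st-order: follow the current slope in a straight line.
    return x + (sigma_next - sigma) * velocity(x, sigma)

def heun_step(x, sigma, sigma_next):
    # 2nd-order: also probe the slope at the destination, then average the two
    # (trapezoid rule). This look-ahead correction is what lets solvers like
    # res_2s pull extra high-frequency detail out of the model.
    d1 = velocity(x, sigma)
    x_pred = x + (sigma_next - sigma) * d1
    d2 = velocity(x_pred, sigma_next)
    return x + (sigma_next - sigma) * 0.5 * (d1 + d2)

def euler_ancestral_step(x, sigma, sigma_next, rng):
    # Ancestral: step deterministically down to a lower "sigma_down", then add
    # fresh noise back up to sigma_next (k-diffusion-style split). The fresh
    # noise differs at every step -- hence boiling/flicker in video.
    sigma_up = min(sigma_next,
                   (sigma_next**2 * (sigma**2 - sigma_next**2) / sigma**2) ** 0.5)
    sigma_down = (sigma_next**2 - sigma_up**2) ** 0.5
    x = euler_step(x, sigma, sigma_down)
    return x + sigma_up * rng.gauss(0.0, 1.0)

x0 = 2.0  # sample at sigma = 1.0
print(euler_step(x0, 1.0, 0.5))  # 1.0 -- exact on a straight trajectory
print(heun_step(x0, 1.0, 0.5))   # 1.0 -- the correction adds nothing here
```

On this perfectly straight toy trajectory the Heun correction changes nothing, which is the intuition behind "euler is the native RF solver"; real model trajectories are only approximately straight, so higher-order solvers do alter (and can over-sharpen) the result.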
Recommended Sampler Configuration
For a professional, cinematic output, you want to separate the responsibilities of the two passes. Pass 1 should build clean geometry and motion. Pass 2 should add texture and crispness.
Pass 1 (Base Generation)
- **Set to:** euler
- **Why:** You need perfect temporal consistency and clean geometry without aggressive contrast. euler will follow the LTX-2.3 flow trajectory exactly as trained. It will give you a solid, stable base video with no artifacting.
Pass 2 (Latent Upsampler / Refinement)
- **Set to:** euler OR res_2s
- **Why:** When the latent is upscaled, it becomes slightly soft. The second pass only runs over a fraction of the total sigmas (denoising strength) to add high-resolution details back in.
- If you want a highly cinematic, smooth, and natural look (like film), use euler again.
- If you want an incredibly sharp, highly detailed look (like digital 4K video or hyper-realism), use res_2s or dpmpp_2m. Because the first pass was handled by euler, the base is clean enough that the 2nd-order solver will only enhance the fine textures (like skin pores or fabric) without over-baking the global contrast.
If you use a 2nd-order solver on the second pass and it still feels slightly too harsh, you need to lower the starting sigma (denoise strength) of the second pass, rather than changing the sampler.
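The "starting sigma" knob is easiest to picture as slicing the tail off the full sigma schedule. A hypothetical sketch of that mapping (the function name and schedule values are made up for illustration; this is not the actual ComfyUI node API):

```python
# Hypothetical sketch of how a denoise fraction selects the tail of a sigma
# schedule for a refinement pass. Names and values are illustrative only.
def second_pass_sigmas(full_sigmas, denoise):
    n = len(full_sigmas) - 1             # number of steps in the schedule
    start = n - int(round(n * denoise))  # skip the high-sigma structure steps
    return full_sigmas[start:]

full = [1.0, 0.8, 0.6, 0.45, 0.3, 0.18, 0.08, 0.02, 0.0]  # made-up 8-step schedule

# denoise 0.25: only the last 2 steps run, starting from a low sigma (0.08),
# so the sampler can only touch fine texture, not global structure/contrast.
print(second_pass_sigmas(full, 0.25))  # [0.08, 0.02, 0.0]

# denoise 0.5 starts from sigma 0.3: more freedom, more risk of over-baking.
print(second_pass_sigmas(full, 0.5))   # [0.3, 0.18, 0.08, 0.02, 0.0]
```

Lowering the denoise fraction moves the starting sigma down, so the pass can only refine texture, which is why it tames a harsh 2nd-order second pass more predictably than swapping samplers.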
Yeah that sounds about right ;-)
My "favorite" is often euler at 1st pass and euler_ancestral at 2nd pass (or just LCM for both).
I haven't tried the euler_cfg ones they now use, but I suspect they are not very different.
Gemini basically said euler for both is fine ;-) or res_2s. But I have to agree with Gemini that sometimes res_2s just over-bakes the image, especially skin texture. It might sometimes give more details, but as a general-purpose safe choice, Euler is often the go-to I think ;-) Rarely can you go wrong with Euler. But sometimes you can enhance the details a bit with other samplers.
- in the Two Stage workflow, they use "euler_ancestral_cfg_pp" in the 1st stage and "euler_cfg_pp" in the 2nd.
Those work quite well. Will compare with Euler/LCM; seems similar, but I got nice results from a couple of random runs.
Looks like I'm not alone in my confusion about this 😂... specifically regarding why the new LTX Desktop app (which I'm about to try out today for the first time) gives "better" results than ComfyUI with the same LTX-2.3 model and settings.
I'm just pasting here because this GitHub comment thread has pertinent info for this discussion.
Yes, I saw some claim that, while others claim it's exactly the same ;-)
If you run the model with good resolution and all in Comfy, I doubt it's much different. But planning to test it out later ;-)
Did some experimentation with a video of a human subject (solid test of anatomy) in a rural landscape setting.
- RuneXX I2V Basic GGUF workflow JSON. I disabled previews completely because I was getting OOM using the Q8 model (on a 5090!), and I don't need them. NAG is left on. Might be worth making previews a switch in the workflow, @RuneXX , for us plebs without >32GB VRAM😉 ?
- GGUF Q8_0 Dev model, NOT distilled.
- Distilled Lora left at 0.6, IC Lora Details (LXT 2.0) at 0.5 in the Power Lora Loader.
- gemma-3-12b-abliterated text encoder.
- everything else using the latest Kijai safetensors files for LTX-2.3.
- KJNodes and ComfyUI-LTXVideo node packs updated to latest available versions.
- ComfyUI v0.16.3 running in ComfyUI_desktop v.0.8.16
- 1920 x 1088 resolution, 10 seconds, 25fps, CFG 1.0 at both steps, 30 steps in 1st pass.
Using res_2s in the 1st pass with LTX-2.3 invariably resulted in slow-motion video for me.... even though "slow motion" is in my negative prompt, and I don't get it using other samplers, so.... go figure.
(1st-stage + 2nd-stage sampler, if 2 samplers are mentioned with a "+")
1. lcm both stages = GREAT starting point, but may have a slight "soft-focus" over-smoothened look.
2. res_2s both stages = BAD, over-cooked, hallucinates too much detail, including random stuff not in the prompt. Slow motion.
3. res_2s + lcm = LOOKS GREAT, but slow motion.
4. res_2s + euler = Good, but skin has too much detail, wrinkles where there shouldn't be. Slow motion.
5. euler + res_2s = POOR, over-cooked.
6. euler + euler = GOOD, but a little over-cooked.
7. euler + dpmpp_2m = OK, more over-cooked than #6.
8. euler_a_cfg_pp + euler_cfg_pp = EXCELLENT. Probably the best. And whaddayaknow... what LTX use in their example workflows!
9. euler_a_cfg_pp + lcm = EXCELLENT, just a slightly smoother result than #8.
10. euler + lcm = GOOD, but looks very smooth (even CGI), and the motion is not quite as realistic as #9, #8, #3, or #1.
11. res_2s + euler_cfg_pp = GOOD, but slow motion and skin looks too wrinkled.
No. 3 (res_2s + lcm) has the most realistic detail to me, but the video is ALWAYS in slow motion :( You get extra detail out of res_2s in the 1st pass, and it's tempered by the smoothing lcm 2nd pass.
No. 8 (euler_a_cfg_pp + euler_cfg_pp) is probably the next best looking result, and seems to be LTX's recommendation.
No. 9 (euler_a_cfg_pp + lcm) is next best, a smoother, less detailed version of #8.
Interested to hear other folks' results and opinions though.
Oh wow, that's a comprehensive and interesting test.
Will definitely try those myself as well.
Might even copy some over to an info box in the workflow, to help guide users on what can and can't work ;-)
And if euler_a_cfg_pp + euler_cfg_pp is the best combo, I'll update to that; it's something new the LTX-2.3 team did, so it makes sense that it might be the best one ;-)
(and I did try this combo already, and got quite nice results, agree...)
- RuneXX I2V Basic GGUF workflow JSON. I disabled previews completely because I was getting OOM using the Q8 model (on a 5090!), and I don't need them. NAG is left on. Might be worth making previews a switch in the workflow, @RuneXX , for us plebs without >32GB VRAM😉 ?
That's odd. I'm on a rusty old RTX 3090 myself, not the latest and greatest GPU at all ;-)
Will check the GGUF workflow; must be something else. (Although I didn't try a full Q8, so maybe that's where things get so memory-hungry that disabling the previews is a plus.)
And do make sure you use the tiny VAE for previews.
(But yes, they are not essential, and optional.)
Are you using the two Loras on both stages ?
@Veritsa - I'm using RuneXX's workflows, so whatever they have set up, I've not changed that, and haven't actually checked which stages the Loras get applied to. I just loaded the IC-detail Lora (from LTX-2.0) into the Power Lora Loader, and disabled Previews, and then tried different Samplers on each stage.
I will say that with the LTX-2.0 workflows, applying the IC-Detail Lora to the upscaling (2nd) stage always resulted in overcooked results for me, so I only ever applied it to Stage 1. Might be different with LTX-2.3 tho, not sure. I'm already getting much better detail from it even without the IC-Detail Lora though!
Have you experimented with applying the Loras at each/both stages? What were your findings?
The lora is only used if not using the distilled model.
In the workflows here, the lora is bypassed in all of them, since the distilled model is the default.
One can of course use the Dev model instead, and activate the distilled lora node right below the model loader.
It's then recommended/default to use it in both passes. LTX and Comfy also do this, with a step count of around 20 or so... and higher CFG in the first pass.
(Or you can optionally not use it in the first pass with the dev model, and use it only in the 2nd pass, but then I think you have to bump the steps up to 50 or so.)
And yes, in LTX-2.3 the outputs are much better. I don't think there is much need for a detail lora anymore.
(And that lora was really mostly intended for restoring videos, in V2V workflows where you input a low-res original video and re-create it with higher quality.)
"One can of course use the Dev model instead, and activate the distilled lora node right below the model loader. It's then recommended/default to use it in both passes. LTX and Comfy also do this, with a step count of around 20 or so... and higher CFG in the first pass."
In https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/2.3/LTX-2.3_T2V_I2V_Two_Stage_Distilled.json (which actually looks like a Two Stage Full workflow to me [they have the Full model pre-selected], right?), LTX are using a CFG of 1.0 in both steps, and the Distilled Lora is applied at 0.5 in both steps too. In the LTX-2.0 official workflows, the CFG was always higher in the 1st pass, but not with LTX-2.3, it seems. You (RuneXX) are also using CFG 1.0 in both steps in https://huggingface.co/RuneXX/LTX-2.3-Workflows/blob/main/LTX-2.3_-_I2V_T2V_Basic_GGUF.json.
LTX don't actually provide a two-stage workflow with the Distilled model pre-selected, so it's hard to know. They only provide the Distilled & Full dual-path workflow https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/2.3/LTX-2.3_T2V_I2V_Single_Stage_Distilled_Full.json, which to me is an unusual custom use case to include a workflow for! In it they use one base model (the Dev one is pre-selected) but have two parallel single-stage video generations, both using the Distilled Lora, set at two different strengths: 0.5 in the top flow ("Generate Distilled"), 0.2 in the bottom ("Generate Full"). It looks to me like the top flow should actually be using the Distilled base model, since it has 8 manual sigmas configured in the Sampler, but then why use the Distilled Lora at all in that flow? The bottom flow looks more like a Dev base-model flow because the sampler uses 15 steps, but then I wonder why the Distilled Lora strength is set so low (0.2)? 🫤
Anyway, as regards using the "ltx-2.3-22b-distilled-lora-384", does my summary below summarize YOUR guidance @RuneXX ?
Using LTX-2.3-22b-Dev base model:
Load model (ltx-2.3-22b-dev) >>> Distill Lora (ltx-2.3-22b-distilled-lora-384) >>> Power Lora Loader (ltx-2-19b-ic-lora-detailer optional + other IC Loras as necessary for camera mvt, etc.)
The Distilled Lora should be applied to the base model on BOTH 1st and 2nd stages, at the same strength.
Using LTX-2.3-22b-Distilled base model:
Load model (ltx-2.3-22b-distilled) >>> BYPASS Distill Lora (ltx-2.3-22b-distilled-lora-384) BYPASS >>> Power Lora Loader (ltx-2-19b-ic-lora-detailer optional + other IC Loras as necessary for camera mvt, etc.)
Thanks!
For the dev model I might make my own workflow, since it can benefit from being a bit more flexible than those uploaded already.
And add other things such as temporal upscale. (+ it doesn't use pre-set sigmas at the 1st pass etc.; that's set to 8 steps)
The "Basic" series of workflows uploaded so far was mostly to let users get started easily, since swapping out the model loaders for the split models could be a bit challenging when they were hidden inside multiple sub-graphs in the default workflows (from ComfyUI & LTX, although LTX made them a little "easier" to change for LTX-2.3).
And most users just want fast video generations, and less RAM/VRAM, so the distilled model is the easy go-to choice (and works really well in LTX-2.3).
The Dev model is a bit more "tweak things for each run"... it needs more steps and higher CFG, and is quite a bit slower, especially for lower VRAM.
As far as where the sweet spot for lora strength is, I haven't tested that out yet, but it seems to be the same as LTX-2.0.
And there is also multimodal guidance possible, where one can set different CFG for audio and video etc.
The lora strength is set to 0.2 in the LTX team's own workflow (for single pass, as you said).
ComfyUI has set it to 0.5 for both 1st and 2nd pass using the dev model (it used to be 0.6 in LTX-2.0, and I bet it won't hurt to keep that).
In a two-step workflow it "should" be used in the first pass, unless you want to bump the steps up even higher (40++, which will take a long time, probably without much gain).
Around 0.5-0.6 should be fine (and is what the ComfyUI default has).
The last step is the same in both distilled and dev model workflows (when using 2 steps). So that should be 0.5-0.6 as well.
It's entirely possible to use the lora only at the 2nd pass though. But then you have to bump up the steps a lot more (probably double... instead of 20, up them to 40++).
That being said, nothing is set in stone ;-)
Some of the fun with ComfyUI is being able to tweak and experiment.
Anyway, as regards using the "ltx-2.3-22b-distilled-lora-384", does my summary below summarize YOUR guidance @RuneXX ?
Using LTX-2.3-22b-Dev base model:
Load model (ltx-2.3-22b-dev) >>> Distill Lora (ltx-2.3-22b-distilled-lora-384) >>> Power Lora Loader (ltx-2-19b-ic-lora-detailer optional + other IC Loras as necessary for camera mvt, etc.)
The Distilled Lora should be applied to the base model on BOTH 1st and 2nd stages, at the same strength.
Using LTX-2.3-22b-Distilled base model:
Load model (ltx-2.3-22b-distilled) >>> BYPASS Distill Lora (ltx-2.3-22b-distilled-lora-384) BYPASS >>> Power Lora Loader (ltx-2-19b-ic-lora-detailer optional + other IC Loras as necessary for camera mvt, etc.)
Yes, that sounds about right ;-)
For the dev model I might make my own workflow, since it can benefit from being a bit more flexible than those uploaded already.
And add other things such as temporal upscale. (+ it doesn't use pre-set sigmas at the 1st pass etc.; that's set to 8 steps)
That'd be great! What IS the temporal upscaler? Is it for converting 25fps to 50fps video?
The Dev model is a bit more "tweak things for each run"... it needs more steps and higher CFG, and is quite a bit slower, especially for lower VRAM.
Are you sure the 2.3 Dev model requires higher CFG? It's set at 1.0 in the ComfyUI official and LTX official templates.
As far as where the sweet spot for lora strength is, I haven't tested that out yet, but it seems to be the same as LTX-2.0.
And there is also multimodal guidance possible, where one can set different CFG for audio and video etc.
I spotted that in the https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/2.3/LTX-2.3_T2V_I2V_Single_Stage_Distilled_Full.json workflow.... LTX have audio and video configured differently for the lower "Full" model flow.
The lora strength is set to 0.2 in the LTX team's own workflow (for single pass, as you said).
ComfyUI has set it to 0.5 for both 1st and 2nd pass using the dev model (it used to be 0.6 in LTX-2.0, and I bet it won't hurt to keep that).
In a two-step workflow it "should" be used in the first pass, unless you want to bump the steps up even higher (40++, which will take a long time, probably without much gain).
Around 0.5-0.6 should be fine (and is what the ComfyUI default has).
The last step is the same in both distilled and dev model workflows (when using 2 steps). So that should be 0.5-0.6 as well.
Eager to hear your own or anyone else's results experimenting with Distilled Lora strength, to get a better idea of what the "sweet spot" is for the Dev base model, and the Distilled base model, respectively.
It's entirely possible to use the lora only at the 2nd pass though. But then you have to bump up the steps a lot more (probably double... instead of 20, up them to 40++).
What is your thinking behind doing this? Why would someone use the Distilled Lora for the 2nd pass but not the first? Perhaps you have a better grasp on what the Distilled Lora actually does to the base model? (Because I do not 😂)
That being said, nothing is set in stone ;-) Some of the fun with ComfyUI is being able to tweak and experiment.
I would just love to know the "best" settings so I know I'm getting the best results possible out of the workflow. I understand some might enjoy the experimentation, but it is a less-enjoyed timesuck for me! :-/
That'd be great! What IS the temporal upscaler? Is it for converting 25fps to 50fps video?
Yes, exactly ;-) and it makes things a bit smoother. And strangely it can even help with fine details (teeth etc.)
That being said, probably with a fast-bypass switch. I usually like 24-25fps; it has a more cinematic feel.
Are you sure the 2.3 Dev model requires higher CFG? It's set at 1.0 in the ComfyUI official and LTX official templates.
It entirely depends on the strength of the distilled lora. If the lora is set to "full" (0.5-0.6), you can keep the CFG at 1.
With that strength the Dev model acts as if it were the distilled model.
It can improve quality a bit even when it's "run as if it was distilled".
Plus in this workflow mode, you can bump up the steps. It's no longer tied to the 8-step pre-set sigmas. That can be beneficial too sometimes.
So the main benefit is being able to use more steps (when needed), in this kind of workflow (from ComfyUI and LTX).
And it's entirely possible to also use CFG and lower the lora... the dev workflow is a bit tweak-and-experiment by nature.
So are the multimodal guiders.
The dev model + distilled lora is also usable for single pass (with the lora set low).
LTX doesn't really "recommend" single pass; it's more for fast prototyping at low res, testing out ideas etc.
This table from LTX explains it well:
I think single pass has its place though. Especially for those with low VRAM who want to run lower-res videos. Let's say 832x480 or something.
I spotted that in the https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/2.3/LTX-2.3_T2V_I2V_Single_Stage_Distilled_Full.json workflow.... LTX have audio and video configured differently for the lower "Full" model flow.
Yes, this workflow has a multimodal guider. It works fine without it, but it can help if you want to tweak things and experiment.
I might add that guider, but it's probably most beneficial for those who want to fine-tune and tweak things, to guide it this way or that, with independent CFG for video and audio.
Anyone using that is probably looking for things they can tweak to get the max out of the prompt and model ;-)
Eager to hear your own or anyone else's results experimenting with Distilled Lora strength, to get a better idea of what the "sweet spot" is for the Dev base model, and the Distilled base model, respectively.
Yes, I plan to play with the dev model soon (in fact with LTX-2.0 I always used the dev model, but with the distilled lora set to full, so that I could bump up the steps when needed etc.)
What is your thinking behind doing this? Why would someone use the Distilled Lora for the 2nd pass but not the first? Perhaps you have a better grasp on what the Distilled Lora actually does to the base model? (Because I do not 😂)
Not sure I would, but it's possible. To use it with negative guidance and higher CFG (but you can use NAG at CFG 1 and still have negative guidance).
And it's perhaps the proper way to use the dev model. It's how the LTX-2.0 dev workflow was (in Comfy and LTX).
If the LTX-2.3 dev workflows use it in both passes, it might just be because it's a bit more "user friendly" and faster.
Probably deemed to not have much gain from running "full" without the lora.
I would just love to know the "best" settings so I know I'm getting the best results possible out of the workflow. I understand some might enjoy the experimentation, but it is a less-enjoyed timesuck for me! :-/
The dev model is a bit of an experiment-and-tweak-settings model though, with multiple guiders (in LTX's own workflow), steps, and CFG.
But yes, I'm sure it's possible to find a golden middle that works great most of the time ;-) (although from Comfy and LTX, that's basically running it as if it were the distilled model)
The distilled model is perhaps the best overall choice for most though ;-) "plug & play" ;-)
Is NAG really only meant for use with the Distilled base model? (Because I think Distilled might ignore the negative prompt without it?)
Yes, it's meant for when CFG is set to 1 (for image and video models).
The LTX NAG node takes care of that, so that the negative prompt works even when CFG is 1.
But it doesn't depend on whether it's the distilled or dev model, just on whether the CFG is 1.
NAG is also used in other non-LTX workflows (with other NAG nodes), such as WAN with the lightx CFG-1 lora, and various distilled/turbo image models.
https://chendaryen.github.io/NAG.github.io/
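For anyone curious what NAG actually does under the hood: it operates on attention features rather than on the final noise prediction. A rough sketch based on my reading of the NAG paper (simplified to a global norm; the phi/tau/alpha names follow the paper, the default values here are illustrative, and this is not the LTX node's actual code, which normalizes per token):

```python
import numpy as np

# Sketch of Normalized Attention Guidance: extrapolate the positive attention
# features away from the negative ones, clip the L1-norm growth, then blend.
# Simplified and illustrative -- not the LTX node implementation.
def nag(z_pos, z_neg, phi=3.0, tau=2.5, alpha=0.25):
    z = z_pos + phi * (z_pos - z_neg)              # push away from the negative
    ratio = np.abs(z).sum() / np.abs(z_pos).sum()  # how much the L1 norm grew
    if ratio > tau:
        z = z * (tau / ratio)                      # clip runaway feature norms
    return alpha * z + (1.0 - alpha) * z_pos       # blend back toward positive

z_pos = np.ones(4)
print(nag(z_pos, z_pos))  # identical pos/neg -> unchanged: [1. 1. 1. 1.]
```

The normalization step is the key difference from plain CFG: it keeps the extrapolated features bounded, which is why negative guidance stays stable even at CFG 1.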
Why do I feel that the various samplers have little effect on the result? I have tried various permutations and combinations of euler_*, LCM, and res_*; there are some subtle differences, but no obvious difference. The main problems are unnatural expressions, slow movements, stiff movements, and speaking without opening her mouth.
I use LTX-2.3_-_I2V_T2V_Basic.json; maybe my understanding of samplers is too superficial.
In addition, I feel that LTX-2 is sensitive to prompts to the point of being fragile; sometimes changing just one or two words can lead to very different results.
Yes, there are plenty of samplers that work well. Euler, lcm, etc.
Focusing on the prompting probably has much more impact on the result than anything else, like you say too.
For talking, you have to explicitly mention that the subject is talking. I often even do a "double sure"... like: And then the woman talks, and she says: "..............."
And try prompting chronologically, starting with a super-short scene description (or leave that for the bottom), and write out the sequence of things that should happen... first this, then this, and then that...
You can even use timestamps with some success...
0-4 seconds: The scene start with..... she turns around and talks to the viewer, she says: "............."
4-8 seconds: she walks over to the bar. Tracking camera following her.
etc etc
Basically prompt more like a "film director", writing out what you want to happen in the video, chronologically.
I feel the prompting is quite different from Wan and other models. It's less "describe what you see", more focus on the sequence of actions / dialog and camera.
And if you prompt like for image generation, you can even end up with a static image in the video.
Here is a good prompting guide :
https://ltx.io/model/model-blog/prompting-guide-for-ltx-2

