Using the old distill LoRA with LongCat-Avatar is a suboptimal way to use it, and they are training a new one specifically for Avatar. So right now the results aren't the best the model can do if you use the distill LoRA, and if you don't, it's super slow.
They also said they are working on a 24 fps version, because the current 32/16 fps setup has some lipsync issues. The model actually uses 32 fps as input and samples the audio with a stride of 2, so the output is 16 fps video.
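A quick sketch of that fps relationship, under the interpretation above (the 32 fps grid and the stride come from the note; the arithmetic is just illustrative):

```python
# Illustrative arithmetic only: audio features sit on a 32 fps grid and are
# sampled with a stride of 2, so each second of output video gets 16 frames.
audio_feature_fps = 32            # conditioning grid mentioned above
audio_stride = 2                  # audio sampling stride
video_fps = audio_feature_fps // audio_stride   # -> 16 fps output video

seconds = 5
print(video_fps)                      # 16
print(seconds * video_fps)            # 80 output video frames
print(seconds * audio_feature_fps)    # 160 audio features, 80 used after striding
```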
Following in the trail of InfiniteTalk, HuMo, and Wan S2V, we now have LongCat Avatar.
LongCat Avatar seems to be based on LongCat Video and able to use the same LoRAs, at least to an extent.
The purpose of LongCat Avatar appears remarkably similar to that of HuMo and InfiniteTalk. In fact, it appears that the same team that created MultiTalk/InfiniteTalk is behind this model.
As of 2025.12.23, Kijai has added code supporting LongCat Avatar to the main branch of his ComfyUI wrapper.
One of the relevant nodes:
![]()
`prev_latents` is where it continues from; the overlap amount of frames is taken from it. `ref_latent` is used to insert that latent into the sequence at the `ref_frame_index` position (in latent space), which is how it keeps the reference consistent across longer extensions.
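A hedged sketch of what those inputs do, with hypothetical names and shapes (this is the mechanism as described above, not the wrapper's actual code):

```python
import torch

def build_window_latents(noise, prev_latents, ref_latent, overlap, ref_frame_index):
    """Hypothetical sketch: assemble the latent sequence for one extension window.

    noise:        [B, C, T, H, W] fresh noise for the new window
    prev_latents: [B, C, T_prev, H, W] output latents of the previous window
    ref_latent:   [B, C, 1, H, W] encoded reference image
    """
    window = noise.clone()
    # continue from the previous window: reuse its last `overlap` latent frames
    window[:, :, :overlap] = prev_latents[:, :, -overlap:]
    # pin the reference latent at ref_frame_index so long extensions keep identity
    window[:, :, ref_frame_index : ref_frame_index + 1] = ref_latent
    return window
```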
Model and LoRAs: HF: [Kijai/LongCat-Video_comfy](https://huggingface.co/Kijai/LongCat-Video_comfy).
Reportedly, around 10 steps are necessary.
It feels like you get diminishing returns pretty early when increasing the step count.
The reference method always has the initial frame to use, so it still should not burn.
Using `audio_cfg` makes generation twice as slow; you can try without it, but usually the lipsync suffers a lot.
`audio_cfg` has a major impact on the lipsync, so that's the main thing to adjust (see the sketch below).
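One common formulation of why `audio_cfg` doubles the cost: each step runs a second forward pass without the audio condition and blends the two predictions. A sketch of that idea, not the wrapper's exact code:

```python
def audio_guided_pred(model, x, t, text_cond, audio_cond, audio_cfg=3.0):
    # Sketch of classifier-free guidance on the audio condition.
    pred_audio = model(x, t, text=text_cond, audio=audio_cond)
    if audio_cfg == 1.0:
        return pred_audio                 # single pass, no extra cost
    pred_no_audio = model(x, t, text=text_cond, audio=None)   # second pass
    return pred_no_audio + audio_cfg * (pred_audio - pred_no_audio)
```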
Reported result: 1 minute without degradation, using the old distill LoRA (at 0.8 strength), `audio_cfg` 3.0, text CFG 1.0, and 12 steps with the LongCat distill schedule.
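The same recipe written out as a settings sketch (key names are illustrative, not the wrapper's exact widget names):

```python
reported_recipe = {
    "distill_lora": "old LongCat distill LoRA",
    "distill_lora_strength": 0.8,
    "audio_cfg": 3.0,
    "text_cfg": 1.0,
    "steps": 12,
    "scheduler": "longcat distill schedule",
}
```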
You can do v2v already; the node below helps to slice the correct part of the input video for each window (a sketch of the idea follows after the image).
![]()
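A minimal sketch of the per-window slicing idea for v2v (hypothetical helper; the node above does this for you):

```python
def slice_window(frames, window_index, window_size, overlap):
    # Consecutive windows share `overlap` frames, so each new window starts
    # where the previous window's non-overlapping part ended.
    start = window_index * (window_size - overlap)
    return frames[start : start + window_size]

# e.g. window_size=81, overlap=16: window 0 -> frames 0..80, window 1 -> 65..145
```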
At the moment there is only a bf16 model for LongCat Avatar; for the normal LongCat Video there is an fp8 version. You can run the bf16 model at fp8, though (see the sketch below).
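Running the bf16 checkpoint at fp8 basically means casting the large weight matrices on load. A minimal torch sketch of the idea, not the wrapper's actual code:

```python
import torch

def cast_weights_to_fp8(state_dict):
    # Cast 2D+ weight tensors to float8; keep biases/norms in higher precision.
    out = {}
    for name, tensor in state_dict.items():
        if tensor.ndim >= 2 and name.endswith("weight"):
            out[name] = tensor.to(torch.float8_e4m3fn)
        else:
            out[name] = tensor
    return out
```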
Are you using the Chinese wav2vec? The Chinese model is the only one that works … because the model is trained with it.
wav2vec-chinese-base_fp16.safetensors (bf16)
Do not use sageattn 1.0.6; you need SageAttention 2.2.0 or SDPA.
YouTube tutorial