This section is incomplete.
Stef:
I have used Infinite a lot and there is no solution against this beside masking, which sucks because it WILL let traces unless your character’s head moves only slightly and there is no zooming/panning. Infinite is so good in v2v for lipsync but it changes details of the source vidéo, especially if the camera moves/zooms/pans. Mask when you cannot do otherwise, and play around with Grow Mask With Blur or how the node is called, sorry I just woke up. If you have several characters, don’t forget Infinite will go from left to right. And if only one character is speaking, you’ll have to feed the other with an empty audio track, though I got often issues if I just used a generated silence in Audacity. I solved the issues with free soundfiles of people breathing quietly which work wonder as ‘silence’, you can make them even quieter with Audacity. Also, if using several sound files they must not only have the same length but also the same amount of samples, you can check in Mediainfo how many samples an audio file has (check advanced parameters to display those). If the sample count doesn’t match, infamous kwarg error in comfy console. And another thing : If you mask, because you have no other option (see masking as last option because it sucks), make sure the audio file isn’t 1 ms longer than the video file or again kwarg error (because Infinite will look for masks which don’t exist). Last but not least, Wan2.1 loras work with infinite both in i2v and v2v mode, though they sometimes give funny unexpected results, especially in i2v mode when they fight with Infinite for hand movements (if your lora is a movement or pose lora e.g.).
in i2v Infinite is perfect of course, masking won’t let traces because there is no source video. But in v2v masking in Infinite is just meh unless I missed something
in v2v without masking, Infinite works very well if you accept the details it WILL change in your source video of course, you don’t even need to add masks, you just provide audio files from left character to right character audio1, audio2 etc but the changes can be dramatic, as said, especially when the camera moves
To my knowledge, there is no open source perfect v2v lipsync solution yet, Infinite is still the second rank optimum
It’s a quote of Pareto, means ‘best possible under current conditions’
unfortunately it will change other things than your character’s expressions, it is because Wan2.1 reencodes the video - if the camera moves it can hallucinate things which were not in your source video
InfiniteTalk support has been added to ComfyUI native. Sample wf: infinitetalk_multi_test_native
Infinite Talk workflows utilize regular Wan I2V safetensors file with regular speed lora of your choice.
They additionally use a “InfiniteTalk Model” such as Wan_2_1-InfiniteTalk-Multi_fp16.safetensors. It is loaded by WanVideo Long I2V Multi/InfiniteTalk which similar to VACE plugs into VanVideo Model Loader.
clip_vision_h and wav2vec .safetensors files are loaded as well.
Can do start image + audio to video as well as start video + audio to video. Can use up to 4 audio tracks and masks to show which character is associated with which file.
| Name In Code (sometimes easier to understand) | Name on Screen |
|---|---|
| MultiTalkModelLoader | Multi/InfiniteTalk Model Loader |
| MultiTalkWav2VecEmbeds | Multi/InfiniteTalk Wav2vec2 Embeds |
| WanVideoImageToVideoMultiTalk | WanVideo Long I2V Multi/InfiniteTalk |
| Wav2VecModelLoader | Wav2vec2 Model Loader |
| MultiTalkSilentEmbeds | MultiTalk Silent Embeds |
| Pre Embeds Node | Pre Embeds Inputs -> Output | Embeds Node | Input from Pre / Embeds Inputs -> Output | Model | WanVideo Sampler Input |
|---|---|---|---|---|---|
WanVideo ClipVision Encode |
image_1, image_2 -> image_embeds |
WanVideo Long I2V Multi/InfiniteTalk |
clip_embeds / start_image -> image_embeds |
Wan I2V family | image_embeds |
| - | - | Multi/InfiniteTalk Wav2vec2 Embeds |
wav2vec_model, audio_1, .. 4, ref_target_masks -> multitalk_embeds |
Wan I2V family | multitalk_embeds |
| Node | Setting | Meaning |
|---|---|---|
WanVideo Long I2V Multi/InfiniteTalk |
frame_window_size |
how many frames are processed at once, leave at default |
Multi/InfiniteTalk Wav2vec2 Embeds |
num_frames |
total number of frames in the video |
WanVideo Context Option is not compatible with Infinite Talk.
Q: is it possible to do v2v with infinitetalk and only inpaining the head? my videos get changed too much
A: yeah, you can use a mask