22-second renders on consumer hardware, all open source, make me happy. But rendering CATS in 22 seconds? Woah. I clearly didn't put much time into prompt creation. You can do it, though; send me a flip book of your results!
Details: a consumer 4090 running ComfyUI, WAN 2.1 14B VACE distilled, q5_1 quant (!), q6_K quant for the text encoder. A LoRA goes a long way in this workflow: wan_lcm_r16_fp32_comfy.safetensors, or I think CausVid works too; both are good. The former is plenty good at 0.07 strength and tends to be less invasive toward the output (the rank 16 and rank 32 versions both seem to work well at very low strengths, way under 0.2). That said, do an x/y test or something to check how well the model is still adhering to your prompts. Add Sage Attention at FP16 and CFGZero. Best part? 4 steps, LCM sampler, normal scheduler (or beta/simple?), cfg 1. Full disclosure: this is without a v2v or i2v input, though adding one doesn't dramatically increase the render time, especially, I think, if you're dialing down your base model strength.
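If you'd rather script those numbers than click through the graph, here's a minimal sketch. It assumes you've exported your workflow with ComfyUI's API format (Save API Format) as workflow_api.json, that the graph uses the stock KSampler and LoraLoader nodes, and that ComfyUI is listening on the default port; the filename and node matching are illustrative, not my exact graph.

```python
import json
import urllib.request

# Load a workflow exported via ComfyUI's API format.
# The filename and node layout are assumptions about your graph.
with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

for node in workflow.values():
    if node.get("class_type") == "KSampler":
        # The fast-render settings above: 4 steps, LCM, cfg 1.
        node["inputs"].update({
            "steps": 4,
            "cfg": 1.0,
            "sampler_name": "lcm",
            "scheduler": "normal",  # try "beta" or "simple" as well
        })
    elif node.get("class_type") == "LoraLoader":
        # Keep the LCM/CausVid-style LoRA very weak (well under 0.2).
        node["inputs"]["lora_name"] = "wan_lcm_r16_fp32_comfy.safetensors"
        node["inputs"]["strength_model"] = 0.07

# Queue the patched graph on a local ComfyUI instance (default port 8188).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode("utf-8"))
```

Looping a script like this over seeds or LoRA strengths is also the lazy way to do the x/y adherence checks mentioned above.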
A few renders. The last is at the maximum resolution that Wan 2.1 can do natively without upscaling, 832×480 (as far as I know that's true today, 07/18, which could change by tonight):
This first render is long because I interpolated it and saved it at a slower frame rate to see what I could get away with. This one is, well, not good. Stretched to 28 seconds it's obviously too, um, data-less; no amount of FFmpeg or Avidemux hacking will improve it much. 14 seconds may work, though. FFmpeg cranks through this in so little time (well under a second) that I don't even count it toward the original render time.
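For reference, the stretch itself is just an FFmpeg retime. Here's a rough sketch of that step wrapped in Python; the filenames, the 2x factor, and the optional minterpolate pass are placeholders, not my exact commands:

```python
import subprocess

# Slow the clip down 2x (e.g. 14s -> 28s) by stretching presentation
# timestamps, then lock the output to a fixed frame rate. This plain
# retime is the near-instant FFmpeg pass mentioned above.
subprocess.run([
    "ffmpeg", "-y",
    "-i", "render_480p.mp4",
    "-vf", "setpts=2.0*PTS",  # double every frame's timestamp
    "-r", "24",               # constant 24 fps output
    "-an",                    # these renders have no audio track
    "stretched.mp4",
], check=True)

# Optional: have FFmpeg synthesize in-between frames instead of
# duplicating them (much slower than the retime above).
subprocess.run([
    "ffmpeg", "-y",
    "-i", "render_480p.mp4",
    "-vf", "setpts=2.0*PTS,minterpolate=fps=24:mi_mode=mci",
    "stretched_interpolated.mp4",
], check=True)
```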
Below is the 480p @ 24 fps original, a few seeds up.
The last render has all the same settings except the seed incremented by 1 and the beta scheduler. We've Benjamin Buttoned, cat style! IMHO much sharper!