The hard part of using diffusion models (far easier today than even two years ago) is keeping the style, characters, and story consistent across every image. I learned pretty quickly that I'd have to train my own models and adapters to actually pull it off.
That training started out easy-ish, but then I kept iterating over and over and over again (read: failing). I was fine-tuning large ~100B-parameter base models and augmenting them with LoRAs, when at the end of the day I should have just created a lot of LoRAs. Realizing I needed to create a "world" for the story (I think much more visually) turned into mapping all of the places in the story with mapping software. And so it went… and went. I invented one new complex task after another, inadvertently keeping myself from ever actually creating anything; I was just designing and geeking out on tech all day.
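For anyone wondering what "just create a lot of LoRAs" actually looks like, here's a minimal sketch of the idea: freeze a Stable Diffusion UNet and attach a small low-rank adapter to its attention layers with the peft library. The model name, rank, and target modules below are illustrative assumptions, not my exact setup.

```python
# Minimal sketch: attach a LoRA adapter to a frozen Stable Diffusion UNet.
# Model name, rank, and target modules are illustrative, not my exact config.
import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig

# Load only the UNet; the LoRA trains on top of its frozen weights.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="unet", torch_dtype=torch.float16
)
unet.requires_grad_(False)  # freeze the base model

# Low-rank adapter over the attention projections -- this is the whole "LoRA".
lora_config = LoraConfig(
    r=16,                      # rank: capacity vs. adapter size trade-off
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
unet.add_adapter(lora_config)

# Only the adapter weights are trainable -- a few MB instead of many GB.
trainable = [p for p in unet.parameters() if p.requires_grad]
print(f"trainable params: {sum(p.numel() for p in trainable):,}")
```

Training dozens of these per style or character is cheap compared to touching the base model at all, which is the lesson I learned the slow way.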

I settled on something like the above style, but a bit more photo-realistic. I needed to get my value back from those thousands of photos I had to organize into datasets and annotate, which took weeks per adapter.
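Most of those weeks boiled down to one dull loop: pair each image with a caption, then write that mapping into a metadata file a training script can read. A rough sketch of that step, assuming one .txt caption file per image (the paths and layout here are hypothetical):

```python
# Rough sketch: collect image/caption pairs into a metadata.jsonl that
# Hugging Face's "imagefolder" loader (and most LoRA training scripts) accept.
# Directory layout and file names are hypothetical.
import json
from pathlib import Path

data_dir = Path("datasets/story_world")  # one folder of images + .txt captions

with open(data_dir / "metadata.jsonl", "w", encoding="utf-8") as out:
    for image in sorted(data_dir.glob("*.png")):
        caption_file = image.with_suffix(".txt")
        if not caption_file.exists():
            continue  # skip images I never got around to annotating
        record = {
            "file_name": image.name,  # path relative to data_dir
            "text": caption_file.read_text(encoding="utf-8").strip(),
        }
        out.write(json.dumps(record) + "\n")

# Later, datasets.load_dataset("imagefolder", data_dir=str(data_dir)) picks up
# the captions automatically through the "text" column.
```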