Projects 2025 – Alex Swain

Alex Swain, October 10, 2025

Sorted new > old

The Julia Child TTS model is almost complete, but not without days of pain. I had to pick one of the most unique voices… It actually got decent at 35K steps, but getting it perfect has taken another week. I’ve no experience in creating voice models, but approaching it similarly to a diffusion model seems to have paid off… slightly. In diffusion or RD models, a learning rate of 5e-6 wouldn’t give me a whole lot of benefit at the tail end of training. I’ve always thought that slow-n-low, time over speed, gives a more stable model in the long term (with diffusion models). With TTS models it looks like a faster rate improves prosody and detail that slow-n-low may never get to. It is a lot of listening to WAVs and generating sentences that she would never say. Overfitting is instantly evident with TTS, which is actually good. One great thing about TTS models is how fast they train without having to use Cloud. Again, a few 4090s take maybe 8 hours to get somewhere decent. I’ve started a post on that effort which may be done by Christmas.
Fall means food (so do the other seasons though). Sauce season has begun. Sounds like heresy to some, but I buy my canned tomatoes from Amazon because they are cheaper there and impossible to find anywhere locally. I use “San Marzano Dop Authentic Whole Peeled Plum Tomatoes (6 Pack), 1.75 Pound (Pack of 6)” with great success.
10/2025. The barrier to entry has dropped to where mere mortals without an infinite Cloud budget can train high-quality, accurate voice models. Therefore I have begun, yesterday in fact, training my Julia Child voice model, which goes along nicely with my Julia Child LLM (soon on HF). I’ve been meaning to do this for a while. As someone who loves to cook and bake, it made sense to me after I remembered one day that the entirety of her episodes were on YouTube. I learned to make croissants from watching her episode on them very, very slowly. I was shocked that they came out right the first time. The other reason for the voice training was that I knew almost for a fact that it was going to be extremely hard to train her voice. Anyone that knows her voice knows her voice. Her voice is as unique as someone like James Earl Jones, if not a lot more unique. Plus there’s the added challenge of her mixing French into many of her sentences. When Stobe died, a much admired train hopper and life narrator, I trained a model on his YT episodes. I thought it to be a tribute at the time, but after training it I realized it was just creepy. The Car Talk guys were less creepy though, and helped me fix my scooter!
RV Life (I have the hat but not the t-shirt). I spent a few months tearing apart my RV and installing routers, switches, computers, wifi antennas, and other geek tech. It was/is super fun. I used a speaker (?!) that was on the outside of the RV as an exhaust fan. The end game goal was/is to start working from the road indefinitely. Had to pivot all my thinking to DC power which I much prefer anyway.
- I built a sacrificial raidz3 ZFS volume on an old NAS I had, to see what would happen if spinning disks were exposed to bumpy roads and other forces. Using non-spinning disks was simply not cost effective, not just because of the volume of data I have, but because the cost of SSD is not low enough to send magnetic disks into the past. Yet. Plus, I don’t think SSD is more reliable; it’s likely just more resistant to shock. I thought about how to go about this for a while and settled on dense rubber squares, aka “isolation pads”, used for absorbing vibration from washing machines, compressors, etc. I mounted them on the bottom of the unit and went driving around looking for sink holes while it was on and doing stuff. I really thought this was just another stupid idea I had, as this RV shakes and rumbles a lot while driving. (As of May 2025 no problems.. yet?) I completed turning the bedroom into an office in September. I actually used a brand-spanking-new VACE model to mock it up in advance, with limited success. Turns out I could have done the same thing in Photoshop or GIMP in about 2 minutes instead of two hours. And by two hours I mean like 4 hours+.
- I had a friend install a pretty industrial-looking LTE/5G/Wifi6+ antenna dome on the top of the RV, and I installed a UPS, some gig switches, and a router running OpenWRT. I wish I was a little ahead of the curve on eSIMs, but the device I purchased for the cellular side of things had one eSIM and one physical SIM. (I’d have loved to just use eSIMs exclusively. Update: there are SIMs that are essentially or literally EEPROMs, which I’ve swapped into the device!) I’d have probably tried to hack it all together with a Pi but time wasn’t on my side. Physical SIMs, no thanks. So that runs the network, which chooses the strongest cellular signal or best bandwidth via various scripts other folks contributed and fails over during an outage of any carrier (tmo/vz/att). While at home in the driveway with this beast I ran an old TP-Link router in repeater/bridged mode and just used that inside the house. Goodbye, cable. Some tests have resulted in getting at least 1 Gbps using this setup (moving), which is plenty fine by me. Now, when you’re in an alfalfa field in Kearney, Nebraska..
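The "pick the best link, fail over when one drops" idea can be sketched in a few lines of Python. To be clear, this is not the contributed OpenWRT scripts (those aren’t shown here), just a minimal illustration under assumptions: the `Link` fields, the `pick_link` function, and every number in the sample data are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    """One cellular uplink. All fields are hypothetical, for illustration only."""
    carrier: str      # e.g. "tmo", "vz", "att"
    up: bool          # did the last health check succeed?
    rsrp_dbm: float   # signal strength in dBm; closer to 0 is stronger
    mbps: float       # last measured throughput


def pick_link(links: List[Link], prefer: str = "signal") -> Optional[Link]:
    """Return the best link that is currently up, or None if all are down.

    prefer="signal" ranks by signal strength; anything else ranks by throughput.
    Links that failed their health check are skipped, which is the failover part.
    """
    alive = [link for link in links if link.up]
    if not alive:
        return None
    if prefer == "signal":
        return max(alive, key=lambda link: link.rsrp_dbm)
    return max(alive, key=lambda link: link.mbps)


# Made-up sample readings: att looks great on paper but is currently down.
links = [
    Link("tmo", up=True, rsrp_dbm=-95.0, mbps=120.0),
    Link("vz", up=True, rsrp_dbm=-80.0, mbps=60.0),
    Link("att", up=False, rsrp_dbm=-70.0, mbps=300.0),
]

best = pick_link(links)
print(best.carrier if best else "no link available")
```

A real setup would re-run something like this on a timer and add some hysteresis so the router doesn’t flap between carriers on every small change in signal.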
I’m back to working mostly on LLMs as the heat of the Summer drives me back indoors, and this Summer is particularly hot, although I can’t seem to even scratch the itch of all of this incredible growth of AI in the media space. I really want to spend more time with AI and media.
- I’m also interested in using small edge models to take data inputs in through code and make inference-based routing decisions based on their training data. Maybe I just described an agent; that sort of thing I’m not too up on yet. I know some big AI people are talking about it like it’ll solve world hunger. That usually means it’s more hype than reality. For now.
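For what it’s worth, the routing idea can be sketched roughly like this: a small model turns a reading into a label plus a confidence, and plain code decides where to send it. Everything here is an assumption for illustration. `toy_classifier` is a hard-coded rule standing in for a real edge model, and the labels, handlers, and the 0.7 threshold are hypothetical.

```python
from typing import Callable, Dict, Tuple

# A "classifier" is anything that maps a reading to (label, confidence).
Classifier = Callable[[dict], Tuple[str, float]]
Handler = Callable[[dict], str]


def route(reading: dict, classify: Classifier,
          handlers: Dict[str, Handler], min_conf: float = 0.7) -> str:
    """Send the reading to the handler for its predicted label.

    Low-confidence or unknown labels go to the "fallback" handler instead.
    """
    label, conf = classify(reading)
    if conf < min_conf or label not in handlers:
        return handlers["fallback"](reading)
    return handlers[label](reading)


def toy_classifier(reading: dict) -> Tuple[str, float]:
    """Stand-in for a small edge model: a hard-coded rule, NOT real inference."""
    temp = reading.get("temp_c", 0.0)
    if temp >= 40.0:
        return "overheat", 0.95
    if temp >= 35.0:
        return "overheat", 0.55  # borderline, so confidence is low
    return "normal", 0.90


# Hypothetical actions; a real system would call out to actual code here.
HANDLERS: Dict[str, Handler] = {
    "overheat": lambda r: "turn on exhaust fan",
    "normal": lambda r: "do nothing",
    "fallback": lambda r: "log and ask a human",
}

for temp in (20.0, 36.0, 45.0):
    print(temp, "->", route({"temp_c": temp}, toy_classifier, HANDLERS))
```

The fallback branch is the part that matters: when the model isn’t sure, the code does something boring and safe rather than acting on a guess.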
Civil War training. I don’t recall if I mentioned this elsewhere, but I trained a model (a WAN 2.1 LoRA, to be specific) on several thousand images (5k+) from the LOC. I need to get back to it, but where I left off it was generating pretty “realistic” motion pictures of what the Civil War may have looked like if, say, Brady was wandering around with an old black and white 16mm. This was a month before 2.2 came out, so now it’s a whole other thing. I do recall 2.2 is backwards compatible, but with hi/low noise models there is much tuning to be done, plus distillation, etc.