Forgive me if this has already been discussed elsewhere on the site. Earlier, @Null said he wanted to use AI to transform himself into a black woman. I don't know of an all-in-one voice and video solution that runs locally yet, and if there is one, it's probably kind of shitty in one or more areas so it can run on consumer-grade hardware. Personally, I'm not interested in paid services.
I stumbled across this video, which covers a recently released model that seems to take care of one half of the equation, the video part:
There are other face replacers out there, but the ones I saw were closer to phone face filters, which only replace the center of your face (poorly) and leave your hair and the rest of the image alone. This one brings the reference image to life pretty well. It needs a good-quality modern video card. If you have a 5000-series card, be sure to read the instructions someone posted about what they had to do to get it working: those cards are "too new" and don't support some types of acceleration yet. Reportedly, you get the best results with Ubuntu + RTX 4090 + TensorRT.
As with most AI stuff like this, it's bleeding edge: installation is usually fraught with errors, and actually running it will give you plenty of errors and weird results too. Some users report a pair of glasses spontaneously appearing on their face, or strange screen artifacts; this supposedly happens if you set the target FPS too low (the model was trained at 25 FPS).
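If your webcam doesn't run at exactly 25 FPS, one way to keep the model fed at the rate it was trained on is to resample the stream to a fixed 25 FPS by duplicating or dropping frames. This is a minimal sketch, assuming you can index your captured frames; `resample_indices` is a hypothetical helper, not part of the model's code, and I don't know whether the tool already does something like this internally.

```python
def resample_indices(src_fps: float, duration_s: float, target_fps: float = 25.0) -> list[int]:
    """Hypothetical helper: for each output frame at target_fps, return the
    index of the source frame to show.

    When the source is slower than target_fps, frames get duplicated; when it
    is faster, frames get dropped. Either way the output is a steady target_fps.
    """
    n_out = int(duration_s * target_fps)
    # Output frame i happens at time i / target_fps; the source frame covering
    # that moment is floor(time * src_fps).
    return [int(i * src_fps / target_fps) for i in range(n_out)]


# 30 FPS camera -> 25 FPS: every sixth source frame gets skipped.
print(resample_indices(30, 1.0)[:6])
# 15 FPS camera -> 25 FPS: some source frames get shown twice.
print(resample_indices(15, 1.0)[:6])
```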
As for audio, it looks like most progress in that area has been on the TTS side rather than live audio replacement. For example, Qwen3-TTS just came out and looks really good at what it does, but it doesn't take a live audio stream as input. The only thing I can find for local, live voice replacement is w-okada, a three-year-old program that still seems widely used. Its hardware requirements don't seem very steep, and people use it to change their voice in Discord and video games. It also gets used by trans people.
The real trick would be getting a consistent latency out of each one and then delaying the faster stream to match the slower one, or capturing a combined video + audio stream with one tool and sending that through the other. Maybe AI could help code something to do that.
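Here's a rough sketch of the "match them up" idea: measure each pipeline's latency, then hold the faster stream in a delay buffer until the slower one catches up. The `DelayLine` and `matching_delay` names and the latency numbers are hypothetical, actually grabbing frames from the face-swap model and audio chunks from w-okada isn't shown, and this only works if each pipeline's latency really stays consistent.

```python
from collections import deque


class DelayLine:
    """Hypothetical buffer that releases items delay_s seconds after capture.

    Put the faster stream (audio or video) through this so it lines up
    with the slower one.
    """

    def __init__(self, delay_s: float):
        self.delay_s = delay_s
        self._buf = deque()  # (capture_time, item) pairs, oldest first

    def push(self, capture_time: float, item) -> None:
        self._buf.append((capture_time, item))

    def pop_ready(self, now: float) -> list:
        """Return every item captured at least delay_s seconds before `now`."""
        ready = []
        while self._buf and now - self._buf[0][0] >= self.delay_s:
            ready.append(self._buf.popleft()[1])
        return ready


def matching_delay(video_latency_s: float, audio_latency_s: float) -> tuple[str, float]:
    """Return which stream to delay, and by how much, so both end up equally late."""
    if video_latency_s >= audio_latency_s:
        return "audio", video_latency_s - audio_latency_s
    return "video", audio_latency_s - video_latency_s
```

For example, if the face swap lags about 0.25 s and the voice changer about 0.125 s, `matching_delay(0.25, 0.125)` returns `("audio", 0.125)`, so you'd push audio chunks into `DelayLine(0.125)` and play whatever `pop_ready` hands back.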