They mentioned one potential use that I thought had value and that I hadn’t considered. For video conferencing, this could greatly reduce the amount of bandwidth needed: instead of sending video, you transmit a small amount of data and render people’s faces locally. I don’t think that outweighs the massive harms this technology will unleash. But at least there was some use that would be legit and beneficial.
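To get a feel for the savings, here’s a back-of-envelope sketch. All the figures (bitrate, keypoint count, precision) are illustrative assumptions, not measurements of any real system:

```python
# Rough comparison: streaming compressed video vs. streaming facial
# keypoints and rendering the face on the receiver's machine.
# Every number below is an assumption for illustration only.

VIDEO_BITRATE_BPS = 1_500_000   # assumed ~1.5 Mbps for a 720p call

NUM_KEYPOINTS = 68              # assumed facial-landmark count
BYTES_PER_KEYPOINT = 2 * 4      # x, y as 32-bit floats
FPS = 30                        # frames per second

# bits per second needed to send just the keypoints
keypoint_bps = NUM_KEYPOINTS * BYTES_PER_KEYPOINT * FPS * 8
reduction = VIDEO_BITRATE_BPS / keypoint_bps

print(f"keypoint stream: {keypoint_bps / 1000:.0f} kbps")
print(f"bandwidth reduction: ~{reduction:.0f}x")
```

Under these assumptions the keypoint stream is around 130 kbps, roughly an order of magnitude less than the video, and that’s before any compression of the keypoint data itself.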
I’m someone who has a moral compass and I don’t like that scammers will abuse this shit so I hate it. But there’s no keeping it locked away. It’s here to stay. I hate the future / now.
Wouldn’t you then have to run the AI locally on a machine (which probably draws a lot of power and memory), or use it via the cloud (which depends on bandwidth just like a video call)? I don’t really see where this technology could actually be useful. Sure, if it were only a minor computation, like when you take a picture or video with any modern smartphone. But computing an entire face and voice seems much more complicated than that, and not really feasible for the usual home device.
A model that can only generate frontal-to-profile views of heads would be quite small; I can totally see that kind of thing running on current consumer GPUs in real time. Near real time is already possible with SDXL-based models with some speedup tricks applied, as long as you have a mid-range gaming GPU, and those models are significantly more general. It’s not like the model would need to generate spaghetti and sports cars alongside the head.