Microsoft's VASA-1 Raises Deepfake Concerns With Realistic Video Generation
Microsoft has introduced an AI image-to-video model, VASA-1, that can generate videos from just one photo and a speech audio clip.
The model can deliver high video quality with realistic facial and head dynamics, and it supports the online generation of 512x512 videos at up to 40 FPS.
Introducing the model, Microsoft’s website read, “It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviours.”
According to the company, extensive experiments show that VASA-1 “outperforms previous methods along various dimensions comprehensively”.
In effect, it can generate deepfake videos from just one image. However, Microsoft emphasises that the tool is purely a “research demonstration” with no plans for a product or API release.
Such developments could fuel misinformation, given how swiftly generative AI is improving. Images generated from prompts have progressed from visibly flawed to almost indistinguishable from reality in just a few months.
