Runway’s newest AI video generator brings giant cotton candy monsters to life

Screenshot from a Runway Gen-3 Alpha video generated with the prompt “A giant humanoid, made of fluffy blue cotton candy, stomps the ground and roars at the sky, with a bright blue sky behind them.”

On Sunday, Runway announced a new AI video synthesis model called Gen-3 Alpha. It is still in development, but it appears to produce video of similar quality to OpenAI’s Sora, which debuted earlier this year (and is also unreleased). Gen-3 Alpha can generate new, high-definition video from text prompts, depicting everything from realistic people to surreal monsters stomping through the countryside.

Unlike Runway’s previous best model from June 2023, which could only create two-second clips, Gen-3 Alpha can reportedly create 10-second video segments of people, places, and things with a consistency and coherence that easily surpasses Gen-2. If 10 seconds sounds short compared to Sora’s full minute of video, consider that the company operates on a shoestring computing budget compared to the more lavishly funded OpenAI, and that it actually has a history of offering video generation capabilities to commercial users.

Gen-3 Alpha does not generate audio to accompany the video clips, and it is very likely that temporally coherent generations (those that keep a character consistent over time) rely on similarly high-quality training material. But Runway’s improvement in visual fidelity over the past year is hard to ignore.

AI video is heating up

It’s been a busy few weeks for AI video synthesis in the AI research community, including the launch of the Chinese model Kling, made by Beijing-based Kuaishou Technology (also called “Kwai”). Kling can generate two minutes of 1080p HD video at 30 frames per second with a level of detail and coherence said to match Sora.

Gen-3 Alpha prompt: “Subtle reflections of a woman on the window of a train traveling at super speed through a Japanese city.”

Not long after Kling debuted, people on social media started creating surreal AI videos using Luma AI’s Luma Dream Machine. These videos were novel and weird, but generally lacked coherence; we tried out Dream Machine and were unimpressed with everything we saw.

Meanwhile, one of the original text-to-video pioneers, New York City-based Runway (founded in 2018), recently found itself the target of memes showing its Gen-2 technology falling out of favor compared to newer video synthesis models. That may have spurred the Gen-3 Alpha announcement.

Gen-3 Alpha prompt: “An astronaut runs down an alley in Rio de Janeiro.”

Generating realistic humans has always been tricky for video synthesis models, so Runway specifically showcases Gen-3 Alpha’s ability to create what the developers call “expressive” human characters with a range of actions, gestures, and emotions. The examples provided by the company weren’t particularly expressive (mostly people just staring and blinking slowly), but they do look realistic.

Human examples include generated videos of a woman on a train, an astronaut running down a street, a man whose face is illuminated by the glow of a television set, a woman driving a car, and a woman running.

Gen-3 Alpha prompt: “A close-up of a young woman driving a car as she thoughtfully sees blurry green forest visible through the rainy car window.”

The demo videos also include more surreal examples of video synthesis, including a giant creature walking through a run-down city, a man made of rocks walking through a forest, and the giant cotton candy monster you see below, which is probably the best video on the whole page.

Gen-3 Alpha prompt: “A giant humanoid, made of fluffy blue cotton candy, stomping the ground and roaring at the sky, with a bright blue sky behind them.”

Gen-3 Alpha will power several Runway AI editing tools (one of the company’s most notable claims to fame), including Multi Motion Brush, advanced camera controls, and Director Mode. It can create videos from text or image prompts.

Runway says Gen-3 Alpha is the first in a series of models trained on a new infrastructure designed for large-scale multimodal training, a step toward developing what it calls “General World Models”: hypothetical AI systems that build internal representations of environments and use them to simulate future events within those environments.