Google Also Announces a Video-Generating AI

While image-generating AI such as Stable Diffusion has been drawing attention, video-generating AI such as Make-A-Video and Phenaki has also been appearing in quick succession. Now Google has announced Imagen Video, which generates video from natural-language prompts such as "a teddy bear washing dishes."

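In May 2022, Google announced Imagen, an AI that automatically generates high-fidelity images from text. Google has now followed it by unveiling Imagen Video, which generates videos of roughly five seconds rather than still images.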

Excited to announce Imagen Video, our new text-conditioned video diffusion model that generates 1280×768 24fps HD videos! #ImagenVideo https://t.co/JWj3L7MpBU
Work w/ @wchan212 @Chitwan_Saharia @jaywhang_ @RuiqiGao @agritsenko @dpkingma @poolio @mo_norouzi @fleet_dj @TimSalimans pic.twitter.com/eN81LqZW7I
— Jonathan Ho (@hojonathanho) October 5, 2022

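Imagen Video first processes the input text prompt with T5, a natural language processing model. Next, a base model built on Video Diffusion Models, which generate video with a diffusion model, produces a 16-frame clip at 24×48 resolution and 3 frames per second. Models called Temporal Super-Resolution and Spatial Super-Resolution then upsample that clip, yielding a final 128-frame video at 1280×768 resolution and 24 frames per second, or about 5.3 seconds.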
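The frame counts and frame rates quoted above can be checked with a few lines of arithmetic. The sketch below is only a sanity check on those figures, not a description of how Imagen Video is implemented; the variable and function names are made up for illustration.

```python
from fractions import Fraction

# Figures quoted in the article (names here are illustrative only).
BASE_FRAMES, BASE_FPS = 16, 3        # output of the base video model
FINAL_FRAMES, FINAL_FPS = 128, 24    # output after super-resolution
BASE_PIXELS = 24 * 48                # pixels per frame at base resolution
FINAL_PIXELS = 1280 * 768            # pixels per frame at final resolution


def clip_seconds(frames: int, fps: int) -> Fraction:
    """Duration of a clip in seconds, kept exact as a fraction."""
    return Fraction(frames, fps)


base_len = clip_seconds(BASE_FRAMES, BASE_FPS)
final_len = clip_seconds(FINAL_FRAMES, FINAL_FPS)

# How many times more frames the final clip has than the base clip.
temporal_factor = FINAL_FRAMES // BASE_FRAMES

# How many times more pixels each final frame has than a base frame.
pixel_factor = Fraction(FINAL_PIXELS, BASE_PIXELS)

print(f"base clip:  {float(base_len):.2f} s")
print(f"final clip: {float(final_len):.2f} s")
print(f"frames multiplied by {temporal_factor}")
print(f"pixels per frame multiplied by about {float(pixel_factor):.0f}")
```

The base clip and the final clip both work out to 16/3 seconds, so the temporal upsampling raises the frame rate eightfold without lengthening the video; the 5.3-second figure is already fixed by the 16 frames at 3 frames per second.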

🥳Thrilled to share Imagen Video: our new text-to-video diffusion model generating 1280×768 24fps HD videos! #ImagenVideo

Website: https://t.co/0y4O6AZFtK https://t.co/t3fUsppHWN pic.twitter.com/uaGqch2NPt
— Ruiqi Gao (@RuiqiGao) October 5, 2022