FastArtisticVideos by zeruniverse

Abstract

Research on artistic style transfer, which trains computers to be artists, has recently become popular. Gatys et al. cast this task as an optimization problem and used a convolutional neural network to solve it. However, this image stylization method does not work well on videos because it does not account for temporal consistency. To address this, Ruder et al. proposed a method that adds a temporal loss to the loss function, but it is slow: stylizing a 15-second video takes more than 7.5 hours. Earlier this year, Johnson et al. made image stylization real-time by training a neural network to solve the optimization problem instead of optimizing each image separately. By combining the ideas of Ruder et al. and Johnson et al., we developed a new video stylization method that preserves temporal consistency and runs about 10 times as fast as the method of Ruder et al. Our method makes it possible to stylize movies and animations at a reasonable time cost.
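To make the idea of a temporal loss more concrete, here is a minimal NumPy sketch. It assumes the form described by Ruder et al.: a per-pixel weighted mean squared error between the current stylized frame and the previous stylized frame warped by optical flow. The function and argument names are hypothetical, and this is an illustration rather than the code used in this project.

```python
import numpy as np

def temporal_loss(stylized, warped_prev, weights):
    """Weighted mean squared error between two frames.

    stylized:    current stylized frame, shape (H, W, C)
    warped_prev: previous stylized frame warped by optical flow, shape (H, W, C)
    weights:     per-pixel weights in [0, 1], shape (H, W) or (H, W, C);
                 0 marks pixels where the flow is unreliable (an assumption
                 about how the weights are chosen)
    """
    stylized = np.asarray(stylized, dtype=np.float64)
    warped_prev = np.asarray(warped_prev, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    if weights.ndim == stylized.ndim - 1:
        # Broadcast a per-pixel mask over the channel axis.
        weights = weights[..., np.newaxis]
    squared_error = weights * (stylized - warped_prev) ** 2
    # Normalize by the total number of values in the frame.
    return squared_error.sum() / stylized.size
```

A frame that matches its warped predecessor everywhere gives a loss of zero, so minimizing this term together with the usual style and content terms pushes consecutive frames toward agreement wherever the weights are non-zero.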

Samples

We’ve uploaded some sample videos to YouTube.

Sample Story (15s)

Style: The Starry Night

Big Buck Bunny (8m 2s)

With the help of the Waifu2x super-resolution tool, we can produce 1080p and 4K stylized videos while keeping the computational cost of stylization low.
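A quick calculation shows why upscaling after stylization saves work. The sketch below assumes the video is stylized at 960*540 (the Big Buck Bunny resolution listed in the comparison section) and that 1080p and 4K mean 1920*1080 and 3840*2160; the helper name is hypothetical.

```python
def upscale_factors(src, dst):
    """Return (width factor, height factor, pixel-count factor) from src to dst."""
    src_w, src_h = src
    dst_w, dst_h = dst
    return dst_w / src_w, dst_h / src_h, (dst_w * dst_h) / (src_w * src_h)

STYLIZED = (960, 540)      # assumed stylization resolution
FULL_HD = (1920, 1080)     # 1080p
UHD_4K = (3840, 2160)      # 4K

to_1080p = upscale_factors(STYLIZED, FULL_HD)
to_4k = upscale_factors(STYLIZED, UHD_4K)
print("1080p:", to_1080p)
print("4K:", to_4k)
```

Under these assumptions, a 1080p frame has 4 times as many pixels as the stylized frame and a 4K frame has 16 times as many, so stylizing at the lower resolution and upscaling afterwards avoids running the stylization network on the larger frames.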

1080p

Style: The Starry Night

4K

Doraemon (22m 39s)

Styles (from left to right): Composition VII, The Great Wave off Kanagawa, The Starry Night

Comparison with Other Methods

The following videos compare our method with other methods.

Sample Story (15s)

Resolution: 640*480

Label (As in Video)	Method	Time
Ruder et al. (30 iterations)	Ruder's method, 30 iterations	0.80 hour
Johnson et al. (Real-time)	Simply run Johnson's method for each frame	77.8 seconds
Ruder et al. (1000 iterations)	Ruder's method, 1000 iterations	7.56 hours
Our method (30 iterations), no pixel loss	Our method, with pixel-loss weight 0	0.80 hour
Our method (30 iterations), with pixel loss	Our method, with pixel-loss weight 1.5e-3	0.80 hour
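The ratios implied by the table above can be checked with a few lines of Python. The times are copied from the table; the dictionary keys and the helper function are hypothetical names chosen for this sketch.

```python
HOUR = 3600.0  # seconds per hour

# Stylization times for the 15-second Sample Story clip, in seconds.
times = {
    "ruder_1000_iter": 7.56 * HOUR,
    "ours_30_iter": 0.80 * HOUR,
    "johnson_per_frame": 77.8,
}

def speedup(slower, faster):
    """How many times longer `slower` takes than `faster`."""
    return times[slower] / times[faster]

ruder_vs_ours = speedup("ruder_1000_iter", "ours_30_iter")
ours_vs_johnson = speedup("ours_30_iter", "johnson_per_frame")
print(f"Ruder (1000 iterations) / ours: {ruder_vs_ours:.2f}x")
print(f"Ours / Johnson per frame:       {ours_vs_johnson:.1f}x")
```

The first ratio is 9.45, which matches the "about 10 times as fast" figure in the abstract. The second shows that temporal consistency still costs roughly 37 times the runtime of stylizing each frame independently.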

Big Buck Bunny (8m 2s)

Resolution: 960*540

Label (As in Video)	Method	Time
Naive	Simply run Johnson's method for each frame	/
Without Pixel Loss	Our method, with pixel-loss weight 0	33 hours
With Pixel Loss	Our method, with pixel-loss weight 1.5e-3	33 hours

Doraemon (22m 39s)

Resolution: 640*480

Label (As in Video)	Method	Time
With Temporal Consistency	Our method	58 hours
No Temporal Consistency	Simply run Johnson's method for each frame	1.89 hours
/	Ruder et al. (1000 iterations)	~23 days (estimated)
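Because the three videos differ in length and resolution, it can help to normalize the reported times. The sketch below computes the compute time spent per second of video for our method, using only the durations, resolutions, and times listed in the tables above; the variable and function names are hypothetical.

```python
# (name, duration in seconds, width, height, stylization time in hours)
videos = [
    ("Sample Story", 15, 640, 480, 0.80),
    ("Big Buck Bunny", 8 * 60 + 2, 960, 540, 33.0),
    ("Doraemon", 22 * 60 + 39, 640, 480, 58.0),
]

def compute_seconds_per_video_second(duration_s, hours):
    """Seconds of stylization compute spent per second of video."""
    return hours * 3600.0 / duration_s

cost = {}
for name, duration_s, width, height, hours in videos:
    cost[name] = compute_seconds_per_video_second(duration_s, hours)
    print(f"{name} ({width}*{height}): {cost[name]:.1f} s per second of video")
```

The per-second cost is higher for Big Buck Bunny than for the two 640*480 videos, which is consistent with its larger frame size, although the tables alone do not say how much of the difference is due to resolution.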

Notes

None of the times above include the time needed to compute optical flow. This step can run in parallel with video stylization, and the optical flow computation itself can be parallelized. In addition, optical flow does not need to be recomputed when only the style of a video changes. We use a CPU cluster to compute optical flow. For the Doraemon video, optical flow computation took ~50 hours.
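The reason optical flow parallelizes well is that each pair of consecutive frames can be processed independently. The sketch below illustrates this with a thread pool and a placeholder flow function; `compute_all_flows`, `frame_pairs`, and `fake_flow` are hypothetical names, and a real setup would call an actual optical flow tool and would more likely distribute the pairs across processes or cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def frame_pairs(num_frames):
    """Index pairs (i, i + 1) for every two consecutive frames."""
    return [(i, i + 1) for i in range(num_frames - 1)]

def compute_all_flows(frames, flow_fn, max_workers=4):
    """Apply flow_fn to each consecutive frame pair in parallel.

    Returns a dict mapping (i, i + 1) to the flow for that pair. The result
    depends only on the input frames, so it can be stored and reused when the
    same video is stylized again with a different style.
    """
    pairs = frame_pairs(len(frames))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        flows = list(pool.map(lambda p: flow_fn(frames[p[0]], frames[p[1]]), pairs))
    return dict(zip(pairs, flows))

def fake_flow(prev_frame, next_frame):
    """Placeholder standing in for a real optical flow computation."""
    return next_frame - prev_frame

# Toy "frames" are plain numbers so the example runs without image data.
flow_cache = compute_all_flows([0, 1, 3, 6], fake_flow)
print(flow_cache)
```

Since no pair depends on the result of another, the work can be split into as many independent jobs as there are frame pairs.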

References

L.A. Gatys, A.S. Ecker, M. Bethge, A Neural Algorithm of Artistic Style. arXiv:1508.06576

M. Ruder, A. Dosovitskiy, T. Brox, Artistic style transfer for videos. arXiv:1604.08610

J. Johnson, A. Alahi, L. Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution. arXiv:1603.08155
