Fast artistic style transfer for videos



Recently, research on artistic style transfer, which trains computers to act as artists, has become popular. Gatys et al. cast this task as an optimization problem and used a convolutional neural network to solve it. However, this method for image stylization does not work well for videos, because it fails to account for temporal consistency. To address this, Ruder et al. proposed a method that integrates a temporal loss into the loss function. But this method is quite slow: stylizing a 15-second video takes more than 7.5 hours. Earlier this year, Johnson et al. made image stylization real-time by training a feed-forward neural network to approximate this optimization, instead of optimizing each image separately. By combining the ideas of Ruder et al. and Johnson et al., we developed a new method for video stylization that preserves temporal consistency while running about 10 times as fast as the method of Ruder et al. Our method makes it feasible to stylize movies and animations at a reasonable time cost.
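Conceptually, the temporal loss of Ruder et al. penalizes the difference between the current stylized frame and the previous stylized frame warped along the optical flow, ignoring occluded or unreliable regions via a mask. The following is a minimal NumPy sketch of that term only; the function names, the nearest-neighbor warp, and the mask convention are illustrative assumptions, not our actual implementation:

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp `frame` (H, W, 3) along `flow` (H, W, 2) using
    nearest-neighbor sampling, clamped at the image border (a real
    implementation would interpolate bilinearly)."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def temporal_loss(stylized_cur, stylized_prev, flow, mask):
    """Mean squared difference between the current stylized frame and the
    previous stylized frame warped along the optical flow. `mask` (H, W)
    is 1 where the flow is reliable and 0 at disocclusions, which are
    excluded from the penalty."""
    warped = warp(stylized_prev, flow)
    diff = (stylized_cur - warped) ** 2
    return (mask[..., None] * diff).sum() / max(mask.sum(), 1)
```

With zero flow and a full mask, this reduces to a plain per-pixel MSE between consecutive stylized frames; nonzero flow lets the penalty follow moving content instead of punishing motion itself.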


We’ve uploaded some sample videos to YouTube.

Sample Story (15s)

Style: The Starry Night

Big Buck Bunny (8m 2s)

With the help of the Waifu2x super-resolution tool, we can produce 1080p and 4K HD stylized videos without much extra computational cost for stylization.


Style: The Starry Night

Doraemon (22m 39s)

Styles (from left to right): Composition VII, The Great Wave off Kanagawa, The Starry Night

Comparison with Other Methods

The following videos compare our method with other methods.

Sample Story (15s)

Resolution: 640×480

| Label (As in Video) | Method | Time |
| --- | --- | --- |
| Ruder et al. (30 iterations) | Ruder's method, 30 iterations | 0.80 hours |
| Johnson et al. (Real-time) | Simply run Johnson's method on each frame | 77.8 seconds |
| Ruder et al. (1000 iterations) | Ruder's method, 1000 iterations | 7.56 hours |
| Our method (30 iterations), no pixel loss | Our method, with pixel-loss weight 0 | 0.80 hours |
| Our method (30 iterations), with pixel loss | Our method, with pixel-loss weight 1.5e-3 | 0.80 hours |
Big Buck Bunny (8m 2s)

Resolution: 960×540

| Label (As in Video) | Method | Time |
| --- | --- | --- |
| Naive | Simply run Johnson's method on each frame | / |
| Without Pixel Loss | Our method, with pixel-loss weight 0 | 33 hours |
| With Pixel Loss | Our method, with pixel-loss weight 1.5e-3 | 33 hours |
Doraemon (22m 39s)

Resolution: 640×480

| Label (As in Video) | Method | Time |
| --- | --- | --- |
| With Temporal Consistency | Our method | 58 hours |
| No Temporal Consistency | Simply run Johnson's method on each frame | 1.89 hours |
| / | Ruder et al. (1000 iterations) | ~23 days (predicted) |

The time costs above exclude the time spent computing optical flow. Optical-flow computation can run in parallel with video stylization, and is itself parallelizable across frame pairs. Moreover, the optical flow does not need to be recomputed when only the style of a video is changed. We compute optical flow on a CPU cluster; for the Doraemon video, this took about 50 hours.
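Because each consecutive frame pair is independent, distributing the flow computation is straightforward. A sketch of the pattern with Python's standard `multiprocessing` module, where `estimate_flow` is a hypothetical stand-in for the actual flow solver:

```python
from multiprocessing import Pool

def estimate_flow(job):
    """Stand-in for a real optical-flow estimator run on one pair of
    consecutive frames. A real implementation would return the flow
    field; here we just return the pair index to show the pattern."""
    index, (frame_a, frame_b) = job
    # ... run the flow solver on (frame_a, frame_b) here ...
    return index

def parallel_flows(frames, workers=4):
    """Distribute flow estimation for all consecutive frame pairs
    across a pool of CPU workers; pairs are mutually independent."""
    jobs = list(enumerate(zip(frames[:-1], frames[1:])))
    with Pool(workers) as pool:
        return pool.map(estimate_flow, jobs)
```

On a cluster, the same idea applies with frame-pair ranges assigned to different machines instead of local processes.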


L.A. Gatys, A.S. Ecker, M. Bethge, A Neural Algorithm of Artistic Style. arXiv:1508.06576

M. Ruder, A. Dosovitskiy, T. Brox, Artistic style transfer for videos. arXiv:1604.08610

J. Johnson, A. Alahi, L. Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution. arXiv:1603.08155