Self-Supervised Video Representation Learning By Recurrent Networks And Frame Order Prediction