Abstract: The increasing volume of user-generated, human-centric video content and its applications, such as video retrieval and browsing, require compact representations, which are addressed by the video ...