[Submitted on 7 Oct 2019]

Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels

Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film. Unfortunately, pre-trained models to detect all the events of interest in video may not exist, and training new models from scratch can be costly and labor-intensive. In this paper, we explore the utility of specifying new events in video in a more traditional manner: by writing queries that compose outputs of existing, pre-trained models. To write these queries, we have developed Rekall, a library that exposes a data model and programming model for compositional video event specification. Rekall represents video annotations from different sources (object detectors, transcripts, etc.) as spatiotemporal labels associated with continuous volumes of spacetime in a video, and provides operators for composing labels into queries that model new video events. We demonstrate the use of Rekall in analyzing video from cable TV news broadcasts, films, static-camera vehicular video streams, and commercial autonomous vehicle logs. In these efforts, domain experts were able to quickly (in a few hours to a day) author queries that enabled the accurate detection of new events (on par with, and in some cases much more accurate than, learned approaches) and to rapidly retrieve video clips for human-in-the-loop tasks such as video content curation and training data curation. Finally, in a user study, novice users of Rekall were able to author queries to retrieve new events in video given just one hour of query development time.
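To make the data model described above more concrete, the following is a minimal sketch of composing labels from two sources into a new event. It does not use Rekall's actual API: the `Label` class and the `join`, `intersect_time`, and `coalesce` helpers are hypothetical names introduced only for illustration, and the face and speech intervals are made-up inputs rather than model outputs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical spatiotemporal label: a time interval, a bounding box, a payload."""
    t1: float            # start time (seconds)
    t2: float            # end time (seconds)
    x1: float = 0.0      # bounding box in relative frame coordinates;
    y1: float = 0.0      # the default covers the whole frame
    x2: float = 1.0
    y2: float = 1.0
    payload: Optional[str] = None


def overlaps_in_time(a: Label, b: Label) -> bool:
    """True when the two labels share some span of time."""
    return a.t1 < b.t2 and b.t1 < a.t2


def intersect_time(a: Label, b: Label) -> Label:
    """Keep only the shared time span; spatial extent is reset to the full frame."""
    return Label(max(a.t1, b.t1), min(a.t2, b.t2),
                 payload=f"{a.payload}+{b.payload}")


def join(left: List[Label], right: List[Label],
         predicate: Callable[[Label, Label], bool],
         merge: Callable[[Label, Label], Label]) -> List[Label]:
    """Pair every left label with every right label that satisfies the predicate."""
    return [merge(a, b) for a in left for b in right if predicate(a, b)]


def coalesce(labels: List[Label], max_gap: float = 0.0) -> List[Label]:
    """Merge labels whose temporal gap is at most max_gap seconds."""
    merged: List[Label] = []
    for lab in sorted(labels, key=lambda l: l.t1):
        if merged and lab.t1 - merged[-1].t2 <= max_gap:
            last = merged[-1]
            merged[-1] = Label(last.t1, max(last.t2, lab.t2), payload=last.payload)
        else:
            merged.append(lab)
    return merged


# Made-up inputs standing in for a face detector and a transcript aligner.
faces = [Label(0, 10, payload="face"), Label(12, 20, payload="face")]
speech = [Label(5, 15, payload="speech")]

# Query: spans where a face is on screen while someone is speaking,
# with gaps of up to two seconds smoothed over.
events = coalesce(join(faces, speech, overlaps_in_time, intersect_time), max_gap=2.0)
```

The sketch treats each query as a plain function over lists of labels, so new events are built by chaining a small number of generic operators rather than by training a new model.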

Subjects: Databases (cs.DB); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Cite as: arXiv:1910.02993 [cs.DB] (or arXiv:1910.02993v1 [cs.DB] for this version)
