[CV System] NoScope: Optimizing Neural Network Queries over Video at Scale

PDF

Summary

NoScope presents a suite of optimizations that significantly reduce computational cost of DNN queries on fixed-angle cameras. There are mainly two techniques: specialized model and difference detector, followed by a cost-based optimizer that finds best parameters of those two techniques.

Intuition

Fixed-angle cameras give you many opportunities:

A lot of static background. Frame-to-frame movement is small. So you can exploit frame-to-frame difference.
Object of interest usually appear in the same angle (e.g., cars entering a parking lot’s entrance). So you can forgo CNN’s generality of recognizing different angles of a class.
Categories of objects to appear are limited. For example, it’s unlikely an aircraft carrier will appear on CMU campus. So you can forgo CNN’s generality of recognizing many classes.

Optimizations

Specialized Models

Forgoing unnecessary generality

Use a shallow model to mimic a full model’s prediction faithfully but only on the target camera and the target object. In other words, the shallow model can perform poorly on other cameras or other objects. (Who cares?)

Teacher-student training

Use the full model to generate labelled training data for the shallow model.

Threshold and cascade

If the shallow model is not confident enough (positive or negative), pass to the full model. A tunable parameter. Trade-off between end-to-end accuracy and computational cost.

Performance

Reported 340X faster than full model.

Difference Detector

Goal: detect small frame-to-frame difference so that we can reuse the prediction of the previous frame.

Input: reference frame, test frame

Output: whether we can reuse the prediction of the reference frame. If not, pass to the CNN (incl. the specialized model above)

Variants

Difference against a fixed reference frame.
Difference against an earlier frame.

Extension

Difference = mean square error
Blocking/tiling: slice frame into tiles and train a logistic regression to weigh each tile. For example, in a street view, traffic lights change frequently but are irrelevant to detecting cars.
Frame skipping. Trade-off between accuracy and speed.

Cost-base Optimization

There are several knobs regarding the above two optimization techniques:

Specialized model:

The CNN architecture of the shallow model
The confidence threshold to consult the full model

Difference detector:

Firing threshold of difference
Frame skipping interval
Choice of variants and extensions

The optimal configuration (choice of knobs) depends on the specific camera and task.