Summary

Glimpse tackles the systems problems of computer-vision-based mobile applications in an offload setup. It offloads the expensive work of detection and recognition to a powerful server, while the mobile device tracks the object locally to hide detection and recognition latency.

Problems

  1. Computer vision algorithms cannot run in real time (30 fps), even on a powerful server
  2. Network latency in offloading makes detection/recognition results stale
  3. The accuracy of detection/recognition suffers from this staleness
  4. The authors measure the performance of object detection, feature extraction, recognition, and tracking. The mobile client is a Samsung Galaxy Nexus and the server is an Intel i7 3.6 GHz, 4-core CPU. Image resolution is 640x480.

Design Goals

  1. High Trackability
  2. Low Bandwidth Consumption

Technique

  1. Architecture

  2. Active Cache (High Trackability):

    The active cache on the mobile device stores the intermediate frames between the frame offloaded for detection and the current camera frame. A subset of the cached frames is used on the mobile device to track the object of interest and bring the stale detection result up to date.

    The mobile device cannot process every frame in the active cache because tracking on a single frame takes more than 30 ms. Two questions then need to be answered.

      * How many frames to select
      * Which frames to select 
    

    The number of frames to select is determined by the end-to-end processing delay of a frame and the execution time of the tracking algorithm.

    The selected frames should capture as much of the user's movement as possible with little redundancy. The authors cast this as a linear partition problem: partition the n cached frames into l partitions such that the maximum sum of frame differences over all partitions is minimized.
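
A minimal sketch of the linear partition step, using the textbook dynamic program rather than the authors' code. The names `linear_partition` and `select_frames` are hypothetical, `diffs[i]` is assumed to be the frame difference between cached frame i and its predecessor, and picking the last frame of each partition is this sketch's assumption, since the notes do not say which frame represents a partition.

```python
from itertools import accumulate


def linear_partition(diffs, l):
    """Split diffs into contiguous groups so the largest group sum is minimal.

    Uses min(l, len(diffs)) non-empty groups and returns a list of
    (start, end) index pairs with end exclusive.
    """
    if l < 1:
        raise ValueError("l must be at least 1")
    n = len(diffs)
    if n == 0:
        return []
    l = min(l, n)
    prefix = [0] + list(accumulate(diffs))  # prefix[i] = sum(diffs[:i])
    inf = float("inf")
    # cost[j][i]: smallest achievable max group sum for the first i items in j groups
    cost = [[inf] * (n + 1) for _ in range(l + 1)]
    # cut[j][i]: start index of the last group in that best split
    cut = [[0] * (n + 1) for _ in range(l + 1)]
    cost[0][0] = 0
    for j in range(1, l + 1):
        for i in range(j, n + 1):
            for p in range(j - 1, i):
                c = max(cost[j - 1][p], prefix[i] - prefix[p])
                if c < cost[j][i]:
                    cost[j][i] = c
                    cut[j][i] = p
    # Walk the cut table backwards to recover the group boundaries.
    bounds = []
    i = n
    for j in range(l, 0, -1):
        p = cut[j][i]
        bounds.append((p, i))
        i = p
    bounds.reverse()
    return bounds


def select_frames(diffs, l):
    """Assumption: represent each partition by the index of its last frame."""
    return [end - 1 for _, end in linear_partition(diffs, l)]
```

For example, `linear_partition([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)` returns `[(0, 5), (5, 7), (7, 9)]`, whose group sums are 15, 13, and 17. The triple loop costs O(l·n²), which is small for a cache holding a few dozen frames.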

  3. Trigger Frame (Low Bandwidth Consumption):

    Instead of sending every frame for detection, Glimpse only sends trigger frames satisfying one of the following conditions:

    • Tracking Failure
    • Significant Change in the Scene
    • Number of Frames in Flight (set to 1 in implementation)
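
The trigger decision can be sketched as a small policy object. This is an illustrative reading, not the authors' implementation: it treats tracking failure and scene change as reasons to send, and treats the frames-in-flight item as a cap on outstanding requests (that interpretation is an assumption). The class name `TriggerPolicy` and the `scene_change_threshold` knob are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TriggerPolicy:
    scene_change_threshold: float  # hypothetical tuning knob, not from the notes
    max_in_flight: int = 1         # the notes report 1 in the implementation
    in_flight: int = 0             # frames sent but not yet answered

    def should_send(self, tracking_failed: bool, frame_diff: float) -> bool:
        """Return True if the current frame should be offloaded as a trigger frame."""
        wants_detection = tracking_failed or frame_diff > self.scene_change_threshold
        # Assumed reading: never exceed the cap on outstanding frames.
        return wants_detection and self.in_flight < self.max_in_flight

    def on_sent(self) -> None:
        """Record that a trigger frame was sent to the server."""
        self.in_flight += 1

    def on_response(self) -> None:
        """Record that the server answered one outstanding frame."""
        self.in_flight = max(0, self.in_flight - 1)
```

With the cap at 1, a second trigger frame is held back until the server answers the first, which bounds the upload bandwidth regardless of how often the scene changes.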

Results

(Server Only: the client outputs stale recognition/detection results directly)

Lessons Learned

  1. Implementation Method:

    The authors used native code on Android to speed up execution. Their in-house frame difference function takes only 30 ms using C++ and JNI, compared to 220 ms using Java.

  2. Evaluation Setup:

    For comparative and reproducible experiments, the authors built an emulation framework. All experiments are run with a Linux machine as the Glimpse client and a Windows machine as the Glimpse server. The Mahimahi tool is used to emulate different types of networks. Mahimahi takes network measurement traces and delays packet delivery based on the per-packet delay in the trace. Previously measured Wi-Fi and LTE delays are used. Compute delays are taken from micro-benchmarks run on smartphones and Google Glass.
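
The idea of trace-driven delay can be illustrated with a toy model. This is not Mahimahi's interface or implementation. The function name `delivery_times` is hypothetical, and reusing the trace from the start when it runs out is this sketch's choice.

```python
from itertools import cycle


def delivery_times(send_times_ms, trace_delays_ms):
    """Delay each packet by the next per-packet delay read from the trace.

    The trace is reused from the start when there are more packets than
    trace entries (a choice made for this sketch).
    """
    if not trace_delays_ms:
        raise ValueError("trace must contain at least one delay")
    delays = cycle(trace_delays_ms)
    return [sent + next(delays) for sent in send_times_ms]
```

For example, `delivery_times([0, 10, 20], [30, 50])` returns `[30, 60, 50]`. In this toy model a packet can overtake an earlier one, because each packet is delayed independently.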

  3. Paper Writing:

    For each design choice, the authors clearly state the alternatives and compare against them. This comparison and justification make the design choices easy to analyze.