Abstract:
Object recognition and tracking in videos is a field of research with high potential and utility, with important applications in physical security systems, including intrusion detection. The application of Machine Learning to this field is relatively recent, and many algorithms have been proposed for it. This paper proposes a methodology to detect and track a specific class of object in a given video using Convolutional Neural Networks (CNNs) implemented in TensorFlow. The CNNs used for this project are the TensorFlow SSD Model and the TensorFlow Inception Model. Airplane detection was used as the test case in this project, although the concept extends to any class of object required by the application. For the SSD Model, images of planes were downloaded and annotated with bounding boxes to identify regions of interest. The images were split into training and test sets, after which TensorFlow-specific records were generated for each set. For the Inception Model, the last layer of the neural network was retrained on multiple images of planes and random images, yielding a binary classifier that distinguishes Plane from No-Plane. The Inception Model was tested and compared with the SSD Model on multiple criteria. The TensorFlow SSD Model generated crisp bounding boxes, and its numbers of false positives and false negatives were very low. The TensorFlow Inception Model was more accurate than the TensorFlow SSD Model in terms of false positives and false negatives. However, it does not produce bounding boxes, since it is not designed to find the region of interest. Both are good offline models for recognizing a specific class of object. The GPU outperformed the CPU by a wide margin in both training and testing.
The TensorFlow Inception Model is suitable for extracting the frames in which a specific object is present when the position of the object is not important. The SSD Model is suitable when the specific object must be detected together with its position in a video frame.
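
As a rough illustration of this distinction, the following Python sketch contrasts the two usage patterns. It is not the implementation used in this work: `classify` and `detect` are hypothetical stand-ins for the trained Inception classifier and SSD detector, and the function names, the 0.5 threshold, and the box layout are assumptions made only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed box layout: (ymin, xmin, ymax, xmax) in normalized coordinates.
Box = Tuple[float, float, float, float]


def frames_with_object(
    frames: Sequence[object],
    classify: Callable[[object], float],
    threshold: float = 0.5,
) -> List[int]:
    """Classifier-style use: keep the indices of frames whose score for the
    target class meets the threshold. No position information is returned."""
    return [i for i, frame in enumerate(frames) if classify(frame) >= threshold]


def boxes_per_frame(
    frames: Sequence[object],
    detect: Callable[[object], List[Tuple[Box, float]]],
    threshold: float = 0.5,
) -> Dict[int, List[Box]]:
    """Detector-style use: for each frame, keep the bounding boxes whose
    detection score meets the threshold. Frames with no boxes are omitted."""
    results: Dict[int, List[Box]] = {}
    for i, frame in enumerate(frames):
        kept = [box for box, score in detect(frame) if score >= threshold]
        if kept:
            results[i] = kept
    return results
```

In this sketch the first function answers only whether the object appears in a frame, while the second also reports where it appears, which mirrors the trade-off between the two models described above.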