Abstract:
A multi-view surveillance system captures scene details from different perspectives, each defined by a camera placement. The recorded data is used for feature extraction, which can be further utilized for various pattern-based analytic processes such as object detection, event identification, and object tracking. In this work, we present a method for building a network with an optimal number of video cameras so that the overlapping area under surveillance is maximized. The focus is on developing algorithms that decide efficient placement of multiple cameras at junctions and intersections and generate a video summary from the multiple views. Deep learning models such as YOLO are used for object detection, producing a large number of bounding boxes, and an associated search technique ranks the views of the multiple cameras. Based on view quality, the dominant views are located, and keyframes are then selected from these views based on maximum frame coverage. A video summary is generated from these keyframes. Thus, the video summary is obtained by solving a multi-objective optimization problem in which keyframe importance is evaluated using a maximum-frame-coverage criterion.
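As a rough illustration of the keyframe-selection step described above, the sketch below assumes that per-frame object detections (e.g., YOLO bounding boxes with associated object identities) have already been collected and greedily picks keyframes that maximize coverage of the detected objects. The data structures, the object identifiers, and the greedy strategy are illustrative assumptions for a minimal sketch, not the exact multi-objective formulation used in the paper.

```python
# Minimal sketch of greedy keyframe selection by maximum frame coverage.
# Assumption: each frame is summarized by the set of object IDs (e.g., from
# YOLO detections plus tracking) visible in it; the paper's view ranking and
# multi-objective optimization are not reproduced here.

from typing import Dict, List, Set


def select_keyframes(frame_objects: Dict[int, Set[str]], budget: int) -> List[int]:
    """Greedily pick up to `budget` frames whose detected-object sets
    jointly cover as many distinct objects as possible."""
    covered: Set[str] = set()
    keyframes: List[int] = []
    candidates = dict(frame_objects)

    for _ in range(budget):
        # Choose the frame that adds the most not-yet-covered objects.
        best_frame, best_gain = None, 0
        for frame_id, objects in candidates.items():
            gain = len(objects - covered)
            if gain > best_gain:
                best_frame, best_gain = frame_id, gain
        if best_frame is None:  # nothing new left to cover
            break
        keyframes.append(best_frame)
        covered |= candidates.pop(best_frame)

    return keyframes


if __name__ == "__main__":
    # Hypothetical detections: frame index -> object IDs seen in that frame.
    detections = {
        0: {"car_1", "person_2"},
        5: {"car_1"},
        10: {"person_2", "bus_3"},
        15: {"bus_3", "car_4", "person_5"},
    }
    print(select_keyframes(detections, budget=2))  # e.g. [15, 0]
```

In this sketch the greedy rule is a standard surrogate for maximum-coverage selection; the actual system additionally weighs view quality and other objectives when ranking keyframes.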