This paper studies data fusion in traditional, spatial, and aerial video stream applications, addressing the processing of data from multiple sources using co-occurrence information and a common semantic metric. Using co-occurrence information to infer semantic relations between measurements avoids the need for external information such as labels. Many current Vector Space Models (VSMs) do not preserve co-occurrence information, which leads to a less useful similarity metric. We propose a proximity matrix embedding, as part of the learned metric embedding, whose entries capture the relations between the co-occurrence frequencies observed in the input sets. First, we show an implicit spatial sensor proximity matrix calculation using Jaccard similarity for an array of sensor measurements and compare it with state-of-the-art kernel PCA learning from a feature-space proximity representation, which relates to a k-radius ball of nearest neighbors. Finally, we extend the class co-occurrence boosting of our unsupervised model using pre-trained multi-modal reuse.
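As a rough illustration of the Jaccard-based proximity matrix mentioned in the abstract above, the following sketch computes pairwise Jaccard similarity between sensors. It assumes each sensor is represented by the set of time steps at which it registered a detection; that representation, the function names, and the sample data are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; returns 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def proximity_matrix(activations: list) -> list:
    """Symmetric matrix of pairwise Jaccard similarities between sensors.

    The diagonal is set to 1.0 by convention (each sensor compared with itself).
    """
    n = len(activations)
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        similarity = jaccard(activations[i], activations[j])
        matrix[i][j] = matrix[j][i] = similarity
    return matrix


# Hypothetical data: time steps at which each of three sensors fired.
sensor_events = [{0, 1, 2, 5}, {1, 2, 5, 7}, {8, 9}]
P = proximity_matrix(sensor_events)
```

Sensors that tend to fire at the same time steps receive entries close to 1, while sensors that never co-occur receive 0, so no labels are needed to build the matrix.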
Traditional event detection from video frames is based on batch or offline algorithms: it is assumed that a single event is present within each video, and videos are typically processed by a pre-processing algorithm that requires enormous amounts of computation and CPU time. While this can be suitable for tasks with distinct training and testing phases where time is not critical, it is unacceptable for real-world applications that require prompt, real-time event interpretation. Given the recent success of using multiple models for learning features, as in generative adversarial networks (GANs), we propose a two-model approach for real-time detection. The approach is similar to a GAN, which learns a generative model of the dataset and further optimizes it using a discriminator that learns per-sample differences between generated images. The proposed architecture uses a model pre-trained on a large dataset to boost weakly labeled instances, in parallel with deep layers for small aerial targets, at a fraction of the computation time for training and detection and with high accuracy. We emphasize previous work on unsupervised learning because of the overhead of training with labeled data in the sensor domain.
In this paper, we discuss some of the challenges of computing mosaics from practical aerial surveillance video, and how these challenges can be overcome. One particular challenge is "burned-in" metadata, which occurs when metadata from the sensor and aircraft are burned into the actual pixel data. Another obstacle is the presence of "black borders" that commonly appear on the edges of video frames, which may vary in size and location from system to system. The paper demonstrates methods of robustly aligning frames and compositing them so that the limitations just mentioned do not affect the final mosaic quality too severely.
Significant progress toward the development of a video annotation capability is presented in this paper. Research and development of an object tracking algorithm applicable to UAV video is described. Object tracking is necessary for attaching the annotations to the objects of interest. A methodology and format are defined for encoding video annotations using the SMPTE Key-Length-Value encoding standard. This provides the following benefits: non-destructive annotation, compliance with existing standards, video playback in systems that are not annotation enabled, and support for a real-time implementation. A model real-time video annotation system is also presented, at a high level, using the MPEG-2 Transport Stream as the transmission medium. This work was accomplished to meet the Department of Defense’s (DoD’s) need for a video annotation capability. The current practice for creating annotated products is to capture a still image frame, annotate it using an Electronic Light Table application, and then pass the annotated image on as a product. That is not adequate for reporting or downstream cueing: it is too slow, and there is a severe loss of information. This paper describes a capability for annotating directly on the video.
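To make the Key-Length-Value idea in the abstract above concrete, here is a minimal sketch of packing and unpacking one KLV triplet with a 16-byte key and a BER-encoded length. The key below is a placeholder (only its first four bytes follow the usual SMPTE Universal Label prefix), and the text payload is invented for illustration; neither reflects the annotation format the paper actually defines.

```python
def ber_length(n: int) -> bytes:
    """Encode a length using BER: short form below 128, long form otherwise."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 128:
        return bytes([n])
    size = (n.bit_length() + 7) // 8  # number of bytes needed to hold n
    return bytes([0x80 | size]) + n.to_bytes(size, "big")


def encode_klv(key: bytes, value: bytes) -> bytes:
    """Concatenate a 16-byte key, the BER length of the value, and the value."""
    if len(key) != 16:
        raise ValueError("this sketch assumes a 16-byte key")
    return key + ber_length(len(value)) + value


def decode_klv(packet: bytes) -> tuple:
    """Split a single KLV triplet back into (key, value)."""
    key = packet[:16]
    first = packet[16]
    if first < 128:  # short form: the byte is the length itself
        length, offset = first, 17
    else:  # long form: low 7 bits give the number of length bytes
        size = first & 0x7F
        length = int.from_bytes(packet[17:17 + size], "big")
        offset = 17 + size
    return key, packet[offset:offset + length]


# Placeholder key and hypothetical annotation payload (not a registered label).
PLACEHOLDER_KEY = bytes.fromhex("060e2b34") + bytes(12)
annotation = "label=vehicle;x=120;y=80".encode("utf-8")
packet = encode_klv(PLACEHOLDER_KEY, annotation)
decoded_key, decoded_value = decode_klv(packet)
```

Because the annotation travels as a separate metadata triplet rather than as altered pixels, a player that does not recognize the key can skip the packet and still display the video, which is the non-destructive property the abstract highlights.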
KEYWORDS: Video, Image segmentation, Video surveillance, Computer programming, Video compression, Surveillance, Image compression, Detection and tracking algorithms, Video processing, Sensors
Temporal segmentation of video in the compressed domain is becoming increasingly popular due to its computational advantages over video decompression followed by pixel-domain segmentation. This paper discusses the advantages of compressed-domain processing and proposes a computationally efficient method of detecting scene changes without reconstructing the video. The requirements of the target application allow the algorithm to avoid complicated processing that searches for unnatural scene changes such as dissolves, fades, and wipes, which are common studio effects. The paper provides experimental results to demonstrate operation of the algorithm on real data.
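The abstract above does not state which compressed-domain features its method uses. Purely as an illustration of the general idea, the sketch below flags abrupt cuts from one commonly used cue: the fraction of intra-coded macroblocks in each predicted frame, which tends to spike when motion prediction fails at a scene boundary. It assumes those fractions have already been read from the bitstream by a parser not shown here; the function name, threshold, and minimum gap are hypothetical choices, not values from the paper.

```python
def detect_cuts(intra_fraction, threshold=0.6, min_gap=5):
    """Return frame indices whose intra-coded fraction suggests an abrupt cut.

    intra_fraction: per-frame fraction (0..1) of intra-coded macroblocks.
    threshold:      fraction at or above which a frame is treated as a cut.
    min_gap:        minimum number of frames between two reported cuts.
    """
    cuts = []
    last_cut = -min_gap  # allows a cut to be reported at the first frame
    for index, fraction in enumerate(intra_fraction):
        if fraction >= threshold and index - last_cut >= min_gap:
            cuts.append(index)
            last_cut = index
    return cuts


# Hypothetical per-frame statistics with spikes at frames 3, 6, and 10.
fractions = [0.02, 0.03, 0.05, 0.85, 0.04, 0.03, 0.90, 0.02, 0.01, 0.02, 0.75]
cuts = detect_cuts(fractions)
```

No pixels are decoded at any point, which is where the computational saving comes from. A simple threshold like this only targets abrupt cuts and makes no attempt to find gradual transitions such as dissolves, fades, or wipes.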