This paper investigates Distributed Compressed Video Sensing (DCVS) using Convolutional Sparse Coding (CSC). DCVS is a coding scheme that combines compressed video sensing with distributed video coding. While many CSC approaches to DCVS use a random matrix as the measurement matrix, our method employs the Fourier matrix to reduce computational load and memory consumption. In addition, this work formulates DCVS in a convolutional sparse coding manner, which is more robust to object shift than block-wise processing. The convolutional filters are also designed with an L1 fidelity term to improve reconstruction fidelity. Experiments show that the proposed method outperforms the conventional method in SSIM and PSNR while reducing execution time and memory usage on both the encoder and decoder sides.
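The computational advantage of a Fourier measurement matrix over a random one can be sketched as follows: a dense random matrix costs O(mn) storage and multiplication, whereas partial Fourier measurements need no explicit matrix and cost O(n log n) via the FFT. This is an illustrative sketch under assumed toy dimensions, not the paper's encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 256                     # signal length (e.g., a vectorized frame block)
m = 64                      # number of measurements (m << n)
x = rng.standard_normal(n)  # stand-in for a video frame signal

# Dense random measurement matrix: O(m*n) storage, O(m*n) per product.
Phi_rand = rng.standard_normal((m, n))
y_rand = Phi_rand @ x

# Partial Fourier measurement: keep m randomly chosen rows of the DFT.
# No explicit matrix is stored; the product costs O(n log n) via the FFT.
rows = rng.choice(n, size=m, replace=False)
y_fourier = np.fft.fft(x)[rows]

# Explicit partial Fourier matrix, built only to verify equivalence.
F = np.fft.fft(np.eye(n))
assert np.allclose(F[rows] @ x, y_fourier)
```

The same m measurements are obtained either way; the FFT-based route simply avoids materializing and multiplying the m-by-n matrix.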
Graph theory is the principal framework for handling data with complex relationships, such as social networks, protein structures, and web page links. In graph theory, an adjacency matrix represents a finite graph as a square matrix, and a graph embedding vector is a low-dimensional vector that captures the features of a graph. However, adjacency matrices are sensitive to changes in vertex ordering. In this study, we propose a neural network model based on the graph isomorphism problem to generate new graph embedding vectors. Experimental results show that the resulting embedding vectors are robust to vertex reordering.
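The vertex-ordering sensitivity described above can be demonstrated in a few lines: relabeling the vertices of a graph changes its adjacency matrix even though the graph itself is unchanged, while permutation-invariant summaries such as the eigenvalue spectrum agree. This toy example is illustrative only and is not the paper's embedding model.

```python
import numpy as np

# Adjacency matrix of a 4-vertex path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

# Relabel the vertices: the graph is isomorphic to the original,
# but the raw matrix entries move.
perm = [2, 0, 3, 1]
P = np.eye(4, dtype=int)[perm]   # permutation matrix
A_perm = P @ A @ P.T

assert not np.array_equal(A, A_perm)   # raw adjacency matrices differ

# A permutation-invariant quantity, the eigenvalue spectrum, is identical.
assert np.allclose(np.sort(np.linalg.eigvalsh(A)),
                   np.sort(np.linalg.eigvalsh(A_perm)))
```

A reordering-robust embedding is one whose output, like the spectrum here, depends only on the graph's structure and not on its vertex labels.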
The Vision Transformer (ViT) is a Transformer-based neural network architecture for image processing that has achieved state-of-the-art performance on various computer vision tasks. This study improves the input layer of ViT by changing how positional information is embedded. We propose ViT with pre-positional embedding, which adds constants to each pixel before dividing input images into patches. This method assumes the following image statistics: vertical asymmetry, horizontal symmetry, and similar features distributed concentrically from the center of the image. Experimental results demonstrate that the proposed method achieves the same image recognition accuracy as the conventional method with positional embedding while reducing the number of training parameters.
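The input-layer change can be sketched as follows: per-pixel constants are added before patchifying, rather than adding a positional embedding to each patch token afterwards. The concentric-distance form of the constants below is an illustrative assumption matching the stated image statistics, not the paper's learned values.

```python
import numpy as np

H = W = 8        # toy image size
patch = 4        # patch side length

img = np.zeros((H, W))   # stand-in for a grayscale input image

# Illustrative positional constants: distance from the image center,
# reflecting the assumption that similar features spread concentrically.
ys, xs = np.mgrid[0:H, 0:W]
cy, cx = (H - 1) / 2, (W - 1) / 2
pos = np.hypot(ys - cy, xs - cx)

# Pre-positional embedding: add the constants per pixel, BEFORE patchifying.
img_embedded = img + pos

# Then split into patches as in a standard ViT input layer.
patches = (img_embedded
           .reshape(H // patch, patch, W // patch, patch)
           .transpose(0, 2, 1, 3)
           .reshape(-1, patch * patch))
assert patches.shape == (4, 16)   # 4 patches of 4x4 pixels each
```

Because one constant map is shared across all pixels instead of one embedding vector per patch token, fewer positional parameters are needed.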
This paper aims to improve image recognition accuracy with Convolutional Neural Networks (CNNs). CNNs are among the state-of-the-art image recognition frameworks and typically use the Rectified Linear Unit (ReLU) as the activation function. However, ReLU rectifies negative values to zero, discarding their information. This paper applies the Sign-to-Position (S/P) format conversion after the convolutional layers to eliminate this rectification loss. Experimental results show that the proposed method improves recognition accuracy on the MNIST and Fashion-MNIST datasets by 0.50% and 1.30%, respectively, compared with a conventional CNN. The S/P format conversion also benefits negative-image recognition, yielding 12.58% and 3.66% higher accuracy on the two datasets.
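The S/P conversion encodes the sign of each value by its position, so negative responses survive instead of being zeroed by ReLU. A minimal sketch, assuming the common split into positive-part and negative-magnitude channels:

```python
import numpy as np

def sign_to_position(x):
    """Sign-to-Position (S/P) conversion: the sign of each value is
    encoded by position, doubling the channel count. Unlike ReLU,
    which zeroes negatives, both halves keep nonnegative magnitudes."""
    return np.concatenate([np.maximum(x, 0),    # positive part
                           np.maximum(-x, 0)],  # magnitude of negative part
                          axis=-1)

x = np.array([1.5, -2.0, 0.0, 3.0])
sp = sign_to_position(x)    # the -2.0 survives as 2.0 in the second half
relu = np.maximum(x, 0)     # the -2.0 is lost entirely
```

The conversion is lossless (x can be recovered as the difference of the two halves), which is what eliminates the rectification loss the abstract refers to.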
Motivated by the Saak (Subspace approximation with augmented kernels) transform, we propose an image recognition scheme using multi-layer sparse feature extraction with the convex solver ADMM (Alternating Direction Method of Multipliers). The Saak transform consists of multi-layer PCA (Principal Component Analysis) and S/P (Sign-to-Position) conversion to avoid sign confusion. This paper adopts sparse representation instead of PCA and also compares the S/P conversion with the ReLU (Rectified Linear Unit) activation function, which is realized by incorporating the projection onto the non-negative set into the convex formulation. The Saak transform uses PCA not only for feature extraction but also for dimension compression of the feature vectors; we expect that our method does not need this compression, since sparse representation compresses features more than PCA. Experimental results on the MNIST and Fashion-MNIST datasets show that the proposed method matches the Saak transform in recognition accuracy, and that our method produces sparser features with high local discriminant power.
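The idea of realizing ReLU inside a convex formulation can be sketched with proximal operators: sparse coding uses L1 soft-thresholding, and composing it with projection onto the non-negative set yields a ReLU-like nonnegative sparse code. This is an illustrative sketch of the two operators, not the paper's full multi-layer ADMM solver.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrinks toward zero,
    promoting sparsity in the code."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0)

def nonneg_soft_threshold(v, t):
    """Soft-thresholding composed with projection onto the
    non-negative set -- the convex counterpart of applying a
    ReLU-style constraint to the sparse codes."""
    return np.maximum(v - t, 0)

v = np.array([2.0, -1.5, 0.3, -0.1])
assert np.allclose(soft_threshold(v, 0.5), [1.5, -1.0, 0.0, 0.0])
assert np.allclose(nonneg_soft_threshold(v, 0.5), [1.5, 0.0, 0.0, 0.0])
```

Inside an ADMM iteration, either operator serves as the code-update step; the second additionally enforces nonnegativity, mirroring the ReLU comparison described above.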