Recently, convolutional neural networks (CNNs) have been widely used in object detection and image recognition because of their effectiveness. Many highly accurate CNN-based classification models have been developed for various machine learning applications, but they are generally computationally costly and require a hardware platform with substantial computing power and memory to run. To perform object detection tasks accurately and efficiently with a CNN on a resource-limited system such as a mobile device, we propose a novel DenseNet variant: a lightweight convolutional neural network called Lite Asymmetric DenseNet (LA-DenseNet). To reduce model complexity, we replace the 7×7 convolution and 3×3 max-pooling in the initial down-sampling stage with multiple 3×3 convolutions and a 2×2 max-pooling, which significantly lowers the computational cost. In the design of the dense blocks, channel splitting and channel shuffling are employed to enhance the exchange of information between feature maps and improve the expressive power of the network. We decompose each 3×3 convolution in the dense block into a combination of 3×1 and 1×3 convolutions; these asymmetric convolutions speed up computation and extract richer spatial features. To evaluate the proposed approach, we build an experimental system in which LA-DenseNet extracts features and the Single Shot MultiBox Detector (SSD) detects objects. Trained and tested on VOC2007+12, our model achieves detection accuracy comparable to YOLOv2 at a fraction of its computational cost and memory usage.
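The 3×3 → 3×1 + 1×3 decomposition can be illustrated with a minimal numpy sketch (not the paper's implementation): a 3×1 convolution followed by a 1×3 convolution is equivalent to a single 3×3 convolution whose kernel is the outer product of the two 1-D kernels, while using 6 weights instead of 9. The kernel values and `conv2d_valid` helper below are illustrative assumptions, not from the paper.

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 2-D cross-correlation with 'valid' padding (no stride, no dilation)."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Hypothetical 1-D kernels: the equivalent 3x3 kernel is their outer product.
v = np.array([1.0, 2.0, 1.0])    # vertical (3x1) part
h = np.array([1.0, 0.0, -1.0])   # horizontal (1x3) part
k33 = np.outer(v, h)             # rank-1 3x3 kernel

x = np.random.default_rng(0).standard_normal((8, 8))

# Applying 3x1 then 1x3 matches the single 3x3 convolution on the same input.
y_two_step = conv2d_valid(conv2d_valid(x, v.reshape(3, 1)), h.reshape(1, 3))
y_one_step = conv2d_valid(x, k33)
assert np.allclose(y_two_step, y_one_step)

# Parameter count: 9 weights for 3x3 vs. 3 + 3 = 6 for the asymmetric pair.
```

Note that the equivalence is exact only for rank-1 (separable) kernels; in LA-DenseNet the two asymmetric convolutions are learned layers, so the decomposition trades a small loss of generality for fewer parameters and multiplications.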
Developments in generative adversarial networks (GANs) have made it possible to fill missing regions of damaged images with convincing detail. However, many existing approaches fail to keep the inpainted content and structure consistent with the surroundings. In this paper, we propose a GAN-based inpainting model that restores semantically damaged images in a visually reasonable and coherent way. In our model, the generative network is an autoencoder and the discriminator network is a CNN classifier. Unlike a classic autoencoder, ours has a novel bottleneck layer in the middle composed of four dense-net blocks, each containing vanilla convolution layers and dilated convolution layers. The kernels of the dilated convolutions are spread out, effectively enlarging the receptive field, so the model can capture broader semantic information and keep the inpainted images consistent. Furthermore, the reuse of features from different levels within each dense-net block helps the model understand the whole image and produce a convincing result. We evaluate our model on the public CelebA and Stanford Cars datasets with randomly positioned masks of different ratios. Qualitative and quantitative experiments verify the effectiveness of our model.
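The receptive-field enlargement from dilated convolution can be sketched in plain numpy (an illustration under assumed details, not the paper's code): with dilation rate d, a 3×3 kernel samples the input on a grid spread over a (2d+1)×(2d+1) window while still using only 9 weights.

```python
import numpy as np

def dilated_conv2d_valid(x, k, d=1):
    """2-D cross-correlation with dilation rate d and 'valid' padding."""
    kh, kw = k.shape
    eff_h = d * (kh - 1) + 1   # effective (receptive-field) height of the kernel
    eff_w = d * (kw - 1) + 1
    H, W = x.shape
    out = np.zeros((H - eff_h + 1, W - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sample the input on a dilated grid: every d-th row and column.
            patch = x[i:i + eff_h:d, j:j + eff_w:d]
            out[i, j] = np.sum(patch * k)
    return out

k = np.ones((3, 3))                            # 9 weights in both cases
x = np.arange(49, dtype=float).reshape(7, 7)

y1 = dilated_conv2d_valid(x, k, d=1)           # receptive field 3x3
y2 = dilated_conv2d_valid(x, k, d=2)           # receptive field 5x5
assert y1.shape == (5, 5)
assert y2.shape == (3, 3)
```

This is why stacking dilated convolutions in the bottleneck lets the generator aggregate context from distant surroundings at no extra parameter cost, which is what keeps the inpainted region coherent with the rest of the image.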