To address the difficulty of segmenting lung CT images, where complex backgrounds, varied lesion shapes, and blurred tissue boundaries hinder accurate delineation, an encoder-decoder model named ARIFNet is proposed. First, depthwise separable convolutions, which require far fewer parameters than standard convolutions, are used in the decoder to reduce model complexity and improve segmentation efficiency. Second, a reweighted skip-connection module (RWC) is constructed on the skip connections; it adjusts pixel weights according to the spatial locations of features, suppressing irrelevant information in the high-level features. Finally, a multi-scale information fusion module (MIF) is added to fuse features at multiple scales and provide richer contextual information. Experiments show that ARIFNet segments lung lesion regions more accurately, reaching 89.75% IoU, 94.76% mIoU, 94.09% Dice, and 96.99% mDice.
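As a rough illustration of the decoder's core building block, the sketch below shows a depthwise separable convolution in PyTorch and compares its parameter count with a standard convolution. The layer layout (3x3 depthwise kernel, BatchNorm, ReLU) and the channel sizes are illustrative assumptions; the paper's exact decoder configuration is not given here.

```python
# Minimal sketch of a depthwise separable convolution block of the kind the
# ARIFNet decoder is described as using. Kernel size, normalization, and
# activation are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise step: one filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2,
                                   groups=in_ch, bias=False)
        # Pointwise step: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Parameter comparison against a standard 3x3 convolution (256 -> 256 channels):
standard = nn.Conv2d(256, 256, 3, padding=1, bias=False)
separable = DepthwiseSeparableConv(256, 256)
print(sum(p.numel() for p in standard.parameters()))   # 589824
print(sum(p.numel() for p in separable.parameters()))  # 68352 (incl. BN affine)
```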
When images are defaced by heavy noise and contain complex features, targets are difficult to segment accurately, and traditional CNN segmentation methods struggle to fully extract detail information. To address this, a segmentation network based on an improved MaskFormer is proposed. A lightweight mask attention mechanism replaces the attention mechanism of the Transformer decoder: cross-attention is constrained to the foreground region of each predicted mask, which reduces interference from defaced regions and strengthens the extraction of local features. GAMS, a multi-scale feature fusion module based on a gating mechanism, captures the semantic information of the image at different scales and improves the model's feature discrimination. Finally, swapping the order of self-attention and the lightweight mask attention reduces network computation and improves training efficiency. In segmentation experiments on the ADE20K and COCO-Stuff-10k datasets after noise defacement, the improved MaskFormer obtains the best values of the MIoU, ssMIoU, and msMIoU metrics, outperforming the other networks compared.
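The idea of constraining cross-attention to each query's predicted foreground can be sketched as follows. This follows the standard masked-attention formulation popularized by Mask2Former; the paper's lightweight variant may differ, and the tensor shapes, the sigmoid threshold of 0.5, and the NaN guard are illustrative assumptions.

```python
# Sketch of masked cross-attention: each object query attends only to pixels
# inside its currently predicted foreground mask, suppressing background
# (e.g., noise-defaced) regions before the softmax.
import torch

def masked_cross_attention(queries, keys, values, pred_masks, threshold=0.5):
    """
    queries:     (B, Q, C)  object queries
    keys/values: (B, N, C)  flattened image features, N = H * W
    pred_masks:  (B, Q, N)  per-query mask logits from the previous layer
    """
    scale = queries.shape[-1] ** -0.5
    attn = torch.einsum('bqc,bnc->bqn', queries, keys) * scale
    # Constrain attention to each mask's predicted foreground;
    # positions below the threshold are excluded from the softmax.
    attn = attn.masked_fill(pred_masks.sigmoid() < threshold, float('-inf'))
    attn = attn.softmax(dim=-1)
    # Guard against queries with an empty mask (all -inf rows become NaN).
    attn = torch.nan_to_num(attn)
    return torch.einsum('bqn,bnc->bqc', attn, values)
```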
An X-ray security image detection model incorporating a multi-scale fusion module is proposed to address the low accuracy of detecting threat objects against complex backgrounds in X-ray images. The model adds a multi-channel fusion convolution block after the Neck layer to perform adaptive feature fusion and refinement on the input features, which improves the description of the global information and boundary attributes of X-ray threat objects and thereby the precision of threat detection and identification. SIoU replaces CIoU as the bounding-box regression loss; it redefines the penalty terms and reduces the total degrees of freedom of the loss, yielding more accurate localization. On the Tianchi dataset, the model effectively detects five categories of dangerous goods, reaching an mAP of 92.7%, 2.1 percentage points higher than YOLOv5s, and satisfies real-time recognition and detection requirements with high accuracy, good robustness, and speed.
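For reference, a sketch of the SIoU regression loss as published (Gevorgyan, 2022), with its angle, distance, and shape costs, is given below. The corner-coordinate box format, the eps constant, and theta = 4 are common defaults, not values confirmed by this paper.

```python
# Sketch of the SIoU bounding-box loss substituted for CIoU.
# Boxes are (x1, y1, x2, y2); loss = 1 - IoU + (distance + shape) / 2.
import torch

def siou_loss(pred, target, theta=4.0, eps=1e-7):
    # Widths, heights, and centers of predicted and target boxes.
    w1, h1 = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w2, h2 = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    cx1, cy1 = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx2, cy2 = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2

    # IoU term.
    iw = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    ih = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = iw * ih
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # Smallest enclosing box, used to normalize the center distance.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # Angle cost: penalizes center offsets that deviate from the box axes.
    sigma = torch.sqrt((cx2 - cx1) ** 2 + (cy2 - cy1) ** 2) + eps
    sin_alpha = (torch.abs(cy2 - cy1) / sigma).clamp(0, 1 - eps)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - torch.pi / 4) ** 2

    # Distance cost, modulated by the angle cost (gamma = 2 - angle).
    rho_x = ((cx2 - cx1) / (cw + eps)) ** 2
    rho_y = ((cy2 - cy1) / (ch + eps)) ** 2
    gamma = 2 - angle
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # Shape cost: mismatch between predicted and target width/height.
    omega_w = torch.abs(w1 - w2) / (torch.max(w1, w2) + eps)
    omega_h = torch.abs(h1 - h2) / (torch.max(h1, h2) + eps)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta

    return 1 - iou + (dist + shape) / 2
```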
KEYWORDS: Convolution, Optical character recognition, Education and training, Feature extraction, Data modeling, Performance modeling, Overfitting, Matrices, Deep learning, Visual process modeling
Handwritten Chinese character recognition (HCCR) is the foundation of document digitization. It remains a challenging problem in image classification and recognition because of the large number of Chinese character classes, the diversity of writing styles, and the many visually similar characters. To address these problems, this paper designs a four-channel convolution recognition model based on MobileNetV2. First, the input image is fed into four convolution channels with different receptive fields, which extract feature maps at different scales and improve model accuracy. The feature maps are then combined to enrich feature diversity. Next, the combined features are weighted by an SE block, which screens out the more useful feature maps and accelerates model convergence. Finally, the lightweight MobileNetV2 network classifies the weighted features. Experimental results show that the model reaches 96.05% recognition accuracy on the offline handwritten Chinese character set CASIA-HWDB1.1 and converges very quickly, while its memory footprint and parameter count are far lower than those of other handwritten Chinese character recognition models.
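A minimal sketch of the four-branch front end described above follows: parallel convolutions with different receptive fields, channel concatenation, and SE-based channel reweighting ahead of the MobileNetV2 classifier. The kernel sizes (3/5/7/9), branch width, and SE reduction ratio are illustrative assumptions, not values from the paper.

```python
# Sketch of the four-channel multi-scale front end with SE reweighting.
import torch
import torch.nn as nn

class FourChannelFrontEnd(nn.Module):
    def __init__(self, in_ch=1, branch_ch=16, reduction=4):
        super().__init__()
        # Four parallel convolution branches with increasing receptive fields.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (3, 5, 7, 9)
        ])
        fused = 4 * branch_ch
        # SE block: squeeze (global pooling) then excite (per-channel gates).
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        # Concatenate multi-scale feature maps along the channel axis.
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        # Reweight channels so the more useful feature maps are emphasized;
        # the result would then feed the MobileNetV2 classifier.
        return feats * self.se(feats)

# Example: a 64x64 grayscale character image -> weighted multi-scale features.
out = FourChannelFrontEnd()(torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64, 64])
```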