This PDF file contains the front matter associated with SPIE Proceedings Volume 11720, including the Title Page, Copyright information, and Table of Contents.
Because the infrared radiation energy of false alarm sources such as rivers, high-altitude cirrus clouds, and icy lakes is similar to that of infrared targets, false alarm sources seriously affect the accurate detection and tracking of targets by an Infrared Search and Track (IRST) system. To reduce the false alarm rate in infrared target detection, a pixel-level algorithm is proposed in this paper to detect and eliminate false alarm sources. First, we use histogram equalization and gray stretch to enhance the infrared image. The next step is to locate the false alarm source based on the Local Neighborhood Intensity Pattern (LNIP) and a local probability distribution, which is critical in false alarm source detection. Finally, we use region growing based on a fused saliency feature to improve the accuracy of the localization. Extensive experimental results show that the proposed algorithm achieves a detection rate of more than 96% for multiple false alarm sources and performs best on the Precision-Recall (PR) and Receiver Operating Characteristic (ROC) curves among algorithms based on texture analysis.
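As a concrete illustration of the enhancement stage, here is a minimal Python sketch of histogram equalization followed by a percentile-based gray stretch, assuming an 8-bit single-channel IR image; the percentile bounds are illustrative choices, not the paper's settings.

```python
import cv2
import numpy as np

def enhance_ir(img_u8, low_pct=1, high_pct=99):
    """Histogram equalization followed by a percentile-based gray stretch
    (expects an 8-bit single-channel infrared image)."""
    eq = cv2.equalizeHist(img_u8)                    # spread the IR histogram
    lo, hi = np.percentile(eq, (low_pct, high_pct))  # clip extremes before stretching
    stretched = np.clip((eq.astype(np.float32) - lo) * 255.0 / max(hi - lo, 1e-6), 0, 255)
    return stretched.astype(np.uint8)
```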
This paper proposes a real-time people detection system based on time-of-flight (TOF) depth cameras, which monitors the flow of people in public places such as subway entrances and shopping mall passages. The proposed system mainly includes preprocessing, contour recognition, neural network recognition, tracking, and counting. It makes full use of top-view depth information, avoids the problems of oblique viewing angles, and reduces the amount of computation. At the same time, compared with the contour template matching algorithm, the accuracy is improved. The algorithm improves computation speed while ensuring accuracy and robustness. Experiments show that the proposed system can run on a CPU platform at a speed of 20 ms per frame. It also achieves high-precision head detection and counting: the accuracy reaches 100% for single- and two-person scenes and 97% for multi-person scenes.
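The head-candidate stage can be pictured with a small OpenCV sketch: in a top-view depth map, heads are the surfaces closest to the camera, so thresholding a height band and keeping compact contours yields candidates. The camera height, height band, and area threshold below are illustrative assumptions, not the system's actual parameters (OpenCV 4 or later is assumed for findContours).

```python
import cv2
import numpy as np

def head_candidates(depth_mm, cam_height_mm=2600, head_band_mm=(1400, 1900)):
    """Threshold a plausible head-height band in a top-view depth map and
    return compact contours as head candidates (all parameters illustrative)."""
    height = cam_height_mm - depth_mm.astype(np.int32)   # height above the floor
    mask = ((height > head_band_mm[0]) & (height < head_band_mm[1])).astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [c for c in contours if cv2.contourArea(c) > 300]  # drop tiny blobs
```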
The task of object detection is to find all objects of interest in an image and determine their categories and locations. The dominant deep learning-based object detection methods usually regard objects as isolated individuals and ignore the relationships between them, which limits the accuracy of the object detection model. Some work attaches relationships between categories to candidate proposal regions and proves that these relationships improve the accuracy of object detection, but such methods all operate on the feature map. In this paper, we propose a correlation complement (CC) module that combines the class representation vector with the relationships between categories in the dataset. Experimental results on multiple object detection datasets prove the effectiveness of our module. In addition, the module is extensible and can be added to other one-stage object detection methods.
By the end of 2019, the total number of motor vehicles in China had reached 340 million, of which 250 million were cars. To reduce road traffic accidents, more and more automobile manufacturers have begun to develop vehicle-mounted image-assisted driving technology based on dashcams. The most important part of a collision avoidance system is its visual recognition and ranging, so it is of great significance to study vision-based ranging systems. This paper builds a 3D driving simulator based on Unreal Engine and uses specific datasets to test and verify various classic vision-based recognition and ranging algorithms. The advantages and disadvantages of each algorithm under different operating conditions are listed and summarized, and the limitations of their application are pointed out. For the detection part, this paper experimentally verifies the SVM+HOG and LBP+Cascade detection algorithms to obtain their recognition accuracy, positioning accuracy, and computation time, and analyzes their advantages and disadvantages. For the ranging part, this paper tests monocular and binocular ranging algorithms, obtains the ranging errors at different positions, and analyzes the advantages and disadvantages of each. Experimental results show that monocular ranging requires a more accurate detection bounding box than binocular ranging, while binocular ranging can measure the distance of the target more quickly and accurately even when the bounding box is very small.
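For reference, binocular ranging reduces to the pinhole stereo relation Z = f * B / d; the sketch below shows it with illustrative numbers (focal length in pixels, baseline in meters), which are not tied to the paper's setup.

```python
def stereo_depth(f_px, baseline_m, disparity_px):
    """Pinhole stereo: depth = focal_length * baseline / disparity. The estimate
    depends on disparity rather than box size, which is why binocular ranging
    degrades more gracefully than monocular ranging on small detection boxes."""
    return f_px * baseline_m / max(disparity_px, 1e-6)

# Illustrative numbers: f = 1000 px, baseline = 0.12 m, disparity = 8 px -> Z = 15 m
print(stereo_depth(1000.0, 0.12, 8.0))
```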
Because most current algorithms use stacked RGB information for spatial-temporal action detection, time sequence information is easily lost during convolution and down-sampling, which makes it difficult to model spatial-temporal actions and limits the development of action detection. Given that current pose estimation algorithms have achieved good detection accuracy, we propose an end-to-end network that fuses RGB with skeleton information to address spatial-temporal action detection. We use RGB to describe the appearance of the object and the skeleton to describe the action. Specifically, in the first part, we generate initial classification and location proposals from RGB information with an SSD network. Second, we generate frame-level skeleton information with an advanced pose estimation algorithm; the skeleton helps the SSD network filter negative samples during training, and we then stack the skeletons after completion and normalization and feed them into an LSTM network for classification. Finally, we fuse the outputs of the SSD and LSTM networks. We believe that introducing skeleton information can effectively address the insufficient capacity of RGB information for spatial-temporal action modeling. It is worth noting that our skeleton information is produced by pose estimation algorithms rather than annotated. For the datasets, we select the single-person action videos in UCF101 and UCF50. The final experimental results show that our method significantly improves the action modeling ability of the neural network and is effective for action detection.
YOLO is a milestone algorithm in object detection and the first one-stage detector of the deep learning era. In spite of its great improvement in detection speed, its detection accuracy is somewhat insufficient, especially for small targets. In this paper, a U-shaped module based on YOLOv4 (U-YOLO) is proposed. First, multi-level features extracted by CSPDarknet are fused using a Feature Pyramid Network (FPN). Then, the fused features are fed into multiple U-shaped modules. Finally, feature maps gathered from the different U-shaped modules are used to construct a feature pyramid for object detection. Experiments show that the U-shaped module improves the accuracy of YOLOv4.
Infrared images from nighttime cattle farms have low contrast, blurred visual effects, and unclear details. We propose a method based on the dark channel prior and piecewise linear stretching to enhance infrared images. The improvement in image quality helps annotate images more accurately when preparing the dataset. The image enhancement results are compared with other methods to evaluate performance. Furthermore, we verify its effect on nighttime cattle detection based on YOLOv4. We obtain appropriate prior anchor boxes for this task by K-means clustering on the cattle image dataset. YOLOv4 cattle detection models are trained on datasets of original and processed images. A total of 1400 cattle images from different scenes were collected from surveillance videos as the experimental dataset. The average precision (AP) of cattle detection exceeds 95%; compared with the control groups, the APs from enhanced images are 0.64% and 0.70% higher. Experimental results show that image enhancement can improve the accuracy of nighttime cattle detection based on YOLOv4.
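The anchor-clustering step can be sketched as plain k-means over ground-truth box sizes; YOLO-style pipelines often use a 1 - IoU distance instead of Euclidean, so treat this as a minimal stand-in rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def anchor_boxes(wh, k=9):
    """Cluster (width, height) pairs of annotated boxes into k prior anchors,
    returned sorted by area from small to large."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]

# wh would be an (N, 2) array of box sizes taken from the cattle annotations.
```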
Aiming at the problem of low accuracy of ship detection in SAR images, we propose an improved detection method based on RetinaNet. The method introduces a channel-wise attention mechanism into the backbone feature extraction network to automatically learn the importance of each feature channel, then enhances useful features according to this importance and suppresses features that contribute little to the detection task. To improve multi-scale detection, the method also introduces an efficient weighted bidirectional feature fusion network, BiFPN, which adjusts the proportion of each feature by learning the importance of features at different scales. In addition, we propose a training method that expands the complex-background samples in the dataset to improve the network's ability to discriminate targets from complex backgrounds. Training and testing on open SAR ship detection datasets show that this method significantly improves precision and recall.
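The channel-wise attention described here follows the squeeze-and-excitation pattern; a minimal PyTorch sketch (the reduction ratio is an illustrative choice) looks like this:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: learn a per-channel
    weight from globally pooled features, then rescale the channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # squeeze, then excite
        return x * w                                       # reweight the channels
```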
In infrared small target detection tasks, targets usually occupy very few pixels and appear as local bright spots, lacking prior knowledge such as shape and speed. In response to these problems, a temporal low-rank and sparse decomposition and spatio-temporal continuity detection algorithm, named TLRSD-STC, is proposed to detect small targets and eliminate false alarms. The proposed algorithm first expands the image sequence in the time domain. A preliminary separation of small targets and background is achieved through low-rank and sparse decomposition, yielding target prediction maps. Subsequently, targets and noise are further separated by an improved pipeline filter to obtain the final detection image. The proposed algorithm is validated on three image sequences containing complex scenes. Experimental results demonstrate that it has a higher detection rate and lower false alarm rate than other algorithms in complex scenes.
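The low-rank + sparse split can be approximated with a simple alternating-thresholding loop over a matrix whose columns are vectorized frames; this is a generic RPCA-style baseline with illustrative parameters, not the paper's TLRSD-STC solver or its pipeline filter.

```python
import numpy as np

def lowrank_sparse_split(D, n_iter=50):
    """Alternate singular-value thresholding (low-rank background L) with
    soft thresholding (sparse component S holding candidate targets).
    D stacks vectorized frames as columns; thresholds are illustrative."""
    lam = 1.0 / np.sqrt(max(D.shape))   # classic RPCA sparsity weight
    mu = 0.25 * np.abs(D).mean()        # illustrative threshold scale
    S = np.zeros_like(D)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(D - S, full_matrices=False)
        L = (U * np.maximum(s - mu, 0.0)) @ Vt                           # low-rank step
        S = np.sign(D - L) * np.maximum(np.abs(D - L) - lam * mu, 0.0)   # sparse step
    return L, S
```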
In this study, an EIST method based on improved electrical impedance tomography (EIT) is proposed to detect the distribution of multiple components in the sensor. Exploiting the fact that different objects have different impedances at different frequencies, each test object is imaged separately, and a clear and accurate image of the distribution of all components is finally combined by the algorithm. A simulation model for testing EIST is created. In the simulation, the synthesized reconstruction image is combined from the distributions of the different objects, each reconstructed at the frequency where its impedance differs most from that of the medium. The simulation shows that EIST is feasible for detecting multiple components in the sensor. Finally, the distributions of a banana and a carrot in water are detected experimentally, and the results show that the method can effectively detect the distribution of different substances.
Research on bleeding in surgical video has become a hot topic in computer-assisted surgery. We go further and directly detect the bleeding point, since locating it can help surgeons stop bleeding quickly. We propose a mixed RCNN model based on Faster RCNN for bleeding point detection in laparoscopic surgery videos, and we make three contributions: (1) we propose a hemostasis support system that can assist surgeons more directly; (2) we exploit the blood's optical flow to improve bleeding point detection; (3) both arterial and venous bleeding can be detected. Experimental results on our laparoscopic surgery video datasets show that our approach performs very well in bleeding point localization and recognition.
In passenger flow statistics, the camera shoots from directly overhead. This paper puts forward a method that calculates the disparity of target regions. First, the left view is used to detect all possible head targets, and then each target and its surrounding pixels are selected as matching points. Subject to several matching constraints, they are matched against the right view, and the disparity of the target and its surrounding pixels is calculated. Taking the target and its surrounding pixels as a whole overcomes the weak texture of the head area and improves both matching accuracy and disparity accuracy. By thresholding the disparity, the number of false heads is greatly reduced and recognition accuracy is improved.
Ship detection is an important development direction in the field of optical remote sensing image processing. In recent years, convolutional neural networks have achieved good results in ship target detection and recognition. In this paper, we train the recent YOLOv5 model on our dataset. The results show that YOLOv5 is well suited to ship detection.
For underwater videos, object tracking becomes more challenging since the footage is strongly affected by blurry backgrounds, illumination conditions, and occlusion. This paper therefore explores an effective approach for underwater multiple object tracking whose main advantage lies in associating objects effectively for online, real-time applications. Notably, detection quality is a key factor in tracking performance, and changing the detector improves the tracking result. The whole model consists of detection, a Kalman filter, and the Hungarian algorithm. The detector applied is YOLOv3, as it has excellent performance in terms of speed and accuracy and can therefore be readily embedded in the model. The model achieves impressive tracking results on an underwater dataset; results are analyzed using Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP).
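The association step pairs tracks with detections by maximizing total IoU with the Hungarian algorithm; a minimal sketch using SciPy (the IoU threshold is an illustrative choice):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(iou_matrix, iou_thresh=0.3):
    """iou_matrix[i, j] holds the IoU between track i's Kalman-predicted box
    and detection j; negate it so the assignment maximizes total IoU, then
    drop pairs whose overlap is too small to trust."""
    rows, cols = linear_sum_assignment(-iou_matrix)
    return [(r, c) for r, c in zip(rows, cols) if iou_matrix[r, c] >= iou_thresh]
```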
Tracking multiple vehicles in aerial video strongly benefits intelligent transportation systems, so a fast and efficient multiple vehicle tracking method is required. However, frequent and long-term occlusions in complex traffic scenes make tracking difficult, and identity switches between vehicles with similar appearance also bring challenges. Currently, tracking-by-detection methods show satisfactory performance. In these methods, data association plays the key role, and it comes in two types: frame-by-frame association and multi-frame association. Frame-by-frame association links detections in two consecutive frames, but tracking drift or failure is likely when vehicles are occluded or missed. Multi-frame association builds a relational model from detections across multiple frames rather than just the previous two, which effectively reduces erroneous associations and handles occlusions; however, tracking is still interrupted if the occlusion lasts too long to associate the detections before and after it. Therefore, online multiple vehicle tracking in aerial video based on fast incremental discriminative appearance learning is put forward. Fast incremental discriminative appearance learning (FIDAL) is introduced to discriminate vehicle appearances and adaptively update the vehicle appearance models based on the difference between a new sample and the mean of the vehicle samples, addressing the problem of identity switches. Experimental results on video sequences from different datasets demonstrate an average 25 percent performance improvement when using FIDAL.
Correlation filter-based trackers exploit large numbers of cyclically shifted samples to train the object classifier and achieve good tracking accuracy and speed. However, in complex scenes with occlusion or deformation, tracking drift or loss can occur. In this paper, a kernel correlation filter tracker based on scale adaptation and occlusion detection is proposed to strengthen tracker robustness. First, a robust appearance model combining gradient and color features is proposed to enhance feature representation ability. Second, a scale-adaptive mechanism is introduced to handle the problem of a fixed template size, and Newton's method is used to find the maximum response value, predicting the center position of the target more accurately and estimating the target scale. Finally, an occlusion detection scheme is adopted during model update to avoid tracking failure caused by appearance model pollution. Experiments on the OTB2013 benchmark dataset show that, compared with the base tracker, we obtain absolute gains of 6.6% and 13.4% in mean distance precision and mean overlap precision, respectively.
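Newton-style refinement of the response peak can be sketched as a separable quadratic fit around the integer maximum; this ignores the cross term and is only a hedged illustration of the idea, not the paper's exact update.

```python
import numpy as np

def refine_peak(resp):
    """One separable Newton step around the integer maximum of a correlation
    response map: fit local 1D quadratics from finite differences and move to
    their vertices (x_new = x - f'/f'')."""
    r, c = np.unravel_index(np.argmax(resp), resp.shape)
    ry, rx = float(r), float(c)
    if 0 < r < resp.shape[0] - 1 and 0 < c < resp.shape[1] - 1:
        dy = (resp[r + 1, c] - resp[r - 1, c]) / 2.0
        dx = (resp[r, c + 1] - resp[r, c - 1]) / 2.0
        dyy = resp[r + 1, c] - 2.0 * resp[r, c] + resp[r - 1, c]
        dxx = resp[r, c + 1] - 2.0 * resp[r, c] + resp[r, c - 1]
        if dyy < 0:          # only step if curvature indicates a maximum
            ry -= dy / dyy
        if dxx < 0:
            rx -= dx / dxx
    return ry, rx
```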
In long-term visual object tracking, the tracking model is prone to drift or corruption, and the tracker can hardly recapture the target after a tracking failure. A set of novel strategies for long-term tracking is proposed to solve these problems. First, a simple and efficient method is proposed to calculate the tracking confidence of Staple, a well-known tracker based on correlation filters. A model update mechanism is then developed to prevent model corruption. Furthermore, an online Support Vector Machine (SVM) classifier is trained to re-detect the object when the tracking result is unreliable. By means of intermittent sampling in the re-detection stage, the computational efficiency and re-detection reliability are greatly improved. The combination of these new components across multiple stages yields a real-time, accurate, and robust tracker for long-term video. Experimental results demonstrate that our tracker, operating at 30 FPS, outperforms several competitive trackers in robustness and accuracy, especially when the target undergoes occlusion, severe deformation, or moves out of view.
Visual object tracking is one of the popular research topics in computer vision, with a wide range of application scenarios. Although recent approaches based on Siamese networks have achieved good performance, interference from similar objects and non-real-time speed remain very challenging problems. In this paper, an online Patch Filter Network (OPFNet) is proposed; the online patch filters learned from the target introduce local detailed features and avoid interference from similar objects. In addition, to enhance the generalization ability of a tracker trained with a small-scale dataset, an image mix-up augmentation method is applied during offline training. Experiments prove these improvements effective, and they can be applied to existing Siamese tracking methods.
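Mix-up augmentation itself is straightforward; a minimal sketch, where the Beta parameter is an illustrative choice, labels (or training targets) are blended with the same weight, and float image arrays are assumed:

```python
import numpy as np

def mixup(img_a, img_b, alpha=0.2):
    """Blend two training images with a Beta-sampled weight; return the
    blend and the weight so the corresponding targets can be mixed too."""
    lam = float(np.random.beta(alpha, alpha))
    return lam * img_a + (1.0 - lam) * img_b, lam
```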
Aiming at the occlusion problem in the FDSST (Fast Discriminative Scale Space Tracking) algorithm, this paper designs an improved anti-occlusion FDSST algorithm based on SSDA (Sequential Similarity Detection Algorithm). The algorithm mainly improves the model update strategy of FDSST as follows: first, judge whether the target is occluded according to the oscillation degree of the correlation filter response map; then, when occlusion occurs, apply the SSDA algorithm with a search strategy to re-detect the occluded target and resume tracking. Our method was ported to an onboard Jetson TX2 and tested on the public OTB50 dataset and on an aerial video dataset. The experimental results show that the proposed algorithm retains the advantages of FDSST while effectively improving its anti-occlusion ability.
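SSDA is a classic template-matching scheme whose appeal is early abortion: error is accumulated in a random pixel order, and a candidate position is rejected as soon as it exceeds a threshold. A minimal, unoptimized sketch with an illustrative threshold:

```python
import numpy as np

def ssda_match(search, template, abort_thresh):
    """Sequential similarity detection: accumulate |search - template| in a
    random pixel order and abort a candidate once the error exceeds the
    threshold, so mismatched positions are rejected cheaply."""
    h, w = template.shape
    best, best_xy = np.inf, None
    order = np.random.permutation(h * w)
    for y in range(search.shape[0] - h + 1):
        for x in range(search.shape[1] - w + 1):
            err = 0.0
            for idx in order:
                i, j = divmod(idx, w)
                err += abs(float(search[y + i, x + j]) - float(template[i, j]))
                if err > abort_thresh:
                    break
            if err < best:
                best, best_xy = err, (x, y)
    return best_xy, best
```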
Correlation filter-based tracking algorithms have recently shown favorable performance at high frame rates. However, a significant problem is that context information is not fully used, which can result in model drift under challenging situations such as fast motion and occlusion. In this paper, we propose an adaptive context-aware correlation framework that improves discriminative power and detects the target within a large neighborhood. First, we construct a context-aware correlation filter model and propose a peak extraction method to select context patches adaptively, which can be regarded as hard negative sample mining. Second, a simple yet effective multi-region detection strategy is proposed to improve anti-occlusion ability and prevent model drift. Third, we adopt a high-confidence model update method to avoid model corruption. We integrate the proposed framework with an existing DCF tracker; experimental results show that it improves the accuracy by 9.1% and the success rate by 7.1%.
Deep learning has revolutionized every field of computer vision, including single image super-resolution, thanks to its remarkable effectiveness and efficiency. Various recent methods predict the SR image by incorporating different prior knowledge of images. In this paper, we propose a new method that utilizes not only internal image features but also multi-level edge priors with richer information. Following the intuition that edge information helps to deal with blurry edges and generate sharper results, we present a residual edge and channel attention super-resolution network, named RECAN, to handle LR images. Our architecture consists of two basic modules: the first, EdgeNet, generates multi-level edge maps from the input image; the second, SRNet, exploits significant information in the input image along with the edge maps. Specifically, SRNet uses channel attention and spatial feature transform (SFT) layers to super-resolve an image. Qualitative and quantitative comparisons with state-of-the-art methods show promising results and improved image quality.
The Random Sample Consensus (RANSAC) algorithm is an iterative method to estimate the parameters of a model from a dataset that includes outliers. It is a non-deterministic algorithm that obtains a reasonable result only with a certain probability. To increase this probability while the proportion of outliers in the sample set stays unchanged, the number of iterations must be increased, and in practice too many iterations greatly slow down algorithm execution. This paper presents a deterministic algorithm for 2D feature point matching, which removes the defect that the number of RANSAC iterations is uncontrollable. First, the similarity of feature point descriptors is used to calculate the unidirectional optimal matching (UNOM) point pairs between the registered and input images. Second, the bidirectional optimal matching (BIOM) is calculated from the two UNOM sets; the BIOM set may still include outliers. Then, an adjacency matrix and connected subgraphs are computed over the BIOM point pairs; unsuitable point pairs in the connected subgraphs are eliminated, and the remaining pairs are considered correct. Finally, these point pairs are used to calculate the homography matrix with the least squares method and perform global accurate matching. Experimental results show that the proposed algorithm obtains better matching results in significantly less time, since it directly computes suitable point pairs without iterative selection.
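Bidirectional optimal matching is the mutual nearest-neighbor test; a minimal sketch over descriptor arrays (brute-force L2 distances, illustrative only and memory-hungry for large sets):

```python
import numpy as np

def bidirectional_matches(desc_a, desc_b):
    """Keep only pairs where a's nearest neighbor in b also has a as its
    nearest neighbor in a (mutual / bidirectional optimal matching)."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    a2b = d.argmin(axis=1)   # UNOM: a -> b
    b2a = d.argmin(axis=0)   # UNOM: b -> a
    return [(i, j) for i, j in enumerate(a2b) if b2a[j] == i]
```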
Image local feature descriptors are widely used in various classification and matching scenarios. Traditional local descriptors only extract features at a specific scale. To improve performance, some methods extract multi-scale and multi-angular features and construct descriptors by generalization, accumulation, or ranking. However, the information carried by descriptors across bins and scales is not fully utilized, so there is still room for improvement. To address this, a method for constructing a local descriptor based on reconstruction transformations in two directions is proposed. First, the stability of bins under different scale pairs is calculated, along with the stability of each bin relative to the other bins. Then the cumulative gradient migration in the scale direction is calculated, and the relative stability is combined into a score for each scale pair. The sum of the scores over all scale pairs of a bin constitutes the bin's total score and eventually forms the descriptor vector. Experiments on two general datasets show that the accuracy of the proposed descriptor is improved without expanding its dimension.
Current 3D point cloud feature extraction algorithms are mostly based on the geometric features of points, and the resulting feature points are distributed too messily to locate accurately. This paper proposes a point cloud feature extraction algorithm using a 2D-3D transformation. By selecting three pairs of corresponding 2D image and 3D point cloud feature points, the conversion matrix between image and point cloud coordinates is calculated to establish a mapping relationship, and the point cloud features are then extracted. Experimental results show that, compared with other algorithms, the proposed algorithm extracts detailed point cloud features more accurately.
Widely used in 3D modeling, reverse engineering, and other fields, point cloud registration aims to find the translation and rotation between two point clouds obtained from different perspectives and thus correctly align them. As the most common point cloud registration method, the ICP algorithm requires a good initial value, a not-too-large transformation between the two point clouds, and not too much occlusion; otherwise, the iteration fails to converge to a correct result. To solve this problem, this paper proposes an ICP matching algorithm based on local features of point clouds. A robust and efficient three-dimensional local feature descriptor is first designed by combining the density, curvature, and normal information of the point clouds; based on this feature description, the correspondences between the point clouds and an initial registration result are found; finally, this result is used as the initial value of ICP to fine-tune the registration. Experimental results on public datasets show that the proposed algorithm achieves good registration precision and robustness as well as fast running speed.
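The inner step of each ICP iteration is the closed-form rigid alignment of matched points (Kabsch/SVD); a minimal sketch:

```python
import numpy as np

def best_fit_rigid(src, dst):
    """Closed-form rigid transform (Kabsch/SVD) aligning matched (N, 3)
    point sets; this is the update applied inside each ICP iteration."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)        # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # fix a reflection if one appears
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t
```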
Fusing multiple complementary features can effectively improve the performance of texture image retrieval. In this paper, a new texture image retrieval method based on the spatial domain and the dual-tree complex wavelet transform (DT-CWT) domain is proposed. To obtain local features of texture images, the local binary pattern (LBP) histogram is calculated in the spatial domain, and the LBP histogram of the magnitude subband and the local tetra pattern (LTrP) histogram of the relative phase subband are calculated in the transform domain. Then, in the transform domain, the energy of the approximate subband is computed, a gamma distribution model is fitted to the magnitude subband and a von Mises distribution model to the relative phase subband, and the obtained energy and estimated model parameters are taken as the global features. Finally, the relative L1 distance is used as the similarity measure for the local features, and the normalized Euclidean distance and the closed-form Kullback-Leibler (K-L) distance are used as the similarity measures for the energy feature and the distribution parameters, respectively. Experimental results on the VisTex and Brodatz databases show that, compared with the best existing methods, the proposed method achieves higher average retrieval rates of 90.72% and 84.12%, respectively.
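The spatial-domain LBP histogram can be sketched with scikit-image; the neighbor count and radius below are common defaults, not necessarily the paper's settings.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1.0):
    """Uniform LBP codes over a 2D grayscale image, pooled into a normalized
    histogram (uniform LBP with P neighbors yields P + 2 distinct codes)."""
    codes = local_binary_pattern(gray, P, R, method='uniform')
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2))
    return hist / max(hist.sum(), 1)
```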
Facial makeup transfer is a hot research field in computer vision, aiming to transfer a reference face's makeup style to a non-makeup face. Existing research uses an adversarial loss to keep the identity information of the face consistent before and after makeup transfer, but the input face sometimes has large pose deflections and expressions, which seriously affect the transfer result. This paper proposes a facial makeup transfer framework based on a multi-scale feature loss. Our model is composed of a generator, a discriminator, and a multi-scale discriminator. The reference makeup face and the non-makeup face are input into the generator simultaneously, and the generator outputs the made-up face, which preserves the identity of the input non-makeup face while carrying the reference makeup style. To enhance the robustness of the network and improve the makeup result, the output makeup face and the input non-makeup face are fed into the multi-scale discriminator to calculate a feature loss: the pixel-wise product of the made-up face and its semantic segmentation map forms input 1, the pixel-wise product of the non-makeup face and its semantic segmentation map forms input 2, and the feature loss between the two inputs is computed after they pass through the multi-scale discriminator. In this computation, the feature loss constrains pixel differences within each semantic region, which suppresses the shadows and makeup overflow caused by angle deflection and facial expressions during transfer. Experimental results show that our method achieves better makeup transfer than existing methods.
The conversion of SDR content to an HDR version, termed inverse tone mapping (iTM), is substantially a non-linear mapping problem, and neural networks offer the potential to learn this kind of non-linear mapping end to end. This paper proposes a Generative Adversarial Network (GAN) that reconstructs an HDR image from a single exposure. Unlike previous work that adopts a U-net as the generator, the proposed GAN structures the generator with three branches to extract global-level, regional-level, and local-level details of an image for further fusion. Our discriminator adopts a slim architecture, which solves the conventional color excursion problem at low cost. Moreover, to train the proposed GAN effectively, we design a mixed loss function that incorporates pixel-wise color. Experimental results demonstrate that the proposed GAN achieves state-of-the-art performance.
Convolutional neural networks have recently demonstrated high-quality reconstruction for single image dehazing. However, existing methods seldom consider the relationship between haze concentration and image depth. In this paper, we propose an end-to-end single image dehazing network called the Progressive Guidance Dehazing Network (PGDN), which gradually recovers the clear image from the shallow to the deep areas of its hazy input. Our network consists of progressive dehazing blocks, each followed by a guided filtering layer that reinforces the result. Additionally, deep supervision is added before and after each guided filtering layer; the supervision before the guided filtering layers guarantees that the dehazing blocks further incorporate mutual content information. Experiments on both synthetic and real-world datasets show that our network achieves superior performance over existing methods in quantitative and qualitative evaluations.
Pose estimation is a fundamental task in the field of computer vision, spanning a variety of sub-tasks including 3D pose estimation, pose tracking, etc. In this paper, we propose a novel pose estimation algorithm that determines whether certain physical exercises are performed correctly, using single-person pose tracking and a key frame extraction method. First, we use the proposed tracking compensation method to refine the output of the pose estimation network. Second, we define angles formed by key human joints as action angles to determine the standards of physical exercises. As a result, one can have a personal exercise assistant and work out even without a professional trainer nearby. Experiments show that our real-time method achieves 92.0% accuracy on two kinds of physical exercise actions, and it can be adapted to different applications of significance in both theory and practice.
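An action angle is just the angle at a joint formed by three keypoints; a minimal sketch (the shoulder-elbow-wrist example is illustrative):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b formed by keypoints a-b-c, in degrees; comparing
    such action angles against a reference range scores an exercise rep."""
    a, b, c = map(np.asarray, (a, b, c))
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# e.g. elbow angle from (shoulder, elbow, wrist) pixel coordinates
print(joint_angle((0, 0), (1, 0), (1, 1)))   # -> 90.0
```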
With image generation and manipulation among the impressive advances of convolutional neural networks (CNNs), facial image synthesis methods, e.g., DeepFakes, pose serious challenges to social and personal security. Specifically, we find that (1) CNN-based synthesized facial image detectors generally fail to identify images generated by other synthesis methods, and (2) classical detection methods exploiting one-class support vector machines (SVMs) and traditional features of video clips fail when only one image is available. In view of these challenges, we propose and experimentally verify a method combining CNN features and one-class SVMs, which not only effectively detects synthesized facial images generated by different methods, but also is robust to variations in scene content.
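Fitting a one-class SVM on CNN features is a few lines with scikit-learn; the random arrays below merely stand in for backbone activations, and the nu/gamma values are illustrative choices:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Stand-ins for CNN feature vectors (e.g. penultimate-layer activations);
# in practice these would come from the backbone applied to face crops.
feats_real = np.random.randn(500, 128)   # features of genuine faces only
feats_query = np.random.randn(10, 128)   # features of images to be tested

ocsvm = OneClassSVM(kernel='rbf', nu=0.1, gamma='scale').fit(feats_real)
labels = ocsvm.predict(feats_query)      # +1: consistent with real faces, -1: outlier
```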
Depth map intra coding retains conventional intra prediction while introducing depth modeling modes, which imposes a heavy computational burden. The depth modeling mode can be skipped when the DC or Planar mode yields the minimum sum of absolute transform distortion. According to the relationship between the angular mode and the wedgelet orientation, the wedgelet direction is roughly selected, and we use a boundary gradient method to choose the wedgelet pattern segmentation. The experimental results show that the synthesized-view bit rate increases by 0.62% while the encoding time decreases by 30.1% compared with HTM16.0, and the time saving is slightly higher than that of other algorithms.
Denoising InSAR phase images with fringes of widely varying density and high noise levels is particularly challenging. In this paper, an efficient technique based on variational image decomposition is proposed to remove noise from an InSAR phase image. We propose a new image decomposition model, BL-Hilbert-BM3D, that decomposes an InSAR phase image into three components: fringes of low density, fringes of high density, and noise, described by the Beppo Levi space (BL), Hilbert space, and Block Matching and 3D function space (BM3D), respectively. Our model is thus able to sufficiently smooth low-density fringes while preserving high-density fringes. We test the proposed method on a simulated and an actual InSAR phase image and compare the results with those of four other widely used and well-known methods in terms of both quantitative evaluation and visual quality. The experimental results demonstrate the validity of the proposed method.
We propose a remote sensing image semantic segmentation model based on dual attention and multi-scale feature fusion to address object scale differences and missed small objects. The model uses ResNet50 in the encoder to extract features. First, the output features of each ResNet50 stage are fed into a pyramid pooling module, making full use of the multi-scale context of the image to cope with changes in object scale. Second, dual attention is applied to the final ResNet50 output features to establish semantic relationships across the spatial and channel dimensions, which enhances feature representation and alleviates the difficulty of segmenting small targets. Finally, starting from the output features of the attention module, features at all levels are gradually integrated during decoding to refine segmentation edges. Comparative experiments show the effectiveness of the proposed method.
Semantic segmentation of medical Computed Tomography (CT) images is of great significance to research and clinical diagnosis, and methods based on neural networks have competitive advantages for segmenting dental CT images. In this paper, a 3D multi-feature fusion method for tooth segmentation is proposed. To obtain the body space of the data, the dental CT training set is first compressed into NII format and the body space data is processed; then the proposed 3D convolutional network is trained on the data to extract feature vectors and obtain a probability distribution. Because 3D neural networks tend to produce fuzzy boundaries and unclear topology, a new CRF algorithm is used to refine the probability distribution, removing the redundant information generated by the neural network model and making the segmentation results more accurate. Comparisons with diverse contemporary segmentation algorithms verify the effectiveness and superiority of our method, confirming that the proposed supervision mechanism, network components, and optimizations reliably improve the accuracy of tooth segmentation.
Computer-Aided Diagnosis (CAD) benefits the early diagnosis and accurate treatment of lung diseases. Accurate segmentation of lung fields is an important component of CAD for lung health, facilitating subsequent analysis. However, most existing algorithms for lung field segmentation are unable to ensure appearance and spatial consistency due to varied boundaries and poor contrast. In this paper, we propose a novel hybrid method for lung field segmentation that integrates a Dense-U-Net network with a fully connected conditional random field (CRF). To enable the reuse of image features, densely connected structures are added to the decoder, which ensures that objects of varied shapes and sizes can be extracted without adding more parameters. To make full use of the mutual information among pixels of the original image, a fully connected CRF is adopted to further optimize the preliminary segmentation according to the intensity and position of each pixel. Compared with several previous popular methods on the JSRT dataset, the proposed method shows a higher Jaccard index and Dice coefficient.
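The two reported metrics are standard overlap measures on binary masks; a minimal sketch:

```python
import numpy as np

def dice_jaccard(pred, target):
    """Dice coefficient and Jaccard index for binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + 1e-9)
    jaccard = inter / (union + 1e-9)
    return dice, jaccard
```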
Geographic atrophy (GA) caused by retinal layer atrophy is an important clinical manifestation of age-related macular degeneration (AMD). Automatic segmentation of GA in spectral-domain optical coherence tomography (SD-OCT) images is a challenging task. In this paper, we propose a multi-loss convolutional neural network for automatic GA segmentation in a patient-independent setting. First, to overcome the shortage of samples in medical image processing, the proposed method augments the samples by reversing (flipping) them. The model then uses a multi-path block structure, instead of the single structure of a classical CNN, to enrich the diversity of features, and the multi-path block loss, cross entropy, and center loss are adopted to supervise and optimize the network effectively, forcing it to learn more representative features. Finally, two datasets are used to evaluate the model; the results show a high overlap ratio and correlation coefficient and a low absolute area difference, with average overlap ratios of 81.88% and 66.86% on the two datasets, respectively.
Panoptic segmentation is an important method for UAV platforms to implement road condition monitoring and urban planning, providing more comprehensive information than current semantic segmentation technology. In this paper, a panoptic segmentation framework is designed for the UAV application scenario. Because UAV scenes are large and targets are small, foreground targets are often missing from segmentation results and the segmentation masks are of poor quality. To solve these problems, this paper introduces deformable convolution into the feature extraction network to improve feature extraction ability, and introduces a MaskIoU module into the instance segmentation branch to improve the overall quality of foreground target masks. A series of data collected by UAV is organized into the UAV_OUC panoptic segmentation dataset, and experimental results on this benchmark validate the effectiveness of the proposed method.
Semantic segmentation is a technique for classifying images pixel by pixel. Road surface cracks are difficult to extract with traditional methods because they are exposed to environmental factors such as lighting and to interference noise. This paper proposes a road surface crack detection technique based on RU-Net, which effectively reduces the impact of environmental factors and classifies cracks end to end, pixel by pixel. The network mainly comprises two parts: an encoder, used mainly for feature extraction, and a decoder, used mainly to recover spatial information. Experimental results show that RU-Net achieves an accuracy of more than 98% and an MIoU of more than 73%.
Low-light images are unsuitable for human observation and computer processing. Multiple enhancement techniques have been proposed, but most are based on synthetic datasets. In this paper, an image enhancement method using a deep convolutional neural network is introduced. The model is adept at merging multi-scale information and handles images of arbitrary size without pre- or post-processing. Unlike previous work, we deliberately address extremely low-light images that cannot be identified by human eyes. To this end, a Real Extreme Low-Light Images (RELLI) dataset is collected to validate the method and will be contributed to related research. We also analyze factors that affect the model's performance in order to achieve the best results. Furthermore, the generalization ability beyond the RELLI dataset and the superiority over several state-of-the-art methods are confirmed.
In this paper, we propose a structure-based low-rank Retinex model for simultaneous low-light image enhancement and noise removal. Building on the traditional variational Retinex framework, the proposed model enforces a smoothness prior on the illumination, while a gradient fidelity term and the weighted nuclear norm are used to suppress noise and enhance structural details in the reflectance. Considering that manifold structure similarity is more effective than intensity similarity in describing the structural features of image patches, we further propose to use manifold structure similarity for image patch grouping. An alternating direction minimization algorithm is then used to solve the reflectance estimation model, and the entire model is solved by sequential optimization. The final enhancement result is obtained by combining the reflectance with the Gamma-corrected illumination. Experiments show that the proposed method can simultaneously enhance and denoise low-light images, producing results better than or comparable to state-of-the-art methods.
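The abstract does not state the model explicitly; a schematic variational form consistent with the description (smooth illumination, gradient fidelity, weighted nuclear norm on grouped reflectance patches) might read as follows, with all notation assumed rather than taken from the paper.

```latex
% Schematic only: S is the observed low-light image, L the illumination,
% R the reflectance, G a gradient fidelity target, \mathcal{P}_i(R) the
% i-th group of similar reflectance patches, and \|\cdot\|_{w,*} the
% weighted nuclear norm.
\begin{equation}
\min_{L,R}\ \|R \circ L - S\|_F^2
 + \alpha \|\nabla L\|_F^2
 + \beta \|\nabla R - G\|_F^2
 + \gamma \sum_i \big\|\mathcal{P}_i(R)\big\|_{w,*}
\end{equation}
```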
In existing one-factor cancelable biometric template protection schemes, the hashing function used in the biometric transformation cannot preserve the original biometric features, which leads to a low recognition rate. Replication and extension can make fuller use of the biometric features, but overly long feature vectors cause low computational efficiency. Therefore, a one-factor cancelable fingerprint template protection scheme based on feature-enhanced hashing is proposed. Firstly, the extended binary biometric vectors are combined by a sliding extraction window and converted to decimal, to make full use of the biometric features and increase non-invertibility. Secondly, the permutation factor is calculated by the feature-enhanced hashing function and the random sequence is reordered, which better embeds the information of the original biometric features into the random sequence. Finally, a cancelable template is generated by trimming equal lengths from the head and tail of the reordered random sequence; deleting these elements improves both computational efficiency and non-invertibility. Experimental results show that the recognition rate of the algorithm is improved on the FVC2002 and FVC2004 fingerprint databases, that it meets the design standards of cancelable biometric recognition, and that it can defend against security attacks.
Underwater image quality is usually degraded by the selective absorption of sea water and the scattering effect of particles, which leads to image distortion and reduces the accuracy and efficiency of subsequent vision tasks. To solve these problems, an underwater single-image enhancement method based on latent low-rank decomposition and image fusion is proposed. First, a color correction method based on channel compensation is introduced to remove color cast. Second, an improved Laplace sharpening method and a gamma correction technique are applied to effectively improve the sharpness and contrast of the image. Then, the latent low-rank representation is used to decompose the resulting image. Finally, a dual-image weighted fusion strategy is proposed to integrate the enhanced image. Experimental results show that the method obtains better results than traditional methods in both qualitative and quantitative analysis.
To deal with the problem of low foreground brightness in backlit images, a novel backlit image enhancement method is proposed in this paper. First, K-nearest-neighbors (KNN) matting is applied to the original image: the lower-brightness part is assigned to the foreground and its contour is marked as the unknown area, defining the trimap, and the opacity of each pixel in the marked foreground is calculated. Second, a logarithmic transformation is used to enhance the extracted foreground image; the brightness can be lifted to different degrees by adjusting the base. Finally, the original image is combined with the enhanced foreground to obtain the synthesized image, which looks more natural and avoids under- or over-enhancement. Experimental results show that the proposed approach eliminates the influence of backlight conditions, improving image quality significantly and outperforming both conventional and cutting-edge algorithms.
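A minimal sketch of an adjustable-base logarithmic transform of the kind described above, assuming images normalized to [0, 1]; the base value is illustrative.

```python
# Larger base -> stronger lift of dark foreground values; the mapping
# fixes log_enhance(0) = 0 and log_enhance(1) = 1.
import numpy as np

def log_enhance(image, base=10.0):
    """image: float array in [0, 1]."""
    img = np.clip(image.astype(np.float64), 0.0, 1.0)
    return np.log1p((base - 1.0) * img) / np.log(base)

# e.g. foreground = log_enhance(foreground, base=20.0)
```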
This paper discusses a method for detecting the palace area of an early capital site using land surface temperature (LST) retrieval and downscaling, taking the Erlitou site during the wheat coverage period as an example. LST was retrieved from Landsat 8 multispectral data of May 6, 2014 using the radiative transfer equation method. A mathematical relationship between the retrieved LST and NDVI is then established to downscale the LST data and improve its spatial resolution. Comparison with the results of archaeological excavation shows that the geographical location and extent of the palace area are well reflected in the downscaled land surface temperature image, which provides a feasible method for satellite remote sensing detection of early capital sites.
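The LST-NDVI downscaling step can be illustrated with a TsHARP-style sketch; the linear form and the caller-supplied upsample function are our assumptions for illustration, not the paper's exact procedure.

```python
# Fit LST ~ f(NDVI) at the coarse scale, apply the fit to fine-scale
# NDVI, and add back the coarse-scale residual.
import numpy as np

def downscale_lst(lst_coarse, ndvi_coarse, ndvi_fine, upsample):
    # 1) linear relation LST ~ a * NDVI + b at coarse resolution
    a, b = np.polyfit(ndvi_coarse.ravel(), lst_coarse.ravel(), deg=1)
    # 2) residual not explained by NDVI at the coarse scale
    residual = lst_coarse - (a * ndvi_coarse + b)
    # 3) predict at fine scale; upsample() is a caller-supplied function
    #    that resamples the coarse residual grid to the fine grid
    return a * ndvi_fine + b + upsample(residual)
```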
Images captured in low-light conditions are often degraded by low visibility and severe noise. To improve visual quality and suppress noise simultaneously, a low-light image enhancement method via layer decomposition and optimization is proposed. Firstly, the low-light image is smoothed via iterative least squares to obtain a noise-free base layer. Secondly, the detail layer is obtained by subtracting the base layer from the original image. The base layer is then enhanced via a variational Retinex-based method, while the noise in the detail layer is weakened by the non-subsampled shearlet transform. Finally, the enhanced image is obtained by fusing the optimized base layer and detail layer. Experimental results on a number of low-light images reveal the efficiency of the proposed method and show its superiority over several state-of-the-art methods.
Anchor-based two-stage object detection methods such as Faster R-CNN are commonly used for detection tasks in various fields. Since the networks in these methods are built on pre-trained classification models, their performance largely depends on the backbone's properties, which can limit their generalization ability on some specific datasets. To overcome this problem and enhance the model's representation ability, we propose a Variational Information Bottleneck Based Feature Enhancement Object Detection Network (VFEDet). We first design a spatial-wise feature enhancement module in the first stage to highlight critical targets in the images, using a weighting map generated from the original features in the form of an information bottleneck (i.e., a Variational Information Bottleneck, VIB). It effectively suppresses overfitting and makes the features carry more discriminative information for recognition and bounding box regression. Furthermore, we modify the second stage by inserting the VIB after the first fully connected layer to improve the model's robustness. Introducing these two parts into the original detection model, we achieve a 39.34% improvement on a thyroid nodule ultrasound image dataset polluted by a special kind of noise from previous work. The effectiveness of the proposed method is also evaluated on the COCO dataset.
With the continuous expansion and deepening of the human knowledge system, various compound words have been created to express new concepts. Since compound words cannot be recorded in the lexicon in time, word segmentation systems cannot recognize them and generally segment them into their smallest units (atomic words). It is therefore urgent and meaningful to study methods for recognizing compound words. In this paper, we propose a compound word discovery algorithm based on word structure, which makes full use of three structural characteristics: word spacing, word frequency, and grammatical rules. Based on the distance and position relationships between words, the algorithm makes a comprehensive evaluation combining rule judgments with word occurrence frequencies. Experiments on different corpora show that this method achieves higher accuracy.
Word recognition is a basic task for intelligent K-12 education and leads to further complex tasks such as grammar checking and composition grading. However, there has been little study of recognizing students' handwritten words. We propose a novel convolutional recurrent neural network (CRNN) architecture that combines an attention mechanism with the connectionist temporal classification (CTC) loss for students' handwritten words; the method also performs well on adult handwriting. An ablation study shows that our method outperforms its counterpart without attention. The proposed CRNN-with-attention model achieves superior performance on word recognition and has the potential to support applications in intelligent K-12 education.
With the growth of the aging population, the incidence of eye diseases is rising. Traditional manual diagnosis has strong subjectivity and limitations, while computer-aided diagnosis can improve diagnostic accuracy and speed. Traditional convolutional neural networks cannot fully capture the effective features of an image, which keeps classification accuracy low. The computer-aided diagnosis algorithm proposed in this paper integrates DenseNet with Squeeze-and-Excitation Networks (SENet), building on image de-watermarking and data augmentation, to fully extract and utilize fundus image features while improving the network's use of global feature information. Experimental results show that the classification accuracy of the model on fundus images is 0.9528; compared with other convolutional networks, SEDenseNet achieves the highest accuracy.
Recently, deep convolutional neural networks have been widely used in food recognition and other fields, but problems remain, such as poor semantic feature extraction and low recognition accuracy for Chinese food images. For these reasons, this paper proposes a DenseNet model with an attention mechanism for Chinese food image recognition. Using the idea of transfer learning, the DenseNet pre-trained model, with its excellent classification performance, is applied to a Chinese food image dataset. To improve the ability to extract distinguishable features and learn fine-grained features of Chinese food images, an attention mechanism is added to the DenseNet-169 model: the attention module extracts the key semantic feature map of the image, and adaptive multi-fine-grained region cropping is then carried out. Finally, based on a region weight fusion scheme, the multi-fine-grained region feature maps are integrated into one image feature description and fed to DenseNet-169 for recognition, realizing an end-to-end image recognition network. Experiments on the VIREO Food-172 dataset show that our method achieves recognition performance comparable to the state of the art.
Motion recognition is widely used in somatosensory games, rehabilitation training, and robot motion learning. In tennis training, captured actions can be identified and classified to improve computer-aided tennis teaching in a timely and accurate manner. Traditional image- or video-based human posture capture and recognition is easily affected by complex backgrounds, varying lighting conditions, and other factors in practical applications. In this paper, Microsoft Kinect is used as the sensor device to capture the data of the tennis player. Firstly, Kinect's depth sensing is used to obtain human skeleton data. Secondly, to improve classification efficiency, the data dimensionality is reduced by extracting feature values from the human skeleton. Thirdly, a KNN algorithm with per-dimension weights is proposed for movement classification; it reaches 94% accuracy, compared with 92.4% for the standard KNN algorithm, 92.81% for the decision tree algorithm, and 89.97% for the CNN algorithm. An evaluation method for tennis actions is defined to provide guidance for users: by comparing the differences between the user's postures and the standard postures at the joint positions and angles most prone to error, the method can correct the user's postures and provide movement guidance. Judging from the teaching effect on tennis aficionados and general players, this method is more practical and targeted than traditional tennis graphics and video teaching.
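A minimal sketch of a KNN classifier with per-dimension weights, in the spirit of the abstract; the weight vector w is assumed to be given (e.g., learned or hand-tuned), not taken from the paper.

```python
# Weighted Euclidean distance: dimensions with larger w matter more.
import numpy as np
from collections import Counter

def weighted_knn_predict(X_train, y_train, x, w, k=5):
    d = np.sqrt((((X_train - x) ** 2) * w).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]
```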
Many face recognition algorithms require a relatively large number of training samples. In practice, they often face the challenge of inadequate training samples, which reduces their recognition accuracy. Motivated by extended sparse representation-based classification (ESRC), we propose an improved method for under-sampled face recognition, showing that the intra-class variant dictionary plays a significant role in feature extraction. Firstly, we propose to use robust principal component analysis (RPCA) to model the sparse part of face images as the intra-class variant dictionary, so that the various changes between faces can be well captured. Secondly, we incorporate the intra-class variant dictionary into the ESRC framework. Experimental results on the AR and Extended Yale B databases show that our method outperforms other competitors both in cross-database recognition and with one sample per class.
Orientation is one of the most important features of palmprint images, and palmprint recognition methods based on orientation features have achieved promising performance. However, most of these methods neglect the relationships between orientation features, so they cannot effectively describe the structure of palm lines and are sensitive to translation and rotation. In this paper, a palmprint recognition method based on three-orientation joint features is proposed. Firstly, Gabor filters are adopted to extract the orientation features. Secondly, by analyzing the characteristics of palm lines, two sets of feature vectors are constructed from three orientation features, namely the maximum and the two minimum orientations. Finally, a weighted Manhattan distance metric is used to measure the similarity between two palms. Furthermore, to improve recognition performance, a feature fusion scheme is proposed to fuse the different features obtained from multispectral palmprints. Experiments on the PolyU MSpalmprint Database demonstrate that the proposed method achieves better recognition accuracy than some state-of-the-art methods.
In this paper, a new vehicle counting and traffic flow monitoring system is designed based on deep learning and image recognition. For accurate recognition of vehicles, the Mask R-CNN model is adopted and improved, and a vehicle dataset is built to obtain the corresponding model weights, which serve as the recognition backbone in the software. In addition, two counting methods, regional counting and tracking counting, are analyzed and combined for effective counting. Experimental results show that the recognition rate of the proposed system is almost 100% and the counting rate is about 93.5%. Based on the counting results, the planning requirements of vehicle path optimization can be met.
Research on dangerous driving behavior recognition helps regulate drivers' driving behavior. Existing algorithms are sensitive to noise, and abnormal data often disturbs the identification of dangerous driving behaviors. This paper therefore proposes a novel driving behavior recognition method that establishes a recognition model based on Support Vector Machine (SVM) and oversampling. Experimental results show that the proposed model achieves a higher recognition rate.
Most existing facial expression recognition methods emphasize the facial features extracted from expression images but ignore the coupling between facial expression features and identity features. This paper proposes a novel expression recognition method based on spatial feature disentanglement. Expression features and identity features are encoded independently by deep neural networks under a multi-task framework, and a latent space discriminator is designed to disentangle the spatial features and weaken the impact of identity features on expression recognition. The recognition accuracies on the CK+ and RaFD datasets reach 99.69% and 97.64%, respectively, which verifies that the proposed method has better generalization ability and strong robustness.
Image compression based on visual quality has attracted great interest over the past two decades. Meanwhile, browsing images on electronic devices usually requires multi-resolution code streams. Recently, image encoding that addresses both factors has become an attractive research area with strong practical appeal in remote browsing of large images, such as pathology telemedicine. In this paper, we propose a visibility threshold model and an encoder based on measured HVS sensitivities and the JPEG2000 standard. The proposed encoder can efficiently optimize the code stream for both HVS perception-based quality and display resolution. Moreover, the resulting code streams can be decoded with any JPEG2000 Part-I compliant decoder.
The proposed method aims to distinguish thyroid follicular adenoma (TFA) from follicular thyroid carcinoma (FTC) in ultrasound images. Although deep learning methods are powerful for image classification, they are limited by the small dataset available for these two diseases. In this paper, we conduct the classification with fine-tuning and semi-supervised graph convolutional networks (GCN). First, we use a semi-automatic phase consistency geodesic active contour (PCGAC) method to segment the lesion areas. Then we extract feature vectors with a fine-tuned EfficientNet and build them into a graph. Finally, we use the semi-supervised GCN on the established graph to classify TFA and FTC. Experimental results show that the proposed method can recognize thyroid follicular neoplasms with 92.42% accuracy, 94.73% specificity, and 89.28% sensitivity. Furthermore, the generalization ability is validated on three different testing data sets.
Current 3D point cloud feature extraction algorithms are mostly based on the geometric features of points, and the distribution of the extracted feature points is too scattered to locate accurately. This paper proposes a point cloud feature extraction algorithm using a 2D-3D transformation. By selecting three pairs of 2D image and 3D point cloud feature points, the conversion matrix between image and point cloud coordinates is calculated to establish a mapping relationship, through which the point cloud features are then extracted. Experimental results show that, compared with other algorithms, the proposed algorithm extracts the detailed features of a point cloud more accurately.
Overfitting is a common problem when training a neural network on a small training set, and it degrades performance on new samples. Dropout has proved to be an effective way to avoid overfitting: it prevents co-adaptation of feature detectors by randomly discarding nodes from the hidden layers of the network. Inspired by dropout, we propose a ranked dropout method that removes the randomness of the standard dropout mask: it discards a portion of the active nodes, forcing the inactive nodes to learn more features and improving generalization. We apply the proposed ranked dropout to a stacked autoencoder network and compare it with standard dropout, Gaussian dropout, uniform dropout, and DropConnect on the MNIST dataset. Experimental results on handwritten digit recognition demonstrate that the ranked strategy leads to better classification performance, and the proposed ranked dropout effectively reduces overfitting and improves the model's generalization ability.
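Under our reading of the abstract, ranked dropout might look like the following PyTorch sketch, which drops the most active units per sample instead of random ones; this is an interpretation, not the authors' exact rule.

```python
# Drop the top-k most active units of each sample so less-active units
# are forced to learn, then rescale like standard dropout.
import torch

def ranked_dropout(x, drop_frac=0.3, training=True):
    if not training or drop_frac <= 0:
        return x
    k = int(x.size(1) * drop_frac)            # units to drop per sample
    idx = x.abs().topk(k, dim=1).indices       # indices of most active units
    mask = torch.ones_like(x)
    mask.scatter_(1, idx, 0.0)                 # zero the top-k activations
    return x * mask / (1.0 - drop_frac)
```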
A daytime star sensor is a high-precision attitude measurement instrument capable of detecting stars through the atmosphere during the day. Unlike star sensors used in space, the performance of a daytime star sensor is greatly affected by strong sky background radiation, and the complex, low signal-to-noise-ratio daytime star images make star recognition difficult. This paper proposes a novel star extraction method for daytime star sensors, focusing on star image preprocessing and fake star removal. Firstly, an improved morphological Top-Hat filter is provided to suppress image noise. Then, the detailed star extraction process is discussed, and a pipeline filter is used to reject fake stars. Finally, multi-frame star vectors are calculated and averaged to improve accuracy. An experiment with daytime star images captured by a self-developed airborne star sensor confirms the validity of the proposed approach; stars can be identified even when there are thin clouds in the sky.
Edge-accurate dense disparity estimation is of great importance to applications such as augmented reality, where the geometric relationships among objects in a scene must be presented precisely. Binocular stereo is a promising approach for recovering 3D depth information of a real scene from 2D images, but it is difficult to achieve highly accurate disparity edges at reasonable computational cost. A depth-edge-preserving dense stereo matching method is presented in this paper to alleviate this problem. By taking a sparse-to-dense route for disparity estimation, depth edges corresponding to object boundaries are distinguished from texture edges based on sparse disparities, which can be obtained efficiently. With a designed disparity filling strategy, these extracted edges are used to refine the dense disparities and align the depth discontinuity edges with the corresponding object boundaries. Disparities obtained by this method faithfully conform to the scene geometry recorded in the input images, with only a relatively small increase of about 13% in computational complexity. The effectiveness of the proposed method is verified through experiments and comparative analysis.
Aiming at the image distortion and loss of detail that occur in recent dehazing algorithms, this paper proposes a dehazing algorithm based on the dark channel prior with multi-scale weighted transmission fusion and self-adaptive gamma correction. Firstly, to refine the transmission map, a multi-scale weighted transmission fusion strategy with three scales is applied in the transmission estimation step. Then, a self-adaptive gamma correction method is proposed to enhance contrast after the multi-scale weighted transmission fusion is applied to dehazing, finally yielding the desired dehazed image. Experimental results demonstrate that the proposed algorithm not only overcomes image distortion and loss of detail, but also performs satisfactorily in comparison with similar tested methods.
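A minimal sketch of a self-adaptive gamma correction driven by mean brightness; the specific mean-to-gamma mapping and clamp values are assumptions for illustration, not the paper's formula.

```python
# A mean of 0.5 gives gamma = 1 (identity); darker images get a larger
# gamma and are brightened, brighter images are darkened.
import numpy as np

def adaptive_gamma(image):
    """image: float array in [0, 1]; returns gamma-corrected image."""
    img = np.clip(image.astype(np.float64), 1e-3, 1.0)
    gamma = float(np.clip(-np.log2(img.mean()), 0.5, 3.0))
    return img ** (1.0 / gamma)
```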
Considering the low spatial resolution of remote sensing images, accurate pixel-level color restoration is unreliable. Inspired by human perception, which is sensitive to salient features, we propose an approach for colorizing remote sensing images with semantic salience priors. Firstly, based on the DCGAN architecture, we introduce a semantic salience prior, designed and learned from an existing data set with semantic labels, to supervise the training of the network. Then, to eliminate the distortion in foreground color caused by the overwhelming amount of marine or bare-land background, we leverage the idea of focal loss to prevent the vast number of background pixels from overwhelming the generator. Finally, we evaluate the proposed method on the public NWPU-RESISC45 data set. Both the evaluations and the comparisons validate that the proposed colorization approach is superior to state-of-the-art methods on remote sensing images.
Skin diseases not only endanger physical health but also cause psychological problems. Traditional manual diagnosis has strong subjectivity and limitations, so computer-aided diagnosis based on deep convolutional neural networks has been widely used to classify and recognize dermatological images. To further improve classification, we propose merging the SENet module into the Inception-v4 network. Comparisons with the DenseNet-121, VGG-16, and ResNet-101 networks verify the effectiveness of the SE-Inception-v4 network and confirm that the SENet module improves model performance. Experimental results show that the improved deep learning algorithm improves the accuracy of skin disease image classification and offers guidance for the research and application of computer-aided diagnosis in the medical field.
Buildings constitute the main component of urban areas and can provide several kinds of information. In this paper, a building extraction method for high-resolution remote sensing images via Gabor filters and a multi-orientation π local binary pattern (LBP) operator is proposed to meet the application requirements of rapid, accurate urban planning and visual management. First, multi-dimensional texture features are extracted from the original image using Gabor filters. Then, training samples are obtained with the multi-orientation π LBP operator at different orientations. Finally, pixel-level discrimination is conducted on the texture features, yielding the locations and shapes of buildings. Experimental results demonstrate that the overall extraction accuracy reaches 94% and the extracted results coincide with the distribution of the buildings; the proposed method completes the building extraction task accurately and has excellent applicability for land management.
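The multi-orientation Gabor filtering step can be sketched with OpenCV as follows; the kernel parameters are illustrative, not the paper's.

```python
# Filter a grayscale image with a bank of Gabor kernels at evenly
# spaced orientations and stack the responses as texture features.
import cv2
import numpy as np

def gabor_features(gray, orientations=8, ksize=21):
    feats = []
    for i in range(orientations):
        theta = i * np.pi / orientations
        kern = cv2.getGaborKernel((ksize, ksize), sigma=4.0, theta=theta,
                                  lambd=10.0, gamma=0.5, psi=0.0)
        feats.append(cv2.filter2D(gray, cv2.CV_32F, kern))
    return np.stack(feats, axis=-1)  # H x W x orientations
```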
The traditional control flow graph (CFG) only displays the overall framework of a program, ignoring its detailed control information and complex inner structure, so programmers can hardly comprehend the whole program through it. Therefore, an algorithm is proposed to generate a new kind of CFG with abundant details and inner structure. After parsing the compilation results of the Clang static analyzer, the algorithm first extracts the key control information of the program and uses it to build basic display units. Second, the algorithm orders and packs these basic display units into an intermediate XML file. Finally, the JGraphX graphics library is introduced to parse the intermediate XML file and render the new CFG. Experiments show that, compared with the conventional CFG, the CFG drawn by the proposed algorithm demonstrates more details, states, and data flows, and offers strong support for the maintenance and testing of complex software.
Accurate recognition of convective initiation (CI) is important for locating severe hazardous weather events; early identification of CI can provide warning signals so that people can prepare for coming natural disasters. Modern geostationary satellites and Doppler weather radar provide high spatial-temporal resolution imagery for monitoring CI. In this study, CI refers to the first time a Doppler radar image shows reflectivity greater than 35 dBZ. This paper presents a deep learning method for early recognition of CI using multi-source observation data, including geostationary satellite and Doppler weather radar imagery. We use the 3D U-Net method, which is composed of three-dimensional convolution, pooling, downsampling, and upsampling, with North China as the study domain. The experimental results show that the proposed method can recognize CI effectively, while the false alarms still need to be reduced in future work.
Monocular depth estimation is a very valuable but also very challenging problem. To solve this ill-posed problem, traditional approaches use depth cues such as defocus, atmospheric scattering, and shading, while machine learning approaches use frameworks such as MRFs and data-driven learning. With the development of deep learning, monocular depth estimation approaches based on CNNs and other networks have achieved good results and gradually become mainstream. In this paper, we summarize typical and representative literature on single-image monocular depth estimation from the past two decades and present our analysis of these approaches. We also analyze and compare the results obtained by some typical approaches, which may provide guidance for those interested in this field.
Doppler radar is the main remote sensing equipment for monitoring severe convective weather, which poses significant threats to social and economic activities, so it is important to accurately predict the time and location of severe weather events. In this study, we use a deep learning technique to predict severe weather events from radar images. Firstly, we transform the prediction problem into a binary classification problem and use Generative Adversarial Networks (GANs) to construct a classifier; Doppler radar images are then used to train the model. The critical success index, probability of detection, and false alarm ratio are used to evaluate the predictions. The experimental results show that the GAN model provides satisfactory results.
The importance of face recognition algorithms in biometric authentication systems has become increasingly prominent. To ensure the security of face authentication, it is crucial to detect spoof attacks before performing face recognition. In this paper, we propose a 9-layer convolutional neural network (CNN) architecture with end-to-end learning for face anti-spoofing on small-scale datasets, which directly predicts the output class of the raw input face image. In addition, observing that real and fake faces are well distinguishable in color spaces other than RGB, we propose a novel face anti-spoofing method that uses multiple color space models to provide complementary features. Extensive experiments on a mixed dataset drawn from the CASIA-FASD and Replay-Attack databases show excellent face spoofing detection results compared with other similar approaches.
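A minimal sketch of the multi-color-space idea, stacking HSV and YCrCb channels alongside the BGR input before feeding the CNN; the choice of spaces beyond those named in the abstract is illustrative.

```python
# Convert a BGR face crop into a 9-channel tensor of complementary
# color representations.
import cv2
import numpy as np

def multi_colorspace(bgr):
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    return np.concatenate([bgr, hsv, ycrcb], axis=-1)  # H x W x 9
```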
This paper proposes a particle swarm optimization (PSO) based approach to the multi-vehicle task allocation problem. Firstly, by analyzing the missile capacity and voyage constraints of the vehicles, we establish functions for threat cost, voyage cost, and attack benefit, and present the mathematical model of the multi-vehicle task assignment under consideration. Secondly, from the position of a particle we define a corresponding assignment vector, from which a feasible task assignment satisfying the constraints can be extracted. Finally, we develop a PSO-based multi-vehicle task allocation algorithm. A simulation experiment verifies the superiority of the proposed algorithm by comparing it with an existing method.
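One common way to decode a continuous PSO particle position into a task-assignment vector, consistent with the abstract's description, is sketched below; the specific mapping is an assumption, not the paper's encoding.

```python
# Each dimension of the particle corresponds to one task; its value in
# [0, 1) selects the vehicle that performs the task.
import numpy as np

def decode_assignment(position, num_vehicles):
    p = np.clip(position, 0.0, 1.0 - 1e-9)
    return (p * num_vehicles).astype(int)   # assignment[task] = vehicle

# e.g. decode_assignment(np.random.rand(6), num_vehicles=3)
```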
To analyze the influence of earth-atmosphere radiation on the imaging characteristics of space objects, a scene of space object motion and detection was designed in Satellite Tool Kit (STK), in which visible-light imagers mounted on geosynchronous earth orbit (GEO) and medium earth orbit (MEO) satellites served as observation platforms and a highly elliptical orbit (HEO) satellite served as the object. Equivalent magnitude models of the space object and of earth-atmosphere radiation, and a formulation for the signal-to-noise ratio (SNR) of the space object, were derived using the infinitesimal method according to the spatial relationships between the space object, the earth, the sun, and the observation platform. The variation of the equivalent magnitudes of the object and of earth-atmosphere radiation, as well as the SNR, were analyzed for tracking and gazing detectors on the observation platform. Simulation results indicate that the object SNR on the low-orbit observation platform is higher than on the high-orbit platform, by 1.1 orders of magnitude on average, while the average imaging SNR of the latter is 1.9. The tracking detector's object SNR is higher than the gazing detector's; the difference is largest when the object enters or leaves the detecting field of view and smallest when the object is well within it. The simulated SNR values provide guidance for the detection and recognition of space objects, as well as a way to reduce the effect of earth-atmosphere radiation.
The non-local means method has been widely used to fill holes in depth-image-based rendering (DIBR) systems, where the weighting coefficients of the selected candidate patches are very important for the filling effect. In this paper, we select the optimal candidate patches based on color, gradient, and depth information. To obtain a more appropriate weighting coefficient for each candidate patch, i.e., soft weighting, we construct a system of equations between the valid pixels around the hole and the corresponding pixels in each candidate patch, and compute the weighting coefficients with the Orthogonal Matching Pursuit (OMP) method. Finally, we combine the weights of the several best-matching patches to fill the holes. The results show that the proposed method is more robust and performs better at hole filling in DIBR systems than other hole-filling algorithms.
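The soft-weighting step can be illustrated with scikit-learn's Orthogonal Matching Pursuit; the data layout below is an assumption about how the equation system might be set up, not the paper's exact formulation.

```python
# The valid pixels around the hole form the regression targets, and
# each column of the design matrix holds the corresponding pixels from
# one candidate patch; OMP yields a sparse weight per candidate.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def soft_weights(candidate_pixels, valid_pixels, n_patches=5):
    # candidate_pixels: (num_valid, num_candidates); valid_pixels: (num_valid,)
    # n_patches must not exceed the number of candidates
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_patches)
    omp.fit(candidate_pixels, valid_pixels)
    return omp.coef_  # weight per candidate patch
```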
With the increasing popularity of applications such as autonomous driving, environment perception has become more and more important, and its most common expression is semantic reconstruction. More and more researchers are therefore trying to fuse information from multiple sensors to achieve better semantic reconstruction. However, most current estimation methods (a) are too bulky to run in real time, (b) fail to effectively use information from a variety of different sensors, or (c) fail to generate sufficient environment perception information, such as semantic and depth information, under limited computing power. This paper therefore proposes a multi-modal joint estimation network for semantic reconstruction that addresses these problems. Our method takes an RGB image and sparse depth as input; by adding multi-scale information to the neural network, it outputs semantic segmentation and depth recovery results simultaneously while remaining lightweight and real-time, and then fuses both results in point clouds for better environment perception. Extensive experiments show that our method outperforms other methods in the same application scenario.
Deep learning has made great contributions to the study of single image super-resolution (SISR). Recently proposed feed-forward architectures focus on the nonlinear mapping from low-resolution inputs to high-resolution outputs. However, the feed-forward structure does not represent the interdependencies between low- and high-resolution images well, which degrades SISR at large scaling factors. To solve this problem, this paper proposes an enhanced back-projection network that provides an up- and down-sampling process with error feedback to capture various spatial correlations, and introduces residual blocks in the sampling process to ease the training of deep networks and achieve better results. Results on 8x SR show that the proposed network is effective compared with other popular methods at large scaling factors.
When creating real or virtual computer graphics performances, the desired visual effect is influenced by a range of factors, such as the music and the illumination. Although there is mature theoretical support for current stage design, realizing an idea still requires rich experience from the designers, and there is still no effective method to modify 3D scene illumination according to the music's atmosphere. To address this problem, we propose a music-driven stage lighting design system that automatically controls the lighting color according to the background music. Our technique classifies the music with category tags, then formulates an evaluation function over the visual properties and the scene view images to evaluate and obtain harmonious lighting colors. Experiments and analysis demonstrate that our system provides strong support tools for stage lighting designers.
Data augmentation plays an indispensable role in expanding datasets and preventing neural network overfitting. In this paper, we analyze the advantages and disadvantages of information-dropping and information-mixing methods for data augmentation, then combine their strengths and propose Gridcut and Gridmix. Gridcut is based on structured deletion of the input images, as in Gridmask; the size and shape of the deleted areas can be flexibly adjusted according to the characteristics of the dataset. Building on Gridcut, Gridmix fills the deleted areas with pixel information from other images. Comparative experiments demonstrate that our methods outperform state-of-the-art augmentation strategies on CIFAR classification tasks. By adjusting the parameters, our method can also be flexibly reduced to Cutout, Cutmix, and Gridmask.
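A minimal sketch of the grid-style masks described above (our own illustration; the grid unit and deletion ratio are illustrative, and images are assumed to be H x W x C arrays):

```python
# Gridcut zeros a regular grid of squares; Gridmix pastes pixels from
# a second image into those squares instead.
import numpy as np

def grid_mask(h, w, unit=32, ratio=0.5):
    mask = np.ones((h, w), dtype=np.float32)
    d = int(unit * ratio)                      # side of each deleted square
    for y in range(0, h, unit):
        for x in range(0, w, unit):
            mask[y:y + d, x:x + d] = 0.0
    return mask

def gridcut(img, **kw):
    m = grid_mask(img.shape[0], img.shape[1], **kw)[..., None]
    return img * m

def gridmix(img_a, img_b, **kw):
    m = grid_mask(img_a.shape[0], img_a.shape[1], **kw)[..., None]
    return img_a * m + img_b * (1.0 - m)
```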
This article presents an improved visibility evaluation model that remedies two shortcomings of the visual-cone maintenance visibility analysis method based on the real visual field: it cannot quantitatively evaluate visibility, nor visual field occlusion. Firstly, combining sight distance and volume, we determine the position of objects in the planar field of vision. Then a line-of-sight model from the human eyes to each part of the object is established to determine the occlusion of each part. Finally, the evaluation results are quantified. The improved maintenance visibility analysis module was implemented through secondary development of the DELMIA software. In the maintenance visibility verification of a welding workstation, the improved method effectively evaluates visibility, quantifies the evaluation results, and quantitatively evaluates the occlusion of objects, greatly improving the comprehensiveness and rationality of visibility analysis in complex equipment maintenance.
In order to satisfy the need of autistic patients to integrate into social life, a street-crossing training model based on virtual reality technology is built in this paper. Considering the often insufficient language expression ability of autistic patients, this paper proposes applying eye-tracking technology to digitize the training process. The resulting training database makes it possible to systematically design a rehabilitation training plan for each autistic patient. At the same time, this paper innovatively uses a radar chart to evaluate the street-crossing ability of autistic patients along six dimensions: traffic-light observation, starting in time, sidewalk observation, pedestrian observation and avoidance, green-light time planning, and vehicle observation.
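A small sketch of the six-dimension radar-chart evaluation; the scores below are made-up placeholders, not data from the study.

```python
# Illustrative radar chart over the paper's six dimensions.
import numpy as np
import matplotlib.pyplot as plt

dims = ["Traffic light observation", "Start in time", "Sidewalk observation",
        "Pedestrian observation\nand avoidance", "Green light time planning",
        "Vehicles observation"]
scores = [0.8, 0.6, 0.7, 0.5, 0.9, 0.65]           # placeholder scores in [0, 1]

angles = np.linspace(0, 2 * np.pi, len(dims), endpoint=False).tolist()
angles += angles[:1]                                # close the polygon
vals = scores + scores[:1]

ax = plt.subplot(polar=True)
ax.plot(angles, vals, linewidth=2)
ax.fill(angles, vals, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(dims, fontsize=8)
ax.set_ylim(0, 1)
plt.title("Street-crossing ability profile (placeholder data)")
plt.savefig("radar.png", dpi=150, bbox_inches="tight")
```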
Modern Information Theory and Information Processing
This paper considers the problem of denoising images corrupted by mixed impulse and Gaussian noise. We propose a two-stage approach based on impulse detectors and L0 sparse regularization. We first employ the impulse detectors to identify the locations of impulse noise, and then restore the noisy image by solving a constrained minimization model. The objective function of the proposed model uses the L0 norm to promote the sparsity of the resulting image in a tight framelet system. To overcome the algorithmic difficulty caused by the L0 norm, we use a proximal block coordinate descent method to solve an approximate model, and prove the global convergence of the algorithm. We also develop an adaptive strategy for selecting the approximation parameter and a FISTA-like iterative scheme to speed up the algorithm. Numerical results show that our method performs favorably in comparison with several existing algorithms.
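A heavily simplified sketch of the two-stage idea, with a median-based impulse detector and L0-style hard thresholding in a 2-D DCT domain standing in for the paper's framelet system and constrained solver; the thresholds are illustrative.

```python
# Stage 1: flag impulse pixels with a median-based detector.
# Stage 2: iterate hard thresholding in a transform domain while
# re-imposing fidelity on the trusted (non-impulse) pixels only.
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import median_filter

def detect_impulses(img, win=3, tol=50):
    """Trusted mask: 1 where the pixel is close to its local median."""
    med = median_filter(img, size=win)
    return (np.abs(img - med) <= tol).astype(np.float64)

def restore(img, mask, thresh=20.0, iters=50):
    x = np.where(mask > 0, img, median_filter(img, 3))  # init impulses
    for _ in range(iters):
        c = dctn(x, norm="ortho")
        c[np.abs(c) < thresh] = 0.0          # L0-style hard thresholding
        x = idctn(c, norm="ortho")
        x = mask * img + (1 - mask) * x      # keep trusted observations
    return x

noisy = np.clip(np.random.rand(64, 64) * 255, 0, 255)
mask = detect_impulses(noisy)
clean = restore(noisy, mask)
```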
To address the problem that the observation matrix in traditional target positioning algorithms for wireless sensor networks does not satisfy the Restricted Isometry Property (RIP), a sparse target positioning algorithm based on LU decomposition is proposed. The algorithm applies the principle of compressed sensing to grid-based target positioning using the Received Signal Strength Indication (RSSI). The LU decomposition method is used to decompose the observation matrix, which not only satisfies the RIP but also reduces the impact on the original signal's sparsity. Experiments on a UAV positioning system show that the positioning performance of the LU-based target positioning algorithm is superior to that of the sparse node positioning algorithm based on Orth preprocessing.
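A hedged sketch of the preprocessing the abstract describes: LU-decompose the observation matrix, precondition the measurements accordingly, and run a standard sparse recovery on the triangular factor. The dimensions and the recovery routine (OMP) are illustrative choices, not the paper's exact setup.

```python
# LU preconditioning for compressed-sensing grid localization.
import numpy as np
from scipy.linalg import lu, solve_triangular

def lu_precondition(Phi, y):
    P, L, U = lu(Phi)                     # Phi = P @ L @ U
    y_new = solve_triangular(L, P.T @ y, lower=True)
    return U, y_new                       # sense with U instead of Phi

def omp(A, y, k):
    """Orthogonal Matching Pursuit for a k-sparse grid occupancy vector."""
    r, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ r))))
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1]); x[support] = x_s
    return x

M, N, k = 30, 100, 2                      # measurements, grid cells, targets
Phi = np.random.randn(M, N)               # stand-in for the RSS dictionary
x_true = np.zeros(N); x_true[[10, 42]] = 1.0
y = Phi @ x_true
U, y_new = lu_precondition(Phi, y)
print(np.nonzero(omp(U, y_new, k))[0])    # recovered target grid indices
```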
Iron is one of the major elements on the Moon. In general, it exists in two forms: FeO and submicroscopic metallic iron (SMFe). The presence of FeO on the Moon is of great significance for studying the history of lunar lava differentiation and its evolution. However, it has been difficult to invert the abundance of FeO from lunar spectral data, since the two forms of iron have opposite optical effects on the spectral absorption characteristics of the lunar surface: the spectral absorption depth is strengthened by FeO but weakened by the SMFe produced by space weathering. Lunar FeO has previously been inverted either directly from reflectance spectra or from the spectral absorption characteristics of satellite, telescope and in-situ spectra, in both cases without taking the effect of space weathering into account, which may bias the FeO inversion. The degree of space weathering can be expressed by various maturity indexes, such as magnetic maturity (e.g. Is and Is/FeO) and optical maturity (e.g. OMAT and continuum slope). To better quantify the FeO content from lunar spectra, in this study we first investigate the variations of spectral absorption depth and maturity indexes under different degrees of space weathering using the Hapke radiative transfer model. Then the correlation between different maturity indexes is analyzed. On this basis, a novel method that accounts for the optical effects of both FeO and SMFe is established to invert FeO from lunar spectra. Compared with four methods proposed by others, the FeO derived in this study yields a better correlation with laboratory-measured FeO contents from the LSCC data.
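For orientation, the band-depth quantity at the center of this analysis can be computed by continuum removal; the shoulder and center wavelengths below are rough illustrative values for the 1-micron ferrous band, not the paper's calibration.

```python
# Continuum-removed spectral absorption depth: fit a straight-line
# continuum across the band shoulders and measure the depth below it.
import numpy as np

def absorption_depth(wl, refl, left=0.75, right=1.55, center=1.0):
    """Band depth: 1 - R(center) / R_continuum(center)."""
    rl = np.interp(left, wl, refl)
    rr = np.interp(right, wl, refl)
    slope = (rr - rl) / (right - left)
    continuum = rl + slope * (center - left)   # linear continuum
    return 1.0 - np.interp(center, wl, refl) / continuum

wl = np.linspace(0.5, 2.5, 200)                # wavelength, micron
# synthetic reflectance with a Gaussian-shaped 1-micron absorption
refl = 0.2 + 0.05 * wl - 0.04 * np.exp(-((wl - 1.0) / 0.15) ** 2)
print(f"1-micron band depth: {absorption_depth(wl, refl):.3f}")
```

In the abstract's terms, more FeO deepens this quantity while more SMFe (greater maturity) shallows it, which is why the maturity indexes must enter the inversion.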
The goal of blind image deblurring is to recover a sharp image from a single blurred input image with an unknown blur kernel. Most image deblurring approaches focus on developing image priors; however, not enough attention has been paid to the influence of image details and structures on blur kernel estimation. What image structures are useful, and how should a good deblurring region be chosen? In this work, we propose a deep-neural-network-based method for selecting good regions for blur kernel estimation. We first construct labeled image patches and train a deep neural network; the learned model is then applied to determine which region of the image is most suitable for deblurring. Experimental results illustrate that the proposed approach is effective and is able to select good regions for image deblurring.
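A hedged sketch of the selection stage: slide a window over the blurred image, score each patch, and keep the best region for kernel estimation. The gradient-energy scorer below is only a stand-in for the paper's trained network, used so the sketch runs end to end.

```python
# Region selection for kernel estimation (scorer is a placeholder).
import numpy as np

def score_patch(patch):
    # Placeholder proxy: the paper uses a trained deep network; here a
    # simple gradient-energy score stands in for it.
    gy, gx = np.gradient(patch.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))

def select_region(img, size=64, stride=32):
    best, best_score = None, -np.inf
    for i in range(0, img.shape[0] - size + 1, stride):
        for j in range(0, img.shape[1] - size + 1, stride):
            s = score_patch(img[i:i + size, j:j + size])
            if s > best_score:
                best, best_score = (i, j), s
    return best  # top-left corner of the patch used for kernel estimation

img = np.random.rand(256, 256)
print(select_region(img))
```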
Clustering has been a hot research topic in unsupervised learning. Recently, Generative Adversarial Networks (GANs) have achieved good results in clustering tasks, as in ClusterGAN. However, this type of model does not work well on class-imbalanced data. In this paper, spatial attention and a class-balance term are adopted to improve data clustering. The proposed Spatial Attention GAN (SAGAN) can effectively rebalance the feature maps and achieve more reliable clustering when the number of samples per class in the dataset is not balanced. Experiments show promising results and the potential of the method for unsupervised clustering.
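The abstract does not print its attention module, so here is a minimal PyTorch sketch of a common spatial-attention block of the kind described, where channel-pooled maps are fused by a convolution and used to reweight the features spatially; layer sizes are illustrative, not the paper's exact design.

```python
# CBAM-style spatial attention block (illustrative stand-in).
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                       # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)       # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)      # (B, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                         # spatially reweighted features

feat = torch.randn(4, 64, 32, 32)
print(SpatialAttention()(feat).shape)           # torch.Size([4, 64, 32, 32])
```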
Low-rank approximation is an effective method for deep neural network (DNN) compression. Since different network layers contain different amounts of redundant information, a novel iterative low-rank approximation method based on the redundancy of each network layer is proposed. By giving priority to the network layers with higher redundancy, the loss of intrinsic information in each layer is expected to be reduced and the performance of the compressed model improved. Experimental results show that the performance of the compressed model obtained by this method is improved at the cost of a slight reduction in compression ratio. It can be concluded that the proposed method better retains the intrinsic information of the pre-trained network.
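A hedged NumPy sketch of one way to realize the iteration: rank layers by a singular-value-based redundancy measure, compress the most redundant layer by truncated SVD, fine-tune, and repeat. The redundancy measure and energy threshold are illustrative assumptions, not the paper's definitions.

```python
# Redundancy-prioritized iterative truncated-SVD compression.
import numpy as np

def redundancy(W, energy=0.95):
    """Fraction of singular values NOT needed to keep `energy` of the
    spectral energy -- higher means more redundant."""
    s = np.linalg.svd(W, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    rank_needed = int(np.searchsorted(cum, energy)) + 1
    return 1.0 - rank_needed / len(s)

def truncate(W, energy=0.95):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(cum, energy)) + 1
    return (U[:, :r] * s[:r]) @ Vt[:r]          # rank-r approximation

layers = {f"fc{i}": np.random.randn(256, 256) for i in range(3)}
for _ in range(len(layers)):                     # most redundant layer first
    name = max(layers, key=lambda n: redundancy(layers[n]))
    layers[name] = truncate(layers[name])
    # ...fine-tune the network here before the next iteration...
```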
Group convolution can significantly reduce computational cost by dividing the feature-map channels into groups and applying the convolution operation within each group. However, evenly dividing the channels isolates the groups from one another, i.e., there is no interaction between groups. To address this channel isolation problem, we propose flow group convolution (FGConv), which uses different but overlapping input channels to compute the output channels and enhance the interaction between groups. FGConv can be easily applied to existing networks to reduce their computational cost. We replace the original convolutions with FGConv in ResNets and validate them on the CIFAR-10 and CIFAR-100 benchmarks. Experimental results demonstrate that FGConv performs better than existing group convolution techniques.
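A hedged PyTorch sketch of the overlapped-group idea: each group reads its own channel slice plus a few channels of the next group, so information "flows" across group boundaries. The overlap size and group count are illustrative; this is not the authors' reference code.

```python
# Group convolution with overlapping input-channel slices.
import torch
import torch.nn as nn

class FlowGroupConv(nn.Module):
    def __init__(self, in_ch, out_ch, groups=4, overlap=4, k=3):
        super().__init__()
        self.gs, self.overlap = in_ch // groups, overlap
        self.convs = nn.ModuleList(
            nn.Conv2d(self.gs + overlap, out_ch // groups, k, padding=k // 2)
            for _ in range(groups))

    def forward(self, x):                         # x: (B, C, H, W)
        C = x.shape[1]
        outs = []
        for g, conv in enumerate(self.convs):
            idx = [(g * self.gs + i) % C           # wrap into the next group
                   for i in range(self.gs + self.overlap)]
            outs.append(conv(x[:, idx]))
        return torch.cat(outs, dim=1)

x = torch.randn(2, 64, 16, 16)
print(FlowGroupConv(64, 64)(x).shape)             # torch.Size([2, 64, 16, 16])
```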
This research investigated how the use of the Tencent virtual community influenced college students' motivation and engagement in an online language course. The instructors and technology engineers worked together to design the online class. The participants were 320 freshmen from a university in eastern China. The results showed that the students involved in the Tencent virtual community were more motivated and engaged in their language classes than those in traditional lecture classes. In addition, in both groups, students' perceptions of course success and interest predicted their engagement. These findings indicate that a Tencent virtual community incorporating motivational strategies is an effective option for college instructors' pedagogical design to improve students' motivation and engagement. This study provides evidence for the effectiveness of the Tencent virtual community in college language classes and emphasizes the importance of the motivational strategies of success and interest in the instructional design of such classes. Implications and limitations are discussed.
Voice is gradually becoming a quick channel for inputting information. At the same time, the commercialization of augmented reality (AR) technology is advancing. As these two emerging technologies mature and combine, the interactive interfaces and methods in the corresponding environment have also changed greatly. This paper collects data that may affect the user's interactive experience in AR voice scenarios through experiments and questionnaires. With the help of factor analysis, three key factors and nine influencing factors of AR voice interaction are obtained. Evaluation indexes for the factors influencing interactive experience in the AR voice environment are generated, and three modification schemes are drawn up according to the analysis results. Through the analytic hierarchy process (AHP), the scheme most worth implementing is selected. Finally, in comparison with the traditional interactive interface, four targeted design recommendations for AR voice interaction design are presented.
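An illustrative AHP step for ranking the three modification schemes: derive priority weights from a pairwise comparison matrix via its principal eigenvector and check consistency. The comparison values below are made up, not the study's data.

```python
# AHP priority vector and consistency ratio.
import numpy as np

def ahp_weights(A):
    vals, vecs = np.linalg.eig(A)
    i = int(np.argmax(vals.real))
    w = np.abs(vecs[:, i].real)
    w /= w.sum()                                   # priority vector
    n = A.shape[0]
    ci = (vals.real[i] - n) / (n - 1)              # consistency index
    ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]            # random index table
    return w, ci / ri                              # weights, consistency ratio

# Hypothetical pairwise comparisons of three modification schemes
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
w, cr = ahp_weights(A)
print(w, f"CR={cr:.3f}")    # accept the ranking when CR < 0.1
```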
Stroke is one of the diseases that threaten human life and often leaves survivors with motor dysfunction. Rehabilitation training is an important means of treating its sequelae, but traditional methods lack active patient participation, and doctors cannot monitor patient movement in real time. Therefore, an engaging and efficient upper-limb rehabilitation training system was designed using virtual reality technology. The system design mainly involves a system management interface, rehabilitation training games, and rehabilitation effect evaluation. The patient interacts with the system through VR glasses and data gloves. Sensors collect posture-change information from the main joints during training and upload it to the patient information database, so that the doctor can follow the patient's training in real time and update the training plan. Using virtual reality technology to guide patients through active rehabilitation training provides a useful reference for achieving highly efficient rehabilitation.
This paper proposes a new method of rolling robot path planning based on an automatic-shunt ant algorithm. When a node has been selected by multiple ants, later ants choose other paths to realize automatic shunting, thereby expanding the search range, enhancing search diversity, and helping to obtain the optimal solution. The overall idea of the algorithm is to map the target point near the inner boundary of the robot's field of view, use the new algorithm to plan a locally optimal path, and have the robot move forward along this local path. The robot repeats this process each time it advances, and reaches the goal safely along a globally optimized path. Simulation experiments show that even in a complex, unknown static environment, the algorithm can plan a globally optimized path.
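A hedged sketch of one plausible shunting rule consistent with the abstract: the usual pheromone/heuristic attractiveness of a node is divided by one plus the number of ants that have already chosen it this iteration, pushing later ants onto alternative paths. The parameter values are illustrative.

```python
# Ant next-node choice with an automatic-shunt occupancy penalty.
import random

def choose_next(candidates, pheromone, heuristic, occupancy,
                alpha=1.0, beta=2.0):
    """Roulette-wheel selection; occupancy discounts crowded nodes."""
    weights = [(pheromone[n] ** alpha) * (heuristic[n] ** beta)
               / (1 + occupancy.get(n, 0))
               for n in candidates]
    total = sum(weights)
    r, acc = random.uniform(0, total), 0.0
    for n, w in zip(candidates, weights):
        acc += w
        if acc >= r:
            occupancy[n] = occupancy.get(n, 0) + 1   # record the choice
            return n
    return candidates[-1]

occ = {}
tau = {"a": 1.0, "b": 1.0}     # pheromone
eta = {"a": 0.8, "b": 0.5}     # heuristic desirability
picks = [choose_next(["a", "b"], tau, eta, occ) for _ in range(10)]
print(picks, occ)   # later ants are increasingly diverted away from "a"
```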
Knowledge Distillation (KD) aims at using a low-capacity model, called the student, to learn from a high-capacity one, termed the teacher, such that the performance of the student can be improved. Previous KD methods typically train a student by minimizing a task-related loss and the KD loss simultaneously, with a loss-weight hyper-parameter to balance the two terms. In this work, we propose to first transfer the backbone knowledge from teacher to student, and then learn only the task head of the student network. Such a training decomposition alleviates the need for a loss weight, which can be hard to define, and allows our method to be easily applied to different datasets or tasks with strong stability. Importantly, the decomposition permits the core of our method, Stage-by-Stage Knowledge Distillation (SSKD), which facilitates progressive feature mimicking from teacher to student. Extensive experiments on CIFAR-100 and ImageNet suggest that SSKD significantly narrows the performance gap between student and teacher, outperforming state-of-the-art approaches. We also demonstrate the generalization ability of SSKD on object detection on the COCO dataset. On both tasks SSKD shows significant improvements.
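A hedged PyTorch sketch of the decomposition: each student stage is trained to mimic the corresponding teacher stage's features (MSE here), stages are handled one by one, and only afterwards is the task head learned with the task loss alone, so no KD/task loss weight is needed. The architectures, adapters, and single training step are illustrative stand-ins, not the paper's exact recipe.

```python
# Stage-by-stage feature mimicking, then task-head training.
import torch
import torch.nn as nn

teacher_stages = nn.ModuleList([nn.Conv2d(3, 32, 3, padding=1),
                                nn.Conv2d(32, 64, 3, padding=1)])
student_stages = nn.ModuleList([nn.Conv2d(3, 16, 3, padding=1),
                                nn.Conv2d(16, 32, 3, padding=1)])
adapters = nn.ModuleList([nn.Conv2d(16, 32, 1),   # match teacher widths
                          nn.Conv2d(32, 64, 1)])

x = torch.randn(4, 3, 8, 8)
mse = nn.MSELoss()
for s in range(len(student_stages)):               # stage-by-stage mimicking
    opt = torch.optim.SGD(list(student_stages[s].parameters())
                          + list(adapters[s].parameters()), lr=0.1)
    with torch.no_grad():                          # earlier stages are frozen
        t_in = x if s == 0 else teacher_stages[s - 1](x)
        t_out = teacher_stages[s](t_in)
        s_in = x if s == 0 else student_stages[s - 1](x)
    loss = mse(adapters[s](student_stages[s](s_in)), t_out)
    opt.zero_grad(); loss.backward(); opt.step()
# ...then train the task head on labels only, with the backbone frozen...
```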
Sea ice models differ from the other, invariable models in a navigation simulator, which never change their structure. Because of ship-ice collisions in an ice navigation scene, sea ice is broken and left floating in the sea water, generating an ice channel, which must necessarily be modeled and visualized in the navigation simulator scene. Focusing on these problems, this paper analyzes the ship-ice collision process and the modeling method for sea ice; the numerical calculation of the ship-ice collision process is analyzed and modularized into a sea ice calculation model. Semi-infinite plane elastic foundation theory and a wedge-shaped beam structure are applied to the corresponding modules. The result has been simulated and verified, and can also serve as the basis for visualizing broken sea ice and generated ice channels in ice navigation scenes.
Depression is a widespread mental health disorder. At present, the clinical diagnosis of depression mainly depends on interviews between doctors and patients and on patients' self-reports, so the diagnosis is subjective to a certain extent. This paper summarizes self-assessment scales for depression and methods for collecting relevant health data. It discusses the state of research on auxiliary diagnosis of depression based on voice data and social network data, and summarizes the basic process of auxiliary diagnosis using such data. Application cases of several machine learning algorithms and their evaluation indexes are surveyed; machine learning algorithms can objectively predict and assist the diagnosis of depression. The convenience and economy of collecting depression-related health information should be fully considered: in practical analysis, data acquisition cost and difficulty, the application purpose, and the target population should be weighed comprehensively, and the data used for auxiliary diagnosis designed accordingly. When necessary, multi-modal data combining several data forms can be used. This paper accordingly proposes different methods for evaluating depression in different populations, and argues that early warning and health intervention should be carried out on the basis of the assessment, providing a new way of thinking about the economical, rapid and objective evaluation of depression.
Disassembly sequence planning is an important part of maintenance, and disassembly modeling is the prerequisite of sequence planning. Due to the large number of parts and the complex structure, it is difficult to model the disassembly process for ship maintenance. In this paper, we propose a disassembly modeling method that considers the relationship between general parts and fasteners. First, we analyze the disassembly rules and constraints of fasteners, determine the hidden priority relations among parts, and establish a fastener relation table. Then the fastener relation table and the part interference matrix are synthesized to establish a constraint interference matrix that accounts for the influence of fasteners. Based on this modeling method, the judgment criterion for part disassembly and the disassembly rule for fasteners are put forward, and the effectiveness and superiority of the method are illustrated with an application case.
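A hedged NumPy sketch of synthesizing the constraint interference matrix: a part is additionally blocked (in every direction here, for simplicity) by any fastener that still constrains it according to the fastener relation table. The tiny example assembly is illustrative.

```python
# Constraint interference matrix = geometric interference + fasteners.
import numpy as np

parts = ["cover", "pump", "bolt1", "bolt2"]
n = len(parts)

# interference[i, j] = 1: part j blocks the removal of part i (geometry)
interference = np.zeros((n, n), dtype=int)
interference[1, 0] = 1                 # cover must come off before pump

# fastener relation table: fastener index -> part it constrains
fastens = {2: 0, 3: 1}                 # bolt1 fixes cover, bolt2 fixes pump

constraint = interference.copy()
for f, p in fastens.items():
    constraint[p, f] = 1               # fastener adds a hidden priority

def removable(constraint, removed):
    """Parts whose every blocker has already been removed."""
    return [i for i in range(n)
            if i not in removed
            and all(j in removed for j in np.nonzero(constraint[i])[0])]

removed, order = set(), []
while len(removed) < n:
    nxt = removable(constraint, removed)[0]
    order.append(parts[nxt]); removed.add(nxt)
print(order)    # ['bolt1', 'cover', 'bolt2', 'pump']
```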
Due to the many variables involved in the process, it remains challenging to quantitatively evaluate the maintainability of vessel equipment. In this context, a new model employing the fuzzy comprehensive evaluation method based on the analytic hierarchy process (AHP) is proposed to evaluate vessel equipment maintainability. The evaluation index of vessel equipment maintainability is determined with special focus on the factors influencing maintainability. The method is then applied in a self-developed system that quantifies the maintainability of vessel equipment and analyzes the maintenance work process, and is subsequently verified by practical examples. Results show that the maintainability determined using this method is consistent with that observed under actual conditions, which further certifies that the method can reasonably and effectively quantify the maintainability of vessel equipment.
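A hedged sketch of AHP-weighted fuzzy comprehensive evaluation: the AHP weight vector W is combined with the fuzzy membership matrix R (index by grade) to give the grade membership B = W @ R. All numbers below are illustrative, not from the evaluated vessel equipment.

```python
# Fuzzy comprehensive evaluation with AHP-derived weights.
import numpy as np

# Indexes: accessibility, standardization, ergonomics (illustrative)
W = np.array([0.5, 0.3, 0.2])                  # AHP-derived index weights

# R[i, g]: expert membership of index i in grade g
# grades: excellent, good, fair, poor
R = np.array([[0.3, 0.4, 0.2, 0.1],
              [0.5, 0.3, 0.1, 0.1],
              [0.2, 0.5, 0.2, 0.1]])

B = W @ R                                      # fuzzy evaluation vector
B /= B.sum()                                   # normalize memberships
grades = ["excellent", "good", "fair", "poor"]
print(dict(zip(grades, B.round(3))))
print("overall grade:", grades[int(np.argmax(B))])
```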