Selecting representative data is a key factor in improving the performance of machine learning algorithms. In this paper we focus on the evaluation of out-of-distribution (OoD) detection methods that can be integrated into the ML project lifecycle in a non-intrusive way, without changing the model architecture. The methods considered are applicable to the analysis of image classification datasets. In addition to the commonly used AUROC metric, we evaluate the number of out-of-distribution samples misclassified with high confidence. Case studies were conducted on benchmark and production datasets. As a result, we provide practical guidance for data evaluation and recommendations on which method to use to detect different types of OoD images.
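The two evaluation metrics mentioned above can be sketched in a few lines of plain Python. The function names and the 0.9 confidence threshold are illustrative assumptions, not the paper's exact protocol:

```python
def auroc(id_scores, ood_scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation: the
    probability that a randomly chosen OoD sample receives a higher
    OoD score than a randomly chosen in-distribution sample."""
    pairs = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                pairs += 1.0
            elif o == i:
                pairs += 0.5  # ties count as half a correctly ordered pair
    return pairs / (len(id_scores) * len(ood_scores))

def high_confidence_errors(ood_max_softmax, threshold=0.9):
    """Count OoD samples that the classifier labeled with high
    confidence (max softmax at or above the threshold) even though
    they lie outside the training distribution."""
    return sum(1 for p in ood_max_softmax if p >= threshold)
```

For a well-separated detector, `auroc` approaches 1.0; `high_confidence_errors` complements it by counting the most dangerous failure mode directly.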
In this paper, we consider the problem of clothes compatibility for total-look recommendation systems by means of deep neural networks. This task has become very popular in recent years, primarily due to the growth of online clothing retail. Unlike existing solutions, we developed a comprehensive model of clothes compatibility evaluation based on color characteristics as well as on style characteristics. As a rule, neural networks are largely invariant to the color characteristics of an image, but color is an extremely important component of total-look evaluation, so such an additional branch with color characteristics is well justified. The proposed model uses both a color embedding, obtained from color clustering and histograms, and a style embedding, taken as the output tensor of a ResNet-50 encoder. The paper shows that color embeddings significantly improve the quality of total-look evaluation. The model was trained on the Polyvore dataset, which was pre-processed and cleaned of items not related to total-look compatibility.
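A coarse color embedding of the kind described can be sketched as a normalized RGB histogram concatenated with the style embedding. This is a minimal histogram-only variant; the binning scheme and function names are assumptions, and the paper additionally uses color clustering:

```python
def color_histogram_embedding(pixels, bins=4):
    """Coarse RGB histogram as a fixed-length color embedding.
    `pixels` is a list of (r, g, b) tuples with channel values in
    0..255; the result is a normalized vector of bins**3 components."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    n = float(len(pixels))
    return [h / n for h in hist]

def fuse_embeddings(color_emb, style_emb):
    """Concatenate the color embedding with the style embedding
    (in the paper, the output tensor of a ResNet-50 encoder)."""
    return list(color_emb) + list(style_emb)
```

The fused vector then feeds the compatibility head, so color information survives even if the convolutional branch is insensitive to it.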
Modern vehicles include a vast number of intelligent functions, such as lane-keeping assist (LKA) and vehicle, pedestrian, and obstacle recognition (FCW, PPS), implemented in the advanced driver assistance system (ADAS). These functions allow a vehicle to localize itself correctly within the road lane and increase overall system safety; they are also critical for vehicle motion and target-trajectory planning. Previously, algorithms implementing ADAS functions were based on classical computer vision approaches (e.g., edge detection, morphology, the Hough transform), which did well only on rather simple road scenes. Modern state-of-the-art systems are based on semantic segmentation networks; this is the unquestionable trend. In a more thorough study of road-scene segmentation, we face the problem that existing benchmark suites such as MOTS and KITTI, as well as recent DNNs for road/lane semantic segmentation, employ only mutually exclusive classes, i.e., a pixel can belong to a single class only. But in a real road scene a pixel can easily belong to several classes, e.g., to the ego-lane and a crosswalk. The classical approach with mutually exclusive classes will then prefer only one class, and we will get an ego-lane consisting of two components; as a result, it may be difficult to restore the ego-lane at the post-processing stage, see Figure 1. To overcome this problem, in this paper we propose an approach with multiple segmentation maps as the output of the DNN architecture, together with a multi-map loss function. In this case, each pixel can be assigned to several classes at the same time (depending on the number of maps), and we are no longer restricted to mutually exclusive classes. The DNN classifier for each segmentation map has a separate activation branch and loss function.
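The multi-map idea can be sketched as a per-map sigmoid activation with a binary cross-entropy term for each segmentation map, so a pixel may belong to several maps (e.g., ego-lane and crosswalk) at once. This is a minimal pure-Python illustration, not the paper's exact loss:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multi_map_bce_loss(logit_maps, target_maps):
    """Multi-map loss: each segmentation map has its own sigmoid
    activation and binary cross-entropy term, summed over maps and
    averaged over pixels. `logit_maps` and `target_maps` are lists
    of equally sized flat pixel lists; targets are 0/1, and the same
    pixel may be 1 in several maps simultaneously."""
    eps = 1e-7
    total, count = 0.0, 0
    for logits, targets in zip(logit_maps, target_maps):
        for z, t in zip(logits, targets):
            p = min(max(sigmoid(z), eps), 1.0 - eps)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count
```

Unlike a softmax over mutually exclusive classes, each map's sigmoid is independent, so no class suppresses another at a shared pixel.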
An Advanced Driver Assistance System (ADAS) is a very important part of a modern vehicle. To achieve high-level objectives in ADAS functionality such as LKA (lane keeping assist), LDW (lane departure warning), and FCW (forward collision warning), the quality of the algorithms under the hood must be extremely high. In the last few years these algorithms have commonly been based on DNNs (deep neural networks) applied to the tasks of semantic and instance segmentation, 2D/3D object detection, and visual object tracking. Recent state-of-the-art DNN models usually solve only a single task from those listed above, and running several neural networks is computationally expensive or even impossible due to the lack of GPU memory. One approach used to overcome this problem is a shared backbone (also called a feature extractor or encoder). The backbone consumes most of the computing resources, so a model with a shared backbone achieves better inference performance. Unfortunately, the training procedure for a shared-backbone model has several difficulties. The first is the lack of datasets with all the required, uniform annotation types; the second is a more sophisticated backpropagation procedure. In this paper, we consider several methods for multi-task neural network training and present the results of such training procedures on several public datasets with dissimilar annotation types. The shared backbone is applied to three tasks performed simultaneously on the road scene: semantic segmentation, 2D object detection, and 3D object detection. While the inference performance of the DNNs with a shared backbone increased significantly, the quality evaluation results are quite close to those of the original separate state-of-the-art DNNs and even outperform them on some evaluation indices.
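One common way to train a shared backbone on datasets with dissimilar annotations is to alternate batches between tasks, backpropagating only the loss of the task whose batch was drawn. A minimal round-robin scheduler sketch follows; the names and the round-robin policy are assumptions, since the paper compares several training methods:

```python
def train_multitask(batches, steps):
    """Round-robin scheduler for a shared-backbone model: each step
    draws a batch from one task's dataset and would backpropagate only
    that task's loss through its head and the shared backbone.
    `batches` maps task name -> list of batches; returns the task
    schedule actually used (the optimization step itself is stubbed)."""
    tasks = list(batches)
    schedule = []
    for step in range(steps):
        task = tasks[step % len(tasks)]
        batch = batches[task][(step // len(tasks)) % len(batches[task])]
        # here: loss = heads[task](backbone(batch)); loss.backward(); opt.step()
        schedule.append(task)
    return schedule
```

Because each dataset annotates only its own task, no step ever demands labels that a batch does not carry; the backbone still sees gradients from all tasks over time.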
Computer vision systems based on convolutional neural networks are being rapidly introduced in the field of precision agriculture to solve the problem of scene recognition. Convolutional networks allow high-precision recognition, but a significant problem is the expensive process of adapting the network to new conditions. This article proposes a method for fast adaptation of a trained network to minor changes in the source domain without annotating new data. The method is known as Adversarial Domain Adaptation; in the current paper it is applied to the problem of agricultural scene recognition in automated harvesting. The initial training procedure is modified for parallel training of an additional subnet on unannotated data, which makes it possible to compensate for the domain shift through adversarial training. This approach allows us to monotonically increase the quality of all recognized object classes and to enhance the stability of the CNN model.
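Adversarial domain adaptation is commonly implemented with a gradient reversal layer between the feature extractor and a domain classifier trained on the unannotated data. A minimal sketch of its contract (identity on the forward pass, sign-flipped scaled gradient on the backward pass) follows; the class name and interface are assumptions, since a real implementation hooks into the framework's autograd:

```python
class GradientReversal:
    """Gradient reversal layer: the forward pass is the identity,
    while the backward pass multiplies the incoming gradient by
    -lambda. The feature extractor thus learns to *confuse* the
    domain classifier, aligning source and target feature
    distributions without target annotations."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        return features  # identity in the forward direction

    def backward(self, grad_from_domain_classifier):
        return [-self.lam * g for g in grad_from_domain_classifier]
```

The coefficient `lam` trades off the adversarial term against the main recognition loss and is often ramped up over training.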
Publisher’s Note: This paper, originally published on 13 April 2018, was replaced with a corrected/revised version on 14 September 2018.
The paper proposes a solution for automatic operation of a combine harvester along straw rows by means of images from a camera installed in the harvester cab. A U-Net is used to recognize straw rows in the image. The edges of a row are approximated in the segmented image by curved lines and then converted into the harvester coordinate system for the automatic operating system. The new network architecture and row-approximation approaches have improved the quality of the recognition task and the frame processing speed up to 96% and 7.5 fps, respectively.
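Approximating a row edge by a curved line can be sketched as a least-squares quadratic fit to the edge pixels of the segmentation mask. The parameterization x = a·y² + b·y + c is an assumption suited to near-vertical rows in the image; the paper does not specify the curve model:

```python
def fit_quadratic(points):
    """Least-squares fit of x = a*y^2 + b*y + c to row-edge pixels
    given as (x, y) pairs. Builds the 3x3 normal equations for the
    basis [y^2, y, 1] and solves them by Gauss-Jordan elimination
    with partial pivoting. Returns [a, b, c]."""
    S = [[0.0] * 4 for _ in range(3)]  # augmented system A^T A | A^T x
    for x, y in points:
        basis = [y * y, y, 1.0]
        for i in range(3):
            for j in range(3):
                S[i][j] += basis[i] * basis[j]
            S[i][3] += basis[i] * x
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(S[r][col]))
        S[col], S[piv] = S[piv], S[col]
        for r in range(3):
            if r != col:
                f = S[r][col] / S[col][col]
                for c in range(col, 4):
                    S[r][c] -= f * S[col][c]
    return [S[i][3] / S[i][i] for i in range(3)]
```

The fitted coefficients are what would then be transformed into the harvester coordinate system for steering.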
We study the issue of improving the performance of classification-based object detectors by including certain geometry-oriented filters. The configuration of the observed 3D scene may be used as a priori or a posteriori information for object filtration. A priori information is used to select only those object parameters (size and position on the image plane) that are consistent with the scene, restricting implausible combinations of parameters. On the other hand, detection robustness can be enhanced by rejecting detection results using a posteriori information about the 3D scene; for example, the relative location of detected objects can be used as a filtration criterion. We included the proposed filters in the object detection modules of two different industrial vision-based recognition systems and compared the resulting detection quality before and after the improvement. Filtering with a priori information leads to a significant decrease in the detector's running time per frame and an increase in the number of correctly detected objects. Including the filter based on a posteriori information decreases the object detection false positive rate.
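An a priori size/position filter of this kind can be sketched under a flat-ground pinhole-camera assumption: for an object standing on the ground, the expected bounding-box height in pixels is proportional to the distance of its footpoint from the horizon line. All names, the default object height, and the tolerance below are illustrative assumptions:

```python
def plausible_size(bbox_h_px, y_bottom_px, horizon_y_px, cam_h_m,
                   obj_h_m=1.7, tol=0.5):
    """A priori geometric filter. Under a flat-ground pinhole model,
    an object of height obj_h_m seen by a camera mounted cam_h_m
    above the ground has expected box height
        h_expected = obj_h_m / cam_h_m * (y_bottom - y_horizon).
    Detections deviating by more than `tol` (relative) are rejected
    before the expensive classifier stage."""
    if y_bottom_px <= horizon_y_px:
        return False  # footpoint above the horizon: implausible
    expected = obj_h_m / cam_h_m * (y_bottom_px - horizon_y_px)
    return abs(bbox_h_px - expected) <= tol * expected
```

Scanning only (size, position) pairs that pass such a predicate is what shrinks the detector's running time while discarding implausible candidates.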