As modern displays continue to increase in resolution, capturing images and video at comparably high resolutions can be prohibitively expensive, especially in the infrared (IR) domain. Image super-resolution, or upsampling, is often applied to address this problem: deep learning models reconstruct high-quality, high-resolution images from a low-resolution base. However, previous solutions either require a massive number of parameters, demanding large amounts of memory and computational power, or fall apart when applied to the infrared domain. As a result, many modern super-resolution models are not entirely practical. One difficulty in IR super-resolution is that IR images are inherently noisy, with a poor signal-to-noise ratio caused by the characteristics of IR sensors and internal reflections within the lenses. Because of this, an IR super-resolution model must also act partly as a denoiser. We therefore propose a highly efficient model for single-image super-resolution in the IR domain.
The objective of image compression is to reduce irrelevance and redundancy in image data so that it can be stored or transmitted efficiently, minimizing the number of bits required to represent an image accurately. JPEG can achieve a compression ratio of 10:1 with little perceptible loss in quality under standard metrics, and it has become the most widely used image compression standard in the world since its release. Traditionally, compression techniques have relied on linear transforms to approximate 2-D signals (images), and the omission of specific constituent vectors has been largely arbitrary. These techniques save substantial amounts of memory while retaining image integrity. Recently, techniques have been developed that use neural networks to approximate these signals. Such networks decorrelate the image data and, through gradient-based optimization, find a set of vectors that represents an image more compactly than traditional techniques, approaching the minimum number of bits required. These architectures are developing rapidly, drawing on design insights from related fields such as computer vision and image analysis. This work proposes a novel, efficient neural network that compresses infrared images at state-of-the-art levels while preserving overall image quality, meeting demands that span from the daily commute to combat environments.
K-means is a popular unsupervised machine learning algorithm that clusters similar points together by recognizing naturally occurring patterns in data. Applied to the color space of an image, it can identify segments of the image where more meaningful clustering can be performed. Color quantization has been employed for decades to optimize the memory usage of saved images. Typical images are composed of red, green, and blue channels, each represented by one byte, so each pixel occupies 24 bits, allowing roughly 16.8 million unique colors. The human visual system, however, is not sensitive enough to require this full color space, and it is beneficial to reduce the number of colors toward what the eye can actually distinguish. This yields more efficient memory use while preserving detail and color separation in the image. The key issue is determining how far an image can be quantized before it degrades to the point that a human can discern the difference. No existing algorithm aptly determines where this point occurs or whether each color channel should be treated identically. This research applies K-means color clustering to each color channel of the image separately to optimize compression. Replacing randomly seeded K-means with principal component analysis (PCA) informed K-means on each channel further improves performance.
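The per-channel quantization step can be sketched as follows. This is a minimal illustration: it runs a simple 1-D K-means on each channel, with deterministic percentile seeding standing in for the PCA-informed seeding described above (the function names, seeding rule, and iteration count are our own assumptions, not the paper's method).

```python
import numpy as np

def kmeans_1d(values, k, iters=20):
    """Simple 1-D K-means for quantizing one color channel.
    Seeds are spread at evenly spaced percentiles of the channel values,
    a deterministic stand-in for the paper's PCA-informed seeding."""
    values = values.astype(float).ravel()
    centers = np.percentile(values, np.linspace(0, 100, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    # final assignment consistent with the last center update
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return centers, labels

def quantize_channels(img, k=8):
    """Quantize each color channel of an (H, W, 3) image independently
    to at most k levels."""
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        centers, labels = kmeans_1d(img[:, :, c], k)
        out[:, :, c] = centers[labels].reshape(img.shape[:2]).astype(img.dtype)
    return out
```

Treating each channel separately, as the abstract proposes, allows a different effective palette size per channel rather than one shared 3-D palette.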
Classification of one-dimensional (1D) data is important for a variety of complex problems. Many industries, from finance to audio processing to medicine, rely on 1D data. Machine learning techniques have excelled at these classification problems, but there is still room for improvement. This paper proposes a novel architecture, the Multi-Head Augmented Temporal Transformer (MHATT), for 1D classification of time-series data. Heavily modified vision transformers improve performance while keeping the network exceptionally efficient. To showcase its efficacy, the network is applied to heartbeat classification on the MIT-BIH OSCAR dataset, which was ethically split to ensure a fair and rigorous test. The proposed architecture is 94.6% more efficient and reaches a peak accuracy of 91.79%, a 13.6% reduction in error over a recent state-of-the-art network. The performance and efficiency of MHATT make it well suited to edge devices, offering strong performance and flexible deployment.
As classic fields within computer vision, image classification and segmentation have advanced dramatically in both accuracy and ease of use. On Mars, atmospheric and surface conditions can lead to the sudden onset of a dust storm, or the more common dust devil, causing a multitude of issues for both equipment and crew. The ability to identify and locate areas that should be avoided during these storms is necessary for mission safety. Many current techniques are impractical: they are too large and computationally expensive for tasks that demand swift deployment onto systems with stringent constraints. This paper proposes a novel approach to segmentation that marries an efficient yet powerful Vision Transformer based model with traditional signal processing techniques to ensure peak performance. With the National Aeronautics and Space Administration (NASA) looking to land a team on Mars, this paper takes on the real-time challenge of classifying and segmenting dust storms in remote equatorial satellite imagery, using a model designed for integration on future systems to increase overall mission success.
Ethical data splitting is of paramount importance to ensure the validity of any data-driven solution. If the data is biased, it will not accurately represent how the solution will perform on the underlying problem. To split data ethically, the overall variance of the data must be fairly represented in both the training and testing sets, which requires identifying the outliers so they can be accounted for in the split. Computing principal components with the L2-norm has been shown to be an effective way to identify outliers and build a robust, outlier-resistant dataset. Because the L1-norm is more resistant to outliers than the L2-norm, utilizing L1-norm principal components when determining ethical data splits should yield even more robust datasets.
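One established way to compute an L1-norm principal component is the fixed-point iteration of Kwak (2008). The sketch below pairs it with a residual-based outlier score; the scoring rule and its use in splitting are illustrative assumptions, since the abstract does not specify the exact procedure.

```python
import numpy as np

def l1_pca_first_component(X, iters=100):
    """First principal component under the L1-norm projection criterion,
    via the fixed-point iteration w <- X^T sign(X w), renormalized each
    step (Kwak, 2008). X is (n_samples, n_features), assumed centered.
    Initialized from the L2 principal direction for determinism."""
    w = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(iters):
        s = np.sign(X @ w)
        s[s == 0] = 1.0              # avoid zero signs stalling the update
        w_new = X.T @ s
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w

def outlier_scores(X, w):
    """Score samples by residual distance from the 1-D L1-PC subspace;
    large scores flag outliers to balance across train/test splits."""
    proj = np.outer(X @ w, w)
    return np.linalg.norm(X - proj, axis=1)
```

Samples with the largest residuals can then be distributed deliberately between the training and testing sets rather than left to a random split.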
The detection and recognition of targets in imagery and video is vital for military and commercial applications. The development of infrared sensors for tactical aviation systems has increased target detection performance. Owing to these advances, infrared sensors have become paramount for field operations such as visual operations (visops) and reconnaissance missions conducted across a variety of operational environments. Many of the techniques in use stretch back to the 1970s but were long limited by computational power. The AI industry has recently bridged the gap between traditional signal processing tools and machine learning; however, current state-of-the-art target detection and recognition algorithms are too bloated for on-ground or aerial reconnaissance missions. This paper therefore proposes the Edge IR Vision Transformer (EIR-ViT), a novel algorithm for automatic target detection on infrared images that is lightweight and operates on the edge for easier deployability.
This paper introduces principal component signature design for correlation-and-bit-aware embedding. Simulation studies show superior performance in terms of bit error rate for L1-norm and L2-norm based signatures versus arbitrary signatures.
Chest X-rays can quickly assess the COVID-19 status of test subjects and help address inadequate medical resources in emergency departments and centers. Deep learning image classification models can help doctors make better judgments on patients with COVID-19 and related lung diseases. We compared and analyzed popular deep learning image classification methods, VGGNet, GoogLeNet, and ResNet, using publicly available chest X-ray COVID-19 datasets from different organizations. Based on the characteristics of chest X-ray images and the classification results of these algorithms, a novel image classification algorithm, CovidXNet, is proposed. Building on the ResNet model, CovidXNet introduces the hard sample memory pool method to improve accuracy and generalization. CovidXNet categorizes chest X-ray images more efficiently and accurately than other popular image classification algorithms, allowing doctors to quickly confirm a patient's diagnosis.
Over the past four decades, the MIT-BIH dataset has become the industry standard for comparative evaluation of signal processing and machine learning techniques, largely because medical data is difficult to collect and rarely open-source. A standardized benchmark is needed for fair comparison. This paper proposes a set of datasets targeted at specific tasks currently under investigation in state-of-the-art works. Openly sharing these datasets in multiple formats will allow the benchmark data to be applied to a range of advanced classification algorithms. Published methods will be profiled on the new datasets, building the foundation for their merit. A series of datasets is identified, with applicable usage criteria, for tasks such as TinyML for health monitoring and the detection of heart disease.
Several classical statistical methods are commonly used for forecasting time-series data; however, due to a number of nonlinear characteristics, forecasting time-series data remains a challenge. Machine learning methods are better able to handle high nonlinearity. Recurrent neural networks (RNNs) are frequently used for time-series forecasting because their internal state, or memory, allows them to process a sequence of inputs. In particular, the long short-term memory (LSTM) network, a type of RNN, is useful because it maintains both long-term and short-term components. Thanks to its feedback connections, its ability to process sequences of varying length, and its ability to reset its own state, the LSTM is less sensitive to outliers and more forgiving of varying time lags. Consequently, LSTMs can extract vital information and learn trends to forecast time-series data with high accuracy. We propose a novel neural network architecture combining LSTM and convolutional layers to predict time-series energy data with higher accuracy than comparable networks.
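The long- and short-term components mentioned above come from the LSTM's gating structure, which a single forward step makes concrete. This is a textbook LSTM cell in numpy with untrained placeholder weights, not the proposed architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One forward step of a standard LSTM cell.
    x: input vector, h: hidden (short-term) state, c: cell (long-term) state."""
    Wf, Wi, Wo, Wg, bf, bi, bo, bg = params
    z = np.concatenate([x, h])
    f = sigmoid(Wf @ z + bf)        # forget gate: what to drop from c
    i = sigmoid(Wi @ z + bi)        # input gate: what to write to c
    o = sigmoid(Wo @ z + bo)        # output gate: what to expose as h
    g = np.tanh(Wg @ z + bg)        # candidate cell update
    c_new = f * c + i * g           # long-term memory update
    h_new = o * np.tanh(c_new)      # short-term (hidden) output
    return h_new, c_new

def init_params(n_in, n_hid, seed=0):
    """Small random weights and zero biases, for illustration only."""
    rng = np.random.default_rng(seed)
    shape = (n_hid, n_in + n_hid)
    return tuple(rng.standard_normal(shape) * 0.1 for _ in range(4)) + \
           tuple(np.zeros(n_hid) for _ in range(4))
```

The additive update `c_new = f * c + i * g` is what lets gradients and information persist across long lags, the property the abstract relies on.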
Modern displays are steadily increasing in resolution, yet sensors that capture images and video at such high resolutions can be prohibitively expensive. Image super-resolution, or upsampling, has recently been applied to alleviate this shortcoming. Many deep learning super-resolution models reconstruct very high quality high-resolution images from a low-resolution base. However, most of these models use a tremendous number of parameters, requiring large amounts of memory and computational power to super-resolve a single image, so many modern super-resolution models are not entirely practical. We propose a highly efficient, small super-resolution model that utilizes the sub-pixel convolution block for single-image super-resolution.
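The sub-pixel convolution block ends with a depth-to-space rearrangement, often called pixel shuffle: an ordinary convolution produces r² channels per output channel, which are then interleaved spatially. A numpy sketch of that rearrangement step, following the standard (C·r², H, W) → (C, rH, rW) layout:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) array into (C, H*r, W*r): the
    depth-to-space step of sub-pixel convolution upsampling."""
    c2, h, w = x.shape
    assert c2 % (r * r) == 0
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)    # reorder to (C, H, r, W, r)
    return x.reshape(c, h * r, w * r) # interleave into the upscaled grid
```

Because the expensive convolutions all run at low resolution and only this cheap reshuffle produces the high-resolution output, the block suits small, efficient models.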
In recent years, the computational power of handheld devices has increased rapidly to the point of parity with computers of only a generation ago. The multiple tools integrated into these devices and the steady expansion of cloud storage have created a need for novel compression techniques for both storage and transmission. In this work, a novel L1 principal component analysis (PCA) informed K-means approach is proposed. The technique seeks to preserve the color definition of images through the application of K-means clustering. Efficacy is assessed using the structural similarity index (SSIM).
As one of the classic fields of computer vision, image classification has expanded exponentially in usability and accuracy in recent years. With the rapid progress of deep learning and the introduction and advancement of techniques such as convolutional neural networks and vision transformers, image classification has been elevated to levels that were only theoretical until recently. This paper presents an improved object classification method combining vision transformers with multilayer convolutional neural networks, applied specifically to underwater environments. Compared with previous underwater object classification algorithms, the proposed network classifies images with higher accuracy, fewer training iterations, and a parameter count small enough for practical deployment.
Fusing multispectral sensor data that contains complementary information about the subject of observation produces visualizations more easily interpreted by both humans and algorithms. Many applications of feature-level fusion seek to combine edges and textures across the bandwidth of the sensing spectrum. Visualization techniques can be skewed by corruption and by redundancies induced by harmonics. Most image fusion techniques rely on intensity-hue-saturation (IHS) transforms, principal component analysis (PCA), or Gram-Schmidt methods. PCA's ability to remove redundancy from correlated data while preserving variance, together with its resistance to color distortion, lends itself to this application; PCA also exhibits lower spectral distortion than IHS and has been found to produce superior image fusion. Neural networks have been shown to recreate results closer to those found by human inference, and increasing computational power has let them spread into roles previously carried out by humans, greatly benefiting select advanced image processing techniques. We propose a novel method utilizing PCA in conjunction with a neural network to achieve higher quality image fusion: an autoencoder fuses the information, creating a higher level of data visualization than traditional weighted fusion techniques.
Recent years have seen a sharp rise in the amount of data available for analysis in many professional fields. In the medical sector, this increase can help detect and confirm underlying symptoms in patients that would otherwise remain undetected. Machine learning techniques applied in this sector can help diagnose irregularities when given data from the specific area on which the system was trained. Leveraging this newfound abundance of data and advanced diagnostic techniques, higher-dimensional features can be extracted and analyzed more effectively. The algorithm presented in this paper uses a convolutional neural network to categorize electrocardiogram (ECG) data, preprocessing the original signals with the fast Fourier transform (FFT) and principal component analysis (PCA) to reduce dimensionality while maintaining performance. The paper proposes three intelligent identification algorithms whose outputs can be fed into another specialized machine learning system or analyzed using traditional diagnostic procedures.
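The FFT-then-PCA preprocessing can be sketched in a few lines. Dimensions, centering, and component counts below are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def fft_pca_features(signals, n_components=8):
    """Reduce a batch of 1-D signals (e.g. ECG beats), shape
    (n_samples, n_points), to a few features: magnitude spectrum via
    the FFT, then PCA via SVD of the centered spectra."""
    spectra = np.abs(np.fft.rfft(signals, axis=1))   # (n_samples, n_freqs)
    spectra -= spectra.mean(axis=0)                  # center for PCA
    U, S, Vt = np.linalg.svd(spectra, full_matrices=False)
    return spectra @ Vt[:n_components].T             # project onto top PCs
```

The magnitude spectrum discards phase, making the features shift-tolerant, and the PCA projection keeps only the directions of greatest spectral variance before classification.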
Living in a constant news cycle creates the need for automated tracking of events as they happen, which can be achieved by analyzing the textual content of broadcast overlays. A great amount of information can be deciphered this way before further processing, with applications spanning from politics to sports. We utilize image processing to create mean cropping masks, based on binary slice clustering from intelligent retrieval, to identify areas of interest. This data is handed off to CEIR, which builds on the connectionist text proposal network (CTPN) to fine-tune text locations and on a convolutional recurrent neural network (CRNN) system to recognize the text strings. To improve accuracy and reduce processing time, this novel approach uses a preprocessing mask identification and cropping module to reduce the amount of data processed by the more finely tuned neural networks.
As one of the classic fields of computer vision, image classification has been booming with improvements in chip performance and algorithm efficiency. With the rapid progress of deep learning in recent decades, remote sensing land cover and land use image classification has entered a golden period of development. This paper presents a new deep learning classifier for remote sensing land cover and land use images. The approach first uses multi-layer convolutional neural networks to extract image features, followed by a fully connected network that generates the sample loss. A hard sample memory pool then collects the samples with large losses during training, and a batch of hard samples is randomly drawn from the pool to participate in training the convolutional fully connected model, making it more robust. Our method is validated on classic remote sensing land cover and land use datasets. Compared with previously popular classification algorithms, ours classifies images more accurately with fewer training iterations.
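The hard sample memory pool described above can be sketched as a small container that retains the highest-loss samples seen so far and replays random batches of them. Capacity and mixing policy here are illustrative choices, not the paper's settings:

```python
import random

class HardSampleMemoryPool:
    """Keeps the highest-loss training samples observed so far and
    replays random batches of them alongside normal mini-batches."""
    def __init__(self, capacity=256):
        self.capacity = capacity
        self.pool = []                        # list of (loss, sample) pairs

    def add(self, samples, losses):
        """Record a batch with its per-sample losses, then keep only
        the hardest (largest-loss) samples up to capacity."""
        self.pool.extend(zip(losses, samples))
        self.pool.sort(key=lambda p: p[0], reverse=True)
        del self.pool[self.capacity:]

    def sample_batch(self, batch_size):
        """Randomly draw hard samples to mix into the next training step."""
        picks = random.sample(self.pool, min(batch_size, len(self.pool)))
        return [s for _, s in picks]
```

In a training loop, each mini-batch's losses would be fed to `add`, and a replayed hard batch from `sample_batch` would be appended to later mini-batches so the model revisits its worst cases.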
KEYWORDS: Principal component analysis, Satellites, Data fusion, Sensors, Visualization, Signal to noise ratio, Infrared sensors, Infrared radiation, Visible radiation, Data processing
Satellites are equipped with an array of diversified sensors capable of relaying multiple types of optical data about the earth's surface, with different sensors capturing varying levels of detail for a particular area of interest. Combining information gathered across sensors, from the infrared to the visible spectrum, can enhance the visualization and depth of the data. Applying principal component analysis (PCA) to data fusion has traditionally relied on a weighted reliability matrix. This paper presents a novel PCA-based sensor fusion algorithm with weighted reliability and rejection control that improves fusion quality, creating a more robust visualization of the composite information obtained from satellites. The proposed algorithm can be applied using both L2 and L1 PCA. Simulation studies validate the proposed controlled weighted fusion method, even under high levels of corruption.
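A plain PCA-weighted fusion baseline, without the proposed rejection control step, can be sketched as follows. The weighting convention (normalized first-eigenvector loadings) is a common baseline choice, not necessarily the paper's:

```python
import numpy as np

def pca_weighted_fusion(img_a, img_b):
    """Fuse two co-registered single-band images (e.g. IR and visible)
    with weights taken from the first principal component of their
    joint pixel distribution."""
    X = np.column_stack([img_a.ravel(), img_b.ravel()]).astype(float)
    X -= X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
    pc1 = eigvecs[:, -1]                       # direction of most variance
    w = np.abs(pc1) / np.abs(pc1).sum()        # normalized non-negative weights
    fused = w[0] * img_a + w[1] * img_b
    return fused, w
```

The sensor whose pixels carry more of the joint variance receives the larger weight; a rejection control step, as in the abstract, would additionally down-weight or discard corrupted pixels before the weights are computed.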
Meteorological modeling takes data captured from multiple sources and processes it with data mining techniques to predict environmental changes. The machine learning techniques most commonly used on meteorological data are decision trees, rule-based methods, neural networks, naive Bayes, Bayesian belief networks, and support vector machines, all of which require accurate data to simulate effective models. Meteorological datasets can contain outliers and errors that significantly skew the accuracy of the generated models relied upon by many sectors of society, including agriculture, natural disaster response, and forecasting. This paper proposes a method to eliminate outliers from meteorological data and enhance model accuracy: a blind thresholding algorithm is applied to the principal components (PCs) obtained from L1- and L2-norm principal component analysis to identify and discard outliers in the dataset.
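The idea of thresholding principal component scores to discard outliers can be sketched as below. Since the abstract does not give the blind thresholding rule, a standard modified z-score cutoff on first-PC scores stands in for it, and L2 PCA (via SVD) stands in for both norm variants:

```python
import numpy as np

def drop_pc_outliers(X, k=3.5):
    """Remove rows of X whose first-PC score is an outlier under a
    robust median/MAD threshold (modified z-score cutoff k)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]                        # projections onto first PC
    med = np.median(scores)
    mad = np.median(np.abs(scores - med)) + 1e-12
    z = 0.6745 * (scores - med) / mad          # modified z-score
    keep = np.abs(z) <= k
    return X[keep], keep
```

Because the threshold derives from the median and MAD of the scores themselves, no per-dataset tuning is needed, which is the sense in which such thresholding is "blind."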
This paper considers the linear quadratic regulator (LQR) optimal control problem for multi-agent unmanned vehicle systems under communication constraints with packet drops. The problem is formulated as a distributed optimization problem, minimizing a global cost function expressed as the sum of local cost functions through local information exchange. Utilizing a newly developed optimization technique, we propose a novel algorithm that solves the distributed LQR problem in a first-order (gradient-descent-based) manner. Moreover, we adopt the key idea of virtualizing an extra node for each agent to store information from the previous step, yielding a fully distributed optimization algorithm. Extensive simulations demonstrate the efficacy and robustness of the proposed solution.
Big data has been driving professional sports over the last decade. In our data-driven world, it becomes important to find additional methods for analyzing both games and athletes. An abundance of video is captured at professional and amateur sporting events, and player datasets can be created from it using computer vision techniques. We propose a novel autonomous masking algorithm that receives live or previously recorded video footage of sporting events and identifies graphical overlays, optimizing further processing by tracking and text recognition algorithms for real-time analysis.
Gene microarray data generally comprises high-dimensional, small-sample datasets prone to noise. Analyzing this data with supervised and unsupervised learning algorithms is extremely useful for gene characterization, disease diagnosis, and genetic therapy. For many years, principal component analysis (PCA) has been used in algorithms for gene expression classification. Previous solutions utilize L2-norm PCA; however, with its superior resistance to outlier data, L1-norm PCA offers improved results. Both methods are compared using support vector machines (SVMs) to classify genetic mutations and co-regulation in several publicly available datasets. Used as a pre-processing step to SVM classification of gene microarray data, L1 PCA yields improved accuracy compared to L2 PCA.
Free space optical (FSO) communication systems can transmit data at high rates while remaining immune to noise that typical communication systems are susceptible to. Current radio frequency (RF) transmission systems are flooding the usable spectrum, causing it to become overcrowded and inconvenient to use. This results in more noise and potentially lower bandwidth, depending on the part of the spectrum in which a given RF system can operate. The free space optical transceiver (FSO-TRx) system proposed in this paper helps solve these issues with a modular and scalable design.
We propose a new recognition method that extracts effective information from receipts by integrating deep learning algorithms from computer vision and natural language processing. Our method consists of three parts. The first part locates the effective receipt area: by removing noise and extracting the gradient of the receipt image, we determine a threshold to crop and reshape the useful region. The second part detects text in the receipt image: we modify and deploy the connectionist text proposal network (CTPN) to locate the text regions. In the third part, we use connectionist temporal classification with maximum entropy regularization as the loss function to update a convolutional recurrent neural network (CRNN) that recognizes the detected text regions, converting the receipt from an image into text. Based on our method, the effective information of a receipt can be integrated and utilized. We train and test our system on the dataset published by the scanned receipts optical character recognition and information extraction (SROIE) competition. The results show that our recognition system identifies receipt information quickly and accurately.
In this paper, we investigate the obstacle avoidance and navigation problem in robotic control. To solve it, we propose revised Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) algorithms with an improved reward shaping technique. We compare the original DDPG and PPO with the revised versions in simulations with a real mobile robot and demonstrate that the proposed algorithms achieve better results.
Hydraulic fracturing has greatly impacted the oil and gas industry and is a large component of future oil production. Proper operation depends on supervisors monitoring data for signs of dangerous pressure spikes, and human error becomes a large factor when processing such volumes of data in real time. The system proposed in this paper reads and interprets data from a well to accurately predict when pressure spikes will occur, saving time and money on projects pushed to the limit.
KEYWORDS: Signal to noise ratio, Modulation, Phase shift keying, Frequency shift keying, Detection and tracking algorithms, Interference (communication), Detection theory, Signal detection, Algorithms, Binary data
We present a two-stage generalized-likelihood-ratio-test (GLRT) based procedure for classifying modulation schemes with unknown signal parameters such as frequency, amplitude, phase, and symbol sequence. Extensive simulation studies demonstrate the efficacy of the developed scheme under limited observation for various PSK and FSK signals, including those with nested symbol constellations.