As global populations soar and the climate warms, food supply management is an increasingly critical problem. Precision agriculture, driven by on-site data collected from various sensors, plays a pivotal role in optimizing irrigation and fertilization and in enhancing plant health and crop yield. However, manual in-field measurement of chlorophyll, a key metric for guiding agricultural decisions, is cumbersome and poses significant challenges. This paper explores the transformative potential of multispectral imaging data to automate plant measurement and monitoring tasks, thereby reducing labor and time costs while improving the quality of data available for making informed agricultural decisions. We present a deep-learning model for instance segmentation of plants, trained on the GrowliFlower dataset of RGB and multispectral image cubes of cauliflower plants. The proposed algorithm uses a Convolutional Neural Network (CNN) to leverage both the spectral information and its spatial context to locate individual plants. We introduce a novel band-selection algorithm for determining the most significant multispectral features for use in the convolutional network: this reduces model complexity while ensuring accurate results. Our model's ability to generalize across the varying growth stages, soil conditions, and crop varieties in the training dataset demonstrates its suitability for real-world agricultural applications. This fusion of cutting-edge sensing technology for robotic systems and state-of-the-art deep learning models holds significant promise for advancements in crop yield, resource efficiency, and sustainability practices.
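The abstract does not detail the band-selection algorithm, so the sketch below is only a rough illustration of the general idea: rank spectral bands by an informativeness score and feed the top-k bands to a small fully convolutional network. The scoring criterion (mutual information with pixel labels), the network shape, and all names are assumptions, not the method proposed in the paper.

```python
# Hypothetical sketch: rank multispectral bands, then segment using only the top-k bands.
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_selection import mutual_info_classif

def rank_bands(cube, labels, k=4):
    """cube: (H, W, B) multispectral image; labels: (H, W) plant/background mask."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b)            # one sample per pixel, one feature per band
    scores = mutual_info_classif(pixels, labels.reshape(-1), discrete_features=False)
    return np.argsort(scores)[::-1][:k]     # indices of the k most informative bands

class SmallSegNet(nn.Module):
    """Tiny fully convolutional network over the selected bands (illustrative only)."""
    def __init__(self, in_bands, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_bands, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, n_classes, 1),
        )
    def forward(self, x):                   # x: (N, in_bands, H, W)
        return self.net(x)

# Shape check on synthetic data (10 spectral bands, 4 selected).
cube = np.random.rand(64, 64, 10).astype(np.float32)
mask = (np.random.rand(64, 64) > 0.5).astype(int)
best = rank_bands(cube, mask, k=4)
x = torch.from_numpy(cube[..., best].transpose(2, 0, 1)).unsqueeze(0)
logits = SmallSegNet(in_bands=4)(x)         # (1, 2, 64, 64) per-pixel class scores
```

Working with a reduced set of bands keeps the network input small, which is the complexity reduction the abstract refers to; the actual selection criterion used by the authors may differ.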
Eye tracking holds numerous promises for improving the mixed reality experience. While eye-tracking devices are capable of accurate gaze mapping on 2D surfaces, depth estimation of gaze points remains a challenging problem. Most gaze-based interaction applications rely on estimation techniques that map gaze data to corresponding targets on a 2D surface. This approach inevitably leads to a biased outcome, as the nearest objects in the line of sight tend to be identified as the target of interest more often. One viable solution is to estimate gaze as a 3D coordinate (x, y, z) rather than the traditional 2D coordinate (x, y). This article first introduces a new, comprehensive 3D gaze dataset collected in a realistic scene setting, using a head-mounted eye tracker and a depth estimation camera. Next, we present a novel depth estimation model, trained on the new gaze dataset, that accurately predicts gaze depth from calibrated gaze vectors. This method could help develop a mapping between gaze and objects in 3D space. The presented model improves the reliability of depth measurement of visual attention in real scenes as well as the accuracy of depth estimation in virtual reality scenes. Improving situational awareness using 3D gaze data will benefit several domains, particularly human-vehicle interaction, autonomous driving, and augmented reality.
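The abstract does not specify the depth estimation model, so the following is only a minimal sketch of the general setup: a small regressor that maps calibrated binocular gaze vectors to a scalar gaze depth, supervised by depth-camera ground truth. The feature layout (two 3D gaze directions), the network shape, and the training details are assumptions made for illustration.

```python
# Hypothetical sketch: regress gaze depth from calibrated left/right gaze vectors.
import torch
import torch.nn as nn

class GazeDepthMLP(nn.Module):
    """Predict a scalar depth (in metres) from concatenated left+right unit gaze vectors."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, gaze):                 # gaze: (N, 6)
        return self.net(gaze).squeeze(-1)    # (N,) predicted depth

model = GazeDepthMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Synthetic stand-in for (gaze vector, depth-camera ground truth) pairs.
gaze = torch.randn(256, 6)
depth = torch.rand(256) * 3.0                # targets between 0 and 3 m

for _ in range(100):                         # brief training loop for illustration
    optimizer.zero_grad()
    loss = loss_fn(model(gaze), depth)
    loss.backward()
    optimizer.step()
```

In practice the regressor would be trained on the article's 3D gaze dataset rather than random tensors, and the predicted depth combined with the 2D gaze point to obtain a full (x, y, z) estimate.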
The dynamics of gaze coordination in natural contexts are affected by various properties of the task, the agent, the environment, and their interaction. Artificial Intelligence (AI) lays the foundation for detection, classification, segmentation, and scene analysis. Much of the AI in everyday use is dedicated to predicting people's behavior. However, a purely data-driven approach cannot solve development problems alone. Therefore, it is imperative that decision-makers also consider another AI approach, causal AI, which can help identify precise relationships of cause and effect. This article presents a novel Gaze Feature Transverse Network (Gaze-FTNet) that generates close-to-human gaze attention. The proposed end-to-end trainable approach leverages a feature transverse network (FTNet) to model long-term dependencies for optimal saliency map prediction. Moreover, several modern backbone architectures are explored, tested, and analyzed. Synthetically predicting human attention from monocular RGB images will benefit several domains, particularly human-vehicle interaction, autonomous driving, and augmented reality.
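The FTNet module itself is not described in the abstract; as a generic stand-in, the sketch below shows the overall input/output contract of a saliency predictor: an RGB frame in, a single-channel attention map in [0, 1] out. The tiny encoder-decoder and its names are assumptions and do not implement Gaze-FTNet.

```python
# Hypothetical sketch: minimal encoder-decoder mapping an RGB frame to a saliency map.
import torch
import torch.nn as nn

class TinySaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),            # H/2
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),           # H/4
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # H/2
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),              # H
        )
    def forward(self, x):                    # x: (N, 3, H, W)
        return torch.sigmoid(self.decoder(self.encoder(x)))  # per-pixel saliency in [0, 1]

# Shape check on a dummy frame; a real model would swap in one of the pretrained
# backbone encoders the paper compares, plus the FTNet module for long-term dependencies.
frame = torch.randn(1, 3, 224, 224)
saliency = TinySaliencyNet()(frame)          # (1, 1, 224, 224)
```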