25 July 2024
Uncertainty estimation through quantile forest for prescriptive scheduling of data processing at ALMA
Abstract
The Atacama Large Millimeter/submillimeter Array (ALMA) is a prominent astronomical observatory known for its detailed imaging capabilities. Efficient scheduling of ALMA’s data processing tasks, especially those involving complex pipeline executions, is crucial for maximizing operational productivity. This paper addresses the challenge by developing a predictive model that estimates the runtime of these tasks, enabling more effective scheduling and resource management. Our approach employs the Light Gradient Boosting Machine (LGBM) and Quantile Forest models to predict processing times and quantify uncertainties. The use of these models is innovative, as it not only provides accurate predictions but also offers insights into the variability of processing times. This is particularly beneficial for handling the dynamic nature of the data processing workload at ALMA. We enhance the model’s performance and reliability by incorporating variable scaling and logarithmic transformations. To determine the best model, we comprehensively evaluated seven different machine-learning techniques. Our results show that the LGBM model and quantile estimation outperform traditional methods in predicting task durations. This leads to more efficient scheduling, as it allows the system to account for potential delays and optimize the sequencing of jobs. The quantile approach, in particular, offers a robust method for dealing with the inherent uncertainty in processing times. Our predictive tool has demonstrated a substantial reduction in overall flow time, decreasing it by 5.7%. Further improvements were achieved using stochastic scheduling techniques, which leverage the uncertainty estimates provided by our model. This research highlights the potential of machine learning to significantly enhance the operational efficiency of large-scale observatories like ALMA, providing a scalable and practical solution for managing complex data processing tasks.
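To illustrate why runtime predictions can reduce flow time, the following is a minimal sketch and not the authors' scheduler. It assumes a single processing resource, all jobs available at time zero, and hypothetical runtimes in hours. Jobs are ordered by predicted runtime (shortest processing time first) and the resulting total flow time is compared with first-in, first-out order. The function names `total_flow_time` and `spt_order` and all data values are assumptions made for this example.

```python
from typing import List, Sequence


def total_flow_time(durations: Sequence[float], order: Sequence[int]) -> float:
    """Sum of completion times when jobs run back to back on one machine.

    With every job released at time zero, a job's flow time equals its
    completion time.
    """
    clock = 0.0
    total = 0.0
    for job in order:
        clock += durations[job]  # job finishes at the current clock value
        total += clock
    return total


def spt_order(predicted: Sequence[float]) -> List[int]:
    """Return job indices sorted by predicted runtime, shortest first."""
    return sorted(range(len(predicted)), key=lambda j: predicted[j])


# Hypothetical data (hours): actual runtimes and a model's point predictions.
actual = [8.0, 2.0, 5.0, 1.0]
predicted = [7.5, 2.5, 4.0, 1.2]

fifo = list(range(len(actual)))   # process jobs in arrival order
spt = spt_order(predicted)        # process jobs in predicted-runtime order

fifo_flow = total_flow_time(actual, fifo)
spt_flow = total_flow_time(actual, spt)

print(f"FIFO total flow time: {fifo_flow} h")
print(f"SPT  total flow time: {spt_flow} h")
```

In this sketch the schedule is built from predictions but evaluated on the actual runtimes, so prediction errors directly affect the result. A scheduler that accounts for uncertainty could replace the point prediction with an upper quantile of the predicted runtime distribution when ranking jobs; the abstract does not specify how the paper's stochastic scheduling uses the quantile estimates.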
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Rodrigo A. Carrasco, Luis Aburto, Jorge García Yus, Alfredo De Rodt, and Gianfranco Speroni "Uncertainty estimation through quantile forest for prescriptive scheduling of data processing at ALMA", Proc. SPIE 13098, Observatory Operations: Strategies, Processes, and Systems X, 1309813 (25 July 2024); https://doi.org/10.1117/12.3019492
KEYWORDS
Data processing
Machine learning
Data modeling
Decision trees