We present a parallel implementation of the optimized maximum noise fraction (G-OMNF) transform algorithm for feature extraction of hyperspectral images on commodity graphics processing units (GPUs). The proposed approach explored the algorithm data-level concurrency and optimized the computing flow. We first defined a three-dimensional grid, in which each thread calculates a sub-block data to easily facilitate the spatial and spectral neighborhood data searches in noise estimation, which is one of the most important steps involved in OMNF. Then, we optimized the processing flow and computed the noise covariance matrix before computing the image covariance matrix to reduce the original hyperspectral image data transmission. These optimization strategies can greatly improve the computing efficiency and can be applied to other feature extraction algorithms. The proposed parallel feature extraction algorithm was implemented on an Nvidia Tesla GPU using the compute unified device architecture and basic linear algebra subroutines library. Through the experiments on several real hyperspectral images, our GPU parallel implementation provides a significant speedup of the algorithm compared with the CPU implementation, especially for highly data parallelizable and arithmetically intensive algorithm parts, such as noise estimation. In order to further evaluate the effectiveness of G-OMNF, we used two different applications: spectral unmixing and classification for evaluation. Considering the sensor scanning rate and the data acquisition time, the proposed parallel implementation met the on-board real-time feature extraction.