In this work, GPU implementations of three target detection algorithms for hyperspectral images were studied. The first two algorithms were detectors for full-pixel targets: the RX algorithm and the MF detector. Two different implementations were studied for the RX detector. In the global implementation, the mean and covariance matrix were estimated globally from training samples as a preprocessing step on the CPU. In the adaptive implementation, these parameters were estimated locally using a moving window centered at the test pixel. The third algorithm studied was the AMSD, a detector for sub-pixel targets based on structured modeling of the background. The detection algorithms were implemented on an NVIDIA® Tesla™ C1060 graphics card. In addition, a CPU implementation of each target detector was developed as a baseline for estimating the speedups of the GPU implementations. The computational performance of the implementations and the detection accuracy of the algorithms were evaluated using a set of phantom images of a scene simulating traces of different materials on clothing, collected with a SOC-700 hyperspectral imager.

In the design of the GPU implementations, we analyzed three important aspects: the computation decomposition, the data layout, and the mapping of the computation to the GPU memory hierarchy. The computation is decomposed on a one-thread-per-output basis: since the output value of the detectors is computed independently for each pixel of the image, the task of computing the output value for a single pixel can be assigned to a single processing unit, i.e., a GPU thread in the CUDA architecture. The data layout concerns how the input image is stored in GPU memory. In the GPU-based implementations described in this document, the band-sequential storage scheme was used because it leads to coalesced memory transactions: threads with consecutive ID numbers access contiguous memory positions when reading or writing a single band.
Finally, different memory spaces were used for storing the data elements of the algorithms in order to exploit the GPU architecture. Parameters whose values do not change throughout the computation, such as the background mean, are stored in the GPU constant memory space to improve memory throughput. Other parameters, such as the covariance matrix of the full-pixel detectors and the projection matrices of the AMSD algorithm, also remain constant but cannot be stored in the constant memory space because of their size. In this case, the rows of the matrices are temporarily staged in the shared memory space to increase memory throughput.

The GPU implementations of the global RX and AMSD algorithms showed the best performance improvements, achieving maximum speedups of 24.76 and 46.64, respectively. The performance of the MF algorithm was limited by the low number of arithmetic operations performed by this detector in the kernel, achieving speedups below five: the parallel portion of this algorithm consists only of a dot product, which is relatively fast, so most of the total running time is spent transferring data between the CPU and the GPU. The performance of the adaptive RX algorithm was also limited, in this case by its high dependency on local data, which limits the memory throughput. In addition, in the adaptive RX implementation the number of bands had to be reduced to 60 because the local memory space per thread is limited to 16 KB on the C1060 card. Experimental results also showed that the methods evaluated for estimating the background subspace, SVD and MaxD, are only accelerated on the GPU for large data sizes. In terms of detection accuracy, the MF showed the best detection results for the data set evaluated.