First, we pick a number of random observations (FAST feature locations) to reduce the computation time. For each observation location, we compute the distance to its nearest neighboring observation point. The mean of all these distances then gives a single number (calculated as 105.6 for ). We assume that the variance of the Gaussian kernel () should be equal to or greater than . To guarantee that the kernels of two nearby observations intersect, we assume the variance of the Gaussian kernel as in our study. Consequently, the bandwidth of the Gaussian kernel is estimated as . For a given sequence, this value is computed only once, on one image, and the same value is then used for all observations extracted from the images of that sequence. Because the bandwidth is derived from the observed spacing of the features rather than fixed by hand, this automatic kernel bandwidth estimation makes the algorithm robust to changes in scale and resolution.
Fig. 2(d) shows the PDF obtained for the test image. The PDF is color coded: yellow-red regions indicate high probability values and dark blue regions indicate low probability values. As the figure shows, crowded areas have very high probability values and are therefore highlighted in the estimated PDF. We apply the automatic thresholding method of Otsu21 to this PDF to detect regions with high probability values. In the binary image obtained after thresholding, we eliminate regions with an area , since such small regions cannot indicate large human crowds. The remaining regions of the binary image are the dense crowd regions. For image, the boundaries of the detected crowd regions are drawn on the original input image as blue borders in Fig. 2(e).
After detecting the very dense groups, the next step focuses on detecting the other people in sparse groups. For the automatically detected dense crowds, we also extract quantitative measures for a more detailed analysis. Because the detected features indicate local color changes, we assume that they carry information about the number of people in crowded areas.
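The dense-crowd detection steps described above (bandwidth from the mean nearest-neighbor distance, a Gaussian kernel density estimate, Otsu thresholding, and removal of small regions) can be sketched in dependency-free Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. All function names are hypothetical. Because the paper's exact rule linking the kernel width to the mean distance is not reproduced here, `sigma` is assumed to be a multiple `scale` of that distance and is treated as the kernel's standard deviation. `min_area` stands in for the area limit, regions are assumed to be 8-connected, and feature detection (FAST) is assumed to have already produced the `(x, y)` points.

```python
import math
import random


def mean_nn_distance(points, sample_size, seed=0):
    """Mean distance from each sampled point to its nearest sampled neighbour."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    total = 0.0
    for i, (xi, yi) in enumerate(sample):
        # Brute-force nearest neighbour among the other sampled points.
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(sample) if j != i)
    return total / len(sample)


def gaussian_kde_grid(points, width, height, sigma):
    """Sum of isotropic Gaussians (std. dev. sigma) centred on the points,
    evaluated at every pixel and normalized so the grid sums to 1."""
    two_s2 = 2.0 * sigma * sigma
    grid = [[sum(math.exp(-((px - x) ** 2 + (py - y) ** 2) / two_s2) for x, y in points)
             for px in range(width)]
            for py in range(height)]
    total = sum(map(sum, grid))
    return [[v / total for v in row] for row in grid]


def otsu_threshold(values, bins=256):
    """Histogram threshold maximizing the between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    step = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / step), bins - 1)] += 1
    n, total = len(values), sum(k * h for k, h in enumerate(hist))
    w0 = sum0 = 0
    best_var, best_k = -1.0, 0
    for k in range(bins):
        w0 += hist[k]
        sum0 += k * hist[k]
        w1 = n - w0
        if w0 == 0 or w1 == 0:
            continue
        between = w0 * w1 * (sum0 / w0 - (total - sum0) / w1) ** 2
        if between > best_var:
            best_var, best_k = between, k
    return lo + (best_k + 1) * step  # upper edge of the last "low" bin


def components(mask):
    """8-connected components of a boolean grid, each as a list of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            stack, comp = [(r, c)], []
            while stack:
                y, x = stack.pop()
                comp.append((y, x))
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            comps.append(comp)
    return comps


def dense_crowd_mask(points, width, height, sample_size, scale, min_area):
    """points are (x, y) = (column, row) feature locations."""
    d = mean_nn_distance(points, sample_size)
    sigma = scale * d  # HYPOTHETICAL rule; the paper's exact sigma-vs-d relation is not given here
    pdf = gaussian_kde_grid(points, width, height, sigma)
    t = otsu_threshold([v for row in pdf for v in row])
    mask = [[v >= t for v in row] for row in pdf]
    out = [[False] * width for _ in range(height)]
    for comp in components(mask):
        if len(comp) >= min_area:  # drop regions too small to be a large crowd
            for y, x in comp:
                out[y][x] = True
    return out


# Toy input: a 6x6 block of features (dense) plus three isolated features.
pts = [(x, y) for x in range(8, 14) for y in range(8, 14)] + [(35, 25), (30, 5), (5, 25)]
crowd = dense_crowd_mask(pts, width=40, height=30, sample_size=len(pts), scale=1.0, min_area=20)
print("dense regions:", len(components(crowd)))
```

The brute-force loops only keep the sketch self-contained. On real images, one would more likely use library routines such as `skimage.filters.threshold_otsu` and `scipy.ndimage.label`, together with a spatial index for the nearest-neighbor search.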
Unfortunately, the number of features in a crowd region does not directly give the number of people. In most cases, shadows of people or small gaps between people also generate features; in addition, two neighboring features might come from two different chroma bands of the same person. To reduce the counting errors caused by such features, we follow a different strategy to estimate the number of people in the detected crowds. We first create a binary mask that is zero everywhere except at the feature locations, which are set to 1. Then, we dilate this mask with a disk-shaped structuring element of radius 2 to connect nearby feature locations.22 Finally, we apply connected component analysis to the dilated mask and take the total number of connected components in a crowd area as the number of people ().22 A slight change in the radius of the structuring element does not significantly change the estimated number of people . However, an appreciable increase in the radius can connect features coming from different people and thereby decreases , which leads to underestimates of the number of people.
Because the resolution of the input image is known, the density of people () can also be calculated from the estimated number of people in the crowd. Let us assume is the ’th connected component in the crowd mask. We calculate the crowd density for the ’th crowd as , where and are the numbers of pixels in the image in the horizontal and vertical directions, respectively, and is the area of one pixel in square meters.
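The counting procedure above (a binary feature mask, dilation with a radius-2 disk, then connected components) can be sketched as follows, together with a density estimate. This is a minimal sketch under stated assumptions. All helper names are hypothetical, and 8-connectivity is assumed for the components. Because the paper's density formula is not reproduced here, `crowd_density` uses a hypothetical stand-in: the number of people divided by the region's pixel count times the area of one pixel in square meters.

```python
def disk_offsets(radius):
    """(dy, dx) offsets of a disk-shaped structuring element of the given radius."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def dilate(mask, radius):
    """Binary dilation of a boolean grid with a disk structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    offsets = disk_offsets(radius)
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for dy, dx in offsets:
                    y, x = r + dy, c + dx
                    if 0 <= y < h and 0 <= x < w:
                        out[y][x] = True
    return out


def count_components(mask):
    """Number of 8-connected True regions (iterative flood fill)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


def estimate_people(features, region, radius=2):
    """Count people in one crowd region from (x, y) feature locations."""
    h, w = len(region), len(region[0])
    mask = [[False] * w for _ in range(h)]  # zero everywhere ...
    for x, y in features:
        if region[y][x]:  # ... except at features inside this crowd region
            mask[y][x] = True
    return count_components(dilate(mask, radius))


def crowd_density(people, region, pixel_area_m2):
    """HYPOTHETICAL density: people per square metre of the region's area."""
    area_m2 = sum(map(sum, region)) * pixel_area_m2
    return people / area_m2


region = [[True] * 20 for _ in range(20)]          # toy 20x20 crowd region
feats = [(5, 5), (6, 6), (15, 5), (16, 5), (10, 15)]
n = estimate_people(feats, region, radius=2)
print(n, crowd_density(n, region, pixel_area_m2=0.25))  # 3 0.03
print(estimate_people(feats, region, radius=6))         # 1
```

The second call illustrates the sensitivity discussed above. With radius 2, the two features of each "person" merge and the three groups stay separate. With radius 6, features from different people are joined and the count drops to 1.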