Open Access Paper
28 December 2022 Distributed storage and retrieval of massive remote sensing images
Muchun Lu, Yunfeng Nie, Wantao Liu
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 1250634 (2022) https://doi.org/10.1117/12.2662384
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
With the continuous development of remote sensing technology, remote sensing images have become multi-source, massive, and high-resolution, and the volume of remote sensing data is growing exponentially, which poses great challenges for image pyramid storage and retrieval. To address the low storage and query efficiency of tile data, this paper proposes the Z-Curve index, an improvement of the Z curve, and stores the tile data in the distributed Accumulo database. Experiments show that this method effectively improves the efficiency of tile data queries.

1. INTRODUCTION

With the continuous development of remote sensing technology, remote sensing image data are growing exponentially, and such data must be stored and searched efficiently [1]. The traditional way to display an image in a front-end page is to load the whole image into memory at once, but this works only for images with a small amount of data [2]. When the amount of remote sensing image data exceeds the memory capacity, the whole image cannot be read at once, which results in slow loading and stuttering. An image pyramid handles this problem well, but how to store the resulting large volume of tile data remains to be solved.

Tile data consist of numerous files and are difficult to manage. At present, there are two solutions for tile data storage. The first uses the distributed file system HDFS [3] to store both the indexes and the data. This method has a simple structure and high autonomy, but the tile data and indexes occupy a large amount of the master node's memory, which degrades cluster performance. The second uses a non-relational database, whose Key-Value model facilitates storage [4]. This method reduces the management difficulty and facilitates search.

In this paper, tile data are managed with the Accumulo NoSQL database, and an index based on the improved Z curve is established to manage tile data efficiently.

2. Z-CURVE INDEX DESIGN

2.1 Z-curve index structure

A spatial index [5, 6] is a data structure that arranges spatial data in a certain order. The Z curve is a simple space-filling curve used as a key index: it maps a multi-dimensional space onto a one-dimensional curve [7]. A linear quadtree can be constructed conveniently with the Z curve, which reduces the dimensionality of the spatial data: the row and column coordinates are mapped to a one-dimensional Z-order code [8]. The advantage of this curve is that it reduces the discontinuity of spatial data. In this paper, the Z curve is used to build an index that speeds up image search and makes the storage and management of image tiles more efficient.

The Z curve is illustrated in Figure 1. In an 8 × 8 plane, the row and column coordinates range from 0 to 7, so three bits represent each coordinate. A complete pyramid corresponds to a full quadtree, and the levels divided by the Z curve are regarded as the levels of that quadtree. The levels of the quadtree correspond to the levels of the pyramid: the root node corresponds to the top layer of the image, and, from top to bottom, each node has four child nodes.

Figure 1. Z curve coding diagram.
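The 8 × 8 case can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `z_order` is hypothetical, and the choice to place column bits in the even positions and row bits in the odd positions is an assumption, since the figure itself is not reproduced here.

```python
def z_order(row: int, col: int, bits: int = 3) -> int:
    """Interleave the low `bits` bits of row and col into one Z-order code."""
    code = 0
    for i in range(bits):
        # Assumed convention: column bit -> even position, row bit -> odd position.
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


# Visit the 64 cells of the 8 x 8 grid in Z-curve order.
cells = sorted(
    ((r, c) for r in range(8) for c in range(8)),
    key=lambda rc: z_order(rc[0], rc[1]),
)
print(cells[:4])  # the first 2 x 2 block of the curve
```

With three bits per coordinate the codes run from 0 to 63, and every aligned 2 × 2 block of cells receives four consecutive codes, which is the locality property the index relies on.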

In the tile pyramid model, each layer has a different resolution, the constructed tiles are spatially correlated, and tiles at different levels are also correlated. If tiles are encoded with the two-dimensional Z curve shown in Figure 1, the code carries no information about the pyramid level, so tiles at different levels receive duplicate index values. In addition, a user's query region may be covered by images from different sources with the same coverage, so an additional index layer is needed to retrieve images from different sources.

To address these problems, this paper improves the original Z curve and designs an extended two-dimensional Z curve. The index value occupies 8 bytes (64 bits) and consists of four parts. The first part is the sign bit, which occupies 1 bit. The second part is the image ID, which occupies 12 bits; because the image ID is part of the index code, no separate mapping layer from image ID to tile index is needed, which improves query efficiency. The third part is the pyramid zoom level, which occupies 5 bits, of which only levels 0-23 are used; including the level means that index codes of the same level are stored adjacently, which narrows the search range. The fourth part holds the row and column coordinates, each occupying 23 bits, so a grid of up to 2^23 × 2^23 tiles can be represented. With this design, each tile has a unique index value, and tiles can be retrieved with bit operations on the index value. Owing to the coding characteristics of the Z curve, logically adjacent tiles are also close in physical storage, which reduces the number of I/O operations and improves efficiency. The index bit allocation is shown in Figure 2.

Figure 2. Index value allocation graph.
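The four-part layout can be written down as shifts and masks. The sketch below is an assumption-laden reading of the text, not the authors' code: the names `pack_key` and `unpack_key` are hypothetical, the sign bit is assumed to stay 0, and the field order (image ID above level above the 46 coordinate bits) is inferred from the worked example in Section 3, where image ID 1 and level 3 give the bit pattern 1000 1100 0000 ...

```python
ID_BITS, LEVEL_BITS, COORD_BITS = 12, 5, 23
ZVALUE_BITS = 2 * COORD_BITS             # 46 bits of interleaved row/column
LEVEL_SHIFT = ZVALUE_BITS                # level occupies bits 46..50
ID_SHIFT = LEVEL_SHIFT + LEVEL_BITS      # image ID occupies bits 51..62, bit 63 is the sign bit


def pack_key(image_id: int, level: int, zvalue: int) -> int:
    """Combine the three fields into one non-negative 64-bit index value."""
    if not 0 <= image_id < (1 << ID_BITS):
        raise ValueError("image_id does not fit in 12 bits")
    if not 0 <= level <= 23:
        raise ValueError("only levels 0-23 are used")
    if not 0 <= zvalue < (1 << ZVALUE_BITS):
        raise ValueError("zvalue does not fit in 46 bits")
    return (image_id << ID_SHIFT) | (level << LEVEL_SHIFT) | zvalue


def unpack_key(key: int) -> tuple[int, int, int]:
    """Split an index value back into (image_id, level, zvalue)."""
    zvalue = key & ((1 << ZVALUE_BITS) - 1)
    level = (key >> LEVEL_SHIFT) & ((1 << LEVEL_BITS) - 1)
    image_id = (key >> ID_SHIFT) & ((1 << ID_BITS) - 1)
    return image_id, level, zvalue


print(hex(pack_key(1, 3, 0)))
```

Because the image ID and level sit in the high bits, sorting by index value groups tiles first by image and then by level, which is what lets a query restrict itself to one contiguous key range.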

With this coding method, when a queried region is covered by images from several sources, the image ID embedded in each index value already provides the mapping from image to tile code. The matching tile data can therefore be located in the database directly from the index values, which reduces the number of lookups. The Z-Curve index used in this paper thus has advantages for querying data.

2.2 Implementation of Z-curve algorithm

There is a relatively simple mapping relationship between Z-curve index and image ID, pyramid scaling level, and tile row and column coordinates. This part is divided into two algorithms:

(1) ConvertToZKey algorithm

The algorithm converts the four parameters into an index value and defines a lookup table of masks for the two-dimensional curve:

MAGIC = {0x5555555555555555L, 0x3333333333333333L, 0x0F0F0F0F0F0F0F0FL, 0x00FF00FF00FF00FFL, 0x0000FFFF0000FFFFL, 0x00000000FFFFFFFFL, 0x000000000000003FL}.

First, the bits of the row coordinate and of the column coordinate are each spread apart, with zeros filled in between them, by iterating five left-shift steps. The calculation formula is:

00119_PSISDG12506_1250634_page_3_1.jpg

The spread row and column values are then bit-interleaved into a Morton code. The index value ZKey is obtained by OR-ing this code with the left-shifted zoom level Z and image ID:

00119_PSISDG12506_1250634_page_3_2.jpg

The algorithm is used when the pyramid is built: the resulting ZKey serves as the key under which each tile is stored in the Accumulo database. Its time complexity is constant.
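Since the original formulas are not reproduced here, the following is a hedged sketch of how such a conversion is commonly written, using the well-known shift-and-mask bit-spreading technique with five of the MAGIC masks. The function names, the shift amounts (16, 8, 4, 2, 1), the field offsets, and the choice of which coordinate lands in the odd bit positions are assumptions, not details confirmed by the text.

```python
# Assumed layout: 1 sign bit | 12-bit image ID | 5-bit level | 46-bit Morton code.
LEVEL_SHIFT = 46
ID_SHIFT = 51
COORD_MASK = (1 << 23) - 1

# (shift, mask) pairs for the five spreading steps; the masks appear in the MAGIC table.
SPREAD_STEPS = [
    (16, 0x0000FFFF0000FFFF),
    (8, 0x00FF00FF00FF00FF),
    (4, 0x0F0F0F0F0F0F0F0F),
    (2, 0x3333333333333333),
    (1, 0x5555555555555555),
]


def spread_bits(value: int) -> int:
    """Move bit i of a 23-bit value to bit 2*i, leaving zeros in between."""
    value &= COORD_MASK
    for shift, mask in SPREAD_STEPS:
        value = (value | (value << shift)) & mask
    return value


def convert_to_zkey(image_id: int, level: int, row: int, col: int) -> int:
    """Build the 64-bit index value from the four parameters."""
    # Assumption: row bits take the odd positions, column bits the even ones.
    morton = (spread_bits(row) << 1) | spread_bits(col)
    return (image_id << ID_SHIFT) | (level << LEVEL_SHIFT) | morton


print(hex(convert_to_zkey(1, 3, 0, 0)))
```

Every step is a fixed number of shifts, ORs, and ANDs on a 64-bit value, which is why the conversion runs in constant time regardless of the coordinates.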

(2) ConvertToParameters algorithm

This algorithm recovers the four parameters from the index value by inverse calculation. First, the image ID and zoom level are extracted by right-shifting the index value. Then the interleaving is inverted. Because X comes first and Y second when the row and column bits are interleaved, the initial value for the column is taken after a right shift of one bit:

00119_PSISDG12506_1250634_page_3_3.jpg

Finally, right-shift operations combined with the masks in the parameter array yield the row and column coordinates:

00119_PSISDG12506_1250634_page_3_4.jpg

This algorithm is mainly used to query tiles that are spatially related to a given tile, and its time complexity is also constant.
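The inverse direction can be sketched in the same spirit. This is an illustration under stated assumptions rather than the authors' implementation: the names are hypothetical, the layout is taken to be 12-bit image ID at bit 51, 5-bit level at bit 46, and a 46-bit interleaved code in which column bits sit in the even positions and row bits in the odd positions. The six masks are used in the order in which they appear in the MAGIC table.

```python
LEVEL_SHIFT = 46
ID_SHIFT = 51
MORTON_MASK = (1 << 46) - 1


def compact_bits(value: int) -> int:
    """Collect the bits at even positions of `value` into a contiguous integer."""
    value &= 0x5555555555555555
    value = (value | (value >> 1)) & 0x3333333333333333
    value = (value | (value >> 2)) & 0x0F0F0F0F0F0F0F0F
    value = (value | (value >> 4)) & 0x00FF00FF00FF00FF
    value = (value | (value >> 8)) & 0x0000FFFF0000FFFF
    value = (value | (value >> 16)) & 0x00000000FFFFFFFF
    return value


def convert_to_parameters(zkey: int) -> tuple[int, int, int, int]:
    """Recover (image_id, level, row, col) from an index value."""
    image_id = (zkey >> ID_SHIFT) & 0xFFF   # 12-bit image ID
    level = (zkey >> LEVEL_SHIFT) & 0x1F    # 5-bit zoom level
    morton = zkey & MORTON_MASK
    col = compact_bits(morton)              # even positions
    row = compact_bits(morton >> 1)         # odd positions, hence the one-bit right shift
    return image_id, level, row, col


print(convert_to_parameters(0x8C00000000000))
```

Once the coordinates of a tile are recovered, neighbouring tiles can be addressed by adjusting the row or column and re-encoding, which is how spatially related tiles would be found from a single key.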

3. RESEARCH ON DISTRIBUTED TILE DATA STORAGE

When the image pyramid is constructed, a large number of tiles need to be stored and managed. A common storage scheme keeps the tile data in HDFS as directories and files [9]. Its advantage is that the storage structure is simple and clear and that management is highly autonomous. However, when tile data are stored in HDFS, every tile file and every directory occupies memory, about 150 bytes each. When a large tile data set is stored in HDFS, the directories and tile files occupy too much of the master node's memory, which lowers storage efficiency, hinders the storage management of large-scale tile data, and makes local images difficult to retrieve [10]. In this paper, the Accumulo NoSQL database is used to store tile data, that is, both the index and the tile data are stored in the database. This avoids the excessive use of master-node memory that occurs with HDFS storage, makes it convenient to find tiles through the index, and improves retrieval efficiency.
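The memory argument can be made concrete with a little arithmetic. The sketch below takes the figure of about 150 bytes per file or directory from the text and counts only tile files, so it is a lower bound; the helper names are hypothetical, and the full-quadtree tile count assumes every level of the pyramid is completely populated.

```python
BYTES_PER_OBJECT = 150  # approximate master-node cost per file or directory, as quoted above


def full_pyramid_tiles(max_level: int) -> int:
    """Number of tiles in a full quadtree pyramid with levels 0..max_level."""
    # Level k holds 4**k tiles, so the total is a geometric series.
    return (4 ** (max_level + 1) - 1) // 3


def master_node_bytes(n_files: int, n_dirs: int = 0) -> int:
    """Rough master-node memory needed to track the given files and directories."""
    return (n_files + n_dirs) * BYTES_PER_OBJECT


for level in (5, 10, 15):
    tiles = full_pyramid_tiles(level)
    print(level, tiles, master_node_bytes(tiles))
```

Under these assumptions, each additional pyramid level roughly quadruples the metadata held in memory: a full pyramid down to level 10 has 1,398,101 tiles and needs about 210 MB for file entries alone.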

When encoding with the improved Z curve, the encoding rule is “original image ID + zoom level + row coordinate of the tile + column coordinate of the tile”, and the data set is sorted by index value to form ordered tile keys. If the index value were built from the row and column coordinates alone, a tile's storage location could not be determined from these two parameters, and additional indexes would have to be constructed for searching, which would increase the number of comparisons. Adding the original image ID and the pyramid zoom level ensures that data from the same layer are stored next to each other. With this storage mode, spatially correlated tiles are physically adjacent in storage, which reduces the system's memory overhead when reading tile data. When reading data, the specific key is located from the parameter information, and the binary tile stream corresponding to that key is retrieved. For example, as shown in Figure 3, suppose the image ID of a tile is 1, the pyramid level is 3, and the row and column coordinates in this layer are (0,0). With leading zeros omitted, the index value of the tile in binary is 1000 1100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000. Here the leading bit 1 is the image ID, the next 5 bits 00011 represent level 3, and the last 46 bits are the row and column coordinates in the matrix of that layer, corresponding to the tile at position 0 in Figure 3.

Figure 3. Pyramid layer 3 tile data storage schematic.
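The effect of sorted keys on retrieval can be illustrated without a database. The sketch below models a sorted key-value table as a sorted Python list and is not Accumulo code: `row_id`, `level_range`, and `scan` are hypothetical helpers, the field offsets (level at bit 46, image ID at bit 51) are assumptions consistent with the example above, and encoding the key as 8 big-endian bytes is one assumed way to make byte-wise ordering agree with numeric ordering for non-negative keys.

```python
import bisect

ZVALUE_BITS = 46
LEVEL_SHIFT = 46
ID_SHIFT = 51


def row_id(key: int) -> bytes:
    """Encode a non-negative 64-bit key so that byte order equals numeric order."""
    return key.to_bytes(8, "big")


def level_range(image_id: int, level: int) -> tuple[int, int]:
    """Smallest and largest index value for one pyramid level of one image."""
    lo = (image_id << ID_SHIFT) | (level << LEVEL_SHIFT)
    hi = lo | ((1 << ZVALUE_BITS) - 1)
    return lo, hi


def scan(sorted_rows: list[bytes], lo: int, hi: int) -> list[bytes]:
    """Return all rows whose key lies in [lo, hi] from a sorted list."""
    start = bisect.bisect_left(sorted_rows, row_id(lo))
    stop = bisect.bisect_right(sorted_rows, row_id(hi))
    return sorted_rows[start:stop]


# Toy table: two images, two levels each, four tiles per level.
keys = [level_range(i, lv)[0] | z for i in (1, 2) for lv in (2, 3) for z in range(4)]
table = sorted(row_id(k) for k in keys)
hits = scan(table, *level_range(1, 3))
print(len(hits), hits[0].hex())
```

All tiles of image 1 at level 3 come back from one contiguous slice, with no secondary lookup from image ID to tile index.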

4. EXPERIMENTAL RESULTS AND ANALYSIS

The experimental data set consists of 85054 tiles generated by building pyramids for images from multiple sources. The experiments verify the performance of the Accumulo non-relational database when reading tile data and test whether the storage design based on the Z-Curve index retrieves tile data more quickly. The effects of the original Z index and the improved Z-Curve index on query efficiency are compared, and the results are shown in Figures 4 and 5.

Figure 4. Find efficiency comparison figure.

Figure 5. Performance comparison.

The experimental data show that the two index methods differ little in query time when the amount of queried data is small. As the amount of queried data increases, the query time of the improved index becomes significantly lower than that of the unimproved index, so the improvement effectively increases query speed. Similarly, the two storage methods differ little for small amounts of data, but as the number of tile files read increases, Accumulo storage performs significantly better than HDFS storage.

5. CONCLUSION

The proposed Z-Curve index and the image storage scheme based on the Accumulo database effectively improve the efficiency of storage and search. The scheme uses a distributed database on a Hadoop cluster, which meets the demand for massive image storage and allows low-cost distributed cluster deployment. The next step is to optimize the Hadoop cluster, improve the efficiency of data management, and align the scheme more closely with practical use.

ACKNOWLEDGMENTS

This work was supported by the Natural Science Foundation of Jiangxi Province under Grant 20202BABL202040.

REFERENCES

[1] Klein, I., Oppelt, N. and Kuenzer, C., “Application of remote sensing data for locust research and management—A review,” Insects, 12 (3), 233 (2021). https://doi.org/10.3390/insects12030233

[2] Tao, H., Feng, H., Xu, L., Miao, M. and Fan, L., “Estimation of crop growth parameters using UAV-based hyperspectral remote sensing data,” Sensors, 20 (5), 1296 (2020). https://doi.org/10.3390/s20051296

[3] Li, L., Jing, W. and Wang, N., “An improved distributed storage model of remote sensing images based on the HDFS and pyramid structure,” International Journal of Computer Applications in Technology, 59 (2), 142–151 (2019). https://doi.org/10.1504/IJCAT.2019.098037

[4] Chen, S. and Tian, Y. L., “Pyramid of spatial relations for scene-level land use classification,” IEEE Transactions on Geoscience and Remote Sensing, 53 (4), 1947–1957 (2014). https://doi.org/10.1109/TGRS.2014.2351395

[5] Zhang, H., Chan, Y., Fan, K., et al., “Fast and efficient short read mapping based on a succinct hash index,” BMC Bioinformatics, 19 (1), 1–14 (2018). https://doi.org/10.1186/s12859-018-2094-5

[6] Park, K., “Location-based grid-index for spatial query processing,” Expert Systems with Applications, 41 (4), 1294–1300 (2014). https://doi.org/10.1016/j.eswa.2013.08.027

[7] Zhang, H., Chan, Y., Fan, K., et al., “Fast and efficient short read mapping based on a succinct hash index,” BMC Bioinformatics, 19 (1), 1–14 (2018). https://doi.org/10.1186/s12859-018-2094-5

[8] Yu, A. and Mei, W., “Index model based on top-down greedy splitting R-tree and three-dimensional quadtree for massive point cloud management,” Journal of Applied Remote Sensing, 13 (2), 028501 (2019). https://doi.org/10.1117/1.JRS.13.028501

[9] Li, L. H., Jing, W. P. and Wang, N. H., “An improved distributed storage model of remote sensing images based on the HDFS and pyramid structure,” International Journal of Computer Applications in Technology, 59 (2), 142–151 (2019). https://doi.org/10.1504/IJCAT.2019.098037

[10] Xu, W., Zhao, X., Lao, B., et al., “Enhancing HDFS with a full-text search system for massive small files,” The Journal of Supercomputing, 77 (7), 7149–7170 (2021). https://doi.org/10.1007/s11227-020-03526-1
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
KEYWORDS
Data storage

Remote sensing

Databases

Binary data

Zoom lenses

Computer programming

Image retrieval
