Implementing Parallel and Distributed Computing Technologies for Processing Large-Scale GIS Datasets
Description of the research
As detailed GIS datasets and related remote sensing imagery become more widely available, it is increasingly difficult to provide sufficient computational resources to assemble and analyze them in a timely fashion. For example, the slope analysis for a watershed can take up to ten hours on a single computer with a relatively fast processor. These limitations will greatly inhibit the use of real-time imaging and modeling techniques with GIS.
Parallel and distributed computing have been applied effectively to a number of similar computational problems, so a number of tools are already available to assist with optimizing code for parallel operation. The key is to understand the potential bottlenecks to completing all of the required processes. Sharing the computational task across processors in multiprocessor or cluster machines can alleviate the processing bottleneck. With large datasets, however, a secondary bottleneck may emerge in moving data through a single input/output (I/O) channel, in which case parallel I/O hardware and routines may be called for.
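As an illustration of sharing a computational task across processors, the following minimal sketch splits an elevation grid into row bands and computes slope for each band independently. It is a sketch under stated assumptions, not the project's implementation: the function names, the plain list-of-lists grid, and the finite-difference slope formula are all hypothetical choices made for the example. Each band carries one extra "halo" row above and below so that cells on a band boundary see the same neighbors they would see in the full grid.

```python
import math


def slope_band(task):
    """Slope in degrees for global rows [start, stop) of one band.

    `rows` holds the band plus any halo rows; `first_row` is the global
    index of rows[0]. Assumes the grid is at least 2 x 2 cells.
    """
    rows, first_row, start, stop, cell = task
    ncols = len(rows[0])
    last = len(rows) - 1
    out = []
    for r in range(start, stop):
        i = r - first_row
        # Clamp at the grid edge, which gives a one-sided difference there.
        up, down = max(i - 1, 0), min(i + 1, last)
        line = []
        for c in range(ncols):
            left, right = max(c - 1, 0), min(c + 1, ncols - 1)
            dzdx = (rows[i][right] - rows[i][left]) / ((right - left) * cell)
            dzdy = (rows[down][c] - rows[up][c]) / ((down - up) * cell)
            line.append(math.degrees(math.atan(math.hypot(dzdx, dzdy))))
        out.append(line)
    return out


def make_tasks(dem, n_bands, cell):
    """Split the grid into contiguous row bands, each with halo rows."""
    nrows = len(dem)
    n_bands = max(1, min(n_bands, nrows))
    bounds = [nrows * k // n_bands for k in range(n_bands + 1)]
    tasks = []
    for start, stop in zip(bounds, bounds[1:]):
        lo, hi = max(start - 1, 0), min(stop + 1, nrows)
        tasks.append((dem[lo:hi], lo, start, stop, cell))
    return tasks


def slope_grid(dem, n_bands=4, cell=1.0, map_fn=map):
    """Compute slope for the whole grid, one band per task.

    `map_fn` defaults to the serial built-in `map`; any order-preserving
    parallel map with the same calling convention can be substituted.
    """
    result = []
    for band in map_fn(slope_band, make_tasks(dem, n_bands, cell)):
        result.extend(band)
    return result
```

Because every band is an independent task, the serial `map` can be swapped for a parallel one, for example the `map` method of a `concurrent.futures.ProcessPoolExecutor`, without changing the band logic. Note that in this sketch every task still receives its rows from one parent process, which is the kind of single-channel data movement described above as a possible secondary bottleneck.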
These approaches are becoming more widespread as researchers take advantage of inexpensive commodity processors to build multiprocessor Beowulf cluster computers with extremely high processing capacity at relatively low cost. Dividing the problem in this way also allows one to scale easily from a single processor to a few to a very large number of processors.
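A rough way to judge how far such scaling can go is Amdahl's law, which is not part of the original description and is introduced here only as an illustration. It bounds the speedup on n processors by 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel. The sketch below applies it to the ten-hour slope analysis mentioned earlier; the parallel fraction of 0.95 is an assumed value, not a measurement.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Upper bound on speedup under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0 or n_processors < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and n_processors >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)


def estimated_hours(serial_hours, parallel_fraction, n_processors):
    """Best-case run time after dividing the work across processors."""
    return serial_hours / amdahl_speedup(parallel_fraction, n_processors)


if __name__ == "__main__":
    # Assumed: a 10-hour single-processor job that is 95% parallelizable.
    for n in (1, 4, 16, 64, 256):
        print(n, round(estimated_hours(10.0, 0.95, n), 2))
```

Under this model the serial remainder, which often includes I/O, sets the floor on run time no matter how many processors are added.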
Parallel I/O routines can also be adapted to computations that are distributed among geographically dispersed machines. In this case, the bottleneck would be the network connecting those machines, and emerging grid computing tools will need to be applied: tools that assess the data storage locations and capacities, the network bandwidth, and the processing capabilities closest to the stored datasets.
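The kind of assessment described above can be sketched as a simple cost estimate: for each candidate site, add the time to move the dataset over the network (zero if the site already holds the data) to the time to process it there, then pick the cheapest site. Everything in this sketch is hypothetical, including the `Site` fields, the units, and the linear cost model; real grid tools would obtain these figures through resource discovery rather than hard-coded values.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    has_data: bool          # True if the dataset is already stored here
    bandwidth_mb_s: float   # assumed link rate from the data store to this site
    cells_per_s: float      # assumed processing rate in grid cells per second


def estimated_seconds(site, data_mb, n_cells):
    """Transfer time plus compute time under a simple linear model."""
    transfer = 0.0 if site.has_data else data_mb / site.bandwidth_mb_s
    return transfer + n_cells / site.cells_per_s


def pick_site(sites, data_mb, n_cells):
    """Return the site with the lowest estimated total time."""
    return min(sites, key=lambda s: estimated_seconds(s, data_mb, n_cells))
```

With a slow machine next to the data and a fast cluster across the network, the choice flips as the dataset grows: small datasets favor shipping data to the cluster, while large ones favor computing where the data already sit.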
Importance to the Nation
Solutions that integrate these innovations will facilitate real-time GIS responses to natural disasters, environmental emergencies, and homeland security needs, and will make it possible to address large-scale problems at a level of detail not previously possible.
Proposed research projects:
- Extend the previous research on parallel processing in GIS to encompass parallel processing, parallel I/O, and distributed computing solutions to large-scale data analysis problems.
- Adapt emerging grid computing and portal tools to GIS processing tasks.
- Define and test the requirements for computational, storage, and network resource discovery to manage the allocation of those resources to complex GIS problems.