Programmatic Access to OpenTopography's Point Cloud Data with Tile Indexes

Nov 24, 2025

OpenTopography hosts hundreds of high-resolution point cloud datasets covering more than 500,000 km² globally. While our web-based data portal provides intuitive point-and-click access to these data, researchers increasingly need programmatic methods to build automated, reproducible workflows. We're excited to share a new Jupyter notebook tutorial that demonstrates how to efficiently access OpenTopography's point cloud data using tile indexes and cloud-native streaming techniques.

The Challenge: Accessing Point Cloud Data Programmatically

OpenTopography's lidar datasets are organized into tiles—individual LAZ files that each cover a small geographic area. While this tiling structure optimizes data storage, it presents a challenge for programmatic access: how do you identify and retrieve only the tiles you need for your area of interest without manually browsing through hundreds or thousands of files?

The Solution: Tile Indexes and Cloud Streaming

Our new tutorial introduces two complementary approaches for programmatic point cloud data access:

1. Tile Index Discovery and Usage

For each hosted point cloud dataset, OpenTopography provides a tile index: a shapefile containing the spatial extent and download URL of every tile in the dataset. Tile indexes can be accessed in three ways:

  • Direct download from dataset landing pages
  • Access through OpenTopography's bulk data repository
  • Programmatic URL construction using dataset metadata

By spatially querying the tile index against your area of interest, you can identify exactly which tiles you need before downloading any data.
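
As a rough illustration of that spatial query, the sketch below filters a small, made-up tile index by bounding-box overlap using only the Python standard library. The `Tile` record, the `select_tiles` helper, the `example.org` URLs, and all coordinates are hypothetical stand-ins; a real tile index is a shapefile with actual tile footprints, and the notebook uses GeoPandas for this step rather than the hand-rolled test shown here.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    """One row of a (hypothetical) tile index: download URL plus extent."""
    url: str
    minx: float
    miny: float
    maxx: float
    maxy: float


def intersects(tile, aoi):
    """Return True if the tile's bounding box overlaps or touches the AOI.

    `aoi` is a (minx, miny, maxx, maxy) tuple in the same CRS as the tiles.
    """
    aoi_minx, aoi_miny, aoi_maxx, aoi_maxy = aoi
    return (
        tile.minx <= aoi_maxx
        and tile.maxx >= aoi_minx
        and tile.miny <= aoi_maxy
        and tile.maxy >= aoi_miny
    )


def select_tiles(tiles, aoi):
    """Keep only the tiles whose extent intersects the area of interest."""
    return [tile for tile in tiles if intersects(tile, aoi)]


# A made-up 2 x 2 grid of 1 km tiles (placeholder URLs and coordinates).
tiles = [
    Tile("https://example.org/tiles/tile_0_0.laz", 0, 0, 1000, 1000),
    Tile("https://example.org/tiles/tile_1_0.laz", 1000, 0, 2000, 1000),
    Tile("https://example.org/tiles/tile_0_1.laz", 0, 1000, 1000, 2000),
    Tile("https://example.org/tiles/tile_1_1.laz", 1000, 1000, 2000, 2000),
]

# An area of interest that straddles the two southern tiles.
aoi = (800, 200, 1200, 600)

selected = select_tiles(tiles, aoi)
for tile in selected:
    print(tile.url)
```

With a tile index loaded into a GeoDataFrame, the same selection is typically a one-liner such as `tiles[tiles.intersects(aoi_polygon)]`, which tests the true tile geometries instead of bounding boxes.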

Tiled Point Clouds
The notebook works through how to select LAZ tiles (boundaries in blue) that intersect a given area of interest (plotted in red). In this example, points from the downloaded LAZ files are colored by classification: unclassified points are plotted in gray, and points classified as ground are plotted in brown.


2. Cloud-Native Streaming with PDAL

Beyond simple tile identification, the tutorial showcases an advanced workflow using the Point Data Abstraction Library (PDAL) to:

  • Stream point cloud data directly from OpenTopography's S3-compatible cloud storage
  • Clip data to your precise area of interest on-the-fly
  • Merge multiple tiles into a single output file
  • Generate derivative products like Digital Terrain Models (DTMs)

This approach avoids downloading full tiles when you only need data from a small region, which reduces bandwidth and storage requirements.
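
The read, clip, and merge steps listed above could be expressed as a PDAL pipeline along the following lines. This is a minimal sketch, not the notebook's actual pipeline: the `build_clip_pipeline` helper, the tile URLs, the bounds, and the output filename are all hypothetical, and the stage options shown are only the basic ones. The code builds the pipeline JSON without running it, so it needs nothing beyond the standard library.

```python
import json


def build_clip_pipeline(tile_urls, bounds, out_path):
    """Build a PDAL pipeline dict: read tiles, crop to bounds, merge, write COPC.

    `bounds` is (minx, miny, maxx, maxy) in the CRS of the point cloud.
    """
    minx, miny, maxx, maxy = bounds
    # PDAL writes 2D bounds as "([minx, maxx], [miny, maxy])".
    crop_bounds = f"([{minx}, {maxx}], [{miny}, {maxy}])"

    # One reader per remote tile; the URLs here are placeholders.
    stages = [{"type": "readers.las", "filename": url} for url in tile_urls]
    # Keep only the points that fall inside the area of interest.
    stages.append({"type": "filters.crop", "bounds": crop_bounds})
    # Combine the cropped tiles into a single point view.
    stages.append({"type": "filters.merge"})
    # Write the result as a cloud-optimized point cloud (COPC).
    stages.append({"type": "writers.copc", "filename": out_path})
    return {"pipeline": stages}


pipeline = build_clip_pipeline(
    tile_urls=[
        "https://example.org/tiles/tile_0_0.laz",
        "https://example.org/tiles/tile_1_0.laz",
    ],
    bounds=(800, 200, 1200, 600),
    out_path="aoi_merged.copc.laz",
)

pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

Saved to a file, JSON like this can be run with the `pdal pipeline` command or passed to the PDAL Python bindings. How much of each remote tile is actually transferred depends on the reader and the file layout, which this sketch does not address.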

Merged Point Clouds
The second part of the notebook works through how to stream data and download only the points that intersect a given area of interest. In this plot, points within an area of interest are colored by elevation and are plotted with the LAZ tile boundaries in blue for reference.


Tutorial Highlights

The notebook walks through a complete workflow using a case study from New Zealand's Waikato region:

  1. Dataset Discovery: Query OpenTopography's Data Catalog API to find datasets intersecting your area of interest
  2. Tile Index Access: Programmatically download and extract tile index shapefiles
  3. Spatial Filtering: Use GeoPandas to identify intersecting tiles
  4. Data Retrieval: Download individual LAZ files or stream directly using PDAL
  5. Product Generation: Create a merged, cloud-optimized point cloud (COPC) file and generate a DTM

All code is provided with detailed explanations, making it accessible to users with basic Python experience while demonstrating advanced techniques for those ready to build more sophisticated workflows.
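
To give a feel for the dataset discovery step, the sketch below builds a catalog query URL for a bounding box. It is an illustration under stated assumptions: the endpoint and parameter names reflect our reading of the public Data Catalog API and should be checked against the current API documentation, the `build_catalog_url` helper is hypothetical, and the bounding box is an arbitrary rectangle in the Waikato region rather than the notebook's area of interest.

```python
from urllib.parse import urlencode

# Assumed catalog endpoint; confirm against the current API documentation.
CATALOG_ENDPOINT = "https://portal.opentopography.org/API/otCatalog"


def build_catalog_url(minx, miny, maxx, maxy):
    """Build a catalog query URL for point cloud datasets in a lon/lat box."""
    params = {
        "productFormat": "PointCloud",  # restrict results to point clouds
        "minx": minx,
        "miny": miny,
        "maxx": maxx,
        "maxy": maxy,
        "detail": "false",
        "outputFormat": "json",
        "include_federated": "false",
    }
    return f"{CATALOG_ENDPOINT}?{urlencode(params)}"


# Illustrative bounding box (WGS84 degrees), not taken from the notebook.
url = build_catalog_url(175.2, -37.9, 175.4, -37.7)
print(url)
```

The resulting URL can then be fetched with Requests, for example `requests.get(url, timeout=30).json()`, and the response inspected for the datasets that overlap the box.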

Getting Started

The complete Jupyter notebook, including all code and documentation, is available on OpenTopography's GitHub repository.

The tutorial requires Python with standard geospatial libraries (GeoPandas, Shapely, Requests) plus PDAL for the advanced streaming workflows. Installation instructions and environment setup guidance are provided in the repository.

Looking Ahead

This tutorial represents part of OpenTopography's ongoing commitment to supporting diverse data access methods. Whether you're a student exploring point cloud data for the first time through our map interface, or a researcher building automated processing pipelines, we're working to ensure OpenTopography's data remain accessible and useful for your needs. As cloud-native data formats and access patterns continue to evolve, we'll keep updating our documentation and tools to help the community leverage these technologies effectively.


Questions or suggestions about this notebook? Contact us at info@opentopography.org or open an issue on our GitHub repository.