Overview
The PDAL Worker is a specialized service for processing point cloud and geospatial data using PDAL (the Point Data Abstraction Library) and GDAL. It handles computationally intensive tasks such as LiDAR processing, format conversion, and spatial analysis.
Technology Stack
- Core Library: PDAL for point cloud processing
- Spatial Library: GDAL for raster and vector data
- Runtime: Node.js with worker threads
- Processing: Native C++ libraries via bindings
- Storage: Local filesystem with cloud storage support
Key Features
Point Cloud Processing
- Format Conversion: LAS, LAZ, PLY, and other point cloud formats
- Filtering: Classification, noise removal, ground extraction
- Transformation: Coordinate system conversion, scaling, rotation
- Analysis: Density calculation, height statistics, feature extraction
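For instance, the height-statistics operation can be backed by `pdal info --stats`, which emits JSON describing each dimension. The wrapper below is a minimal sketch, assuming the worker shells out to the PDAL CLI; the helper name is illustrative.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Illustrative helper: extract Z (height) statistics from a point cloud
// via `pdal info --stats`, which prints JSON on stdout.
async function heightStats(inputPath: string): Promise<{ min: number; max: number }> {
  const { stdout } = await run("pdal", ["info", "--stats", inputPath]);
  const info = JSON.parse(stdout);
  // PDAL reports one statistics entry per dimension; pick Z.
  const z = info.stats.statistic.find((s: any) => s.name === "Z");
  return { min: z.minimum, max: z.maximum };
}
```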
Raster Processing
- Format Support: GeoTIFF, JPEG2000, ECW, and other raster formats
- Reprojection: Coordinate system transformations
- Mosaicking: Combining multiple raster datasets
- Analysis: Terrain analysis, vegetation indices, change detection
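Reprojection, for example, maps directly onto GDAL's `gdalwarp` utility. A minimal sketch, assuming the worker invokes the GDAL CLI; the function name and default CRS are illustrative:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Illustrative: reproject a raster to a target CRS with gdalwarp.
// -t_srs sets the target spatial reference; -r selects the resampling method.
async function reproject(src: string, dst: string, targetSrs = "EPSG:4326"): Promise<void> {
  await run("gdalwarp", ["-t_srs", targetSrs, "-r", "bilinear", src, dst]);
}
```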
Vector Processing
- Format Conversion: Shapefile, GeoJSON, GML, and other vector formats
- Geometry Operations: Buffering, intersection, union operations
- Attribute Processing: Field calculations and data manipulation
- Spatial Analysis: Proximity analysis, overlay operations
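Format conversion typically delegates to GDAL's `ogr2ogr`. A minimal sketch (the wrapper name is illustrative):

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Illustrative: convert any OGR-readable vector dataset to GeoJSON.
async function toGeoJSON(src: string, dst: string): Promise<void> {
  await run("ogr2ogr", ["-f", "GeoJSON", dst, src]);
}
```

Geometry operations such as buffering can similarly be pushed into `ogr2ogr` through its SQLite dialect (e.g., `ST_Buffer`), or handled in-process by a geometry library.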
Architecture
Processing Pipeline
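The exact stages are not spelled out here; a hypothetical outline consistent with the rest of this document is: fetch input, validate, run the appropriate engine, persist outputs, clean up. All stage names below are illustrative, injected so the skeleton stays abstract.

```typescript
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical stage interface; these are not the worker's actual APIs.
interface Stages {
  fetchInput(uri: string, workdir: string): Promise<string>;
  validateInput(path: string): Promise<void>;
  runEngine(pipeline: object, input: string, workdir: string): Promise<string>;
  persistOutput(path: string): Promise<string>;
}

async function processJob(job: { input: string; pipeline: object }, stages: Stages) {
  const workdir = await mkdtemp(join(tmpdir(), "pdal-job-")); // isolated scratch space
  try {
    const local = await stages.fetchInput(job.input, workdir);
    await stages.validateInput(local);
    const output = await stages.runEngine(job.pipeline, local, workdir);
    return await stages.persistOutput(output);
  } finally {
    await rm(workdir, { recursive: true, force: true }); // always remove temp files
  }
}
```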
Project Structure
Processing Engines
PDAL Engine
Handles point cloud data with operations such as:
- Readers: LAS, LAZ, PLY, and PCD file formats
- Writers: Output to various point cloud formats
- Filters: Classification, decimation, outlier removal
- Pipelines: Chain multiple operations together
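PDAL pipelines are declared as JSON arrays of stages. The sketch below chains real PDAL stages (readers.las, filters.smrf for ground classification, filters.range, writers.las) and executes the result with `pdal pipeline --stdin`; the surrounding wrapper is illustrative.

```typescript
import { execFile } from "node:child_process";

// Illustrative: classify ground with SMRF, keep class 2 only, write compressed LAZ.
function buildPipeline(input: string, output: string): string {
  return JSON.stringify({
    pipeline: [
      { type: "readers.las", filename: input },
      { type: "filters.smrf" },                                  // ground classification
      { type: "filters.range", limits: "Classification[2:2]" },  // keep ground returns
      { type: "writers.las", filename: output, compression: "laszip" },
    ],
  });
}

function runPipeline(input: string, output: string): void {
  // `pdal pipeline --stdin` reads the pipeline JSON from standard input.
  const child = execFile("pdal", ["pipeline", "--stdin"], (err) => {
    if (err) throw err;
  });
  child.stdin?.end(buildPipeline(input, output));
}
```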
GDAL Engine
Handles raster and vector data with operations such as:
- Raster: GeoTIFF processing, reprojection, mosaicking
- Vector: Shapefile operations, geometry processing
- Analysis: Terrain analysis, spatial statistics
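Terrain analysis, for example, can be handed to GDAL's `gdaldem` utility; the sketch below derives a hillshade from a DEM (the wrapper is illustrative, the flags are standard gdaldem options):

```typescript
import { execFileSync } from "node:child_process";

// Illustrative: derive a hillshade raster from a DEM with gdaldem.
// -z scales elevation; -compute_edges avoids a nodata border.
function hillshade(dem: string, out: string): void {
  execFileSync("gdaldem", ["hillshade", dem, out, "-z", "1.0", "-compute_edges"]);
}
```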
Configuration
Environment Variables
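The actual variable names are not listed in this document. A hypothetical set, consistent with the queueing, limits, and storage features described elsewhere, with defaults applied in code:

```typescript
// Hypothetical environment variables; all names are illustrative.
const config = {
  port: Number(process.env.PORT ?? 8080),                  // HTTP port
  workDir: process.env.WORK_DIR ?? "/tmp/pdal-worker",     // scratch space
  maxFileSizeMb: Number(process.env.MAX_FILE_SIZE_MB ?? 2048),
  jobTimeoutMs: Number(process.env.JOB_TIMEOUT_MS ?? 30 * 60 * 1000),
  concurrency: Number(process.env.CONCURRENCY ?? 2),       // parallel jobs
  logLevel: process.env.LOG_LEVEL ?? "info",
};
```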
Docker Configuration
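No Dockerfile is reproduced here; a hypothetical one might layer the PDAL and GDAL command-line tools onto a Node.js base image via distribution packages:

```dockerfile
# Hypothetical Dockerfile; package and entry-point names are illustrative
# and assume a Debian-based Node image.
FROM node:20-bookworm-slim

RUN apt-get update \
 && apt-get install -y --no-install-recommends pdal gdal-bin \
 && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

CMD ["node", "dist/index.js"]
```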
API Endpoints
Processing Jobs
Job Management
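The endpoint paths are not enumerated in this document; the sketch below shows a hypothetical surface covering both job submission and management (status polling, cancellation) using Node's built-in http module:

```typescript
import { createServer } from "node:http";
import { randomUUID } from "node:crypto";

// Hypothetical routes and payloads; nothing here is the worker's actual API.
type Job = { id: string; status: "queued" | "running" | "done" | "failed" };
const jobs = new Map<string, Job>();

const server = createServer((req, res) => {
  const send = (code: number, body: unknown): void => {
    res.writeHead(code, { "content-type": "application/json" });
    res.end(JSON.stringify(body));
  };

  if (req.method === "POST" && req.url === "/jobs") {
    const job: Job = { id: randomUUID(), status: "queued" };
    jobs.set(job.id, job);
    return send(202, job);                       // accepted for async processing
  }

  const match = req.url?.match(/^\/jobs\/([\w-]+)$/);
  if (match) {
    const job = jobs.get(match[1]);
    if (!job) return send(404, { error: "not found" });
    if (req.method === "GET") return send(200, job);   // status polling
    if (req.method === "DELETE") {                     // cancellation
      jobs.delete(job.id);
      res.writeHead(204).end();
      return;
    }
  }
  send(404, { error: "unknown route" });
});

server.listen(8080);
```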
Development
Local Development
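The concrete npm scripts are not listed here; note that running the worker locally requires PDAL and GDAL available on the machine (native libraries or their CLIs) in addition to the Node.js dependencies.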
Testing Processing
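One way to smoke-test the processing path end to end is to synthesize a tiny ASCII point cloud and convert it with `pdal translate`, which infers readers.text from the .txt extension and writers.las from .las. A hypothetical test using Node's built-in runner:

```typescript
import { test } from "node:test";
import assert from "node:assert/strict";
import { execFileSync } from "node:child_process";
import { mkdtempSync, writeFileSync, existsSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

test("pdal translate converts ASCII points to LAS", () => {
  const dir = mkdtempSync(join(tmpdir(), "pdal-test-"));
  const txt = join(dir, "points.txt");
  const las = join(dir, "points.las");

  // readers.text expects a header line naming the dimensions.
  writeFileSync(txt, "X,Y,Z\n0,0,0\n1,1,1\n2,2,2\n");

  execFileSync("pdal", ["translate", txt, las]);
  assert.ok(existsSync(las), "expected a LAS file to be produced");
});
```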
Performance Optimization
Memory Management
- Streaming Processing: Large files processed in chunks (see the streaming sketch after this list)
- Temporary Files: Intermediate results stored efficiently
- Resource Limits: Configurable memory and CPU limits
- Cleanup: Automatic cleanup of temporary files
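On the Node side, chunked processing usually means streams rather than whole-file buffers. The sketch below moves a large intermediate file through gzip without ever holding it in memory; the file names are placeholders.

```typescript
import { createReadStream, createWriteStream } from "node:fs";
import { createGzip } from "node:zlib";
import { pipeline } from "node:stream/promises";

// Illustrative: process a large intermediate file in fixed-size chunks.
// pipeline() propagates errors and destroys every stream on failure.
await pipeline(
  createReadStream("intermediate.tmp"),
  createGzip(),
  createWriteStream("intermediate.tmp.gz"),
);
```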
Parallel Processing
- Worker Threads: Multiple processing threads for CPU-intensive tasks (sketched below)
- Job Queue: Asynchronous job processing with prioritization
- Load Balancing: Distribute work across multiple worker instances
- Resource Monitoring: Track CPU, memory, and disk usage
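A minimal worker-thread sketch, assuming CPU-heavy steps are dispatched off the main event loop; the task payload is illustrative:

```typescript
import { Worker, isMainThread, parentPort, workerData } from "node:worker_threads";

if (isMainThread) {
  // Dispatcher: one Worker per CPU-bound task.
  const worker = new Worker(new URL(import.meta.url), {
    workerData: { input: "tile.las" },
  });
  worker.on("message", (result) => console.log("done:", result));
  worker.on("error", (err) => console.error("task failed:", err));
} else {
  // Worker side: run the expensive step, then report back.
  const { input } = workerData as { input: string };
  // ... invoke PDAL/GDAL here ...
  parentPort?.postMessage({ input, ok: true });
}
```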
Caching Strategies
- Result Caching: Cache frequently used processing results (example below)
- Pipeline Optimization: Reuse optimized processing pipelines
- Format Detection: Cache file format information
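One plausible cache key is a digest of the input bytes plus the pipeline definition, so that identical requests can be answered from disk; the names below are illustrative.

```typescript
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";

// Illustrative: derive a stable cache key from input contents + parameters.
async function cacheKey(inputPath: string, pipeline: object): Promise<string> {
  const hash = createHash("sha256");
  hash.update(await readFile(inputPath));   // input file contents
  hash.update(JSON.stringify(pipeline));    // processing parameters
  return hash.digest("hex");
}
```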
Supported Formats
Point Cloud Formats
- LAS/LAZ: Standard LiDAR formats
- PLY: Polygon file format
- PCD: Point Cloud Data format
- TXT: ASCII point cloud data
- E57: ASTM E57 file format
Raster Formats
- GeoTIFF: Georeferenced TIFF
- JPEG2000: Compressed raster format
- ECW: Enhanced Compression Wavelet
- MrSID: Multi-resolution Seamless Image Database
- NetCDF: Network Common Data Form
Vector Formats
- Shapefile: ESRI shapefile format
- GeoJSON: JSON-based geospatial format
- GML: Geography Markup Language
- KML: Keyhole Markup Language
- GPX: GPS Exchange Format
Error Handling
Processing Errors
- File Corruption: Detect and report corrupted input files
- Format Mismatch: Validate input file formats
- Processing Failures: Detailed error messages with recovery suggestions
- Timeout Handling: Automatic job cancellation on timeouts
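Timeout cancellation can lean on child_process itself: execFile's timeout option kills the subprocess when it overruns. A sketch with placeholder limits:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Illustrative: cancel a runaway PDAL job after a configurable deadline.
async function runWithTimeout(args: string[], timeoutMs: number): Promise<string> {
  try {
    const { stdout } = await run("pdal", args, { timeout: timeoutMs, killSignal: "SIGKILL" });
    return stdout;
  } catch (err: any) {
    if (err.killed) throw new Error(`job exceeded ${timeoutMs} ms and was cancelled`);
    throw err; // surface other failures with their original message
  }
}
```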
Recovery Strategies
- Retry Logic: Automatic retry for transient failures (see the backoff sketch after this list)
- Partial Results: Save intermediate results on failures
- Rollback: Clean up failed processing attempts
- Logging: Comprehensive error logging for debugging
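Retry logic for transient failures (storage hiccups, brief resource contention) is typically exponential backoff with an attempt cap; a generic sketch:

```typescript
// Illustrative retry helper: exponential backoff with a fixed attempt cap.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseMs = 500): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i === attempts - 1) break;             // no sleep after the final try
      const delay = baseMs * 2 ** i;             // 500 ms, 1 s, 2 s, ...
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw lastErr;
}
```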
Monitoring & Metrics
Health Checks
- Service Health: HTTP endpoint for load balancer health checks
- Processing Status: Real-time job status monitoring
- Resource Usage: CPU, memory, and disk usage tracking
- Queue Status: Job queue length and processing rates
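A health endpoint can report liveness plus basic resource numbers; the route path and payload below are hypothetical.

```typescript
import { createServer } from "node:http";

// Hypothetical /healthz endpoint for load-balancer checks.
createServer((req, res) => {
  if (req.url === "/healthz") {
    const mem = process.memoryUsage();
    res.writeHead(200, { "content-type": "application/json" });
    res.end(JSON.stringify({
      status: "ok",
      rssBytes: mem.rss,               // resident memory
      uptimeSeconds: process.uptime(),
    }));
  } else {
    res.writeHead(404).end();
  }
}).listen(8081);
```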
Logging
- Structured Logs: JSON-formatted logs for analysis
- Log Levels: Configurable verbosity (debug, info, warn, error)
- Performance Logs: Processing time and resource usage
- Error Tracking: Automatic error aggregation and alerting
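A structured logger can be as small as a level filter plus JSON.stringify; the field names below are illustrative.

```typescript
// Minimal structured logger: one JSON object per line, filtered by level.
const LEVELS = ["debug", "info", "warn", "error"] as const;
type Level = (typeof LEVELS)[number];
const threshold: Level = (process.env.LOG_LEVEL as Level) ?? "info";

function log(level: Level, msg: string, fields: Record<string, unknown> = {}) {
  if (LEVELS.indexOf(level) < LEVELS.indexOf(threshold)) return;
  console.log(JSON.stringify({ time: new Date().toISOString(), level, msg, ...fields }));
}

log("info", "job finished", { jobId: "abc123", durationMs: 4200 });
```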
Security Considerations
File Access
- Path Validation: Prevent directory traversal attacks (see the guard sketch below)
- File Type Validation: Strict input file type checking
- Size Limits: Configurable maximum file sizes
- Quarantine: Suspicious files isolated for inspection
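The classic traversal guard resolves the requested path and verifies it still sits under the allowed root; a sketch assuming the root is an absolute path:

```typescript
import { resolve, sep } from "node:path";

// Illustrative traversal guard: reject any path that escapes the data root.
function safePath(root: string, requested: string): string {
  const abs = resolve(root, requested);
  if (abs !== root && !abs.startsWith(root + sep)) {
    throw new Error("path escapes data root");
  }
  return abs;
}

// safePath("/data", "tiles/a.las")   -> "/data/tiles/a.las"
// safePath("/data", "../etc/passwd") -> throws
```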
Processing Security
- Sandboxing: Processing runs in isolated environments
- Resource Limits: Prevent resource exhaustion attacks
- Input Sanitization: Validate all processing parameters
- Access Control: API authentication and authorization
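Parameter sanitization can be expressed as an allowlist over pipeline stages, rejecting anything the worker does not explicitly support; the specific stage set below is illustrative.

```typescript
// Illustrative allowlist: only known-safe PDAL stages may appear in a
// user-submitted pipeline definition.
const ALLOWED_STAGES = new Set([
  "readers.las", "filters.smrf", "filters.range",
  "filters.outlier", "writers.las",
]);

function validatePipeline(def: { pipeline: Array<{ type?: string }> }): void {
  for (const stage of def.pipeline) {
    if (!stage.type || !ALLOWED_STAGES.has(stage.type)) {
      throw new Error(`stage not permitted: ${stage.type ?? "(missing type)"}`);
    }
  }
}
```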