Skip to main content

Overview

PostGIS extends PostgreSQL with spatial capabilities, providing advanced geospatial data types, functions, and indexing. It serves as the primary data store for GeoFlow’s spatial data, enabling efficient storage and querying of geospatial information.

Technology Stack

  • Database: PostgreSQL 15 with PostGIS 3.3 extension
  • Spatial Types: Geometry, Geography, Raster data types
  • Indexing: GiST and SP-GiST spatial indexes
  • Functions: 300+ spatial functions for analysis
  • Standards: OGC Simple Features and SQL/MM standards

Key Features

Spatial Data Types

  • Geometry: Cartesian coordinate system geometries
  • Geography: Spherical coordinate system for global data
  • Raster: Grid-based spatial data (DEM, satellite imagery)
  • Topology: Topological relationships and networks

Spatial Operations

  • Measurement: Area, distance, perimeter calculations
  • Relationships: Contains, intersects, touches, overlaps
  • Operations: Buffer, union, intersection, difference
  • Transformations: Reprojection, scaling, rotation

Advanced Features

  • Spatial Indexing: R-tree indexing for fast queries
  • Clustering: Group nearby features for analysis
  • Partitioning: Divide large datasets for performance
  • Replication: High availability and read scaling

Architecture

Database Schema

-- Enable PostGIS extension
CREATE EXTENSION IF NOT EXISTS postgis;

-- Spatial reference systems
SELECT * FROM spatial_ref_sys LIMIT 5;

-- Create spatial table
CREATE TABLE geospatial_data (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    geom GEOMETRY(POINT, 4326),
    properties JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Create spatial index
CREATE INDEX idx_geospatial_data_geom
ON geospatial_data USING GIST (geom);

Table Organization

  • Feature Tables: Store individual spatial features
  • Metadata Tables: Store dataset information and properties
  • Index Tables: Spatial indexes for query performance
  • System Tables: PostGIS internal tables and views

Connection Management

  • Connection Pooling: Efficient connection reuse
  • Prepared Statements: Cached query plans
  • Transaction Management: ACID compliance
  • Lock Management: Minimize blocking operations

Configuration

Docker Configuration

postgres:
  image: postgis/postgis:15-3.3
  environment:
    POSTGRES_DB: geoflow
    POSTGRES_USER: geoflow
    POSTGRES_PASSWORD: password
    POSTGRES_INITDB_ARGS: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C"
  ports:
    - "5432:5432"
  volumes:
    - postgres_data:/var/lib/postgresql/data
    - ./scripts/init-postgis.sql:/docker-entrypoint-initdb.d/init-postgis.sql
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U geoflow -d geoflow"]
    interval: 10s
    timeout: 5s
    retries: 5

Initialization Script

-- init-postgis.sql
-- Enable required extensions
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS postgis_raster;
CREATE EXTENSION IF NOT EXISTS postgis_topology;

-- Create spatial reference system for common projections
INSERT INTO spatial_ref_sys (srid, proj4text, srtext)
VALUES (
    900913,
    '+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs',
    'PROJCS["Google Maps Global Mercator",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Mercator_1SP"],PARAMETER["central_meridian",0],PARAMETER["scale_factor",1],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["X",EAST],AXIS["Y",NORTH],AUTHORITY["EPSG","900913"]]'
) ON CONFLICT (srid) DO NOTHING;

Connection Settings

# Database connection
DATABASE_URL=postgresql://geoflow:password@localhost:5432/geoflow

# Connection pool
DB_POOL_SIZE=10
DB_POOL_MAX_IDLE_TIME=30000
DB_POOL_MAX_LIFETIME=600000

# Query timeout
DB_QUERY_TIMEOUT=30000

Spatial Operations

Basic Operations

-- Insert spatial data
INSERT INTO points (name, geom)
VALUES ('Location A', ST_GeomFromText('POINT(-122.4194 37.7749)', 4326));

-- Query within distance
SELECT name, ST_AsText(geom)
FROM points
WHERE ST_DWithin(geom, ST_GeomFromText('POINT(-122.4 37.7)', 4326), 1000);

-- Calculate area
SELECT name, ST_Area(geom) as area_sq_meters
FROM polygons
WHERE ST_Area(geom) > 10000;

Advanced Queries

-- Spatial join
SELECT a.name, b.name, ST_Distance(a.geom, b.geom) as distance
FROM points a, polygons b
WHERE ST_Contains(b.geom, a.geom);

-- Buffer analysis
SELECT name, ST_Buffer(geom, 100) as buffered_geom
FROM points
WHERE category = 'poi';

-- Intersection analysis
SELECT ST_Intersection(a.geom, b.geom) as intersection
FROM layer_a a, layer_b b
WHERE ST_Intersects(a.geom, b.geom);

Raster Operations

-- Create raster table
CREATE TABLE elevation (
    rid SERIAL PRIMARY KEY,
    rast RASTER
);

-- Query raster values
SELECT ST_Value(rast, ST_GeomFromText('POINT(-122.4 37.7)', 4326))
FROM elevation
WHERE ST_Intersects(rast, ST_GeomFromText('POINT(-122.4 37.7)', 4326));

-- Raster algebra
SELECT ST_MapAlgebra(rast1, rast2, 'rast1 + rast2') as sum_raster
FROM elevation1, elevation2;

Performance Optimization

Indexing Strategies

-- GiST index for spatial queries
CREATE INDEX idx_points_geom ON points USING GIST (geom);

-- Functional index for distance queries
CREATE INDEX idx_points_geom_dist ON points USING GIST (ST_Expand(geom, 1000));

-- BRIN index for temporal data
CREATE INDEX idx_data_time ON geospatial_data USING BRIN (created_at);

Query Optimization

  • Bounding Box Queries: Use && operator for fast filtering
  • Index-Only Scans: Include indexed columns in SELECT
  • Prepared Statements: Cache query plans for repeated queries
  • Partitioning: Divide large tables by spatial regions

Maintenance

-- Analyze table statistics
ANALYZE geospatial_data;

-- Reindex spatial indexes
REINDEX INDEX idx_points_geom;

-- Vacuum for space reclamation
VACUUM ANALYZE geospatial_data;

-- Cluster on spatial index
CLUSTER geospatial_data USING idx_points_geom;

Backup and Recovery

Backup Strategies

# Full database backup
pg_dump -h localhost -U geoflow -d geoflow > backup.sql

# Compressed backup
pg_dump -h localhost -U geoflow -d geoflow | gzip > backup.sql.gz

# Schema-only backup
pg_dump -h localhost -U geoflow -d geoflow --schema-only > schema.sql

Point-in-Time Recovery

-- Enable WAL archiving
ALTER SYSTEM SET wal_level = replica;
ALTER SYSTEM SET archive_mode = on;
ALTER SYSTEM SET archive_command = 'cp %p /var/lib/postgresql/archive/%f';

-- Create base backup
SELECT pg_start_backup('base_backup');
-- Copy data directory
SELECT pg_stop_backup();

High Availability

  • Streaming Replication: Real-time data synchronization
  • Failover: Automatic promotion of standby servers
  • Load Balancing: Distribute read queries across replicas
  • Connection Pooling: PgBouncer for connection management

Monitoring

System Monitoring

-- Active connections
SELECT * FROM pg_stat_activity;

-- Database size
SELECT pg_size_pretty(pg_database_size('geoflow'));

-- Table sizes
SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

Spatial-Specific Monitoring

-- Index usage
SELECT * FROM pg_stat_user_indexes
WHERE schemaname = 'public';

-- Spatial index statistics
SELECT * FROM pg_stat_user_indexes
WHERE indexname LIKE '%geom%';

-- Query performance
EXPLAIN ANALYZE
SELECT name FROM points
WHERE ST_DWithin(geom, ST_GeomFromText('POINT(0 0)', 4326), 1000);

Security

Access Control

-- Create read-only user
CREATE USER readonly_user WITH PASSWORD 'password';
GRANT CONNECT ON DATABASE geoflow TO readonly_user;
GRANT USAGE ON SCHEMA public TO readonly_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_user;

-- Row-level security
ALTER TABLE sensitive_data ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_data ON sensitive_data
FOR ALL USING (user_id = current_user_id());

Data Encryption

  • SSL Connections: Encrypt client-server communication
  • Data Encryption: Encrypt sensitive spatial data
  • Backup Encryption: Encrypt database backups
  • Key Management: Secure key storage and rotation

Audit Logging

-- Enable audit logging
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h ';

-- Audit spatial operations
CREATE OR REPLACE FUNCTION audit_spatial_operations()
RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (table_name, operation, geom, user_id, timestamp)
    VALUES (TG_TABLE_NAME, TG_OP, NEW.geom, current_user_id(), NOW());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON geospatial_data
FOR EACH ROW EXECUTE FUNCTION audit_spatial_operations();

Troubleshooting

Common Issues

Out of Memory: Increase PostgreSQL memory settings Slow Queries: Check query plans and add appropriate indexes Connection Issues: Verify connection pool settings Disk Space: Monitor table and index sizes

Performance Tuning

-- Increase work memory
ALTER SYSTEM SET work_mem = '256MB';

-- Increase maintenance work memory
ALTER SYSTEM SET maintenance_work_mem = '512MB';

-- Optimize autovacuum
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.1;

-- Enable parallel queries
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;

Diagnostic Queries

-- Check PostGIS installation
SELECT PostGIS_Version();

-- List spatial tables
SELECT * FROM geometry_columns;

-- Check spatial indexes
SELECT * FROM pg_indexes WHERE indexdef LIKE '%gist%';

-- Monitor long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - pg_stat_activity.query_start > interval '1 minute';

Integration with GeoFlow

Convex Integration

PostGIS integrates with Convex through:
  • Direct Queries: Convex can query PostGIS for spatial data
  • Data Synchronization: Keep spatial data in sync between systems
  • Complex Queries: Leverage PostGIS functions in Convex queries
  • Performance: Spatial indexing for fast queries

Worker Integration

The PDAL worker uses PostGIS for:
  • Result Storage: Store processed spatial data
  • Reference Data: Access existing spatial datasets
  • Quality Assurance: Validate processing results
  • Metadata: Store processing metadata and statistics

Application Integration

The GeoFlow app interacts with PostGIS through:
  • Data Visualization: Display spatial data on maps
  • Query Interface: Allow users to construct spatial queries
  • Export Functions: Generate spatial data exports
  • Analysis Tools: Provide spatial analysis capabilities