Monorepo Structure

GeoFlow is organized as a monorepo using Turbo for build orchestration and Bun as the package manager. The repository contains multiple applications and packages:
geoflow/
├── apps/
│   ├── geoflow/          # Main React application
│   ├── backend/          # Convex backend configuration
│   ├── motia/            # Workflow orchestration engine
│   └── docs/             # Documentation (Mintlify)
├── packages/
│   ├── worker/           # PDAL processing service
│   ├── mcp/              # Model Context Protocol server
│   ├── shared/           # Shared utilities and types
│   └── config/           # Shared TypeScript and tooling configuration
└── scripts/              # Database initialization scripts
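
Task orchestration across these workspaces is typically declared in a turbo.json at the repository root. The sketch below is illustrative only — the task names and output globs are assumptions, and the exact schema depends on the Turbo version in use:

```json
{
  "$schema": "https://turbo.build/schema.json",
  "tasks": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": ["dist/**"]
    },
    "dev": {
      "cache": false,
      "persistent": true
    }
  }
}
```

The `^build` dependency tells Turbo to build a workspace's internal dependencies (e.g. packages/shared) before the workspace itself, while `dev` tasks are marked persistent and uncached since they run watchers.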

Service Architecture

GeoFlow runs as a distributed system with several interconnected services, orchestrated via Docker Compose.

Core Services

GeoFlow App (apps/geoflow)

  • Technology: React 19, TanStack Router, Vite
  • Purpose: Web interface for workflow design, data visualization, and system monitoring
  • Port: 3000 (development)
  • Key Features:
    • Drag-and-drop workflow builder
    • Real-time execution monitoring
    • Data upload and download
    • User authentication and authorization

Convex Backend (apps/backend)

  • Technology: Convex (self-hosted)
  • Purpose: Data persistence, real-time subscriptions, and authentication
  • Ports: 3210 (backend), 3211 (site proxy), 6791 (dashboard)
  • Key Features:
    • Real-time data synchronization
    • User management and authentication
    • Workflow and execution metadata storage
    • File upload coordination
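
Convex's reactive query model — the basis of the real-time synchronization listed above — can be pictured with a small in-memory stand-in: subscribers re-run whenever the data they read changes. This is a sketch of the concept, not Convex's actual API, and the document shape is an assumption:

```typescript
// Hypothetical shape of a workflow metadata document (not the actual schema).
type WorkflowDoc = { id: string; name: string; status: string };

// Minimal stand-in for a reactive store: every write re-notifies subscribers,
// which is the behavior the UI relies on for real-time execution monitoring.
class ReactiveStore {
  private docs = new Map<string, WorkflowDoc>();
  private subscribers: Array<(docs: WorkflowDoc[]) => void> = [];

  subscribe(fn: (docs: WorkflowDoc[]) => void): void {
    this.subscribers.push(fn);
    fn([...this.docs.values()]); // deliver current state immediately
  }

  upsert(doc: WorkflowDoc): void {
    this.docs.set(doc.id, doc);
    for (const fn of this.subscribers) fn([...this.docs.values()]);
  }
}
```

A UI component would subscribe once and re-render on every upsert, mirroring how Convex pushes workflow status changes to the browser over WebSockets.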

PostgreSQL + PostGIS (postgres)

  • Technology: PostgreSQL 15 with PostGIS 3.3
  • Purpose: Spatial data storage and querying
  • Port: 5432
  • Key Features:
    • Geospatial data types and functions
    • Spatial indexing and queries
    • Coordinate system transformations
    • Large dataset handling
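
A typical spatial query against this store might look like the following — the table and column names are assumptions, though the PostGIS functions are real. It finds features within 1 km of a point given in WGS 84, assuming geometries are stored in Web Mercator (SRID 3857):

```typescript
// Parameterized SQL a service might send over a standard Postgres client.
// Table name "features" and column "geom" are hypothetical placeholders.
const nearbySql = `
  SELECT id, ST_AsGeoJSON(ST_Transform(geom, 4326)) AS geojson
  FROM features
  WHERE ST_DWithin(
    geom,
    ST_Transform(ST_SetSRID(ST_MakePoint($1, $2), 4326), 3857),
    1000  -- distance in SRID 3857 units (nominal metres)
  );
`;
```

ST_SetSRID/ST_MakePoint build the query point, ST_Transform reprojects between coordinate systems, and ST_DWithin uses the spatial index rather than computing every pairwise distance.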

Processing Services

PDAL Worker (packages/worker)

  • Technology: Node.js, PDAL, GDAL
  • Purpose: High-performance point cloud and geospatial data processing
  • Port: 3002
  • Key Features:
    • LiDAR data processing
    • Point cloud filtering and transformation
    • Raster processing
    • Format conversion (LAS, LAZ, GeoTIFF, etc.)
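
PDAL executes JSON pipeline definitions, so the worker's jobs can be expressed as plain serializable objects. A hedged example, with placeholder file paths: read a LAZ file, drop points classified as noise (class 7), and write LAS:

```typescript
// PDAL pipeline as a JSON-serializable object. "filters.range" with
// "Classification![7:7]" excludes the noise class; the stage names are real
// PDAL stages, but the file paths are placeholders.
const pipeline = {
  pipeline: [
    "input.laz",
    { type: "filters.range", limits: "Classification![7:7]" },
    { type: "writers.las", filename: "output.las" },
  ],
};

// The worker would serialize this with JSON.stringify(pipeline) and hand it
// to the PDAL CLI (e.g. `pdal pipeline --stdin`).
```
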

Motia Workflow Engine (apps/motia)

  • Technology: Motia framework, Node.js
  • Purpose: Orchestrates complex geospatial processing pipelines
  • Port: 4010
  • Key Features:
    • Event-driven workflow execution
    • Step-based processing pipelines
    • Error handling and retry logic
    • Parallel processing capabilities
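
The engine's retry behavior can be sketched generically — this is the shape of the idea, not Motia's actual API:

```typescript
type StepHandler = (payload: unknown) => Promise<unknown>;

// Run one step with simple retry logic: re-attempt on failure up to
// maxRetries additional times, then rethrow so the workflow can be marked
// failed and surfaced to the UI.
async function runStepWithRetry(
  handler: StepHandler,
  payload: unknown,
  maxRetries = 2,
): Promise<unknown> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await handler(payload);
    } catch (err) {
      lastError = err; // transient failure: fall through and retry
    }
  }
  throw lastError;
}
```

A production engine would add backoff between attempts and distinguish retryable from fatal errors; the loop above is the minimal core.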

Supporting Services

MCP Server (packages/mcp)

  • Technology: Python, FastAPI, GeoPandas, PySAL
  • Purpose: AI-powered geospatial analysis functions
  • Port: 8000
  • Key Features:
    • Spatial statistics and analysis
    • Machine learning model integration
    • Geospatial AI assistants
    • Custom analysis functions
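
Other services reach the MCP server over HTTP. The endpoint path and request body below are assumptions for illustration — e.g. a Moran's I spatial-autocorrelation call, which PySAL provides — not the actual API:

```typescript
const MCP_BASE = "http://localhost:8000"; // port from the service description

// Build (but do not send) a request for a hypothetical analysis endpoint.
// The "/analyze/..." path and body shape are illustrative assumptions.
function buildAnalysisRequest(analysis: string, datasetId: string) {
  return {
    url: `${MCP_BASE}/analyze/${analysis}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ dataset_id: datasetId }),
    },
  };
}
```

A caller would pass the result to `fetch(url, init)` and parse the JSON response.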

Data Flow

Workflow Execution Flow

  1. Workflow Design: User creates workflow in GeoFlow App
  2. Storage: Workflow definition stored in Convex
  3. Trigger: Workflow execution initiated via API or UI
  4. Orchestration: Motia Engine coordinates execution steps
  5. Processing: Individual steps executed by appropriate services
  6. Data Storage: Results stored in PostgreSQL/PostGIS
  7. Notification: Real-time updates sent to UI via Convex
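
Steps 4–7 above can be condensed into a sketch, with a callback standing in for the Convex notification channel (all names are illustrative):

```typescript
type Step = { name: string; run: (input: unknown) => unknown };

// Execute steps in order, notifying after each one, and return the final
// result (which the real system would persist to PostgreSQL/PostGIS).
async function executeWorkflow(
  steps: Step[],
  notify: (event: string) => void,
): Promise<unknown> {
  let data: unknown = undefined;
  for (const step of steps) {
    data = step.run(data);            // 5. step executed by the owning service
    notify(`step:${step.name}:done`); // 7. real-time update toward the UI
  }
  return data;                        // 6. final result handed to storage
}
```
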

Data Storage Strategy

  • Metadata: Workflow definitions, execution logs, user data → Convex
  • Spatial Data: Geospatial datasets, processed results → PostGIS
  • Files: Raw uploads, temporary processing files → Local storage
  • Cache: Frequently accessed data → Redis (planned)
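
The routing above amounts to a simple mapping, sketched here (the store identifiers are illustrative; the Redis branch is planned, per the list):

```typescript
type DataKind = "metadata" | "spatial" | "file" | "cache";

// Map each kind of data to its backing store, per the strategy above.
function storeFor(kind: DataKind): string {
  switch (kind) {
    case "metadata": return "convex";
    case "spatial": return "postgis";
    case "file": return "local-fs";
    case "cache": return "redis"; // planned, not yet integrated
  }
}
```
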

Development vs Production

Development Mode

  • Hot reloading enabled for all services
  • Volume mounts for live code changes
  • Simplified configurations
  • Debug logging enabled
  • Local database with sample data

Production Mode

  • Optimized builds and images
  • Environment-specific configurations
  • Proper secrets management
  • Monitoring and logging
  • Backup strategies
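
In practice the two modes share one codebase and diverge through environment-driven configuration; a minimal sketch (the variable names and the local fallback URL are assumptions):

```typescript
// Derive per-mode settings from the environment once, rather than branching
// throughout the codebase. The DATABASE_URL fallback is a local-dev default.
const isDev = process.env.NODE_ENV !== "production";

const config = {
  logLevel: isDev ? "debug" : "info",
  hotReload: isDev,
  databaseUrl:
    process.env.DATABASE_URL ?? "postgres://localhost:5432/geoflow",
};
```
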

Networking and Communication

Services communicate through:
  • HTTP APIs: RESTful endpoints for service-to-service communication
  • WebSockets: Real-time updates via Convex
  • Database Connections: Direct PostgreSQL connections for data access
  • File System: Shared volumes for large data transfers
  • Docker Networks: Isolated service communication

Scalability Considerations

  • Horizontal Scaling: Stateless services (the app, PDAL worker, and MCP server) can run as multiple replicas
  • Load Balancing: Nginx or similar for API services
  • Database Partitioning: PostgreSQL table partitioning keeps large spatial datasets manageable
  • Caching: Redis integration planned for performance optimization
  • Storage: Support for S3-compatible storage for large files

Security Architecture

  • Authentication: JWT tokens via Better Auth
  • Authorization: Role-based access control
  • Network Security: Service isolation via Docker networks
  • Data Encryption: TLS for external communications
  • Secrets Management: Environment variables and Docker secrets
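
Role-based access control reduces to checking a role's permission set before each action; a minimal sketch (the roles and permission strings are illustrative, not GeoFlow's actual model):

```typescript
type Role = "viewer" | "editor" | "admin";

// Each role's allowed actions. In a real system this would live in the
// database alongside user records rather than in code.
const permissions: Record<Role, ReadonlySet<string>> = {
  viewer: new Set(["workflow:read"]),
  editor: new Set(["workflow:read", "workflow:write"]),
  admin: new Set(["workflow:read", "workflow:write", "user:manage"]),
};

function can(role: Role, action: string): boolean {
  return permissions[role].has(action);
}
```

An API handler would resolve the caller's role from their authenticated session (e.g. the Better Auth JWT) and call `can(role, action)` before executing the request.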