Overview

The MCP (Model Context Protocol) Server provides AI-powered geospatial analysis built on Python libraries. It combines machine learning models, spatial statistics, and advanced analytics to add intelligent processing steps to geospatial workflows.

Technology Stack

  • Framework: FastAPI for high-performance Python APIs
  • Spatial Analysis: GeoPandas, PySAL for geospatial computations
  • Machine Learning: Scikit-learn, TensorFlow for ML models
  • Data Processing: Pandas, NumPy for data manipulation
  • Visualization: Matplotlib, Plotly for charts and maps

Key Features

Spatial Analysis

  • Statistical Analysis: Spatial autocorrelation (see the Moran's I sketch below), clustering analysis
  • Network Analysis: Transportation networks, connectivity analysis
  • Interpolation: Kriging, inverse distance weighting
  • Hotspot Analysis: Getis-Ord Gi*, local Moran’s I
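
As a sketch of the first item, global Moran's I can be computed server-side with PySAL's esda and libpysal packages; the file, column, and weights choice below mirror the API example later on this page and are illustrative:

# Hedged sketch: global Moran's I with PySAL (esda + libpysal)
import geopandas as gpd
from esda.moran import Moran
from libpysal.weights import Queen

gdf = gpd.read_file("points.geojson")   # illustrative input
w = Queen.from_dataframe(gdf)           # queen-contiguity spatial weights
w.transform = "r"                       # row-standardize the weights
mi = Moran(gdf["population"], w)
print(mi.I, mi.p_sim)                   # statistic and permutation pseudo p-value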

Machine Learning

  • Classification: Land use classification, feature extraction
  • Regression: Predictive modeling for spatial phenomena
  • Clustering: Unsupervised learning for pattern discovery
  • Anomaly Detection: Outlier identification in spatial data

Geospatial Operations

  • Geometry Processing: Buffers, intersections, spatial joins (sketched below)
  • Coordinate Systems: Projection conversions and transformations
  • Spatial Indexing: R-tree indexing for efficient queries
  • Topology Analysis: Adjacency, containment, connectivity
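
A minimal GeoPandas sketch of these operations (file names and the 50-unit buffer distance are illustrative):

import geopandas as gpd

# Projection conversion: reproject both layers to a common CRS
parcels = gpd.read_file("parcels.geojson").to_crs(epsg=3857)
zones = gpd.read_file("zones.geojson").to_crs(epsg=3857)

# Buffer each parcel geometry
buffered = parcels.copy()
buffered["geometry"] = parcels.geometry.buffer(50)

# Spatial join; GeoPandas uses an R-tree index internally for the query
joined = gpd.sjoin(buffered, zones, predicate="intersects")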

Architecture

Project Structure

packages/mcp/src/
├── geopandas_functions.py    # GeoPandas-based operations
├── kml_functions.py          # KML file processing
├── mapview_functions.py      # Map visualization
├── pysal_functions.py        # Spatial statistics
├── pyproj_functions.py       # Coordinate transformations
├── rasterio_functions.py     # Raster data processing
├── shapely_functions.py      # Geometry operations
├── mcp_instance.py           # Main MCP server
└── server.py                 # FastAPI application

API Design

The MCP server provides RESTful endpoints for:
  • Analysis Functions: Statistical and ML analysis operations
  • Data Processing: File format conversion and validation
  • Visualization: Generate maps and charts
  • Model Management: Load, train, and deploy ML models

Configuration

Environment Variables

# Service configuration
MCP_PORT=8000
MCP_HOST=0.0.0.0

# Data paths
GEOFLOW_DATA_PATH=/app/data
TEMP_DIR=/tmp/mcp

# Model settings
MODEL_CACHE_DIR=/app/models
MAX_MODEL_SIZE=1GB

# Performance
MAX_WORKERS=4
REQUEST_TIMEOUT=300
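
A minimal sketch of how the service might read these settings at startup (defaults mirror the values above):

import os

MCP_PORT = int(os.environ.get("MCP_PORT", "8000"))
GEOFLOW_DATA_PATH = os.environ.get("GEOFLOW_DATA_PATH", "/app/data")
MAX_WORKERS = int(os.environ.get("MAX_WORKERS", "4"))
REQUEST_TIMEOUT = int(os.environ.get("REQUEST_TIMEOUT", "300"))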

Docker Configuration

mcp:
  build:
    context: packages/mcp
    dockerfile: Dockerfile.mcp
  environment:
    - MCP_PORT=8000
  ports:
    - "8000:8000"
  volumes:
    - ./storage/data:/app/data:rw
  healthcheck:
    test: ["CMD", "curl", "--fail", "http://localhost:8000/health"]
    interval: 10s
    timeout: 5s
    retries: 5

API Endpoints

Spatial Analysis

# Calculate spatial autocorrelation
POST /api/analysis/moran
{
  "data": "points.geojson",
  "variable": "population",
  "weights": "queen"
}

# Perform hotspot analysis
POST /api/analysis/hotspot
{
  "data": "crimes.geojson",
  "method": "getis-ord",
  "significance": 0.05
}
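
Any HTTP client can issue these requests; a client-side sketch with Python's requests library (the same pattern applies to the endpoints below):

import requests

resp = requests.post(
    "http://localhost:8000/api/analysis/hotspot",
    json={"data": "crimes.geojson", "method": "getis-ord", "significance": 0.05},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())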

Machine Learning

# Train classification model
POST /api/ml/train
{
  "algorithm": "random_forest",
  "features": ["area", "perimeter", "compactness"],
  "target": "land_use_type",
  "training_data": "training.geojson"
}

# Predict with trained model
POST /api/ml/predict
{
  "model_id": "landuse_rf_001",
  "data": "prediction_data.geojson"
}

Data Processing

# Convert file formats
POST /api/convert
{
  "input": "data/input.shp",
  "output": "data/output.geojson",
  "target_format": "GeoJSON"
}

# Validate geospatial data
POST /api/validate
{
  "file": "data/dataset.geojson",
  "checks": ["geometry", "projection", "topology"]
}
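
Server-side, the conversion and a basic geometry check can be expressed directly in GeoPandas; a sketch using the paths from the examples above:

import geopandas as gpd

# Format conversion: Shapefile in, GeoJSON out
gdf = gpd.read_file("data/input.shp")
gdf.to_file("data/output.geojson", driver="GeoJSON")

# Geometry validity check
invalid = gdf[~gdf.geometry.is_valid]
print(f"{len(invalid)} invalid geometries found")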

Development

Local Development

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Start development server
python -m uvicorn server:app --reload --port 8000

# Run tests
pytest tests/

Adding New Functions

# geopandas_functions.py
from geopandas import GeoDataFrame

def buffer_analysis(gdf: GeoDataFrame, distance: float) -> GeoDataFrame:
    """
    Create buffer zones around geometries

    Args:
        gdf: Input GeoDataFrame
        distance: Buffer distance in coordinate units

    Returns:
        GeoDataFrame with buffered geometries
    """
    buffered = gdf.copy()
    buffered['geometry'] = gdf.geometry.buffer(distance)
    return buffered

Testing Functions

# tests/test_geopandas_functions.py
import geopandas as gpd
from shapely.geometry import Point

from geopandas_functions import buffer_analysis

def test_buffer_analysis():
    # Create test data
    gdf = gpd.GeoDataFrame({
        'id': [1, 2],
        'geometry': [Point(0, 0), Point(1, 1)]
    })

    # Test buffer operation
    result = buffer_analysis(gdf, 0.5)

    # Assertions
    assert len(result) == 2
    assert result.geometry.iloc[0].area > 0

Supported Libraries

Core Libraries

  • GeoPandas: Geospatial data manipulation
  • PySAL: Spatial statistics and econometrics
  • Shapely: Geometric operations
  • Fiona: Geospatial file I/O

Machine Learning

  • Scikit-learn: Classical ML algorithms
  • TensorFlow: Deep learning models
  • PyTorch: Neural network framework
  • XGBoost: Gradient boosting

Visualization

  • Matplotlib: Static plots and charts
  • Plotly: Interactive visualizations
  • Folium: Leaflet-based maps
  • Geoplot: Geospatial plotting

Performance Optimization

Memory Management

  • Chunked Processing: Large datasets processed in chunks (see the sketch after this list)
  • Lazy Loading: Data loaded on demand
  • Garbage Collection: Explicit memory cleanup
  • Resource Limits: Configurable memory and CPU limits
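
A sketch of chunked processing using the rows= slice accepted by geopandas.read_file (chunk size, file, and the processing step are illustrative):

import geopandas as gpd

CHUNK = 10_000

def process_chunk(chunk):
    ...  # hypothetical per-chunk analysis

offset = 0
while True:
    chunk = gpd.read_file("large_dataset.gpkg", rows=slice(offset, offset + CHUNK))
    if chunk.empty:
        break
    process_chunk(chunk)
    offset += CHUNK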

Parallel Processing

  • Multiprocessing: CPU-intensive tasks run in parallel (see the sketch below)
  • Async Operations: Non-blocking I/O operations
  • Worker Pools: Pre-allocated worker processes
  • Load Balancing: Distribute work across CPU cores
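
A minimal sketch of fanning a CPU-bound step out to a process pool with the standard library (the partitioning and analysis function are illustrative):

from concurrent.futures import ProcessPoolExecutor

def analyze_partition(rows):
    # Hypothetical CPU-bound analysis step
    return sum(i * i for i in rows)

if __name__ == "__main__":
    partitions = [range(0, 50_000), range(50_000, 100_000)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(analyze_partition, partitions))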

Caching Strategies

  • Result Caching: Cache analysis results (see the sketch below)
  • Model Caching: Keep trained models in memory
  • Data Caching: Cache frequently accessed datasets
  • Computation Caching: Avoid redundant calculations
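
Result caching can start as simple memoization of pure analysis functions; a standard-library sketch (the cached function is illustrative):

from functools import lru_cache

import geopandas as gpd

@lru_cache(maxsize=128)
def total_area(dataset_path: str) -> float:
    # Recomputed only for paths not seen before
    return float(gpd.read_file(dataset_path).geometry.area.sum())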

Model Management

Model Training

# Train and save model (imports included so the snippet runs as shown)
from sklearn.ensemble import RandomForestClassifier
import joblib

model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Save the fitted model to disk
joblib.dump(model, 'models/landuse_classifier.pkl')

# Register the model with the service's model registry
register_model('landuse_classifier', 'models/landuse_classifier.pkl')

Model Deployment

# Load a registered model and serve predictions
def predict(new_data):
    model = load_model('landuse_classifier')

    # Make predictions
    predictions = model.predict(new_data)

    # Return predictions with per-sample confidence scores
    return {
        'predictions': predictions.tolist(),
        'confidence': model.predict_proba(new_data).max(axis=1).tolist()
    }

Model Monitoring

  • Performance Metrics: Accuracy, precision, recall (computed as shown below)
  • Drift Detection: Monitor data distribution changes
  • Resource Usage: Track memory and CPU usage
  • Prediction Latency: Monitor inference time
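
The listed performance metrics map directly onto scikit-learn; a sketch with illustrative labels:

from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = ["residential", "commercial", "residential", "industrial"]
y_pred = ["residential", "residential", "residential", "industrial"]

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred, average="weighted", zero_division=0))
print(recall_score(y_true, y_pred, average="weighted", zero_division=0))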

Error Handling

Validation Errors

def validate_input(data):
    if not isinstance(data, dict):
        raise ValueError("Input must be a dictionary")

    required_fields = ['geometry', 'properties']
    for field in required_fields:
        if field not in data:
            raise ValueError(f"Missing required field: {field}")

    return True

Processing Errors

  • Data Errors: Invalid geometries, missing values
  • Model Errors: Failed predictions, corrupted models
  • System Errors: Memory exhaustion, disk space issues
  • Network Errors: External service communication failures

Recovery Strategies

  • Fallback Models: Use backup models on failure (see the sketch below)
  • Partial Results: Return partial results when possible
  • Error Logging: Comprehensive error information
  • Graceful Degradation: Reduced functionality on errors
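
A sketch of the fallback-model strategy (function name and model objects are illustrative):

import logging

logger = logging.getLogger(__name__)

def predict_with_fallback(primary, fallback, X):
    try:
        return primary.predict(X)
    except Exception as exc:
        logger.warning("Primary model failed (%s); using fallback", exc)
        return fallback.predict(X)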

Security Considerations

Input Validation

  • Data Sanitization: Validate all input data
  • Type Checking: Ensure correct data types
  • Size Limits: Prevent oversized inputs
  • Content Filtering: Block malicious content

Access Control

  • API Authentication: Token-based authentication
  • Rate Limiting: Prevent abuse and DoS attacks
  • Request Validation: Schema validation for all inputs (sketched below)
  • Audit Logging: Track all API usage
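
With FastAPI, request validation follows from declaring a pydantic schema; a hypothetical model for the /api/analysis/moran body shown earlier:

from pydantic import BaseModel

class MoranRequest(BaseModel):
    data: str
    variable: str
    weights: str = "queen"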

Model Security

  • Model Validation: Verify model integrity
  • Input Bounds: Check input value ranges
  • Output Sanitization: Clean model outputs
  • Version Control: Track model versions and changes

Monitoring & Observability

Health Checks

from datetime import datetime

@app.get("/health")
async def health_check():
    # get_version() and loaded_models are service-level helpers defined elsewhere
    return {
        "status": "healthy",
        "timestamp": datetime.utcnow().isoformat(),
        "version": get_version(),
        "models_loaded": len(loaded_models)
    }

Metrics Collection

  • Request Metrics: Response times, error rates
  • Model Metrics: Prediction accuracy, latency
  • System Metrics: CPU, memory, disk usage
  • Data Metrics: Dataset sizes, processing times

Logging

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger(__name__)

# Log analysis completion
logger.info("Analysis completed for dataset %s, duration: %ss", dataset_id, duration)

Troubleshooting

Common Issues

  • Import Errors: Ensure all Python dependencies are installed
  • Memory Errors: Increase Docker memory limits or use chunked processing
  • Model Loading Failures: Check model file paths and permissions
  • Performance Issues: Profile code and optimize bottlenecks

Debug Commands

# Check Python environment
python --version
pip list

# Test library imports
python -c "import geopandas; print('GeoPandas OK')"

# View service logs
docker compose logs mcp

# Test API endpoint
curl http://localhost:8000/health

# Profile memory usage
python -m memory_profiler script.py

Performance Tuning

  • Profiling: Use cProfile for performance analysis
  • Memory Profiling: Track memory usage with memory_profiler
  • Database Optimization: Index geospatial data appropriately
  • Algorithm Selection: Choose appropriate algorithms for data size