Hurricane Landfall Pipeline

This guide covers running the Hurricane Landfall Forecasting pipeline locally for development and testing.

Overview

The hurricane landfall forecasting pipeline predicts whether and where hurricanes will make landfall, using historical HURDAT2 track data from NOAA.

Pipeline Steps

Step        Command     Description
---------   ---------   ----------------------------------------------------
Download    download    Downloads HURDAT2 data and creates training features
Train       train       Trains landfall classifier and location regressors
Predict     predict     Generates predictions for all historical hurricanes
Visualize   visualize   Creates maps and error plots
All         run-all     Runs all steps in sequence

Model Performance

  • Landfall Classification: 95.3% test accuracy
  • Location Prediction:
    • Latitude MAE: 2.19°
    • Longitude MAE: 2.71°
    • Best prediction: 37.7 km error
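
The kilometre error figures above can be reproduced from predicted and actual landfall coordinates with the haversine great-circle distance; a minimal stdlib sketch (the function name and radius constant are illustrative, not taken from the pipeline):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# One degree of latitude is about 111 km, so the 2.19° latitude MAE
# corresponds to roughly 240 km of north-south error on average.
print(round(haversine_km(25.0, -80.0, 26.0, -80.0), 1))  # → 111.2
```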

Quick Start

Option 1: Using Docker
# Build the container image
cd src/hurricane-landfall
docker build -t hurricane-landfall:1.0.0 .

# Run the full pipeline
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 run-all --base-dir /data

Option 2: Using Python Directly

# Install the package
cd src/hurricane-landfall
pip install -e .

# Run the pipeline
hurricane-landfall run-all --base-dir ./output

Development Setup

Prerequisites

  • Python >= 3.10
  • Docker (optional, for containerized runs)
  • MLflow (optional, for experiment tracking)

Installing Dependencies

cd src/hurricane-landfall

# Install with development dependencies
pip install -e ".[dev]"

# Or just runtime dependencies
pip install -e .

Package Structure

src/hurricane-landfall/
├── hurricane_landfall/           # Main package
│   ├── __init__.py
│   ├── cli.py                   # Command-line interface
│   ├── s01_download_data.py     # Data download and processing
│   ├── s02_train_model.py       # Model training
│   ├── s03_predict_landfall.py  # Prediction generation
│   └── s04_visualize.py         # Visualization
├── tests/                        # Unit tests
├── Dockerfile                    # Container definition
├── pyproject.toml               # Package configuration
├── setup.py                     # Setup script
└── requirements.txt             # Dependencies

Running Individual Steps

Step 1: Download Data

Downloads HURDAT2 hurricane track data from NOAA and processes it into training features.

# Using CLI
hurricane-landfall download --base-dir ./output

# Using Python
python -m hurricane_landfall.s01_download_data

Output:

  • data/raw/hurdat2.txt - Raw HURDAT2 data
  • data/processed/hurricane_tracks.csv - Processed track data
  • data/processed/training_data.csv - Training features
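
HURDAT2 is a plain-text, comma-separated format: one header line per storm followed by one row per six-hourly fix, with an L record identifier marking landfall fixes. A minimal parsing sketch (field positions follow NOAA's published format description; the helper names are illustrative, and the AL/EP/CP prefixes cover the Atlantic and Pacific basin files):

```python
def parse_coord(token):
    """Convert '23.7N' / '81.1W' style tokens to signed floats."""
    value = float(token[:-1])
    return value if token[-1] in "NE" else -value

def parse_hurdat2(lines):
    """Yield (storm_id, name, date, time, record_id, lat, lon) per fix."""
    storm_id = name = None
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if fields[0].startswith(("AL", "EP", "CP")):  # storm header row
            storm_id, name = fields[0], fields[1]
        else:  # six-hourly track fix
            date, time, record_id = fields[0], fields[1], fields[2]
            lat, lon = parse_coord(fields[4]), parse_coord(fields[5])
            yield storm_id, name, date, time, record_id, lat, lon

sample = [
    "AL112017,           IRMA,     66,",
    "20170910, 1300, L, HU, 23.7N,  81.1W, 115,  931,",
]
rows = list(parse_hurdat2(sample))
```

Rows whose record identifier is L are the landfall fixes the training labels can be derived from.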

Step 2: Train Models

Trains the landfall prediction models and logs to MLflow.

# Using CLI
hurricane-landfall train --base-dir ./output

# Using Python
python -m hurricane_landfall.s02_train_model

Output:

  • models/landfall_classifier.joblib - Binary classifier
  • models/landfall_lat_regressor.joblib - Latitude predictor
  • models/landfall_lon_regressor.joblib - Longitude predictor
  • MLflow run with metrics and artifacts
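
Internally the train step fits one binary classifier plus two coordinate regressors. A rough sketch of that shape using scikit-learn on synthetic data (the estimator types, feature set, and labels here are assumptions, not the pipeline's actual choices):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Toy features standing in for e.g. current lat, lon, heading, speed.
X = rng.normal(size=(500, 4))
made_landfall = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic label
landfall_lat = 25.0 + X[:, 0]                        # synthetic targets
landfall_lon = -80.0 + X[:, 1]

X_tr, X_te, y_tr, y_te = train_test_split(
    X, made_landfall, test_size=0.2, random_state=42)
clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)

# Location regressors are fitted only on fixes that made landfall.
mask = made_landfall == 1
lat_reg = RandomForestRegressor(random_state=42).fit(X[mask], landfall_lat[mask])
lon_reg = RandomForestRegressor(random_state=42).fit(X[mask], landfall_lon[mask])
# The three fitted estimators would then be persisted, e.g.
# joblib.dump(clf, "models/landfall_classifier.joblib"), and so on.
```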

Step 3: Generate Predictions

Generates landfall predictions for all historical hurricanes.

# Using CLI
hurricane-landfall predict --base-dir ./output

# Using specific MLflow run
hurricane-landfall predict --base-dir ./output --run-id <run_id>

Output:

  • predictions/all_hurricane_predictions.csv - All predictions
  • predictions/landfall_predictions_summary.csv - Landfall summary

Step 4: Create Visualizations

Generates visualization plots and summary reports.

# Using CLI
hurricane-landfall visualize --base-dir ./output

Output:

  • plots/error_histogram.png - Error distribution
  • plots/all_landfalls_map.png - Map of predictions
  • plots/error_by_category.png - Errors by hurricane category
  • predictions/summary_report.txt - Text summary
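
The text summary is essentially an aggregation over the predictions CSV; a stdlib-only sketch of producing one (the column names are assumptions about the CSV schema, not its documented layout):

```python
import csv
import io
import statistics

def summarize(csv_text):
    """Aggregate per-storm error rows into a short text report."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    errors = [float(r["error_km"]) for r in rows]
    return (
        f"storms: {len(rows)}\n"
        f"mean error: {statistics.mean(errors):.1f} km\n"
        f"best error: {min(errors):.1f} km\n"
    )

sample = "storm,error_km\nIRMA,37.7\nKATRINA,120.4\n"
print(summarize(sample))
```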

Running with MLflow Tracking

Start MLflow Server

mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 \
    --port 5001

Configure Pipeline

Create or edit config.json:

{
    "mlflow": {
        "tracking_uri": "http://localhost:5001",
        "experiment_name": "hurricane_landfall"
    },
    "model": {
        "test_size": 0.2,
        "random_state": 42,
        "forecast_horizon_hours": 24
    }
}
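
A sketch of how such a config might be consumed, merging the file over built-in defaults (the loader itself is illustrative, not the pipeline's actual code):

```python
import json

DEFAULTS = {
    "mlflow": {"tracking_uri": "http://localhost:5001",
               "experiment_name": "hurricane_landfall"},
    "model": {"test_size": 0.2, "random_state": 42,
              "forecast_horizon_hours": 24},
}

def load_config(text):
    """Merge a config.json payload over the defaults, section by section."""
    cfg = {k: dict(v) for k, v in DEFAULTS.items()}
    for section, values in json.loads(text).items():
        cfg.setdefault(section, {}).update(values)
    return cfg

# Overriding a single key leaves the other defaults intact.
cfg = load_config('{"model": {"forecast_horizon_hours": 48}}')
```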

View Results

Open http://localhost:5001 to see:

  • Training runs with metrics
  • Model artifacts
  • Parameter comparisons

Running Tests

cd src/hurricane-landfall

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ -v --cov=hurricane_landfall

# Run specific test file
pytest tests/test_s02_train_model.py -v
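
Tests under tests/ follow the usual pytest shape: plain functions named test_* containing bare asserts. A self-contained sketch of what one might look like (the parse_coord helper here is a hypothetical stand-in for the package's own utilities, not a documented API):

```python
# tests/test_coords.py -- illustrative unit test.

def parse_coord(token):
    """Convert '23.7N' / '81.1W' tokens to signed floats."""
    value = float(token[:-1])
    return value if token[-1] in "NE" else -value

def test_parse_coord_north_positive():
    assert parse_coord("23.7N") == 23.7

def test_parse_coord_west_negative():
    assert parse_coord("81.1W") == -81.1
```

Run with pytest tests/test_coords.py -v, matching the commands above.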

Docker Development

Build Container

cd src/hurricane-landfall
docker build -t hurricane-landfall:1.0.0 .

Run Container

# Run full pipeline
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 run-all --base-dir /data

# Run individual steps
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 download --base-dir /data
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 train --base-dir /data

# Interactive shell
docker run -it --entrypoint /bin/bash hurricane-landfall:1.0.0

Using Docker Compose

For development with MLflow:

cd deploy/local/asus/03-application/hurricane
docker-compose up -d

This starts:

  • MLflow server on port 5001
  • Hurricane landfall pipeline

Troubleshooting

Common Issues

1. Import errors when running scripts directly

Always install the package first:

pip install -e .

2. MLflow connection refused

Ensure MLflow server is running:

curl http://localhost:5001/health

3. Missing data files

Run the download step first:

hurricane-landfall download --base-dir ./output

4. Out of memory during training

Reduce the dataset size (e.g. train on a subset of storms) or adjust the model parameters in config.json.

Getting Help

  • Check logs in ./output/logs/
  • View MLflow UI for experiment details
  • Review test output for failures

Next Steps