Hurricane Landfall Pipeline

This guide covers running the Hurricane Landfall Forecasting pipeline locally for development and testing.

Overview

The hurricane landfall forecasting pipeline predicts whether and where hurricanes will make landfall, using historical HURDAT2 track data from NOAA.

Pipeline Steps

Step        Command     Description
---------   ---------   ----------------------------------------------------
Download    download    Downloads HURDAT2 data and creates training features
Train       train       Trains landfall classifier and location regressors
Predict     predict     Generates predictions for all historical hurricanes
Visualize   visualize   Creates maps and error plots
All         run-all     Runs all steps in sequence

Model Performance

  • Landfall Classification: 95.3% test accuracy
  • Location Prediction:
    • Latitude MAE: 2.19°
    • Longitude MAE: 2.71°
    • Best prediction: 37.7 km error
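
The kilometre error figures above can be reproduced from predicted and actual landfall coordinates with the haversine great-circle distance; a minimal stdlib sketch (the function name and radius constant are illustrative, not taken from the pipeline):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# One degree of latitude is about 111 km, so the 2.19° latitude MAE
# corresponds to roughly 240 km of north-south error on average.
print(round(haversine_km(25.0, -80.0, 26.0, -80.0), 1))  # → 111.2
```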

Quick Start

Option 1: Using Docker
# Build the container image
cd src/hurricane-landfall
docker build -t hurricane-landfall:1.0.0 .

# Run the full pipeline
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 run-all --base-dir /data

Option 2: Using Python Directly

# Install the package
cd src/hurricane-landfall
pip install -e .

# Run the pipeline
hurricane-landfall run-all --base-dir ./output

Development Setup

Prerequisites

  • Python >= 3.10
  • Docker (optional, for containerized runs)
  • MLflow (optional, for experiment tracking)

Installing Dependencies

cd src/hurricane-landfall

# Install with development dependencies
pip install -e ".[dev]"

# Or just runtime dependencies
pip install -e .

Package Structure

src/hurricane-landfall/
├── hurricane_landfall/           # Main package
│   ├── __init__.py
│   ├── cli.py                   # Command-line interface
│   ├── s01_download_data.py     # Data download and processing
│   ├── s02_train_model.py       # Model training
│   ├── s03_predict_landfall.py  # Prediction generation
│   └── s04_visualize.py         # Visualization
├── tests/                        # Unit tests
├── Dockerfile                    # Container definition
├── pyproject.toml               # Package configuration
├── setup.py                     # Setup script
└── requirements.txt             # Dependencies

Running Individual Steps

Step 1: Download Data

Downloads HURDAT2 hurricane track data from NOAA and processes it into training features.

# Using CLI
hurricane-landfall download --base-dir ./output

# Using Python
python -m hurricane_landfall.s01_download_data

Output:

  • data/raw/hurdat2.txt - Raw HURDAT2 data
  • data/processed/hurricane_tracks.csv - Processed track data
  • data/processed/training_data.csv - Training features
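
HURDAT2 is a plain-text, comma-separated format: one header line per storm followed by one row per six-hourly fix, with an L record identifier marking landfall fixes. A minimal parsing sketch (field positions follow NOAA's published format description; the helper names are illustrative, and the AL/EP/CP prefixes cover the Atlantic and Pacific basin files):

```python
def parse_coord(token):
    """Convert '23.7N' / '81.1W' style tokens to signed floats."""
    value = float(token[:-1])
    return value if token[-1] in "NE" else -value

def parse_hurdat2(lines):
    """Yield (storm_id, name, date, time, record_id, lat, lon) per fix."""
    storm_id = name = None
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if fields[0].startswith(("AL", "EP", "CP")):  # storm header row
            storm_id, name = fields[0], fields[1]
        else:  # six-hourly track fix
            date, time, record_id = fields[0], fields[1], fields[2]
            lat, lon = parse_coord(fields[4]), parse_coord(fields[5])
            yield storm_id, name, date, time, record_id, lat, lon

sample = [
    "AL112017,           IRMA,     66,",
    "20170910, 1300, L, HU, 23.7N,  81.1W, 115,  931,",
]
rows = list(parse_hurdat2(sample))
```

Rows whose record identifier is L are the landfall fixes the training labels can be derived from.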

Step 2: Train Models

Trains the landfall prediction models and logs to MLflow.

# Using CLI
hurricane-landfall train --base-dir ./output

# Using Python
python -m hurricane_landfall.s02_train_model

Output:

  • models/landfall_classifier.joblib - Binary classifier
  • models/landfall_lat_regressor.joblib - Latitude predictor
  • models/landfall_lon_regressor.joblib - Longitude predictor
  • MLflow run with metrics and artifacts
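
Internally the train step fits one binary classifier plus two coordinate regressors. A rough sketch of that shape using scikit-learn on synthetic data (the estimator types, feature set, and labels here are assumptions, not the pipeline's actual choices):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Toy features standing in for e.g. current lat, lon, heading, speed.
X = rng.normal(size=(500, 4))
made_landfall = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic label
landfall_lat = 25.0 + X[:, 0]                        # synthetic targets
landfall_lon = -80.0 + X[:, 1]

X_tr, X_te, y_tr, y_te = train_test_split(
    X, made_landfall, test_size=0.2, random_state=42)
clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)

# Location regressors are fitted only on fixes that made landfall.
mask = made_landfall == 1
lat_reg = RandomForestRegressor(random_state=42).fit(X[mask], landfall_lat[mask])
lon_reg = RandomForestRegressor(random_state=42).fit(X[mask], landfall_lon[mask])
# The three fitted estimators would then be persisted, e.g.
# joblib.dump(clf, "models/landfall_classifier.joblib"), and so on.
```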

Step 3: Generate Predictions

Generates landfall predictions for all historical hurricanes.

# Using CLI
hurricane-landfall predict --base-dir ./output

# Using specific MLflow run
hurricane-landfall predict --base-dir ./output --run-id <run_id>

Output:

  • predictions/all_hurricane_predictions.csv - All predictions
  • predictions/landfall_predictions_summary.csv - Landfall summary

Step 4: Create Visualizations

Generates visualization plots and summary reports.

# Using CLI
hurricane-landfall visualize --base-dir ./output

Output:

  • plots/error_histogram.png - Error distribution
  • plots/all_landfalls_map.png - Map of predictions
  • plots/error_by_category.png - Errors by hurricane category
  • predictions/summary_report.txt - Text summary
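
The text summary is essentially an aggregation over the predictions CSV; a stdlib-only sketch of producing one (the column names are assumptions about the CSV schema, not its documented layout):

```python
import csv
import io
import statistics

def summarize(csv_text):
    """Aggregate per-storm error rows into a short text report."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    errors = [float(r["error_km"]) for r in rows]
    return (
        f"storms: {len(rows)}\n"
        f"mean error: {statistics.mean(errors):.1f} km\n"
        f"best error: {min(errors):.1f} km\n"
    )

sample = "storm,error_km\nIRMA,37.7\nKATRINA,120.4\n"
print(summarize(sample))
```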

Running with MLflow Tracking

Start MLflow Server

mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 \
    --port 5001

Configure Pipeline

Create or edit config.json:

{
    "mlflow": {
        "tracking_uri": "http://localhost:5001",
        "experiment_name": "hurricane_landfall"
    },
    "model": {
        "test_size": 0.2,
        "random_state": 42,
        "forecast_horizon_hours": 24
    }
}
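
A sketch of how such a config might be consumed, merging the file over built-in defaults (the loader itself is illustrative, not the pipeline's actual code):

```python
import json

DEFAULTS = {
    "mlflow": {"tracking_uri": "http://localhost:5001",
               "experiment_name": "hurricane_landfall"},
    "model": {"test_size": 0.2, "random_state": 42,
              "forecast_horizon_hours": 24},
}

def load_config(text):
    """Merge a config.json payload over the defaults, section by section."""
    cfg = {k: dict(v) for k, v in DEFAULTS.items()}
    for section, values in json.loads(text).items():
        cfg.setdefault(section, {}).update(values)
    return cfg

# Overriding a single key leaves the other defaults intact.
cfg = load_config('{"model": {"forecast_horizon_hours": 48}}')
```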

View Results

Open http://localhost:5001 to see:

  • Training runs with metrics
  • Model artifacts
  • Parameter comparisons

Running Tests

cd src/hurricane-landfall

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ -v --cov=hurricane_landfall

# Run specific test file
pytest tests/test_s02_train_model.py -v
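
Tests under tests/ follow the usual pytest shape: plain functions named test_* containing bare asserts. A self-contained sketch of what one might look like (the parse_coord helper here is a hypothetical stand-in for the package's own utilities, not a documented API):

```python
# tests/test_coords.py -- illustrative unit test.

def parse_coord(token):
    """Convert '23.7N' / '81.1W' tokens to signed floats."""
    value = float(token[:-1])
    return value if token[-1] in "NE" else -value

def test_parse_coord_north_positive():
    assert parse_coord("23.7N") == 23.7

def test_parse_coord_west_negative():
    assert parse_coord("81.1W") == -81.1
```

Run with pytest tests/test_coords.py -v, matching the commands above.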

Docker Development

Build Container

cd src/hurricane-landfall
docker build -t hurricane-landfall:1.0.0 .

Run Container

# Run full pipeline
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 run-all --base-dir /data

# Run individual steps
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 download --base-dir /data
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 train --base-dir /data

# Interactive shell
docker run -it --entrypoint /bin/bash hurricane-landfall:1.0.0

Using Docker Compose

For development with MLflow:

cd deploy/local/asus/03-application/hurricane
docker-compose up -d

This starts:

  • MLflow server on port 5001
  • Hurricane landfall pipeline

Troubleshooting

Common Issues

1. Import errors when running scripts directly

Always install the package first:

pip install -e .

2. MLflow connection refused

Ensure MLflow server is running:

curl http://localhost:5001/health

3. Missing data files

Run the download step first:

hurricane-landfall download --base-dir ./output

4. Out of memory during training

Reduce the dataset size (e.g. train on a subset of storms) or adjust the model parameters in config.json.

Getting Help

  • Check logs in ./output/logs/
  • View MLflow UI for experiment details
  • Review test output for failures

Next Steps