Hurricane Landfall Pipeline
This guide covers running the Hurricane Landfall Forecasting pipeline locally for development and testing.
Overview
The hurricane landfall forecasting pipeline predicts whether a hurricane will make landfall and, if so, where, using historical HURDAT2 best-track data from NOAA.
Pipeline Steps
| Step | Command | Description |
|---|---|---|
| Download | download | Downloads HURDAT2 data and creates training features |
| Train | train | Trains landfall classifier and location regressors |
| Predict | predict | Generates predictions for all historical hurricanes |
| Visualize | visualize | Creates maps and error plots |
| All | run-all | Runs all steps in sequence |
Model Performance
- Landfall Classification: 95.3% test accuracy
- Location Prediction:
  - Latitude MAE: 2.19°
  - Longitude MAE: 2.71°
- Best prediction: 37.7 km error
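Kilometre figures like the 37.7 km best case above are presumably great-circle distances between predicted and actual landfall points. A minimal sketch of that computation, assuming a standard haversine formulation (not necessarily the pipeline's exact implementation):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Error between an illustrative predicted and actual landfall point
error_km = haversine_km(29.1, -90.2, 29.2, -90.5)
```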
Quick Start
Option 1: Using Docker (Recommended)
# Pull or build the container
cd src/hurricane-landfall
docker build -t hurricane-landfall:1.0.0 .
# Run the full pipeline
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 run-all --base-dir /data
Option 2: Using Python Directly
# Install the package
cd src/hurricane-landfall
pip install -e .
# Run the pipeline
hurricane-landfall run-all --base-dir ./output
Development Setup
Prerequisites
- Python >= 3.10
- Docker (optional, for containerized runs)
- MLflow (optional, for experiment tracking)
Installing Dependencies
cd src/hurricane-landfall
# Install with development dependencies
pip install -e ".[dev]"
# Or just runtime dependencies
pip install -e .
Package Structure
src/hurricane-landfall/
├── hurricane_landfall/            # Main package
│   ├── __init__.py
│   ├── cli.py                     # Command-line interface
│   ├── s01_download_data.py       # Data download and processing
│   ├── s02_train_model.py         # Model training
│   ├── s03_predict_landfall.py    # Prediction generation
│   └── s04_visualize.py           # Visualization
├── tests/                         # Unit tests
├── Dockerfile                     # Container definition
├── pyproject.toml                 # Package configuration
├── setup.py                       # Setup script
└── requirements.txt               # Dependencies
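The cli.py entry point presumably dispatches the subcommands listed in the Pipeline Steps table. A minimal argparse sketch of that shape (the structure here is an assumption, not the package's actual code):

```python
import argparse

def build_parser():
    """Build a parser with one subcommand per pipeline step."""
    parser = argparse.ArgumentParser(prog="hurricane-landfall")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("download", "train", "predict", "visualize", "run-all"):
        step = sub.add_parser(name)
        step.add_argument("--base-dir", default="./output")
        if name == "predict":
            # predict optionally targets a specific MLflow run
            step.add_argument("--run-id", default=None)
    return parser

args = build_parser().parse_args(["predict", "--base-dir", "./output", "--run-id", "abc123"])
```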
Running Individual Steps
Step 1: Download Data
Downloads HURDAT2 hurricane track data from NOAA and processes it into training features.
# Using CLI
hurricane-landfall download --base-dir ./output
# Using Python
python -m hurricane_landfall.s01_download_data
Output:
- data/raw/hurdat2.txt - Raw HURDAT2 data
- data/processed/hurricane_tracks.csv - Processed track data
- data/processed/training_data.csv - Training features
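HURDAT2 is a comma-separated best-track format where each data line carries date, time, record identifier, storm status, position, and intensity (field layout per NOAA's HURDAT2 format documentation). A rough illustrative parser for one such line, not the pipeline's actual code:

```python
def parse_hurdat2_point(line):
    """Parse one HURDAT2 best-track data line into a dict of key fields."""
    fields = [f.strip() for f in line.split(",")]
    lat_raw, lon_raw = fields[4], fields[5]          # e.g. "29.1N", "90.2W"
    lat = float(lat_raw[:-1]) * (1 if lat_raw.endswith("N") else -1)
    lon = float(lon_raw[:-1]) * (1 if lon_raw.endswith("E") else -1)
    return {
        "date": fields[0],
        "time": fields[1],
        "record": fields[2],        # "L" marks a landfall point
        "status": fields[3],        # e.g. "HU" = hurricane intensity
        "lat": lat,
        "lon": lon,
        "max_wind_kt": int(fields[6]),
    }

point = parse_hurdat2_point("20210829, 1655, L, HU, 29.1N, 90.2W, 130, 931")
```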
Step 2: Train Models
Trains the landfall prediction models and logs to MLflow.
# Using CLI
hurricane-landfall train --base-dir ./output
# Using Python
python -m hurricane_landfall.s02_train_model
Output:
- models/landfall_classifier.joblib - Binary classifier
- models/landfall_lat_regressor.joblib - Latitude predictor
- models/landfall_lon_regressor.joblib - Longitude predictor
- MLflow run with metrics and artifacts
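The three model artifacts above suggest one binary classifier plus two coordinate regressors. A minimal sketch of that shape on synthetic data, assuming scikit-learn-style models (the real feature set and hyperparameters live in s02_train_model.py):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))                  # placeholder track features
made_landfall = (X[:, 0] > 0).astype(int)      # synthetic binary target
landfall_lat = 25.0 + 5.0 * X[:, 1]            # synthetic latitude target
landfall_lon = -80.0 + 5.0 * X[:, 2]           # synthetic longitude target

# One classifier decides landfall; two regressors locate it
clf = RandomForestClassifier(random_state=42).fit(X, made_landfall)
lat_reg = RandomForestRegressor(random_state=42).fit(X, landfall_lat)
lon_reg = RandomForestRegressor(random_state=42).fit(X, landfall_lon)

pred = clf.predict(X[:5])
```

Each fitted model would then be persisted with joblib.dump to the paths listed above.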
Step 3: Generate Predictions
Generates landfall predictions for all historical hurricanes.
# Using CLI
hurricane-landfall predict --base-dir ./output
# Using specific MLflow run
hurricane-landfall predict --base-dir ./output --run-id <run_id>
Output:
- predictions/all_hurricane_predictions.csv - All predictions
- predictions/landfall_predictions_summary.csv - Landfall summary
Step 4: Create Visualizations
Generates visualization plots and summary reports.
# Using CLI
hurricane-landfall visualize --base-dir ./output
Output:
- plots/error_histogram.png - Error distribution
- plots/all_landfalls_map.png - Map of predictions
- plots/error_by_category.png - Errors by hurricane category
- predictions/summary_report.txt - Text summary
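A sketch of how a plot like error_histogram.png could be produced with matplotlib (the errors here are synthetic; the real values come from the predictions CSVs):

```python
import os
import tempfile
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Synthetic stand-in for per-hurricane landfall errors
errors_km = np.abs(np.random.default_rng(0).normal(150, 80, size=300))

fig, ax = plt.subplots()
ax.hist(errors_km, bins=30)
ax.set_xlabel("Landfall error (km)")
ax.set_ylabel("Hurricane count")
out_path = os.path.join(tempfile.mkdtemp(), "error_histogram.png")
fig.savefig(out_path)
```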
Running with MLflow Tracking
Start MLflow Server
mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root ./mlruns \
--host 0.0.0.0 \
--port 5001
Configure Pipeline
Create or edit config.json:
{
"mlflow": {
"tracking_uri": "http://localhost:5001",
"experiment_name": "hurricane_landfall"
},
"model": {
"test_size": 0.2,
"random_state": 42,
"forecast_horizon_hours": 24
}
}
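The pipeline presumably reads this file at startup; loading it is plain JSON. Shown here parsing the example above from a string:

```python
import json

config_text = """
{
  "mlflow": {"tracking_uri": "http://localhost:5001", "experiment_name": "hurricane_landfall"},
  "model": {"test_size": 0.2, "random_state": 42, "forecast_horizon_hours": 24}
}
"""
config = json.loads(config_text)
tracking_uri = config["mlflow"]["tracking_uri"]
test_size = config["model"]["test_size"]
```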
View Results
Open http://localhost:5001 to see:
- Training runs with metrics
- Model artifacts
- Parameter comparisons
Running Tests
cd src/hurricane-landfall
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ -v --cov=hurricane_landfall
# Run specific test file
pytest tests/test_s02_train_model.py -v
Docker Development
Build Container
cd src/hurricane-landfall
docker build -t hurricane-landfall:1.0.0 .
Run Container
# Run full pipeline
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 run-all --base-dir /data
# Run individual steps
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 download --base-dir /data
docker run -v $(pwd)/output:/data hurricane-landfall:1.0.0 train --base-dir /data
# Interactive shell
docker run -it --entrypoint /bin/bash hurricane-landfall:1.0.0
Using Docker Compose
For development with MLflow:
cd deploy/local/asus/03-application/hurricane
docker-compose up -d
This starts:
- MLflow server on port 5001
- Hurricane landfall pipeline
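The compose file for this setup might look roughly like the following sketch. Service names, the MLflow image tag, and volume paths here are assumptions for illustration; the actual file lives in deploy/local/asus/03-application/hurricane.

```yaml
services:
  mlflow:
    image: ghcr.io/mlflow/mlflow:latest
    command: mlflow server --backend-store-uri sqlite:///mlflow.db --host 0.0.0.0 --port 5001
    ports:
      - "5001:5001"
  hurricane-landfall:
    image: hurricane-landfall:1.0.0
    command: run-all --base-dir /data
    volumes:
      - ./output:/data
    depends_on:
      - mlflow
```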
Troubleshooting
Common Issues
1. Import errors when running scripts directly
Always install the package first:
pip install -e .
2. MLflow connection refused
Ensure MLflow server is running:
curl http://localhost:5001/health
3. Missing data files
Run the download step first:
hurricane-landfall download --base-dir ./output
4. Out of memory during training
Reduce the dataset or use smaller batch sizes in config.
Getting Help
- Check logs in ./output/logs/
- View MLflow UI for experiment details
- Review test output for failures
Next Steps
- Deploy to production - Promote to AWS
- Contribute improvements - Submit PRs
- Add tests - Improve coverage