Hurricane AWS Deployment

This guide covers deploying the Hurricane Landfall Forecasting pipeline to AWS.

Overview

The Hurricane Landfall pipeline can be deployed to AWS using:

  • ECR: Container registry for the pipeline image
  • EC2: Ephemeral instances for running the pipeline
  • S3: Storage for data, models, and predictions
  • SageMaker: (Optional) Managed inference endpoints

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         AWS Account                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────┐    ┌─────────────────────────────────┐    │
│  │      ECR        │    │         EC2 Instance            │    │
│  │ hurricane-      │───▶│  ┌─────────────────────────┐    │    │
│  │ landfall:1.0.0  │    │  │  hurricane-landfall     │    │    │
│  └─────────────────┘    │  │  container              │    │    │
│                         │  └───────────┬─────────────┘    │    │
│                         │              │                   │    │
│                         │  ┌───────────▼─────────────┐    │    │
│                         │  │    MLflow Server        │    │    │
│                         │  │    (port 5000)          │    │    │
│                         │  └─────────────────────────┘    │    │
│                         └──────────────┬──────────────────┘    │
│                                        │                        │
│  ┌─────────────────────────────────────▼────────────────────┐  │
│  │                    S3 Bucket                              │  │
│  │  ┌──────────────┐  ┌──────────┐  ┌──────────────────┐   │  │
│  │  │ hurdat2/     │  │ models/  │  │ predictions/     │   │  │
│  │  │ raw data     │  │ joblib   │  │ CSV outputs      │   │  │
│  │  └──────────────┘  └──────────┘  └──────────────────┘   │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Prerequisites

  1. AWS CLI configured with appropriate permissions
  2. Docker installed and running
  3. Terraform >= 1.0.0 (for infrastructure)
  4. Local pipeline tested successfully (see the quick check below)
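
A quick way to verify these prerequisites before starting (a sketch; adjust to your environment):

# Confirm credentials, Docker daemon, and Terraform are available
aws sts get-caller-identity
docker info > /dev/null && echo "Docker OK"
terraform version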

Step 1: Push Container to ECR

Create ECR Repository

# Create repository
aws ecr create-repository \
    --repository-name hurricane-landfall \
    --image-scanning-configuration scanOnPush=true

# Get login credentials
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin \
    <account-id>.dkr.ecr.us-east-1.amazonaws.com

Tag and Push Image

# Tag the local image
docker tag hurricane-landfall:1.0.0 \
    <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0

# Push to ECR
docker push \
    <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0

Verify Push

aws ecr describe-images \
    --repository-name hurricane-landfall \
    --query 'imageDetails[*].{Tag:imageTags,Pushed:imagePushedAt}'

Step 2: Create S3 Bucket

# Create bucket for data and artifacts
aws s3 mb s3://<account-id>-hurricane-landfall --region us-east-1

# Create folder structure (S3 has no real folders; these are zero-byte placeholder objects)
aws s3api put-object --bucket <account-id>-hurricane-landfall --key data/
aws s3api put-object --bucket <account-id>-hurricane-landfall --key models/
aws s3api put-object --bucket <account-id>-hurricane-landfall --key predictions/
aws s3api put-object --bucket <account-id>-hurricane-landfall --key mlruns/
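
To follow the encryption recommendation in Security Considerations below, you can also harden the bucket at creation time (both calls are standard s3api operations):

# Enable default server-side encryption (SSE-S3)
aws s3api put-bucket-encryption \
    --bucket <account-id>-hurricane-landfall \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

# Optional: keep prior versions of models and predictions
aws s3api put-bucket-versioning \
    --bucket <account-id>-hurricane-landfall \
    --versioning-configuration Status=Enabled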

Step 3: Deploy Infrastructure

Using Terraform

cd deploy/aws/064592191516/us-east-1/hurricane-landfall/01-infrastructure

terraform init
terraform plan
terraform apply
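
If the module exposes variables for the instance type or container version (the variable names here are hypothetical; check variables.tf), you can pin them at apply time:

terraform apply -var="container_version=1.0.0" -var="instance_type=t3.large"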

Manual EC2 Launch

# Launch instance with user data
aws ec2 run-instances \
    --image-id ami-0c55b159cbfafe1f0 \
    --instance-type t3.large \
    --key-name mlops-pipeline-key \
    --security-group-ids sg-xxx \
    --iam-instance-profile Name=mlops-pipeline-role \
    --user-data file://userdata.sh \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=hurricane-landfall}]'
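
The hard-coded AMI ID above is region-specific and goes stale. One way to resolve the current Amazon Linux 2 AMI at launch time is the public SSM parameter:

# Look up the latest Amazon Linux 2 AMI for the configured region
aws ssm get-parameters \
    --names /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2 \
    --query 'Parameters[0].Value' \
    --output text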

User Data Script

Create userdata.sh:

#!/bin/bash
set -e

# Install Docker
yum update -y
amazon-linux-extras install docker -y
service docker start
usermod -a -G docker ec2-user

# Login to ECR
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin \
    <account-id>.dkr.ecr.us-east-1.amazonaws.com

# Pull container
docker pull <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0

# Create directories
mkdir -p /data/{raw,processed,models,predictions,plots}
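
# (Optional) Start the MLflow server shown in the architecture diagram.
# Sketch only: the official mlflow image, port 5000, and a file-backed
# store under /data/mlruns are assumptions, not pipeline requirements.
docker run -d --name mlflow -p 5000:5000 \
    -v /data/mlruns:/mlruns \
    ghcr.io/mlflow/mlflow \
    mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri /mlruns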

# Run pipeline (name the container so docker logs hurricane-landfall works later)
docker run \
    --name hurricane-landfall \
    -v /data:/data \
    -e AWS_DEFAULT_REGION=us-east-1 \
    <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0 \
    run-all --base-dir /data

# Upload results to S3
aws s3 sync /data/predictions s3://<account-id>-hurricane-landfall/predictions/
aws s3 sync /data/models s3://<account-id>-hurricane-landfall/models/
aws s3 sync /data/plots s3://<account-id>-hurricane-landfall/plots/

# Optionally terminate
# aws ec2 terminate-instances --instance-ids $(curl -s http://169.254.169.254/latest/meta-data/instance-id)

Step 4: Run Pipeline on AWS

Launch Script

Create launch_hurricane_pipeline.sh:

#!/bin/bash
set -e

INSTANCE_TYPE="${1:-t3.large}"
VERSION="${2:-1.0.0}"
ACCOUNT_ID="<your-account-id>"
REGION="us-east-1"

echo "=== Launching Hurricane Landfall Pipeline ==="
echo "Instance Type: ${INSTANCE_TYPE}"
echo "Container Version: ${VERSION}"

# Launch instance
INSTANCE_ID=$(aws ec2 run-instances \
    --image-id ami-0c55b159cbfafe1f0 \
    --instance-type ${INSTANCE_TYPE} \
    --key-name mlops-pipeline-key \
    --security-group-ids sg-xxx \
    --iam-instance-profile Name=mlops-pipeline-role \
    --user-data file://userdata.sh \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=hurricane-landfall-${VERSION}}]" \
    --query 'Instances[0].InstanceId' \
    --output text)

echo "Instance ID: ${INSTANCE_ID}"

# Wait for running state
aws ec2 wait instance-running --instance-ids ${INSTANCE_ID}

# Get public IP
PUBLIC_IP=$(aws ec2 describe-instances \
    --instance-ids ${INSTANCE_ID} \
    --query 'Reservations[0].Instances[0].PublicIpAddress' \
    --output text)

echo ""
echo "=== Instance Ready ==="
echo "SSH: ssh -i ~/.ssh/mlops-pipeline-key.pem ec2-user@${PUBLIC_IP}"
echo "MLflow: http://${PUBLIC_IP}:5000"
echo ""
echo "Check progress:"
echo "  ssh -i ~/.ssh/mlops-pipeline-key.pem ec2-user@${PUBLIC_IP} 'docker logs -f hurricane-landfall'"

Check Status

#!/bin/bash
# check_hurricane_status.sh

INSTANCE_ID="${1}"

# Get instance details
aws ec2 describe-instances \
    --instance-ids ${INSTANCE_ID} \
    --query 'Reservations[0].Instances[0].{State:State.Name,IP:PublicIpAddress,Type:InstanceType}'

# Check S3 for results
aws s3 ls s3://<account-id>-hurricane-landfall/predictions/ --recursive

Step 5: Retrieve Results

Download from S3

# Download predictions
aws s3 sync s3://<account-id>-hurricane-landfall/predictions/ ./predictions/

# Download models
aws s3 sync s3://<account-id>-hurricane-landfall/models/ ./models/

# Download visualizations
aws s3 sync s3://<account-id>-hurricane-landfall/plots/ ./plots/

View in MLflow

If the MLflow server is still running on the instance:

# Get instance IP
PUBLIC_IP=$(aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=hurricane-landfall-*" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' \
    --output text)

echo "MLflow UI: http://${PUBLIC_IP}:5000"

Step 6: Clean Up

Terminate Instance

# Find and terminate hurricane instances
aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=hurricane-landfall-*" \
    --query 'Reservations[*].Instances[*].InstanceId' \
    --output text | xargs -I {} aws ec2 terminate-instances --instance-ids {}

Clean S3 (Optional)

# Remove old predictions (keep models)
aws s3 rm s3://<account-id>-hurricane-landfall/predictions/ --recursive

Version Management

Promoting Versions

  1. Development (local): Test changes locally
  2. Staging (1.0.0-rc1): Push release candidate to ECR
  3. Production (1.0.0): Promote the stable version (retag example below)
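
Promotion can be a retag-and-push of the already-tested image rather than a rebuild:

# Retag the release candidate as the production version
docker pull <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0-rc1
docker tag \
    <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0-rc1 \
    <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0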

Rollback

# Pull previous version
docker pull <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:0.9.0

# Update userdata.sh to reference old version
# Re-launch instance

Cost Optimization

Instance Sizing

Instance Type   Cost/hour   Use Case
t3.medium       ~$0.04      Testing
t3.large        ~$0.08      Standard runs
t3.xlarge       ~$0.16      Large datasets
c5.xlarge       ~$0.17      CPU-intensive

Spot Instances

aws ec2 request-spot-instances \
    --instance-count 1 \
    --type "one-time" \
    --launch-specification file://spot-spec.json
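
A minimal spot-spec.json mirroring the on-demand launch above might look like this (same placeholders as earlier; UserData must be base64-encoded):

{
    "ImageId": "ami-0c55b159cbfafe1f0",
    "InstanceType": "t3.large",
    "KeyName": "mlops-pipeline-key",
    "SecurityGroupIds": ["sg-xxx"],
    "IamInstanceProfile": {"Name": "mlops-pipeline-role"},
    "UserData": "<base64-encoded userdata.sh>"
}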

Auto-Termination

Add to userdata.sh:

# Self-terminate after completion (the instance role needs
# ec2:TerminateInstances; if IMDSv2 is enforced, fetch a token first)
aws ec2 terminate-instances \
    --instance-ids $(curl -s http://169.254.169.254/latest/meta-data/instance-id)

Security Considerations

  1. IAM Roles: Use minimal permissions
  2. Security Groups: Restrict SSH to known IPs
  3. Encryption: Enable S3 bucket encryption
  4. Secrets: Use AWS Secrets Manager for sensitive config

Example IAM Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<account-id>-hurricane-landfall",
                "arn:aws:s3:::<account-id>-hurricane-landfall/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:GetAuthorizationToken"
            ],
            "Resource": "*"
        }
    ]
}
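
Note: the auto-termination step above also needs ec2:TerminateInstances, which this minimal policy does not grant; if you add it, scope it (for example by tag) to the pipeline's own instances.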

Monitoring

CloudWatch Logs

# View logs
aws logs tail /aws/ec2/hurricane-landfall --follow
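
The log group above only receives data if something ships logs to it. One option is to add Docker's awslogs log driver to the docker run in userdata.sh (the group name matches the tail command above; awslogs-create-group requires logs:CreateLogGroup on the instance role):

# Ship container stdout/stderr to CloudWatch Logs
docker run \
    --name hurricane-landfall \
    --log-driver=awslogs \
    --log-opt awslogs-region=us-east-1 \
    --log-opt awslogs-group=/aws/ec2/hurricane-landfall \
    --log-opt awslogs-create-group=true \
    -v /data:/data \
    -e AWS_DEFAULT_REGION=us-east-1 \
    <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0 \
    run-all --base-dir /data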

CloudWatch Metrics

Create alarms for the following (an example CPU alarm follows the list):

  • Instance CPU utilization
  • S3 bucket size
  • Pipeline execution time
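
For example, a basic CPU alarm (the threshold is illustrative; add an --alarm-actions SNS topic to be notified):

aws cloudwatch put-metric-alarm \
    --alarm-name hurricane-landfall-cpu \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=<instance-id> \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 90 \
    --comparison-operator GreaterThanThreshold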

Troubleshooting

Container Fails to Start

# SSH to instance
ssh -i ~/.ssh/mlops-pipeline-key.pem ec2-user@<ip>

# Check docker logs
docker logs hurricane-landfall

# Check cloud-init logs
cat /var/log/cloud-init-output.log

S3 Upload Fails

Check IAM role permissions:

aws sts get-caller-identity
aws s3 ls s3://<bucket>/ --debug

ECR Pull Fails

Ensure ECR login:

aws ecr get-login-password | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com

Next Steps