Hurricane AWS Deployment
This guide covers deploying the Hurricane Landfall Forecasting pipeline to AWS.
Overview
The Hurricane Landfall pipeline can be deployed to AWS using:
- ECR: Container registry for the pipeline image
- EC2: Ephemeral instances for running the pipeline
- S3: Storage for data, models, and predictions
- SageMaker: (Optional) Managed inference endpoints
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ AWS Account │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────────────────────┐ │
│ │ ECR │ │ EC2 Instance │ │
│ │ hurricane- │───▶│ ┌─────────────────────────┐ │ │
│ │ landfall:1.0.0 │ │ │ hurricane-landfall │ │ │
│ └─────────────────┘ │ │ container │ │ │
│ │ └───────────┬─────────────┘ │ │
│ │ │ │ │
│ │ ┌───────────▼─────────────┐ │ │
│ │ │ MLflow Server │ │ │
│ │ │ (port 5000) │ │ │
│ │ └─────────────────────────┘ │ │
│ └──────────────┬──────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────▼────────────────────┐ │
│ │ S3 Bucket │ │
│ │ ┌──────────────┐ ┌──────────┐ ┌──────────────────┐ │ │
│ │ │ hurdat2/ │ │ models/ │ │ predictions/ │ │ │
│ │ │ raw data │ │ joblib │ │ CSV outputs │ │ │
│ │ └──────────────┘ └──────────┘ └──────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Prerequisites
- AWS CLI configured with appropriate permissions
- Docker installed and running
- Terraform >= 1.0.0 (for infrastructure)
- Local pipeline tested successfully
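A quick sanity check of these prerequisites before starting (a minimal sketch, assuming the AWS CLI, Docker, and Terraform are already on your PATH):
# Verify credentials and tool availability
aws sts get-caller-identity
docker info --format '{{.ServerVersion}}'
terraform version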
Step 1: Push Container to ECR
Create ECR Repository
# Create repository
aws ecr create-repository \
--repository-name hurricane-landfall \
--image-scanning-configuration scanOnPush=true
# Get login credentials
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin \
<account-id>.dkr.ecr.us-east-1.amazonaws.com
Tag and Push Image
# Tag the local image
docker tag hurricane-landfall:1.0.0 \
<account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0
# Push to ECR
docker push \
<account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0
Verify Push
aws ecr describe-images \
--repository-name hurricane-landfall \
--query 'imageDetails[*].{Tag:imageTags,Pushed:imagePushedAt}'
Step 2: Create S3 Bucket
# Create bucket for data and artifacts
aws s3 mb s3://<account-id>-hurricane-landfall --region us-east-1
# Create folder structure
aws s3api put-object --bucket <account-id>-hurricane-landfall --key data/
aws s3api put-object --bucket <account-id>-hurricane-landfall --key models/
aws s3api put-object --bucket <account-id>-hurricane-landfall --key predictions/
aws s3api put-object --bucket <account-id>-hurricane-landfall --key plots/
aws s3api put-object --bucket <account-id>-hurricane-landfall --key mlruns/
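Optionally, enable default encryption and versioning at this point; the Security Considerations section below recommends bucket encryption, and versioning protects models from accidental overwrites (a minimal sketch, assuming SSE-S3 is sufficient for your data):
# Enable default server-side encryption (SSE-S3)
aws s3api put-bucket-encryption \
  --bucket <account-id>-hurricane-landfall \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
# Enable versioning so overwritten artifacts can be recovered
aws s3api put-bucket-versioning \
  --bucket <account-id>-hurricane-landfall \
  --versioning-configuration Status=Enabled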
Step 3: Deploy Infrastructure
Using Terraform
cd deploy/aws/064592191516/us-east-1/hurricane-landfall/01-infrastructure
terraform init
terraform plan
terraform apply
Manual EC2 Launch
# Launch instance with user data
aws ec2 run-instances \
--image-id ami-0c55b159cbfafe1f0 \
--instance-type t3.large \
--key-name mlops-pipeline-key \
--security-group-ids sg-xxx \
--iam-instance-profile Name=mlops-pipeline-role \
--user-data file://userdata.sh \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=hurricane-landfall}]'
User Data Script
Create userdata.sh:
#!/bin/bash
set -e
# Install Docker
yum update -y
amazon-linux-extras install docker -y
service docker start
usermod -a -G docker ec2-user
# Login to ECR
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin \
<account-id>.dkr.ecr.us-east-1.amazonaws.com
# Pull container
docker pull <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0
# Create directories
mkdir -p /data/{raw,processed,models,predictions,plots}
# Run pipeline (named so "docker logs -f hurricane-landfall" works; port 5000 publishes the MLflow UI)
docker run \
  --name hurricane-landfall \
  -p 5000:5000 \
  -v /data:/data \
  -e AWS_DEFAULT_REGION=us-east-1 \
  <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0 \
  run-all --base-dir /data
# Upload results to S3
aws s3 sync /data/predictions s3://<account-id>-hurricane-landfall/predictions/
aws s3 sync /data/models s3://<account-id>-hurricane-landfall/models/
aws s3 sync /data/plots s3://<account-id>-hurricane-landfall/plots/
# Optionally terminate
# aws ec2 terminate-instances --instance-ids $(curl -s http://169.254.169.254/latest/meta-data/instance-id)
Step 4: Run Pipeline on AWS
Launch Script
Create launch_hurricane_pipeline.sh:
#!/bin/bash
set -e
INSTANCE_TYPE="${1:-t3.large}"
VERSION="${2:-1.0.0}"
ACCOUNT_ID="<your-account-id>"
REGION="us-east-1"
echo "=== Launching Hurricane Landfall Pipeline ==="
echo "Instance Type: ${INSTANCE_TYPE}"
echo "Container Version: ${VERSION}"
# Launch instance
INSTANCE_ID=$(aws ec2 run-instances \
--image-id ami-0c55b159cbfafe1f0 \
--instance-type ${INSTANCE_TYPE} \
--key-name mlops-pipeline-key \
--security-group-ids sg-xxx \
--iam-instance-profile Name=mlops-pipeline-role \
--user-data file://userdata.sh \
--tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=hurricane-landfall-${VERSION}}]" \
--query 'Instances[0].InstanceId' \
--output text)
echo "Instance ID: ${INSTANCE_ID}"
# Wait for running state
aws ec2 wait instance-running --instance-ids ${INSTANCE_ID}
# Get public IP
PUBLIC_IP=$(aws ec2 describe-instances \
--instance-ids ${INSTANCE_ID} \
--query 'Reservations[0].Instances[0].PublicIpAddress' \
--output text)
echo ""
echo "=== Instance Ready ==="
echo "SSH: ssh -i ~/.ssh/mlops-pipeline-key.pem ec2-user@${PUBLIC_IP}"
echo "MLflow: http://${PUBLIC_IP}:5000"
echo ""
echo "Check progress:"
echo " ssh -i ~/.ssh/mlops-pipeline-key.pem ec2-user@${PUBLIC_IP} 'docker logs -f hurricane-landfall'"
Check Status
#!/bin/bash
# check_hurricane_status.sh
INSTANCE_ID="${1}"
# Get instance details
aws ec2 describe-instances \
--instance-ids ${INSTANCE_ID} \
--query 'Reservations[0].Instances[0].{State:State.Name,IP:PublicIpAddress,Type:InstanceType}'
# Check S3 for results
aws s3 ls s3://<account-id>-hurricane-landfall/predictions/ --recursive
Step 5: Retrieve Results
Download from S3
# Download predictions
aws s3 sync s3://<account-id>-hurricane-landfall/predictions/ ./predictions/
# Download models
aws s3 sync s3://<account-id>-hurricane-landfall/models/ ./models/
# Download visualizations
aws s3 sync s3://<account-id>-hurricane-landfall/plots/ ./plots/
View in MLflow
If the MLflow server is still running on the instance:
# Get instance IP
PUBLIC_IP=$(aws ec2 describe-instances \
--filters "Name=tag:Name,Values=hurricane-landfall-*" \
--query 'Reservations[0].Instances[0].PublicIpAddress' \
--output text)
echo "MLflow UI: http://${PUBLIC_IP}:5000"
Step 6: Clean Up
Terminate Instance
# Find and terminate hurricane instances
aws ec2 describe-instances \
--filters "Name=tag:Name,Values=hurricane-landfall-*" \
--query 'Reservations[*].Instances[*].InstanceId' \
--output text | xargs -I {} aws ec2 terminate-instances --instance-ids {}
Clean S3 (Optional)
# Remove old predictions (keep models)
aws s3 rm s3://<account-id>-hurricane-landfall/predictions/ --recursive
Version Management
Promoting Versions
- Development (`local`): Test changes locally
- Staging (`1.0.0-rc1`): Push a release candidate to ECR
- Production (`1.0.0`): Promote the stable version (see the sketch below)
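A minimal promotion sketch, assuming the release candidate tag 1.0.0-rc1 is already in ECR; promotion is just re-tagging the same image:
# Pull the tested release candidate
docker pull <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0-rc1
# Re-tag it as the production version and push
docker tag \
  <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0-rc1 \
  <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:1.0.0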
Rollback
# Pull previous version
docker pull <account-id>.dkr.ecr.us-east-1.amazonaws.com/hurricane-landfall:0.9.0
# Update userdata.sh to reference old version
# Re-launch instance
Cost Optimization
Instance Sizing
| Instance Type | Cost/hour (on-demand, us-east-1) | Use Case |
|---|---|---|
| t3.medium | ~$0.04 | Testing |
| t3.large | ~$0.08 | Standard runs |
| t3.xlarge | ~$0.16 | Large datasets |
| c5.xlarge | ~$0.17 | CPU-intensive |
Spot Instances
aws ec2 request-spot-instances \
--instance-count 1 \
--type "one-time" \
--launch-specification file://spot-spec.json
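This guide does not include spot-spec.json; a hedged sketch of what it might contain, reusing the AMI, key pair, security group, and instance profile from Step 3 (UserData must be the base64-encoded userdata.sh):
# Write a minimal launch specification for the spot request
cat > spot-spec.json <<'EOF'
{
  "ImageId": "ami-0c55b159cbfafe1f0",
  "InstanceType": "t3.large",
  "KeyName": "mlops-pipeline-key",
  "SecurityGroupIds": ["sg-xxx"],
  "IamInstanceProfile": {"Name": "mlops-pipeline-role"},
  "UserData": "<base64-encoded userdata.sh>"
}
EOF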
Auto-Termination
Add to userdata.sh:
# Self-terminate after completion
aws ec2 terminate-instances \
--instance-ids $(curl -s http://169.254.169.254/latest/meta-data/instance-id)
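Newer AMIs often enforce IMDSv2, in which case the metadata call above needs a session token; a hedged variant:
# IMDSv2: fetch a session token, then query the instance ID with it
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 terminate-instances --instance-ids "${INSTANCE_ID}"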
Security Considerations
- IAM Roles: Use minimal permissions
- Security Groups: Restrict SSH to known IPs
- Encryption: Enable S3 bucket encryption
- Secrets: Use AWS Secrets Manager for sensitive config
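For the last point, a minimal sketch of reading configuration at runtime (the secret name hurricane-landfall/config is hypothetical):
# Fetch sensitive config from Secrets Manager instead of baking it into the image
aws secretsmanager get-secret-value \
  --secret-id hurricane-landfall/config \
  --query SecretString \
  --output text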
Example IAM Policy
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::<account-id>-hurricane-landfall",
"arn:aws:s3:::<account-id>-hurricane-landfall/*"
]
},
{
"Effect": "Allow",
"Action": [
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:GetAuthorizationToken"
],
"Resource": "*"
}
]
}
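To attach this policy to the instance role, something like the following should work, assuming the role behind the mlops-pipeline-role instance profile shares its name and the JSON above is saved as hurricane-landfall-policy.json (both are assumptions):
# Attach the policy above as an inline policy on the instance role
aws iam put-role-policy \
  --role-name mlops-pipeline-role \
  --policy-name hurricane-landfall-access \
  --policy-document file://hurricane-landfall-policy.json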
Monitoring
CloudWatch Logs
# View logs
aws logs tail /aws/ec2/hurricane-landfall --follow
CloudWatch Metrics
Create alarms for:
- Instance CPU utilization
- S3 bucket size
- Pipeline execution time
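As an illustration of the first item, a hedged sketch of a CPU alarm on the pipeline instance (the instance ID and SNS topic ARN are placeholders, and the thresholds are arbitrary):
# Alarm if the pipeline instance averages >= 80% CPU for 10 minutes
aws cloudwatch put-metric-alarm \
  --alarm-name hurricane-landfall-cpu-high \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=<instance-id> \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions <sns-topic-arn>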
Troubleshooting
Container Fails to Start
# SSH to instance
ssh -i ~/.ssh/mlops-pipeline-key.pem ec2-user@<ip>
# Check docker logs
docker logs hurricane-landfall
# Check cloud-init logs
cat /var/log/cloud-init-output.log
S3 Upload Fails
Check IAM role permissions:
aws sts get-caller-identity
aws s3 ls s3://<bucket>/ --debug
ECR Pull Fails
Ensure ECR login:
aws ecr get-login-password | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
Next Steps
- Set up SageMaker endpoint: real-time inference
- Configure alerts: pipeline monitoring
- Multi-region deployment: disaster recovery