Scikit-learn Pipeline
The mlflow-sklearn pipeline detects credit card fraud with a scikit-learn logistic regression model and tracks its runs in MLflow.
Pipeline Overview
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ s01: Create     │────▶│ s02: CSV to     │────▶│ s03: Preprocess │
│ Training Data   │     │ Parquet         │     │                 │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
┌─────────────────┐     ┌─────────────────┐     ┌────────▼────────┐
│ s06: Evaluate   │◀────│ s05: Score      │◀────│ s04: Train      │
│                 │     │                 │     │                 │
└────────┬────────┘     └─────────────────┘     └─────────────────┘
         │
         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ s07: Validate   │────▶│ s08: DOE Coarse │────▶│ s09: DOE Fine   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
Running the Pipeline
Full Pipeline
cd src/mlflow-sklearn
make all
Individual Steps
Run specific steps as needed:
# Step 1: Create training dataset
python mlflow_sklearn/s01_create_training_dataset.py
# Step 2: Convert CSV to Parquet
python mlflow_sklearn/s02_csv2parquet.py
# Step 3: Preprocessing
python mlflow_sklearn/s03_preprocessing.py
# Step 4: Train model
python mlflow_sklearn/s04_train.py
# Step 5: Score predictions
python mlflow_sklearn/s05_score.py
# Step 6: Evaluate model
python mlflow_sklearn/s06_evaluate.py
# Step 7: Validate model
python mlflow_sklearn/s07_validate.py
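The arrows in the pipeline diagram suggest that each step consumes the output of the one before it, so the order of the individual commands matters. The following is a minimal sketch of a runner that executes the scripts listed above in sequence and stops at the first failure. It assumes it is started from src/mlflow-sklearn and that each script exits with a non-zero status on error; `STEPS` and `run_steps` are hypothetical names, not part of the repository.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Script paths as listed above, in execution order.
STEPS = [
    "mlflow_sklearn/s01_create_training_dataset.py",
    "mlflow_sklearn/s02_csv2parquet.py",
    "mlflow_sklearn/s03_preprocessing.py",
    "mlflow_sklearn/s04_train.py",
    "mlflow_sklearn/s05_score.py",
    "mlflow_sklearn/s06_evaluate.py",
    "mlflow_sklearn/s07_validate.py",
]


def run_steps(
    steps: Sequence[str] = STEPS,
    start: int = 1,
    runner: Callable[[List[str]], int] = subprocess.call,
) -> int:
    """Run steps from `start` (1-based) onward.

    Stops at the first step whose exit code is non-zero and
    returns the number of steps that completed successfully.
    """
    completed = 0
    for script in steps[start - 1:]:
        cmd = [sys.executable, script]
        print("Running:", " ".join(cmd))
        if runner(cmd) != 0:
            print("Step failed, stopping:", script)
            break
        completed += 1
    return completed
```

Calling `run_steps(start=4)` would resume from the training step, which can be handy after changing only the model configuration. The `runner` parameter exists so the function can be exercised without launching real processes.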
Hyperparameter Optimization
# Coarse search
python mlflow_sklearn/s08_doe_coarse.py
# Fine search
python mlflow_sklearn/s09_doe_fine.py
# Full DOE
python mlflow_sklearn/s10_doe_full.py
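The coarse and fine steps point to a two-stage search: scan a wide range first, then sample more densely around the best coarse result. The sketch below illustrates that idea only. It assumes a single positive hyperparameter searched on a log scale (for example the inverse regularization strength `C` of logistic regression); which parameters s08 to s10 actually vary is not stated here, and `coarse_grid`, `fine_grid`, and `two_stage_search` are hypothetical names.

```python
import math
from typing import Callable, List


def coarse_grid(low_exp: int = -3, high_exp: int = 3) -> List[float]:
    """One candidate per decade: 10**low_exp, ..., 10**high_exp."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


def fine_grid(center: float, points: int = 5) -> List[float]:
    """Log-spaced candidates from one decade below to one decade above `center`.

    `points` must be at least 2.
    """
    log_center = math.log10(center)
    step = 2.0 / (points - 1)
    return [10.0 ** (log_center - 1.0 + i * step) for i in range(points)]


def two_stage_search(score: Callable[[float], float]) -> float:
    """Pick the best coarse candidate, then refine around it.

    `score` maps a candidate value to a number where higher is better,
    e.g. a mean cross-validated F1-score.
    """
    best_coarse = max(coarse_grid(), key=score)
    return max(fine_grid(best_coarse), key=score)
```

In practice `score` would train and evaluate a model for each candidate, so the two-stage layout keeps the number of expensive evaluations small compared with a single dense grid over the full range.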
Data Requirements
The pipeline expects the credit card fraud dataset in S3:
- Source: s3://064592191516-mlflow/creditcardfraud/creditcard.csv.zip
- Format: CSV with fraud labels
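To inspect the dataset outside the pipeline, the archive can be fetched and unpacked by hand. This is a minimal sketch, assuming `boto3` is installed and AWS credentials with read access to the bucket are configured; how the pipeline scripts themselves fetch the file is not shown here, and `split_s3_uri`, `fetch_dataset`, and the `data` directory are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Tuple
from urllib.parse import urlparse

DATASET_URI = "s3://064592191516-mlflow/creditcardfraud/creditcard.csv.zip"


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split s3://bucket/key into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_dataset(uri: str = DATASET_URI, dest_dir: str = "data") -> Path:
    """Download the zip archive and extract it into `dest_dir`."""
    # Imported lazily so the URI helper works without boto3 installed.
    import boto3

    bucket, key = split_s3_uri(uri)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / Path(key).name
    boto3.client("s3").download_file(bucket, key, str(archive))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest
```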
Model Configuration
Edit the configuration in mlflow_sklearn/s04_train.py:
| Parameter | Default | Description |
|---|---|---|
| n_splits | 5 | Number of cross-validation folds |
| solver | lbfgs | Optimization algorithm |
| max_iter | 1000 | Maximum iterations |
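As a rough illustration of how the parameters in the table could feed into scikit-learn, the sketch below groups them in a small config object and passes them to `LogisticRegression` and a cross-validation splitter. It is not the code in s04_train.py: `TrainConfig` and `cross_validate_f1` are hypothetical names, and the use of `StratifiedKFold` with F1 scoring is an assumption chosen because fraud labels are typically heavily imbalanced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    n_splits: int = 5       # number of cross-validation folds
    solver: str = "lbfgs"   # optimization algorithm
    max_iter: int = 1000    # maximum solver iterations

    def model_kwargs(self) -> dict:
        """Keyword arguments for LogisticRegression (n_splits belongs to the splitter)."""
        return {"solver": self.solver, "max_iter": self.max_iter}


def cross_validate_f1(X, y, cfg: TrainConfig = TrainConfig()):
    """Return one F1-score per fold; assumes scikit-learn is installed."""
    # Imported lazily so the config object is usable without scikit-learn.
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    model = LogisticRegression(**cfg.model_kwargs())
    # Stratified folds keep the fraud ratio similar in every fold (an assumption).
    folds = StratifiedKFold(n_splits=cfg.n_splits, shuffle=True, random_state=0)
    return cross_val_score(model, X, y, cv=folds, scoring="f1")
```

Raising `max_iter` is the usual first remedy if `lbfgs` reports that it did not converge; scaling the features during preprocessing generally helps as well.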
Outputs
After running the pipeline:
- MLflow Experiment: scikit_learn_experiment
- Model Artifact: log_reg_model
- Metrics: Accuracy, Precision, Recall, F1-score
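For reference, the four logged metrics all derive from the confusion-matrix counts. The sketch below computes them by hand; `classification_metrics` is a hypothetical helper, not part of the pipeline, and the counts in the usage note are made up purely for illustration.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp: fraud correctly flagged      fp: legitimate flagged as fraud
    fn: fraud missed                 tn: legitimate correctly passed
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

With illustrative counts of `tp=12, fp=4, fn=8, tn=9976` (20 fraud cases in 10,000 transactions), accuracy is 0.9988 while precision is 0.75, recall is 0.60, and F1 is about 0.667. A model that never flags fraud on the same data would still reach 0.998 accuracy with zero recall, which is why precision, recall, and F1-score are worth reading alongside accuracy for this task.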
Viewing Results
# Start MLflow UI
make mlflow-ui
# Or directly
mlflow ui -p 1234
Open http://localhost:1234 to view experiments.