TensorFlow Pipeline

The mlflow-tf pipeline trains an MNIST image classifier with TensorFlow/Keras and tracks runs in MLflow.

Pipeline Overview

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ s01: Create     │────▶│ s02: Preprocess │────▶│ s03: Train      │
│ Training Data   │     │                 │     │                 │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
┌─────────────────┐     ┌─────────────────┐     ┌────────▼────────┐
│ s06: DOE Optuna │◀────│ s05: Validate   │◀────│ s04: Evaluate   │
│                 │     │                 │     │                 │
└────────┬────────┘     └─────────────────┘     └─────────────────┘
         │
         ▼
┌─────────────────┐
│ s07: Analyze    │
│ DOE Results     │
└─────────────────┘

Running the Pipeline

Full Pipeline

cd src/mlflow-tf

# Using the pipeline runner
python run_pipeline.py all

Individual Steps

# Data preparation
python run_pipeline.py data

# Preprocessing
python run_pipeline.py preprocess

# Training
python run_pipeline.py train

# Evaluation
python run_pipeline.py evaluate

# Validation
python run_pipeline.py validate

# Hyperparameter optimization
python run_pipeline.py optimize
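
As a rough illustration of how the step names above line up with the stages in the diagram, the sketch below maps each command to a stage label. The pairing is inferred from the comments and the diagram, not taken from run_pipeline.py; `PIPELINE_STEPS` and `resolve_steps` are hypothetical names, and whether `all` also covers s07 is not stated in this document.

```python
# Hypothetical mapping from CLI step names to the stages in the diagram.
# The command names come from this README; the stage pairing is inferred.
PIPELINE_STEPS = {
    "data": "s01: Create Training Data",
    "preprocess": "s02: Preprocess",
    "train": "s03: Train",
    "evaluate": "s04: Evaluate",
    "validate": "s05: Validate",
    "optimize": "s06: DOE Optuna",
}


def resolve_steps(command: str) -> list[str]:
    """Return the stage labels a command would run, in pipeline order."""
    if command == "all":
        # Assumption: "all" runs the six documented steps in order.
        return list(PIPELINE_STEPS.values())
    if command not in PIPELINE_STEPS:
        valid = ", ".join(["all", *PIPELINE_STEPS])
        raise ValueError(f"Unknown step {command!r}; expected one of: {valid}")
    return [PIPELINE_STEPS[command]]
```

For example, `resolve_steps("train")` returns a one-element list containing the s03 label, and an unrecognized name raises `ValueError` with the list of valid commands.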

Using Make

# Run with DVC
make dvc-run

Configuration

Pipeline configuration lives in mlflow_tf/pipeline/config.json:

{
    "SEED": 42,
    "NUM_CLASSES": 10,
    "IMG_HEIGHT": 28,
    "IMG_WIDTH": 28,
    "BATCH_SIZE": 32,
    "EPOCHS": 10,
    "VALIDATION_SPLIT": 0.2
}
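
A minimal sketch of how a script might read this file, assuming the keys shown above. `load_config` and `DEFAULTS` are illustrative names and the validation rules are assumptions, not the pipeline's actual loader.

```python
import json
from pathlib import Path

# Defaults copied from the sample config shown in this README.
DEFAULTS = {
    "SEED": 42,
    "NUM_CLASSES": 10,
    "IMG_HEIGHT": 28,
    "IMG_WIDTH": 28,
    "BATCH_SIZE": 32,
    "EPOCHS": 10,
    "VALIDATION_SPLIT": 0.2,
}


def load_config(path):
    """Read a JSON config file and fill in any missing keys from DEFAULTS."""
    with Path(path).open(encoding="utf-8") as fh:
        user_values = json.load(fh)
    config = {**DEFAULTS, **user_values}

    # Assumed sanity checks: a split must leave data on both sides,
    # and sizes and counts must be positive.
    if not 0.0 < config["VALIDATION_SPLIT"] < 1.0:
        raise ValueError("VALIDATION_SPLIT must be strictly between 0 and 1")
    for key in ("NUM_CLASSES", "IMG_HEIGHT", "IMG_WIDTH", "BATCH_SIZE", "EPOCHS"):
        if config[key] <= 0:
            raise ValueError(f"{key} must be a positive integer")
    return config
```

Calling `load_config("mlflow_tf/pipeline/config.json")` from src/mlflow-tf would return a plain dict, so a partial config file (for example one that only overrides `EPOCHS`) still yields a complete set of values.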

Key Parameters

Parameter	Default	Description
`SEED`	42	Random seed for reproducibility
`NUM_CLASSES`	10	Number of output classes
`BATCH_SIZE`	32	Training batch size
`EPOCHS`	10	Number of training epochs
`LEARNING_RATE`	0.001	Initial learning rate (not listed in the sample config.json above)

Model Architecture

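The default model is a small fully connected network with one hidden layer, dropout, and a softmax output:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),   # hidden layer
    tf.keras.layers.Dropout(0.2),                    # drops 20% of activations during training
    tf.keras.layers.Dense(10, activation='softmax')  # one probability per class (NUM_CLASSES)
])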
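
To get a feel for the size of this network, the sketch below counts its trainable parameters by hand. It assumes each 28x28 image is flattened to 784 features before the first Dense layer, which this document does not state explicitly; `dense_params` is an illustrative helper.

```python
def dense_params(inputs: int, units: int) -> int:
    """Trainable parameters in a Dense layer: one weight per input/unit pair plus one bias per unit."""
    return inputs * units + units


# Assumption: images are flattened to IMG_HEIGHT * IMG_WIDTH features.
input_features = 28 * 28

hidden = dense_params(input_features, 128)  # 784 * 128 + 128
output = dense_params(128, 10)              # 128 * 10 + 10
total = hidden + output                     # Dropout adds no parameters

print(hidden, output, total)  # 100480 1290 101770
```

Under that assumption the model has roughly 100k parameters, almost all of them in the first layer.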

Hyperparameter Optimization

Step s06 uses Optuna for hyperparameter search. The step script can also be run directly:

python mlflow_tf/pipeline/s06_doe_optuna.py

Optimized parameters:

Learning rate
Batch size
Number of layers
Units per layer
Dropout rate
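
The five parameters listed above could be expressed as an Optuna search space along the following lines. This is a sketch only: the ranges, parameter names, and the `suggest_hyperparameters` and `run_study` helpers are assumptions, not the contents of s06_doe_optuna.py, and `train_and_score` stands in for whatever function trains a model and returns its validation accuracy.

```python
def suggest_hyperparameters(trial):
    """Sample one configuration from an optuna.trial.Trial (ranges are illustrative)."""
    n_layers = trial.suggest_int("n_layers", 1, 3)
    return {
        # Learning rates are usually searched on a log scale.
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True),
        "batch_size": trial.suggest_categorical("batch_size", [16, 32, 64, 128]),
        "n_layers": n_layers,
        # One unit count per hidden layer, in multiples of 32.
        "units": [
            trial.suggest_int(f"units_l{i}", 32, 256, step=32)
            for i in range(n_layers)
        ],
        "dropout": trial.suggest_float("dropout", 0.0, 0.5),
    }


def run_study(train_and_score, n_trials=20):
    """Maximize train_and_score(params), a caller-supplied function returning a score."""
    import optuna  # imported lazily so the search space can be read without Optuna installed

    study = optuna.create_study(direction="maximize")
    study.optimize(
        lambda trial: train_and_score(suggest_hyperparameters(trial)),
        n_trials=n_trials,
    )
    return study.best_params
```

Keeping the search space in its own function makes it easy to unit-test with a stub trial object, separately from the training loop.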

Outputs

A completed run produces the following:

MLflow Experiment: tensorflow_training
Model Artifact: TensorFlow SavedModel format
Metrics: Loss, Accuracy, per-class metrics
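
This document does not say which per-class metrics are logged. As one plausible reading, the sketch below computes per-class precision and recall from true and predicted labels in plain Python; `per_class_metrics` is a hypothetical helper, not part of the pipeline.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, num_classes=10):
    """Return {class: {"precision": ..., "recall": ...}} from two label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an example of the true class

    metrics = {}
    for c in range(num_classes):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        metrics[c] = {
            # A class that was never predicted (or never present) reports 0.0.
            "precision": tp[c] / predicted if predicted else 0.0,
            "recall": tp[c] / actual if actual else 0.0,
        }
    return metrics
```

With `y_true = [0, 0, 1, 1]` and `y_pred = [0, 1, 1, 1]`, class 0 has precision 1.0 and recall 0.5, while class 1 has precision 2/3 and recall 1.0.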

GPU Support

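A GPU-enabled TensorFlow build uses an available GPU automatically. To check availability and pin training to a specific device: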

# Check GPU availability
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

# Restrict training to the first GPU (device 0)
CUDA_VISIBLE_DEVICES=0 python run_pipeline.py train

Prefect Orchestration

The pipeline supports Prefect for workflow orchestration:

# Run with Prefect
python run_prefect_pipeline.py all

See PREFECT_README.md for detailed Prefect setup instructions.