Getting Started

This guide will help you set up your environment and run your first ML pipeline.

Prerequisites

  • Python 3.8 or higher
  • pip package manager
  • Git
  • (Optional) Docker for containerized execution
  • (Optional) AWS CLI for cloud deployments
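Before installing, you can confirm the required tools are available. The sketch below is a hypothetical helper (python_ok, have and check_prereqs are illustrative names, not part of the repository) and assumes a Unix-like shell with bash. It checks for Python 3.8 or newer, pip and Git, and only reports the optional Docker and AWS CLI tools.

```shell
#!/usr/bin/env bash
# Hypothetical prerequisite check; not part of the repository.

# Succeeds if the given interpreter (default: python3) is Python 3.8 or newer.
python_ok() {
  "${1:-python3}" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)' 2>/dev/null
}

# Succeeds if the named command is on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Reports each prerequisite; returns non-zero if a required one is missing.
check_prereqs() {
  local missing=0
  if python_ok python3; then
    echo "ok: $(python3 --version 2>&1)"
  else
    echo "missing: Python 3.8 or higher"
    missing=1
  fi
  if python3 -m pip --version >/dev/null 2>&1; then
    echo "ok: pip"
  else
    echo "missing: pip"
    missing=1
  fi
  if have git; then
    echo "ok: git"
  else
    echo "missing: git"
    missing=1
  fi
  # Optional tools are reported but never cause a failure.
  have docker || echo "optional: docker not found"
  have aws || echo "optional: aws CLI not found"
  return "$missing"
}
```

After defining these functions in your shell, run check_prereqs; it exits non-zero if any required tool is missing.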

Installation

1. Clone the Repository

git clone https://bitbucket.org/wilsonify/mlops-with-mlflow.git
cd mlops-with-mlflow

2. Create a Virtual Environment

python3 -m venv .venv
source .venv/bin/activate  # Linux/macOS
# or
.venv\Scripts\activate     # Windows (cmd.exe; in PowerShell use .venv\Scripts\Activate.ps1)
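To confirm the environment is active before installing anything, you can ask Python directly: inside a virtual environment, sys.prefix points at the environment while sys.base_prefix still points at the base installation. The in_venv function below is a hypothetical helper, not part of the repository, and assumes a Unix-like shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given interpreter (default: python on
# PATH) runs inside a virtual environment, i.e. sys.prefix != sys.base_prefix.
in_venv() {
  local py=${1:-python}
  if "$py" -c 'import sys; sys.exit(0 if sys.prefix != sys.base_prefix else 1)' 2>/dev/null; then
    # Print the environment's location for confirmation.
    echo "venv: $("$py" -c 'import sys; print(sys.prefix)')"
    return 0
  fi
  echo "not in a virtual environment: $py" >&2
  return 1
}
```

With .venv activated, in_venv should print the path of .venv; in_venv .venv/bin/python checks the environment without activating it.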

3. Install Dependencies

python -m pip install --upgrade pip
pip install -r requirements.txt

4. Install Local Libraries

The project includes several shared libraries. From the repository root, install each one in editable mode:

pip install -e src/doe-library
pip install -e src/feature-library
pip install -e src/io-library
pip install -e src/metrics-library
pip install -e src/plot-library
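The five editable installs can also be run as a loop. The sketch below is a hypothetical helper (install_local_libs, LOCAL_LIBS and the PIP override are illustrative, not part of the repository). It checks that each library directory exists, so running it from the wrong directory fails immediately instead of partway through the list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install the shared libraries in editable mode.
# Run from the repository root with the virtual environment active.
# PIP defaults to "pip"; set PIP=echo for a dry run.

LOCAL_LIBS=(doe feature io metrics plot)

install_local_libs() {
  local lib dir
  for lib in "${LOCAL_LIBS[@]}"; do
    dir="src/${lib}-library"
    # Stop early if the expected directory is missing.
    if [ ! -d "$dir" ]; then
      echo "error: $dir not found; run this from the repository root" >&2
      return 1
    fi
    "${PIP:-pip}" install -e "$dir" || return 1
  done
}
```

For example, PIP=echo install_local_libs prints the five install commands without installing anything.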

Running Your First Pipeline

Option 1: Scikit-learn Pipeline (Fraud Detection)

cd src/mlflow-sklearn  # from the repository root

# Install the package
pip install -e .

# Run the full pipeline
make all

This runs the following stages:

  1. Training dataset creation
  2. CSV-to-Parquet conversion
  3. Preprocessing
  4. Model training
  5. Scoring
  6. Evaluation
  7. Validation

Option 2: TensorFlow Pipeline (Image Classification)

cd src/mlflow-tf  # from the repository root

# Install the package
pip install -e .

# Run the full pipeline
python run_pipeline.py all

Starting MLflow UI

To browse your experiments and runs, start the MLflow tracking UI:

# From the project root
mlflow ui --port 5000

Then open http://localhost:5000 in your browser.
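If you start the UI in the background from a script, you may want to wait until it is serving before opening the browser. The sketch below is a hypothetical helper (wait_for_url is not part of the repository or of MLflow). It polls a URL with curl once per second until it returns HTTP 200 or a timeout in seconds (default 30) expires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a URL answers with HTTP 200.
# Requires curl. Usage: wait_for_url URL [TIMEOUT_SECONDS]
wait_for_url() {
  local url=$1 timeout=${2:-30} elapsed=0 code
  while [ "$elapsed" -lt "$timeout" ]; do
    # curl prints 000 and exits non-zero when it cannot connect.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || code=000
    if [ "$code" = "200" ]; then
      echo "up: $url"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out after ${timeout}s waiting for $url" >&2
  return 1
}
```

For example, start the UI with mlflow ui --port 5000 & and then run wait_for_url http://localhost:5000 60.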

Next Steps