Getting Started

This guide will help you set up your environment and run your first ML pipeline.

Prerequisites

Python 3.8 or higher
pip package manager
Git
(Optional) Docker for containerized execution
(Optional) AWS CLI for cloud deployments
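
The prerequisites above can be checked before installing anything. The following is a minimal sketch that uses only the Python standard library. It is not part of the repository, and the executable names `git`, `docker`, and `aws` are assumptions about how those tools appear on your PATH.

```python
import importlib.util
import shutil
import sys

# Minimum Python version taken from the prerequisites list above.
MIN_PYTHON = (3, 8)

# Executable names are assumptions about how each tool appears on PATH.
REQUIRED_TOOLS = ["git"]
OPTIONAL_TOOLS = ["docker", "aws"]


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (found) or False."""
    results = {"python": sys.version_info[:2] >= MIN_PYTHON}
    # pip is checked as an importable module so the result reflects the
    # interpreter running this script, not whichever pip is first on PATH.
    results["pip"] = importlib.util.find_spec("pip") is not None
    for tool in REQUIRED_TOOLS + OPTIONAL_TOOLS:
        results[tool] = shutil.which(tool) is not None
    return results


for name, found in check_prerequisites().items():
    label = "optional" if name in OPTIONAL_TOOLS else "required"
    print(f"{name:<8} {'ok' if found else 'missing'} ({label})")
```

A "missing" result for an optional tool only matters if you plan to use containerized execution or cloud deployments.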

Installation

1. Clone the Repository

git clone https://bitbucket.org/wilsonify/mlops-with-mlflow.git
cd mlops-with-mlflow

2. Create Virtual Environment

python3 -m venv .venv
source .venv/bin/activate  # Linux/macOS
# or
.venv\Scripts\activate     # Windows
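
If you are unsure whether activation worked, the interpreter itself can tell you. This is a small sketch, assuming the environment was created with `python3 -m venv` as shown above; the function name is illustrative only.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if in_virtual_environment():
    print(f"Virtual environment active: {sys.prefix}")
else:
    print("No virtual environment active; run the activate command first.")
```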

3. Install Dependencies

python -m pip install --upgrade pip
pip install -r requirements.txt

4. Install Local Libraries

The project includes several shared libraries. Install each one in editable mode from the project root:

pip install -e src/doe-library
pip install -e src/feature-library
pip install -e src/io-library
pip install -e src/metrics-library
pip install -e src/plot-library
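
The five commands above differ only in the directory name, so they can also be generated in a loop. The sketch below is one way to do that, assuming it is run from the project root. The function names are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Directory names copied from the install commands above.
LOCAL_LIBRARIES = [
    "doe-library",
    "feature-library",
    "io-library",
    "metrics-library",
    "plot-library",
]


def editable_install_commands(project_root="."):
    """Build one `pip install -e` command per shared library."""
    src = Path(project_root) / "src"
    # `sys.executable -m pip` targets the active interpreter's pip.
    return [
        [sys.executable, "-m", "pip", "install", "-e", str(src / name)]
        for name in LOCAL_LIBRARIES
    ]


def install_local_libraries(project_root="."):
    """Run each install command, stopping at the first failure."""
    for command in editable_install_commands(project_root):
        subprocess.run(command, check=True)


# Print the commands for review; nothing is installed at this point.
for command in editable_install_commands():
    print(" ".join(command))
```

Calling `install_local_libraries()` performs the installs. Printing the commands first makes it easy to confirm the paths before anything changes in the environment.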

Running Your First Pipeline

Option 1: Scikit-learn Pipeline (Fraud Detection)

# From the project root
cd src/mlflow-sklearn

# Install the package
pip install -e .

# Run the full pipeline
make all

This runs the following pipeline stages:

Training dataset creation
CSV-to-Parquet conversion
Preprocessing
Model training
Scoring
Evaluation
Validation
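
To make the idea of a staged pipeline concrete, the sketch below runs a sequence of stages in order and stops at the first failure. It illustrates the general pattern only and is not the project's Makefile or its code. The stage names paraphrase the list above, and the stage functions are placeholders.

```python
# Stage names paraphrase the list above; the callables are placeholders.
STAGE_NAMES = [
    "create training dataset",
    "convert CSV to Parquet",
    "preprocessing",
    "model training",
    "scoring",
    "evaluation",
    "validation",
]


def run_stages(stages):
    """Run (name, callable) pairs in order.

    Returns (completed_names, failed_name). failed_name is None when
    every stage finished; otherwise later stages are skipped.
    """
    completed = []
    for name, stage in stages:
        try:
            stage()
        except Exception as error:
            print(f"stage failed: {name} ({error})")
            return completed, name
        completed.append(name)
    return completed, None


def placeholder():
    """Stand-in for real stage logic (hypothetical)."""


completed, failed = run_stages([(name, placeholder) for name in STAGE_NAMES])
print(f"{len(completed)} stages completed, failed stage: {failed}")
```

Stopping at the first failure keeps later stages from running on incomplete output, for example scoring with a model that was never trained.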

Option 2: TensorFlow Pipeline (Image Classification)

# From the project root
cd src/mlflow-tf

# Install the package
pip install -e .

# Run the full pipeline
python run_pipeline.py all

Starting the MLflow UI

To view your experiments:

# From the project root
mlflow ui --port 5000

Then open http://localhost:5000 in your browser.
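
If the page does not load, a quick check from Python can confirm whether anything is answering on that port. This is a minimal sketch using only the standard library. The function name is illustrative, and the default URL assumes the `--port 5000` setting shown above.

```python
import urllib.request


def ui_is_reachable(url="http://localhost:5000", timeout=2.0):
    """Return True if an HTTP GET to `url` succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError both derive from OSError).
        return False


if ui_is_reachable():
    print("MLflow UI is responding on http://localhost:5000")
else:
    print("No response; check that `mlflow ui` is still running.")
```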