Contributing

We welcome contributions to the MLOps platform! This guide explains how to contribute code, documentation, and bug fixes.

Getting Started

1. Fork the Repository

# Clone your fork
git clone git@bitbucket.org:YOUR_USERNAME/mlops-with-mlflow.git
cd mlops-with-mlflow

# Add upstream remote
git remote add upstream git@bitbucket.org:wilsonify/mlops-with-mlflow.git
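
Verify that both remotes are configured:

git remote -v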

2. Set Up Development Environment

This project uses uv for Python dependency management. See the UV Guide for comprehensive documentation.

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Sync all dependencies and create virtual environment
uv sync

# This installs all dependencies and packages in development mode

That’s it! No need to manually create venvs or install requirements.

Note: For detailed uv usage, see the UV Package Manager Guide.
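
A few uv commands you are likely to use day to day (the package names here are placeholders):

# Add a runtime dependency
uv add requests

# Add a development-only dependency
uv add --dev pytest-cov

# Run any command inside the project environment
uv run python --version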

3. Create a Feature Branch

git checkout -b feature/my-new-feature

Code Style

Python Style

We follow PEP 8 with some modifications:

  • Line length: 120 characters (configured in pyproject.toml)
  • Use type hints for all function signatures
  • Docstrings: Google style
  • Python 3.13+ required

Example:

import pandas as pd


def process_data(
    input_path: str,
    output_path: str,
    batch_size: int = 32
) -> pd.DataFrame:
    """Process input data and save results.
    
    Args:
        input_path: Path to input data file.
        output_path: Path to save processed data.
        batch_size: Number of samples per batch.
    
    Returns:
        Processed DataFrame.
    
    Raises:
        FileNotFoundError: If input file doesn't exist.
    """
    pass

Linting and Formatting

We use ruff for linting and formatting (configured in the root pyproject.toml).

# Format code
uv run ruff format .

# Check code
uv run ruff check .

# Fix auto-fixable issues
uv run ruff check --fix .
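
The relevant settings live in the root pyproject.toml; a minimal sketch of what that section looks like (check the actual file for the authoritative configuration):

[tool.ruff]
line-length = 120
target-version = "py313"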

Testing

Running Tests

# Run all tests
make test
# or
uv run pytest

# Run specific test file
uv run pytest tests/test_feature.py

# Run specific test function
uv run pytest tests/test_feature.py::test_function_name

# Run with coverage
uv run pytest --cov=src --cov-report=html

# Run tests in a specific package
cd src/doe-library
uv run pytest tests/

Writing Tests

All tests must:

  • Use pytest framework
  • Have meaningful test names (prefix with test_)
  • Include docstrings explaining what is tested
  • Achieve minimum 80% code coverage for new code
  • Test both success and failure cases

Example test structure:

import pytest
from my_package import my_function


class TestMyFunction:
    """Tests for my_function."""
    
    def test_returns_expected_value_for_valid_input(self):
        """Test that function returns expected value for valid input."""
        result = my_function("valid_input")
        assert result == "expected_output"
    
    def test_raises_error_for_invalid_input(self):
        """Test that function raises ValueError for invalid input."""
        with pytest.raises(ValueError, match="Invalid input"):
            my_function("invalid_input")
    
    @pytest.mark.parametrize("value,expected", [
        ("a", 1),
        ("b", 2),
        ("c", 3),
    ])
    def test_handles_multiple_inputs(self, value, expected):
        """Test function with multiple input values."""
        assert my_function(value) == expected

Test Organization

src/my-package/
├── my_package/
│   ├── __init__.py
│   └── module.py
└── tests/
    ├── __init__.py
    ├── test_module.py
    └── conftest.py  # Shared fixtures
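
Fixtures defined in conftest.py are shared across all test modules in the package; a minimal sketch (the fixture name and data are illustrative):

# tests/conftest.py
import pandas as pd
import pytest


@pytest.fixture
def sample_frame():
    """Small DataFrame reused across tests in this package."""
    return pd.DataFrame({"feature": [1.0, 2.0, 3.0], "target": [0, 1, 0]})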

Using Fixtures

Shared setup belongs in fixtures rather than being repeated per test. Example from the mlflow-sklearn package:

import pandas as pd
import pytest

from mlflow_sklearn.s04_train import train_model


class TestTraining:
    """Tests for train_model."""

    @pytest.fixture
    def sample_data(self):
        """Create sample training data."""
        return pd.DataFrame({
            'feature1': [1, 2, 3],
            'feature2': [4, 5, 6],
            'target': [0, 1, 0]
        })
    
    def test_train_model_returns_model(self, sample_data):
        """Test that training returns a valid model."""
        model = train_model(sample_data)
        assert model is not None
        assert hasattr(model, 'predict')

Pull Request Process

1. Update Your Branch

git fetch upstream
git rebase upstream/master
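
If the rebase stops on conflicts, resolve them and continue; a rebased branch must then be force-pushed to your fork:

# Continue after resolving each conflict
git add <resolved-files>
git rebase --continue

# Update your fork's branch safely
git push --force-with-lease origin feature/my-new-feature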

2. Run Tests

# Run all tests
pytest

# Run linters
make lint

3. Create Pull Request

  • Write a clear title and description
  • Reference any related issues
  • Add screenshots for UI changes
  • Update documentation if needed

4. Code Review

  • Address reviewer feedback
  • Keep commits clean and atomic
  • Squash commits before merge if requested
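
To squash locally, an interactive rebase is the usual tool (the commit count is illustrative):

# Combine the last 3 commits into one
git rebase -i HEAD~3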

Commit Messages

Follow conventional commits:

type(scope): description

[optional body]

[optional footer]

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation
  • style: Code style changes
  • refactor: Code refactoring
  • test: Adding tests
  • chore: Maintenance tasks

Examples:

feat(sklearn): add hyperparameter optimization

fix(tf): resolve memory leak in training loop

docs(readme): update installation instructions
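
A complete message with body and footer (the details here are illustrative):

feat(sklearn): add hyperparameter optimization

Add a random-search wrapper around the training entry point and log
each trial as a nested MLflow run.

Closes #123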

Adding a New Pipeline

1. Create Package Structure

mkdir -p src/mlflow-newframework/mlflow_newframework/{pipeline,utils}
touch src/mlflow-newframework/mlflow_newframework/__init__.py
touch src/mlflow-newframework/mlflow_newframework/{pipeline,utils}/__init__.py
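
This produces a layout that mirrors the existing packages (tests/ is added in step 3):

src/mlflow-newframework/
└── mlflow_newframework/
    ├── __init__.py
    ├── pipeline/
    │   └── __init__.py
    └── utils/
        └── __init__.py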

2. Implement Pipeline Classes

# mlflow_newframework/pipeline/base.py
from mlflow_tf.pipeline.base import Pipeline


class NewFrameworkPipeline(Pipeline):
    """End-to-end pipeline for the new framework."""

    def run(self):
        # Load data, train, evaluate, and log results to MLflow
        pass

3. Add Tests

# tests/test_pipeline.py
from mlflow_newframework.pipeline.base import NewFrameworkPipeline


def test_pipeline_runs():
    """Pipeline runs end to end and returns a result."""
    config = {}  # minimal config; real keys depend on your pipeline
    pipeline = NewFrameworkPipeline(config)
    result = pipeline.run()
    assert result is not None

4. Add Documentation

Create documentation in docs/content/users/newframework-pipeline.md.

Getting Help

  • Questions: Open a discussion on Bitbucket
  • Bugs: Create an issue with reproduction steps
  • Features: Propose in discussions first