Contributing

We welcome contributions to the MLOps platform! This guide explains how to contribute code, documentation, and bug fixes.

Getting Started

1. Fork the Repository

# Clone your fork
git clone git@bitbucket.org:YOUR_USERNAME/mlops-with-mlflow.git
cd mlops-with-mlflow

# Add upstream remote
git remote add upstream git@bitbucket.org:wilsonify/mlops-with-mlflow.git
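
Verify that both remotes are configured:

git remote -v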

2. Set Up Development Environment

This project uses uv for Python dependency management. See the UV Guide for comprehensive documentation.

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Sync all dependencies and create virtual environment
uv sync

# This installs all dependencies and packages in development mode

That’s it! No need to manually create venvs or install requirements.

Note: For detailed uv usage, see the UV Package Manager Guide.
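
A few uv commands you are likely to use day to day (the package names here are placeholders):

# Add a runtime dependency
uv add requests

# Add a development-only dependency
uv add --dev pytest-cov

# Run any command inside the project environment
uv run python --version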

3. Create a Feature Branch

git checkout -b feature/my-new-feature

Code Style

Python Style

We follow PEP 8 with some modifications:

  • Line length: 120 characters (configured in pyproject.toml)
  • Use type hints for all function signatures
  • Docstrings: Google style
  • Python 3.13+ required

Example:

import pandas as pd


def process_data(
    input_path: str,
    output_path: str,
    batch_size: int = 32
) -> pd.DataFrame:
    """Process input data and save results.
    
    Args:
        input_path: Path to input data file.
        output_path: Path to save processed data.
        batch_size: Number of samples per batch.
    
    Returns:
        Processed DataFrame.
    
    Raises:
        FileNotFoundError: If input file doesn't exist.
    """
    pass

Linting and Formatting

We use ruff for linting and formatting (configured in the root pyproject.toml).

# Format code
uv run ruff format .

# Check code
uv run ruff check .

# Fix auto-fixable issues
uv run ruff check --fix .
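
The relevant settings live in the root pyproject.toml; a minimal sketch of what that section looks like (check the actual file for the authoritative configuration):

[tool.ruff]
line-length = 120
target-version = "py313"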

Testing

Running Tests

# Run all tests
make test
# or
uv run pytest

# Run specific test file
uv run pytest tests/test_feature.py

# Run specific test function
uv run pytest tests/test_feature.py::test_function_name

# Run with coverage
uv run pytest --cov=src --cov-report=html

# Run tests in a specific package
cd src/doe-library
uv run pytest tests/

Writing Tests

All tests must:

  • Use pytest framework
  • Have meaningful test names (prefix with test_)
  • Include docstrings explaining what is tested
  • Achieve minimum 80% code coverage for new code
  • Test both success and failure cases

Example test structure:

import pytest
from my_package import my_function


class TestMyFunction:
    """Tests for my_function."""
    
    def test_returns_expected_value_for_valid_input(self):
        """Test that function returns expected value for valid input."""
        result = my_function("valid_input")
        assert result == "expected_output"
    
    def test_raises_error_for_invalid_input(self):
        """Test that function raises ValueError for invalid input."""
        with pytest.raises(ValueError, match="Invalid input"):
            my_function("invalid_input")
    
    @pytest.mark.parametrize("value,expected", [
        ("a", 1),
        ("b", 2),
        ("c", 3),
    ])
    def test_handles_multiple_inputs(self, value, expected):
        """Test function with multiple input values."""
        assert my_function(value) == expected

Test Organization

src/my-package/
├── my_package/
│   ├── __init__.py
│   └── module.py
└── tests/
    ├── __init__.py
    ├── test_module.py
    └── conftest.py  # Shared fixtures
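
Fixtures defined in conftest.py are shared across all test modules in the package; a minimal sketch (the fixture name and data are illustrative):

# tests/conftest.py
import pandas as pd
import pytest


@pytest.fixture
def sample_frame():
    """Small DataFrame reused across tests in this package."""
    return pd.DataFrame({"feature": [1.0, 2.0, 3.0], "target": [0, 1, 0]})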

Using Fixtures

Shared setup belongs in fixtures rather than being repeated per test. Example from the mlflow-sklearn package:

import pandas as pd
import pytest

from mlflow_sklearn.s04_train import train_model


class TestTraining:
    """Tests for train_model."""

    @pytest.fixture
    def sample_data(self):
        """Create sample training data."""
        return pd.DataFrame({
            'feature1': [1, 2, 3],
            'feature2': [4, 5, 6],
            'target': [0, 1, 0]
        })
    
    def test_train_model_returns_model(self, sample_data):
        """Test that training returns a valid model."""
        model = train_model(sample_data)
        assert model is not None
        assert hasattr(model, 'predict')

Pull Request Process

1. Update Your Branch

git fetch upstream
git rebase upstream/master
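
If the rebase stops on conflicts, resolve them and continue; a rebased branch must then be force-pushed to your fork:

# Continue after resolving each conflict
git add <resolved-files>
git rebase --continue

# Update your fork's branch safely
git push --force-with-lease origin feature/my-new-feature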

2. Run Tests

# Run all tests
pytest

# Run linters
make lint

3. Create Pull Request

  • Write a clear title and description
  • Reference any related issues
  • Add screenshots for UI changes
  • Update documentation if needed

4. Code Review

  • Address reviewer feedback
  • Keep commits clean and atomic
  • Squash commits before merge if requested
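
To squash locally, an interactive rebase is the usual tool (the commit count is illustrative):

# Combine the last 3 commits into one
git rebase -i HEAD~3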

Commit Messages

Follow conventional commits:

type(scope): description

[optional body]

[optional footer]

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation
  • style: Code style changes
  • refactor: Code refactoring
  • test: Adding tests
  • chore: Maintenance tasks

Examples:

feat(sklearn): add hyperparameter optimization

fix(tf): resolve memory leak in training loop

docs(readme): update installation instructions
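
A complete message with body and footer (the details here are illustrative):

feat(sklearn): add hyperparameter optimization

Add a random-search wrapper around the training entry point and log
each trial as a nested MLflow run.

Closes #123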

Adding a New Pipeline

1. Create Package Structure

mkdir -p src/mlflow-newframework/mlflow_newframework/{pipeline,utils}
touch src/mlflow-newframework/mlflow_newframework/__init__.py
touch src/mlflow-newframework/mlflow_newframework/{pipeline,utils}/__init__.py
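
This produces a layout that mirrors the existing packages (tests/ is added in step 3):

src/mlflow-newframework/
└── mlflow_newframework/
    ├── __init__.py
    ├── pipeline/
    │   └── __init__.py
    └── utils/
        └── __init__.py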

2. Implement Pipeline Classes

# mlflow_newframework/pipeline/base.py
from mlflow_tf.pipeline.base import Pipeline


class NewFrameworkPipeline(Pipeline):
    """End-to-end pipeline for the new framework."""

    def run(self):
        # Load data, train, evaluate, and log results to MLflow
        pass

3. Add Tests

# tests/test_pipeline.py
from mlflow_newframework.pipeline.base import NewFrameworkPipeline


def test_pipeline_runs():
    """Pipeline runs end to end and returns a result."""
    config = {}  # minimal config; real keys depend on your pipeline
    pipeline = NewFrameworkPipeline(config)
    result = pipeline.run()
    assert result is not None

4. Add Documentation

Create documentation in docs/content/users/newframework-pipeline.md.

Getting Help

  • Questions: Open a discussion on Bitbucket
  • Bugs: Create an issue with reproduction steps
  • Features: Propose in discussions first