# Shared Libraries
The MLOps platform includes several shared libraries that provide common functionality across pipelines.
## Overview
| Library | Purpose |
|---|---|
| `doe-library` | Design of Experiments utilities |
| `feature-library` | Feature engineering functions |
| `io-library` | Input/Output utilities |
| `metrics-library` | Metrics computation |
| `plot-library` | Visualization utilities |
## doe-library
Design of Experiments library for hyperparameter optimization.
### Installation

```bash
pip install -e src/doe-library
```
### Usage

```python
from doe_library import create_experiment_grid

# Create the parameter grid
grid = create_experiment_grid(
    learning_rate=[0.001, 0.01, 0.1],
    batch_size=[16, 32, 64],
    epochs=[10, 20, 50],
)

# Run one experiment per parameter combination
for params in grid:
    run_experiment(params)
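If `create_experiment_grid` behaves like a standard grid search, it yields the Cartesian product of the supplied parameter lists. A minimal sketch of that behavior, using a hypothetical `make_grid` helper rather than the library's actual implementation:

```python
from itertools import product

def make_grid(**param_lists):
    """Yield one parameter dict per combination of the supplied
    lists (their Cartesian product)."""
    keys = list(param_lists)
    for values in product(*param_lists.values()):
        yield dict(zip(keys, values))

grid = list(make_grid(
    learning_rate=[0.001, 0.01, 0.1],
    batch_size=[16, 32, 64],
    epochs=[10, 20, 50],
))
print(len(grid))  # 3 * 3 * 3 = 27 combinations
```

With three values per parameter this produces 27 experiments, which is why grids grow quickly as parameters are added.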
## feature-library
Feature engineering and data splitting utilities.
### Installation

```bash
pip install -e src/feature-library
```
### Usage

```python
from feature_library.get_splits import get_all_splits

# Split data into train/test/validate sets
(x_train, y_train), (x_test, y_test), (x_val, y_val) = get_all_splits(data)
```
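For intuition, a three-way split can be sketched as below; the helper names, fractions, and shuffling are illustrative assumptions, not `get_all_splits`'s documented behavior:

```python
import random

def split_xy(pairs):
    """Turn a list of (x, y) pairs into parallel x and y lists."""
    return [x for x, _ in pairs], [y for _, y in pairs]

def three_way_split(pairs, test_frac=0.2, val_frac=0.1, seed=42):
    """Shuffle, then carve off test and validate slices; the rest is train."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return split_xy(train), split_xy(test), split_xy(val)

data = [(i, i % 2) for i in range(100)]
(x_train, y_train), (x_test, y_test), (x_val, y_val) = three_way_split(data)
print(len(x_train), len(x_test), len(x_val))  # 70 20 10
```

Seeding the shuffle keeps splits reproducible across runs, which matters when comparing experiments.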
### Functions

| Function | Description |
|---|---|
| `get_all_splits()` | Split data into train/test/validate sets |
| `create_features()` | Generate engineered features |
| `encode_categorical()` | Encode categorical variables |
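As an illustration of what `encode_categorical()` might do (its actual encoding strategy is not documented here), a simple ordinal encoding maps each distinct category to an integer code:

```python
def encode_categorical(values):
    """Map each distinct category to an integer code, in order of
    first appearance; return the encoded list plus the mapping."""
    mapping = {v: i for i, v in enumerate(dict.fromkeys(values))}
    return [mapping[v] for v in values], mapping

codes, mapping = encode_categorical(["red", "blue", "red", "green"])
print(codes)    # [0, 1, 0, 2]
print(mapping)  # {'red': 0, 'blue': 1, 'green': 2}
```

The real function may instead use one-hot encoding or delegate to a library such as scikit-learn.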
## io-library
Input/Output utilities for S3 and local filesystem.
### Installation

```bash
pip install -e src/io-library
```
### Usage

```python
from io_library.read_from_s3 import read_csv_from_s3, read_parquet_from_s3
from io_library.write_to_s3 import write_csv_to_s3, write_parquet_to_s3

# Read a CSV from S3
df = read_csv_from_s3("my-bucket", "path/to/file.csv")

# Write a Parquet file to S3
write_parquet_to_s3(df, "my-bucket", "path/to/output.parquet")
```
### Classes
#### ConfigDataClass
Configuration management with JSON serialization:
```python
from io_library.ConfigDataClass import Config

# Create a config
config = Config(
    n_splits=5,
    learning_rate=0.001,
)

# Save to a local JSON file
config.to_json("config.json")

# Load from S3
config = Config.from_json_s3("bucket", "config.json")
```
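A minimal sketch of such a dataclass-backed config, covering only the local JSON round trip (the S3 loader and the real class's fields are not reproduced here):

```python
import json
import tempfile
from dataclasses import asdict, dataclass

@dataclass
class Config:
    n_splits: int
    learning_rate: float

    def to_json(self, path):
        """Serialize the config's fields to a JSON file."""
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def from_json(cls, path):
        """Rebuild a Config from a JSON file written by to_json."""
        with open(path) as f:
            return cls(**json.load(f))

config = Config(n_splits=5, learning_rate=0.001)
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as f:
    path = f.name
config.to_json(path)
restored = Config.from_json(path)
print(restored == config)  # True: dataclass equality is field-wise
```

Keeping the config a plain dataclass means `asdict` handles serialization and field-wise `==` makes round trips easy to verify in tests.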
## metrics-library
Metrics computation for model evaluation.
### Installation

```bash
pip install -e src/metrics-library
```
### Usage

```python
from metrics_library import compute_classification_metrics

metrics = compute_classification_metrics(y_true, y_pred)
# Returns: accuracy, precision, recall, f1_score
```
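The library's implementation is not shown here (it likely delegates to scikit-learn), but for reference the four metrics can be computed by hand from the binary confusion counts:

```python
def classification_metrics(y_true, y_pred):
    """Binary accuracy/precision/recall/F1 from raw confusion counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }

m = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(m["accuracy"])  # 0.6 (3 of 5 correct)
```

The guards against zero denominators matter for degenerate inputs (e.g. a classifier that never predicts the positive class).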
### Functions

| Function | Description |
|---|---|
| `compute_classification_metrics()` | Classification metrics (accuracy, precision, recall, F1) |
| `compute_regression_metrics()` | Regression metrics (MSE, MAE, R²) |
| `compute_confusion_matrix()` | Confusion matrix |
## plot-library
Visualization utilities for model analysis.
### Installation

```bash
pip install -e src/plot-library
```
### Usage

```python
from plot_library import plot_confusion_matrix, plot_roc_curve

# Plot a confusion matrix
fig = plot_confusion_matrix(y_true, y_pred, classes=['Normal', 'Fraud'])
fig.savefig('confusion_matrix.png')

# Plot a ROC curve
fig = plot_roc_curve(y_true, y_scores)
fig.savefig('roc_curve.png')
```
### Available Plots
- Confusion matrix
- ROC curve
- Precision-Recall curve
- Feature importance
- Learning curves
- Distribution plots
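Since `plot_confusion_matrix` returns a Matplotlib figure, a plain-Matplotlib sketch of such a plot might look like the following (the styling and the helper's internals are assumptions, not the library's code):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt

def plot_confusion_matrix(matrix, classes):
    """Draw a confusion matrix as an annotated heatmap and
    return the figure."""
    fig, ax = plt.subplots()
    ax.imshow(matrix, cmap="Blues")
    ax.set_xticks(range(len(classes)))
    ax.set_xticklabels(classes)
    ax.set_yticks(range(len(classes)))
    ax.set_yticklabels(classes)
    ax.set_xlabel("Predicted")
    ax.set_ylabel("True")
    # Annotate each cell with its count
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            ax.text(j, i, str(count), ha="center", va="center")
    return fig

fig = plot_confusion_matrix([[50, 3], [2, 45]], ["Normal", "Fraud"])
fig.savefig("confusion_matrix.png")
```

Returning the figure (rather than calling `plt.show()`) is what lets callers decide whether to save, display, or embed the plot.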
## Extending Libraries
### Adding New Functions
1. Create a new module in the library:

   ```python
   # src/metrics-library/metrics_library/new_metric.py
   def compute_custom_metric(y_true, y_pred):
       """Compute the custom metric."""
       return custom_value
   ```

2. Export it in `__init__.py`:

   ```python
   from .new_metric import compute_custom_metric
   ```

3. Add tests:

   ```python
   # tests/test_new_metric.py
   def test_custom_metric():
       result = compute_custom_metric([0, 1], [0, 1])
       assert result == expected_value
   ```

4. Update the documentation.
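As a worked instance of steps 1 and 3, here is a hypothetical `compute_error_rate` metric together with its matching test (both names are invented for illustration):

```python
# Hypothetical module: metrics_library/error_rate.py
def compute_error_rate(y_true, y_pred):
    """Fraction of predictions that disagree with the labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

# Matching test, as it would appear in tests/test_error_rate.py
def test_error_rate():
    assert compute_error_rate([0, 1, 1, 0], [0, 1, 0, 0]) == 0.25
    assert compute_error_rate([0, 1], [0, 1]) == 0.0

test_error_rate()
```

Keeping the test next to a concrete expected value (0.25 for one mismatch in four) is what makes step 3 meaningful: the placeholder `expected_value` in the template should always become a number you can verify by hand.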