
I got tired of copy-pasting ML pipeline YAML across projects, so I built a reusable GitLab CI/CD component
Every ML project I've worked on had the same boilerplate CI: MLflow wiring, data validation, metric checks, model registration. By the fifth project I could no longer remember which config held my fix for the MLFLOW_RUN_ID passing bug.
So I built a GitLab CI/CD component that reduces all of this to about 10 lines of YAML:

    include:
      - component: gitlab.com/netOpyr/gitlab-mlops-component/full-pipeline@1.0.0
        inputs:
          model_name: wine-classifier
          training_script: scripts/train.py
          data_path: data/train.csv
          framework: sklearn
          metric_name: accuracy
          min_threshold: '0.85'

Which gives you a full 4-stage pipeline:
validate → train → evaluate → register
- validate: schema, nulls, Evidently drift, Great Expectations
- train: MLflow autologging (sklearn/PyTorch/TF/XGBoost/LightGBM), GPU support
- evaluate: threshold check + optional comparison vs production model
- register: GitLab Model Registry, only runs if eval passed
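
To make the "only runs if eval passed" gating concrete, here is a minimal sketch of the kind of threshold check an evaluate job could run. This is an illustration, not the component's actual code: `passes_threshold` and `exit_code` are hypothetical names, and it assumes the metric arrives as a float while `min_threshold` arrives as a string (as in the quoted `'0.85'` above).

```python
def passes_threshold(metric_value: float, min_threshold: str) -> bool:
    """Return True when the metric meets or exceeds the threshold.

    min_threshold is a string here because the pipeline config above
    passes it as quoted text (assumption: it is parsed as a float).
    """
    return metric_value >= float(min_threshold)


def exit_code(metric_value: float, min_threshold: str) -> int:
    """Map the check to a process exit code: 0 on pass, 1 on fail."""
    if passes_threshold(metric_value, min_threshold):
        print(f"PASS: {metric_value:.4f} >= {min_threshold}")
        return 0
    print(f"FAIL: {metric_value:.4f} < {min_threshold}")
    return 1
```

A script that calls `sys.exit(exit_code(...))` fails the job on a nonzero code, and by default GitLab does not run later stages once a job in an earlier stage has failed. That is one plausible way to keep the register stage from running after a failed evaluation.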
Works on GitLab Free. DVC integration and parallel multi-model training are also supported.
Published in GitLab CI/CD Catalog: https://gitlab.com/netOpyr/gitlab-mlops-component
Happy to answer questions, especially on the evaluate stage: compare_with_production was the trickiest part to get right.
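
For a sense of why the comparison is fiddly, here is a rough sketch of the decision logic under simplifying assumptions. It is not the component's implementation: `should_register`, `higher_is_better`, and `tolerance` are hypothetical names, and the production metric is assumed to be `None` when no production model exists yet.

```python
from typing import Optional


def should_register(
    candidate: float,
    production: Optional[float],
    min_threshold: float,
    higher_is_better: bool = True,
    tolerance: float = 0.0,
) -> bool:
    """Decide whether a candidate model may be registered.

    The candidate must pass the absolute threshold and, if a production
    model exists, must not be worse than it by more than `tolerance`.
    """
    # Flip the sign for lower-is-better metrics (e.g. loss) so that
    # "larger is better" holds in the comparisons below.
    sign = 1.0 if higher_is_better else -1.0
    if sign * candidate < sign * min_threshold:
        return False
    # First deployment: nothing to compare against, so the threshold decides.
    if production is None:
        return True
    return sign * candidate >= sign * production - tolerance
```

The edge cases are where the trouble tends to be: the first run with no production model, metrics where lower is better, and how much regression to tolerate. For example, `should_register(0.90, 0.92, 0.85)` rejects a candidate that clears the threshold but trails production, while passing `tolerance=0.05` accepts it.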
u/Na_S04 — 3 days ago