MaldiBatchKit Documentation
Batch-effect correction methods for MALDI-TOF mass spectrometry in clinical antimicrobial resistance (AMR) prediction workflows. MaldiBatchKit provides scikit-learn-compatible transformers, a unified CLI, and a diagnostics suite for quantifying batch mixing before and after correction.
Key Features
ComBat variants (Johnson 2007, covariate-aware Fortin 2018, and Chen 2022 CovBat) re-exported from combatlearn with a unified sklearn API.
ComBat-Fortin preset with species as the protected biological covariate, so inter-species structure is preserved during harmonisation.
Weighted empirical-Bayes extension in which low-SNR spectra contribute less to the shrinkage prior.
Limma removeBatchEffect (Ritchie 2015) as a pure-Python OLS subtraction that protects a user-supplied design.
Iterative soft-clustering integration (Korsunsky 2019) via harmonypy, suitable for many-batch designs.
Median centering, per-batch z-scoring, and reference scaling: simple, auditable corrections to compare against.
Per-batch m/z warping against a shared global reference, wrapping maldiamrkit.alignment.Warping for MALDI-specific drift.
kBET, LISI, silhouette-by-batch, per-batch peak drift, TIC CoV, and a tidy diagnostic_report that summarises before/after deltas.
Meta-corrector with a swappable method hyperparameter: sweep across corrector families inside GridSearchCV and let the downstream AUROC pick the winner.
BatchCorrectionBenchmark scores every corrector on every metric with bootstrap CIs, returning a tidy table ready for paper-figure comparisons.
MaldiSetAdapter bridges maldiamrkit.MaldiSet and any MaldiBatchKit corrector in a single call, preserving metadata.
maldibatchkit correct <method> and maldibatchkit diagnose for batch processing, with matched NPZ / CSV I/O.
Subclass BaseBatchCorrector and implement two methods to ship a custom corrector that plugs into sklearn pipelines without train/test leakage.
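To make the OLS-subtraction idea behind the Limma-style corrector listed above more concrete, here is a minimal NumPy sketch. It is not MaldiBatchKit's implementation: the function name remove_batch_effect_sketch and its arguments are hypothetical, and it assumes the design matrix already contains an intercept column. The batch is encoded with sum-to-zero contrasts, the full model is fit by least squares, and only the fitted batch component is subtracted, so effects explained by the protected design stay in the data.

```python
import numpy as np


def remove_batch_effect_sketch(X, batch, design=None):
    """Subtract the fitted batch component from X (hypothetical helper).

    X      : (n_samples, n_bins) array of intensities
    batch  : (n_samples,) array of batch labels
    design : (n_samples, k) protected design matrix, intercept included;
             defaults to an intercept-only design
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    n_samples = X.shape[0]

    # Sum-to-zero coding: one column per batch level except the last,
    # with rows from the last level set to -1 in every column.
    levels = np.unique(batch)
    onehot = (batch[:, None] == levels[None, :]).astype(float)
    batch_cols = onehot[:, :-1] - onehot[:, [-1]]

    if design is None:
        design = np.ones((n_samples, 1))
    design = np.asarray(design, dtype=float)

    # Fit design and batch terms jointly, then remove only the batch part.
    full = np.hstack([design, batch_cols])
    coef, *_ = np.linalg.lstsq(full, X, rcond=None)
    batch_coef = coef[design.shape[1]:]
    return X - batch_cols @ batch_coef
```

Fitting the design and batch terms jointly is what protects the design: a biological effect that is partly confounded with batch is attributed to the design columns where the data allow it, rather than being removed with the batch offset.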
Quick Example
from maldibatchkit import SpeciesAwareComBat
from maldibatchkit.diagnostics import diagnostic_report
# X: (n_samples, n_bins) DataFrame; batch & species indexed by X.index
corrector = SpeciesAwareComBat(batch=batch, species=species)
X_corrected = corrector.fit_transform(X)
# Summarise the effect of correction
report = diagnostic_report(X, X_corrected, batch)
print(report)
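One of the diagnostics named in the feature list, silhouette-by-batch, can be approximated with scikit-learn alone. The sketch below illustrates the general idea and is not necessarily what diagnostic_report computes: it calls sklearn.metrics.silhouette_score with the batch labels as the cluster assignment, once before and once after correction. The helper name batch_silhouette_delta and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def batch_silhouette_delta(X_before, X_after, batch):
    """Silhouette of the batch labels before and after correction.

    Hypothetical helper: a high score means samples cluster by batch,
    so a lower score after correction suggests better batch mixing.
    """
    before = silhouette_score(np.asarray(X_before), batch)
    after = silhouette_score(np.asarray(X_after), batch)
    return {"before": before, "after": after, "delta": after - before}


# Synthetic demo: two batches that differ only by a constant offset.
rng = np.random.default_rng(0)
signal = rng.normal(size=(40, 5))
batch_labels = np.repeat([0, 1], 20)
X_before = signal + 5.0 * batch_labels[:, None]  # batch 1 shifted by 5
X_after = signal                                 # offset removed exactly

scores = batch_silhouette_delta(X_before, X_after, batch_labels)
print(scores)
```

A silhouette near zero or below indicates that batch membership no longer explains the geometry of the data. On its own it says nothing about whether biological structure survived, which is why it is usually read alongside a downstream metric.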
To avoid train/test leakage, fit the corrector on the training data only, then apply it to held-out samples via transform:
from sklearn.model_selection import train_test_split
from maldibatchkit import ComBat
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=batch
)
corrector = ComBat(
    batch=batch, method="fortin", discrete_covariates=species,
)
corrector.fit(X_train)
X_train_c = corrector.transform(X_train)
X_test_c = corrector.transform(X_test)
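The fit/transform split above can be illustrated with a library-free baseline. The following is a minimal sketch of per-batch median centering, assuming the same convention as the examples here: a batch Series indexed like X is passed to the constructor and looked up by X.index. The class name MedianCenterSketch and the attributes batch_medians_ and global_median_ are hypothetical and are not part of MaldiBatchKit.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class MedianCenterSketch(BaseEstimator, TransformerMixin):
    """Per-batch median centering with statistics learned in fit only."""

    def __init__(self, batch):
        self.batch = batch  # pd.Series of batch labels, indexed like X

    def fit(self, X, y=None):
        labels = self.batch.loc[X.index]
        # Per-batch and global medians come from the training rows only.
        self.batch_medians_ = X.groupby(labels).median()
        self.global_median_ = X.median()
        return self

    def transform(self, X):
        labels = self.batch.loc[X.index]
        # Look up each row's training-time batch median; a batch that was
        # not seen during fit raises a KeyError instead of being guessed.
        offsets = self.batch_medians_.loc[labels.to_numpy()]
        offsets.index = X.index
        return X - offsets + self.global_median_


# Tiny demo with two bins and two batches.
X_demo = pd.DataFrame(
    {"b0": [1.0, 2.0, 3.0, 11.0, 12.0, 13.0],
     "b1": [0.0, 0.0, 0.0, 4.0, 4.0, 4.0]},
    index=list("abcdef"),
)
batch_demo = pd.Series(["A", "A", "A", "B", "B", "B"], index=list("abcdef"))

train_idx, test_idx = ["a", "b", "d", "e"], ["c", "f"]
sketch = MedianCenterSketch(batch=batch_demo).fit(X_demo.loc[train_idx])
X_demo_test_c = sketch.transform(X_demo.loc[test_idx])
```

Because the medians are stored as fitted attributes, transforming held-out rows never recomputes statistics from those rows, which is the property that keeps test data out of the correction.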