
MaldiBatchKit Documentation

Batch-effect correction methods for MALDI-TOF mass spectrometry in clinical antimicrobial-resistance (AMR) prediction workflows: scikit-learn-compatible transformers, a unified CLI, and a diagnostics suite for quantifying batch mixing before and after correction.


Key Features

ComBat Variants

Johnson (2007) ComBat, Fortin (2018) covariate-aware ComBat, and Chen (2022) CovBat, re-exported from combatlearn behind a unified scikit-learn API.

api/corrections.html#combat-variants
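The location/scale model these ComBat variants share can be sketched in a few lines. This is a simplified illustration, not combatlearn's implementation: it omits the empirical-Bayes shrinkage step and covariates, so it is closest to an unshrunk Johnson-style adjustment. The function name `location_scale_adjust` is hypothetical, and the sketch assumes every batch has at least two samples.

```python
import numpy as np
import pandas as pd


# Illustrative sketch (hypothetical helper), not combatlearn's or MaldiBatchKit's code.
def location_scale_adjust(X: pd.DataFrame, batch: pd.Series) -> pd.DataFrame:
    """ComBat-style location/scale adjustment WITHOUT empirical-Bayes shrinkage.

    X: samples x features; batch: labels aligned to X.index.
    Assumes every batch contains at least two samples.
    """
    batch = batch.loc[X.index]
    grand_mean = X.mean(axis=0)
    # Pooled within-batch SD: residuals after removing each batch's mean
    resid = X - X.groupby(batch).transform("mean")
    pooled_sd = np.sqrt((resid ** 2).mean(axis=0)).replace(0.0, 1.0)
    Z = (X - grand_mean) / pooled_sd  # standardise every feature

    parts = []
    for _, Zb in Z.groupby(batch):
        gamma = Zb.mean(axis=0)  # additive batch effect
        delta = Zb.std(axis=0, ddof=1).replace(0.0, 1.0)  # multiplicative effect, as an SD
        parts.append((Zb - gamma) / delta)
    Z_adj = pd.concat(parts).loc[X.index]
    # Map back to the original scale using the batch-free mean and SD
    return Z_adj * pooled_sd + grand_mean
```

After this adjustment every batch has the same per-feature mean and SD; the empirical-Bayes step in real ComBat additionally shrinks `gamma` and `delta` toward priors pooled across features, which stabilises small batches.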
Species-Aware ComBat

ComBat-Fortin preset with species as the protected biological covariate, so inter-species structure is preserved during harmonisation.

api/corrections.html#species-aware-combat
Quality-Weighted ComBat

Weighted empirical-Bayes extension in which low signal-to-noise (SNR) spectra contribute less to the shrinkage prior.

api/corrections.html#quality-weighted-combat
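As a hedged illustration of where spectrum-quality weights could enter, the sketch below computes weighted per-batch location and scale estimates, then the normal prior on additive batch effects that ComBat derives from those estimates across features. Weights proportional to SNR, and the helper names `weighted_batch_moments` and `gamma_prior`, are assumptions for illustration; MaldiBatchKit's actual estimator may weight differently.

```python
import numpy as np


# Illustrative sketch (hypothetical helpers); the weighting scheme is an assumption.
def weighted_batch_moments(Z, batch, weights):
    """Per-batch weighted mean and variance of each standardised feature.

    Z: (n_samples, n_features); batch: (n_samples,) labels;
    weights: (n_samples,) non-negative quality scores (e.g. SNR),
    with a positive sum inside every batch.
    Returns {label: (mean, var)}, each of shape (n_features,).
    """
    Z = np.asarray(Z, dtype=float)
    batch = np.asarray(batch)
    weights = np.asarray(weights, dtype=float)
    moments = {}
    for b in np.unique(batch):
        mask = batch == b
        w = weights[mask] / weights[mask].sum()  # normalise weights within the batch
        mean = w @ Z[mask]  # low-weight spectra barely move the location estimate
        var = w @ (Z[mask] - mean) ** 2  # weighted (biased) variance
        moments[b] = (mean, var)
    return moments


def gamma_prior(moments):
    """Normal prior on additive effects: mean and variance across features."""
    return {b: (m.mean(), m.var(ddof=1)) for b, (m, _) in moments.items()}
```

A spectrum with weight 0 is ignored entirely, so a single noisy outlier cannot inflate its batch's location estimate or, through it, the prior.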
Linear-Model Corrections

limma's removeBatchEffect (Ritchie 2015), reimplemented in pure Python: batch effects are estimated by ordinary least squares (OLS) and subtracted, while a user-supplied design matrix is protected.

api/corrections.html#linear-model-corrections
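The OLS subtraction can be sketched with NumPy alone, assuming samples in rows. Batch is coded with sum-to-zero contrasts (as R's `contr.sum`, which limma's `removeBatchEffect` uses), batch and the protected design are fitted jointly, and only the fitted batch component is subtracted. `remove_batch_effect` is a hypothetical name, not MaldiBatchKit's API.

```python
import numpy as np


# Illustrative sketch (hypothetical helper) of limma-style batch removal.
def remove_batch_effect(X, batch, design=None):
    """Subtract OLS-estimated batch effects while protecting `design`.

    X: (n_samples, n_features); batch: (n_samples,) labels;
    design: (n_samples, p) effects to keep (defaults to an intercept).
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    n = X.shape[0]
    design = np.ones((n, 1)) if design is None else np.asarray(design, dtype=float)

    # Sum-to-zero coding: K levels -> K-1 columns; the last level is -1 everywhere
    levels = np.unique(batch)
    B = np.zeros((n, len(levels) - 1))
    for j, level in enumerate(levels[:-1]):
        B[batch == level, j] = 1.0
    B[batch == levels[-1], :] = -1.0

    # Fit design and batch jointly, then remove only the batch component
    coef, *_ = np.linalg.lstsq(np.hstack([design, B]), X, rcond=None)
    return X - B @ coef[design.shape[1]:]
```

Because the protected design enters the same fit, a species difference that is partly confounded with batch is kept rather than absorbed into the batch coefficients.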
Harmony

Iterative soft-clustering integration (Korsunsky 2019) via harmonypy, suitable for many-batch designs.

api/corrections.html#single-cell-style-integration
Baselines & Scaling

Median centering, per-batch z-scoring, and reference scaling: simple, auditable corrections to benchmark against.

api/corrections.html#baselines
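Two of these baselines reduce to grouped pandas operations. The sketch assumes `X` is a samples-by-bins DataFrame and `batch` a Series sharing its index. Adding back the global median in `median_center` (to keep intensities on their original scale) is a choice made here, and both function names are hypothetical.

```python
import pandas as pd


# Illustrative sketches (hypothetical helpers), not MaldiBatchKit's classes.
def median_center(X: pd.DataFrame, batch: pd.Series) -> pd.DataFrame:
    """Remove each batch's per-feature median, then restore the global median."""
    batch = batch.loc[X.index]
    return X - X.groupby(batch).transform("median") + X.median(axis=0)


def zscore_per_batch(X: pd.DataFrame, batch: pd.Series) -> pd.DataFrame:
    """Standardise every feature to mean 0 and SD 1 within each batch."""
    batch = batch.loc[X.index]
    grouped = X.groupby(batch)
    sd = grouped.transform("std").replace(0.0, 1.0)  # guard constant features
    return (X - grouped.transform("mean")) / sd
```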
Batch-Aware Warping

Per-batch m/z warping against a shared global reference, wrapping maldiamrkit.alignment.Warping to correct MALDI-specific m/z drift.

api/corrections.html#maldi-specific-corrections
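For intuition only, the sketch below replaces true warping with the crudest per-batch correction: one integer bin shift per batch, chosen to maximise the dot product between the batch mean spectrum and a shared reference. It is not `maldiamrkit.alignment.Warping`; the function names, the `max_shift` parameter, and the wrap-around behaviour of `np.roll` at the spectrum edges are all simplifying assumptions.

```python
import numpy as np


# Illustrative sketch (hypothetical helpers): rigid per-batch shift, not real warping.
def best_shift(spectrum, reference, max_shift):
    """Integer shift s in [-max_shift, max_shift] maximising <roll(spectrum, s), reference>."""
    shifts = np.arange(-max_shift, max_shift + 1)
    scores = [np.dot(np.roll(spectrum, s), reference) for s in shifts]
    return int(shifts[np.argmax(scores)])


def align_batches(X, batch, reference, max_shift=10):
    """Apply one integer bin shift per batch so its mean spectrum matches `reference`.

    X: (n_samples, n_bins) binned spectra; batch: (n_samples,) labels;
    reference: (n_bins,) spectrum shared by all batches.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    out = np.empty_like(X)
    shifts = {}
    for b in np.unique(batch):
        mask = batch == b
        s = best_shift(X[mask].mean(axis=0), reference, max_shift)
        out[mask] = np.roll(X[mask], s, axis=1)  # note: wraps around at the edges
        shifts[b] = s
    return out, shifts
```

The reference is passed in explicitly: using the pooled mean of misaligned batches as the target would blur peaks and can leave each batch matched to its own contribution.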
Diagnostics Suite

kBET, LISI, silhouette-by-batch, per-batch peak drift, total-ion-current coefficient of variation (TIC CoV), and a tidy diagnostic_report that summarises before/after deltas.

api/diagnostics.html
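TIC CoV is the simplest of these diagnostics: sum each spectrum's intensities to get its total ion current, then divide the standard deviation by the mean within each batch. The hypothetical `tic_cov` below assumes a samples-by-bins DataFrame of raw (not TIC-normalised) intensities, since TIC normalisation would make the statistic trivially zero; reporting it per batch is also an assumption.

```python
import pandas as pd


# Illustrative sketch (hypothetical helper), not MaldiBatchKit's diagnostics API.
def tic_cov(X: pd.DataFrame, batch: pd.Series) -> pd.Series:
    """Coefficient of variation (SD / mean) of total ion current, per batch."""
    tic = X.sum(axis=1)  # total ion current: summed intensity of each spectrum
    by_batch = tic.groupby(batch.loc[X.index])
    return by_batch.std() / by_batch.mean()
```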
AutoCorrector

Meta-corrector with a swappable method hyperparameter: sweep across corrector families inside GridSearchCV and let the downstream AUROC pick the winner.

choosing.html
Benchmark

BatchCorrectionBenchmark scores every corrector on every metric with bootstrap confidence intervals, returning a tidy table ready for paper-figure comparisons.

choosing.html
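Bootstrap confidence intervals of the kind the benchmark reports can be sketched as a percentile bootstrap over evaluation samples. Resampling individual samples (rather than, say, whole batches), the 95% default level, and the name `bootstrap_ci` are assumptions for illustration, not BatchCorrectionBenchmark's internals.

```python
import numpy as np


# Illustrative sketch (hypothetical helper) of a percentile bootstrap CI.
def bootstrap_ci(metric, y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for metric(y_true, y_pred)."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample evaluation samples with replacement
        stats[i] = metric(y_true[idx], y_pred[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y_true, y_pred), lo, hi
```

Metrics such as AUROC are undefined when a resample contains a single class, so a real implementation needs to skip or redraw such resamples.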
MaldiSet Integration

MaldiSetAdapter bridges maldiamrkit.MaldiSet and any MaldiBatchKit corrector in a single call, preserving metadata.

api/integrations.html
CLI

maldibatchkit correct <method> and maldibatchkit diagnose for batch processing, with NPZ and CSV input/output.

cli.html
Extensible Base Class

Subclass BaseBatchCorrector and implement two methods to ship a custom corrector that plugs into sklearn pipelines without train/test leakage.

extending.html

Quick Example

from maldibatchkit import SpeciesAwareComBat
from maldibatchkit.diagnostics import diagnostic_report

# X: DataFrame of shape (n_samples, n_bins); batch and species: Series aligned to X.index
corrector = SpeciesAwareComBat(batch=batch, species=species)
X_corrected = corrector.fit_transform(X)

# Summarise the effect of correction
report = diagnostic_report(X, X_corrected, batch)
print(report)

To avoid train/test leakage, fit the corrector on the training data only and apply it to held-out samples with transform:

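from sklearn.model_selection import train_test_split
from maldibatchkit import ComBat

# y: labels aligned to X.index. Stratify on batch so every batch
# in the test split was also seen during fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=batch, random_state=0
)

# ComBat-Fortin with species as a protected covariate
corrector = ComBat(
    batch=batch, method="fortin", discrete_covariates=species,
)
corrector.fit(X_train)                   # estimate batch parameters on training data only
X_train_c = corrector.transform(X_train)
X_test_c = corrector.transform(X_test)   # reuse the training estimates on held-out data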