MaldiBatchKit - Quick Start

A minimal end-to-end workflow: load a demo dataset, fit a ComBat corrector, run the diagnostic report, and plot a before / after summary.

The notebooks use the MALDI-Kleb-AI dataset (Rocchi et al., 2026; Zenodo DOI 10.5281/zenodo.17405072). notebooks/_demo.py caches the 370 MB tarball under ~/.cache/maldibatchkit/ (or $MALDIBATCHKIT_CACHE_DIR) on first use.
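The cache location rule described above can be sketched in a few lines. This is an illustration only, not the code in notebooks/_demo.py: the helper name resolve_cache_dir is hypothetical, and it assumes that the environment variable, when set, replaces the default directory outright.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Return $MALDIBATCHKIT_CACHE_DIR if set, else ~/.cache/maldibatchkit.

    Hypothetical helper for illustration; the real logic lives in
    notebooks/_demo.py and may differ.
    """
    env = os.environ if env is None else env
    override = env.get("MALDIBATCHKIT_CACHE_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "maldibatchkit"


print(resolve_cache_dir())
```

Setting MALDIBATCHKIT_CACHE_DIR before starting the notebook is the simplest way to keep the 370 MB download off a small home partition.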

1. Load the demo dataset

[1]:

import sys, pathlib
sys.path.insert(0, str(pathlib.Path.cwd().parent))  # make notebooks/_demo.py importable
from notebooks._demo import load_maldi_kleb_ai

ds = load_maldi_kleb_ai(antibiotic='Amikacin', verbose=True)
ds.X.shape, ds.meta.columns.tolist(), ds.info

[1]:

((741, 6000),
 ['Amikacin', 'Meropenem', 'Species', 'Batch', 'SNR'],
 {'source': 'Zenodo MALDI-Kleb-AI',
  'doi': '10.5281/zenodo.17405072',
  'record_id': '17405072',
  'md5_tar': 'c14b6c6b4210553962faa7f1dc27d275',
  'n_samples': 741,
  'n_bins': 6000,
  'bin_width': 3,
  'antibiotic': 'Amikacin',
  'cache_dir': '/home/ettore/.cache/maldibatchkit/maldi-kleb-ai'})
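The md5_tar entry in ds.info can be used to check a manually downloaded copy of the tarball. The sketch below uses only the standard library; the helper name md5_of_file and the tarball path in the usage comment are placeholders, not names taken from the package.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_MD5 = "c14b6c6b4210553962faa7f1dc27d275"  # 'md5_tar' from ds.info above

# Usage (placeholder path, not the real cached file name):
# assert md5_of_file("/path/to/tarball") == EXPECTED_MD5
```

Reading in chunks keeps memory use flat, which matters for an archive of this size.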

[2]:

ds.meta.head()

[2]:

	Amikacin	Meropenem	Species	Batch	SNR
1-8317003599	S	S	Klebsiella pneumoniae	Milan	128.163340
10-8320002130	S	S	Klebsiella pneumoniae	Milan	128.492364
100-8660007296	S	S	Klebsiella pneumoniae	Milan	149.638040
1004	R	R	Klebsiella pneumoniae	Rome	70.134876
101-8140000209	S	S	Klebsiella pneumoniae	Milan	84.646561

2. Fit a ComBat corrector

[3]:

from maldibatchkit import ComBat

corrector = ComBat(batch=ds.batch, method='johnson')
X_corrected = corrector.fit_transform(ds.X)
X_corrected.shape

[3]:

(741, 6000)

ComBat takes the batch labels at construction, so fit_transform needs only the intensity matrix. For a species-aware correction (Fortin variant), use the one-liner SpeciesAwareComBat(batch=ds.batch, species=ds.species) instead.
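For a rough intuition of what a ComBat-style corrector does, the sketch below rescales each batch so that its per-feature mean and standard deviation match the overall values, using synthetic data. It is a deliberately simplified location/scale adjustment without the empirical-Bayes shrinkage that ComBat applies, and it is not MaldiBatchKit's implementation; the function name and the toy data are assumptions made for illustration.

```python
import numpy as np


def location_scale_adjust(X, batch):
    """Match each batch's per-feature mean and std to the overall values.

    Simplified illustration only: ComBat additionally shrinks the
    per-batch estimates with an empirical-Bayes prior.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    overall_mean = X.mean(axis=0)
    overall_std = X.std(axis=0)
    out = np.empty_like(X)
    for b in np.unique(batch):
        rows = batch == b
        mu = X[rows].mean(axis=0)
        sd = X[rows].std(axis=0)
        sd[sd == 0] = 1.0  # avoid division by zero for constant features
        out[rows] = (X[rows] - mu) / sd * overall_std + overall_mean
    return out


# Synthetic data: two batches, the second shifted by a constant offset.
rng = np.random.default_rng(0)
batch = np.array(["A"] * 50 + ["B"] * 50)
X = rng.normal(size=(100, 5))
X[batch == "B"] += 3.0

X_adj = location_scale_adjust(X, batch)
gap_before = np.abs(X[batch == "A"].mean(axis=0) - X[batch == "B"].mean(axis=0)).max()
gap_after = np.abs(X_adj[batch == "A"].mean(axis=0) - X_adj[batch == "B"].mean(axis=0)).max()
print(f"largest between-batch mean gap: before={gap_before:.3f}, after={gap_after:.3e}")
```

Estimating each batch's shift and scale independently is noisy when batches are small, which is the motivation for the empirical-Bayes pooling in ComBat proper.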

3. Run the diagnostic report

[4]:

from maldibatchkit.diagnostics import diagnostic_report

report = diagnostic_report(ds.X, X_corrected, ds.batch, mz_values=ds.mz, top_k_peaks=20)
report[report['scope'] == 'overall']

[4]:

	metric	scope	value_before	value_after	delta	better
0	silhouette_batch	overall	0.147457	-0.011313	-0.158770	lower
1	kbet_acceptance_rate	overall	0.020243	0.260459	0.240216	higher
2	kbet_mean_chi2	overall	54.016409	14.849020	-39.167389	lower
3	lisi	overall	1.014542	1.265755	0.251213	higher
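In the table above, delta equals value_after minus value_before, and better appears to name the direction that indicates less batch effect. Under that reading, a metric improved when the sign of delta agrees with better. The sketch below checks this on rows copied by hand from the output, so it runs without the dataset; the helper name improved is hypothetical.

```python
# Rows copied by hand from the 'overall' report above.
rows = [
    {"metric": "silhouette_batch", "value_before": 0.147457,
     "value_after": -0.011313, "better": "lower"},
    {"metric": "kbet_acceptance_rate", "value_before": 0.020243,
     "value_after": 0.260459, "better": "higher"},
    {"metric": "kbet_mean_chi2", "value_before": 54.016409,
     "value_after": 14.849020, "better": "lower"},
    {"metric": "lisi", "value_before": 1.014542,
     "value_after": 1.265755, "better": "higher"},
]


def improved(row):
    """True if the metric moved in the direction named by its 'better' field."""
    delta = row["value_after"] - row["value_before"]
    return delta < 0 if row["better"] == "lower" else delta > 0


for row in rows:
    delta = row["value_after"] - row["value_before"]
    print(f"{row['metric']:<22} delta={delta:+.6f} improved={improved(row)}")
```

For the numbers shown here, all four metrics move in the desired direction after correction.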

4. Plot the before / after summary

[5]:

%matplotlib inline
from maldibatchkit.viz import plot_diagnostic_summary, plot_peak_shift
import matplotlib.pyplot as plt

fig, axes = plot_diagnostic_summary(report, scope='overall')
plt.show()

Figure (tutorials_notebooks_01_quick_start_10_0.png): diagnostic summary for the overall scope, before vs. after correction.

[6]:

fig, ax = plot_peak_shift(ds.batch, ds.X, mz_values=ds.mz, max_batches=None)
ax.set_xlim(2000, 12000)
plt.show()

Figure (tutorials_notebooks_01_quick_start_11_0.png): peak shift per batch for the uncorrected spectra, m/z 2000 to 12000.

[7]:

fig, ax = plot_peak_shift(ds.batch, X_corrected, mz_values=ds.mz, max_batches=None)
ax.set_xlim(2000, 12000)
plt.show()

Figure (tutorials_notebooks_01_quick_start_12_0.png): peak shift per batch for the corrected spectra, m/z 2000 to 12000.

Where next

02_correction_methods.ipynb - side-by-side comparison of every corrector.
03_quality_weighted_combat.ipynb - deep dive into the quality-weighted empirical-Bayes extension.
04_maldiset_integration.ipynb - end-to-end workflow using maldiamrkit.MaldiSet.