MaldiBatchKit - Quick Start

A minimal end-to-end workflow: load a demo dataset, fit a ComBat corrector, run the diagnostic report, and plot a before/after summary.

The notebooks use the MALDI-Kleb-AI dataset (Rocchi et al., 2026; Zenodo DOI 10.5281/zenodo.17405072). On first use, notebooks/_demo.py caches the 370 MB tarball under ~/.cache/maldibatchkit/ (or under $MALDIBATCHKIT_CACHE_DIR instead).
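The cache location rule described above can be sketched as follows. This is only an illustration of the lookup order, not the code in notebooks/_demo.py: the helper name resolve_cache_dir is hypothetical, and the sketch assumes that the environment variable, when set, replaces the default directory.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Hypothetical helper: pick the cache directory for the demo tarball.

    Assumption: $MALDIBATCHKIT_CACHE_DIR, when set and non-empty, replaces
    the default ~/.cache/maldibatchkit location.
    """
    env = os.environ if env is None else env
    override = env.get("MALDIBATCHKIT_CACHE_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "maldibatchkit"


# Show where the tarball would be cached with the current environment.
print(resolve_cache_dir())
```

Passing a plain dict as env makes the rule easy to check without touching the real environment.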

1. Load the demo dataset

[1]:
import sys, pathlib
sys.path.insert(0, str(pathlib.Path.cwd().parent))  # make notebooks/_demo.py importable
from notebooks._demo import load_maldi_kleb_ai

ds = load_maldi_kleb_ai(antibiotic='Amikacin', verbose=True)
ds.X.shape, ds.meta.columns.tolist(), ds.info
[1]:
((741, 6000),
 ['Amikacin', 'Meropenem', 'Species', 'Batch', 'SNR'],
 {'source': 'Zenodo MALDI-Kleb-AI',
  'doi': '10.5281/zenodo.17405072',
  'record_id': '17405072',
  'md5_tar': 'c14b6c6b4210553962faa7f1dc27d275',
  'n_samples': 741,
  'n_bins': 6000,
  'bin_width': 3,
  'antibiotic': 'Amikacin',
  'cache_dir': '/home/ettore/.cache/maldibatchkit/maldi-kleb-ai'})
[2]:
ds.meta.head()
[2]:
                Amikacin  Meropenem                Species  Batch         SNR
1-8317003599           S          S  Klebsiella pneumoniae  Milan  128.163340
10-8320002130          S          S  Klebsiella pneumoniae  Milan  128.492364
100-8660007296         S          S  Klebsiella pneumoniae  Milan  149.638040
1004                   R          R  Klebsiella pneumoniae   Rome   70.134876
101-8140000209         S          S  Klebsiella pneumoniae  Milan   84.646561
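The ds.info dictionary records an md5_tar value for the downloaded archive. If you want to check a cached tarball against it by hand, a sketch along these lines should work. The helper names are hypothetical, the path to the tarball inside the cache directory is not shown in this notebook and has to be supplied by you, and whether the loader already performs this check is not stated here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Return True when the file's digest equals the recorded checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A call would look like matches_expected(tarball_path, ds.info['md5_tar']), where tarball_path is the location of the archive on your machine.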

2. Fit a ComBat corrector

[3]:
from maldibatchkit import ComBat

corrector = ComBat(batch=ds.batch, method='johnson')
X_corrected = corrector.fit_transform(ds.X)
X_corrected.shape
[3]:
(741, 6000)

ComBat takes the batch labels at construction and stores them. For a species-aware correction (the Fortin variant), use the one-liner SpeciesAwareComBat(batch=ds.batch, species=ds.species) instead.
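To build intuition for what a batch corrector does, the sketch below applies a plain per-batch location/scale adjustment to synthetic data with NumPy. It is a deliberately simplified stand-in and not MaldiBatchKit's implementation: it omits the empirical Bayes shrinkage and covariate handling that distinguish ComBat, the function name center_scale_by_batch is made up for this example, and it assumes at least two samples per batch.

```python
import numpy as np


def center_scale_by_batch(X, batch):
    """Simplified per-batch location/scale adjustment (NOT full ComBat).

    Each batch is standardized feature by feature, then mapped back onto
    the overall mean and standard deviation of the whole matrix.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    overall_mean = X.mean(axis=0)
    overall_std = X.std(axis=0, ddof=1)
    out = np.empty_like(X)
    for label in np.unique(batch):
        rows = batch == label
        mu = X[rows].mean(axis=0)
        sd = X[rows].std(axis=0, ddof=1)
        sd = np.where(sd > 0, sd, 1.0)  # leave constant features unscaled
        out[rows] = (X[rows] - mu) / sd * overall_std + overall_mean
    return out


# Synthetic example: two batches, the second with an additive offset.
rng = np.random.default_rng(0)
batch = np.repeat(["Milan", "Rome"], 50)
X = rng.normal(size=(100, 5))
X[batch == "Rome"] += 3.0
X_adj = center_scale_by_batch(X, batch)
```

After the adjustment, the per-feature means of the two synthetic batches coincide. ComBat pursues the same goal but pools information across features to stabilize the per-batch estimates.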

3. Run the diagnostic report

[4]:
from maldibatchkit.diagnostics import diagnostic_report

report = diagnostic_report(ds.X, X_corrected, ds.batch, mz_values=ds.mz, top_k_peaks=20)
report[report['scope'] == 'overall']
[4]:
                 metric    scope  value_before  value_after       delta  better
0      silhouette_batch  overall      0.147457    -0.011313   -0.158770   lower
1  kbet_acceptance_rate  overall      0.020243     0.260459    0.240216  higher
2        kbet_mean_chi2  overall     54.016409    14.849020  -39.167389   lower
3                  lisi  overall      1.014542     1.265755    0.251213  higher
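The values above are consistent with delta = value_after - value_before. Reading the better column as the direction in which a metric should move (an interpretation, not something this notebook states explicitly), a metric improved when the sign of delta matches that direction. The sketch below checks this using plain tuples copied from the output above rather than the DataFrame, so it runs without pandas. The improved helper is hypothetical.

```python
# (metric, value_before, value_after, better), copied from the report output
rows = [
    ("silhouette_batch", 0.147457, -0.011313, "lower"),
    ("kbet_acceptance_rate", 0.020243, 0.260459, "higher"),
    ("kbet_mean_chi2", 54.016409, 14.849020, "lower"),
    ("lisi", 1.014542, 1.265755, "higher"),
]


def improved(before, after, better):
    """True when the metric moved in the direction named by `better`."""
    delta = after - before
    return delta < 0 if better == "lower" else delta > 0


for metric, before, after, better in rows:
    delta = after - before
    flag = improved(before, after, better)
    print(f"{metric:22s} delta={delta:+.6f} improved={flag}")
```

By this reading, all four overall metrics moved in the desired direction after correction.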

4. Plot the before / after summary

[5]:
%matplotlib inline
from maldibatchkit.viz import plot_diagnostic_summary, plot_peak_shift
import matplotlib.pyplot as plt

fig, axes = plot_diagnostic_summary(report, scope='overall')
plt.show()
[Figure: output of plot_diagnostic_summary(report, scope='overall')]
[6]:
fig, ax = plot_peak_shift(ds.batch, ds.X, mz_values=ds.mz, max_batches=None)
ax.set_xlim(2000, 12000)
plt.show()
[Figure: output of plot_peak_shift for the uncorrected spectra (ds.X), x-axis limited to 2000-12000]
[7]:
fig, ax = plot_peak_shift(ds.batch, X_corrected, mz_values=ds.mz, max_batches=None)
ax.set_xlim(2000, 12000)
plt.show()
[Figure: output of plot_peak_shift for the corrected spectra (X_corrected), x-axis limited to 2000-12000]

Where next