CLI Reference#
MaldiBatchKit ships a command-line interface built on Typer,
organised around two top-level commands: correct (apply a
batch-correction method to a feature matrix) and diagnose (run the
full diagnostic suite on a before/after pair).
maldibatchkit
├── correct
│   ├── combat              (Johnson 2007)
│   ├── combat-fortin       (Fortin 2018, covariate-aware)
│   ├── combat-chen         (Chen 2022, CovBat)
│   ├── species-combat      (Fortin preset with species)
│   ├── quality-combat      (weighted empirical-Bayes)
│   ├── limma               (Ritchie 2015)
│   ├── harmony             (Korsunsky 2019)
│   ├── median-center
│   ├── zscore-per-batch
│   ├── reference-scaling
│   └── warping             (BatchAwareWarping)
└── diagnose
Every correct subcommand shares the same
-i / -o / --batch-csv contract and adds just the flags that matter
for its method. Run maldibatchkit correct <method> --help to see the
full option list for any corrector.
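When the CLI is driven from a script, the shared contract makes it easy to assemble the argument list in one place. The sketch below is only an illustration: `build_correct_cmd` is a hypothetical helper, not part of MaldiBatchKit, and it uses only the flags named on this page. `batch_csv` is left optional because NPZ inputs can carry the batch labels themselves.

```python
import shlex


def build_correct_cmd(method, input_path, output_path, batch_csv=None, extra=()):
    """Assemble argv for `maldibatchkit correct <method>` (hypothetical helper)."""
    cmd = ["maldibatchkit", "correct", method, "-i", str(input_path)]
    if batch_csv is not None:
        cmd += ["--batch-csv", str(batch_csv)]
    cmd += list(extra)              # method-specific flags, passed through unchanged
    cmd += ["-o", str(output_path)]
    return cmd


cmd = build_correct_cmd("combat", "X.csv", "X_corrected.csv", batch_csv="batch.csv")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` rather than a joined string avoids shell-quoting problems with file names that contain spaces.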
Command Reference#
Input / Output Formats#
- CSV: the first column is the sample index; the remaining columns are features. A companion --batch-csv is required (single data column; first column is the sample id).
- NPZ: a np.savez archive with at least X, and optionally columns, index, and batch. When batch is bundled in the archive, --batch-csv becomes optional.
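To make the CSV layout concrete, the following sketch writes a tiny feature matrix and its companion batch file with the standard library. The column names (`sample_id`, `batch`, `mz_2000`, `mz_2003`), the sample and batch labels, and the presence of a header row are assumptions made for the example, not requirements stated by MaldiBatchKit.

```python
import csv
import io


def write_feature_csv(fh, sample_ids, feature_names, rows):
    """First column: sample id; remaining columns: one per feature."""
    writer = csv.writer(fh, lineterminator="\n")
    writer.writerow(["sample_id", *feature_names])   # header row (assumed)
    for sample_id, row in zip(sample_ids, rows):
        writer.writerow([sample_id, *row])


def write_batch_csv(fh, sample_ids, batches):
    """First column: sample id; a single data column with the batch label."""
    writer = csv.writer(fh, lineterminator="\n")
    writer.writerow(["sample_id", "batch"])          # header row (assumed)
    for sample_id, batch in zip(sample_ids, batches):
        writer.writerow([sample_id, batch])


# In-memory buffers stand in for X.csv and batch.csv.
x_buf, b_buf = io.StringIO(), io.StringIO()
write_feature_csv(x_buf, ["s1", "s2"], ["mz_2000", "mz_2003"], [[0.1, 0.4], [0.2, 0.3]])
write_batch_csv(b_buf, ["s1", "s2"], ["plate_A", "plate_B"])
x_text, b_text = x_buf.getvalue(), b_buf.getvalue()
print(x_text)
print(b_text)
```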
Covariate / Auxiliary CSVs#
The following sidecar CSVs are used by the methods that need them. Each has a sample-id column first and one or more data columns after:
| Flag | Accepted columns |
|---|---|
| --species-csv | 1 (species label) |
| --quality-csv | 1 (non-negative scalar, SNR…) |
| --discrete-covariates-csv | >= 1 (categorical covariates) |
|  | >= 1 (numeric covariates) |
| --design-csv | >= 1 (Limma design of interest) |
|  | >= 1 (Harmony vars_use) |
| --mz-csv | 1 (m/z per feature, for drift) |
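A sidecar file can be checked before it is handed to the CLI. As a rough sketch, the function below validates a quality file against the shape given for it above: a sample-id column followed by exactly one non-negative numeric column. `read_quality_csv` is a hypothetical helper, and the header row and the column name `snr` are assumptions.

```python
import csv
import io


def read_quality_csv(fh):
    """Return {sample_id: score}; raise ValueError if the layout is off."""
    reader = csv.reader(fh)
    header = next(reader)                       # header row (assumed to exist)
    if len(header) != 2:
        raise ValueError(f"expected 1 data column, found {len(header) - 1}")
    quality = {}
    for row in reader:
        if len(row) != 2:
            raise ValueError(f"malformed row: {row!r}")
        sample_id, value = row
        score = float(value)                    # raises ValueError on non-numeric text
        if not score >= 0:                      # also rejects NaN
            raise ValueError(f"{sample_id}: quality must be non-negative, got {value}")
        quality[sample_id] = score
    return quality


scores = read_quality_csv(io.StringIO("sample_id,snr\ns1,12.5\ns2,0\n"))
print(scores)
```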
Indices in all sidecar CSVs should match X.index. If they do not,
MaldiBatchKit falls back to positional alignment when the row counts
match, and fails with a clear error otherwise.
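The alignment rule above can be sketched in a few lines. This is not MaldiBatchKit's implementation: `align_sidecar` is a hypothetical function, and it assumes that "match" means the sidecar holds exactly the same set of unique sample ids as X.index, in any order.

```python
def align_sidecar(x_index, sidecar):
    """Return sidecar values ordered like x_index.

    x_index: list of sample ids from X.
    sidecar: list of (sample_id, value) pairs in file order.
    """
    ids = [sample_id for sample_id, _ in sidecar]
    lookup = dict(sidecar)
    # 1. Label alignment: same set of unique ids (assumed meaning of "match").
    if len(lookup) == len(ids) and set(ids) == set(x_index):
        return [lookup[sample_id] for sample_id in x_index]
    # 2. Positional fallback: ids differ but the row counts agree.
    if len(sidecar) == len(x_index):
        return [value for _, value in sidecar]
    # 3. Otherwise fail and name the ids that could not be found.
    missing = [sample_id for sample_id in x_index if sample_id not in lookup]
    raise ValueError(
        f"sidecar has {len(sidecar)} rows, X has {len(x_index)}; missing ids: {missing}"
    )


by_label = align_sidecar(["a", "b", "c"], [("c", 3), ("a", 1), ("b", 2)])
by_position = align_sidecar(["a", "b"], [("x", 10), ("y", 20)])
print(by_label, by_position)
```

Keeping sidecar ids identical to X.index avoids the positional branch entirely, which matters because positional alignment cannot detect rows that are in a different order.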
Usage Examples#
Vanilla ComBat#
maldibatchkit correct combat \
    -i X.csv --batch-csv batch.csv \
    -o X_corrected.csv
Fortin ComBat with species as a protected categorical covariate:
maldibatchkit correct combat-fortin \
    -i X.csv --batch-csv batch.csv \
    --discrete-covariates-csv species.csv \
    -o X_corrected.csv
Species-Aware Preset#
This has the same effect as the combat-fortin example above, with less typing:
maldibatchkit correct species-combat \
    -i X.csv --batch-csv batch.csv \
    --species-csv species.csv \
    -o X_corrected.csv
Quality-Weighted ComBat#
maldibatchkit correct quality-combat \
    -i X.csv --batch-csv batch.csv \
    --quality-csv snr.csv \
    --max-iter 30 \
    -o X_corrected.csv
Limma#
maldibatchkit correct limma \
    -i X.csv --batch-csv batch.csv \
    --design-csv species_design.csv \
    -o X_corrected.csv
Harmony#
maldibatchkit correct harmony \
    -i X.csv --batch-csv batch.csv \
    --theta 2.0 --max-iter 20 --random-state 0 \
    -o X_corrected.csv
Batch-Aware Warping#
maldibatchkit correct warping \
    -i X.csv --batch-csv batch.csv \
    --method piecewise --n-segments 8 --max-shift 10 \
    -o X_warped.csv
Diagnostics#
maldibatchkit diagnose \
    -i X.csv --corrected X_corrected.csv \
    --batch-csv batch.csv \
    --mz-csv mz.csv --top-k-peaks 40 \
    -o report.csv
NPZ End-to-End#
Bundle X, batch, index, and columns in a single archive:
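import numpy as np

# X is a pandas DataFrame (samples x features); batch is a Series aligned to X.index.
np.savez("maldiset.npz",
         X=X.to_numpy(),
         columns=X.columns.to_numpy(),
         index=X.index.to_numpy(),
         batch=batch.to_numpy())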
Then every correct subcommand accepts the NPZ directly:
maldibatchkit correct combat-fortin \
    -i maldiset.npz \
    --discrete-covariates-csv species.csv \
    -o corrected.npz
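Before dropping --batch-csv, it can be worth confirming that the archive really bundles batch. An .npz file is a zip archive in which np.savez stores each array as `<name>.npy`, so the member names can be listed with the standard library alone. The helpers `npz_keys` and `needs_batch_csv` below are hypothetical, and the demo archives are name-only stand-ins rather than real arrays.

```python
import io
import zipfile


def npz_keys(source):
    """Array names stored in an .npz archive (path or binary file object)."""
    with zipfile.ZipFile(source) as zf:
        return {name[:-4] for name in zf.namelist() if name.endswith(".npy")}


def needs_batch_csv(source):
    """True when the archive has X but no bundled batch array."""
    keys = npz_keys(source)
    if "X" not in keys:
        raise ValueError("archive has no 'X' array")
    return "batch" not in keys


def fake_npz(names):
    """Stand-in archive with empty members; only the names matter here."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(f"{name}.npy", b"")
    buf.seek(0)
    return buf


bundled = needs_batch_csv(fake_npz(["X", "columns", "index", "batch"]))
bare = needs_batch_csv(fake_npz(["X"]))
print(bundled, bare)
```

With NumPy available, `np.load(path).files` returns the same names.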
Refusal / Error Modes#
- combat-fortin / combat-chen without any covariate CSV refuse to run, with a hint to use plain combat instead: Fortin / CovBat without covariates reduce to Johnson.
- species-combat without --species-csv and quality-combat without --quality-csv refuse to run.
- Index mismatches between X and a sidecar CSV produce a clear error identifying which rows are missing, rather than silently realigning.
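A wrapper script can mirror the first two refusals with a pre-flight check, so that a batch job fails before any data is loaded. The sketch below is an assumption-laden illustration, not the CLI's own logic: `preflight` is a hypothetical function, the messages are invented, and `covariate_flags` lists only the covariate flag spelled out on this page, so any other covariate flag has to be added by the caller.

```python
# Methods that refuse to run without their dedicated sidecar flag.
REQUIRED_FLAG = {
    "species-combat": "--species-csv",
    "quality-combat": "--quality-csv",
}
# Methods that refuse to run without at least one covariate CSV.
NEEDS_COVARIATES = {"combat-fortin", "combat-chen"}


def preflight(method, flags, covariate_flags=("--discrete-covariates-csv",)):
    """Return an error message, or None when the call looks acceptable."""
    given = set(flags)
    if method in NEEDS_COVARIATES and not given & set(covariate_flags):
        return f"{method} needs at least one covariate CSV; use plain 'combat' instead"
    required = REQUIRED_FLAG.get(method)
    if required is not None and required not in given:
        return f"{method} requires {required}"
    return None


print(preflight("combat-fortin", ["-i", "--batch-csv", "-o"]))
print(preflight("species-combat", ["-i", "--batch-csv", "--species-csv", "-o"]))
```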
See the Quickstart Guide for the matching Python API and the API Reference for full class documentation.