CLI Reference

MaldiBatchKit ships a command-line interface built on Typer, organised around two top-level commands: correct (apply a batch-correction method to a feature matrix) and diagnose (run the full diagnostic suite on a before/after pair).

maldibatchkit
├── correct
│   ├── combat                (Johnson 2007)
│   ├── combat-fortin         (Fortin 2018, covariate-aware)
│   ├── combat-chen           (Chen 2022, CovBat)
│   ├── species-combat        (Fortin preset with species)
│   ├── quality-combat        (weighted empirical-Bayes)
│   ├── limma                 (Ritchie 2015)
│   ├── harmony               (Korsunsky 2019)
│   ├── median-center
│   ├── zscore-per-batch
│   ├── reference-scaling
│   └── warping               (BatchAwareWarping)
└── diagnose

Every correct subcommand shares the same -i / -o / --batch-csv contract and adds just the flags that matter for its method. Run maldibatchkit correct <method> --help to see the full option list for any corrector.
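Because every correct subcommand uses the same calling convention, a scripted pipeline can build its argument lists in one place. The helper below is a hypothetical sketch, not part of MaldiBatchKit. It assumes that every method-specific option takes a value, and that Python keyword names map to flags by swapping underscores for hyphens. That mapping matches the flags shown on this page (--quality-csv, --max-iter), but it is not guaranteed for every option.

```python
import shlex
from typing import Optional


def build_correct_cmd(method: str, input_path: str, output_path: str,
                      batch_csv: Optional[str] = None, **options) -> list:
    """Assemble an argv list for `maldibatchkit correct <method>`.

    Hypothetical helper: max_iter=30 becomes ["--max-iter", "30"].
    Value-less boolean flags are not handled.
    """
    cmd = ["maldibatchkit", "correct", method, "-i", input_path]
    if batch_csv is not None:  # may be omitted when an NPZ input bundles `batch`
        cmd += ["--batch-csv", batch_csv]
    for name, value in options.items():
        cmd += ["--" + name.replace("_", "-"), str(value)]
    cmd += ["-o", output_path]
    return cmd


cmd = build_correct_cmd("quality-combat", "X.csv", "X_corrected.csv",
                        batch_csv="batch.csv", quality_csv="snr.csv", max_iter=30)
print(shlex.join(cmd))
# maldibatchkit correct quality-combat -i X.csv --batch-csv batch.csv --quality-csv snr.csv --max-iter 30 -o X_corrected.csv
```

The resulting list can be passed directly to subprocess.run(cmd, check=True). Using a list rather than a single string avoids shell-quoting problems with paths that contain spaces.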

Command Reference

Input / Output Formats

  • CSV: the first column is the sample index and the remaining columns are features. A companion --batch-csv is required; its first column is the sample id, followed by a single data column holding the batch label.

  • NPZ: an np.savez archive containing at least X and, optionally, columns, index, and batch. When batch is bundled in the archive, --batch-csv becomes optional.
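The following minimal sketch produces a CSV pair in the layout described above using only the standard library. The data values, header names (sample_id, batch, f1, f2), and file names are placeholders. This page only requires that the sample id occupy the first column, so whether header names matter to the parser is an assumption.

```python
import csv

# Toy data, illustrative values only: 3 samples x 2 features.
sample_ids = ["s1", "s2", "s3"]
feature_names = ["f1", "f2"]  # placeholder feature labels
X = [[0.5, 1.2], [0.7, 1.1], [0.4, 1.3]]
batch = ["run_A", "run_A", "run_B"]

# Feature matrix: sample id in the first column, one column per feature.
with open("toy_X.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["sample_id", *feature_names])  # header names are assumed
    for sid, row in zip(sample_ids, X):
        writer.writerow([sid, *row])

# Batch CSV: sample id first, then a single data column with the batch label.
with open("toy_batch.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["sample_id", "batch"])  # header names are assumed
    writer.writerows(zip(sample_ids, batch))
```

The pair would then be passed as -i toy_X.csv --batch-csv toy_batch.csv.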

Covariate / Auxiliary CSVs

The methods that need them read the following sidecar CSVs. Each has an id column first, followed by one or more data columns:

Flag                          Data columns
----------------------------  ---------------------------------
--species-csv                 1 (species label)
--quality-csv                 1 (non-negative scalar, SNR…)
--discrete-covariates-csv     >= 1 (categorical covariates)
--continuous-covariates-csv   >= 1 (numeric covariates)
--design-csv                  >= 1 (Limma design of interest)
--covariates-csv              >= 1 (Harmony vars_use)
--mz-csv                      1 (m/z per feature, for drift)

Indices in all sidecar CSVs should match X.index. If they do not, MaldiBatchKit falls back to positional alignment when the row counts match, and fails with a clear error otherwise.
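The alignment rule can be expressed as a short function. This is a sketch of the documented behaviour, not MaldiBatchKit's implementation, and align_sidecar is a hypothetical name. The sketch also assumes that a sidecar counts as matching when it contains every sample id in X.index, even if it carries extra rows.

```python
from typing import Hashable, List, Sequence


def align_sidecar(x_index: Sequence[Hashable],
                  side_index: Sequence[Hashable],
                  values: Sequence) -> List:
    """Reorder sidecar values so that row i belongs to x_index[i].

    Sketch of the documented rule (not MaldiBatchKit's own code):
    1. every sample id in x_index is found in the sidecar -> align by id;
    2. otherwise, equal row counts -> fall back to positional alignment;
    3. otherwise -> raise, naming the sample ids the sidecar lacks.
    """
    lookup = dict(zip(side_index, values))
    missing = [sid for sid in x_index if sid not in lookup]
    if not missing:
        return [lookup[sid] for sid in x_index]
    if len(side_index) == len(x_index):
        return list(values)  # positional fallback: sidecar row i -> X row i
    raise ValueError(f"sidecar CSV has no rows for samples: {missing}")


# Same ids in a different order: aligned by id.
print(align_sidecar(["s1", "s2"], ["s2", "s1"], ["b", "a"]))  # ['a', 'b']
```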

Usage Examples

Vanilla ComBat

maldibatchkit correct combat \
    -i X.csv --batch-csv batch.csv \
    -o X_corrected.csv

Fortin ComBat with species as a protected categorical covariate:

maldibatchkit correct combat-fortin \
    -i X.csv --batch-csv batch.csv \
    --discrete-covariates-csv species.csv \
    -o X_corrected.csv

Species-Aware Preset

This has exactly the same effect as the previous example, with less typing:

maldibatchkit correct species-combat \
    -i X.csv --batch-csv batch.csv \
    --species-csv species.csv \
    -o X_corrected.csv

Quality-Weighted ComBat

maldibatchkit correct quality-combat \
    -i X.csv --batch-csv batch.csv \
    --quality-csv snr.csv \
    --max-iter 30 \
    -o X_corrected.csv

Limma

maldibatchkit correct limma \
    -i X.csv --batch-csv batch.csv \
    --design-csv species_design.csv \
    -o X_corrected.csv

Harmony

maldibatchkit correct harmony \
    -i X.csv --batch-csv batch.csv \
    --theta 2.0 --max-iter 20 --random-state 0 \
    -o X_corrected.csv

Batch-Aware Warping

maldibatchkit correct warping \
    -i X.csv --batch-csv batch.csv \
    --method piecewise --n-segments 8 --max-shift 10 \
    -o X_warped.csv

Diagnostics

maldibatchkit diagnose \
    -i X.csv --corrected X_corrected.csv \
    --batch-csv batch.csv \
    --mz-csv mz.csv --top-k-peaks 40 \
    -o report.csv

NPZ End-to-End

Bundle X, batch, index, and columns in a single archive:

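import numpy as np

# X: pandas DataFrame (samples x features); batch: pandas Series aligned to X.index
np.savez("maldiset.npz",
         X=X.to_numpy(),
         columns=X.columns.to_numpy(),
         index=X.index.to_numpy(),
         batch=batch.to_numpy())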

Then every correct subcommand accepts the NPZ directly:

maldibatchkit correct combat-fortin \
    -i maldiset.npz \
    --discrete-covariates-csv species.csv \
    -o corrected.npz
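Before handing an archive to the CLI, it can be worth loading it back to check it. The sketch below uses toy arrays and a throwaway file name. One NumPy detail matters here: with pandas, X.columns.to_numpy() on string labels typically returns an object array, which np.savez stores via pickle. np.load then refuses to read such arrays unless allow_pickle=True. This page does not say whether MaldiBatchKit loads with allow_pickle=True, so the sketch assumes it does not and stores labels as fixed-width unicode arrays (np.asarray(X.columns, dtype=str) gives the same kind of array from pandas objects).

```python
import numpy as np

# Toy stand-ins for X, its labels and the batch vector (illustrative only).
X = np.array([[0.5, 1.2], [0.7, 1.1], [0.4, 1.3]])
index = np.asarray(["s1", "s2", "s3"], dtype=str)  # fixed-width unicode, no pickle needed
columns = np.asarray(["f1", "f2"], dtype=str)
batch = np.asarray(["run_A", "run_A", "run_B"], dtype=str)

np.savez("toy_maldiset.npz", X=X, columns=columns, index=index, batch=batch)

# Load with NumPy's default allow_pickle=False, as a cautious reader would.
with np.load("toy_maldiset.npz") as archive:
    # X should be (n_samples, n_features), consistent with index and columns.
    shape_ok = archive["X"].shape == (len(archive["index"]), len(archive["columns"]))
    has_batch = "batch" in archive.files  # a bundled batch makes --batch-csv optional
    loaded_batch = archive["batch"].tolist()

print(shape_ok, has_batch, loaded_batch)
# True True ['run_A', 'run_A', 'run_B']
```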

Refusal / Error Modes

  • combat-fortin and combat-chen refuse to run without any covariate CSV and suggest plain combat instead, since Fortin and CovBat without covariates reduce to Johnson ComBat.

  • species-combat without --species-csv and quality-combat without --quality-csv refuse to run.

  • When the indices of X and a sidecar CSV do not match and their row counts differ, the command fails with a clear error identifying which rows are missing, rather than silently realigning.
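The refusal rules above amount to a small lookup table checked before any correction runs. The sketch below encodes them; precheck and the flag tables are hypothetical names, not MaldiBatchKit code. It also assumes that only --discrete-covariates-csv and --continuous-covariates-csv count as "covariate CSVs" for combat-fortin and combat-chen, based on the sidecar table.

```python
from typing import Optional

# Assumed set of flags that satisfy the "any covariate CSV" requirement.
COVARIATE_FLAGS = ("--discrete-covariates-csv", "--continuous-covariates-csv")

# Methods that need every listed flag.
REQUIRED_ALL = {
    "species-combat": ("--species-csv",),
    "quality-combat": ("--quality-csv",),
}

# Methods that need at least one of the listed flags.
REQUIRED_ANY = {
    "combat-fortin": COVARIATE_FLAGS,
    "combat-chen": COVARIATE_FLAGS,
}


def precheck(method: str, flags: set) -> Optional[str]:
    """Return a refusal message, or None if the command may proceed."""
    for flag in REQUIRED_ALL.get(method, ()):
        if flag not in flags:
            return f"{method} requires {flag}"
    any_of = REQUIRED_ANY.get(method)
    if any_of and not flags.intersection(any_of):
        return (f"{method} needs at least one of {', '.join(any_of)}; "
                "without covariates, use plain combat instead")
    return None


print(precheck("combat-fortin", {"--batch-csv"}))
```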

See the Quickstart Guide for the matching Python API and the API Reference for full class documentation.