CLI Reference
=============
MaldiBatchKit ships a command-line interface built on `Typer `_,
organised around two top-level commands: ``correct`` (apply a
batch-correction method to a feature matrix) and ``diagnose`` (run the
full diagnostic suite on a before/after pair).

.. code-block:: text

   maldibatchkit
   ├── correct
   │   ├── combat             (Johnson 2007)
   │   ├── combat-fortin      (Fortin 2018, covariate-aware)
   │   ├── combat-chen        (Chen 2022, CovBat)
   │   ├── species-combat     (Fortin preset with species)
   │   ├── quality-combat     (weighted empirical-Bayes)
   │   ├── limma              (Ritchie 2015)
   │   ├── harmony            (Korsunsky 2019)
   │   ├── median-center
   │   ├── zscore-per-batch
   │   ├── reference-scaling
   │   └── warping            (BatchAwareWarping)
   └── diagnose

Every ``correct`` subcommand shares the same
``-i / -o / --batch-csv`` contract and adds only the flags that matter
for its method. Run ``maldibatchkit correct <method> --help`` to see the
full option list for any corrector.
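
Because the contract is shared, scripting several correctors from Python
mostly comes down to swapping the method name. The sketch below assembles
the argument list using only flags shown on this page;
``build_correct_command`` is a hypothetical helper written for
illustration, not part of MaldiBatchKit.

.. code-block:: python

   import shlex


   def build_correct_command(method, input_path, output_path,
                             batch_csv=None, extra=()):
       """Hypothetical helper: assemble a ``maldibatchkit correct`` call."""
       cmd = ["maldibatchkit", "correct", method, "-i", str(input_path)]
       if batch_csv is not None:
           # Optional here because an NPZ input may already bundle ``batch``.
           cmd += ["--batch-csv", str(batch_csv)]
       cmd += list(extra)  # method-specific flags, e.g. --quality-csv
       cmd += ["-o", str(output_path)]
       return cmd


   cmd = build_correct_command(
       "quality-combat", "X.csv", "X_corrected.csv",
       batch_csv="batch.csv", extra=["--quality-csv", "snr.csv"],
   )
   print(shlex.join(cmd))

Passing the list to ``subprocess.run(cmd, check=True)`` would execute it,
provided ``maldibatchkit`` is on the ``PATH``.
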
Command Reference
-----------------

.. click:: maldibatchkit.cli:typer_click_object
   :prog: maldibatchkit
   :nested: full

Input / Output Formats
----------------------

* **CSV** - first column is the sample index, remaining columns are
  features. A companion ``--batch-csv`` is required (single data column;
  first column is the sample id).
* **NPZ** - a ``np.savez`` archive with at least ``X``, and optionally
  ``columns``, ``index``, and ``batch``. When ``batch`` is bundled in
  the archive, ``--batch-csv`` becomes optional.

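
As an illustration of the CSV layout, the following sketch writes a small
feature matrix and its companion batch file with pandas, then reads both
back. The sample ids, feature names, and the presence of a header row are
assumptions made for the example; this page does not specify them.

.. code-block:: python

   import tempfile
   from pathlib import Path

   import numpy as np
   import pandas as pd

   out_dir = Path(tempfile.mkdtemp())  # scratch directory for the example

   rng = np.random.default_rng(0)
   samples = [f"sample_{i}" for i in range(6)]
   features = [f"mz_{j}" for j in range(4)]

   # Feature matrix: sample ids as the index, one column per feature.
   X = pd.DataFrame(rng.random((6, 4)), index=samples, columns=features)
   # Batch labels: the same sample ids, a single data column.
   batch = pd.Series(["A", "A", "A", "B", "B", "B"], index=samples, name="batch")

   X.to_csv(out_dir / "X.csv")          # first column = sample index
   batch.to_csv(out_dir / "batch.csv")  # first column = sample id

   # Round-trip check: both files parse with the first column as the index.
   X_back = pd.read_csv(out_dir / "X.csv", index_col=0)
   batch_back = pd.read_csv(out_dir / "batch.csv", index_col=0)
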
Covariate / Auxiliary CSVs
--------------------------
The following sidecar CSVs are used by the methods that need them. Each
has a sample-id column first and one or more data columns after:

+---------------------------------+---------------------------------+
| Flag                            | Accepted columns                |
+=================================+=================================+
| ``--species-csv``               | 1 (species label)               |
+---------------------------------+---------------------------------+
| ``--quality-csv``               | 1 (non-negative scalar, SNR...) |
+---------------------------------+---------------------------------+
| ``--discrete-covariates-csv``   | >= 1 (categorical covariates)   |
+---------------------------------+---------------------------------+
| ``--continuous-covariates-csv`` | >= 1 (numeric covariates)       |
+---------------------------------+---------------------------------+
| ``--design-csv``                | >= 1 (Limma design of interest) |
+---------------------------------+---------------------------------+
| ``--covariates-csv``            | >= 1 (Harmony vars_use)         |
+---------------------------------+---------------------------------+
| ``--mz-csv``                    | 1 (m/z per feature, for drift)  |
+---------------------------------+---------------------------------+

Indices in all sidecar CSVs should match ``X.index``. If they do not,
MaldiBatchKit falls back to positional alignment when the row counts
match, and fails with a clear error otherwise.
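
One way to read the alignment rule above is sketched below with pandas.
``align_sidecar`` is a hypothetical helper, not MaldiBatchKit's
implementation, and treating a reordered but complete set of sample ids as
a label match is an assumption of this sketch.

.. code-block:: python

   import pandas as pd


   def align_sidecar(x_index: pd.Index, sidecar: pd.DataFrame) -> pd.DataFrame:
       """Hypothetical helper: align a sidecar table to the sample index of X."""
       # Case 1: every sample id of X is present in the sidecar -> align by label.
       if sidecar.index.is_unique and x_index.isin(sidecar.index).all():
           return sidecar.loc[x_index]
       # Case 2: labels differ but row counts match -> positional fallback.
       if len(sidecar) == len(x_index):
           aligned = sidecar.copy()
           aligned.index = x_index
           return aligned
       # Case 3: neither works -> fail and name the missing sample ids.
       missing = x_index.difference(sidecar.index).tolist()
       raise ValueError(
           f"sidecar has {len(sidecar)} rows but X has {len(x_index)}; "
           f"missing sample ids: {missing}"
       )


   x_index = pd.Index(["s1", "s2", "s3"])

   # Same ids in a different order: rows are matched by label.
   shuffled = pd.DataFrame({"species": ["b", "a", "c"]}, index=["s2", "s1", "s3"])
   by_label = align_sidecar(x_index, shuffled)

   # No usable ids (default integer index) but three rows: matched by position.
   unlabeled = pd.DataFrame({"species": ["a", "b", "c"]})
   by_position = align_sidecar(x_index, unlabeled)

A sidecar with only two of the three rows would hit the third case and
raise a ``ValueError`` that lists the missing id.
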
Usage Examples
--------------
Vanilla ComBat
^^^^^^^^^^^^^^

.. code-block:: bash

   maldibatchkit correct combat \
       -i X.csv --batch-csv batch.csv \
       -o X_corrected.csv

Fortin ComBat with species as a protected categorical covariate:

.. code-block:: bash

   maldibatchkit correct combat-fortin \
       -i X.csv --batch-csv batch.csv \
       --discrete-covariates-csv species.csv \
       -o X_corrected.csv

Species-Aware Preset
^^^^^^^^^^^^^^^^^^^^

Equivalent to the previous example, with less typing:

.. code-block:: bash

   maldibatchkit correct species-combat \
       -i X.csv --batch-csv batch.csv \
       --species-csv species.csv \
       -o X_corrected.csv

Quality-Weighted ComBat
^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   maldibatchkit correct quality-combat \
       -i X.csv --batch-csv batch.csv \
       --quality-csv snr.csv \
       --max-iter 30 \
       -o X_corrected.csv

Limma
^^^^^

.. code-block:: bash

   maldibatchkit correct limma \
       -i X.csv --batch-csv batch.csv \
       --design-csv species_design.csv \
       -o X_corrected.csv

Harmony
^^^^^^^

.. code-block:: bash

   maldibatchkit correct harmony \
       -i X.csv --batch-csv batch.csv \
       --theta 2.0 --max-iter 20 --random-state 0 \
       -o X_corrected.csv

Batch-Aware Warping
^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   maldibatchkit correct warping \
       -i X.csv --batch-csv batch.csv \
       --method piecewise --n-segments 8 --max-shift 10 \
       -o X_warped.csv

Diagnostics
^^^^^^^^^^^

.. code-block:: bash

   maldibatchkit diagnose \
       -i X.csv --corrected X_corrected.csv \
       --batch-csv batch.csv \
       --mz-csv mz.csv --top-k-peaks 40 \
       -o report.csv

NPZ End-to-End
^^^^^^^^^^^^^^

Bundle ``X``, ``batch``, ``index``, and ``columns`` in a single archive:

.. code-block:: python

   import numpy as np

   # X is a samples x features DataFrame; batch is a Series on the same index.
   np.savez(
       "maldiset.npz",
       X=X.to_numpy(),
       columns=X.columns.to_numpy(),
       index=X.index.to_numpy(),
       batch=batch.to_numpy(),
   )

Then every ``correct`` subcommand accepts the NPZ directly:

.. code-block:: bash

   maldibatchkit correct combat-fortin \
       -i maldiset.npz \
       --discrete-covariates-csv species.csv \
       -o corrected.npz

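
To inspect the result in Python, the archive can be turned back into a
DataFrame. The sketch below assumes the output archive uses the same keys
as the input layout (``X``, plus optional ``index`` and ``columns``); this
page does not state the output keys, so treat that as an assumption.
``npz_to_frame`` is a hypothetical helper, and the small archive written
here only stands in for ``corrected.npz`` so that the example runs on its
own.

.. code-block:: python

   import tempfile
   from pathlib import Path

   import numpy as np
   import pandas as pd


   def npz_to_frame(path):
       """Hypothetical helper: rebuild a DataFrame from an X/index/columns archive."""
       with np.load(path, allow_pickle=False) as archive:
           X = archive["X"]
           # ``index`` and ``columns`` are optional keys; fall back to defaults.
           index = archive["index"] if "index" in archive.files else None
           columns = archive["columns"] if "columns" in archive.files else None
       return pd.DataFrame(X, index=index, columns=columns)


   # Stand-in for corrected.npz (assumed layout, dummy values).
   path = Path(tempfile.mkdtemp()) / "corrected.npz"
   np.savez(
       path,
       X=np.zeros((2, 3)),
       index=np.array(["s1", "s2"]),
       columns=np.array(["mz_1", "mz_2", "mz_3"]),
   )
   frame = npz_to_frame(path)

If the archive stores labels as object arrays, which
``X.columns.to_numpy()`` typically produces for string labels, ``np.load``
needs ``allow_pickle=True``; enable that only for files from a trusted
source.
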
Refusal / Error Modes
---------------------

* ``combat-fortin`` / ``combat-chen`` without any covariate CSV refuse
  to run and suggest plain ``combat`` instead, since Fortin / CovBat
  without covariates reduce to Johnson.
* ``species-combat`` without ``--species-csv`` and ``quality-combat``
  without ``--quality-csv`` refuse to run.
* Index mismatches between ``X`` and a sidecar CSV that the positional
  fallback cannot resolve (that is, the row counts differ) produce a
  clear error identifying which rows are missing.

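
When driving the CLI from a script, the refusal rules above can be checked
before launching a run. The following is a minimal sketch; ``preflight``
and the ``REQUIRES_ONE_OF`` table are hypothetical, and the choice of which
flags count as a "covariate CSV" for ``combat-fortin`` and ``combat-chen``
is an assumption based on the flags listed on this page.

.. code-block:: python

   # Assumed mapping: method -> flags of which at least one must be given.
   REQUIRES_ONE_OF = {
       "combat-fortin": ("--discrete-covariates-csv", "--continuous-covariates-csv"),
       "combat-chen": ("--discrete-covariates-csv", "--continuous-covariates-csv"),
       "species-combat": ("--species-csv",),
       "quality-combat": ("--quality-csv",),
   }


   def preflight(method, flags):
       """Return an error message if a documented refusal applies, else None."""
       required = REQUIRES_ONE_OF.get(method)
       if required and not any(flag in flags for flag in required):
           return f"{method} needs one of: {', '.join(required)}"
       return None


   # Refused: Fortin ComBat with no covariate CSV.
   problem = preflight("combat-fortin", {"-i", "--batch-csv", "-o"})
   # Accepted: the species preset with its required sidecar.
   ok = preflight("species-combat", {"-i", "--batch-csv", "--species-csv", "-o"})

The CLI remains the authority on what it accepts; a check like this only
saves a failed subprocess call.
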
See the :doc:`Quickstart Guide ` for the matching Python API
and the :doc:`API Reference ` for full class documentation.