Choosing a corrector
====================

MaldiBatchKit ships several correctors with overlapping use cases. The
v0.2 ``AutoCorrector`` and ``BatchCorrectionBenchmark`` classes are the
two recommended tools for picking among them, depending on whether the
deciding signal is **downstream task performance** or **post-correction
diagnostics**.

.. contents::
   :local:

When to use which
-----------------

* Use :class:`~maldibatchkit.AutoCorrector` + ``GridSearchCV`` when the
  question is *"which corrector gives the best AMR classifier?"*. The
  scorer is your downstream metric (typically AUROC), and the split
  defines the generalisation goal.
* Use :class:`~maldibatchkit.diagnostics.BatchCorrectionBenchmark` when
  the question is *"how well does each method mix batches and preserve
  species?"*. The output is a tidy table suitable for paper figures and
  side-by-side reporting; it does **not** simulate a downstream
  classifier.

Neither tool simulates a *new clinical site*: both rely on every batch
being present at fit time for the inter-batch correctors we ship.

AutoCorrector with GridSearchCV
-------------------------------

``AutoCorrector`` exposes ``method`` as a settable hyperparameter that
swaps the inner corrector at ``fit`` time. Wrap it in a
:class:`sklearn.pipeline.Pipeline` and let ``GridSearchCV`` sweep the
method together with any classifier hyperparameters:

.. code-block:: python

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.pipeline import Pipeline

    from maldibatchkit import AutoCorrector

    # X: (n_samples, n_features) DataFrame; batch & species aligned to X.index
    pipe = Pipeline([
        ("correct", AutoCorrector(batch=batch, discrete_covariates=species)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    param_grid = {
        "correct__method": [
            "noop",            # honest baseline
            "median",
            "combat-fortin",
            "combat-johnson",
            "harmony",
            "qw-combat",       # accepts `quality=`
        ],
        "clf__C": [0.1, 1.0, 10.0],
    }

    grid = GridSearchCV(
        pipe,
        param_grid=param_grid,
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    )
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)

Covariate routing
^^^^^^^^^^^^^^^^^

``AutoCorrector`` accepts ``discrete_covariates``,
``continuous_covariates``, ``quality``, ``species``, and
``reference_batch`` and forwards each to the inner method *only when the
inner accepts that name*. The mapping table is:

============================ ==========================================
``AutoCorrector`` argument   Forwarded as ...
============================ ==========================================
``discrete_covariates``      ComBat ``discrete_covariates``,
                             Harmony ``covariates``, Limma ``design``
``continuous_covariates``    ComBat ``continuous_covariates``
``quality``                  ``QualityWeightedComBat.quality``
``species``                  ``SpeciesAwareComBat.species``
``reference_batch``          Any inner that accepts ``reference_batch``
============================ ==========================================

Pass anything else through ``method_kwargs={...}``. Unrecognised entries
are dropped silently (sklearn convention), so a single ``param_grid``
can target several methods.

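
The "forward only when the inner accepts that name" rule can be
illustrated with a small signature-based filter. This is a sketch of
the general idea, not MaldiBatchKit's implementation: ``route_kwargs``
and the ``Toy*`` classes are hypothetical stand-ins invented for the
example, and the ``rename`` argument mimics one row of the mapping
table.

.. code-block:: python

    import inspect


    def route_kwargs(inner_cls, candidates, rename=None):
        """Keep only the candidates that inner_cls.__init__ accepts by name."""
        rename = rename or {}
        accepted = set(inspect.signature(inner_cls.__init__).parameters) - {"self"}
        routed = {}
        for name, value in candidates.items():
            target = rename.get(name, name)  # e.g. discrete_covariates -> covariates
            if value is not None and target in accepted:
                routed[target] = value
        return routed


    # Hypothetical stand-ins; the real correctors have richer signatures.
    class ToyComBat:
        def __init__(self, batch, discrete_covariates=None, reference_batch=None):
            pass


    class ToyHarmony:
        def __init__(self, batch, covariates=None):
            pass


    class ToyMedian:
        def __init__(self, batch):
            pass


    candidates = {
        "discrete_covariates": ["E. coli", "K. pneumoniae"],
        "quality": None,               # not supplied by the caller
        "reference_batch": "site_A",
    }

    combat_kwargs = route_kwargs(ToyComBat, candidates)
    harmony_kwargs = route_kwargs(
        ToyHarmony, candidates, rename={"discrete_covariates": "covariates"}
    )
    median_kwargs = route_kwargs(ToyMedian, candidates)

    print(combat_kwargs)
    print(harmony_kwargs)
    print(median_kwargs)

Filtering on the constructor signature lets one set of arguments serve
every method in a grid: a corrector that has no use for ``quality`` or
``reference_batch`` simply never sees them.
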
BatchCorrectionBenchmark
------------------------

``BatchCorrectionBenchmark`` is the diagnostic counterpart: configure
once, then ``.fit(X, batch=..., species=...)`` to run every corrector
under the chosen protocol.

.. code-block:: python

    from maldibatchkit import ComBat, Harmony, Limma, NoOpCorrector
    from maldibatchkit.diagnostics import BatchCorrectionBenchmark

    bench = BatchCorrectionBenchmark(
        correctors={
            "none": NoOpCorrector(batch=batch),
            "limma": Limma(batch=batch, design=species_dummies),
            "fortin": ComBat(batch=batch, method="fortin",
                             discrete_covariates=species),
            "harmony": Harmony(batch=batch, covariates=species, verbose=False),
        },
        metrics=("kbet", "lisi_normalized", "species_preservation",
                 "tic_cov_per_batch"),
        n_bootstrap=500,
        random_state=0,
    )
    bench.fit(X, batch=batch, species=species)
    print(bench.rank(by="species_preservation"))
    bench.plot()  # bar facet per metric, error bars from the bootstrap CIs

Protocols
^^^^^^^^^

``protocol="full_data"`` (default) follows the Büttner-2019 convention:
fit each corrector on all rows, score on the corrected matrix. This is
the right setting for paper figures.

``protocol="stratified_split"`` holds out a stratified test fold per
repeat, fits each corrector on train, transforms train + test, and
scores on the held-out rows. Useful for checking that a method's
benefits survive on unseen samples.

Aggregation
^^^^^^^^^^^

Two tables are always available after ``fit``:

* ``bench.results_long_``: one row per (method, metric, repeat,
  bootstrap iteration). Use it to compute custom statistics or to drive
  a raincloud / strip plot.
* ``bench.results_``: one row per (method, metric) with ``value``
  (mean), ``ci_lo`` / ``ci_hi`` (2.5 / 97.5 percentile), ``std``, ``n``,
  and ``better`` (``'higher'``, ``'lower'``, or ``'n/a'`` for user
  callables). The interval columns hold ``NaN`` when fewer than two
  bootstraps and fewer than three repeats are available; don't read a
  CI from one point.

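
The relationship between the long and the summary table can be sketched
with plain pandas. The frame below is synthetic, and its column names
(``method``, ``metric``, ``value``) are assumptions about what a long
table of this kind might contain rather than the documented schema of
``bench.results_long_``; the helpers ``q025`` and ``q975`` are likewise
made up for the example.

.. code-block:: python

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)

    # Synthetic long table: 200 bootstrap scores for each of two methods.
    long = pd.DataFrame({
        "method": np.repeat(["none", "fortin"], 200),
        "metric": "species_preservation",
        "value": np.concatenate([
            rng.normal(0.70, 0.02, 200),
            rng.normal(0.85, 0.02, 200),
        ]),
    })


    def q025(s):
        """Lower bound of a 95% percentile interval."""
        return s.quantile(0.025)


    def q975(s):
        """Upper bound of a 95% percentile interval."""
        return s.quantile(0.975)


    # Collapse to one row per (method, metric).
    summary = (
        long.groupby(["method", "metric"])
        .agg(
            value=("value", "mean"),
            ci_lo=("value", q025),
            ci_hi=("value", q975),
            std=("value", "std"),
            n=("value", "size"),
        )
        .reset_index()
    )
    print(summary)

The same ``groupby`` pattern extends to other statistics (a median or a
different interval width, say) when the summary columns are not the
ones a figure needs.
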
``bench.baseline_`` is a parallel one-row-per-metric table computed on
the **uncorrected** ``X`` and is useful as a reference line in plots.

Bootstrap modes
^^^^^^^^^^^^^^^

``bootstrap_mode="resample_metric"`` (default, fast) fits each corrector
once and resamples rows of the corrected matrix to give the CI. Use it
when you want CIs that reflect metric sampling noise on a fixed fit.

``bootstrap_mode="refit"`` resamples rows of ``X`` (stratified by batch)
and refits the corrector on each bootstrap. Slower (``n_correctors ×
n_bootstrap`` extra fits per repeat), but the CI also captures corrector
stability. Use it when you want to advertise a method's reproducibility.
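
The difference between the two modes can be made concrete with a toy
NumPy example. Everything here is illustrative: the data are synthetic,
the "corrector" is per-batch median centring, the metric is the spread
of per-batch means, and none of the helper names belong to
MaldiBatchKit. Resampling within each batch in *both* loops, and scoring
the refit loop on the bootstrap sample itself, are assumptions of this
sketch.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    n = 120
    batch = np.repeat([0, 1, 2], n // 3)
    X = rng.normal(size=(n, 5)) + 0.5 * batch[:, None]  # synthetic batch shift


    def fit_center(X, batch):
        """Toy corrector fit: one per-batch feature median."""
        return {b: np.median(X[batch == b], axis=0) for b in np.unique(batch)}


    def transform(X, batch, medians):
        """Subtract each row's batch median."""
        return X - np.stack([medians[b] for b in batch])


    def batch_gap(Xc, batch):
        """Toy metric (lower is better): spread of the per-batch means."""
        means = np.stack([Xc[batch == b].mean(axis=0) for b in np.unique(batch)])
        return float(means.std(axis=0).mean())


    def stratified_indices(batch, rng):
        """Resample row indices with replacement inside each batch."""
        return np.concatenate([
            rng.choice(np.flatnonzero(batch == b),
                       size=int((batch == b).sum()), replace=True)
            for b in np.unique(batch)
        ])


    n_bootstrap = 200

    # resample_metric style: fit once, resample rows of the corrected matrix.
    Xc = transform(X, batch, fit_center(X, batch))
    fixed_fit = []
    for _ in range(n_bootstrap):
        idx = stratified_indices(batch, rng)
        fixed_fit.append(batch_gap(Xc[idx], batch[idx]))

    # refit style: resample rows of X, refit the corrector, then score.
    refit = []
    for _ in range(n_bootstrap):
        idx = stratified_indices(batch, rng)
        Xb, bb = X[idx], batch[idx]
        refit.append(batch_gap(transform(Xb, bb, fit_center(Xb, bb)), bb))

    ci_fixed = np.percentile(fixed_fit, [2.5, 97.5])
    ci_refit = np.percentile(refit, [2.5, 97.5])
    print("fixed fit CI:", ci_fixed)
    print("refit CI:    ", ci_refit)

The first loop calls ``fit_center`` once in total, the second once per
bootstrap, which is where the extra cost of a refit-style bootstrap
comes from.
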