Visualization Module
Plotting helpers for before/after batch-effect inspection. All plotting
functions import matplotlib lazily, so matplotlib is only required when a
plot function is actually called.
plot_batch_umap additionally requires umap-learn, which can be installed
via the viz extra: pip install maldibatchkit[viz].
UMAP Before/After
- maldibatchkit.viz.plot_batch_umap(before, after, batch, *, color_by='batch', species=None, random_state=42, pca_preprocess=50, ax=None)
Plot UMAP embeddings of X before and after batch correction.
- Parameters:
before (DataFrame | Series | ndarray[tuple[Any, ...], dtype[Any]]) – Feature matrix prior to correction.
after (DataFrame | Series | ndarray[tuple[Any, ...], dtype[Any]]) – Feature matrix after correction. Must share before’s shape.
batch (DataFrame | Series | ndarray[tuple[Any, ...], dtype[Any]]) – Batch labels.
color_by (str) – Which label drives colouring. 'species' requires passing species explicitly.
species (DataFrame | Series | ndarray[tuple[Any, ...], dtype[Any]] | None) – Species labels. Required when color_by='species'.
random_state (int) – Seed for UMAP.
pca_preprocess (int | None) – If not None, reduce to this many PCs before UMAP. Gives a large speed-up on typical MALDI-TOF matrices without hurting the plot qualitatively.
ax (tuple[Any, Any] | None) – Two axes to draw on. If None, a new 1x2 figure is created.
- Returns:
fig (matplotlib.figure.Figure) – The figure used (or the parent figure of the provided axes).
axes (tuple of matplotlib.axes.Axes) – The two axes that were drawn on.
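The rule that color_by='species' needs an explicit species argument could be enforced along these lines. The helper name `resolve_color_labels` is hypothetical, and both the length check and the rejection of other color_by values are assumptions about sensible behaviour, not documented behaviour of plot_batch_umap.

```python
def resolve_color_labels(color_by, batch, species=None):
    """Pick the label sequence that drives point colours.

    Hypothetical helper; only the 'species requires species' rule comes
    from the parameter description, the rest is assumed.
    """
    if color_by == "batch":
        return list(batch)
    if color_by == "species":
        if species is None:
            raise ValueError("color_by='species' requires the species argument")
        labels = list(species)
        if len(labels) != len(batch):
            # Assumed check: one species label per sample.
            raise ValueError("species and batch must have the same length")
        return labels
    # Assumed: any other value is rejected.
    raise ValueError(f"unknown color_by value: {color_by!r}")
```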
Peak-Shape Overlay
- maldibatchkit.viz.plot_peak_shift(batches, X, reference=None, *, mz_values=None, ax=None, max_batches=6)
Overlay per-batch median spectra against a reference spectrum.
- Parameters:
batches (DataFrame | Series | ndarray[tuple[Any, ...], dtype[Any]]) – Batch labels.
X (DataFrame | Series | ndarray[tuple[Any, ...], dtype[Any]]) – Binned intensities.
reference (DataFrame | Series | ndarray[tuple[Any, ...], dtype[Any]] | None) – Reference spectrum. If None, the global median across all rows is used.
mz_values (DataFrame | Series | ndarray[tuple[Any, ...], dtype[Any]] | None) – m/z coordinates for the x-axis. Defaults to column positions.
ax (Any | None) – Axis to draw on. If None, a new figure is created.
max_batches (int | None) – Maximum number of batches to overlay. Oldest ties broken alphabetically. Set to None to draw every batch (slow on many-batch studies).
- Returns:
fig, ax
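The quantities that plot_peak_shift overlays can be illustrated with a small standard-library sketch. It assumes "median spectrum" means the column-wise median over the rows of a batch, and that the default reference is the column-wise median over all rows; the function names and the toy data are hypothetical, and the real implementation presumably works on arrays instead of lists.

```python
from collections import defaultdict
from statistics import median


def column_medians(rows):
    """Median of each column (bin) across equal-length rows."""
    return [median(col) for col in zip(*rows)]


def batch_median_spectra(batches, X, reference=None):
    """Return (reference, {batch label: median spectrum}).

    Assumption: with reference=None, the column-wise median over all
    rows of X serves as the reference spectrum.
    """
    rows_by_batch = defaultdict(list)
    for label, row in zip(batches, X):
        rows_by_batch[label].append(row)
    per_batch = {
        label: column_medians(rows)
        for label, rows in sorted(rows_by_batch.items())
    }
    if reference is None:
        reference = column_medians(X)
    return reference, per_batch


# Toy data: four spectra with three bins each, two batches.
batches = ["A", "A", "B", "B"]
X = [
    [1.0, 5.0, 2.0],
    [3.0, 5.0, 2.0],
    [2.0, 9.0, 4.0],
    [4.0, 7.0, 4.0],
]
reference, per_batch = batch_median_spectra(batches, X)
```

Plotting each entry of `per_batch` against `reference` on a shared x-axis gives the kind of overlay the function describes.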
Diagnostic Summary
- maldibatchkit.viz.plot_diagnostic_summary(report_df, *, scope='overall', ncols=None, figsize_per_plot=(3.2, 3.0), axes=None)
Plot before/after diagnostic values, one subplot per metric.
- Parameters:
report_df (DataFrame) – Output of maldibatchkit.diagnostics.diagnostic_report().
scope (Union[str, Iterable[str]]) – Which slice(s) of the report to plot. Pass a single scope name ("overall", "batch_00", …) to render one pair of bars per metric, or a list (["batch_00", "batch_01"]) to group bars by scope inside each metric’s subplot.
ncols (int | None) – Number of subplot columns. Defaults to min(n_metrics, 4).
figsize_per_plot (tuple[float, float]) – Width, height of each metric’s subplot (inches). The returned figure has size (ncols * w, nrows * h).
axes (Any | None) – Pre-built axes grid. Must have at least n_metrics entries when flattened.
- Returns:
fig (matplotlib.figure.Figure)
axes (np.ndarray of matplotlib.axes.Axes) – Flattened array of the axes actually used. Unused slots from a non-rectangular grid are turned off.
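The layout arithmetic for the subplot grid can be worked through in a few lines. The default ncols = min(n_metrics, 4) and the figure size (ncols * w, nrows * h) come from the parameter descriptions above; the row count nrows = ceil(n_metrics / ncols) and the helper name `grid_layout` are assumptions made for this sketch.

```python
import math


def grid_layout(n_metrics, ncols=None, figsize_per_plot=(3.2, 3.0)):
    """Return (nrows, ncols, figsize, n_unused) for n_metrics subplots."""
    if n_metrics < 1:
        raise ValueError("need at least one metric")
    if ncols is None:
        ncols = min(n_metrics, 4)          # documented default
    nrows = math.ceil(n_metrics / ncols)   # assumed: just enough rows
    w, h = figsize_per_plot
    figsize = (ncols * w, nrows * h)       # documented figure size
    n_unused = nrows * ncols - n_metrics   # slots that would be turned off
    return nrows, ncols, figsize, n_unused


# Six metrics with the defaults: a 2 x 4 grid with two unused slots.
nrows, ncols, figsize, n_unused = grid_layout(6)
```

Under these assumptions, six metrics yield a 2 x 4 grid of roughly 12.8 x 6.0 inches, with the last two axes left unused.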
Example
from maldibatchkit import SpeciesAwareComBat
from maldibatchkit.diagnostics import diagnostic_report
from maldibatchkit.viz import (
    plot_batch_umap,
    plot_peak_shift,
    plot_diagnostic_summary,
)

# X (binned intensities), batch, species and mz are assumed to be loaded already
corrector = SpeciesAwareComBat(batch=batch, species=species)
X_corrected = corrector.fit_transform(X)

# Side-by-side UMAP of raw vs. corrected matrices
plot_batch_umap(X, X_corrected, batch, random_state=0)

# Per-batch median spectra overlaid on a reference
plot_peak_shift(batch, X_corrected, mz_values=mz)

# Before/after bar chart built from a diagnostic_report DataFrame
report = diagnostic_report(X, X_corrected, batch)
plot_diagnostic_summary(report, scope="overall")