Decoupling template generation and alignment

When performing template-based alignment, one often uses the same subjects both to generate the template and to align to it. However, this can lead to overfitting and overly optimistic results (see Jeganathan et al. [1]). In this example, we illustrate how out-of-sample template generation can be performed seamlessly using fmralign.

To run this example, you must launch IPython via ipython --matplotlib in a terminal, or use jupyter-notebook.

Retrieve the data

In this example we use the IBC dataset, which includes a large number of different contrast maps for 12 subjects. We download the images for subjects sub-01, sub-02 and sub-04 (or retrieve them if they were already downloaded).

from nilearn.maskers import NiftiMasker

from fmralign.fetch_example_data import fetch_ibc_subjects_contrasts

subjects = ["sub-01", "sub-02", "sub-04"]
files, df, mask = fetch_ibc_subjects_contrasts(subjects)
masker = NiftiMasker(mask_img=mask).fit()
[get_dataset_dir] Dataset found in /home/runner/nilearn_data/ibc

Generating an in-sample template

First, we generate a single template using all subjects. This is the standard approach, and the fastest in terms of the number of alignments to perform, since only one template is generated for the whole group. We split the data into alignment data (task “archi_standard”) and test data (task “archi_spatial”).

from fmralign import GroupAlignment
from fmralign.embeddings.parcellation import get_labels

X_alignment = {
    sub: masker.transform(
        df[(df.subject == sub) & (df.task == "archi_standard")].path
    )
    for sub in subjects
}
X_test = {
    sub: masker.transform(
        df[(df.subject == sub) & (df.task == "archi_spatial")].path
    )
    for sub in subjects
}

# Use only the first subject's images to speed up the computation of the labels
labels = get_labels(
    masker.inverse_transform(X_alignment["sub-01"]),
    n_pieces=150,
    masker=masker,
)

population_algo = GroupAlignment("procrustes", labels=labels)
population_algo.fit(X_alignment, y="template")
/home/runner/work/fmralign/fmralign/fmralign/embeddings/parcellation.py:82: UserWarning: Overriding provided-default estimator parameters with provided masker parameters :
Parameter mask_strategy :
    Masker parameter background - overriding estimator parameter epi
Parameter smoothing_fwhm :
    Masker parameter None - overriding estimator parameter 4.0

  parcellation.fit(images_to_parcel)
/home/runner/work/fmralign/fmralign/.venv/lib/python3.12/site-packages/sklearn/cluster/_agglomerative.py:321: UserWarning: the number of connected components of the connectivity matrix is 35 > 1. Completing it to avoid stopping the tree early.
  connectivity, n_connected_components = _fix_connectivity(
/home/runner/work/fmralign/fmralign/fmralign/alignment/utils.py:212: UserWarning:
 Some parcels are more than 1000 voxels wide it can slow down alignment,especially optimal_transport :
 parcel 1 : 1011 voxels
 parcel 7 : 1315 voxels
 parcel 30 : 1696 voxels
 parcel 43 : 1846 voxels
  warnings.warn(warning)
GroupAlignment(labels=array([96, 96, 86, ..., 21, 43, 43]), method='procrustes')
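The idea behind a parcel-wise Procrustes alignment to a group template can be sketched in plain NumPy. The snippet below is a minimal illustration on random toy data, not fmralign's implementation: the template is assumed to be a simple average of the subjects, and the helper names procrustes_rotation and piecewise_align are hypothetical.

```python
import numpy as np


def procrustes_rotation(source, target):
    """Orthogonal matrix R minimizing ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def piecewise_align(source, target, labels):
    """Align source to target independently within each parcel."""
    aligned = np.empty_like(source)
    for parcel in np.unique(labels):
        idx = labels == parcel
        rotation = procrustes_rotation(source[:, idx], target[:, idx])
        aligned[:, idx] = source[:, idx] @ rotation
    return aligned


# Toy data (assumption): 30 samples, 12 voxels grouped into 3 parcels.
rng = np.random.default_rng(0)
toy_labels = np.repeat([0, 1, 2], 4)
toy_data = {
    sub: rng.standard_normal((30, 12))
    for sub in ("sub-01", "sub-02", "sub-04")
}

# Assumption: the in-sample template is the plain mean over all subjects.
template = np.mean(list(toy_data.values()), axis=0)
toy_aligned = {
    sub: piecewise_align(x, template, toy_labels)
    for sub, x in toy_data.items()
}
```

Because every subject contributes to the template it is then aligned to, the fit between aligned data and template is partly circular, which is the concern that motivates the out-of-sample approach.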


Generating an out-of-sample template

Now, we generate templates in a leave-one-subject-out fashion. For each subject, we generate a template using all other subjects, and align the left-out subject to this template. This ensures that template generation is decoupled from the alignment between each subject and its template.

loso_algo = GroupAlignment("procrustes", labels=labels)
loso_algo.fit(X_alignment, y="leave_one_subject_out")
GroupAlignment(labels=array([96, 96, 86, ..., 21, 43, 43]), method='procrustes')
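The leave-one-subject-out bookkeeping can be illustrated with a small NumPy sketch. It assumes, purely for illustration, that a template is the plain mean of the contributing subjects; fmralign's actual template estimation may differ, and leave_one_out_templates is a hypothetical helper.

```python
import numpy as np


def leave_one_out_templates(data):
    """Map each subject to the mean of all *other* subjects' data."""
    subjects = list(data)
    total = np.sum([data[sub] for sub in subjects], axis=0)
    n_others = len(subjects) - 1
    # Subtracting a subject from the total removes its contribution.
    return {sub: (total - data[sub]) / n_others for sub in subjects}


# Toy data (assumption): 30 samples and 12 voxels per subject.
rng = np.random.default_rng(0)
toy = {
    sub: rng.standard_normal((30, 12))
    for sub in ("sub-01", "sub-02", "sub-04")
}
templates = leave_one_out_templates(toy)
```

With three subjects, the template for sub-01 is built from sub-02 and sub-04 only, so three templates are estimated instead of one.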


Aligning test data

Finally, we align each subject's test data using both strategies and compute the average aligned test data across subjects.

import numpy as np

aligned_in_sample = population_algo.transform(X_test)
aligned_out_of_sample = loso_algo.transform(X_test)

average_in_sample = np.mean(
    [aligned_in_sample[sub] for sub in subjects], axis=0
)
average_out_of_sample = np.mean(
    [aligned_out_of_sample[sub] for sub in subjects], axis=0
)

Comparing the correlations

We compare the average correlation of the transformed data across subjects for both in-sample and out-of-sample template generation strategies.
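Before running the comparison, it may help to spell out what a voxelwise correlation score measures. The sketch below assumes that the "corr" loss corresponds to a Pearson correlation computed independently for each voxel across samples; this is an assumption about the metric, not a description of how score_voxelwise is implemented, and voxelwise_correlation is a hypothetical helper.

```python
import numpy as np


def voxelwise_correlation(a, b):
    """Pearson correlation between matching columns of a and b.

    Both arrays have shape (n_samples, n_voxels). A voxel with
    constant values in either array yields NaN.
    """
    a_centered = a - a.mean(axis=0)
    b_centered = b - b.mean(axis=0)
    numerator = (a_centered * b_centered).sum(axis=0)
    denominator = np.sqrt(
        (a_centered**2).sum(axis=0) * (b_centered**2).sum(axis=0)
    )
    return numerator / denominator


# Toy data (assumption): b is a noisy copy of a, so scores are positive.
rng = np.random.default_rng(0)
a = rng.standard_normal((40, 6))
b = 0.5 * a + rng.standard_normal((40, 6))
scores = voxelwise_correlation(a, b)  # one value per voxel
```

The result holds one score per voxel, which is why it can be mapped back into brain space with the masker and plotted.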

import matplotlib.pyplot as plt
from nilearn import plotting

from fmralign.metrics import score_voxelwise

score_in_sample = np.mean(
    [
        score_voxelwise(img, average_in_sample, loss="corr")
        for img in aligned_in_sample.values()
    ],
    axis=0,
)
score_in_sample_img = masker.inverse_transform(score_in_sample)
score_out_of_sample = np.mean(
    [
        score_voxelwise(img, average_out_of_sample, loss="corr")
        for img in aligned_out_of_sample.values()
    ],
    axis=0,
)
score_out_of_sample_img = masker.inverse_transform(score_out_of_sample)

fig, axes = plt.subplots(2, 1, figsize=(8, 12))

plotting.plot_stat_map(
    score_in_sample_img,
    display_mode="z",
    cut_coords=[-5, -15],
    vmax=1,
    title="Inter-Subject Correlations (In-sample Template)",
    axes=axes[0],
    colorbar=True,
)
plotting.plot_stat_map(
    score_out_of_sample_img,
    display_mode="z",
    cut_coords=[-5, -15],
    vmax=1,
    title="Inter-Subject Correlations (Out-of-sample Template)",
    axes=axes[1],
    colorbar=True,
)
plt.show()
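[Figure: voxelwise inter-subject correlation maps for the in-sample template (top) and the out-of-sample template (bottom)]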

A direct comparison of the voxelwise inter-subject correlation maps shows little difference between the in-sample and out-of-sample template generation strategies. That is good news, as it indicates that in-sample template generation does not lead to a large bias in this case. To better visualize the differences, we can use nilearn’s plot_img_comparison.

plotting.img_comparison.plot_img_comparison(
    score_in_sample_img,
    score_out_of_sample_img,
    masker=masker,
    ref_label="In-sample Template",
    src_label="Out-of-sample Template",
)

plt.show()
[Figure: image comparison with two panels, titled "Pearson's R: 0.56" and "Histogram of imgs values"]

We can now see that in-sample template generation leads to slightly higher inter-subject correlations after alignment, indicating a small bias. To conclude, out-of-sample template generation avoids this bias at the cost of performing more alignments. As the number of subjects increases, the difference between the two strategies narrows. However, for small datasets, out-of-sample template generation is recommended to avoid overly optimistic results.
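The source of this bias, and why it shrinks with more subjects, can be illustrated on pure noise. The sketch below is a toy simulation, not an fmralign result: it assumes independent Gaussian data with no shared signal and uses a plain mean as the template, without any alignment step. In that setting the correlation between a subject and a mean that includes it is 1/sqrt(N) in expectation, while the correlation with a mean that excludes it is zero. The function name mean_template_correlation is hypothetical.

```python
import numpy as np


def mean_template_correlation(n_subjects, n_features=20000, seed=0):
    """Average correlation of each subject with a mean template.

    Returns (in_sample, out_of_sample): the template either includes
    or excludes the subject being scored.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_subjects, n_features))
    total = data.sum(axis=0)
    in_sample, out_of_sample = [], []
    for x in data:
        in_sample.append(np.corrcoef(x, total / n_subjects)[0, 1])
        loso_template = (total - x) / (n_subjects - 1)
        out_of_sample.append(np.corrcoef(x, loso_template)[0, 1])
    return float(np.mean(in_sample)), float(np.mean(out_of_sample))


for n in (3, 12, 50):
    ins, oos = mean_template_correlation(n)
    print(f"{n:>2} subjects: in-sample r = {ins:.3f}, out-of-sample r = {oos:.3f}")
```

Even though the data contain no shared signal, the in-sample score is clearly positive for a handful of subjects, whereas the out-of-sample score stays near zero for any group size.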

References

Total running time of the script: (1 minutes 46.530 seconds)
