Functions for preprocessing fMRI data and preparing stimulus and fMRI data for training voxel-wise encoding models.

Preprocessing BOLD fMRI

preprocess_bold_fmri[source]

preprocess_bold_fmri(bold, mask=None, detrend=True, standardize='zscore', **kwargs)

Preprocesses BOLD data and returns ndarray of preprocessed data

Parameters

bold : path to bold nifti file or loaded bold nifti
mask : path to mask nifti file or loaded mask nifti, optional
detrend : bool, whether to linearly detrend the data, optional
standardize : {‘zscore’, ‘psc’, False}, default is ‘zscore’
kwargs : further arguments for nilearn's clean function

Returns ndarray of the preprocessed bold data in (samples, voxels)

preprocess_bold_fmri preprocesses a BOLD Nifti and returns a numpy ndarray of the optionally masked and preprocessed fMRI data.
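
To make the effect of detrend=True and standardize='zscore' concrete, here is a minimal numpy-only sketch on simulated data. The helper detrend_and_zscore_sketch is hypothetical and is not part of this library or of nilearn; nilearn's clean function may differ in details such as the standard deviation estimator.

```python
import numpy as np

def detrend_and_zscore_sketch(data):
    """Linearly detrend and z-score each column of a (samples, voxels) array.

    Hypothetical helper for illustration only; not the library's implementation.
    """
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0]
    # Design matrix with an intercept and a linear trend regressor
    design = np.column_stack([np.ones(n_samples), np.arange(n_samples)])
    # Least-squares fit of the trend for all voxels at once
    coefs, *_ = np.linalg.lstsq(design, data, rcond=None)
    residuals = data - design @ coefs
    # Z-score every voxel time course (one column per voxel)
    return (residuals - residuals.mean(axis=0)) / residuals.std(axis=0)

rng = np.random.default_rng(0)
# Simulated data: 100 samples, 5 voxels, with a linear drift added
simulated = rng.standard_normal((100, 5)) + np.linspace(0, 10, 100)[:, None]
cleaned = detrend_and_zscore_sketch(simulated)
print(cleaned.shape)
```

The result keeps the (samples, voxels) layout described above, with each voxel time course having zero mean and unit variance.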


make_lagged_stimulus[source]

make_lagged_stimulus(stimulus, n_lags, fill_value=nan)

Generates a lagged stimulus representation by adding nans
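
The idea of lagging by padding with NaNs can be sketched in a few lines of numpy. The helper lag_sketch below is hypothetical, and its column order (most recent sample first) is an assumption inferred from the printed example further down this page, not a statement about how make_lagged_stimulus is implemented.

```python
import numpy as np

def lag_sketch(stimulus, n_lags, fill_value=np.nan):
    """Stack n_lags delayed copies of a (samples, features) array side by side.

    Hypothetical helper; the column order (most recent first) is an assumption.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    n_samples, n_features = stimulus.shape
    lagged = np.full((n_samples, n_features * n_lags), fill_value, dtype=float)
    for lag in range(n_lags):
        n_valid = max(n_samples - lag, 0)
        # Copy delayed by `lag` samples; the first `lag` rows keep fill_value
        lagged[lag:, lag * n_features:(lag + 1) * n_features] = stimulus[:n_valid]
    return lagged

toy = np.arange(4)[:, None]
toy_lagged = lag_sketch(toy, n_lags=2)
print(toy_lagged)
```

In this toy case only the first row contains a NaN, because it is the only sample without a predecessor.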

Aligning stimulus and fMRI

make_X_Y[source]

make_X_Y(stimuli, fmri, TR, stim_TR, lag_time=6.0, start_times=None, offset_stim=0.0, fill_value=nan, remove_nans=True)

Creates (lagged) features and fMRI matrices concatenated along runs

Parameters

stimuli : list, list of stimulus representations
fmri : list, list of fMRI ndarrays
TR : int, float, repetition time of the fMRI data in seconds
stim_TR : int, float, repetition time of the stimulus in seconds
lag_time : int, float, optional,
           lag to introduce for stimuli in seconds,
           if no lagging should be done set this to TR
start_times : list, list of int, float, optional,
              starting time of the stimuli relative to fMRI recordings in seconds
              appends fill_value to stimulus representation to match fMRI and stimulus
offset_stim : int, float, optional,
              time to offset stimulus relative to fMRI in the lagged stimulus,
              i.e. when predicting fmri at time t use only stimulus features
              before t-offset_stim. This reduces the number of time points used
              in the model.
fill_value : int, float, or any valid numpy array element, optional,
             value appended to the stimulus array to account for start_times;
             use np.nan here with remove_nans=True to remove fmri/stimulus samples where no stimulus was presented
remove_nans : bool or float with 0<=remove_nans<=1, optional,
              True/False indicates whether to remove all or none of the
              stimulus/fmri samples that contain nans;
              a proportion keeps all samples in the lagged stimulus whose
              proportion of nans is lower than this value.
              In this case, the remaining nans are replaced with zeros.

Returns: tuple of two ndarrays, the first element is the (lagged) stimuli, the second element is the aligned fMRI data

Example

make_X_Y allows you to align the (preprocessed) fMRI and stimulus data by specifying fMRI TR and stimulus stim_TR, as well as the lag_time (how long a stimulus window should be in seconds to predict a single fMRI TR) and potential stimulus offsets. Since we potentially want to preprocess and concatenate multiple runs, both fmri and stimuli are supposed to be lists. To process only a single run, you can use a list of one element.

Let's look at an example where the stimulus is sampled every 100 ms and fMRI every 2s, i.e. every fMRI sample corresponds to 20 stimulus samples.

stim_TR, TR = 0.1, 2

Now create a simulated stimulus object of 80 samples.

stimulus = np.tile(np.arange(80)[:, None], (1, 1))
print(stimulus.shape)
(80, 1)

And a corresponding fMRI object of 4 samples and one voxel (since the TRs differ).

fmri = np.tile(np.arange(0, 4)[:, None], (1, 1))
print(fmri.shape)
(4, 1)

Let's first align fMRI and stimulus without any offset or lag:

X, y = make_X_Y([stimulus], [fmri], TR, stim_TR, lag_time=None, offset_stim=0, start_times=[0])
assert X.shape == (4, 20)
assert y.shape == (4, 1)
/home/mboos/anaconda3/envs/mne/lib/python3.7/site-packages/ipykernel_launcher.py:56: RuntimeWarning: lag_time is None or equal to TR, no stimulus lagging will be done.

We keep the original number of fMRI samples, but each row of the stimulus (and hence X) now holds all stimulus samples that fall within one fMRI TR: the stimulus thus becomes a (4, 20) array.
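
The (4, 20) shape follows from simple arithmetic on the two sampling intervals. The snippet below is a numpy-only sketch of that folding step under the example's settings; the names samples_per_TR and X_sketch are illustrative, and this is not necessarily how make_X_Y does it internally.

```python
import numpy as np

stim_TR, TR = 0.1, 2
stimulus = np.arange(80)[:, None]

# Number of stimulus samples per fMRI sample (rounded to guard against float error)
samples_per_TR = int(round(TR / stim_TR))
n_fmri_samples = stimulus.shape[0] // samples_per_TR

# Fold consecutive stimulus samples into one row per fMRI TR
X_sketch = stimulus[:n_fmri_samples * samples_per_TR].reshape(n_fmri_samples, -1)
print(X_sketch.shape)
```

Each row of X_sketch holds the 20 consecutive stimulus samples that belong to one fMRI TR.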

Lagging the stimulus

We can now call make_X_Y with the stimulus and fMRI TRs and a specified lag_time. Here we want to use 4 seconds of the stimulus to predict fMRI, but do not want to shift fmri relative to stimulus (offset_stim is 0.). This means that our encoding model can approximate a hemodynamic response function (HRF) by estimating a finite impulse response (FIR) that is 4 seconds long.

X, y = make_X_Y([stimulus], [fmri], TR, stim_TR, lag_time=4, offset_stim=0, start_times=[0])
assert X.shape == (3, 40)
assert y.shape == (3, 1)

Shifting the stimulus

We could also shift fMRI relative to the stimulus to account for the delayed onset of the hemodynamic response. This is different from estimating the hemodynamic response from the window given by lag_time. In practice this means we estimate a hemodynamic response function (HRF) by a FIR in the time period from 6s to 2s before each fMRI sample.

X, y = make_X_Y([stimulus], [fmri], TR, stim_TR, lag_time=4, offset_stim=2, start_times=[0])
assert X.shape == (2, 40)
assert y.shape == (2, 1)

Handling out-of-recording data

Because of our shift we "lose" one sample: by default, fill_value fills values that lie outside the recording interval with NaNs, and by default remove_nans specifies that all samples containing NaNs are dropped.

To check that behavior, we see what we get when we don't remove NaNs:

X, y = make_X_Y([stimulus], [fmri], TR, stim_TR, lag_time=4, offset_stim=2, start_times=[0], remove_nans=False)
assert X.shape == (4, 40)
assert y.shape == (4, 1)

We keep the original number of samples, but some are filled with NaNs now:

assert np.isnan(X).sum() == 60
print(X)
[[nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
  nan nan nan nan]
 [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
  18. 19. nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
  nan nan nan nan]
 [20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37.
  38. 39.  0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15.
  16. 17. 18. 19.]
 [40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57.
  58. 59. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35.
  36. 37. 38. 39.]]

We can see that the first sample consists entirely of NaNs, because by lagging and offsetting we assume that the fMRI sample at time point t can be predicted by the stimulus in the time period from t-6s to t-2s. However, no stimulus was presented during that time! In the second sample, the second half of the row still consists of NaNs: for t=2s, the time period from t-6s to t-2s contains stimulus data only for t=0s, but not for earlier time points. Keep in mind that the stimulus at t=0s corresponds to the first 2s of the stimulus (because we reshaped the stimulus to match the 2s fMRI TR).
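
The NaN pattern printed above can be reconstructed with plain numpy, which may help to see where each block of values comes from. This is a sketch under the example's settings (lag_time=4, offset_stim=2); the variable names are illustrative and the code is not taken from generate_lagged_stimulus.

```python
import numpy as np

stim_TR, TR, lag_time, offset_stim = 0.1, 2, 4, 2
stimulus = np.arange(80, dtype=float)[:, None]

samples_per_TR = int(round(TR / stim_TR))  # 20 stimulus samples per fMRI sample
n_lags = int(round(lag_time / TR))         # window spans 2 TRs
n_offset = int(round(offset_stim / TR))    # shift by 1 TR

chunks = stimulus.reshape(-1, samples_per_TR)  # (4, 20): one row per TR
n_samples, width = chunks.shape

X_sketch = np.full((n_samples, width * n_lags), np.nan)
for lag in range(n_lags):
    shift = lag + n_offset  # how many TRs back this block of columns looks
    if shift < n_samples:
        X_sketch[shift:, lag * width:(lag + 1) * width] = chunks[:n_samples - shift]

print(np.isnan(X_sketch).sum())
```

Rows that reach back before the start of the recording keep their NaN fill, which is why only the last two rows are complete.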

generate_lagged_stimulus[source]

generate_lagged_stimulus(stimulus, fmri_samples, TR, stim_TR, lag_time=None, start_time=0.0, offset_stim=0.0, fill_value=nan)

Generates a lagged stimulus representation temporally aligned with the fMRI data

Parameters

stimulus : ndarray, stimulus representation of shape (samples, features)
fmri_samples : int, samples of corresponding fmri run
TR : int, float, repetition time of the fMRI data in seconds
stim_TR : int, float, repetition time of the stimulus in seconds
lag_time : int, float, or None, optional,
       lag to introduce for stimuli in seconds,
       if no lagging should be done set this to TR or None
start_time :  int, float, optional, default 0.
          starting time of the stimulus relative to fMRI recordings in seconds
          appends fill_value to stimulus representation to match fMRI and stimulus
offset_stim : int, float, optional, default 0.
          time to offset stimulus relative to fMRI in the lagged stimulus,
          i.e. when predicting fmri at time t use only stimulus features
          before t-offset_stim. This reduces the number of time points used
          in the model.
fill_value : int, float, or any valid numpy array element, optional, default np.nan
         appends fill_value to stimulus array to account for start_time
         use np.nan here with remove_nans=True to remove fmri/stimulus samples where no stimulus was presented

Returns: ndarray of the lagged stimulus of shape (samples, lagged features)

generate_lagged_stimulus takes care of aligning fMRI and stimulus data; it is used internally by make_X_Y.

get_remove_idx[source]

get_remove_idx(lagged_stimulus, remove_nan=True)

Returns indices of rows in lagged_stimulus to remove
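
A possible reading of this row selection, following the remove_nans description for make_X_Y, is sketched below. The helper rows_to_remove_sketch is hypothetical, and the exact threshold rule (rows are kept when their NaN fraction is strictly below the proportion) is an assumption rather than confirmed behavior of get_remove_idx.

```python
import numpy as np

def rows_to_remove_sketch(lagged_stimulus, remove_nan=True):
    """Return indices of rows to drop, based on their NaN content.

    Hypothetical helper; the strict "below the proportion" rule is an assumption.
    """
    nan_fraction = np.isnan(lagged_stimulus).mean(axis=1)
    if remove_nan is True:
        # Drop every row that contains at least one NaN
        return np.where(nan_fraction > 0)[0]
    if remove_nan is False:
        # Keep all rows
        return np.array([], dtype=int)
    # Proportion: drop rows whose NaN fraction reaches the threshold
    return np.where(nan_fraction >= remove_nan)[0]

toy = np.array([[np.nan, np.nan], [1.0, np.nan], [2.0, 1.0]])
drop_all = rows_to_remove_sketch(toy, remove_nan=True)
drop_none = rows_to_remove_sketch(toy, remove_nan=False)
drop_some = rows_to_remove_sketch(toy, remove_nan=0.75)
print(drop_all, drop_none, drop_some)
```

With a proportion of 0.75, only the row made up entirely of NaNs is dropped in this toy example, while the half-filled row is kept.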