Functions for training independent Ridge regressions for a large number of voxels and validating their performance

get_model_plus_scores[source]

get_model_plus_scores(X, y, estimator=None, cv=None, scorer=None, voxel_selection=True, validate=True, **kwargs)

Returns multiple estimators trained in a cross-validation on n_splits of the data and their scores on the left-out folds

Parameters

X : ndarray of shape (samples, features)
y : ndarray of shape (samples, targets)
estimator : None or estimator object that implements fit and predict
            if None, uses RidgeCV by default
cv : int, None, or a cross-validation object that implements a split method, optional, default is None.
     an int specifies the number of splits of a KFold cross-validation
     None defaults to a scikit-learn KFold cross-validation with default settings
     a scikit-learn-like cross-validation object needs to implement a split method for X and y
scorer : None or any scikit-learn compatible scoring function, optional
         default uses the product moment correlation
voxel_selection : bool, optional, default True
                  Whether to only use voxels whose variance is larger than zero.
                  Scores for excluded voxels are set to zero.
validate : bool, optional, default True
             Whether to validate the model via cross-validation
             or to just train the estimator.
             If False, scores will be computed on the training set.
kwargs : additional parameters that will be used to initialize RidgeCV if estimator is None

Returns a tuple of a list of n_splits estimators trained on the training folds (or a single estimator if validate is False) and the scores for all concatenated out-of-fold predictions

get_model_plus_scores is a convenience function that trains multiple Ridge regressions in a cross-validation scheme and evaluates their performance on the respective test set.

Examples

First, we create some simulated stimulus and fmri data.

import numpy as np

stimulus = np.random.randn(1000, 5)
fmri = np.random.randn(1000, 10)

Using the default Ridge regression

We can now use get_model_plus_scores to estimate multiple RidgeCV regressions: one for each voxel (mapping the stimulus representation to that voxel) and for each split (trained on a different training set and evaluated on the held-out fold). Since sklearn's RidgeCV estimator supports multi-output regression, we get one RidgeCV object per split.

ridges, scores = get_model_plus_scores(stimulus, fmri, cv=3)
assert len(ridges) == 3
ridges
[RidgeCV(alphas=array([ 0.1,  1. , 10. ])),
 RidgeCV(alphas=array([ 0.1,  1. , 10. ])),
 RidgeCV(alphas=array([ 0.1,  1. , 10. ]))]

Each RidgeCV estimator maps from the feature space to each voxel. In our example, that means it contains 10 independently trained regression models (the number of voxels) with 5 coefficients each (the number of features).

assert ridges[0].coef_.shape == (10, 5)
print(ridges[0].coef_)
[[ 0.01930266  0.0350985  -0.04548384 -0.01058159 -0.07382483]
 [ 0.00183012 -0.00830046 -0.03604675 -0.00016843  0.03161116]
 [-0.04032306  0.01782385  0.02112695  0.01673908 -0.00645515]
 [ 0.0273047  -0.02382577 -0.06169262  0.06232742 -0.03331368]
 [ 0.01294108 -0.04825337 -0.04646228 -0.04701512 -0.00017405]
 [ 0.02008884 -0.07065883  0.01958404 -0.04115758 -0.02967363]
 [ 0.00502653 -0.02164034 -0.00419562 -0.05675778  0.00716245]
 [ 0.0080379   0.03230623  0.01527909 -0.02469508 -0.01681562]
 [ 0.01363082  0.02686557 -0.05923971  0.01392573 -0.00945206]
 [ 0.01665226 -0.01499506 -0.0043113  -0.01658976  0.06103525]]

We also get a set of scores (by default the product moment correlation, but you can supply your own via the scorer argument) that specify how well we predict left-out data (with the usual caveats of using a correlation coefficient for evaluation). In our case it is of shape (10, 3) because we predict 10 voxels and use a 3-fold cross-validation, i.e. we split 3 times.

assert scores.shape == (10, 3)
scores
array([[ 0.11195214,  0.10260332,  0.02153565],
       [-0.06450754, -0.07106461, -0.0447989 ],
       [ 0.00559572, -0.03221425,  0.02866726],
       [-0.04101258, -0.02197306, -0.04277958],
       [ 0.02352969,  0.02008923, -0.02062713],
       [ 0.01027339,  0.03074076,  0.01248573],
       [-0.05974497, -0.03980094, -0.11293944],
       [-0.01607721, -0.02264425, -0.07340733],
       [-0.06009815, -0.05553956,  0.02102434],
       [ 0.02388894, -0.01513094,  0.0904367 ]])
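
For instance, assuming the scorer follows the usual scikit-learn metric convention scorer(y_true, y_pred) and returns one value per voxel (as the default correlation scorer does), we could score with R² instead. A minimal sketch; r2_per_voxel is a name we introduce here, not part of the documented API:

from functools import partial
from sklearn.metrics import r2_score

# hypothetical custom scorer: R^2 per voxel;
# multioutput='raw_values' returns one score per target/voxel
r2_per_voxel = partial(r2_score, multioutput='raw_values')

ridges, r2_scores = get_model_plus_scores(stimulus, fmri, cv=3,
                                          scorer=r2_per_voxel)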

We can also change the parameters of the RidgeCV function. For example, we can use pre-specified hyperparameters, like the values of the regularization parameter $\alpha$ we want to perform a grid search over, or whether we want to normalize features. If we want to use other parameters for the default RidgeCV, we can just pass them as additional keyword arguments:

alphas = [100]
ridges, scores = get_model_plus_scores(stimulus, fmri, alphas=alphas,
                                       normalize=True, alpha_per_target=True)
assert ridges[0].normalize
assert ridges[0].alphas.shape == (1,)

Using your own estimator

Additionally, we can use any other estimator that implements fit and predict. For example, we can use CCA as an encoding model.

from sklearn import cross_decomposition

our_estimator = cross_decomposition.CCA(n_components=2)

ccas, scores = get_model_plus_scores(stimulus, fmri, our_estimator,
                                     cv=3)
assert type(ccas[0]) == cross_decomposition._pls.CCA

If your favorite estimator does not work in the multi-output regime, i.e. it cannot predict multiple targets/voxels at once, then get_model_plus_scores will wrap it in sklearn's MultiOutputRegressor by default. Be aware that for many voxels this can substantially increase training time.

from sklearn.linear_model import Lasso
from sklearn.multioutput import MultiOutputRegressor

our_estimator = MultiOutputRegressor(Lasso())

lassos, scores = get_model_plus_scores(stimulus, fmri, our_estimator,
                                       cv=3)
lassos
[MultiOutputRegressor(estimator=Lasso()),
 MultiOutputRegressor(estimator=Lasso()),
 MultiOutputRegressor(estimator=Lasso())]

Training without validation

We can also train an estimator without any validation, if, for example, we want to test on a different dataset. In that case, the scores will be computed with the trained estimator on the training set, i.e. they will contain no information about the generalization performance of the estimator.

from sklearn.linear_model import RidgeCV

our_estimator = RidgeCV()

model, scores = get_model_plus_scores(stimulus, fmri, our_estimator,
                                       validate=False)
assert type(model) == RidgeCV
assert scores.shape == (10,)
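
The trained model can then be evaluated on held-out data from another source. A minimal sketch, using freshly simulated test data and computing the product moment correlation per voxel:

test_stimulus = np.random.randn(200, 5)
test_fmri = np.random.randn(200, 10)

predictions = model.predict(test_stimulus)
# correlation between predicted and observed time course, per voxel
test_scores = np.array([np.corrcoef(test_fmri[:, i], predictions[:, i])[0, 1]
                        for i in range(test_fmri.shape[1])])
assert test_scores.shape == (10,)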

Using your own cross-validation method

Instead of the default KFold cross-validation without shuffling, we can also use any scikit-learn compatible cross-validation iterator (e.g. those in sklearn.model_selection). For example, we could use a TimeSeriesSplit to test our predictions only on the most recent parts of the data.

from sklearn.model_selection import TimeSeriesSplit

ts_cv = TimeSeriesSplit(n_splits=5)

model, scores = get_model_plus_scores(stimulus, fmri, cv=ts_cv)
assert scores.shape == (10, 5)

Distributed training

Voxel-wise encoding models can take a long time and a lot of memory to train, especially if we use the full brain or high resolution fMRI data.

The BlockMultiOutput class can help distribute the load across multiple cores by splitting the fMRI data into multiple "blocks" (the n_blocks parameter) and training an estimator for each block. Even without parallelization, this class makes it possible to train voxel-wise encoding models when a single, large estimator would require too much memory, by training the estimator for each block of the data independently.

This works even if the original fMRI data do not fit into memory, by using a memmapped Numpy array.
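
For example, fMRI data saved to disk (e.g. with np.save) can be opened as a memory-mapped array and passed to get_model_plus_scores like a regular ndarray. A minimal sketch, with a hypothetical file name:

# open the on-disk array read-only as a memory-mapped ndarray
fmri_memmap = np.load('fmri_data.npy', mmap_mode='r')  # hypothetical file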

class BlockMultiOutput[source]

BlockMultiOutput(estimator, n_blocks=10, n_jobs=1) :: MultiOutputRegressor

Multi target regression with block-wise fit. This strategy consists of splitting the targets into blocks and fitting one regressor per block. The estimator used needs to natively support multioutput.

Parameters

estimator : estimator object
    An estimator object implementing `fit` and `predict` and supporting multioutput.
n_blocks : int, optional, default=10
    The number of blocks for the target variable.
    This is a split along *targets* (columns of the array), not observations (rows of the array).
n_jobs : int, optional, default=1
    The number of jobs to run in parallel for `fit`. If -1,
    then the number of jobs is set to the number of cores.
    When individual estimators are fast to train or predict
    using `n_jobs>1` can result in slower performance due
    to the overhead of spawning processes.

Example

Let's use BlockMultiOutput to train a RidgeCV estimator for blocks of our simulated data.

our_estimator = BlockMultiOutput(RidgeCV(alphas=[10,100]))

estimators, scores = get_model_plus_scores(stimulus, fmri, our_estimator,
                                     cv=3)
assert len(estimators) == 3
assert len(estimators[0].estimator.alphas) == 2

Each BlockMultiOutput estimator contains n_blocks estimators that are trained on different blocks of the target.

assert len(estimators[0].estimators_) == estimators[0].n_blocks
estimators[0].estimators_
[RidgeCV(alphas=array([ 10, 100])),
 RidgeCV(alphas=array([ 10, 100])),
 RidgeCV(alphas=array([ 10, 100])),
 RidgeCV(alphas=array([ 10, 100])),
 RidgeCV(alphas=array([ 10, 100])),
 RidgeCV(alphas=array([ 10, 100])),
 RidgeCV(alphas=array([ 10, 100])),
 RidgeCV(alphas=array([ 10, 100])),
 RidgeCV(alphas=array([ 10, 100])),
 RidgeCV(alphas=array([ 10, 100]))]

If fmri is of shape $(n\_samples, n\_targets)$, each of the n_blocks estimators in BlockMultiOutput.estimators_ will contain the coefficients for ${n\_targets}/{n\_blocks}$ targets.

assert estimators[0].estimators_[0].coef_.shape == (1, 5)
print('fmri shape: {} \nn_blocks: {} \n'
      'coefficients of the estimator for one block: {}'.format(
          fmri.shape, our_estimator.n_blocks, estimators[0].estimators_[0].coef_.shape))
fmri shape: (1000, 10) 
n_blocks: 10 
coefficients of the estimator for one block: (1, 5)

We can use a BlockMultiOutput instance as usual to predict data, i.e. it produces predictions for the full fMRI data by concatenating the predictions of every block estimator.

assert estimators[0].predict(stimulus).shape == (1000, 10)
estimators[0].predict(stimulus).shape
(1000, 10)

Parallelizing voxel-wise encoding models

We can use this to parallelize encoding models as well, by specifying the n_jobs parameter. Keep in mind that this requires copying the full stimulus data to every worker and can thus increase memory demand.

our_estimator = BlockMultiOutput(RidgeCV(alphas=[10,100]), n_jobs=10)

estimators, scores = get_model_plus_scores(stimulus, fmri, our_estimator,
                                     cv=3)
assert len(estimators) == 3
assert estimators[0].n_jobs == 10