Model selection

Multi-Modal Splitter

class imml.model_selection.MMSplitter(splitter, return_type: str = 'sets')[source]

Generic bridge between scikit-learn splitters and multi-modal inputs.

This helper wraps any scikit-learn splitter (such as StratifiedKFold) and yields its splits for multi-modal data. For each split, a single set of train/test indices is computed by the splitter and applied to every modality, so the partitions stay aligned across all modalities.
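The alignment idea described above can be illustrated with plain scikit-learn and NumPy. This is a minimal sketch and not imml's implementation: it assumes the indices are derived from the first modality, and the variable names (folds, kfold) are only illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
# Two modalities describing the same 100 samples
Xs = [rng.random((100, 10)), rng.random((100, 20))]

kfold = KFold(n_splits=5, shuffle=True, random_state=42)

folds = []
# Indices are computed once per fold (here from the first modality)...
for train_idx, test_idx in kfold.split(Xs[0]):
    # ...and the same indices are applied to every modality
    Xs_train = [X[train_idx] for X in Xs]
    Xs_test = [X[test_idx] for X in Xs]
    folds.append((Xs_train, Xs_test))
```

Because every modality is indexed with the same arrays, row i of each training array refers to the same sample in all modalities.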

Parameters:
  • splitter (object) -- Any object implementing scikit-learn's splitter interface, for example KFold, StratifiedKFold, GroupKFold or ShuffleSplit.

  • return_type (str, default="sets") -- Controls what each yielded item contains: "sets" returns the actual partition sets, while "indices" returns the indices of the partition sets.

Example

>>> import numpy as np
>>> from sklearn.model_selection import StratifiedKFold
>>> from imml.model_selection import MMSplitter
>>> Xs = [np.random.rand(100, 10), np.random.rand(100, 20)]
>>> y = np.random.randint(0, 2, 100)
>>> splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
>>> for Xs_train, Xs_test, y_train, y_test in MMSplitter(splitter=splitter).split(Xs, y):
...     pass

get_n_splits(X=None, y=None, groups=None)[source]

Return the number of splitting iterations, as set by the n_splits parameter of the wrapped splitter.

Parameters:
  • X (Always ignored, exists for API compatibility.)

  • y (Always ignored, exists for API compatibility.)

  • groups (Always ignored, exists for API compatibility.)

Returns:

n_splits -- The number of splitting iterations.

Return type:

int
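For reference, this is how scikit-learn splitters themselves report their number of iterations. The sketch assumes that MMSplitter forwards the call to the wrapped splitter, which this page does not state explicitly; only scikit-learn calls are used.

```python
from sklearn.model_selection import KFold, ShuffleSplit, StratifiedKFold

# Splitters whose iteration count is fixed by their n_splits parameter
splitters = {
    "KFold": KFold(n_splits=5),
    "StratifiedKFold": StratifiedKFold(n_splits=3),
    "ShuffleSplit": ShuffleSplit(n_splits=10, test_size=0.2, random_state=0),
}

# For these splitters the data arguments are not needed to answer the question
n_splits = {name: s.get_n_splits() for name, s in splitters.items()}
```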

split(Xs, y=None, groups=None)[source]

Generate train/test splits of the multi-modal data, yielded either as partition sets or as indices depending on return_type.

Parameters:
  • Xs (list of array-like) --

    • Xs length: n_mods

    • Xs[i] shape: (n_samples, n_features_i)

    A list of different modalities.

  • y (array-like of shape (n_samples,), optional) -- Target vector relative to Xs.

  • groups (array-like, optional) -- Group labels passed to splitter.split(...).

Returns:

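One tuple per split, with contents determined by return_type. With return_type="sets", each tuple unpacks as (Xs_train, Xs_test, y_train, y_test), as in the class example.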

Return type:

tuple
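The role of groups can be sketched with scikit-learn's GroupKFold applied by hand to two modalities. This is an illustration under assumptions, not imml's code: the group labels, the array sizes, and the choice of the first modality as the splitter input are all hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
Xs = [rng.random((12, 3)), rng.random((12, 5))]  # two modalities, 12 samples
y = rng.integers(0, 2, 12)
groups = np.repeat(np.arange(4), 3)  # 4 groups with 3 samples each

gkf = GroupKFold(n_splits=4)

splits = []
for train_idx, test_idx in gkf.split(Xs[0], y, groups):
    # A group never appears on both sides of a split
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
    splits.append((
        [X[train_idx] for X in Xs],  # training part of each modality
        [X[test_idx] for X in Xs],   # test part of each modality
        y[train_idx],
        y[test_idx],
    ))
```

Each tuple follows the (Xs_train, Xs_test, y_train, y_test) order used in the class example.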

Train-Test Multi-Modal Split

imml.model_selection.train_test_mm_split(Xs, y=None, **kwargs)[source]

Split multi-modal datasets and labels into train and test sets.

Similar to sklearn's train_test_split, but works with lists of arrays (Xs) and single arrays (y). Ensures that every modality in Xs receives the same train/test split indices.

Parameters:
  • Xs (list of array-like) --

    • Xs length: n_mods

    • Xs[i] shape: (n_samples, n_features_i)

    A list of different modalities.

  • y (array-like of shape (n_samples,), optional) -- Target vector relative to Xs.

  • **kwargs (dict) -- Additional keyword arguments to pass to sklearn's train_test_split.

Returns:

Splitting results in the same order as the inputs:

  • For each list input (Xs): (list_train, list_test)

  • For each array input (y): (array_train, array_test)

Return type:

tuple

Example

>>> import numpy as np
>>> from imml.model_selection import train_test_mm_split
>>> Xs = [np.random.rand(100, 10), np.random.rand(100, 20)]
>>> y = np.random.randint(0, 2, 100)
>>> Xs_train, Xs_test, y_train, y_test = train_test_mm_split(Xs, y, train_size=0.7, random_state=42)
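A comparable result can be obtained with scikit-learn alone by splitting an index array once and reusing it for every modality. This is a sketch of the alignment guarantee, not the function's actual implementation; the helper names idx_train and idx_test are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
Xs = [rng.random((100, 10)), rng.random((100, 20))]
y = rng.integers(0, 2, 100)

# Split the sample indices once...
idx_train, idx_test = train_test_split(
    np.arange(100), train_size=0.7, random_state=42
)

# ...then apply the same indices to every modality and to the labels
Xs_train = [X[idx_train] for X in Xs]
Xs_test = [X[idx_test] for X in Xs]
y_train, y_test = y[idx_train], y[idx_test]
```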