Model selection

Multi-Modal Splitter

imml.model_selection.MMSplitter(splitter, return_type: str = 'all')[source]

Generic bridge between scikit-learn splitters and multi-modal inputs.

This helper wraps any scikit-learn splitter (such as StratifiedKFold) and yields one split per fold. For each fold, the splitter computes a single set of train/test indices, which is then applied to every modality, so the partitions stay aligned across all modalities.

Parameters:
  • splitter (object) -- Any object implementing scikit-learn's splitter interface, for example KFold, StratifiedKFold, GroupKFold or ShuffleSplit.

  • return_type (str, default="split") -- Controls what each yielded item contains: "split" yields the partitioned data itself, while "indices" yields the indices of each partition.

Example

>>> import numpy as np
>>> from sklearn.model_selection import StratifiedKFold
>>> from imml.model_selection import MMSplitter
>>> Xs = [np.random.rand(100, 10), np.random.rand(100, 20)]
>>> y = np.random.randint(0, 2, 100)
>>> splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
>>> for Xs_train, Xs_test, y_train, y_test in MMSplitter(splitter=splitter).split(Xs, y):
...     pass
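
The alignment guarantee described above can be illustrated with plain scikit-learn and NumPy. This is a minimal sketch of the idea, not imml's implementation: the helper name aligned_splits is hypothetical, and it assumes every modality is a NumPy array with the same number of rows.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def aligned_splits(splitter, Xs, y=None):
    """Hypothetical helper: apply one index split per fold to every modality."""
    # The splitter only needs the number of samples (and y for stratification),
    # so the first modality stands in for all of them.
    for train_idx, test_idx in splitter.split(Xs[0], y):
        Xs_train = [X[train_idx] for X in Xs]
        Xs_test = [X[test_idx] for X in Xs]
        if y is None:
            yield Xs_train, Xs_test
        else:
            yield Xs_train, Xs_test, y[train_idx], y[test_idx]


rng = np.random.default_rng(0)
Xs = [rng.random((100, 10)), rng.random((100, 20))]
y = rng.integers(0, 2, 100)

splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = list(aligned_splits(splitter, Xs, y))

for Xs_train, Xs_test, y_train, y_test in folds:
    # Every modality shares the same rows, so the sample counts must agree.
    assert all(X.shape[0] == len(y_train) for X in Xs_train)
    assert all(X.shape[0] == len(y_test) for X in Xs_test)
```

Because the indices are computed once per fold and reused, row i of one modality always lands in the same partition as row i of every other modality.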

Train-Test Multi-Modal Split

imml.model_selection.train_test_mm_split(Xs, y=None, **kwargs)[source]

Split multi-modal datasets and labels into train and test sets.

Similar to sklearn's train_test_split, but works with lists of arrays (Xs) and single arrays (y). Every modality in Xs is split with the same train/test indices.

Parameters:
  • *args (list of array-likes or array-like) -- Variable number of inputs to split. Each input can be:

      - A list of arrays (Xs): multi-modal data where each element is a modality.
      - A single array (y): labels.

  • **kwargs (dict) -- Additional keyword arguments to pass to sklearn's train_test_split.

Returns:

Splitting results in the same order as the inputs:

  - For each list input (Xs): (list_train, list_test)
  - For each array input (y): (array_train, array_test)

Return type:

tuple

Example

>>> import numpy as np
>>> from imml.model_selection import train_test_mm_split
>>> Xs = [np.random.rand(100, 10), np.random.rand(100, 20)]
>>> y = np.random.randint(0, 2, 100)
>>> Xs_train, Xs_test, y_train, y_test = train_test_mm_split(Xs, y, train_size=0.7, random_state=42)
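
To show what "the same train/test split indices" means in practice, the sketch below reproduces the behavior with sklearn's train_test_split applied to row positions. It is an illustration under assumptions, not the library's code: split_modalities is a hypothetical name, and the inputs are assumed to be NumPy arrays with matching first dimensions.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_modalities(Xs, y, **kwargs):
    """Hypothetical helper: split row positions once, then index every input."""
    indices = np.arange(len(y))
    # Extra keyword arguments (train_size, random_state, ...) are forwarded
    # unchanged to sklearn's train_test_split.
    train_idx, test_idx = train_test_split(indices, **kwargs)
    Xs_train = [X[train_idx] for X in Xs]
    Xs_test = [X[test_idx] for X in Xs]
    return Xs_train, Xs_test, y[train_idx], y[test_idx]


rng = np.random.default_rng(0)
Xs = [rng.random((100, 10)), rng.random((100, 20))]
y = rng.integers(0, 2, 100)

Xs_train, Xs_test, y_train, y_test = split_modalities(
    Xs, y, train_size=0.7, random_state=42
)
```

Splitting the positions rather than the arrays themselves keeps the modalities and labels aligned regardless of how many modalities Xs contains.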