Model selection
Multi-Modal Splitter
- class imml.model_selection.MMSplitter(splitter, return_type: str = 'sets')
Generic bridge between scikit-learn splitters and multi-modal inputs.
This helper receives any scikit-learn splitter (such as StratifiedKFold) and yields splits. A single set of train/test indices is computed by the splitter and applied to every modality, guaranteeing aligned partitions across all modalities.
- Parameters:
splitter (object) -- Any object implementing scikit-learn's splitter interface, for example KFold, StratifiedKFold, GroupKFold or ShuffleSplit.
return_type (str, default="sets") -- Controls what each yielded item contains: "sets" returns the actual partition sets, while "indices" returns the indices of the partition sets.
Example
>>> import numpy as np
>>> from sklearn.model_selection import StratifiedKFold
>>> from imml.model_selection import MMSplitter
>>> Xs = [np.random.rand(100, 10), np.random.rand(100, 20)]
>>> y = np.random.randint(0, 2, 100)
>>> splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
>>> for Xs_train, Xs_test, y_train, y_test in MMSplitter(splitter=splitter).split(Xs, y):
...     pass
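The alignment guarantee described above can be illustrated with scikit-learn and NumPy alone. The following is a minimal sketch of the idea, not the MMSplitter implementation: it assumes every modality is a NumPy array with the same number of rows, and the `folds` list is a hypothetical container used only for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
# Two modalities describing the same 100 samples.
Xs = [rng.random((100, 10)), rng.random((100, 20))]
y = rng.integers(0, 2, 100)

splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

folds = []  # hypothetical container, for illustration only
# The splitter only needs the sample count and the labels, so the first
# modality stands in for all of them when computing the indices.
for train_idx, test_idx in splitter.split(Xs[0], y):
    # Reuse the same indices for every modality to keep rows aligned.
    Xs_train = [X[train_idx] for X in Xs]
    Xs_test = [X[test_idx] for X in Xs]
    folds.append((Xs_train, Xs_test, y[train_idx], y[test_idx]))
```

Because the indices are computed once per fold, sample i in one modality always lands in the same partition as sample i in every other modality.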
- get_n_splits(X=None, y=None, groups=None)
Returns the number of splitting iterations as set with the n_splits param.
- Parameters:
Xs (Always ignored, exists for API compatibility.)
y (Always ignored, exists for API compatibility.)
groups (Always ignored, exists for API compatibility.)
- Returns:
n_splits -- Returns the number of splitting iterations.
- Return type:
- split(Xs, y=None, groups=None)
Generate aligned training and test splits for all modalities, yielded as sets or as indices depending on return_type.
- Parameters:
Xs (list of array-like) --
Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (array-like of shape (n_samples,), optional) -- Target vector relative to Xs.
groups (array-like, optional) -- Group labels passed to splitter.split(...).
- Returns:
One tuple per split according to return_type.
- Return type:
Train-Test Multi-Modal Split
- imml.model_selection.train_test_mm_split(Xs, y=None, **kwargs)
Split multi-modal datasets and labels into train and test sets.
Similar to scikit-learn's train_test_split, but works with lists of arrays (Xs) and single arrays (y). Every modality in Xs receives the same train/test split indices.
- Parameters:
- Returns:
Splitting results in the same order as inputs:
- For each list input (Xs): (list_train, list_test)
- For each array input (y): (array_train, array_test)
- Return type:
Example
>>> import numpy as np
>>> from imml.model_selection import train_test_mm_split
>>> Xs = [np.random.rand(100, 10), np.random.rand(100, 20)]
>>> y = np.random.randint(0, 2, 100)
>>> Xs_train, Xs_test, y_train, y_test = train_test_mm_split(Xs, y, train_size=0.7, random_state=42)
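The shared-index behaviour can be approximated with scikit-learn's own train_test_split by splitting an index array once and reusing it. This is a rough sketch under the assumption that all modalities are NumPy arrays with matching row counts. It is not how train_test_mm_split is necessarily implemented, and the variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Two modalities describing the same 100 samples.
Xs = [rng.random((100, 10)), rng.random((100, 20))]
y = rng.integers(0, 2, 100)

# Split one index array, then reuse it for every modality and for y.
indices = np.arange(len(y))
train_idx, test_idx = train_test_split(
    indices, train_size=0.7, random_state=42
)

Xs_train = [X[train_idx] for X in Xs]
Xs_test = [X[test_idx] for X in Xs]
y_train, y_test = y[train_idx], y[test_idx]
```

Splitting indices rather than the arrays themselves keeps the partition identical across modalities even when their feature dimensions differ.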