Utils
Convert dataset format
- class imml.utils.convert_dataset_format(Xs: list, keys: list = None)
Convert the format of a multi-modal dataset. If it is a dict, it will be converted to a list, and if it is a list, it will be converted to a dict.
- Parameters:
- Returns:
transformed_Xs --
Xs length: n_mods
Xs[key] shape: (n_samples, n_features_i)
- Return type:
dict of array-like objects.
Example
>>> from imml.utils.convert_dataset_format import convert_dataset_format
>>> import numpy as np
>>> import pandas as pd
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> convert_dataset_format(Xs = Xs)
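The list/dict round trip described above can be illustrated without the library. The following is only a minimal sketch in plain Python: `toggle_dataset_format` is a hypothetical helper, not the imml implementation, and the choice of integer positions as default keys when `keys` is None is an assumption made for illustration.

```python
def toggle_dataset_format(Xs, keys=None):
    """Hypothetical helper (not imml code): dict -> list, list -> dict."""
    if isinstance(Xs, dict):
        # Keep the modalities in the dict's insertion order.
        return list(Xs.values())
    if isinstance(Xs, list):
        if keys is None:
            # Assumption: fall back to positional integer keys.
            keys = list(range(len(Xs)))
        if len(keys) != len(Xs):
            raise ValueError("keys must contain one entry per modality")
        return dict(zip(keys, Xs))
    raise TypeError("Xs must be a list or a dict of modalities")


# Two toy modalities with 2 samples each (2 features and 1 feature).
mods = [[[1.0, 2.0], [3.0, 4.0]], [[5.0], [6.0]]]

as_dict = toggle_dataset_format(mods, keys=["mod_a", "mod_b"])
as_list = toggle_dataset_format(as_dict)
default_keys = list(toggle_dataset_format(mods).keys())
```

Here `as_dict` maps each key to one modality, and converting it back recovers the original list, so the two formats carry the same information.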
Check Xs and y
- class imml.utils.check_Xs_y(Xs: list, y=None, modalities: list = None, mod_types: list = None, copy=False, ensure_all_finite='allow-nan', return_dimensions=False, supervised: bool = False)
Checks Xs and y and ensures they have the correct format.
- Parameters:
Xs (list of array-like objects) --
Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (array-like of shape (n_samples,), (default=None)) -- Target vector relative to X.
modalities (list of str, default=None) -- If provided, ensures the number of modalities. Otherwise not checked.
mod_types (list of str, default=None) -- If provided, ensures the type of modalities. Otherwise not checked.
copy (boolean, default=False) -- If True, the returned Xs is a copy of the input Xs, and operations on the output will not affect the input. If False, the returned Xs is a view of the input Xs, and operations on the output will change the input.
ensure_all_finite (bool or 'allow-nan', default='allow-nan') --
Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:
True: Force all values of array to be finite.
False: accepts np.inf, np.nan, pd.NA in array.
'allow-nan': accepts only np.nan and pd.NA values in array. Values cannot be infinite.
return_dimensions (boolean, default=False) -- If True, the function also returns the dimensions of the multi-modal dataset. The dimensions are n_mods, n_samples, n_features, where n_mods and n_samples are respectively the number of modalities and the number of samples, and n_features is a list of length n_mods containing the number of features of each modality.
supervised (bool, default=False) -- If True, it checks y.
- Returns:
Xs_converted (object) -- The converted and validated Xs (list of data arrays).
n_mods (int) -- The number of modalities in the dataset. Returned only if return_dimensions is True.
n_samples (int) -- The number of samples in the dataset. Returned only if return_dimensions is True.
n_features (list) -- List of length n_mods containing the number of features in each modality. Returned only if return_dimensions is True.
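To make the `return_dimensions` values and the three `ensure_all_finite` settings concrete, here is a minimal sketch in plain Python. Both `dataset_dimensions` and `passes_finite_policy` are hypothetical helpers written for illustration, not imml functions. The sketch assumes modalities are non-empty nested lists of floats and returns a boolean where the documented function raises an error.

```python
import math


def dataset_dimensions(Xs):
    """Hypothetical helper: return (n_mods, n_samples, n_features)."""
    n_mods = len(Xs)
    n_samples = len(Xs[0])
    if any(len(X) != n_samples for X in Xs):
        raise ValueError("all modalities must have the same number of samples")
    # One feature count per modality, read from the first row.
    n_features = [len(X[0]) for X in Xs]
    return n_mods, n_samples, n_features


def passes_finite_policy(Xs, ensure_all_finite="allow-nan"):
    """Hypothetical helper: True if Xs satisfies the given policy."""
    if ensure_all_finite is False:
        return True  # NaN and infinite values are both accepted
    for X in Xs:
        for row in X:
            for value in row:
                if math.isinf(value):
                    return False  # rejected under True and 'allow-nan'
                if ensure_all_finite is True and math.isnan(value):
                    return False  # NaN rejected only under True
    return True


with_nan = [[[1.0, float("nan")], [3.0, 4.0]], [[5.0], [6.0]]]
with_inf = [[[1.0, float("inf")], [3.0, 4.0]], [[5.0], [6.0]]]

dims = dataset_dimensions(with_nan)
nan_allowed = passes_finite_policy(with_nan, "allow-nan")
nan_strict = passes_finite_policy(with_nan, True)
inf_allowed = passes_finite_policy(with_inf, "allow-nan")
inf_unchecked = passes_finite_policy(with_inf, False)
```

In this toy dataset `dims` holds two modalities, two samples, and feature counts of 2 and 1. The NaN-containing data passes only under 'allow-nan' or False, while the infinite value passes only under False.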