Utils

Convert dataset format

class imml.utils.convert_dataset_format(Xs: list, keys: list = None)

Convert a multi-modal dataset from a list of modalities to a dict with one entry per modality.

Parameters:
  • Xs (list of array-like objects) --

    • Xs length: n_mods

    • Xs[i] shape: (n_samples, n_features_i)

    A list of different modalities.

  • keys (list, default=None) -- Keys for the dict, one per modality. If None, integers starting from 0 are used.

Returns:

transformed_Xs --

  • Xs length: n_mods

  • Xs[key] shape: (n_samples, n_features_i)

Return type:

dict of array-like objects.

Examples

>>> from imml.utils.convert_dataset_format import convert_dataset_format
>>> import numpy as np
>>> import pandas as pd
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> convert_dataset_format(Xs = Xs)
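
The list-to-dict mapping described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `list_to_dict` is a hypothetical helper, and the length check on `keys` is an assumption added for clarity.

```python
def list_to_dict(Xs, keys=None):
    """Hypothetical helper: map a list of modalities to a dict."""
    # With no keys given, fall back to integer keys 0, 1, 2, ...
    if keys is None:
        keys = list(range(len(Xs)))
    # Assumption for this sketch: one key is required per modality.
    if len(keys) != len(Xs):
        raise ValueError("keys must have one entry per modality")
    return dict(zip(keys, Xs))


# Two toy modalities with 2 samples each (2 features and 1 feature).
Xs = [[[0.1, 0.2], [0.3, 0.4]], [[1.0], [2.0]]]

by_index = list_to_dict(Xs)
by_name = list_to_dict(Xs, keys=["image", "text"])

print(list(by_index))  # [0, 1]
print(list(by_name))   # ['image', 'text']
```

Each value in the resulting dict is the same object as the corresponding list entry, so the sketch does not copy any data.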

Check Xs

class imml.utils.check_Xs(Xs, enforce_modalities=None, copy=False, ensure_all_finite='allow-nan', return_dimensions=False)

Checks Xs and ensures that it is a list of 2D matrices. Adapted from mvlearn [1] [2].

Parameters:
  • Xs (list of array-like objects) --

    • Xs length: n_mods

    • Xs[i] shape: (n_samples, n_features_i)

    A list of different modalities.

  • enforce_modalities (int, (default=None)) -- If provided, ensures that Xs contains this number of modalities. If None, the number of modalities is not checked.

  • copy (boolean, (default=False)) -- If True, the returned Xs is a copy of the input Xs, and operations on the output will not affect the input. If False, the returned Xs is a view of the input Xs, and operations on the output will change the input.

  • ensure_all_finite (bool or 'allow-nan', default='allow-nan') --

    Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:

    • True: Force all values of array to be finite.

    • False: accepts np.inf, np.nan, pd.NA in array.

    • 'allow-nan': accepts only np.nan and pd.NA values in array. Values cannot be infinite.

  • return_dimensions (boolean, (default=False)) -- If True, the function also returns the dimensions of the multi-modal dataset. The dimensions are n_mods, n_samples, n_features, where n_mods and n_samples are respectively the number of modalities and the number of samples, and n_features is a list of length n_mods containing the number of features of each modality.

References

Returns:

  • Xs_converted (object) -- The converted and validated Xs (list of data arrays).

  • n_mods (int) -- The number of modalities in the dataset. Returned only if return_dimensions is True.

  • n_samples (int) -- The number of samples in the dataset. Returned only if return_dimensions is True.

  • n_features (list) -- List of length n_mods containing the number of features in each modality. Returned only if return_dimensions is True.
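
To make the interplay of enforce_modalities, ensure_all_finite and return_dimensions concrete, here is a simplified sketch in plain Python. It is not the imml or mvlearn implementation: `check_modalities_sketch` is a hypothetical function, it works on nested lists rather than arrays, it ignores `copy` and `pd.NA`, and the requirement that all modalities share the same number of samples is an assumption of this sketch.

```python
import math


def check_modalities_sketch(Xs, enforce_modalities=None,
                            ensure_all_finite="allow-nan",
                            return_dimensions=False):
    """Hypothetical, simplified validator for a list of 2D tables."""
    if not isinstance(Xs, list) or len(Xs) == 0:
        raise ValueError("Xs must be a non-empty list of modalities")
    if enforce_modalities is not None and len(Xs) != enforce_modalities:
        raise ValueError(
            f"expected {enforce_modalities} modalities, got {len(Xs)}"
        )

    n_samples = len(Xs[0])
    n_features = []
    for X in Xs:
        # Assumption: every modality describes the same samples.
        if len(X) != n_samples:
            raise ValueError("modalities differ in number of samples")
        width = len(X[0]) if X else 0
        for row in X:
            # A 2D table needs rows of equal length.
            if len(row) != width:
                raise ValueError("each modality must be a 2D table")
            for value in row:
                if ensure_all_finite is False:
                    continue  # inf and NaN are both accepted
                if math.isinf(value):
                    raise ValueError("infinite value found")
                if ensure_all_finite is True and math.isnan(value):
                    raise ValueError("NaN found")
        n_features.append(width)

    if return_dimensions:
        return Xs, len(Xs), n_samples, n_features
    return Xs


# Two modalities, 2 samples each; the first one contains a NaN.
Xs = [[[1.0, 2.0], [3.0, float("nan")]], [[5.0], [6.0]]]

_, n_mods, n_samples, n_features = check_modalities_sketch(
    Xs, return_dimensions=True
)
print(n_mods, n_samples, n_features)  # 2 2 [2, 1]
```

With the default 'allow-nan' policy the NaN passes validation; calling the same sketch with ensure_all_finite=True would raise a ValueError instead.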