Explore

Get summary

class imml.explore.get_summary(Xs: list, mod_names: list = None, one_row: bool = False, compute_pct: bool = True, return_df: bool = False)

Get a summary of an incomplete multi-modal dataset.

Parameters:
  • Xs (list of array-like objects) --

    • Xs length: n_mods

    • Xs[i] shape: (n_samples, n_features_i)

    A list of different modalities.

  • mod_names (list, default=None) -- Names of the modalities. By default, each modality is named by its index. Only applicable when one_row is False.

  • one_row (bool, default=False) -- If True, return a one-row summary of the dataset. If False, each row will correspond to a modality.

  • compute_pct (bool, default=True) -- If True, compute the percentage for each value.

  • return_df (bool, default=False) -- If True, return a pd.DataFrame; otherwise, return a dict.

Returns:

summary -- Summary of a multi-modal dataset.

Return type:

dict or pd.DataFrame

See also

plot_summary

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_summary
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> get_summary(Xs=Xs)
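
To make the idea of a per-modality summary concrete, the sketch below builds one with plain pandas. It is not the iMML implementation: the helper name sketch_summary, the column names, and the rule that a sample counts as missing from a modality when its whole row is NaN are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def sketch_summary(Xs, mod_names=None, compute_pct=True):
    """Build one row per modality; column names are hypothetical, not iMML's."""
    if mod_names is None:
        mod_names = list(range(len(Xs)))
    rows = []
    for X in Xs:
        # Assumption: a sample is observed if at least one feature is not NaN.
        n_observed = int(X.notna().any(axis=1).sum())
        row = {"n_samples": len(X), "n_observed": n_observed, "n_features": X.shape[1]}
        if compute_pct:
            row["pct_observed"] = 100 * n_observed / len(X)
        rows.append(row)
    return pd.DataFrame(rows, index=mod_names)


samples = ["s0", "s1", "s2", "s3"]
rna = pd.DataFrame(np.ones((4, 2)), index=samples)
img = pd.DataFrame(np.ones((4, 3)), index=samples)
img.loc["s1"] = np.nan  # s1 has no image data

summary = sketch_summary([rna, img], mod_names=["rna", "img"])
print(summary)
```

Each row corresponds to one modality, which mirrors the layout described for one_row=False.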

Get number of modalities

class imml.explore.get_n_mods(Xs: list)

Get the number of modalities of a multi-modal dataset.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

n_mods -- Number of modalities.

Return type:

int

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_n_mods
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> get_n_mods(Xs=Xs)

Get number of samples by modalities

class imml.explore.get_n_samples_by_mod(Xs: list)

Get the number of samples in each modality.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

n_samples_by_mod -- Number of samples in each modality.

Return type:

pd.Series

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_n_samples_by_mod
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> get_n_samples_by_mod(Xs=Xs)
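
As a rough illustration of what a per-modality sample count involves, the following pandas-only sketch counts the rows that hold at least one observed value. The helper count_observed_samples is hypothetical, and treating an all-NaN row as a missing sample is an assumption about the encoding rather than a statement about iMML internals.

```python
import numpy as np
import pandas as pd


def count_observed_samples(Xs):
    """Count, per modality, the rows that are not entirely NaN (assumed encoding)."""
    counts = [int(X.notna().any(axis=1).sum()) for X in Xs]
    return pd.Series(counts, index=range(len(Xs)))


samples = ["s0", "s1", "s2", "s3"]
X0 = pd.DataFrame(np.ones((4, 2)), index=samples)
X1 = pd.DataFrame(np.ones((4, 3)), index=samples)
X1.loc["s1"] = np.nan  # s1 is missing from modality 1

n_by_mod = count_observed_samples([X0, X1])
print(n_by_mod.tolist())  # [4, 3]
```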

Get complete samples

class imml.explore.get_com_samples(Xs: list)

Get the names (index) of the complete samples in a multi-modal dataset.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

samples -- Sample names with full data.

Return type:

pd.Index

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_com_samples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> get_com_samples(Xs=Xs)
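
A complete sample is one that is observed in every modality. The sketch below expresses that rule with plain pandas; complete_sample_names is a hypothetical helper, and it assumes that a sample missing from a modality appears there as a row of NaN values.

```python
import numpy as np
import pandas as pd


def complete_sample_names(Xs):
    """Return names of samples observed in every modality (all-NaN row = missing, by assumption)."""
    # One boolean column per modality: True where the sample has any observed value.
    observed = pd.concat([X.notna().any(axis=1) for X in Xs], axis=1)
    return observed.index[observed.all(axis=1).to_numpy()]


samples = ["s0", "s1", "s2", "s3"]
X0 = pd.DataFrame(np.ones((4, 2)), index=samples)
X1 = pd.DataFrame(np.ones((4, 3)), index=samples)
X0.loc["s3"] = np.nan  # s3 is missing from modality 0
X1.loc["s1"] = np.nan  # s1 is missing from modality 1

complete = complete_sample_names([X0, X1])
print(list(complete))  # ['s0', 's2']
```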

Get incomplete samples

class imml.explore.get_incom_samples(Xs: list)

Get the names (index) of the incomplete samples in a multi-modal dataset.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

samples -- Sample names with incomplete data.

Return type:

pd.Index

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_incom_samples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> get_incom_samples(Xs=Xs)
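
An incomplete sample is missing from at least one modality. A minimal pandas sketch of that rule follows; incomplete_sample_names is a hypothetical helper, and the all-NaN-row encoding of a missing sample is an assumption.

```python
import numpy as np
import pandas as pd


def incomplete_sample_names(Xs):
    """Return names of samples missing from at least one modality (assumed all-NaN rows)."""
    # One boolean column per modality: True where the whole row is NaN.
    missing = pd.concat([X.isna().all(axis=1) for X in Xs], axis=1)
    return missing.index[missing.any(axis=1).to_numpy()]


samples = ["s0", "s1", "s2", "s3"]
X0 = pd.DataFrame(np.ones((4, 2)), index=samples)
X1 = pd.DataFrame(np.ones((4, 3)), index=samples)
X0.loc["s3"] = np.nan  # s3 is missing from modality 0
X1.loc["s1"] = np.nan  # s1 is missing from modality 1

incomplete = incomplete_sample_names([X0, X1])
print(list(incomplete))  # ['s1', 's3']
```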

Get samples

class imml.explore.get_samples(Xs: list)

Get the names (index) of the samples in a multi-modal dataset.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples_i, n_features_i)

A list of different modalities.

Returns:

samples -- Sample names.

Return type:

pd.Index (n_samples,)

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_samples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> get_samples(Xs=Xs)
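
The shape (n_samples_i, n_features_i) suggests that modalities may carry different numbers of rows. One plausible reading, shown here as an assumption rather than as iMML's documented behaviour, is that the sample names are the union of the row indices of all modalities. The helper all_sample_names is hypothetical.

```python
import numpy as np
import pandas as pd


def all_sample_names(Xs):
    """Union of the row indices of all modalities (an assumed reading of 'samples')."""
    names = Xs[0].index
    for X in Xs[1:]:
        names = names.union(X.index, sort=False)
    return names


X0 = pd.DataFrame(np.ones((3, 2)), index=["s0", "s1", "s2"])
X1 = pd.DataFrame(np.ones((3, 4)), index=["s1", "s2", "s3"])

names = all_sample_names([X0, X1])
print(len(names))  # 4
```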

Get samples by modality

class imml.explore.get_samples_by_mod(Xs: list, return_as_list: bool = True)

Get the samples for each modality in a multi-modal dataset.

Parameters:
  • Xs (list of array-like objects) --

    • Xs length: n_mods

    • Xs[i] shape: (n_samples, n_features_i)

    A list of different modalities.

  • return_as_list (bool, default=True) -- If True, the function will return a list; a dict otherwise.

Returns:

samples -- If a list, each element holds the sample names of one modality. If a dict, the keys are the modalities and the values are the sample names.

Return type:

list or dict of pd.Index

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_samples_by_mod
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> get_samples_by_mod(Xs=Xs)
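
The list and dict return forms can be sketched with plain pandas as follows. The helper observed_samples_by_mod is hypothetical; keying the dict by modality position and treating an all-NaN row as a missing sample are both assumptions.

```python
import numpy as np
import pandas as pd


def observed_samples_by_mod(Xs, return_as_list=True):
    """Per modality, return the names of rows that hold at least one observed value."""
    per_mod = [X.index[X.notna().any(axis=1).to_numpy()] for X in Xs]
    # Assumption: the dict form is keyed by the position of each modality.
    return per_mod if return_as_list else dict(enumerate(per_mod))


samples = ["s0", "s1", "s2", "s3"]
X0 = pd.DataFrame(np.ones((4, 2)), index=samples)
X1 = pd.DataFrame(np.ones((4, 3)), index=samples)
X1.loc["s1"] = np.nan  # s1 is missing from modality 1

as_list = observed_samples_by_mod([X0, X1])
as_dict = observed_samples_by_mod([X0, X1], return_as_list=False)
print(list(as_list[1]))  # ['s0', 's2', 's3']
print(list(as_dict))     # [0, 1]
```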

Get missing samples by modality

class imml.explore.get_missing_samples_by_mod(Xs: list, return_as_list: bool = True)

Get the samples not present in each modality in a multi-modal dataset.

Parameters:
  • Xs (list of array-like objects) --

    • Xs length: n_mods

    • Xs[i] shape: (n_samples, n_features_i)

    A list of different modalities.

  • return_as_list (bool, default=True) -- If True, the function will return a list; a dict otherwise.

Returns:

samples -- Missing samples for each modality. If a list, each element holds the names of the samples missing from one modality. If a dict, the keys are the modalities and the values are the missing sample names.

Return type:

list or dict of pd.Index

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_missing_samples_by_mod
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> get_missing_samples_by_mod(Xs=Xs)
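
The per-modality view of missing samples can be sketched in the same spirit. missing_samples_by_mod is a hypothetical helper, and it assumes that a sample absent from a modality is stored as a row of NaN values in that modality.

```python
import numpy as np
import pandas as pd


def missing_samples_by_mod(Xs, return_as_list=True):
    """Per modality, return the names of rows that are entirely NaN (assumed encoding)."""
    per_mod = [X.index[X.isna().all(axis=1).to_numpy()] for X in Xs]
    # Assumption: the dict form is keyed by the position of each modality.
    return per_mod if return_as_list else dict(enumerate(per_mod))


samples = ["s0", "s1", "s2", "s3"]
X0 = pd.DataFrame(np.ones((4, 2)), index=samples)
X1 = pd.DataFrame(np.ones((4, 3)), index=samples)
X0.loc["s3"] = np.nan  # s3 is missing from modality 0
X1.loc["s1"] = np.nan  # s1 is missing from modality 1

missing = missing_samples_by_mod([X0, X1])
print([list(idx) for idx in missing])  # [['s3'], ['s1']]
```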

Get number of complete samples

class imml.explore.get_n_com_samples(Xs: list)

Get the number of complete samples in a multi-modal dataset.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

n_samples -- Number of complete samples.

Return type:

int

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_n_com_samples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> get_n_com_samples(Xs=Xs)

Get number of incomplete samples

class imml.explore.get_n_incom_samples(Xs: list)

Get the number of incomplete samples in a multi-modal dataset.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

n_samples -- Number of incomplete samples.

Return type:

int

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_n_incom_samples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> get_n_incom_samples(Xs=Xs)
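
Since every sample is either complete or incomplete, the two counts add up to the total number of samples. The sketch below checks this on a toy dataset; count_complete_and_incomplete is a hypothetical helper, and the all-NaN-row encoding of a missing sample is an assumption.

```python
import numpy as np
import pandas as pd


def count_complete_and_incomplete(Xs):
    """Return (n_complete, n_incomplete), treating an all-NaN row as a missing sample."""
    observed = pd.concat([X.notna().any(axis=1) for X in Xs], axis=1)
    n_complete = int(observed.all(axis=1).sum())
    return n_complete, len(observed) - n_complete


samples = ["s0", "s1", "s2", "s3", "s4"]
X0 = pd.DataFrame(np.ones((5, 2)), index=samples)
X1 = pd.DataFrame(np.ones((5, 3)), index=samples)
X0.loc[["s3", "s4"]] = np.nan  # s3 and s4 are missing from modality 0
X1.loc["s1"] = np.nan          # s1 is missing from modality 1

n_complete, n_incomplete = count_complete_and_incomplete([X0, X1])
print(n_complete, n_incomplete)  # 2 3
```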

Get percentage of complete samples

class imml.explore.get_pct_com_samples(Xs: list)

Get the percentage of complete samples in a multi-modal dataset.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

percentage_samples -- Percentage of complete samples.

Return type:

float

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_pct_com_samples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> get_pct_com_samples(Xs=Xs)

Get percentage of incomplete samples

class imml.explore.get_pct_incom_samples(Xs: list)

Get the percentage of incomplete samples in a multi-modal dataset.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

percentage_samples -- Percentage of incomplete samples.

Return type:

float

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.explore import get_pct_incom_samples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for _ in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> get_pct_incom_samples(Xs=Xs)
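
The percentages of complete and incomplete samples are complementary. The following pandas-only sketch illustrates this on a toy dataset; it assumes a 0 to 100 scale and an all-NaN-row encoding for missing samples, neither of which is confirmed by this page, and pct_complete_and_incomplete is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def pct_complete_and_incomplete(Xs):
    """Return (pct_complete, pct_incomplete) on an assumed 0-100 scale."""
    observed = pd.concat([X.notna().any(axis=1) for X in Xs], axis=1)
    # The mean of a boolean Series is the fraction of True values.
    pct_complete = 100 * float(observed.all(axis=1).mean())
    return pct_complete, 100 - pct_complete


samples = ["s0", "s1", "s2", "s3"]
X0 = pd.DataFrame(np.ones((4, 2)), index=samples)
X1 = pd.DataFrame(np.ones((4, 3)), index=samples)
X1.loc["s1"] = np.nan  # s1 is missing from modality 1

pct_complete, pct_incomplete = pct_complete_and_incomplete([X0, X1])
print(pct_complete, pct_incomplete)  # 75.0 25.0
```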