Preprocessing

Drop Modality

class imml.preprocessing.DropMod(X_idx: int = 0)

Bases: FunctionTransformer

A transformer that drops a specified modality from a multi-modal dataset. It wraps scikit-learn's FunctionTransformer with drop_mod as the function.

Parameters:

X_idx (int, default=0) -- The index of the modality to drop from the input data.

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import DropMod
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = DropMod(X_idx=1)
>>> transformer.fit_transform(Xs)
imml.preprocessing.drop_mod(Xs, X_idx: int = 0)

A function that drops a specified modality from a multi-modal dataset.

Parameters:
  • Xs (list of array-like objects) --

    • Xs length: n_mods

    • Xs[i] shape: (n_samples, n_features_i)

    A list of different modalities.

  • X_idx (int, default=0) -- The index of the modality to drop from the input data.

Returns:

transformed_Xs -- The multi-modal dataset without the dropped modality.

Return type:

list of array-like objects, shape (n_samples, n_features_i)

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import drop_mod
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> drop_mod(Xs=Xs, X_idx=1)
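Conceptually, dropping a modality removes one table from the list and leaves the others untouched. The minimal sketch below assumes exactly that; `drop_modality_sketch` is a hypothetical helper written with plain list operations, not imml's implementation. The modalities are given different widths so the dropped one is easy to spot.

```python
import numpy as np
import pandas as pd


def drop_modality_sketch(Xs, X_idx=0):
    """Hypothetical stand-in for drop_mod: return every modality except Xs[X_idx]."""
    if not -len(Xs) <= X_idx < len(Xs):
        raise IndexError(f"X_idx={X_idx} is out of range for {len(Xs)} modalities")
    # Normalize negative indices so they compare correctly with enumerate's positions.
    X_idx = X_idx % len(Xs)
    return [X for i, X in enumerate(Xs) if i != X_idx]


rng = np.random.default_rng(42)
# Modalities with 10, 11 and 12 features.
Xs = [pd.DataFrame(rng.random((20, 10 + i))) for i in range(3)]
remaining = drop_modality_sketch(Xs, X_idx=1)
print(len(remaining), [X.shape for X in remaining])  # 2 [(20, 10), (20, 12)]
```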

Concatenate Modalities

class imml.preprocessing.ConcatenateMods

Bases: FunctionTransformer

A transformer that concatenates all modalities from a multi-modal dataset. It wraps scikit-learn's FunctionTransformer with concatenate_mods as the function.

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import ConcatenateMods
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = ConcatenateMods()
>>> transformer.fit_transform(Xs)
imml.preprocessing.concatenate_mods(Xs: list)

A function that concatenates all features from a multi-modal dataset.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

transformed_Xs -- A single dataset containing the features of all modalities side by side.

Return type:

array-like, shape (n_samples, n_features), where n_features is the sum of all n_features_i

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import concatenate_mods
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> concatenate_mods(Xs=Xs)
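Column-wise concatenation of modalities that share the same samples can be sketched with pandas' `pd.concat(..., axis=1)`. This is an assumption about the underlying operation rather than imml's code, and how concatenate_mods names the resulting columns is not documented here. Because every modality in the example above has columns 0 to 9, the hypothetical `concatenate_modalities_sketch` prefixes each column with its modality index to keep the names unique.

```python
import numpy as np
import pandas as pd


def concatenate_modalities_sketch(Xs):
    """Hypothetical stand-in for concatenate_mods: place modalities side by side."""
    # Prefix columns with the modality index so names from different modalities cannot collide.
    renamed = [X.add_prefix(f"mod{i}_") for i, X in enumerate(Xs)]
    # axis=1 aligns rows on the index, so all modalities should share the same samples.
    return pd.concat(renamed, axis=1)


rng = np.random.default_rng(42)
Xs = [pd.DataFrame(rng.random((20, 10))) for _ in range(3)]
X = concatenate_modalities_sketch(Xs)
print(X.shape)  # (20, 30)
```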

Single Modality

class imml.preprocessing.SingleMod(X_idx: int = 0)

Bases: FunctionTransformer

Transformer that selects a single modality from multi-modal data. It wraps scikit-learn's FunctionTransformer with single_mod as the function.

Parameters:

X_idx (int, default=0) -- The index of the modality to select from the input data.

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import SingleMod
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = SingleMod(X_idx=1)
>>> transformer.fit_transform(Xs)
imml.preprocessing.single_mod(Xs, X_idx: int = 0)

A function that selects a specified modality from a multi-modal dataset.

Parameters:
  • Xs (list of array-like objects) --

    • Xs length: n_mods

    • Xs[i] shape: (n_samples, n_features_i)

    A list of different modalities.

  • X_idx (int, default=0) -- The index of the modality to select from the input data.

Returns:

transformed_Xs -- The selected modality.

Return type:

array-like, shape (n_samples, n_features_i)

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import single_mod
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> single_mod(Xs=Xs, X_idx=1)
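Because the documented return type of single_mod is a single array-like rather than a list, the selected modality can be handed directly to any single-modality estimator. A minimal sketch, assuming the operation is equivalent to indexing the list (`single_modality_sketch` is hypothetical, not imml's implementation):

```python
import numpy as np
import pandas as pd


def single_modality_sketch(Xs, X_idx=0):
    """Hypothetical stand-in for single_mod: return the modality at position X_idx."""
    return Xs[X_idx]


rng = np.random.default_rng(42)
# Modalities with 10, 11 and 12 features.
Xs = [pd.DataFrame(rng.random((20, 10 + i))) for i in range(3)]
X = single_modality_sketch(Xs, X_idx=1)
# The result is one DataFrame, not a list of modalities.
print(type(X).__name__, X.shape)  # DataFrame (20, 11)
```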

Add Missing Modalities

class imml.preprocessing.AddMissingMods(samples: Index)

Bases: FunctionTransformer

Transformer that adds the missing samples to each modality, so that all modalities contain the same samples.

It wraps scikit-learn's FunctionTransformer with add_missing_mods as the function.

This transformer is applied on individual modalities, so to apply it to a multi-modal dataset, we recommend using it with MultiModTransformer.

Parameters:

samples (pd.Index, shape (n_samples,)) -- Index with all the samples that every modality should contain.

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import AddMissingMods
>>> from imml.explore import get_samples
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> samples = get_samples(Xs=Xs)
>>> transformer = AddMissingMods(samples=samples)
>>> transformer.fit_transform(Xs)
imml.preprocessing.add_missing_mods(Xs, samples)

A function that adds the missing samples to each modality, so that all modalities contain the same samples.

Parameters:
  • Xs (list of array-like objects) --

    • Xs length: n_mods

    • Xs[i] shape: (n_samples, n_features_i)

    A list of different modalities.

  • samples (pd.Index, shape (n_samples,)) -- Index with all the samples that every modality should contain.

Returns:

transformed_Xs -- The multi-modal dataset, with the same samples in every modality.

Return type:

list of array-like objects, shape (n_samples, n_features_i)

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import add_missing_mods
>>> from imml.explore import get_samples
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> samples = get_samples(Xs=Xs)
>>> add_missing_mods(Xs, samples=samples)
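Giving every modality the same samples amounts to reindexing each table on a common index and filling the rows a modality lacks with NaN. The sketch below assumes that `samples` is the union of all modality indices and uses `pd.Index.union` and `DataFrame.reindex` as stand-ins; `add_missing_samples_sketch` is a hypothetical helper, not imml's implementation.

```python
import numpy as np
import pandas as pd


def add_missing_samples_sketch(Xs, samples):
    """Hypothetical stand-in for add_missing_mods: give every modality the same rows."""
    # reindex inserts an all-NaN row for each sample a modality does not contain.
    return [X.reindex(samples) for X in Xs]


rng = np.random.default_rng(42)
# Two modalities observed on partly overlapping samples.
X1 = pd.DataFrame(rng.random((3, 2)), index=["a", "b", "c"])
X2 = pd.DataFrame(rng.random((2, 4)), index=["b", "d"])
samples = X1.index.union(X2.index)  # assumed to play the role of get_samples(Xs)
aligned = add_missing_samples_sketch([X1, X2], samples)
print([X.shape for X in aligned])  # [(4, 2), (4, 4)]
```

Sample "a" exists only in the first modality, so its row in the second modality is entirely NaN after alignment.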

Sort Data

class imml.preprocessing.SortData

Bases: FunctionTransformer

Transformer that establishes and assesses the order of an incomplete multi-modal dataset. It wraps scikit-learn's FunctionTransformer with sort_data as the function.

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import SortData
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = SortData()
>>> transformer.fit_transform(Xs)
imml.preprocessing.sort_data(Xs: list)

A function that establishes and assesses the order of an incomplete multi-modal dataset.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

transformed_Xs -- The transformed multi-modal dataset.

Return type:

list of array-like objects, shape (n_samples, n_features_i)

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import sort_data
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> sort_data(Xs=Xs)

Multi-Modal Transformer

class imml.preprocessing.MultiModTransformer(transformer)

Bases: BaseEstimator, TransformerMixin

A transformer that applies the same transformation to multiple modalities of data.

Parameters:

transformer (scikit-learn transformer object or list of scikit-learn transformer objects) -- The scikit-learn transformer used to transform each modality. If a list is provided, the i-th transformer is applied to the i-th modality; otherwise, the same transformer is applied to every modality.

Attributes:

transformer_list_

A list of transformers, one for each modality.

Type:

list of transformers (n_mods,)

same_transformer_

A boolean indicating whether the same transformer is applied to every modality.

Type:

boolean

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import MultiModTransformer
>>> from sklearn.impute import SimpleImputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = MultiModTransformer(transformer=SimpleImputer().set_output(transform='pandas'))
>>> transformer.fit_transform(Xs)
fit(Xs, y=None)

Fit the transformer to the input data.

Parameters:
  • Xs (list of array-like objects) --

    • Xs length: n_mods

    • Xs[i] shape: (n_samples, n_features_i)

    A list of different modalities.

  • y (array-like, shape (n_samples,)) -- Labels for each sample. Only used by supervised algorithms.

Returns:

self

Return type:

returns an instance of self.

transform(Xs)

Transform the input data using the fitted transformers.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

transformed_Xs -- A list of transformed modalities, one for each input modality.

Return type:

list of array-like objects, shape (n_samples, n_features_i)
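The per-modality logic described above can be sketched as follows: copy a single template transformer once per modality (or take the provided list as is), fit each copy on its own modality, and transform each modality with its own fitted copy. This is a minimal illustration, not imml's code. It uses `copy.deepcopy` where scikit-learn code would normally call `sklearn.base.clone`, and `CenterSketch` and `MultiModSketch` are hypothetical classes written only so the example runs without scikit-learn.

```python
import copy

import numpy as np
import pandas as pd


class CenterSketch:
    """Hypothetical toy transformer: subtracts the per-column mean learned in fit."""

    def fit(self, X, y=None):
        self.mean_ = X.mean()
        return self

    def transform(self, X):
        return X - self.mean_


class MultiModSketch:
    """Hypothetical stand-in for MultiModTransformer."""

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, Xs, y=None):
        self.same_transformer_ = not isinstance(self.transformer, list)
        if self.same_transformer_:
            # One independent copy of the template per modality.
            self.transformer_list_ = [copy.deepcopy(self.transformer) for _ in Xs]
        else:
            if len(self.transformer) != len(Xs):
                raise ValueError("Provide exactly one transformer per modality.")
            self.transformer_list_ = self.transformer
        for t, X in zip(self.transformer_list_, Xs):
            t.fit(X, y)
        return self

    def transform(self, Xs):
        return [t.transform(X) for t, X in zip(self.transformer_list_, Xs)]

    def fit_transform(self, Xs, y=None):
        return self.fit(Xs, y).transform(Xs)


rng = np.random.default_rng(42)
# Each modality is shifted by a different offset, so each needs its own fitted mean.
Xs = [pd.DataFrame(rng.random((20, 10)) + i) for i in range(3)]
mm = MultiModSketch(CenterSketch())
out = mm.fit_transform(Xs)
print([bool(abs(X.to_numpy().mean()) < 1e-12) for X in out])  # [True, True, True]
```

Keeping one fitted copy per modality is what lets modalities with different scales be transformed independently by the same kind of transformer.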

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → MultiModTransformer

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.

Returns:

self -- The updated object.

Return type:

object

set_transform_request(*, Xs: bool | None | str = '$UNCHANGED$') → MultiModTransformer

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in transform.

Returns:

self -- The updated object.

Return type:

object

Normalizer NaN

class imml.preprocessing.NormalizerNaN(norm: str = 'l2')

Bases: Normalizer

Similar to sklearn.preprocessing.Normalizer but handles NaN values.

Parameters:

norm ({‘l1’, ‘l2’, ‘max’}, default=’l2’) -- The norm to use to normalize each non-zero sample. If norm=’max’ is used, values will be rescaled by the maximum of the absolute values.

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.ampute import Amputer
>>> from imml.preprocessing import NormalizerNaN, MultiModTransformer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> amp = Amputer(p=0.3)
>>> Xs = amp.fit_transform(Xs)
>>> transformer = MultiModTransformer(NormalizerNaN())
>>> transformer.fit_transform(Xs)
fit(X, y=None)

Fit the transformer to the input data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) -- Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self

Return type:

returns an instance of self.

transform(X, y=None)

Scale each non-zero row of X to unit norm.

Parameters:
  • X (array-like of shape (n_samples, n_features)) -- The data to normalize, row by row, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

transformed_X -- Transformed data.

Return type:

array-like of shape (n_samples, n_features)
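The docstring does not spell out how NaN values are handled. One plausible reading, assumed here rather than taken from imml, is that each row norm is computed over the observed entries only and missing entries stay missing. The hypothetical `normalize_nan_sketch` implements that reading for the three supported norms and, like scikit-learn's Normalizer, leaves rows with a zero norm unchanged.

```python
import numpy as np


def normalize_nan_sketch(X, norm="l2"):
    """Hypothetical NaN-aware row normalization (assumed behaviour, not imml's code)."""
    X = np.asarray(X, dtype=float)
    # Missing entries contribute nothing to the row norm.
    A = np.nan_to_num(np.abs(X), nan=0.0)
    if norm == "l1":
        norms = A.sum(axis=1)
    elif norm == "l2":
        norms = np.sqrt((A ** 2).sum(axis=1))
    elif norm == "max":
        norms = A.max(axis=1)
    else:
        raise ValueError(f"Unsupported norm: {norm!r}")
    # Leave zero-norm (or fully missing) rows unchanged instead of dividing by zero.
    norms[norms == 0] = 1.0
    # NaN divided by a norm is still NaN, so missing entries stay missing.
    return X / norms[:, np.newaxis]


X = np.array([[3.0, np.nan, 4.0],
              [np.nan, np.nan, np.nan],
              [0.0, 0.0, 0.0]])
# With norm="l2", row 0 becomes 0.6, nan, 0.8 (its observed norm is 5).
print(normalize_nan_sketch(X)[0])
```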

Select Complete Samples

class imml.preprocessing.SelectCompleteSamples

Bases: FunctionTransformer

Remove incomplete samples from a multi-modal dataset, keeping only the samples that are observed in every modality. It wraps scikit-learn's FunctionTransformer with select_complete_samples as the function.

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import SelectCompleteSamples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> transformer = SelectCompleteSamples()
>>> transformer.fit_transform(Xs)
imml.preprocessing.select_complete_samples(Xs: list)

Remove incomplete samples from a multi-modal dataset, keeping only the samples that are observed in every modality.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

transformed_Xs -- The transformed data.

Return type:

list of array-like objects, shape (n_samples, n_features_i)

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import select_complete_samples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> select_complete_samples(Xs)
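Whether a sample is complete depends on how missing modalities are encoded. The sketch below assumes block-wise missingness: a modality counts as missing for a sample when that sample's entire row is NaN, and all modalities share the same row index. Under those assumptions, the hypothetical `select_complete_samples_sketch` keeps the rows observed in every modality; it is an illustration, not imml's implementation.

```python
import numpy as np
import pandas as pd


def observed_mask_sketch(Xs):
    """Boolean array (n_samples, n_mods): True where a modality is observed for a sample.

    Assumes a modality is missing for a sample when its entire row is NaN.
    """
    return np.column_stack([~X.isna().all(axis=1).to_numpy() for X in Xs])


def select_complete_samples_sketch(Xs):
    """Hypothetical stand-in for select_complete_samples."""
    complete = observed_mask_sketch(Xs).all(axis=1)
    return [X.loc[complete] for X in Xs]


rng = np.random.default_rng(42)
Xs = [pd.DataFrame(rng.random((5, 3))) for _ in range(2)]
Xs[0].iloc[1] = np.nan     # sample 1 has no data for modality 0
Xs[1].iloc[3] = np.nan     # sample 3 has no data for modality 1
Xs[1].iloc[4, 0] = np.nan  # under this sketch's assumption, one NaN value does not drop the modality
kept = select_complete_samples_sketch(Xs)
print(kept[0].index.tolist())  # [0, 2, 4]
```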

Select Incomplete Samples

class imml.preprocessing.SelectIncompleteSamples

Bases: FunctionTransformer

Remove complete samples from a multi-modal dataset, keeping only the samples that are missing at least one modality. It wraps scikit-learn's FunctionTransformer with select_incomplete_samples as the function.

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import SelectIncompleteSamples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> transformer = SelectIncompleteSamples()
>>> transformer.fit_transform(Xs)
imml.preprocessing.select_incomplete_samples(Xs: list)

Remove complete samples from a multi-modal dataset, keeping only the samples that are missing at least one modality.

Parameters:

Xs (list of array-like objects) --

  • Xs length: n_mods

  • Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

transformed_Xs -- The transformed data.

Return type:

list of array-like objects, shape (n_samples, n_features_i)

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import select_incomplete_samples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> select_incomplete_samples(Xs)
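Assuming a modality is missing for a sample when its whole row is NaN, and that all modalities share one row index, selecting incomplete samples is a complement operation: keep the rows that lack at least one modality. `select_incomplete_samples_sketch` is a hypothetical helper written under those assumptions, not imml's implementation.

```python
import numpy as np
import pandas as pd


def select_incomplete_samples_sketch(Xs):
    """Hypothetical stand-in for select_incomplete_samples: keep samples missing >= 1 modality."""
    # A modality is treated as missing for a sample when its entire row is NaN (assumption).
    observed = np.column_stack([~X.isna().all(axis=1).to_numpy() for X in Xs])
    incomplete = ~observed.all(axis=1)
    return [X.loc[incomplete] for X in Xs]


rng = np.random.default_rng(42)
Xs = [pd.DataFrame(rng.random((5, 3))) for _ in range(2)]
Xs[0].iloc[1] = np.nan  # sample 1 has no data for modality 0
Xs[1].iloc[3] = np.nan  # sample 3 has no data for modality 1
incomplete = select_incomplete_samples_sketch(Xs)
print(incomplete[0].index.tolist())  # [1, 3]
```

Combined with a filter that keeps only the complete samples, this splits the dataset into two disjoint groups of samples.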