Preprocessing
Drop Modality
- class imml.preprocessing.DropMod(X_idx: int = 0)
Bases: FunctionTransformer
A transformer that drops a specified modality from a multi-modal dataset. Applies FunctionTransformer (from scikit-learn) with drop_mod as the function.
- Parameters:
X_idx (int, default=0) -- The index of the modality to drop from the input data.
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import DropMod
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = DropMod(X_idx=1)
>>> transformer.fit_transform(Xs)
- class imml.preprocessing.drop_mod(Xs, X_idx: int = 0)
A function that drops a specified modality from a multi-modal dataset.
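- Parameters:
Xs (list of array-likes objects) --
Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
X_idx (int, default=0) -- The index of the modality to drop from the input data.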
- Returns:
transformed_Xs -- The transformed multi-modal dataset.
- Return type:
array-like, shape (n_samples, n_features)
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import drop_mod
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> drop_mod(Xs=Xs, X_idx=1)
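As a rough illustration of what dropping a modality means, the sketch below removes one element from a list of modalities using plain Python. The helper name drop_modality_sketch is hypothetical and is not part of imml; it assumes that dropping simply omits the element at position X_idx and leaves the remaining modalities untouched, which may differ from the library's handling of copies or return types.

```python
import numpy as np


def drop_modality_sketch(Xs, X_idx=0):
    """Hypothetical helper: return a new list without the modality at X_idx."""
    return [X for i, X in enumerate(Xs) if i != X_idx]


rng = np.random.default_rng(42)
Xs = [rng.random((20, 10)) for _ in range(3)]

remaining = drop_modality_sketch(Xs, X_idx=1)
print(len(remaining))  # 2
```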
Concatenate Modalities
- class imml.preprocessing.ConcatenateMods
Bases: FunctionTransformer
A transformer that concatenates all modalities from a multi-modal dataset. Applies FunctionTransformer (from scikit-learn) with concatenate_mods as the function.
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import ConcatenateMods
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = ConcatenateMods()
>>> transformer.fit_transform(Xs)
- class imml.preprocessing.concatenate_mods(Xs: list)
A function that concatenates all features from a multi-modal dataset.
- Parameters:
Xs (list of array-likes objects) --
Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)
A list of different mods.
- Returns:
transformed_Xs -- The transformed dataset.
- Return type:
array-like, shape (n_samples, n_features)
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import concatenate_mods
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> concatenate_mods(Xs=Xs)
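The documented shapes, n_mods inputs of shape (n_samples, n_features_i) and a single output of shape (n_samples, n_features), suggest a column-wise concatenation. The sketch below reproduces that shape arithmetic with pandas only; it assumes all modalities share the same row index and is not necessarily how imml implements the function.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
Xs = [pd.DataFrame(rng.random((20, 10))) for _ in range(3)]

# Stack the modalities side by side; rows are aligned on the shared index.
concatenated = pd.concat(Xs, axis=1)
print(concatenated.shape)  # (20, 30)
```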
Single Modality
- class imml.preprocessing.SingleMod(X_idx: int = 0)
Bases: FunctionTransformer
Transformer that selects a single modality from multi-modal data. Applies FunctionTransformer (from scikit-learn) with single_mod as the function.
- Parameters:
X_idx (int, default=0) -- The index of the modality to select from the input data.
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import SingleMod
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = SingleMod(X_idx=1)
>>> transformer.fit_transform(Xs)
- class imml.preprocessing.single_mod(Xs, X_idx: int = 0)
A function that selects a specified modality from a multi-modal dataset.
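- Parameters:
Xs (list of array-likes objects) --
Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
X_idx (int, default=0) -- The index of the modality to select from the input data.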
- Returns:
transformed_Xs -- The transformed dataset.
- Return type:
array-like, shape (n_samples, n_features)
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import single_mod
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> single_mod(Xs=Xs, X_idx=1)
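Several classes on this page are described as FunctionTransformer wrappers around a plain function. The sketch below shows that general pattern with scikit-learn alone. The function select_modality is a hypothetical stand-in for single_mod, and the wiring through kw_args is an assumption about how such a wrapper could be built, not a copy of imml's code.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer


def select_modality(Xs, X_idx=0):
    """Hypothetical helper: return the modality stored at position X_idx."""
    return Xs[X_idx]


rng = np.random.default_rng(42)
# Three modalities with 5, 6 and 7 features respectively.
Xs = [rng.random((20, 5 + i)) for i in range(3)]

# kw_args forwards fixed keyword arguments to the wrapped function.
selector = FunctionTransformer(select_modality, kw_args={"X_idx": 1})
selected = selector.fit_transform(Xs)
print(selected.shape)  # (20, 6)
```

Wrapping the function this way gives it the fit/transform interface, so it can be placed inside a scikit-learn Pipeline alongside other steps.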
Add Missing Modalities
- class imml.preprocessing.AddMissingMods(samples: Index)
Bases: FunctionTransformer
Transformer that adds missing samples to each modality so that all modalities contain the same samples.
Applies FunctionTransformer (from scikit-learn) with add_missing_mods as the function.
This transformer operates on individual modalities, so to apply it to a multi-modal dataset we recommend using it with MultiModTransformer.
- Parameters:
samples (array-like (n_samples,)) -- pd.Index with all samples
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import AddMissingMods
>>> from imml.explore import get_samples
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> samples = get_samples(Xs=Xs)
>>> transformer = AddMissingMods(samples=samples)
>>> transformer.fit_transform(Xs)
- class imml.preprocessing.add_missing_mods(Xs, samples)
Add missing samples in each modality, in a way that all the modalities will have the same samples.
- Parameters:
Xs (list of array-likes objects) --
Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)
A list of different mods.
- Returns:
transformed_Xs -- The transformed multi-modal dataset.
- Return type:
array-like, shape (n_samples, n_features)
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import add_missing_mods
>>> from imml.explore import get_samples
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> samples = get_samples(Xs=Xs)
>>> add_missing_mods(Xs, samples=samples)
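To make the idea of aligning samples across modalities concrete, the sketch below uses pandas reindexing on two small modalities with different sample labels. It assumes that samples absent from a modality are added as rows filled with NaN; that fill value and the use of an index union are assumptions for illustration, not a statement of what imml does internally.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
X0 = pd.DataFrame(rng.random((3, 2)), index=["a", "b", "c"])
X1 = pd.DataFrame(rng.random((2, 2)), index=["b", "d"])

# Every sample label that appears in at least one modality.
samples = X0.index.union(X1.index)

# Reindexing inserts an all-NaN row for each sample a modality lacks.
aligned = [X.reindex(samples) for X in (X0, X1)]
print([X.shape for X in aligned])  # [(4, 2), (4, 2)]
```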
Sort Data
- class imml.preprocessing.SortData
Bases: FunctionTransformer
Transformer that establishes and assesses the order of the incomplete multi-modal dataset. Applies FunctionTransformer (from scikit-learn) with sort_data as the function.
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import SortData
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = SortData()
>>> transformer.fit_transform(Xs)
- class imml.preprocessing.sort_data(Xs: list)
A function that establishes and assesses the order of the incomplete multi-modal dataset.
- Parameters:
Xs (list of array-likes objects) --
Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
- Returns:
transformed_X -- The transformed multi-modal dataset.
- Return type:
list of array-likes objects (n_samples, n_features_i)
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import sort_data
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> sort_data(Xs=Xs)
Multi-Modal Transformer
- class imml.preprocessing.MultiModTransformer(transformer)
Bases: BaseEstimator, TransformerMixin
A transformer that applies the same transformation to multiple modalities of data.
- Parameters:
transformer (scikit-learn transformer object or list of scikit-learn transformer objects) -- A scikit-learn transformer object that will be used to transform each modality of data. If a list is provided, one transformer is applied to each modality; otherwise the same transformer is applied to every modality.
- transformer_list_
A list of transformers, one for each modality of data.
- Type:
list of transformers (n_mods,)
- same_transformer_
A boolean indicating whether the same transformer will be applied to each modality of data.
- Type:
boolean
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import MultiModTransformer
>>> from sklearn.impute import SimpleImputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = MultiModTransformer(transformer=SimpleImputer().set_output(transform='pandas'))
>>> transformer.fit_transform(Xs)
- fit(Xs, y=None)
Fit the transformer to the input data.
- Parameters:
Xs (list of array-likes objects) --
Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (array-like, shape (n_samples,)) -- Labels for each sample. Only used by supervised algorithms.
- Returns:
self
- Return type:
returns an instance of self.
- transform(Xs)
Transform the input data using the transformers.
- Parameters:
Xs (list of array-likes objects) --
Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
- Returns:
transformed_Xs -- A list of transformed mods of data, one for each input modality.
- Return type:
list of array-likes objects, shape (n_samples, n_features_i)
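The fit and transform contract described above can be sketched with scikit-learn building blocks. The helper fit_transform_per_modality is hypothetical: it assumes that a single transformer is cloned once per modality and that a list is paired with the modalities by position. It illustrates the documented behavior and is not imml's implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def fit_transform_per_modality(transformer, Xs):
    """Hypothetical helper: fit one transformer per modality and transform it."""
    if isinstance(transformer, list):
        transformers = transformer  # one transformer per modality, by position
    else:
        transformers = [clone(transformer) for _ in Xs]  # independent copies
    return [t.fit_transform(X) for t, X in zip(transformers, Xs)]


rng = np.random.default_rng(42)
Xs = [rng.random((20, 10)) for _ in range(3)]

# Same transformer for every modality.
same = fit_transform_per_modality(StandardScaler(), Xs)

# A different transformer for each modality.
mixed = fit_transform_per_modality(
    [StandardScaler(), MinMaxScaler(), StandardScaler()], Xs
)
print(len(same), same[0].shape)  # 3 (20, 10)
```

Cloning keeps the fitted state of each modality separate, so statistics learned on one modality never leak into another.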
- set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') -> MultiModTransformer
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_transform_request(*, Xs: bool | None | str = '$UNCHANGED$') -> MultiModTransformer
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Normalizer NaN
- class imml.preprocessing.NormalizerNaN(norm: str = 'l2')
Bases: Normalizer
Similar to sklearn.preprocessing.Normalizer but handles NaN values.
- Parameters:
norm ({'l1', 'l2', 'max'}, default='l2') -- The norm to use to normalize each non-zero sample. If norm='max' is used, values will be rescaled by the maximum of the absolute values.
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.ampute import Amputer
>>> from imml.preprocessing import NormalizerNaN, MultiModTransformer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> amp = Amputer(p=0.3)
>>> Xs = amp.fit_transform(Xs)
>>> transformer = MultiModTransformer(NormalizerNaN())
>>> transformer.fit_transform(Xs)
- fit(X, y=None)
Fit the transformer to the input data.
- Parameters:
X (array-like of shape (n_samples, n_features)) -- Training vector, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) -- Not used, present here for API consistency by convention.
- Returns:
self
- Return type:
returns an instance of self.
- transform(X, y=None)
Scale each non zero row of X to unit norm.
- Parameters:
X (array-like of shape (n_samples, n_features)) -- Training vector, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) -- Not used, present here for API consistency by convention.
- Returns:
transformed_X -- Transformed data.
- Return type:
array-like of shape (n_samples, n_features)
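One plausible reading of "scale each non-zero row to unit norm while handling NaN" is sketched below with NumPy. The function normalize_rows_ignoring_nan is hypothetical. It assumes NaN entries are skipped when computing the row norm and remain NaN in the output, and that all-zero rows are left unchanged; the library's exact treatment of NaN may differ.

```python
import numpy as np


def normalize_rows_ignoring_nan(X, norm="l2"):
    """Hypothetical helper: row-wise normalization that skips NaN entries."""
    X = np.asarray(X, dtype=float)
    magnitudes = np.nan_to_num(np.abs(X), nan=0.0)  # NaN contributes nothing
    if norm == "l1":
        norms = magnitudes.sum(axis=1, keepdims=True)
    elif norm == "l2":
        norms = np.sqrt((magnitudes ** 2).sum(axis=1, keepdims=True))
    elif norm == "max":
        norms = magnitudes.max(axis=1, keepdims=True)
    else:
        raise ValueError(f"Unsupported norm: {norm!r}")
    norms[norms == 0] = 1.0  # leave all-zero rows untouched
    return X / norms         # NaN entries stay NaN


X = np.array([
    [3.0, 4.0, np.nan],
    [0.0, 0.0, 0.0],
    [np.nan, 2.0, 0.0],
])
normalized = normalize_rows_ignoring_nan(X, norm="l2")
# Row 0 has norm 5 and becomes [0.6, 0.8, nan]; row 2 has norm 2 and becomes [nan, 1.0, 0.0].
```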
Select Complete Samples
- class imml.preprocessing.SelectCompleteSamples
Bases: FunctionTransformer
Remove incomplete samples from a multi-modal dataset. Applies FunctionTransformer (from scikit-learn) with select_complete_samples as the function.
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import SelectCompleteSamples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> transformer = SelectCompleteSamples()
>>> transformer.fit_transform(Xs)
- class imml.preprocessing.select_complete_samples(Xs: list)
Remove incomplete samples from a multi-modal dataset.
- Parameters:
Xs (list of array-likes objects) --
Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)
A list of different mods.
- Returns:
transformed_Xs -- The transformed data.
- Return type:
list of array-likes objects, shape (n_samples, n_features_i)
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import select_complete_samples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> select_complete_samples(Xs)
Select Incomplete Samples
- class imml.preprocessing.SelectIncompleteSamples
Bases: FunctionTransformer
Remove complete samples from a multi-modal dataset. Applies FunctionTransformer (from scikit-learn) with select_incomplete_samples as the function.
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import SelectIncompleteSamples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> transformer = SelectIncompleteSamples()
>>> transformer.fit_transform(Xs)
- class imml.preprocessing.select_incomplete_samples(Xs: list)
Remove complete samples from a multi-modal dataset.
- Parameters:
Xs (list of array-likes objects) --
Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)
A list of different mods.
- Returns:
transformed_Xs -- The transformed data.
- Return type:
list of array-likes objects, shape (n_samples, n_features_i)
Example
>>> import numpy as np
>>> import pandas as pd
>>> from imml.preprocessing import select_incomplete_samples
>>> from imml.ampute import Amputer
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> Xs = Amputer(p=0.2, mechanism="mcar", random_state=42).fit_transform(Xs)
>>> select_incomplete_samples(Xs)
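Selecting complete samples and selecting incomplete samples are complementary operations, which the pandas sketch below makes explicit with a single boolean mask. It assumes a sample counts as incomplete when any modality contains a NaN for it; that criterion is an assumption for illustration, and imml may define completeness differently (for example, only when a whole modality is missing).

```python
import numpy as np
import pandas as pd

X0 = pd.DataFrame({"f0": [1.0, np.nan, 3.0, 4.0]})
X1 = pd.DataFrame({"g0": [1.0, 2.0, np.nan, 4.0]})
Xs = [X0, X1]

# One boolean column per modality: True where the sample has no NaN.
observed = pd.concat([X.notna().all(axis=1) for X in Xs], axis=1)
complete_mask = observed.all(axis=1)

complete = [X.loc[complete_mask] for X in Xs]     # samples 0 and 3
incomplete = [X.loc[~complete_mask] for X in Xs]  # samples 1 and 2
```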