Feature selection

JNMF Feature Selector

class imml.feature_selection.JNMFFeatureSelector(select_by: str = 'component', f_per_component: int = 1, **kwargs)

Bases: JNMF

Feature selection for multi-modal datasets using the Joint Non-negative Matrix Factorization (JNMF) method. [1] [2] [3] [4] [5] [6] [7] [8]

This class extends the JNMF method to perform feature selection across multiple modalities (blocks) of data. The selected features are those with the highest contributions to the components derived by JNMF. Selection can be based on the largest contribution for each component, the maximum overall contribution, or the average contribution across all components.

Parameters:

select_by (str, default="component") --
Criterion used to select features. Must be one of ["component", "max", "average"]:
- "component": Selects the features with the largest contribution to each component.
- "max": Selects the features with the largest overall contribution.
- "average": Selects the features with the highest average contribution across all components.
f_per_component (int, default=1) --
Number of features to select per component.
- If select_by="component", this controls how many features are selected for each component.
- If select_by="max", the top n_components * f_per_component features across all components are selected.
- If select_by="average", the n_components * f_per_component features with the highest average contribution across all components are selected.
kwargs (dict) -- Arguments passed to the JNMF method.
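The three select_by criteria can be sketched on a single loading matrix. This is a minimal illustration and not the imml implementation: the helper select_features, the toy loadings array, and the assumption that contributions are stored as a non-negative (n_features, n_components) matrix are all hypothetical.

```python
import numpy as np


def select_features(loadings, select_by="component", f_per_component=1):
    """Hypothetical helper (not part of imml): return feature indices
    chosen from a non-negative (n_features, n_components) loading matrix."""
    n_features, n_components = loadings.shape
    n_select = n_components * f_per_component

    if select_by == "component":
        # Rank features within each component (column) and keep the top
        # f_per_component rows per column; duplicates are possible when one
        # feature dominates several components.
        top = np.argsort(-loadings, axis=0)[:f_per_component, :]
        return top.T.ravel()
    if select_by == "max":
        # Score each feature by its largest contribution to any component.
        scores = loadings.max(axis=1)
    elif select_by == "average":
        # Score each feature by its mean contribution over all components.
        scores = loadings.mean(axis=1)
    else:
        raise ValueError("select_by must be 'component', 'max' or 'average'")
    return np.argsort(-scores)[:n_select]


# Toy loadings: 4 features (rows), 2 components (columns).
loadings = np.array([
    [0.9, 0.0],
    [0.2, 0.8],
    [0.6, 0.7],
    [0.1, 0.0],
])

by_component = select_features(loadings, "component")
by_max = select_features(loadings, "max")
by_average = select_features(loadings, "average")
```

On this toy matrix, "component" and "max" both pick features 0 and 1, while "average" prefers feature 2 because it contributes moderately to both components.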

selected_features_

List of selected features.

Type: list of str of shape (n_components * f_per_component,)

weights_

The importance or contribution scores of the selected features in absolute values. These scores reflect how strongly each feature contributes to the components derived from JNMF.

Type: list of float of shape (n_components * f_per_component,)

References

See also

JNMF, JNMFImputer

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.feature_selection import JNMFFeatureSelector
>>> rng = np.random.default_rng(42)
>>> Xs = [pd.DataFrame(rng.uniform(size=(20, 10))) for _ in range(3)]
>>> transformer = JNMFFeatureSelector(n_components=5)
>>> transformed_Xs = transformer.fit_transform(Xs)

fit(Xs, y=None)

Fit the transformer to the input data.

Parameters:

Xs (list of array-like objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self

Return type:

returns an instance of self.
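The Xs layout described for fit can be sketched as follows. This is only an illustration of the expected shapes: check_modalities is a hypothetical helper, not an imml function, and the feature counts (10, 6, 4) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20

# Three modalities describing the same 20 samples (rows), each with its
# own number of features (columns).
Xs = [rng.uniform(size=(n_samples, n_features)) for n_features in (10, 6, 4)]


def check_modalities(Xs):
    """Hypothetical check (not part of imml): every modality must be 2-D
    and all modalities must share the same number of samples."""
    shapes = [np.asarray(X).shape for X in Xs]
    if any(len(shape) != 2 for shape in shapes):
        raise ValueError("each modality must have shape (n_samples, n_features_i)")
    if len({shape[0] for shape in shapes}) != 1:
        raise ValueError("all modalities must have the same n_samples")
    # n_mods, n_samples, and the per-modality feature counts
    return len(Xs), shapes[0][0], [shape[1] for shape in shapes]


n_mods, n_samples_found, n_features_per_mod = check_modalities(Xs)
```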

transform(Xs)

Project data into the learned space.

Parameters:

Xs (list of array-like objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

transformed_Xs -- The projected data.

Return type:

list of array-like objects, shape (n_samples, n_components)

fit_transform(Xs, y=None, **fit_params)

Fit to data, then transform it.

Parameters:

Xs (list of array-like objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.
fit_params (Ignored) -- Not used, present here for API consistency by convention.

Returns:

transformed_X -- The projected data.

Return type:

array-like objects of shape (n_samples, n_components)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → JNMFFeatureSelector

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns: self -- The updated object.
Return type: object

set_transform_request(*, Xs: bool | None | str = '$UNCHANGED$') → JNMFFeatureSelector

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in transform.
Returns: self -- The updated object.
Return type: object