Partition utils

Utility Functions for Ex-Fuzzy Library

This module provides utility functions that support fuzzy system operations but are not fuzzy-specific themselves. The main focus is on quantile computation for fuzzy partitions, data preprocessing, and helper functions for fuzzy variable creation.

Main Components:

Quantile computation functions for different partition schemes
Fuzzy variable creation utilities
Data preprocessing and partitioning helpers
Membership computation and precomputation utilities
Support for temporal fuzzy systems

These utilities are essential for creating well-distributed fuzzy partitions and preparing data for fuzzy system operations.

ex_fuzzy.utils.quartile_compute(x)

Compute quartiles for each feature in the dataset.

This function calculates the 0%, 25%, 50%, and 100% quantiles (min, Q1, median, max) for each feature in the input array. These quartiles are commonly used for creating three-partition fuzzy variables.

Parameters:

x (np.array) – Input data array with shape (samples, features)

Returns:

Array of quartiles with shape (4, n_features), where each column contains [min, Q1, median, max] for the corresponding feature.

Return type:

list[float]

Example

>>> import numpy as np
>>> from ex_fuzzy.utils import quartile_compute
>>> data = np.array([[1, 10], [2, 20], [3, 30], [4, 40]])
>>> quartiles = quartile_compute(data)
>>> print(quartiles)
[[1. 10.]   # Min values
 [1.75 17.5] # Q1 values
 [2.5 25.]   # Median values
 [4. 40.]]   # Max values
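The same numbers can be reproduced with plain NumPy. The sketch below is only an illustration of the quantile levels described above, not the library's implementation; the helper name `quartile_sketch` is hypothetical.

```python
import numpy as np


def quartile_sketch(x: np.ndarray) -> np.ndarray:
    """Per-feature min, Q1, median and max, stacked as rows (hypothetical helper)."""
    # axis=0 evaluates every quantile down the sample axis, one value per feature.
    return np.quantile(x, [0.0, 0.25, 0.5, 1.0], axis=0)


data = np.array([[1, 10], [2, 20], [3, 30], [4, 40]], dtype=float)
quartiles = quartile_sketch(data)
print(quartiles.shape)  # (4, 2)
```

Each row holds one quantile level and each column one feature, which matches the (4, n_features) layout documented for this function.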

ex_fuzzy.utils.fixed_quantile_compute(x)

Compute a fixed set of quantiles for each feature in the dataset.

This function calculates a predefined set of quantiles optimized for creating well-distributed fuzzy partitions. The quantiles are: [0, 0.20, 0.30, 0.45, 0.55, 0.7, 0.8, 1].

Parameters:

x (np.array) – Input data array with shape (samples, features)

Returns:

Array of quantiles with shape (8, n_features), where each column contains the 8 quantile values for the corresponding feature.

Return type:

list[float]

Example

>>> import numpy as np
>>> from ex_fuzzy.utils import fixed_quantile_compute
>>> data = np.random.normal(0, 1, (100, 2))
>>> quantiles = fixed_quantile_compute(data)
>>> print(quantiles.shape)
(8, 2)

Note

This quantile scheme is particularly useful for creating fuzzy variables with overlapping membership functions that provide good coverage of the data space.

ex_fuzzy.utils.partition3_quantile_compute(x)

Compute quantiles for three-partition fuzzy variables.

This function calculates quantiles specifically designed for creating three-partition fuzzy variables (typically low, medium, high). The quantiles are: [0.00, 0.20, 0.50, 0.80, 1.00].

Parameters:

x (np.array) – Input data array with shape (samples, features)

Returns:

Array of quantiles with shape (5, n_features), where each column contains the 5 quantile values for the corresponding feature.

Return type:

list[float]

Example

>>> import numpy as np
>>> from ex_fuzzy.utils import partition3_quantile_compute
>>> data = np.random.uniform(0, 100, (1000, 3))
>>> quantiles = partition3_quantile_compute(data)
>>> print(quantiles.shape)
(5, 3)

Note

These quantiles are specifically chosen to create three overlapping trapezoidal membership functions that provide balanced coverage of the data range with appropriate overlap between adjacent fuzzy sets.
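To make the link between the five quantiles and three overlapping trapezoids concrete, here is a minimal sketch in plain NumPy. The assignment of quantiles to trapezoid corners is an assumption chosen for illustration (the library may place the corners differently), and `three_trapezoids_sketch` is a hypothetical name.

```python
import numpy as np


def three_trapezoids_sketch(x: np.ndarray) -> np.ndarray:
    """Return illustrative trapezoid corners with shape (n_features, 3, 4)."""
    # Quantile levels taken from the description above; q has shape (5, n_features).
    q = np.quantile(x, [0.0, 0.2, 0.5, 0.8, 1.0], axis=0)
    # Assumed layout: each trapezoid is (a, b, c, d) with a <= b <= c <= d.
    low = np.stack([q[0], q[0], q[1], q[2]], axis=-1)
    medium = np.stack([q[1], q[2], q[2], q[3]], axis=-1)
    high = np.stack([q[2], q[3], q[4], q[4]], axis=-1)
    return np.stack([low, medium, high], axis=1)


x = np.arange(101, dtype=float).reshape(-1, 1)  # one feature, values 0..100
params = three_trapezoids_sketch(x)
print(params.shape)  # (1, 3, 4)
```

In this layout, adjacent sets share their sloped regions, so a value between the 20% and 50% quantiles belongs partly to "low" and partly to "medium".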

ex_fuzzy.utils.t1_simple_partition(x)

Partitions the fuzzy variable into four trapezoidal memberships.

Parameters:

x (array) – numpy array, vector of shape (samples, ).

Returns:

numpy array of shape (variables, 4, 4).

Return type:

array

ex_fuzzy.utils.t1_simple_gaussian_partition(x)

Partitions the fuzzy variable into four Gaussian memberships.

Parameters:

x (array) – numpy array, vector of shape (samples, ).

Returns:

numpy array of shape (variables, 4, 2), where the last dimension contains [mean, standard_deviation] for each Gaussian membership function.

Return type:

array

ex_fuzzy.utils.compute_quantiles(x, n_partitions)

Computes the quantiles needed for n-partition fuzzy membership.

Parameters:

x – numpy array, vector of shape (samples, ).
n_partitions – int, number of partitions.

Returns:

numpy array, quantiles for partitioning.

ex_fuzzy.utils.t1_n_partition_parameters(x, n_partitions)

Partitions the fuzzy variable into n trapezoidal memberships.

Parameters:

x – numpy array, matrix of shape (samples, variables).
n_partitions – int, number of partitions.

Returns:

numpy array, tensor of shape (variables, n_partitions, 4) containing the trapezoidal parameters.
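As a reading aid for the four trapezoidal parameters, the sketch below evaluates a trapezoid (a, b, c, d) at given points with plain NumPy. It assumes the usual convention that membership rises from a to b, stays at 1 between b and c, and falls from c to d; `trapezoid_membership` is a hypothetical helper, not a function of this module.

```python
import numpy as np


def trapezoid_membership(x, params):
    """Membership degree of x in the trapezoid (a, b, c, d), assuming a <= b <= c <= d."""
    a, b, c, d = params
    x = np.asarray(x, dtype=float)
    # A vertical edge (a == b or c == d) contributes full membership on that side.
    rise = (x - a) / (b - a) if b > a else np.ones_like(x)
    fall = (d - x) / (d - c) if d > c else np.ones_like(x)
    mu = np.clip(np.minimum(rise, fall), 0.0, 1.0)
    # Outside the support [a, d] the membership is zero.
    return np.where((x >= a) & (x <= d), mu, 0.0)


mu_low = trapezoid_membership([10.0, 35.0, 60.0], (0.0, 0.0, 20.0, 50.0))
mu_mid = trapezoid_membership([35.0, 50.0], (20.0, 50.0, 50.0, 80.0))
```

With these inputs, the point 35 has membership 0.5 in both sets, which is the kind of overlap between neighbouring partitions that the quantile-based parameters are meant to produce.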

ex_fuzzy.utils.t1_n_gaussian_partition_parameters(x, n_partitions)

Partitions the fuzzy variable into n Gaussian memberships.

Parameters:

x (array) – numpy array, matrix of shape (samples, variables).
n_partitions (int) – int, number of partitions.

Returns:

numpy array, tensor of shape (variables, n_partitions, 2) where the last dimension contains [mean, standard_deviation] for each Gaussian membership function.

Return type:

array
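The [mean, standard_deviation] pair can be read through the standard Gaussian membership formula. The sketch below assumes an unnormalized Gaussian that peaks at 1 on the mean, which is the common convention for fuzzy sets; whether the library evaluates it exactly this way is an assumption, and `gaussian_membership` is a hypothetical name.

```python
import numpy as np


def gaussian_membership(x, params):
    """Membership degree of x for a Gaussian set given as (mean, standard_deviation)."""
    mean, std = params
    x = np.asarray(x, dtype=float)
    # Peaks at 1.0 when x == mean and decays symmetrically on both sides.
    return np.exp(-0.5 * ((x - mean) / std) ** 2)


mu = gaussian_membership([0.0, 1.0, 2.0], (1.0, 0.5))
```

Here the points 0 and 2 lie two standard deviations from the mean, so both receive the same membership, exp(-2).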

ex_fuzzy.utils.t1_three_partition(x)

Partitions the fuzzy variable into three trapezoidal memberships.

Parameters:

x (array) – numpy array, vector of shape (samples, ).

Returns:

numpy array of shape (variables, 3, 4).

Return type:

array

ex_fuzzy.utils.t2_n_partition_parameters(x, n_partitions)

Partitions the fuzzy variable into n trapezoidal type-2 memberships.

Parameters:

x – numpy array, matrix of shape (samples, variables).
n_partitions – int, number of partitions.

Returns:

numpy array, tensor of shape (variables, n_partitions, 4, 2) containing the trapezoidal parameters.

ex_fuzzy.utils.t1_simple_triangular_partition_parameters(x)

Partitions the fuzzy variable into three triangular memberships.

Parameters:

x (array) – numpy array, vector of shape (samples, ).

Returns:

numpy array of shape (variables, 3, 3).

Return type:

array

ex_fuzzy.utils.t1_simple_triangular_partition(x, n_partitions=3)

Partitions the dataset features into different fuzzy variables. Parameters are fixed in advance. Use it for simple testing or as an initial solution.

Parameters:

x (array) – numpy array|pandas dataframe, shape samples x features.
n_partitions (int) – number of partitions to use in the fuzzy variables. Default is 3.

Returns:

list of fuzzy variables.

Return type:

list[array]

ex_fuzzy.utils.t1_fuzzy_partitions_dataset(x0, n_partition=3, shape='trapezoid')

Partitions the dataset features into different fuzzy variables. Parameters are fixed in advance. Use it for simple testing or as an initial solution.

Parameters:

x0 – numpy array|pandas dataframe, shape samples x features.
n_partition – number of partitions to use in the fuzzy variables.
shape – shape of the membership functions. Default is 'trapezoid'.

Returns:

list of fuzzy variables.

Return type:

list[fuzzyVariable]

ex_fuzzy.utils.t2_fuzzy_partitions_dataset(x0, n_partition=3, shape='trapezoid')

Partitions the dataset features into different fuzzy variables using interval-valued (IV) fuzzy sets. Parameters are fixed in advance. Use it for simple testing or as an initial solution.

Parameters:

x0 – numpy array|pandas dataframe, shape samples x features.
n_partition – number of partitions to use in the fuzzy variables.
shape – shape of the membership functions. Default is 'trapezoid'.

Returns:

list of fuzzy variables.

Return type:

list[fuzzyVariable]

ex_fuzzy.utils.gt2_fuzzy_partitions_dataset(x0, resolution_exp=2, n_partition=3, shape='trapezoid')

Partitions the dataset features into different fuzzy variables using general type-2 (GT2) fuzzy sets. Parameters are fixed in advance. Use it for simple testing or as an initial solution.

Parameters:

x0 – numpy array|pandas dataframe, shape samples x features.
resolution_exp (int) – exponent of the partition resolution, i.e. the number of significant decimals. Default is 2, which means a resolution of 0.01.
n_partition – number of partitions to use in the fuzzy variables.
shape – shape of the membership functions. Default is 'trapezoid'.

Returns:

list of fuzzy variables.

Return type:

list[fuzzyVariable]
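The relation between resolution_exp and the resulting resolution is a power of ten. The snippet below is a small illustration of that arithmetic only; rounding values to the resolution is an assumed use, shown for intuition rather than taken from the library.

```python
import numpy as np

resolution_exp = 2
# A resolution exponent of 2 corresponds to steps of 10 ** -2 = 0.01.
resolution = 10.0 ** -resolution_exp

# Assumed usage: snapping values to that number of decimals.
values = np.array([0.12345, 0.5, 0.98765])
discretized = np.round(values, resolution_exp)
```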

ex_fuzzy.utils.construct_partitions(X, fz_type_studied=FUZZY_SETS.t1, categorical_mask=None, n_partitions=3, shape='trapezoid')

Create a list of fuzzy variables from data with automatic partitioning.

This function automatically creates fuzzy variables from the input data by computing appropriate quantiles and creating membership functions. It supports both numerical and categorical variables.

Parameters:

X (np.array or pd.DataFrame) – Input data with shape (samples, features). Can be either a numpy array or pandas DataFrame.
fz_type_studied (fs.FUZZY_SETS) – Type of fuzzy sets to create (t1, t2, or gt2).
categorical_mask (list, optional) – Boolean mask indicating which variables are categorical. If None, all variables are treated as numerical.
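n_partitions (int, optional) – Number of partitions (fuzzy sets) to create for each numerical variable. Default is 3.
shape (str, optional) – Shape of the membership functions for numerical variables. Default is 'trapezoid'.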

Returns:

List of fuzzyVariable objects, one for each feature in the input data. Each variable contains the appropriate number of fuzzy sets with membership functions fitted to the data distribution.

Return type:

list

Example

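>>> import numpy as np
>>> import ex_fuzzy.fuzzy_sets as fs
>>> from ex_fuzzy.utils import construct_partitions
>>> X = np.random.normal(0, 1, (100, 3))
>>> lvs = construct_partitions(X, fs.FUZZY_SETS.t1, n_partitions=3)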
>>> print(len(lvs))  # Should print 3
>>> print(len(lvs[0]))  # Should print 3 (number of fuzzy sets)

Note

For numerical variables, trapezoidal membership functions are created by default.
For categorical variables, one fuzzy set is created per unique category.
The function automatically handles feature naming from DataFrame columns.

ex_fuzzy.utils.construct_crisp_categorical_partition(x, name, fz_type_studied)

Creates a fuzzy variable for a categorical feature.

Parameters:

x (array) – array with values of the categorical variable.
name – name of the fuzzy variable.
fz_type_studied – fuzzy set type studied.

Returns:

a fuzzy variable that works as a crisp categorical variable (each fuzzy set is exactly 1 on its class value and 0 on the rest).
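The crisp behaviour described here (membership 1 on the matching class, 0 elsewhere) amounts to a one-hot encoding of the categories. The following NumPy sketch illustrates that idea only; it does not build fuzzyVariable objects, and `crisp_category_memberships` is a hypothetical name.

```python
import numpy as np


def crisp_category_memberships(values):
    """Return the sorted categories and a (samples, categories) 0/1 membership matrix."""
    values = np.asarray(values)
    categories = np.unique(values)  # one crisp set per unique category
    # Compare every sample against every category via broadcasting.
    memberships = (values[:, None] == categories[None, :]).astype(float)
    return categories, memberships


categories, memberships = crisp_category_memberships(["red", "blue", "red", "green"])
print(memberships.shape)  # (4, 3)
```

Every row contains exactly one 1, so each observation fully belongs to a single category set.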

ex_fuzzy.utils.construct_conditional_frequencies(X, discrete_time_labels, initial_ffss)

Computes the conditional temporal function for a set of fuzzy sets according to their variation in time.

Parameters:

X (array) – numpy array, shape samples x features.
discrete_time_labels (list[int]) – discrete time labels.
initial_ffss – initial fuzzy set list.

Returns:

conditional frequencies. Array shape (time steps, initial fuzzy sets)

ex_fuzzy.utils.classify_temp(dates, cutpoints, time)

Classifies a set of dates according to the temporal cutpoints, using time as the time resolution. Returns a boolean array whose True values mark the dates contained between the two cutpoints.

Parameters:

dates (DataFrame) – data observations to cut.
cutpoints (tuple[str, str]) – points to check.
time (str) – time field to use as the criteria.

Returns:

boolean array. True values are those contained between the cutpoints.

Return type:

array

ex_fuzzy.utils.assign_time(a, observations)

Assigns a temporal moment to a set of observations.

Parameters:

a (array) – array of boolean values.
observations (list[array]) – list of boolean arrays with the corresponding timestamps.

Returns:

the index of the corresponding time moment for the a-th observation.

Raises:

ValueError – if a is not timestamped in any of the observation arrays.

Return type:

int

ex_fuzzy.utils.create_tempVariables(X_train, time_moments, precomputed_partitions)

Creates a list of temporal fuzzy variables.

Parameters:

X_train (array) – numpy array, shape samples x features.
time_moments (array) – time moments. Array shape (samples,). Each value is an integer denoting the n-th time moment of that observation.
precomputed_partitions (list[fuzzyVariable]) – precomputed partitions for each feature.

Returns:

list of temporal fuzzy variables.

Return type:

list[temporalFS]

ex_fuzzy.utils.create_multi_tempVariables(X_train, time_moments, fuzzy_type)

Creates a list of lists of temporal fuzzy variables. Each inner list corresponds to a fuzzy partition at a different moment in time. (So, instead of having one linguistic variable for all time moments, there is a different one for each time moment that represents the same idea.)

Parameters:

X_train (array) – numpy array, shape samples x features.
time_moments (array) – time moments. Array shape (samples,). Each value is an integer denoting the n-th time moment of that observation.
fuzzy_type – type of fuzzy sets used to build the partitions.

Returns:

list of lists of temporal fuzzy variables.

Return type:

list[list[temporalFS]]

ex_fuzzy.utils.temporal_cuts(X, cutpoints, time_resolution='hour')

Returns a list of boolean indexes for each temporal moment. Performs the cuts between time steps using the cutpoints list.

Parameters:

X (DataFrame) – data observations to cut into temporal moments.
cutpoints (list[tuple[str, str]]) – list of tuples with the cutpoints for each temporal moment.
time_resolution (str) – time field to use as the criteria.

Returns:

list of boolean arrays. True values are those contained between the cutpoints in each moment.

Return type:

list[array]
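The idea of one boolean mask per temporal moment can be sketched with plain NumPy. This is a simplified illustration: it works on integer hours instead of a DataFrame with string cutpoints, it treats both cutpoints as inclusive (an assumption), and both helper names are hypothetical.

```python
import numpy as np


def temporal_masks_sketch(hours, cutpoints):
    """One boolean mask per (start, end) pair; both ends assumed inclusive."""
    hours = np.asarray(hours)
    return [(hours >= start) & (hours <= end) for start, end in cutpoints]


def moment_index_sketch(masks):
    """Index of the temporal moment each observation falls into."""
    stacked = np.stack(masks)  # shape (n_moments, n_samples)
    if not stacked.any(axis=0).all():
        raise ValueError("some observations fall outside every temporal moment")
    # argmax on booleans returns the first moment whose mask is True.
    return stacked.argmax(axis=0)


hours = [1, 9, 14, 22]
cutpoints = [(0, 7), (8, 15), (16, 23)]
masks = temporal_masks_sketch(hours, cutpoints)
moments = moment_index_sketch(masks)
print(moments)  # [0 1 1 2]
```

The second helper mirrors the role of assigning a time moment to each observation, including the error raised when an observation matches no moment.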

ex_fuzzy.utils.temporal_assemble(X, y, temporal_moments)

Assembles the data in the temporal moments in order to have partitions with balanced time moments in each one.

Parameters:

X (array) – data observations.
y (array) – labels.
temporal_moments (list[array]) – list of boolean arrays. True values are those contained between the cutpoints in each moment.

Returns:

tuple of lists of data and labels for each temporal moment. The first tuple is X_train, X_test, y_train, y_test; the second tuple is train temporal moments, test temporal moments.

ex_fuzzy.utils.extend_fuzzy_sets_enum(new_fuzzy_sets_enum)

Extends the fuzzy sets enum with additional types.

Parameters:

new_fuzzy_sets_enum – fuzzy sets enum.

Returns:

extended fuzzy sets enum.

Return type:

list[FUZZY_SETS]

ex_fuzzy.utils.mcc_loss(ruleBase, X, y, tolerance, alpha=0.99, beta=0.0125, gamma=0.0125, precomputed_truth=None)

Fitness function for the optimization problem. Uses only the MCC, ignores the size penalization terms.

Parameters:

ruleBase (RuleBase) – RuleBase object
X (array) – array of train samples. X shape = (n_samples, n_features)
y (array) – array of train labels. y shape = (n_samples,)
tolerance (float) – float. Tolerance for the size evaluation.
alpha (float) – ignored.
beta (float) – ignored.
gamma (float) – ignored.

Returns:

float. Fitness value.

Return type:

float
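For reference, the Matthews correlation coefficient (MCC) for a binary problem can be computed directly from the confusion-matrix counts. The sketch below covers the binary case only and is not the library's fitness function, which works from a rule base and may handle multiple classes; `binary_mcc` is a hypothetical helper.

```python
import numpy as np


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1 or bool)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = float(np.sum(y_true & y_pred))
    tn = float(np.sum(~y_true & ~y_pred))
    fp = float(np.sum(~y_true & y_pred))
    fn = float(np.sum(y_true & ~y_pred))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is taken as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


perfect = binary_mcc([1, 1, 0, 0], [1, 1, 0, 0])
inverted = binary_mcc([1, 1, 0, 0], [0, 0, 1, 1])
partial = binary_mcc([1, 1, 0, 0], [1, 0, 0, 0])
```

The score ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which makes it a convenient single-number fitness signal for imbalanced data.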

ex_fuzzy.utils.validate_partitions(X, fuzzy_partitions, categorical_mask=None, verbose=False)

Validates the partitions of the fuzzy variables. Checks that the partitions are valid and that they cover the whole range of the data.

Parameters:

X – numpy array, shape samples x features.
fuzzy_partitions (list[fuzzyVariable]) – list of fuzzy variables.
categorical_mask (array) – boolean mask vector that indicates, for each variable, whether it is categorical.

Returns:

True if the partitions are valid, False otherwise.

Return type:

list[bool]
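One plausible reading of "cover the whole range of the data" is that every observed value lies inside the support of at least one fuzzy set of its variable. The sketch below checks exactly that for trapezoid parameters stored as an array of shape (n_features, n_sets, 4); it is an assumption about the kind of check involved, not the library's validation logic, and `supports_cover_data` is a hypothetical name.

```python
import numpy as np


def supports_cover_data(X, trapezoids):
    """True if every value of every feature lies within some trapezoid support [a, d]."""
    X = np.asarray(X, dtype=float)                     # (n_samples, n_features)
    trapezoids = np.asarray(trapezoids, dtype=float)   # (n_features, n_sets, 4)
    lower = trapezoids[:, :, 0]                        # left end of each support
    upper = trapezoids[:, :, 3]                        # right end of each support
    # Broadcast to (n_samples, n_features, n_sets) and test containment.
    inside = (X[:, :, None] >= lower[None]) & (X[:, :, None] <= upper[None])
    return bool(inside.any(axis=2).all())


trapezoids = [[[0, 0, 3, 6], [4, 7, 10, 10]]]  # one feature, two sets
covered = supports_cover_data([[0.0], [5.0], [10.0]], trapezoids)
not_covered = supports_cover_data([[0.0], [12.0]], trapezoids)
```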
