fasterrisk.wrapper

Classes

FasterRisk

Wrapper for FasterRisk algorithm

Functions

`shift_coefficients`(→ numpy.ndarray)	Shift coefficients to [0, inf) for better visualization. The predicted risk remains unchanged, so model performance is not affected.
`get_max_features`(→ Dict[str, float])	Get the maximum value of each feature; a helper for checking edge cases in the risk score card.

Module Contents

fasterrisk.wrapper.shift_coefficients(feature_offsets: pandas.DataFrame, features_and_betas: List) → numpy.ndarray

Shift coefficients to [0, inf) for better visualization. The predicted risk remains unchanged, so model performance is not affected.

Parameters:

feature_offsets (pd.DataFrame) – dataframe containing features with their corresponding offsets
features_and_betas (List) – list containing features and their corresponding coefficients

Returns:

shifted coefficients

Return type:

np.ndarray
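The invariance claimed above can be illustrated with a small sketch. It is not the library's implementation: it assumes a hypothetical one-hot group of indicator features (exactly one indicator is active per row) and a plain logistic link. Under that assumption, adding a constant to every coefficient in the group and subtracting the same constant from the intercept leaves every predicted risk unchanged. The names `X`, `betas`, `beta0`, and `shift` are illustrative only.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical one-hot group: exactly one of the three indicators is 1 per row.
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
betas = np.array([-2.0, 1.0, 3.0])  # one coefficient is negative
beta0 = 0.5                         # intercept

# Shift the group so its smallest coefficient becomes 0,
# and absorb the shift into the intercept.
shift = -betas.min()
shifted_betas = betas + shift
shifted_beta0 = beta0 - shift

risk_before = sigmoid(beta0 + X @ betas)
risk_after = sigmoid(shifted_beta0 + X @ shifted_betas)

print(shifted_betas)                         # all coefficients are now >= 0
print(np.allclose(risk_before, risk_after))  # risks are identical
```

The compensation works here because each row of `X` sums to 1 within the group; other binarization schemes would need a different offset rule.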

fasterrisk.wrapper.get_max_features(X_train: pandas.DataFrame) → Dict[str, float]

Get the maximum value of each feature; a helper for checking edge cases in the risk score card.

Returns:

dictionary keyed by feature name and valued by the maximum value of the feature

Return type:

Dict[str, float]

class fasterrisk.wrapper.FasterRisk(k: int = 10, select_top_m: int = 50, lb: float = -5, ub: float = 5, gap_tolerance: float = 0.05, parent_size: int = 10, child_size: int = None, maxAttempts: int = 50, num_ray_search: int = 20, lineSearch_early_stop_tolerance: float = 0.001, group_sparsity: int = None, featureIndex_to_groupIndex: numpy.ndarray = None)

Bases: sklearn.base.BaseEstimator

Wrapper for FasterRisk algorithm

Parameters:

k (int) – sparsity constraint, equivalent to number of selected (binarized) features in the final sparse model(s)
select_top_m (int, optional) – number of top solutions to keep among the pool of diverse sparse solutions, by default 50
lb (float, optional) – lower bound of the coefficients, by default -5
ub (float, optional) – upper bound of the coefficients, by default 5
gap_tolerance (float, optional) – tolerance in logistic loss for creating diverse sparse solutions, by default 0.05
parent_size (int, optional) – how many solutions to retain after beam search, by default 10
child_size (int, optional) – how many new solutions to expand for each existing solution, by default None
maxAttempts (int, optional) – how many alternative features to try in order to replace the old feature during the diverse set pool generation, by default 50
num_ray_search (int, optional) – how many multipliers to try for each continuous sparse solution, by default 20
lineSearch_early_stop_tolerance (float, optional) – tolerance level to stop linesearch early (error_of_loss_difference/loss_of_continuous_solution), by default 0.001
group_sparsity (int, optional) – number of groups to be selected, by default None
featureIndex_to_groupIndex (ndarray, optional) – (1D array with int type) featureIndex_to_groupIndex[i] is the group index of feature i, by default None
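As a sketch of the `featureIndex_to_groupIndex` layout described above, the snippet below builds the 1D integer array from binarized column names. The column names and the convention that the original feature is the text before `<=` are assumptions made for illustration, and `build_group_index` is a hypothetical helper, not part of the library.

```python
import numpy as np

# Hypothetical binarized columns; the original feature is assumed to be
# the part of the name before "<=".
feature_names = [
    "age<=30", "age<=50",
    "income<=20000", "income<=50000",
    "is_student",
]


def build_group_index(names):
    """Map each column index i to the group index of its original feature."""
    group_of = {}   # original feature name -> group index
    indices = []
    for name in names:
        base = name.split("<=")[0]
        if base not in group_of:
            group_of[base] = len(group_of)  # next unused group index
        indices.append(group_of[base])
    return np.asarray(indices, dtype=int)


featureIndex_to_groupIndex = build_group_index(feature_names)
print(featureIndex_to_groupIndex)
```

With an array like this, `group_sparsity` would limit how many distinct original features are used, while `k` limits the number of binarized columns.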

multipliers_

multipliers used in the final diverse sparse models

Type: List

beta0_

intercepts used in the final diverse sparse models

Type: List

betas_

coefficients used in the final diverse sparse models

Type: List

k

select_top_m

lb

ub

gap_tolerance

parent_size

child_size

maxAttempts

num_ray_search

lineSearch_early_stop_tolerance

group_sparsity

featureIndex_to_groupIndex

fit(X: numpy.ndarray, y: numpy.ndarray) → None

Train FasterRisk

Parameters:

X (np.ndarray) – training data
y (np.ndarray) – training data labels

predict_proba(X: numpy.ndarray, model_idx: int = 0) → numpy.ndarray[float]

Make probability predictions for binary classification.

Parameters:

X (np.ndarray) – input data
model_idx (int, optional) – used to specify which model to use (ranked by increasing order of logistic loss) among diverse sparse models, by default 0, which is the model with minimum logistic loss

Returns:

probability predictions for binary classification

Return type:

np.ndarray[float]
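A minimal sketch of how one (multiplier, intercept, coefficients) triple could be turned into a probability is shown below. The scaling convention, dividing the integer score by the multiplier before applying the logistic function, is an assumption here and not taken from this page; `risk_from_params` is a hypothetical helper.

```python
import numpy as np


def risk_from_params(X, multiplier, beta0, betas):
    """Assumed convention: probability = sigmoid((beta0 + X @ betas) / multiplier)."""
    score = (beta0 + X @ betas) / multiplier
    return 1.0 / (1.0 + np.exp(-score))


# Three toy rows with two binary features each.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

probs = risk_from_params(X, multiplier=2.0, beta0=-1.0, betas=np.array([2.0, -1.0]))
print(probs)
```

In this sketch the third row has a scaled score of 0, so its predicted risk is exactly 0.5.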

predict(X: numpy.ndarray, model_idx: int = 0) → numpy.ndarray[int]

Make binary predictions.

Parameters:

X (np.ndarray) – input data
model_idx (int, optional) – used to specify which model to use (ranked by increasing order of logistic loss) among diverse sparse models, by default 0, which is the model with minimum logistic loss

Returns:

binary predictions

Return type:

np.ndarray[int]

get_model_params() → Tuple[List[float], List[float], List[float]]

Get model parameters for FasterRisk

Returns:

three lists of multipliers, beta0s (intercepts), and betas (coefficients)

Return type:

Tuple[List[float], List[float], List[float]]

print_risk_card(feature_names: List[str], X_train: numpy.ndarray, y_train: numpy.ndarray = None, model_idx: int = 0, quantile_len: int = 30) → None

Print the risk score card.

Parameters:

feature_names (list) – feature names for the features
X_train (np.ndarray) – training data
y_train (np.ndarray, optional) – training labels; if provided, prints the logistic loss on the training set, by default None
model_idx (int, optional) – index for the classifier in the pool of solutions, by default 0, which is the classifier with minimum logistic loss

Visualize the score card as an image.

Parameters:

names (List[str]) – feature names
X_train (np.ndarray | pd.DataFrame) – training data
title (str) – title of the entire score card
save_path (str, optional) – directory to save score card to, by default None
quantile_len (int, optional) – number of quantiles to use for score to risk conversion table, equivalent to number of cells in risk table, by default 30
custom_row_order (List[str], optional) – custom order for features in risk score card, by default None
center_box_width (int, optional) – width of the center box, by default 600
border_width (int, optional) – width of the border, by default 2

Returns:

if save_path is None, return the score card as an image

Return type:

Optional[Image.Image]

static define_bounds(X: pandas.DataFrame, feature_bound_pairs: Dict[str, Tuple[float | int, float | int]], lb_else: float | int, ub_else: float | int) → Tuple[List, List]

Obtain user defined bounds for each feature in X.

Parameters:

X (pd.DataFrame) – data of interest
feature_bound_pairs (Dict) – dictionary of feature names and their corresponding bounds, such as {'feature_name': (lb, ub), ...}
lb_else (Union[float, int]) – lower bound for all features not specified in feature_bound_pairs; if all features are included in feature_bound_pairs, this is the lower bound for the intercept
ub_else (Union[float, int]) – upper bound for all features not specified in feature_bound_pairs; if all features are included in feature_bound_pairs, this is the upper bound for the intercept

Returns:

lb_bounds, ub_bounds

Return type:

Tuple[List, List]
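The lookup-with-fallback behavior described for `define_bounds` can be sketched as follows. This is a hypothetical stand-in, not the library's code: `bounds_by_name` takes a plain list of column names instead of a DataFrame, assumes exact name matching, and does not model the intercept case mentioned above.

```python
from typing import Dict, List, Sequence, Tuple, Union

Number = Union[float, int]


def bounds_by_name(
    columns: Sequence[str],
    feature_bound_pairs: Dict[str, Tuple[Number, Number]],
    lb_else: Number,
    ub_else: Number,
) -> Tuple[List[Number], List[Number]]:
    """Return per-column lower and upper bounds, in column order."""
    lb_bounds: List[Number] = []
    ub_bounds: List[Number] = []
    for name in columns:
        # Use the user-supplied pair if present, otherwise the fallback bounds.
        lb, ub = feature_bound_pairs.get(name, (lb_else, ub_else))
        lb_bounds.append(lb)
        ub_bounds.append(ub)
    return lb_bounds, ub_bounds


lb_bounds, ub_bounds = bounds_by_name(
    columns=["age", "income", "bmi"],
    feature_bound_pairs={"age": (0, 5)},
    lb_else=-3,
    ub_else=3,
)
print(lb_bounds, ub_bounds)
```

Here only `age` gets a custom range; `income` and `bmi` fall back to `lb_else` and `ub_else`.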