fasterrisk.wrapper

Classes

FasterRisk

Wrapper for FasterRisk algorithm

Functions

shift_coefficients(→ numpy.ndarray)

Shift coefficients to [0, inf) for better visualization. Predicted risks remain unchanged, so model performance is not affected.

get_max_features(→ Dict[str, float])

Get the maximum value of each feature; a helper for checking edge cases in the risk score card.

Module Contents

fasterrisk.wrapper.shift_coefficients(feature_offsets: pandas.DataFrame, features_and_betas: List) → numpy.ndarray

Shift coefficients to [0, inf) for better visualization. Predicted risks remain unchanged, so model performance is not affected.

Parameters:
  • feature_offsets (pd.DataFrame) – DataFrame containing features and their corresponding offsets

  • features_and_betas (List) – list containing features and their corresponding coefficients

Returns:

shifted coefficients

Return type:

np.ndarray
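Why shifting leaves predicted risk unchanged can be illustrated with a short sketch. It assumes a linear score beta0 + Σ betas[j] * x[j] (an assumption, not read from shift_coefficients itself): subtracting a constant from one feature's contribution and adding the same constant to the intercept keeps every total score, and therefore every predicted risk, identical. The helper shift_points_nonnegative is hypothetical and does not reproduce the feature_offsets format used by shift_coefficients.

```python
import numpy as np


def shift_points_nonnegative(X: np.ndarray, beta0: float, betas: np.ndarray):
    """Hypothetical illustration: make every per-feature contribution >= 0.

    Feature j contributes betas[j] * X[:, j] to the score. Subtracting the
    smallest observed contribution c_j makes that column non-negative, and
    adding sum(c_j) to the intercept keeps every total score identical.
    """
    contributions = X * betas            # shape (n_samples, n_features)
    c = contributions.min(axis=0)        # smallest contribution per feature
    shifted_points = contributions - c   # every entry is now >= 0
    shifted_beta0 = beta0 + c.sum()      # the intercept absorbs the shift
    return shifted_points, shifted_beta0


X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
betas = np.array([2.0, -3.0])
beta0 = -1.0

points, new_beta0 = shift_points_nonnegative(X, beta0, betas)
original_scores = beta0 + X @ betas
shifted_scores = new_beta0 + points.sum(axis=1)
print(np.allclose(original_scores, shifted_scores))  # True
print(points.min() >= 0)                              # True
```

Because the risk is a function of the total score only, equal scores imply equal predicted risks.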

fasterrisk.wrapper.get_max_features(X_train: pandas.DataFrame) → Dict[str, float]

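Get the maximum value of each feature; a helper for checking edge cases in the risk score card.

Parameters:
  • X_train (pd.DataFrame) – training data whose per-feature maxima are computed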

Returns:

dictionary keyed by feature name and valued by the maximum value of the feature

Return type:

Dict[str, float]
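The returned mapping can be sketched on a plain NumPy array. max_per_feature is a hypothetical helper that takes an array plus a list of column names instead of a DataFrame; for a pandas DataFrame, X_train.max().to_dict() produces the same kind of dictionary.

```python
import numpy as np
from typing import Dict, List


def max_per_feature(X: np.ndarray, feature_names: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for get_max_features: {feature name: column maximum}."""
    if X.shape[1] != len(feature_names):
        raise ValueError("need exactly one name per column")
    return {name: float(col_max) for name, col_max in zip(feature_names, X.max(axis=0))}


X_train = np.array([[25, 0], [61, 1], [47, 1]])
maxima = max_per_feature(X_train, ["age", "smoker"])
print(maxima)  # {'age': 61.0, 'smoker': 1.0}
```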

class fasterrisk.wrapper.FasterRisk(k: int = 10, select_top_m: int = 50, lb: float = -5, ub: float = 5, gap_tolerance: float = 0.05, parent_size: int = 10, child_size: int = None, maxAttempts: int = 50, num_ray_search: int = 20, lineSearch_early_stop_tolerance: float = 0.001, group_sparsity: int = None, featureIndex_to_groupIndex: numpy.ndarray = None)

Bases: sklearn.base.BaseEstimator

Wrapper for FasterRisk algorithm

Parameters:
  • k (int, optional) – sparsity constraint, equivalent to the number of selected (binarized) features in the final sparse model(s), by default 10

  • select_top_m (int, optional) – number of top solutions to keep among the pool of diverse sparse solutions, by default 50

  • lb (float, optional) – lower bound of the coefficients, by default -5

  • ub (float, optional) – upper bound of the coefficients, by default 5

  • gap_tolerance (float, optional) – tolerance in logistic loss for creating diverse sparse solutions, by default 0.05

  • parent_size (int, optional) – how many solutions to retain after beam search, by default 10

  • child_size (int, optional) – how many new solutions to expand for each existing solution, by default None

  • maxAttempts (int, optional) – how many alternative features to try in order to replace the old feature during the diverse set pool generation, by default 50

  • num_ray_search (int, optional) – how many multipliers to try for each continuous sparse solution, by default 20

  • lineSearch_early_stop_tolerance (float, optional) – tolerance level for stopping the line search early (error_of_loss_difference/loss_of_continuous_solution), by default 0.001

  • group_sparsity (int, optional) – number of groups to be selected, by default None

  • featureIndex_to_groupIndex (ndarray, optional) – (1D array with int type) featureIndex_to_groupIndex[i] is the group index of feature i, by default None
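How k, lb, ub, group_sparsity, and featureIndex_to_groupIndex constrain a coefficient vector can be sketched with a small checker. satisfies_constraints is hypothetical, and treating k and group_sparsity as upper bounds (rather than exact counts) is an assumption of this sketch.

```python
import numpy as np
from typing import Optional, Sequence


def satisfies_constraints(
    betas: Sequence[float],
    k: int,
    lb: float,
    ub: float,
    group_sparsity: Optional[int] = None,
    feature_to_group: Optional[Sequence[int]] = None,
) -> bool:
    """Hypothetical check of the sparsity, box, and group constraints."""
    betas = np.asarray(betas, dtype=float)
    support = np.flatnonzero(betas)  # indices of selected features
    if support.size > k:  # at most k selected features (assumed upper bound)
        return False
    if np.any(betas < lb) or np.any(betas > ub):  # every coefficient in [lb, ub]
        return False
    if group_sparsity is not None and feature_to_group is not None:
        # feature_to_group[i] is the group index of feature i
        groups_used = np.unique(np.asarray(feature_to_group)[support])
        if groups_used.size > group_sparsity:  # at most group_sparsity groups
            return False
    return True


betas = [2, 0, -1, 0, 3]
feature_to_group = [0, 0, 1, 1, 2]
print(satisfies_constraints(betas, k=3, lb=-5, ub=5))  # True
print(satisfies_constraints(betas, k=2, lb=-5, ub=5))  # False: 3 features selected
print(satisfies_constraints(betas, k=3, lb=-5, ub=5,
                            group_sparsity=2, feature_to_group=feature_to_group))  # False: 3 groups used
```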

multipliers_

multipliers used in the final diverse sparse models

Type:

List

beta0_

intercepts used in the final diverse sparse models

Type:

List

betas_

coefficients used in the final diverse sparse models

Type:

List

k
select_top_m
lb
ub
gap_tolerance
parent_size
child_size
maxAttempts
lineSearch_early_stop_tolerance
group_sparsity
featureIndex_to_groupIndex
fit(X: numpy.ndarray, y: numpy.ndarray) → None

Train FasterRisk

Parameters:
  • X (np.ndarray) – training data

  • y (np.ndarray) – training data labels

predict_proba(X: numpy.ndarray, model_idx: int = 0) → numpy.ndarray[float]

Make probability predictions for binary classification.

Parameters:
  • X (np.ndarray) – input data

  • model_idx (int, optional) – index of the model to use among the diverse sparse models (ranked in increasing order of logistic loss), by default 0, which is the model with the minimum logistic loss

Returns:

probability predictions for binary classification

Return type:

np.ndarray[float]
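A minimal sketch of how one model's parameters could turn into probabilities, assuming the usual FasterRisk risk-score form P(y = 1 | x) = sigmoid((beta0 + x · betas) / multiplier). That formula, the helper risk_score_proba, and the toy parameter values are assumptions for illustration, not the wrapper's code.

```python
import numpy as np


def risk_score_proba(X, multiplier, beta0, betas):
    """Hypothetical: P(y = 1 | x) = sigmoid((beta0 + x @ betas) / multiplier)."""
    scores = (beta0 + np.asarray(X) @ np.asarray(betas)) / multiplier
    return 1.0 / (1.0 + np.exp(-scores))


# Toy parameters laid out like multipliers_, beta0_, and betas_ (one entry per model).
multipliers = [1.5, 2.0]
beta0s = [-3.0, -2.0]
betas = [np.array([2, 1, 0]), np.array([1, 0, 2])]

X = np.array([[1, 1, 0], [0, 0, 1]])
model_idx = 0  # the model with the lowest logistic loss
proba = risk_score_proba(X, multipliers[model_idx], beta0s[model_idx], betas[model_idx])
print(proba)
```

Selecting model_idx simply picks the matching entry from each of the three per-model lists.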

predict(X: numpy.ndarray, model_idx: int = 0) → numpy.ndarray[int]

Make binary predictions.

Parameters:
  • X (np.ndarray) – input data

  • model_idx (int, optional) – index of the model to use among the diverse sparse models (ranked in increasing order of logistic loss), by default 0, which is the model with the minimum logistic loss

Returns:

binary predictions

Return type:

np.ndarray[int]
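Binary labels can be sketched by thresholding the predicted probability. The 0.5 threshold and the 0/1 label encoding are assumptions of this sketch; if the wrapper uses -1/+1 labels instead, they follow from 2 * labels - 1. With a positive multiplier, a probability above 0.5 is equivalent to a positive raw score, which the second hypothetical helper shows.

```python
import numpy as np


def labels_from_proba(proba, threshold=0.5):
    """Hypothetical: label 1 where the predicted risk exceeds threshold, else 0."""
    return (np.asarray(proba) > threshold).astype(int)


def labels_from_scores(scores):
    """Same labels from raw scores: sigmoid(s / m) > 0.5 exactly when s > 0 (m > 0)."""
    return (np.asarray(scores) > 0).astype(int)


scores = np.array([-2.0, 0.0, 1.0])
multiplier = 1.5
proba = 1.0 / (1.0 + np.exp(-scores / multiplier))
print(labels_from_proba(proba))    # [0 0 1]
print(labels_from_scores(scores))  # [0 0 1]
```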

get_model_params() → Tuple[List[float], List[float], List[float]]

Get model parameters for FasterRisk

Returns:

Three lists of multipliers, beta0s (intercepts), and betas (coefficients), with one entry per diverse sparse model

Return type:

Tuple[List[float], List[float], List[float]]
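Because entry i of each returned list belongs to the same model, the lists can be zipped to evaluate every model. The sketch below ranks toy models by mean logistic loss; the -1/+1 label convention, the loss on the scaled score (beta0 + X @ betas) / multiplier, and the helper logistic_loss are assumptions for illustration.

```python
import numpy as np


def logistic_loss(X, y, multiplier, beta0, betas):
    """Mean logistic loss log(1 + exp(-y * score)) for labels y in {-1, +1}."""
    margins = y * (beta0 + X @ betas) / multiplier
    # logaddexp(0, -m) computes log(1 + exp(-m)) without overflow
    return float(np.mean(np.logaddexp(0.0, -margins)))


# Toy stand-ins for the three lists returned by get_model_params().
multipliers = [1.0, 1.0]
beta0s = [-1.0, 0.0]
betas = [np.array([2.0, -1.0]), np.array([0.0, 0.0])]

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1, -1, 1, -1])

# zip keeps the multiplier, intercept, and coefficients of each model aligned.
losses = [logistic_loss(X, y, m, b0, b) for m, b0, b in zip(multipliers, beta0s, betas)]
best_idx = int(np.argmin(losses))
print(losses, best_idx)
```

On a fitted model's training data, the documented ordering implies best_idx would be 0.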

print_risk_card(feature_names: List[str], X_train: numpy.ndarray, y_train: numpy.ndarray = None, model_idx: int = 0, quantile_len: int = 30) → None

Print the risk score card.

Parameters:
  • feature_names (list) – names of the features

  • X_train (np.ndarray) – training data

  • y_train (np.ndarray, optional) – training labels; if provided, the logistic loss on the training set is also printed

  • model_idx (int, optional) – index of the classifier in the pool of solutions, by default 0, which is the classifier with the minimum logistic loss

  • quantile_len (int, optional) – number of quantiles to use for the score-to-risk conversion table, by default 30
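A text risk card can be sketched as one row of points per selected feature followed by a score-to-risk table. This assumes binary (0/1) features with integer points and the risk form sigmoid((beta0 + score) / multiplier); format_risk_card and the example feature names are hypothetical, and the layout is not the one print_risk_card produces.

```python
import numpy as np


def format_risk_card(feature_names, multiplier, beta0, betas):
    """Hypothetical text risk card for binary (0/1) features with integer points."""
    betas = np.asarray(betas, dtype=int)
    # One row per selected feature: the points added when the feature equals 1.
    lines = [f"{name:<12} {int(b):+d} points" for name, b in zip(feature_names, betas) if b != 0]
    # With 0/1 features, every reachable total score lies between these two sums.
    low = int(betas[betas < 0].sum())
    high = int(betas[betas > 0].sum())
    lines.append("SCORE   RISK")
    for s in range(low, high + 1):
        risk = 1.0 / (1.0 + np.exp(-(beta0 + s) / multiplier))
        lines.append(f"{s:>5}  {risk:6.1%}")
    return "\n".join(lines)


card = format_risk_card(["age>60", "smoker", "exercise"], multiplier=1.0, beta0=-2.0, betas=[2, 1, -1])
print(card)
```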

visualize_risk_card(names: List[str], X_train: numpy.ndarray | pandas.DataFrame, title: str = 'RISK SCORE CARD', model_idx: int | None = 0, save_path: str | None = None, quantile_len: int | None = 30, custom_row_order: List[str] | None = None, center_box_width: int | None = 600, border_width: int | None = 1) → PIL.Image.Image | None

Visualize the risk score card as an image.

Parameters:
  • names (List[str]) – feature names

  • X_train (np.ndarray | pd.DataFrame) – training data

  • title (str, optional) – title of the entire score card, by default 'RISK SCORE CARD'

  • model_idx (int, optional) – index of the model in the pool of diverse sparse solutions, by default 0, which is the model with the minimum logistic loss

  • save_path (str, optional) – directory to save score card to, by default None

  • quantile_len (int, optional) – number of quantiles to use for the score-to-risk conversion table, equivalent to the number of cells in the risk table, by default 30

  • custom_row_order (List[str], optional) – custom order for features in risk score card, by default None

  • center_box_width (int, optional) – width of the center box, by default 600

  • border_width (int, optional) – width of the border, by default 1

Returns:

The score card as an image if save_path is None; otherwise the card is saved to save_path and None is returned

Return type:

Optional[Image.Image]
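One plausible reading of quantile_len is sketched below: take evenly spaced quantiles of the training scores, round them to integer scores, and map each to a risk. The helper quantile_risk_table, the rounding, the removal of duplicate cells (so the table has at most quantile_len cells), and the risk form sigmoid((beta0 + score) / multiplier) are all assumptions, not the method's implementation.

```python
import numpy as np


def quantile_risk_table(X_train, multiplier, beta0, betas, quantile_len=30):
    """Hypothetical score-to-risk table built from quantiles of the training scores."""
    scores = np.asarray(X_train) @ np.asarray(betas)               # total points per sample
    qs = np.quantile(scores, np.linspace(0.0, 1.0, quantile_len))  # evenly spaced quantiles
    cells = np.unique(np.round(qs).astype(int))                    # integer, sorted, deduplicated
    risks = 1.0 / (1.0 + np.exp(-(beta0 + cells) / multiplier))
    return list(zip(cells.tolist(), risks.tolist()))


X_train = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
table = quantile_risk_table(X_train, multiplier=1.0, beta0=-2.0, betas=[1, 2], quantile_len=30)
small_table = quantile_risk_table(X_train, multiplier=1.0, beta0=-2.0, betas=[1, 2], quantile_len=2)
for score, risk in table:
    print(f"{score:>3}  {risk:.1%}")
```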

static define_bounds(X: pandas.DataFrame, feature_bound_pairs: Dict[str, Tuple[float | int, float | int]], lb_else: float | int, ub_else: float | int) → Tuple[List, List]

Obtain user defined bounds for each feature in X.

Parameters:
  • X (pd.DataFrame) – data of interest

  • feature_bound_pairs (Dict) – dictionary of feature names and their corresponding bounds, such as {'feature_name': (lb, ub), …}

  • lb_else (Union[float, int]) – lower bound for all features not specified in feature_bound_pairs; if all features are included in feature_bound_pairs, this is the lower bound for the intercept

  • ub_else (Union[float, int]) – upper bound for all features not specified in feature_bound_pairs; if all features are included in feature_bound_pairs, this is the upper bound for the intercept

Returns:

lb_bounds, ub_bounds

Return type:

Tuple[List, List]
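The per-feature lookup can be sketched as follows, using a list of column names in place of the DataFrame (with pandas, list(X.columns) would supply it). bounds_for_columns is hypothetical; rejecting bounds for unknown feature names is a design choice of this sketch, and how define_bounds handles the intercept bound is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple, Union

Number = Union[float, int]


def bounds_for_columns(
    columns: Sequence[str],
    feature_bound_pairs: Dict[str, Tuple[Number, Number]],
    lb_else: Number,
    ub_else: Number,
) -> Tuple[List[Number], List[Number]]:
    """Hypothetical: per-column (lb, ub) lists, falling back to lb_else/ub_else."""
    unknown = set(feature_bound_pairs) - set(columns)
    if unknown:  # design choice: reject bounds for features that are not columns
        raise KeyError(f"bounds given for unknown features: {sorted(unknown)}")
    lb_bounds: List[Number] = []
    ub_bounds: List[Number] = []
    for col in columns:  # keep the column order of the data
        lb, ub = feature_bound_pairs.get(col, (lb_else, ub_else))
        if lb > ub:
            raise ValueError(f"lower bound exceeds upper bound for {col!r}")
        lb_bounds.append(lb)
        ub_bounds.append(ub)
    return lb_bounds, ub_bounds


lb_bounds, ub_bounds = bounds_for_columns(
    ["age", "bmi", "smoker"], {"age": (0, 3), "smoker": (-2, 0)}, lb_else=-5, ub_else=5
)
print(lb_bounds, ub_bounds)  # [0, -5, -2] [3, 5, 0]
```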