fasterrisk.wrapper
Classes
FasterRisk – Wrapper for FasterRisk algorithm
Functions
shift_coefficients – Shift coefficients to [0, inf) for better visualization; the predicted risk remains unchanged, so model performance is not affected.
get_max_features – Get the maximum value of each feature; helper for checking edge cases in the risk score card.
Module Contents
- fasterrisk.wrapper.shift_coefficients(feature_offsets: pandas.DataFrame, features_and_betas: List) → numpy.ndarray
Shift coefficients to [0, inf) for better visualization. The predicted risk remains unchanged, so model performance is not affected.
- Parameters:
feature_offsets (pd.DataFrame) – dataframe containing features and their corresponding offsets
features_and_betas (List) – list containing features and their corresponding coefficients
- Returns:
shifted coefficients
- Return type:
np.ndarray
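The claim that shifting leaves predictions unchanged can be checked with a small sketch. It assumes the score is an intercept plus a sum of per-feature contributions betas[j] * x_j, and uses a hypothetical helper, shift_contributions (not part of fasterrisk.wrapper), that subtracts each feature's smallest observed contribution and folds the total offset into the intercept. The library's own shift_coefficients works from feature_offsets and may differ in detail; the point is only that identical total scores give identical risks.

```python
import numpy as np


def shift_contributions(X, betas, beta0):
    """Hypothetical sketch, not the fasterrisk.wrapper implementation.

    Makes every per-feature contribution betas[j] * X[:, j] nonnegative by
    subtracting that feature's smallest observed contribution, then folds the
    subtracted amounts into the intercept so each total score is unchanged.
    """
    X = np.asarray(X, dtype=float)
    betas = np.asarray(betas, dtype=float)
    contributions = X * betas            # (n_samples, n_features), broadcast over rows
    offsets = contributions.min(axis=0)  # smallest contribution of each feature
    shifted = contributions - offsets    # all entries are now >= 0
    new_beta0 = beta0 + offsets.sum()    # compensate in the intercept
    return shifted, new_beta0


X = np.array([[0, 1], [1, 0], [1, 1]])
betas = np.array([-2.0, 3.0])
beta0 = 0.5

shifted, new_beta0 = shift_contributions(X, betas, beta0)
original_scores = beta0 + X @ betas
shifted_scores = new_beta0 + shifted.sum(axis=1)
print(np.allclose(original_scores, shifted_scores))  # True: scores unchanged
print((shifted >= 0).all())                          # True: no negative points
```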
- fasterrisk.wrapper.get_max_features(X_train: pandas.DataFrame) → Dict[str, float]
Get the maximum value of each feature; helper for checking edge cases in the risk score card.
- Parameters:
X_train (pd.DataFrame) – training data
- Returns:
dictionary keyed by feature name and valued by the maximum value of the feature
- Return type:
Dict[str, float]
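A minimal sketch of the documented return value, assuming the maximum is taken column by column over the training DataFrame. The helper max_per_feature is a hypothetical stand-in, not the library function, and the column names are made up.

```python
from typing import Dict

import pandas as pd


def max_per_feature(X_train: pd.DataFrame) -> Dict[str, float]:
    """Hypothetical equivalent of get_max_features: column name -> column maximum."""
    return {str(name): float(value) for name, value in X_train.max().items()}


X_train = pd.DataFrame({"age<=30": [0, 1, 1], "bmi": [21.5, 30.2, 27.0]})
print(max_per_feature(X_train))  # {'age<=30': 1.0, 'bmi': 30.2}
```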
- class fasterrisk.wrapper.FasterRisk(k: int = 10, select_top_m: int = 50, lb: float = -5, ub: float = 5, gap_tolerance: float = 0.05, parent_size: int = 10, child_size: int = None, maxAttempts: int = 50, num_ray_search: int = 20, lineSearch_early_stop_tolerance: float = 0.001, group_sparsity: int = None, featureIndex_to_groupIndex: numpy.ndarray = None)
Bases: sklearn.base.BaseEstimator
Wrapper for FasterRisk algorithm
- Parameters:
k (int, optional) – sparsity constraint, equivalent to the number of selected (binarized) features in the final sparse model(s), by default 10
select_top_m (int, optional) – number of top solutions to keep among the pool of diverse sparse solutions, by default 50
lb (float, optional) – lower bound of the coefficients, by default -5
ub (float, optional) – upper bound of the coefficients, by default 5
gap_tolerance (float, optional) – tolerance in logistic loss for creating diverse sparse solutions, by default 0.05
parent_size (int, optional) – how many solutions to retain after beam search, by default 10
child_size (int, optional) – how many new solutions to expand for each existing solution, by default None
maxAttempts (int, optional) – how many alternative features to try in order to replace the old feature during the diverse set pool generation, by default 50
num_ray_search (int, optional) – how many multipliers to try for each continuous sparse solution, by default 20
lineSearch_early_stop_tolerance (float, optional) – relative tolerance for stopping the line search early (error_of_loss_difference / loss_of_continuous_solution), by default 0.001
group_sparsity (int, optional) – number of groups to be selected, by default None
featureIndex_to_groupIndex (ndarray, optional) – (1D array with int type) featureIndex_to_groupIndex[i] is the group index of feature i, by default None
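featureIndex_to_groupIndex is a 1D integer array aligned with the columns of X. The sketch below builds one from binarized column names, assuming (hypothetically) that names have the form "<original feature><=<threshold>", so that all thresholds of one original feature share a group and group_sparsity then counts groups rather than individual binarized columns. The helper group_index_from_names and the naming convention are illustrative assumptions.

```python
import numpy as np


def group_index_from_names(feature_names):
    """Hypothetical helper: map each binarized column to the index of its source feature.

    Assumes names look like "age<=30"; everything before "<=" is treated as the
    original feature name. Names without "<=" form their own group.
    """
    group_ids = {}
    indices = []
    for name in feature_names:
        base = name.split("<=")[0]
        if base not in group_ids:
            group_ids[base] = len(group_ids)  # assign the next unused group index
        indices.append(group_ids[base])
    return np.array(indices, dtype=int)


names = ["age<=30", "age<=50", "bmi<=25", "smoker"]
print(group_index_from_names(names).tolist())  # [0, 0, 1, 2]
```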
- multipliers_
multipliers used in the final diverse sparse models
- Type:
List
- beta0_
intercepts used in the final diverse sparse models
- Type:
List
- betas_
coefficients used in the final diverse sparse models
- Type:
List
- k
- select_top_m
- lb
- ub
- gap_tolerance
- parent_size
- child_size
- maxAttempts
- num_ray_search
- lineSearch_early_stop_tolerance
- group_sparsity
- featureIndex_to_groupIndex
- fit(X: numpy.ndarray, y: numpy.ndarray) → None
Train FasterRisk
- Parameters:
X (np.ndarray) – training data
y (np.ndarray) – training data labels
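The core fasterrisk optimizer writes its logistic loss in terms of y * score, which presumes labels in {-1, +1}. Whether this wrapper converts 0/1 labels internally is not stated in this reference, so the sketch below, built around a hypothetical to_signed_labels helper, shows an explicit conversion one might apply before calling fit.

```python
import numpy as np


def to_signed_labels(y):
    """Hypothetical helper: map binary labels {0, 1} (or {False, True}) to {-1, +1}.

    Labels already in {-1, +1} are returned unchanged; anything else raises.
    """
    y = np.asarray(y)
    values = set(np.unique(y).tolist())
    if values <= {-1, 1}:
        return y.astype(int)
    if values <= {0, 1}:
        return np.where(y == 1, 1, -1)
    raise ValueError(f"expected binary labels, got {sorted(values)}")


print(to_signed_labels([0, 1, 1, 0]).tolist())  # [-1, 1, 1, -1]
```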
- predict_proba(X: numpy.ndarray, model_idx: int = 0) → numpy.ndarray[float]
Make probability predictions for binary classification
- Parameters:
X (np.ndarray) – input data
model_idx (int, optional) – used to specify which model to use (ranked by increasing order of logistic loss) among diverse sparse models, by default 0, which is the model with minimum logistic loss
- Returns:
probability predictions for binary classification
- Return type:
np.ndarray[float]
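For intuition, the sketch below scores a FasterRisk-style model: the integer points are summed with the intercept, divided by the model's multiplier, and passed through the logistic function. The function name risk_from_params and the parallel-list layout (mirroring multipliers_, beta0_ and betas_, indexed by model_idx) are assumptions for illustration, not the wrapper's code.

```python
import numpy as np


def risk_from_params(X, multipliers, beta0s, betas, model_idx=0):
    """Hypothetical sketch of probability prediction for one model in the pool.

    multipliers, beta0s and betas are parallel lists; model_idx picks one
    entry from each (index 0 = lowest logistic loss in the wrapper's ordering).
    """
    X = np.asarray(X, dtype=float)
    multiplier = multipliers[model_idx]
    beta0 = beta0s[model_idx]
    beta = np.asarray(betas[model_idx], dtype=float)
    score = (beta0 + X @ beta) / multiplier  # rescale the integer score
    return 1.0 / (1.0 + np.exp(-score))      # logistic function


X = np.array([[1, 0], [0, 1]])
probs = risk_from_params(X, multipliers=[2.0], beta0s=[0.0], betas=[[2.0, -1.0]])
print(probs.round(3))  # [0.731 0.378]
```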
- predict(X: numpy.ndarray, model_idx: int = 0) → numpy.ndarray[int]
Make binary predictions
- Parameters:
X (np.ndarray) – input data
model_idx (int, optional) – used to specify which model to use (ranked by increasing order of logistic loss) among diverse sparse models, by default 0, which is the model with minimum logistic loss
- Returns:
binary predictions
- Return type:
np.ndarray[int]
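Binary predictions follow from the same rescaled score by thresholding. This sketch assumes a cutoff of 0.5 on the predicted probability, which is equivalent to checking whether the rescaled score is positive, and returns labels in {0, 1}. The label encoding the wrapper actually returns is not stated in this reference, so treat both the helper name predict_labels and the encoding as assumptions.

```python
import numpy as np


def predict_labels(X, multiplier, beta0, beta):
    """Hypothetical sketch: 0/1 labels from a single FasterRisk-style model."""
    X = np.asarray(X, dtype=float)
    score = (beta0 + X @ np.asarray(beta, dtype=float)) / multiplier
    # sigmoid(score) > 0.5 exactly when score > 0, so no exp() is needed here
    return (score > 0).astype(int)


X = np.array([[1, 0], [0, 1], [0, 0]])
print(predict_labels(X, multiplier=2.0, beta0=-0.5, beta=[2.0, -1.0]).tolist())  # [1, 0, 0]
```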
- get_model_params() → Tuple[List[float], List[float], List[float]]
Get model parameters for FasterRisk
- Returns:
Three lists: multipliers, beta0s (intercepts), and betas (coefficients)
- Return type:
Tuple[List[float], List[float], List[float]]
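The three returned lists are parallel, so entry i of each describes the i-th model in the pool. A small sketch with made-up parameter values and a hypothetical selected_features helper, showing how one might unpack them and list the features a given model actually uses (those with nonzero points).

```python
import numpy as np


def selected_features(params, feature_names, model_idx=0):
    """Hypothetical helper: (name, points) pairs for the nonzero coefficients of one model."""
    _multipliers, _beta0s, betas = params  # same layout as documented for get_model_params()
    beta = np.asarray(betas[model_idx])
    return [(feature_names[j], float(beta[j])) for j in np.flatnonzero(beta)]


# Made-up values standing in for a fitted pool of two models.
params = ([1.5, 1.8], [-3.0, -2.0], [[2.0, 0.0, -1.0], [0.0, 1.0, 3.0]])
names = ["age<=30", "bmi<=25", "smoker"]
print(selected_features(params, names, model_idx=0))  # [('age<=30', 2.0), ('smoker', -1.0)]
```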
- print_risk_card(feature_names: List[str], X_train: numpy.ndarray, y_train: numpy.ndarray = None, model_idx: int = 0, quantile_len: int = 30) → None
Print the risk score card
- Parameters:
feature_names (list) – feature names for the features
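X_train (np.ndarray) – training data
y_train (np.ndarray, optional) – training labels; if provided, prints the logistic loss on the training set
model_idx (int, optional) – index for the classifier in the pool of solutions, by default 0, which is the classifier with minimum logistic loss
quantile_len (int, optional) – number of quantiles to use for score to risk conversion table, equivalent to number of cells in risk table, by default 30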
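Under the scoring rule used by FasterRisk-style models, a total point score s maps to a risk of 1 / (1 + exp(-(beta0 + s) / multiplier)). The sketch below builds such a score-to-risk table from training scores, assuming (not confirmed by this reference) that quantile_len controls how many evenly spaced quantiles of the training scores are sampled and that duplicate scores are merged. The function name risk_table is hypothetical.

```python
import numpy as np


def risk_table(X_train, multiplier, beta0, beta, quantile_len=30):
    """Hypothetical sketch of a score -> risk conversion table."""
    X_train = np.asarray(X_train, dtype=float)
    scores = X_train @ np.asarray(beta, dtype=float)  # total points per training row
    qs = np.quantile(scores, np.linspace(0.0, 1.0, quantile_len))
    # points are integers in a risk score, so round interpolated quantiles, then deduplicate
    table_scores = np.unique(np.round(qs))
    risks = 1.0 / (1.0 + np.exp(-(beta0 + table_scores) / multiplier))
    return table_scores, risks


X_train = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]])
scores, risks = risk_table(X_train, multiplier=1.5, beta0=-3.0, beta=[2.0, 1.0, 3.0])
for s, r in zip(scores, risks):
    print(f"score {s:>4.0f}  risk {r:.1%}")
```

Because the table scores are sorted and the multiplier is positive, the risks increase monotonically down the table.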
- visualize_risk_card(names: List[str], X_train: numpy.ndarray | pandas.DataFrame, title: str = 'RISK SCORE CARD', model_idx: int | None = 0, save_path: str | None = None, quantile_len: int | None = 30, custom_row_order: List[str] | None = None, center_box_width: int | None = 600, border_width: int | None = 1) → PIL.Image.Image | None
Visualize the score card as an image
- Parameters:
names (List[str]) – feature names
X_train (np.ndarray | pd.DataFrame) – training data
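title (str, optional) – title of the entire score card, by default 'RISK SCORE CARD'
model_idx (int, optional) – index for the classifier in the pool of solutions, by default 0, which is the classifier with minimum logistic loss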
save_path (str, optional) – directory to save score card to, by default None
quantile_len (int, optional) – number of quantiles to use for score to risk conversion table, equivalent to number of cells in risk table, by default 30
custom_row_order (List[str], optional) – custom order for features in risk score card, by default None
center_box_width (int, optional) – width of the center box, by default 600
border_width (int, optional) – width of the border, by default 1
- Returns:
the score card as an image if save_path is None; otherwise None, and the score card is saved to save_path
- Return type:
Optional[Image.Image]
- static define_bounds(X: pandas.DataFrame, feature_bound_pairs: Dict[str, Tuple[float | int, float | int]], lb_else: float | int, ub_else: float | int) → Tuple[List, List]
Obtain user defined bounds for each feature in X.
- Parameters:
X (pd.DataFrame) – data of interest
feature_bound_pairs (Dict) – dictionary of feature names and their corresponding bounds, such as {‘feature_name’: (lb, ub), …}
lb_else (Union[float, int]) – lower bound for all other features not specified by feature_bound_pairs; if all features are included in feature_bound_pairs, then this is the lower bound for the intercept
ub_else (Union[float, int]) – upper bound for all other features not specified by feature_bound_pairs; if all features are included in feature_bound_pairs, then this is the upper bound for the intercept
- Returns:
lb_bounds, ub_bounds
- Return type:
Tuple[List, List]
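A minimal sketch of how per-feature bounds could be assembled, in column order, from feature_bound_pairs with lb_else and ub_else as the fallback. It takes column names (for example list(X.columns)) so it runs without pandas, and the name bounds_for_columns is hypothetical. How the real method positions the intercept's bound in the returned lists is not specified here, so the sketch leaves the intercept out.

```python
from typing import Dict, Iterable, List, Tuple, Union

Number = Union[float, int]


def bounds_for_columns(
    columns: Iterable[str],
    feature_bound_pairs: Dict[str, Tuple[Number, Number]],
    lb_else: Number,
    ub_else: Number,
) -> Tuple[List[Number], List[Number]]:
    """Hypothetical sketch: per-column (lb, ub) with a shared fallback."""
    lbs: List[Number] = []
    ubs: List[Number] = []
    for col in columns:
        lb, ub = feature_bound_pairs.get(col, (lb_else, ub_else))
        if lb > ub:
            raise ValueError(f"lower bound exceeds upper bound for {col!r}")
        lbs.append(lb)
        ubs.append(ub)
    return lbs, ubs


lbs, ubs = bounds_for_columns(["age<=30", "smoker", "bmi<=25"], {"smoker": (0, 5)}, -5, 5)
print(lbs, ubs)  # [-5, 0, -5] [5, 5, 5]
```

Giving a feature such as smoker a lower bound of 0 forces its points to be nonnegative, while every unlisted feature keeps the symmetric [-5, 5] range.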