metadata

library_name: sklearn
tags:
  - sklearn
  - skops
  - tabular-classification
model_format: pickle
model_file: skops-89qohtne.pkl
widget:
  - structuredData:
      AGE:
        - 32
        - 45
        - 25
      GENDER:
        - m
        - f
        - f
      HOWPAID:
        - 'weekly '
        - 'weekly '
        - 'weekly '
      INCOME:
        - 21772
        - 27553
        - 23477
      LOANS:
        - 1
        - 2
        - 1
      MARITAL:
        - 'married  '
        - divsepwid
        - 'single   '
      MORTGAGE:
        - 'y'
        - 'y'
        - 'n'
      NUMCARDS:
        - 2
        - 6
        - 1
      NUMKIDS:
        - 1
        - 4
        - 1
      STORECAR:
        - 3
        - 5
        - 2

Model description

This is a logistic regression model trained with the scikit-learn (sklearn) library on a bank's customer credit card risk data. The model predicts whether a customer should be issued a credit card. The full dataset is available at https://huggingface.co/datasets/saifhmb/CreditCardRisk

Training Procedure

The data preprocessing steps applied include the following:

Dropping the high-cardinality feature ID
Transforming and encoding the categorical features GENDER, MARITAL, HOWPAID and MORTGAGE, and the target variable RISK
Splitting the dataset into training and test sets using an 85/15 split ratio
Applying feature scaling (StandardScaler) to the numerical features AGE, INCOME, NUMKIDS, NUMCARDS, STORECAR and LOANS
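
The steps above can be sketched roughly as follows. This is a minimal reconstruction based on the pipeline shown in the Hyperparameters section, not the author's original training script. The helper names build_pipeline and split_features_target are hypothetical, random_state=0 is an assumption (the card does not state a seed), and df is assumed to be a pandas DataFrame holding the dataset's columns, including ID and RISK.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column groups as listed in the Hyperparameters section of this card.
CATEGORICAL = ["GENDER", "MARITAL", "HOWPAID", "MORTGAGE"]
NUMERIC = ["AGE", "INCOME", "NUMKIDS", "NUMCARDS", "STORECAR", "LOANS"]


def build_pipeline():
    """Hypothetical helper: one-hot encode, scale, then fit logistic regression."""
    preprocessor = ColumnTransformer(
        transformers=[
            ("cat", Pipeline(steps=[("onehot", OneHotEncoder(handle_unknown="ignore"))]), CATEGORICAL),
            ("num", Pipeline(steps=[("scale", StandardScaler())]), NUMERIC),
        ],
        remainder="passthrough",
    )
    return Pipeline(steps=[("preprocessor", preprocessor), ("classifier", LogisticRegression())])


def split_features_target(df):
    """Hypothetical helper: drop ID, separate RISK, and make an 85/15 split."""
    X = df.drop(columns=["ID", "RISK"])
    y = df["RISK"]
    # random_state is an assumption; the card does not state which seed was used.
    return train_test_split(X, y, test_size=0.15, random_state=0)
```

Given such a DataFrame, `X_train, X_test, y_train, y_test = split_features_target(df)` followed by `build_pipeline().fit(X_train, y_train)` would train a comparable model. Keeping the encoder and scaler inside the pipeline means they are fitted on the training split only.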

Hyperparameters

Hyperparameter	Value
memory
steps	[('preprocessor', ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))]), ['GENDER', 'MARITAL', 'HOWPAID', 'MORTGAGE']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['AGE', 'INCOME', 'NUMKIDS', 'NUMCARDS', 'STORECAR', 'LOANS'], dtype='object'))])), ('classifier', LogisticRegression())]
verbose	False
preprocessor	ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))]), ['GENDER', 'MARITAL', 'HOWPAID', 'MORTGAGE']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['AGE', 'INCOME', 'NUMKIDS', 'NUMCARDS', 'STORECAR', 'LOANS'], dtype='object'))])
classifier	LogisticRegression()
preprocessor__n_jobs
preprocessor__remainder	passthrough
preprocessor__sparse_threshold	0.3
preprocessor__transformer_weights
preprocessor__transformers	[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))]), ['GENDER', 'MARITAL', 'HOWPAID', 'MORTGAGE']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['AGE', 'INCOME', 'NUMKIDS', 'NUMCARDS', 'STORECAR', 'LOANS'], dtype='object'))]
preprocessor__verbose	False
preprocessor__verbose_feature_names_out	True
preprocessor__cat	Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))])
preprocessor__num	Pipeline(steps=[('scale', StandardScaler())])
preprocessor__cat__memory
preprocessor__cat__steps	[('onehot', OneHotEncoder(handle_unknown='ignore'))]
preprocessor__cat__verbose	False
preprocessor__cat__onehot	OneHotEncoder(handle_unknown='ignore')
preprocessor__cat__onehot__categories	auto
preprocessor__cat__onehot__drop
preprocessor__cat__onehot__dtype	<class 'numpy.float64'>
preprocessor__cat__onehot__handle_unknown	ignore
preprocessor__cat__onehot__max_categories
preprocessor__cat__onehot__min_frequency
preprocessor__cat__onehot__sparse	deprecated
preprocessor__cat__onehot__sparse_output	True
preprocessor__num__memory
preprocessor__num__steps	[('scale', StandardScaler())]
preprocessor__num__verbose	False
preprocessor__num__scale	StandardScaler()
preprocessor__num__scale__copy	True
preprocessor__num__scale__with_mean	True
preprocessor__num__scale__with_std	True
classifier__C	1.0
classifier__class_weight
classifier__dual	False
classifier__fit_intercept	True
classifier__intercept_scaling	1
classifier__l1_ratio
classifier__max_iter	100
classifier__multi_class	auto
classifier__n_jobs
classifier__penalty	l2
classifier__random_state
classifier__solver	lbfgs
classifier__tol	0.0001
classifier__verbose	0
classifier__warm_start	False
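
The double-underscore names in the table follow scikit-learn's nested parameter convention, where step__parameter addresses a parameter of a named pipeline step. As a small sketch, the snippet below uses a reduced stand-in pipeline (not the full one from this card) to show how such names can be read with get_params and changed with set_params. The step names scale and classifier and the value 0.5 are chosen purely for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Reduced stand-in for the card's pipeline, used only to illustrate parameter names.
pipeline = Pipeline(steps=[("scale", StandardScaler()), ("classifier", LogisticRegression())])

# get_params(deep=True) flattens nested parameters into step__parameter keys.
params = pipeline.get_params(deep=True)
for name in ("classifier__C", "classifier__max_iter", "classifier__solver"):
    print(name, params[name])

# The same keys can be used to change a nested parameter in place.
pipeline.set_params(classifier__C=0.5)
```

The same keys are what a tool such as GridSearchCV expects in its parameter grid when tuning a pipeline.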

Model Plot

Pipeline(steps=[('preprocessor',ColumnTransformer(remainder='passthrough',transformers=[('cat',Pipeline(steps=[('onehot',OneHotEncoder(handle_unknown='ignore'))]),['GENDER', 'MARITAL','HOWPAID', 'MORTGAGE']),('num',Pipeline(steps=[('scale',StandardScaler())]),Index(['AGE', 'INCOME', 'NUMKIDS', 'NUMCARDS', 'STORECAR', 'LOANS'], dtype='object'))])),('classifier', LogisticRegression())])

Evaluation Results

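The target variable, RISK, is multiclass. In sklearn, the precision and recall functions take an average parameter, which must be set explicitly for a multiclass or multilabel target. Here average='micro' was used, which computes precision and recall globally by counting the total true positives, false negatives and false positives across all classes. For a single-label multiclass target, micro-averaged precision and recall both equal accuracy, which is why the three values in the table are identical.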

Metric	Value
accuracy	0.699187
precision	0.699187
recall	0.699187
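
To illustrate why micro averaging gives the same number for all three metrics, here is a small pure-Python sketch. The labels are made up for illustration and are not taken from the dataset, and the function names are hypothetical. Each misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the global totals match.

```python
from collections import Counter


def micro_precision_recall(y_true, y_pred):
    """Micro-averaged precision and recall for single-label multiclass data."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1  # correct: true positive for that class
        else:
            fp[pred] += 1   # wrong: false positive for the predicted class
            fn[truth] += 1  # ... and false negative for the true class
    total_tp = sum(tp.values())
    precision = total_tp / (total_tp + sum(fp.values()))
    recall = total_tp / (total_tp + sum(fn.values()))
    return precision, recall


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only; not taken from the dataset.
y_true = ["a", "b", "c", "a", "b", "c", "a", "b"]
y_pred = ["a", "b", "a", "a", "c", "c", "b", "b"]

precision, recall = micro_precision_recall(y_true, y_pred)
print(precision, recall, accuracy(y_true, y_pred))  # prints 0.625 0.625 0.625
```

Per-class behaviour is hidden by this averaging; a macro or weighted average, or the confusion matrix, would show how the model performs on each RISK class separately.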

Feature Importance

SHAP was used to determine which features contribute most to the model's decisions.

Confusion Matrix

Model Card Authors

This model card was written by the following author: Seifullah Bello

Model Card Contact

You can contact the model card author through the following channel: [email protected]