dataset_no = 5    # Re-run this for datasets 1-5 to see how it performs
cutoff = 40       # 40 is the (hard-coded) cut-off as set out previously
cv_runs = 1       # Testing only: 4, proper runs of the models: 24
glm_iter = 500    # 500 epochs for gradient descent fitting of GLMs is very quick
nn_iter = 500     # 100 for experimentation, 500 appears to give models that have fully converged or are close to convergence.
nn_cv_iter = 100  # Lower for CV to save running time
mdn_iter = 500    # Latest architecture and loss converges fairly quickly
Introduction
In an upcoming talk I’ll be looking at the use of neural networks in individual claims modelling. I start with a Chain Ladder model and progressively expand it into a deep learning individual claims model while still producing sensible results. There is also a benchmark against a GBM model. I use a number of different synthetic data sets for this purpose, the first satisfying Chain Ladder assumptions and the remaining ones breaking these assumptions in different ways.
My talk is supported by five freely available Jupyter notebooks, one for each of the data sets, so please refer to these if you want to run the code. They may be downloaded as this zip file
The remainder of this post covers the contents and results of the notebook for data set 5.
The notebook is long but structured into sections as follows:
- Data: Read in SPLICE data (including case estimates) and create additional fields, test and train datasets,
- Chain ladder: Directly calculate Chain Ladder factors as a baseline,
- Chain ladder (GLM): Recreate Chain Ladder results with a GLM formulation,
- Individual GLM: Convert the GLM into a very basic individual claims model,
- Individual GLM with splines: Replace the overparametrized one-hot encoding with splines,
- Residual network: Consider a ResNet,
- Spline-based individual network: Create a customised neural network architecture with similar behaviour to the individual GLM with splines,
- Spline-based Individual Log-normal Mixture Density Network: Introduce an MDN-based network, with modifications to accommodate a log-normal distribution,
- Rolling origin cross-validation and parameter search,
- Additional data augmentation for training individual claim features,
- Detailed individual models using claim features, including a hyperparameter tuned GBM as a benchmark,
- Review and summary.
!pip show pandas numpy scikit-learn torch matplotlib
Name: pandas
Version: 1.5.2
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: The Pandas Development Team
Author-email: pandas-dev@python.org
License: BSD-3-Clause
Location: /Users/jacky/GitRepos/NNs-on-Individual-Claims/.conda/lib/python3.10/site-packages
Requires: numpy, python-dateutil, pytz
Required-by: catboost, category-encoders, chainladder, cmdstanpy, darts, datasets, nfoursid, pmdarima, prophet, pytorch-tabular, seaborn, shap, statsforecast, statsmodels, xarray
---
Name: numpy
Version: 1.23.5
Summary: NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email:
License: BSD
Location: /Users/jacky/GitRepos/NNs-on-Individual-Claims/.conda/lib/python3.10/site-packages
Requires:
Required-by: catboost, category-encoders, cmdstanpy, contourpy, darts, datasets, duckdb, lightgbm, matplotlib, nfoursid, numba, pandas, patsy, pmdarima, prophet, pyarrow, pyod, pytorch-lightning, pytorch-tabnet, pytorch-tabular, ray, scikit-learn, scikit-optimize, scipy, seaborn, shap, skorch, sparse, statsforecast, statsmodels, tbats, tensorboard, tensorboardX, torchmetrics, tune-sklearn, xarray, xgboost
---
Name: scikit-learn
Version: 1.2.0
Summary: A set of python modules for machine learning and data mining
Home-page: http://scikit-learn.org
Author:
Author-email:
License: new BSD
Location: /Users/jacky/GitRepos/NNs-on-Individual-Claims/.conda/lib/python3.10/site-packages
Requires: joblib, numpy, scipy, threadpoolctl
Required-by: category-encoders, chainladder, darts, lightgbm, pmdarima, pyod, pytorch-tabnet, pytorch-tabular, scikit-optimize, shap, skorch, tbats, tune-sklearn
---
Name: torch
Version: 1.13.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /Users/jacky/GitRepos/NNs-on-Individual-Claims/.conda/lib/python3.10/site-packages
Requires: typing-extensions
Required-by: darts, pytorch-lightning, pytorch-tabnet, pytorch-tabular, torchmetrics
---
Name: matplotlib
Version: 3.6.2
Summary: Python plotting package
Home-page: https://matplotlib.org
Author: John D. Hunter, Michael Droettboom
Author-email: matplotlib-users@python.org
License: PSF
Location: /Users/jacky/GitRepos/NNs-on-Individual-Claims/.conda/lib/python3.10/site-packages
Requires: contourpy, cycler, fonttools, kiwisolver, numpy, packaging, pillow, pyparsing, python-dateutil
Required-by: catboost, chainladder, darts, nfoursid, prophet, pyod, pytorch-tabular, seaborn, statsforecast
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
# from torch.utils.data.sampler import BatchSampler, RandomSampler
# from torch.utils.data import DataLoader
import torch
import torch.nn as nn
from torch.nn import functional as F
from torch.autograd import Variable
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.preprocessing import OneHotEncoder, StandardScaler, MinMaxScaler, SplineTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error
from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, PredefinedSplit
# from skopt import BayesSearchCV
import math
pd.options.display.float_format = '{:,.2f}'.format
Scikit Module
This has been used in past experiments, so it is basically my boilerplate code, with minor tweaks.
class TabularNetRegressor(BaseEstimator, RegressorMixin):
    def __init__(
        self,
        module,
        criterion=nn.PoissonNLLLoss,
        max_iter=100,
        max_lr=0.01,
        keep_best_model=False,
        batch_function=None,
        rebatch_every_iter=1,
        n_hidden=20,
        l1_penalty=0.0,  # lambda is a reserved word
        l1_applies_params=["linear.weight", "hidden.weight"],
        weight_decay=0.0,
        batch_norm=False,
        interactions=True,
        dropout=0.0,
        clip_value=None,
        n_gaussians=3,
        verbose=1,
        device="cpu" if torch.backends.mps.is_available() else ("cuda" if torch.cuda.is_available() else "cpu"),  # Use GPU if available, leave mps off until more stable
        init_extra=None,
        **kwargs
    ):
        """ Tabular Neural Network Regressor (for Claims Reserving)

        This trains a neural network with specified loss, Log Link and l1 LASSO penalties
        using Pytorch. It has early stopping.

        Args:
            module: pytorch nn.Module. Should have n_input and n_output as parameters and
                if l1_penalty, init_weight, or init_bias are used, a final layer
                called "linear".
            criterion: pytorch loss function. Consider nn.PoissonNLLLoss for log link.
            max_iter (int): Maximum number of epochs before training stops.
                Previously this used a high value for triangles since the record count is so small.
                For larger regression problems, a lower number of iterations may be sufficient.
            max_lr (float): Min / Max learning rate - we will use one_cycle_lr
            keep_best_model (bool): If true, keep and use the model weights with the best loss rather
                than the final weights.
            batch_function (None or fn): If not None, used to get a batch from X and y
            rebatch_every_iter (int): redo batches every this many epochs
            l1_penalty (float): l1 penalty factor. If not zero, is applied to
                the layers in the Module with names matching l1_applies_params.
                (we use l1_penalty because lambda is a reserved word in Python
                for anonymous functions)
            weight_decay (float): weight decay - analogous to l2 penalty factor.
                Applied to all weights
            clip_value (None or float): clip gradient norms at a particular value
            n_hidden (int), batch_norm (bool), dropout (float), interactions (bool), n_gaussians (int):
                Passed to module. Hidden layer size, batch normalisation, dropout percentages, and interactions flag.
            init_extra (coerces to torch.Tensor): set init_bias, passed to module. If None, defaults to np.log(y.mean()).values.astype(np.float32)
            verbose (int): 0 means don't print. 1 means do print.
        """
        self.module = module
        self.criterion = criterion
        self.keep_best_model = keep_best_model
        self.l1_penalty = l1_penalty
        self.l1_applies_params = l1_applies_params
        self.weight_decay = weight_decay
        self.max_iter = max_iter
        self.n_hidden = n_hidden
        self.batch_norm = batch_norm
        self.batch_function = batch_function
        self.rebatch_every_iter = rebatch_every_iter
        self.interactions = interactions
        self.dropout = dropout
        self.n_gaussians = n_gaussians
        self.device = device
        self.target_device = torch.device(device)
        self.max_lr = max_lr
        self.init_extra = init_extra
        self.print_loss_every_iter = max(1, int(max_iter / 10))
        self.verbose = verbose
        self.clip_value = clip_value
        self.kwargs = kwargs

    def fix_array(self, y):
        "Need to be picky about array formats"
        if isinstance(y, pd.DataFrame) or isinstance(y, pd.Series):
            y = y.values
        if y.ndim == 1:
            y = y.reshape(-1, 1)
        y = y.astype(np.float32)
        return y

    def setup_module(self, n_input, n_output):
        # Training new model
        self.module_ = self.module(
            n_input=n_input,
            n_output=n_output,
            n_hidden=self.n_hidden,
            batch_norm=self.batch_norm,
            dropout=self.dropout,
            interactions_trainable=self.interactions,
            n_gaussians=self.n_gaussians,
            init_bias=self.init_bias_calc,
            init_extra=self.init_extra,
            **self.kwargs
        ).to(self.target_device)
    def fit(self, X, y):
        # The main fit logic is in partial_fit
        # We will try a few times if numbers explode because NN's are finicky and we are doing CV
        n_input = X.shape[-1]
        n_output = 1 if y.ndim == 1 else y.shape[-1]
        self.init_bias_calc = np.log(y.mean()).values.astype(np.float32)
        self.setup_module(n_input=n_input, n_output=n_output)
        # Partial fit means you take an existing model and keep training
        # so the logic is basically the same
        self.partial_fit(X, y)
        return self
    def partial_fit(self, X, y):
        # Check that X and y have correct shape
        X, y = check_X_y(X, y, multi_output=True)

        # Convert to Pytorch Tensor
        X_tensor = torch.from_numpy(self.fix_array(X)).to(self.target_device)
        y_tensor = torch.from_numpy(self.fix_array(y)).to(self.target_device)

        # Optimizer - the generically useful AdamW. Other options like SGD
        # are also possible.
        optimizer = torch.optim.AdamW(
            params=self.module_.parameters(),
            lr=self.max_lr / 10,
            weight_decay=self.weight_decay
        )

        # Scheduler - one cycle LR
        scheduler = torch.optim.lr_scheduler.OneCycleLR(
            optimizer,
            max_lr=self.max_lr,
            steps_per_epoch=1,
            epochs=self.max_iter
        )

        # Loss Function
        try:
            loss_fn = self.criterion(log_input=False).to(self.target_device)  # Pytorch loss function
        except TypeError:
            loss_fn = self.criterion  # Custom loss function

        best_loss = float('inf')  # set to infinity initially

        if self.batch_function is not None:
            X_tensor_batch, y_tensor_batch = self.batch_function(X_tensor, y_tensor)
        else:
            X_tensor_batch, y_tensor_batch = X_tensor, y_tensor

        # Training loop
        for epoch in range(self.max_iter):  # Repeat max_iter times
            self.module_.train()
            y_pred = self.module_(X_tensor_batch)  # Apply current model

            loss = loss_fn(y_pred, y_tensor_batch)  # What is the loss on it?
            if self.l1_penalty > 0.0:  # Lasso penalty
                loss += self.l1_penalty * sum(
                    [w.abs().sum()
                     for p, w in self.module_.named_parameters()
                     if p in self.l1_applies_params]
                )

            if self.keep_best_model & (loss.item() < best_loss):
                best_loss = loss.item()
                self.best_model = self.module_.state_dict()

            # Reset optimizer
            optimizer.zero_grad()

            # Apply back propagation
            loss.backward()

            # gradient norm clipping
            if self.clip_value is not None:
                grad_norm = torch.nn.utils.clip_grad_norm_(self.module_.parameters(), self.clip_value)
                # check if gradients have been clipped
                if (self.verbose >= 2) & (grad_norm > self.clip_value):
                    print(f'Gradient norms have been clipped in epoch {epoch}, value before clipping: {grad_norm}')

            # Update model parameters
            optimizer.step()
            scheduler.step()

            if torch.isnan(loss.data).tolist():
                raise ValueError('Error: nan loss')

            # Every self.print_loss_every_iter steps, print RMSE
            if (epoch % self.print_loss_every_iter == 0) and (self.verbose > 0):
                self.module_.eval()  # Eval mode
                self.module_.point_estimates = True  # Distributional models - switch to point estimates
                y_pred_point = self.module_(X_tensor)  # Get "real" model estimates
                assert(y_pred_point.size() == y_tensor.size())
                rmse = torch.sqrt(torch.mean(torch.square(y_pred_point - y_tensor)))
                self.module_.train()  # back to training
                self.module_.point_estimates = False  # Distributional models - back to distributional output
                print("Train RMSE: ", rmse.data.tolist(), " Train Loss: ", loss.data.tolist())

            if (self.batch_function is not None) & (epoch % self.rebatch_every_iter == 0):
                print(f"refreshing batch on epoch {epoch}")
                X_tensor_batch, y_tensor_batch = self.batch_function(X_tensor, y_tensor)

        if self.keep_best_model:
            self.module_.load_state_dict(self.best_model)
        self.module_.eval()

        # Return the regressor
        return self
    def predict(self, X, point_estimates=True):
        # Checks
        check_is_fitted(self)  # Check fit had been called
        X = check_array(X)  # Check input

        # Convert to Pytorch Tensor
        X_tensor = torch.from_numpy(self.fix_array(X)).to(self.target_device)

        self.module_.eval()  # Eval (prediction) mode
        self.module_.point_estimates = point_estimates

        # Apply current model and convert back to numpy
        if point_estimates:
            y_pred = self.module_(X_tensor).cpu().detach().numpy()
            if y_pred.shape[-1] == 1:
                return y_pred.ravel()
            else:
                return y_pred
        else:
            y_pred = self.module_(X_tensor)
            return y_pred

    def score(self, X, y):
        # Negative RMSE score (higher needs to be better)
        y_pred = self.predict(X)
        y = self.fix_array(y)
        return -np.sqrt(np.mean((y_pred - y)**2))
class ColumnKeeper(BaseEstimator, TransformerMixin):
"""Keeps named cols, preserves DataFrame output"""
def __init__(self, cols):
self.cols = cols
def fit(self, X, y):
return self
def transform(self, X):
return X.copy()[self.cols]
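As a quick orientation before the data work, this is roughly how the two classes above are meant to compose; a sketch only, with a hypothetical TinyNet module and feature list standing in for the real modules (LogLinkGLM, spline and MDN networks) and pipelines defined later in the notebook.

# Sketch only - TinyNet and the feature names are hypothetical, just to show the intended usage pattern
class TinyNet(nn.Module):
    def __init__(self, n_input, n_output, init_bias, n_hidden=8, **kwargs):  # extra kwargs ignored
        super().__init__()
        self.hidden = nn.Linear(n_input, n_hidden)
        self.linear = nn.Linear(n_hidden, n_output)
        self.linear.bias.data = torch.tensor(init_bias)  # start the output near the log of the mean response

    def forward(self, x):
        return torch.exp(self.linear(F.relu(self.hidden(x))))  # log link

sketch_pipeline = Pipeline(steps=[
    ("keep", ColumnKeeper(["development_period", "occurrence_time"])),  # hypothetical feature set
    ("model", TabularNetRegressor(TinyNet, max_iter=nn_iter, max_lr=0.01)),
])
# sketch_pipeline.fit(X_train, y_train[["payment_size"]]); sketch_pipeline.predict(X_test)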
Data
Convert to a time series with one record per time period (even if there are multiple or no claims transactions in it). This follows my earlier gist here, but rewritten in pure pandas (shorter, fewer dependencies, but probably less readable, so refer to the gist for the concepts).
paid = pd.read_csv(f"https://raw.githubusercontent.com/agi-lab/SPLICE/main/datasets/complexity_{dataset_no}/payment_{dataset_no}.csv")
incurred = pd.read_csv(f"https://raw.githubusercontent.com/agi-lab/SPLICE/main/datasets/complexity_{dataset_no}/incurred_{dataset_no}.csv")

# Recreate some of the columns from paid into the incurred set
incurred.loc[:, "trn_no"] = incurred.groupby("claim_no").cumcount() + 1
incurred.loc[incurred.txn_type.str.contains("P"), "pmt_no"] = incurred.loc[incurred.txn_type.str.contains("P")].groupby("claim_no").cumcount() + 1
incurred["payment_period"] = np.ceil(incurred.txn_time).astype("int")
incurred["payment_size"] = incurred.groupby("claim_no").cumpaid.diff()
incurred["payment_size"] = incurred["payment_size"].fillna(incurred["cumpaid"])
incurred["incurred_incremental"] = incurred.groupby("claim_no").incurred.diff()
incurred["incurred_incremental"] = incurred["incurred_incremental"].fillna(incurred["incurred"])
transactions = incurred.merge(
    paid[["claim_no", "occurrence_period", "occurrence_time", "notidel", "setldel"]].drop_duplicates(),
    on="claim_no",
)
transactions["noti_period"] = np.ceil(transactions["occurrence_time"] + transactions["notidel"]).astype('int')
transactions["settle_period"] = np.ceil(transactions["occurrence_time"] + transactions["notidel"] + transactions["setldel"]).astype('int')

# Apply cut-off since some of the logic in this notebook assumes an equal set of dimensions
transactions["development_period"] = np.minimum(transactions["payment_period"] - transactions["occurrence_period"], cutoff)
num_dev_periods = cutoff - 1  # (transactions["payment_period"] - transactions["occurrence_period"]).max()
transactions.head()
Unnamed: 0 | claim_no | claim_size | txn_time | txn_delay | txn_type | incurred | OCL | cumpaid | multiplier | ... | payment_period | payment_size | incurred_incremental | occurrence_period | occurrence_time | notidel | setldel | noti_period | settle_period | development_period | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 232,310.09 | 1.87 | 0.00 | Ma | 31,291.71 | 31,291.71 | 0.00 | 1.00 | ... | 2 | 0.00 | 31,291.71 | 1 | 0.73 | 1.13 | 32.37 | 2 | 35 | 1 |
1 | 2 | 1 | 232,310.09 | 7.35 | 5.49 | P | 31,291.71 | 17,575.20 | 13,716.51 | NaN | ... | 8 | 13,716.51 | 0.00 | 1 | 0.73 | 1.13 | 32.37 | 2 | 35 | 7 |
2 | 3 | 1 | 232,310.09 | 13.99 | 12.12 | PMi | 32,133.97 | 1,606.70 | 30,527.27 | 0.94 | ... | 14 | 16,810.76 | 842.26 | 1 | 0.73 | 1.13 | 32.37 | 2 | 35 | 13 |
3 | 4 | 1 | 232,310.09 | 14.23 | 12.36 | Ma | 133,072.86 | 102,545.59 | 30,527.27 | 7.75 | ... | 15 | 0.00 | 100,938.89 | 1 | 0.73 | 1.13 | 32.37 | 2 | 35 | 14 |
4 | 5 | 1 | 232,310.09 | 14.41 | 12.54 | Ma | 314,565.32 | 284,038.05 | 30,527.27 | 2.36 | ... | 15 | 0.00 | 181,492.46 | 1 | 0.73 | 1.13 | 32.37 | 2 | 35 | 14 |
5 rows × 22 columns
# Transactions summarised by claim/dev:
transactions_group = (transactions
    .groupby(["claim_no", "development_period"], as_index=False)
    .agg({"payment_size": "sum", "incurred_incremental": "sum", "pmt_no": "max", "trn_no": "max"})
    .sort_values(by=["claim_no", "development_period"])
)

# This is varied from the original version:
range_payment_delay = pd.DataFrame.from_dict({"development_period": range(0, num_dev_periods + 1)})

# Claims header + development periods
claim_head_expand_dev = (
    transactions
    .loc[:, ["claim_no", "occurrence_period", "occurrence_time", "noti_period", "notidel", "settle_period"]]
    .drop_duplicates()
).merge(
    range_payment_delay,
    how="cross"
).assign(
    payment_period=lambda df: (df.occurrence_period + df.development_period),
    is_settled=lambda df: (df.occurrence_period + df.development_period) >= df.settle_period
)

# create the dataset
dat = claim_head_expand_dev.merge(
    transactions_group,
    how="left",
    on=["claim_no", "development_period"],
).fillna(0)

# Only periods after notification
dat = dat.loc[dat.payment_period >= dat.noti_period]

# Clean close to zero values
dat["payment_size"] = np.where(abs(dat.payment_size) < 1e-2, 0.0, dat.payment_size)

# Cumulative payments
dat["payment_size_cumulative"] = dat[["claim_no", "payment_size"]].groupby('claim_no').cumsum()
dat["incurred_cumulative"] = dat[["claim_no", "incurred_incremental"]].groupby('claim_no').cumsum()
dat["outstanding_claims"] = dat["incurred_cumulative"] - dat["payment_size_cumulative"]
dat["outstanding_claims"] = np.where(abs(dat.outstanding_claims) < 1e-2, 0.0, dat.outstanding_claims)

dat["payment_to_prior_period"] = dat["payment_size_cumulative"] - dat["payment_size"]
dat["has_payment_to_prior_period"] = np.where(dat.payment_to_prior_period > 1e-2, 1, 0)
dat["log1_payment_to_prior_period"] = np.log1p(dat.payment_to_prior_period)

dat["incurred_to_prior_period"] = dat["incurred_cumulative"] - dat["incurred_incremental"]
dat["has_incurred_to_prior_period"] = np.where(dat.incurred_to_prior_period > 1e-2, 1, 0)
dat["log1_incurred_to_prior_period"] = np.log1p(dat.incurred_to_prior_period)

dat["outstanding_to_prior_period"] = dat["incurred_to_prior_period"] - dat["payment_to_prior_period"]
dat["outstanding_to_prior_period"] = np.where(abs(dat.outstanding_to_prior_period) < 1e-2, 0.0, dat.outstanding_to_prior_period)
dat["log1_outstanding_to_prior_period"] = np.log1p(dat.outstanding_to_prior_period)
dat["has_outstanding_to_prior_period"] = np.where(dat.outstanding_to_prior_period > 1e-2, 1, 0)

dat["pmt_no"] = dat.groupby("claim_no")["pmt_no"].cummax()
dat["trn_no"] = dat.groupby("claim_no")["trn_no"].cummax()
dat["payment_count_to_prior_period"] = dat.groupby("claim_no")["pmt_no"].shift(1).fillna(0)
dat["transaction_count_to_prior_period"] = dat.groupby("claim_no")["trn_no"].shift(1).fillna(0)

dat["data_as_at_development_period"] = dat.development_period  # See data augmentation section
dat["backdate_periods"] = 0
dat["payment_period_as_at"] = dat.payment_period

# Potential features for model later:
data_cols = [
    "notidel",
    "occurrence_time",
    "development_period",
    "payment_period",
    "has_payment_to_prior_period",
    "log1_payment_to_prior_period",
    "has_incurred_to_prior_period",
    "log1_incurred_to_prior_period",
    "has_outstanding_to_prior_period",
    "log1_outstanding_to_prior_period",
    "payment_count_to_prior_period",
    "transaction_count_to_prior_period"]

# The notebook is created on a MacBook Pro M1, which supports GPU mode in float32 only.
dat[dat.select_dtypes(np.float64).columns] = dat.select_dtypes(np.float64).astype(np.float32)
dat["train_ind"] = (dat.payment_period <= cutoff)
dat["cv_ind"] = dat.payment_period % 5  # Cross validate on this column
dat
claim_no | occurrence_period | occurrence_time | noti_period | notidel | settle_period | development_period | payment_period | is_settled | payment_size | ... | outstanding_to_prior_period | log1_outstanding_to_prior_period | has_outstanding_to_prior_period | payment_count_to_prior_period | transaction_count_to_prior_period | data_as_at_development_period | backdate_periods | payment_period_as_at | train_ind | cv_ind | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 1 | 2 | False | 0.00 | ... | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 1 | 0 | 2 | True | 2 |
2 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 2 | 3 | False | 0.00 | ... | 31,291.71 | 10.35 | 1 | 0.00 | 1.00 | 2 | 0 | 3 | True | 3 |
3 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 3 | 4 | False | 0.00 | ... | 31,291.71 | 10.35 | 1 | 0.00 | 1.00 | 3 | 0 | 4 | True | 4 |
4 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 4 | 5 | False | 0.00 | ... | 31,291.71 | 10.35 | 1 | 0.00 | 1.00 | 4 | 0 | 5 | True | 0 |
5 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 5 | 6 | False | 0.00 | ... | 31,291.71 | 10.35 | 1 | 0.00 | 1.00 | 5 | 0 | 6 | True | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
146515 | 3663 | 40 | 39.87 | 46 | 5.27 | 56 | 35 | 75 | True | 0.00 | ... | 0.00 | 0.00 | 0 | 6.00 | 9.00 | 35 | 0 | 75 | False | 0 |
146516 | 3663 | 40 | 39.87 | 46 | 5.27 | 56 | 36 | 76 | True | 0.00 | ... | 0.00 | 0.00 | 0 | 6.00 | 9.00 | 36 | 0 | 76 | False | 1 |
146517 | 3663 | 40 | 39.87 | 46 | 5.27 | 56 | 37 | 77 | True | 0.00 | ... | 0.00 | 0.00 | 0 | 6.00 | 9.00 | 37 | 0 | 77 | False | 2 |
146518 | 3663 | 40 | 39.87 | 46 | 5.27 | 56 | 38 | 78 | True | 0.00 | ... | 0.00 | 0.00 | 0 | 6.00 | 9.00 | 38 | 0 | 78 | False | 3 |
146519 | 3663 | 40 | 39.87 | 46 | 5.27 | 56 | 39 | 79 | True | 0.00 | ... | 0.00 | 0.00 | 0 | 6.00 | 9.00 | 39 | 0 | 79 | False | 4 |
138644 rows × 32 columns
dat.head(10)
claim_no | occurrence_period | occurrence_time | noti_period | notidel | settle_period | development_period | payment_period | is_settled | payment_size | ... | outstanding_to_prior_period | log1_outstanding_to_prior_period | has_outstanding_to_prior_period | payment_count_to_prior_period | transaction_count_to_prior_period | data_as_at_development_period | backdate_periods | payment_period_as_at | train_ind | cv_ind | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 1 | 2 | False | 0.00 | ... | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 1 | 0 | 2 | True | 2 |
2 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 2 | 3 | False | 0.00 | ... | 31,291.71 | 10.35 | 1 | 0.00 | 1.00 | 2 | 0 | 3 | True | 3 |
3 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 3 | 4 | False | 0.00 | ... | 31,291.71 | 10.35 | 1 | 0.00 | 1.00 | 3 | 0 | 4 | True | 4 |
4 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 4 | 5 | False | 0.00 | ... | 31,291.71 | 10.35 | 1 | 0.00 | 1.00 | 4 | 0 | 5 | True | 0 |
5 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 5 | 6 | False | 0.00 | ... | 31,291.71 | 10.35 | 1 | 0.00 | 1.00 | 5 | 0 | 6 | True | 1 |
6 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 6 | 7 | False | 0.00 | ... | 31,291.71 | 10.35 | 1 | 0.00 | 1.00 | 6 | 0 | 7 | True | 2 |
7 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 7 | 8 | False | 13,716.51 | ... | 31,291.71 | 10.35 | 1 | 0.00 | 1.00 | 7 | 0 | 8 | True | 3 |
8 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 8 | 9 | False | 0.00 | ... | 17,575.20 | 9.77 | 1 | 1.00 | 2.00 | 8 | 0 | 9 | True | 4 |
9 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 9 | 10 | False | 0.00 | ... | 17,575.20 | 9.77 | 1 | 1.00 | 2.00 | 9 | 0 | 10 | True | 0 |
10 | 1 | 1 | 0.73 | 2 | 1.13 | 35 | 10 | 11 | False | 0.00 | ... | 17,575.20 | 9.77 | 1 | 1.00 | 2.00 | 10 | 0 | 11 | True | 1 |
10 rows × 32 columns
The train-test split is set up by payment period - past vs future.
There is a caveat about an unrealistic assumption - there is one record per claim per development period even if the claim is reported “in the future”, past the cutoff.
This affects model training, and it means that, when looking at aggregate predictions, the model effectively knows the ultimate claim count.
However, individual claims models are normally used to predict the expected costs of known claims, so this is not a huge issue.
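Concretely, the split is just a slice on the train_ind flag created in the data step above; a minimal sketch (the dat_train / dat_test names here are illustrative, not necessarily those used later in the notebook):

# Sketch: past vs future split on payment period, using the train_ind flag built earlier
dat_train = dat.loc[dat.train_ind]   # payment_period <= cutoff: the observed "past"
dat_test = dat.loc[~dat.train_ind]   # payment_period > cutoff: the "future" to be predicted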
Chain Ladder, Direct Calculation, Aggregated Data
Chain ladder calculation, from first principles. Based on this article
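For reference, the development factors computed below are the standard volume-weighted chain ladder estimators. Writing $C_{i,j}$ for cumulative payments of occurrence period $i$ at development period $j$, the factor from $j$ to $j+1$ uses only occurrence periods observed at both ages (which is why the latest diagonal is subtracted from the column sums below):

$$
\hat{f}_j = \frac{\sum_i C_{i,j+1}}{\sum_i C_{i,j}}, \qquad
\text{factor to ultimate at age } j = \prod_{k \ge j} \hat{f}_k,
$$

so the projected ultimate for an occurrence period last observed at age $j$ is $C_{i,j} \prod_{k \ge j} \hat{f}_k$.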
triangle = (dat
    .groupby(["occurrence_period", "development_period", "payment_period"], as_index=False)
    .agg({"payment_size_cumulative": "sum", "payment_size": "sum"})
    .sort_values(by=["occurrence_period", "development_period"])
)
triangle_train = triangle.loc[triangle.payment_period <= cutoff]
triangle_test = triangle.loc[triangle.payment_period > cutoff]

triangle_train.pivot(index="occurrence_period", columns="development_period", values="payment_size_cumulative")
development_period | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
occurrence_period | |||||||||||||||||||||
1 | 0.00 | 31,551.25 | 152,793.12 | 680,415.06 | 1,650,518.75 | 1,932,605.62 | 2,602,592.25 | 3,222,089.50 | 3,775,028.25 | 4,788,738.50 | ... | 15,739,832.00 | 16,082,119.00 | 16,780,246.00 | 17,400,658.00 | 17,422,826.00 | 17,422,826.00 | 17,485,892.00 | 17,485,892.00 | 17,485,892.00 | 17,485,892.00 |
2 | 10,286.75 | 49,493.38 | 132,020.67 | 430,670.75 | 1,225,081.75 | 2,044,801.12 | 2,433,025.00 | 2,808,325.75 | 3,586,153.00 | 4,218,801.50 | ... | 14,386,516.00 | 14,880,894.00 | 15,342,595.00 | 15,378,178.00 | 15,378,178.00 | 15,409,743.00 | 15,409,743.00 | 15,433,019.00 | 15,795,103.00 | NaN |
3 | 0.00 | 62,389.38 | 640,980.81 | 987,640.00 | 1,536,782.25 | 2,385,404.00 | 3,177,442.25 | 3,698,036.50 | 3,918,063.50 | 4,333,062.00 | ... | 15,336,449.00 | 15,437,039.00 | 15,646,281.00 | 15,867,258.00 | 16,067,564.00 | 16,067,564.00 | 16,071,147.00 | 16,108,434.00 | NaN | NaN |
4 | 5,646.32 | 63,203.44 | 245,984.03 | 584,083.62 | 1,030,191.31 | 2,371,411.75 | 3,179,435.75 | 4,964,554.00 | 6,244,163.00 | 6,612,675.00 | ... | 17,058,208.00 | 17,107,564.00 | 17,202,888.00 | 17,256,650.00 | 17,256,650.00 | 17,598,124.00 | 18,155,534.00 | NaN | NaN | NaN |
5 | 0.00 | 22,911.55 | 177,779.52 | 431,647.50 | 872,805.94 | 1,619,887.00 | 2,862,273.00 | 3,462,464.75 | 4,956,769.50 | 6,237,362.50 | ... | 19,066,434.00 | 19,121,178.00 | 19,157,450.00 | 19,222,538.00 | 19,222,538.00 | 19,235,524.00 | NaN | NaN | NaN | NaN |
6 | 2,259.24 | 62,889.91 | 357,107.84 | 813,191.12 | 1,218,815.75 | 2,030,866.88 | 3,065,646.00 | 4,652,615.50 | 5,097,719.50 | 6,097,211.00 | ... | 21,011,932.00 | 21,511,248.00 | 21,511,248.00 | 21,511,248.00 | 21,611,440.00 | NaN | NaN | NaN | NaN | NaN |
7 | 25,200.05 | 212,996.06 | 303,872.44 | 522,915.66 | 1,180,820.88 | 3,314,907.25 | 3,931,794.25 | 4,110,920.00 | 5,112,089.00 | 5,522,927.00 | ... | 17,723,798.00 | 18,120,490.00 | 18,120,490.00 | 18,526,526.00 | NaN | NaN | NaN | NaN | NaN | NaN |
8 | 0.00 | 101,324.27 | 482,201.94 | 1,302,672.88 | 1,762,479.38 | 2,816,009.75 | 3,599,199.00 | 4,404,539.50 | 5,549,806.50 | 6,296,549.50 | ... | 27,771,816.00 | 27,771,816.00 | 27,839,500.00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
9 | 0.00 | 21,563.11 | 207,902.11 | 1,019,339.94 | 1,467,485.12 | 2,183,192.50 | 2,792,763.75 | 3,155,922.75 | 4,188,714.25 | 5,294,733.50 | ... | 19,726,484.00 | 19,798,286.00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
10 | 0.00 | 62,438.00 | 642,477.19 | 1,220,443.88 | 1,512,127.75 | 2,125,860.75 | 3,205,997.25 | 5,542,220.00 | 6,233,909.50 | 7,278,581.00 | ... | 23,628,224.00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
11 | 6,573.56 | 233,421.70 | 858,658.38 | 1,487,226.75 | 2,326,138.75 | 4,001,959.50 | 4,689,719.00 | 5,703,576.50 | 7,421,133.00 | 8,338,611.50 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
12 | 0.00 | 337,785.12 | 602,649.94 | 947,861.81 | 1,960,324.38 | 2,438,331.25 | 3,263,283.25 | 4,443,729.00 | 4,600,126.00 | 4,950,643.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
13 | 0.00 | 48,432.94 | 257,028.44 | 713,263.12 | 1,489,202.25 | 2,399,702.50 | 3,479,596.75 | 4,570,094.00 | 5,228,066.00 | 8,120,785.50 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
14 | 18,463.75 | 352,189.09 | 1,015,863.62 | 1,451,495.75 | 2,553,104.25 | 3,266,427.50 | 5,020,334.00 | 6,462,674.00 | 8,515,544.00 | 9,761,134.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
15 | 0.00 | 64,434.53 | 272,821.25 | 689,155.88 | 1,832,749.12 | 3,660,129.75 | 4,680,136.50 | 7,358,847.00 | 9,028,388.00 | 10,283,629.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
16 | 172,118.23 | 276,003.81 | 1,468,529.00 | 2,473,900.50 | 3,320,296.25 | 4,172,269.25 | 6,421,418.50 | 7,663,898.50 | 9,201,556.00 | 9,852,889.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
17 | 0.00 | 115,323.28 | 533,563.88 | 1,628,512.75 | 3,782,490.00 | 5,496,876.00 | 7,221,503.00 | 8,019,986.00 | 8,644,119.00 | 9,252,501.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
18 | 0.00 | 113,693.24 | 534,180.81 | 1,126,199.38 | 2,086,124.12 | 2,636,258.50 | 3,415,784.25 | 4,550,892.50 | 6,384,181.50 | 7,276,585.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
19 | 0.00 | 65,176.80 | 593,353.38 | 1,377,092.75 | 2,679,311.25 | 4,561,606.50 | 5,190,826.50 | 6,896,302.50 | 7,889,414.50 | 10,040,150.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
20 | 3,269.31 | 303,522.59 | 789,422.50 | 1,556,370.50 | 2,431,901.50 | 4,251,400.00 | 6,718,960.50 | 9,005,605.00 | 10,659,464.00 | 12,031,561.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
21 | 0.00 | 99,668.09 | 1,164,366.00 | 1,829,524.38 | 3,262,251.50 | 4,136,417.75 | 5,829,693.00 | 7,624,765.00 | 9,753,255.00 | 11,093,190.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
22 | 0.00 | 73,295.09 | 370,612.00 | 1,025,605.56 | 2,284,124.00 | 3,889,548.75 | 5,347,188.00 | 6,029,389.00 | 7,281,124.50 | 8,422,002.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
23 | 738,959.81 | 944,850.44 | 1,558,499.88 | 2,354,515.50 | 4,080,220.25 | 6,339,416.00 | 7,833,012.00 | 9,289,169.00 | 11,375,192.00 | 12,184,491.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
24 | 32,878.80 | 139,590.50 | 722,637.12 | 2,000,021.75 | 3,699,119.00 | 4,275,938.50 | 5,317,856.00 | 7,233,627.50 | 8,291,626.50 | 8,943,028.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
25 | 0.00 | 67,300.14 | 794,740.44 | 1,203,488.25 | 1,704,533.00 | 2,902,497.25 | 3,623,180.75 | 4,985,555.50 | 6,397,038.50 | 7,434,320.50 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
26 | 0.00 | 222,579.27 | 724,894.44 | 1,461,998.50 | 2,063,614.62 | 3,727,648.00 | 6,192,158.00 | 7,523,790.50 | 8,246,265.50 | 10,140,250.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
27 | 0.00 | 21,265.95 | 431,409.00 | 1,352,109.50 | 2,414,385.00 | 3,756,162.50 | 4,874,574.00 | 8,397,044.00 | 10,588,447.00 | 11,796,448.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
28 | 0.00 | 281,462.16 | 1,313,574.88 | 2,223,132.25 | 3,606,104.75 | 5,205,782.00 | 8,583,559.00 | 9,386,136.00 | 12,726,494.00 | 14,633,823.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
29 | 109,492.41 | 749,180.50 | 2,196,492.25 | 3,589,296.25 | 5,510,728.00 | 7,582,901.50 | 9,060,910.00 | 12,036,241.00 | 14,089,931.00 | 16,412,483.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
30 | 0.00 | 152,130.56 | 1,171,098.38 | 2,392,989.25 | 4,038,955.50 | 5,796,675.50 | 6,487,593.00 | 7,935,940.00 | 9,448,145.00 | 12,658,112.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
31 | 0.00 | 102,149.25 | 587,694.50 | 1,739,223.50 | 3,054,168.25 | 4,391,554.00 | 6,488,115.50 | 8,532,481.00 | 9,589,075.00 | 13,596,397.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
32 | 0.00 | 137,841.00 | 841,636.31 | 1,810,914.25 | 3,203,533.50 | 4,339,657.00 | 7,193,541.50 | 8,792,350.00 | 11,048,168.00 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
33 | 0.00 | 134,954.59 | 1,254,893.38 | 2,674,034.25 | 4,403,876.00 | 5,591,899.50 | 6,810,711.50 | 8,387,086.00 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
34 | 38,893.38 | 652,741.81 | 1,456,564.38 | 2,050,356.00 | 3,044,004.00 | 4,796,483.00 | 7,570,764.00 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
35 | 0.00 | 167,482.09 | 926,322.88 | 2,872,798.50 | 5,040,902.00 | 6,421,425.50 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
36 | 21,154.29 | 320,832.66 | 1,568,433.12 | 2,393,324.00 | 4,121,508.50 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37 | 83,818.48 | 819,248.56 | 1,881,916.25 | 3,489,761.75 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
38 | 28,038.49 | 216,292.89 | 1,041,738.56 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
39 | 0.00 | 237,799.91 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
40 | 0.00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
40 rows × 40 columns
(triangle_train
    .pivot(index="development_period", columns="occurrence_period", values="payment_size_cumulative")
    .plot(logy=True)
)
plt.legend(loc="lower center", bbox_to_anchor=(0.5, -0.8), ncol=5)
plt.title("Training Set, Cumulative Claims by Development Period, Log Scale")
Text(0.5, 1.0, 'Training Set, Cumulative Claims by Development Period, Log Scale')
### Get the diagonals in the cumulative triangle, which represent cumulative payments for a particular
### occurrence period
triangle_diagonal_interim = triangle['payment_size_cumulative'][triangle['payment_period'] == cutoff].reset_index(drop=True)
triangle_diagonal_interim = pd.DataFrame(triangle_diagonal_interim).rename(columns={'payment_size_cumulative': 'diagonal'})
triangle_diagonal = triangle_diagonal_interim.iloc[::-1].reset_index(drop=True)

### Sum cumulative payments by development period - to be used to calculate CDFs later
development_period_sum_interim = triangle_train.groupby(by='development_period').sum()
development_period_sum = development_period_sum_interim['payment_size_cumulative'].reset_index(drop=True)
development_period_sum = pd.DataFrame(development_period_sum).rename(columns={'payment_size_cumulative': 'dev_period_sum'})

# Merge the two dataframes
df_cdf = pd.concat([triangle_diagonal, development_period_sum], axis=1)

## dev_period_sum_alt column is to ensure the claims for two consecutive periods have the same number
## of levels/elements (and division of these two claims columns gives the CDF)
df_cdf['dev_period_sum_alt'] = df_cdf['dev_period_sum'] - df_cdf['diagonal']

### Calculate Development Factors
df_cdf['dev_period_sum_shift'] = df_cdf['dev_period_sum'].shift(-1)
df_cdf['development_factor'] = df_cdf['dev_period_sum_shift'] / df_cdf['dev_period_sum_alt']
df_cdf['factor_to_ultimate'] = df_cdf.sort_index(ascending=False).fillna(1.0).development_factor.cumprod()
df_cdf['percentage_of_ultimate'] = 1 / df_cdf['factor_to_ultimate']
df_cdf['incr_perc_of_ultimate'] = df_cdf['percentage_of_ultimate'].diff()
df_cdf
diagonal | dev_period_sum | dev_period_sum_alt | dev_period_sum_shift | development_factor | factor_to_ultimate | percentage_of_ultimate | incr_perc_of_ultimate | |
---|---|---|---|---|---|---|---|---|
0 | 0.00 | 1,297,052.88 | 1,297,052.88 | 8,141,409.00 | 6.28 | 923.86 | 0.00 | NaN |
1 | 237,799.91 | 8,141,409.00 | 7,903,609.00 | 30,276,714.00 | 3.83 | 147.19 | 0.01 | 0.01 |
2 | 1,041,738.56 | 30,276,714.00 | 29,234,976.00 | 57,907,192.00 | 1.98 | 38.42 | 0.03 | 0.02 |
3 | 3,489,761.75 | 57,907,192.00 | 54,417,432.00 | 93,450,776.00 | 1.72 | 19.40 | 0.05 | 0.03 |
4 | 4,121,508.50 | 93,450,776.00 | 89,329,264.00 | 132,863,912.00 | 1.49 | 11.30 | 0.09 | 0.04 |
5 | 6,421,425.50 | 132,863,912.00 | 126,442,488.00 | 172,164,592.00 | 1.36 | 7.59 | 0.13 | 0.04 |
6 | 7,570,764.00 | 172,164,592.00 | 164,593,824.00 | 210,850,864.00 | 1.28 | 5.58 | 0.18 | 0.05 |
7 | 8,387,086.00 | 210,850,864.00 | 202,463,776.00 | 245,069,168.00 | 1.21 | 4.35 | 0.23 | 0.05 |
8 | 11,048,168.00 | 245,069,168.00 | 234,020,992.00 | 273,903,680.00 | 1.17 | 3.60 | 0.28 | 0.05 |
9 | 13,596,397.00 | 273,903,680.00 | 260,307,280.00 | 298,338,624.00 | 1.15 | 3.07 | 0.33 | 0.05 |
10 | 13,984,095.00 | 298,338,624.00 | 284,354,528.00 | 311,596,960.00 | 1.10 | 2.68 | 0.37 | 0.05 |
11 | 18,247,140.00 | 311,596,960.00 | 293,349,824.00 | 324,199,360.00 | 1.11 | 2.45 | 0.41 | 0.04 |
12 | 20,606,538.00 | 324,199,360.00 | 303,592,832.00 | 331,724,640.00 | 1.09 | 2.21 | 0.45 | 0.04 |
13 | 19,192,766.00 | 331,724,640.00 | 312,531,872.00 | 335,702,112.00 | 1.07 | 2.03 | 0.49 | 0.04 |
14 | 14,057,512.00 | 335,702,112.00 | 321,644,608.00 | 340,176,128.00 | 1.06 | 1.89 | 0.53 | 0.04 |
15 | 13,656,915.00 | 340,176,128.00 | 326,519,200.00 | 348,353,504.00 | 1.07 | 1.78 | 0.56 | 0.03 |
16 | 18,047,780.00 | 348,353,504.00 | 330,305,728.00 | 348,470,784.00 | 1.05 | 1.67 | 0.60 | 0.04 |
17 | 20,920,498.00 | 348,470,784.00 | 327,550,272.00 | 350,782,240.00 | 1.07 | 1.58 | 0.63 | 0.03 |
18 | 18,409,002.00 | 350,782,240.00 | 332,373,248.00 | 350,120,704.00 | 1.05 | 1.48 | 0.68 | 0.04 |
19 | 23,882,838.00 | 350,120,704.00 | 326,237,856.00 | 338,951,584.00 | 1.04 | 1.40 | 0.71 | 0.04 |
20 | 23,967,498.00 | 338,951,584.00 | 314,984,096.00 | 322,580,384.00 | 1.02 | 1.35 | 0.74 | 0.03 |
21 | 20,525,112.00 | 322,580,384.00 | 302,055,264.00 | 309,189,504.00 | 1.02 | 1.32 | 0.76 | 0.02 |
22 | 18,259,918.00 | 309,189,504.00 | 290,929,600.00 | 298,672,864.00 | 1.03 | 1.29 | 0.78 | 0.02 |
23 | 20,473,130.00 | 298,672,864.00 | 278,199,744.00 | 285,398,368.00 | 1.03 | 1.26 | 0.80 | 0.02 |
24 | 19,147,944.00 | 285,398,368.00 | 266,250,432.00 | 273,799,296.00 | 1.03 | 1.22 | 0.82 | 0.02 |
25 | 25,963,162.00 | 273,799,296.00 | 247,836,128.00 | 253,648,528.00 | 1.02 | 1.19 | 0.84 | 0.02 |
26 | 17,040,986.00 | 253,648,528.00 | 236,607,536.00 | 243,831,648.00 | 1.03 | 1.16 | 0.86 | 0.02 |
27 | 24,036,898.00 | 243,831,648.00 | 219,794,752.00 | 222,414,336.00 | 1.01 | 1.13 | 0.89 | 0.03 |
28 | 15,848,811.00 | 222,414,336.00 | 206,565,520.00 | 212,487,168.00 | 1.03 | 1.12 | 0.90 | 0.01 |
29 | 24,596,934.00 | 212,487,168.00 | 187,890,240.00 | 191,449,696.00 | 1.02 | 1.08 | 0.92 | 0.03 |
30 | 23,628,224.00 | 191,449,696.00 | 167,821,472.00 | 169,830,640.00 | 1.01 | 1.06 | 0.94 | 0.02 |
31 | 19,798,286.00 | 169,830,640.00 | 150,032,352.00 | 151,600,704.00 | 1.01 | 1.05 | 0.95 | 0.01 |
32 | 27,839,500.00 | 151,600,704.00 | 123,761,200.00 | 125,163,056.00 | 1.01 | 1.04 | 0.96 | 0.01 |
33 | 18,526,526.00 | 125,163,056.00 | 106,636,528.00 | 106,959,200.00 | 1.00 | 1.03 | 0.97 | 0.01 |
34 | 21,611,440.00 | 106,959,200.00 | 85,347,760.00 | 85,733,776.00 | 1.00 | 1.03 | 0.97 | 0.00 |
35 | 19,235,524.00 | 85,733,776.00 | 66,498,252.00 | 67,122,320.00 | 1.01 | 1.02 | 0.98 | 0.00 |
36 | 18,155,534.00 | 67,122,320.00 | 48,966,784.00 | 49,027,344.00 | 1.00 | 1.01 | 0.99 | 0.01 |
37 | 16,108,434.00 | 49,027,344.00 | 32,918,910.00 | 33,280,996.00 | 1.01 | 1.01 | 0.99 | 0.00 |
38 | 15,795,103.00 | 33,280,996.00 | 17,485,892.00 | 17,485,892.00 | 1.00 | 1.00 | 1.00 | 0.01 |
39 | 17,485,892.00 | 17,485,892.00 | 0.00 | NaN | NaN | 1.00 | 1.00 | 0.00 |
Apply the development factors with a loop for simplicity. This can also be done with joins (sketched below), but coding it gets complicated.
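Before the loop itself, here is a rough sketch of the join-based alternative, which jumps straight to ultimates rather than filling in each future cell; it is illustrative only (latest and factors are names introduced here, not used elsewhere) and assumes the df_cdf table above:

# Sketch only: vectorised projection to ultimate via a merge, instead of the cell-by-cell loop below
latest = (triangle_train
    .sort_values("development_period")
    .groupby("occurrence_period", as_index=False)
    .last())  # latest diagonal entry per occurrence period
factors = df_cdf.reset_index().rename(columns={"index": "development_period"})
latest = latest.merge(factors[["development_period", "factor_to_ultimate"]],
                      on="development_period", how="left")
latest["ultimate"] = latest["payment_size_cumulative"] * latest["factor_to_ultimate"]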
triangle_cl = triangle.sort_values(["occurrence_period", "development_period"]).copy()
for i in range(0, triangle_cl.shape[0]):
    if triangle_cl.loc[i, "payment_period"] > cutoff:
        d = triangle_cl.loc[i, "development_period"]
        if d <= df_cdf.index.max():
            dev_factor = df_cdf.loc[d-1, "development_factor"]
        else:
            dev_factor = 1

        triangle_cl.loc[i, "payment_size_cumulative"] = (
            triangle_cl.loc[i - 1, "payment_size_cumulative"] *
            dev_factor
        )

triangle_cl.pivot(index="occurrence_period", columns="development_period", values="payment_size_cumulative")
development_period | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
occurrence_period | |||||||||||||||||||||
1 | 0.00 | 31,551.25 | 152,793.12 | 680,415.06 | 1,650,518.75 | 1,932,605.62 | 2,602,592.25 | 3,222,089.50 | 3,775,028.25 | 4,788,738.50 | ... | 15,739,832.00 | 16,082,119.00 | 16,780,246.00 | 17,400,658.00 | 17,422,826.00 | 17,422,826.00 | 17,485,892.00 | 17,485,892.00 | 17,485,892.00 | 17,485,892.00 |
2 | 10,286.75 | 49,493.38 | 132,020.67 | 430,670.75 | 1,225,081.75 | 2,044,801.12 | 2,433,025.00 | 2,808,325.75 | 3,586,153.00 | 4,218,801.50 | ... | 14,386,516.00 | 14,880,894.00 | 15,342,595.00 | 15,378,178.00 | 15,378,178.00 | 15,409,743.00 | 15,409,743.00 | 15,433,019.00 | 15,795,103.00 | 15,795,103.00 |
3 | 0.00 | 62,389.38 | 640,980.81 | 987,640.00 | 1,536,782.25 | 2,385,404.00 | 3,177,442.25 | 3,698,036.50 | 3,918,063.50 | 4,333,062.00 | ... | 15,336,449.00 | 15,437,039.00 | 15,646,281.00 | 15,867,258.00 | 16,067,564.00 | 16,067,564.00 | 16,071,147.00 | 16,108,434.00 | 16,285,616.00 | 16,285,616.00 |
4 | 5,646.32 | 63,203.44 | 245,984.03 | 584,083.62 | 1,030,191.31 | 2,371,411.75 | 3,179,435.75 | 4,964,554.00 | 6,244,163.00 | 6,612,675.00 | ... | 17,058,208.00 | 17,107,564.00 | 17,202,888.00 | 17,256,650.00 | 17,256,650.00 | 17,598,124.00 | 18,155,534.00 | 18,177,988.00 | 18,377,934.00 | 18,377,934.00 |
5 | 0.00 | 22,911.55 | 177,779.52 | 431,647.50 | 872,805.94 | 1,619,887.00 | 2,862,273.00 | 3,462,464.75 | 4,956,769.50 | 6,237,362.50 | ... | 19,066,434.00 | 19,121,178.00 | 19,157,450.00 | 19,222,538.00 | 19,222,538.00 | 19,235,524.00 | 19,416,044.00 | 19,440,058.00 | 19,653,886.00 | 19,653,886.00 |
6 | 2,259.24 | 62,889.91 | 357,107.84 | 813,191.12 | 1,218,815.75 | 2,030,866.88 | 3,065,646.00 | 4,652,615.50 | 5,097,719.50 | 6,097,211.00 | ... | 21,011,932.00 | 21,511,248.00 | 21,511,248.00 | 21,511,248.00 | 21,611,440.00 | 21,709,186.00 | 21,912,922.00 | 21,940,024.00 | 22,181,350.00 | 22,181,350.00 |
7 | 25,200.05 | 212,996.06 | 303,872.44 | 522,915.66 | 1,180,820.88 | 3,314,907.25 | 3,931,794.25 | 4,110,920.00 | 5,112,089.00 | 5,522,927.00 | ... | 17,723,798.00 | 18,120,490.00 | 18,120,490.00 | 18,526,526.00 | 18,582,586.00 | 18,666,634.00 | 18,841,816.00 | 18,865,120.00 | 19,072,624.00 | 19,072,624.00 |
8 | 0.00 | 101,324.27 | 482,201.94 | 1,302,672.88 | 1,762,479.38 | 2,816,009.75 | 3,599,199.00 | 4,404,539.50 | 5,549,806.50 | 6,296,549.50 | ... | 27,771,816.00 | 27,771,816.00 | 27,839,500.00 | 28,154,842.00 | 28,240,036.00 | 28,367,764.00 | 28,633,988.00 | 28,669,402.00 | 28,984,746.00 | 28,984,746.00 |
9 | 0.00 | 21,563.11 | 207,902.11 | 1,019,339.94 | 1,467,485.12 | 2,183,192.50 | 2,792,763.75 | 3,155,922.75 | 4,188,714.25 | 5,294,733.50 | ... | 19,726,484.00 | 19,798,286.00 | 20,005,246.00 | 20,231,848.00 | 20,293,068.00 | 20,384,852.00 | 20,576,158.00 | 20,601,606.00 | 20,828,210.00 | 20,828,210.00 |
10 | 0.00 | 62,438.00 | 642,477.19 | 1,220,443.88 | 1,512,127.75 | 2,125,860.75 | 3,205,997.25 | 5,542,220.00 | 6,233,909.50 | 7,278,581.00 | ... | 23,628,224.00 | 23,911,102.00 | 24,161,056.00 | 24,434,732.00 | 24,508,668.00 | 24,619,518.00 | 24,850,566.00 | 24,881,302.00 | 25,154,980.00 | 25,154,980.00 |
11 | 6,573.56 | 233,421.70 | 858,658.38 | 1,487,226.75 | 2,326,138.75 | 4,001,959.50 | 4,689,719.00 | 5,703,576.50 | 7,421,133.00 | 8,338,611.50 | ... | 25,062,908.00 | 25,362,962.00 | 25,628,092.00 | 25,918,386.00 | 25,996,812.00 | 26,114,394.00 | 26,359,472.00 | 26,392,074.00 | 26,682,368.00 | 26,682,368.00 |
12 | 0.00 | 337,785.12 | 602,649.94 | 947,861.81 | 1,960,324.38 | 2,438,331.25 | 3,263,283.25 | 4,443,729.00 | 4,600,126.00 | 4,950,643.00 | ... | 16,612,005.00 | 16,810,886.00 | 16,986,618.00 | 17,179,028.00 | 17,231,010.00 | 17,308,944.00 | 17,471,384.00 | 17,492,992.00 | 17,685,404.00 | 17,685,404.00 |
13 | 0.00 | 48,432.94 | 257,028.44 | 713,263.12 | 1,489,202.25 | 2,399,702.50 | 3,479,596.75 | 4,570,094.00 | 5,228,066.00 | 8,120,785.50 | ... | 25,494,662.00 | 25,799,886.00 | 26,069,584.00 | 26,364,878.00 | 26,444,656.00 | 26,564,264.00 | 26,813,564.00 | 26,846,726.00 | 27,142,022.00 | 27,142,022.00 |
14 | 18,463.75 | 352,189.09 | 1,015,863.62 | 1,451,495.75 | 2,553,104.25 | 3,266,427.50 | 5,020,334.00 | 6,462,674.00 | 8,515,544.00 | 9,761,134.00 | ... | 18,626,320.00 | 18,849,316.00 | 19,046,356.00 | 19,262,096.00 | 19,320,380.00 | 19,407,764.00 | 19,589,902.00 | 19,614,130.00 | 19,829,872.00 | 19,829,872.00 |
15 | 0.00 | 64,434.53 | 272,821.25 | 689,155.88 | 1,832,749.12 | 3,660,129.75 | 4,680,136.50 | 7,358,847.00 | 9,028,388.00 | 10,283,629.00 | ... | 29,044,080.00 | 29,391,798.00 | 29,699,044.00 | 30,035,450.00 | 30,126,334.00 | 30,262,592.00 | 30,546,598.00 | 30,584,378.00 | 30,920,786.00 | 30,920,786.00 |
16 | 172,118.23 | 276,003.81 | 1,468,529.00 | 2,473,900.50 | 3,320,296.25 | 4,172,269.25 | 6,421,418.50 | 7,663,898.50 | 9,201,556.00 | 9,852,889.00 | ... | 22,027,448.00 | 22,291,162.00 | 22,524,182.00 | 22,779,316.00 | 22,848,244.00 | 22,951,584.00 | 23,166,978.00 | 23,195,630.00 | 23,450,766.00 | 23,450,766.00 |
17 | 0.00 | 115,323.28 | 533,563.88 | 1,628,512.75 | 3,782,490.00 | 5,496,876.00 | 7,221,503.00 | 8,019,986.00 | 8,644,119.00 | 9,252,501.00 | ... | 24,161,340.00 | 24,450,602.00 | 24,706,196.00 | 24,986,046.00 | 25,061,652.00 | 25,175,004.00 | 25,411,266.00 | 25,442,694.00 | 25,722,546.00 | 25,722,546.00 |
18 | 0.00 | 113,693.24 | 534,180.81 | 1,126,199.38 | 2,086,124.12 | 2,636,258.50 | 3,415,784.25 | 4,550,892.50 | 6,384,181.50 | 7,276,585.00 | ... | 22,122,974.00 | 22,387,832.00 | 22,621,862.00 | 22,878,104.00 | 22,947,330.00 | 23,051,118.00 | 23,267,448.00 | 23,296,226.00 | 23,552,468.00 | 23,552,468.00 |
19 | 0.00 | 65,176.80 | 593,353.38 | 1,377,092.75 | 2,679,311.25 | 4,561,606.50 | 5,190,826.50 | 6,896,302.50 | 7,889,414.50 | 10,040,150.00 | ... | 25,454,732.00 | 25,759,478.00 | 26,028,754.00 | 26,323,586.00 | 26,403,238.00 | 26,522,658.00 | 26,771,566.00 | 26,804,676.00 | 27,099,510.00 | 27,099,510.00 |
20 | 3,269.31 | 303,522.59 | 789,422.50 | 1,556,370.50 | 2,431,901.50 | 4,251,400.00 | 6,718,960.50 | 9,005,605.00 | 10,659,464.00 | 12,031,561.00 | ... | 30,440,728.00 | 30,805,166.00 | 31,127,186.00 | 31,479,768.00 | 31,575,022.00 | 31,717,834.00 | 32,015,498.00 | 32,055,094.00 | 32,407,678.00 | 32,407,678.00 |
21 | 0.00 | 99,668.09 | 1,164,366.00 | 1,829,524.38 | 3,262,251.50 | 4,136,417.75 | 5,829,693.00 | 7,624,765.00 | 9,753,255.00 | 11,093,190.00 | ... | 31,515,310.00 | 31,892,614.00 | 32,226,002.00 | 32,591,030.00 | 32,689,646.00 | 32,837,498.00 | 33,145,670.00 | 33,186,664.00 | 33,551,694.00 | 33,551,694.00 |
22 | 0.00 | 73,295.09 | 370,612.00 | 1,025,605.56 | 2,284,124.00 | 3,889,548.75 | 5,347,188.00 | 6,029,389.00 | 7,281,124.50 | 8,422,002.00 | ... | 25,589,250.00 | 25,895,606.00 | 26,166,304.00 | 26,462,694.00 | 26,542,768.00 | 26,662,818.00 | 26,913,042.00 | 26,946,328.00 | 27,242,720.00 | 27,242,720.00 |
23 | 738,959.81 | 944,850.44 | 1,558,499.88 | 2,354,515.50 | 4,080,220.25 | 6,339,416.00 | 7,833,012.00 | 9,289,169.00 | 11,375,192.00 | 12,184,491.00 | ... | 31,142,896.00 | 31,515,740.00 | 31,845,188.00 | 32,205,904.00 | 32,303,356.00 | 32,449,462.00 | 32,753,992.00 | 32,794,502.00 | 33,155,220.00 | 33,155,220.00 |
24 | 32,878.80 | 139,590.50 | 722,637.12 | 2,000,021.75 | 3,699,119.00 | 4,275,938.50 | 5,317,856.00 | 7,233,627.50 | 8,291,626.50 | 8,943,028.00 | ... | 28,343,990.00 | 28,683,326.00 | 28,983,166.00 | 29,311,462.00 | 29,400,156.00 | 29,533,130.00 | 29,810,292.00 | 29,847,162.00 | 30,175,460.00 | 30,175,460.00 |
25 | 0.00 | 67,300.14 | 794,740.44 | 1,203,488.25 | 1,704,533.00 | 2,902,497.25 | 3,623,180.75 | 4,985,555.50 | 6,397,038.50 | 7,434,320.50 | ... | 22,882,388.00 | 23,156,338.00 | 23,398,402.00 | 23,663,440.00 | 23,735,042.00 | 23,842,394.00 | 24,066,148.00 | 24,095,912.00 | 24,360,950.00 | 24,360,950.00 |
26 | 0.00 | 222,579.27 | 724,894.44 | 1,461,998.50 | 2,063,614.62 | 3,727,648.00 | 6,192,158.00 | 7,523,790.50 | 8,246,265.50 | 10,140,250.00 | ... | 24,910,628.00 | 25,208,860.00 | 25,472,380.00 | 25,760,910.00 | 25,838,860.00 | 25,955,728.00 | 26,199,316.00 | 26,231,720.00 | 26,520,252.00 | 26,520,252.00 |
27 | 0.00 | 21,265.95 | 431,409.00 | 1,352,109.50 | 2,414,385.00 | 3,756,162.50 | 4,874,574.00 | 8,397,044.00 | 10,588,447.00 | 11,796,448.00 | ... | 36,532,008.00 | 36,969,372.00 | 37,355,828.00 | 37,778,964.00 | 37,893,280.00 | 38,064,668.00 | 38,421,896.00 | 38,469,416.00 | 38,892,552.00 | 38,892,552.00 |
28 | 0.00 | 281,462.16 | 1,313,574.88 | 2,223,132.25 | 3,606,104.75 | 5,205,782.00 | 8,583,559.00 | 9,386,136.00 | 12,726,494.00 | 14,633,823.00 | ... | 42,857,536.00 | 43,370,628.00 | 43,824,000.00 | 44,320,400.00 | 44,454,508.00 | 44,655,572.00 | 45,074,652.00 | 45,130,400.00 | 45,626,804.00 | 45,626,804.00 |
29 | 109,492.41 | 749,180.50 | 2,196,492.25 | 3,589,296.25 | 5,510,728.00 | 7,582,901.50 | 9,060,910.00 | 12,036,241.00 | 14,089,931.00 | 16,412,483.00 | ... | 41,941,440.00 | 42,443,564.00 | 42,887,248.00 | 43,373,040.00 | 43,504,284.00 | 43,701,052.00 | 44,111,176.00 | 44,165,732.00 | 44,651,524.00 | 44,651,524.00 |
30 | 0.00 | 152,130.56 | 1,171,098.38 | 2,392,989.25 | 4,038,955.50 | 5,796,675.50 | 6,487,593.00 | 7,935,940.00 | 9,448,145.00 | 12,658,112.00 | ... | 35,222,156.00 | 35,643,840.00 | 36,016,440.00 | 36,424,404.00 | 36,534,620.00 | 36,699,864.00 | 37,044,284.00 | 37,090,100.00 | 37,498,064.00 | 37,498,064.00 |
31 | 0.00 | 102,149.25 | 587,694.50 | 1,739,223.50 | 3,054,168.25 | 4,391,554.00 | 6,488,115.50 | 8,532,481.00 | 9,589,075.00 | 13,596,397.00 | ... | 39,249,004.00 | 39,718,896.00 | 40,134,096.00 | 40,588,700.00 | 40,711,516.00 | 40,895,652.00 | 41,279,448.00 | 41,330,504.00 | 41,785,112.00 | 41,785,112.00 |
32 | 0.00 | 137,841.00 | 841,636.31 | 1,810,914.25 | 3,203,533.50 | 4,339,657.00 | 7,193,541.50 | 8,792,350.00 | 11,048,168.00 | 12,931,036.00 | ... | 37,328,284.00 | 37,775,180.00 | 38,170,060.00 | 38,602,416.00 | 38,719,224.00 | 38,894,348.00 | 39,259,360.00 | 39,307,916.00 | 39,740,276.00 | 39,740,276.00 |
33 | 0.00 | 134,954.59 | 1,254,893.38 | 2,674,034.25 | 4,403,876.00 | 5,591,899.50 | 6,810,711.50 | 8,387,086.00 | 10,152,020.00 | 11,882,163.00 | ... | 34,300,484.00 | 34,711,132.00 | 35,073,984.00 | 35,471,272.00 | 35,578,604.00 | 35,739,524.00 | 36,074,932.00 | 36,119,548.00 | 36,516,840.00 | 36,516,840.00 |
34 | 38,893.38 | 652,741.81 | 1,456,564.38 | 2,050,356.00 | 3,044,004.00 | 4,796,483.00 | 7,570,764.00 | 9,698,433.00 | 11,739,320.00 | 13,739,976.00 | ... | 39,663,468.00 | 40,138,320.00 | 40,557,904.00 | 41,017,308.00 | 41,141,420.00 | 41,327,500.00 | 41,715,348.00 | 41,766,940.00 | 42,226,348.00 | 42,226,348.00 |
35 | 0.00 | 167,482.09 | 926,322.88 | 2,872,798.50 | 5,040,902.00 | 6,421,425.50 | 8,743,438.00 | 11,200,673.00 | 13,557,683.00 | 15,868,231.00 | ... | 45,807,152.00 | 46,355,560.00 | 46,840,136.00 | 47,370,700.00 | 47,514,040.00 | 47,728,944.00 | 48,176,868.00 | 48,236,452.00 | 48,767,020.00 | 48,767,020.00 |
36 | 21,154.29 | 320,832.66 | 1,568,433.12 | 2,393,324.00 | 4,121,508.50 | 6,130,127.00 | 8,346,805.50 | 10,692,571.00 | 12,942,659.00 | 15,148,392.00 | ... | 43,729,172.00 | 44,252,700.00 | 44,715,292.00 | 45,221,788.00 | 45,358,624.00 | 45,563,776.00 | 45,991,380.00 | 46,048,260.00 | 46,554,760.00 | 46,554,760.00 |
37 | 83,818.48 | 819,248.56 | 1,881,916.25 | 3,489,761.75 | 5,992,949.50 | 8,913,616.00 | 12,136,815.00 | 15,547,716.00 | 18,819,494.00 | 22,026,778.00 | ... | 63,585,152.00 | 64,346,396.00 | 65,019,040.00 | 65,755,520.00 | 65,954,488.00 | 66,252,796.00 | 66,874,564.00 | 66,957,276.00 | 67,693,760.00 | 67,693,760.00 |
38 | 28,038.49 | 216,292.89 | 1,041,738.56 | 2,063,424.12 | 3,543,507.50 | 5,270,437.00 | 7,176,248.00 | 9,193,043.00 | 11,127,578.00 | 13,023,979.00 | ... | 37,596,596.00 | 38,046,704.00 | 38,444,424.00 | 38,879,888.00 | 38,997,536.00 | 39,173,920.00 | 39,541,556.00 | 39,590,460.00 | 40,025,928.00 | 40,025,928.00 |
39 | 0.00 | 237,799.91 | 910,950.94 | 1,804,366.50 | 3,098,629.25 | 4,608,747.50 | 6,275,289.00 | 8,038,881.00 | 9,730,540.00 | 11,388,853.00 | ... | 32,876,436.00 | 33,270,034.00 | 33,617,820.00 | 33,998,616.00 | 34,101,492.00 | 34,255,732.00 | 34,577,212.00 | 34,619,976.00 | 35,000,772.00 | 35,000,772.00 |
40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
40 rows × 40 columns
(triangle_cl
    .loc[lambda df: df.occurrence_period != cutoff]  # Exclude occurrence period == 40 since there's no data there at cutoff
    .pivot(index="development_period", columns="occurrence_period", values="payment_size_cumulative")
    .plot(logy=True)
)
plt.legend(loc="lower center", bbox_to_anchor=(0.5, -0.8), ncol=5)
plt.title("Chain Ladder Prediction of Training Set, Cumulative Claims by Development Period, Log Scale")
Text(0.5, 1.0, 'Chain Ladder Prediction of Training Set, Cumulative Claims by Development Period, Log Scale')
# MSE / 10^12
# Exclude occurrence period == 40 since there's no data there at cutoff
np.sum((
    triangle_cl.loc[lambda df: df.occurrence_period != cutoff].payment_size_cumulative.values -
    triangle.loc[lambda df: df.occurrence_period != cutoff].payment_size_cumulative.values
)**2)/10**12
27656.166297305088
Chain Ladder, GLM, Aggregated Data
Chain ladders can be replicated with a GLM, as discussed in our earlier article. To do this, use:
- a Quasi-Poisson distribution (as used in the article; here just Poisson),
- a log link,
- Incremental ~ 0 + accident_period_factor + development_period_factor, i.e. no intercept, and one factor per accident and development period.
Here we will use Torch, a package for neural networks, initially to fit a GLM. The idea is that if this works, one can then add deep learning components to it:
- Poisson loss,
- Log link,
- Incremental ~ accident_period_factor + development_period_factor, but with an intercept (it seems to converge better this way).
Doing this at the claim level should give similar results, just divided by the number of claims.
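In symbols, the GLM fitted below is the standard cross-classified chain ladder GLM: it models the incremental payments $X_{ij}$ for occurrence period $i$ and development period $j$ as

$$
\mathbb{E}[X_{ij}] = \exp\left(\beta_0 + \alpha_i + \beta_j\right),
$$

estimated by minimising the Poisson negative log-likelihood, where the one-hot encodings of occurrence_period and development_period supply the $\alpha_i$ and $\beta_j$ levels and $\beta_0$ is the intercept noted above.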
class LogLinkGLM(nn.Module):
    # Define the parameters in __init__
    def __init__(
        self,
        n_input,         # number of inputs
        n_output,        # number of outputs
        init_bias,       # init mean value to speed up convergence
        has_bias=True,   # use bias (intercept) or not
        **kwargs,        # Ignored; Not used
    ):
        super(LogLinkGLM, self).__init__()
        self.linear = torch.nn.Linear(n_input, n_output, has_bias)  # Linear coefficients
        nn.init.zeros_(self.linear.weight)  # Initialise to zero
        if has_bias:
            self.linear.bias.data = torch.tensor(init_bias)

    # The forward function defines how you get y from X.
    def forward(self, x):
        return torch.exp(self.linear(x))  # log(Y) = XB -> Y = exp(XB)
# Small dataset, so we can do a high number of iterations to converge to the mechanical chain ladder figures more precisely
GLM_CL_agg = Pipeline(
    steps=[
        ("keep", ColumnKeeper(["occurrence_period", "development_period"])),
        ('one_hot', OneHotEncoder(sparse_output=False)),  # OneHot to get one factor per period
        ("model", TabularNetRegressor(LogLinkGLM, has_bias=True, max_iter=10000, max_lr=0.10))
    ]
)

GLM_CL_agg.fit(
    triangle_train,
    triangle_train.loc[:, ["payment_size"]]
)
Train RMSE: 685304.6875 Train Loss: -9891084.0
Train RMSE: 484032.34375 Train Loss: -10054772.0
Train RMSE: 484031.5625 Train Loss: -10054813.0
Train RMSE: 484031.4375 Train Loss: -10054817.0
Train RMSE: 484031.34375 Train Loss: -10054817.0
Train RMSE: 484031.375 Train Loss: -10054817.0
Train RMSE: 484031.40625 Train Loss: -10054817.0
Train RMSE: 484031.375 Train Loss: -10054818.0
Train RMSE: 484031.375 Train Loss: -10054817.0
Train RMSE: 484031.375 Train Loss: -10054818.0
Pipeline(steps=[('keep', ColumnKeeper(cols=['occurrence_period', 'development_period'])), ('one_hot', OneHotEncoder(sparse_output=False)), ('model', TabularNetRegressor(max_iter=10000, max_lr=0.1, module=<class '__main__.LogLinkGLM'>))])
triangle_train
occurrence_period | development_period | payment_period | payment_size_cumulative | payment_size | |
---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0.00 | 0.00 |
1 | 1 | 1 | 2 | 31,551.25 | 31,551.25 |
2 | 1 | 2 | 3 | 152,793.12 | 121,241.88 |
3 | 1 | 3 | 4 | 680,415.06 | 527,621.94 |
4 | 1 | 4 | 5 | 1,650,518.75 | 970,103.75 |
... | ... | ... | ... | ... | ... |
1481 | 38 | 1 | 39 | 216,292.89 | 188,254.41 |
1482 | 38 | 2 | 40 | 1,041,738.56 | 825,445.62 |
1520 | 39 | 0 | 39 | 0.00 | 0.00 |
1521 | 39 | 1 | 40 | 237,799.91 | 237,799.91 |
1560 | 40 | 0 | 40 | 0.00 | 0.00 |
820 rows × 5 columns
triangle_glm_agg = pd.concat(
    [
        triangle_train,
        triangle_test.assign(payment_size=GLM_CL_agg.predict(triangle_test))
    ],
    axis="rows"
).sort_values(by=["occurrence_period", "development_period"])

triangle_glm_agg["payment_size_cumulative"] = (
    triangle_glm_agg[["occurrence_period", "payment_size"]].groupby('occurrence_period').cumsum()
)

triangle_glm_agg.pivot(index="occurrence_period", columns="development_period", values="payment_size_cumulative")
development_period | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
occurrence_period | |||||||||||||||||||||
1 | 0.00 | 31,551.25 | 152,793.14 | 680,415.06 | 1,650,518.75 | 1,932,605.62 | 2,602,592.25 | 3,222,089.50 | 3,775,028.25 | 4,788,738.50 | ... | 15,739,832.00 | 16,082,119.00 | 16,780,244.00 | 17,400,658.00 | 17,422,826.00 | 17,422,826.00 | 17,485,892.00 | 17,485,892.00 | 17,485,892.00 | 17,485,892.00 |
2 | 10,286.75 | 49,493.38 | 132,020.67 | 430,670.75 | 1,225,081.75 | 2,044,801.25 | 2,433,025.00 | 2,808,325.75 | 3,586,153.00 | 4,218,801.50 | ... | 14,386,516.00 | 14,880,894.00 | 15,342,595.00 | 15,378,178.00 | 15,378,178.00 | 15,409,743.00 | 15,409,743.00 | 15,433,019.00 | 15,795,103.00 | 15,795,134.00 |
3 | 0.00 | 62,389.38 | 640,980.81 | 987,639.94 | 1,536,782.25 | 2,385,404.00 | 3,177,442.25 | 3,698,036.50 | 3,918,063.50 | 4,333,062.00 | ... | 15,336,449.00 | 15,437,039.00 | 15,646,281.00 | 15,867,258.00 | 16,067,564.00 | 16,067,564.00 | 16,071,147.00 | 16,108,434.00 | 16,285,614.00 | 16,285,647.00 |
4 | 5,646.32 | 63,203.44 | 245,984.02 | 584,083.56 | 1,030,191.25 | 2,371,411.75 | 3,179,435.75 | 4,964,554.00 | 6,244,163.00 | 6,612,675.00 | ... | 17,058,208.00 | 17,107,564.00 | 17,202,888.00 | 17,256,650.00 | 17,256,650.00 | 17,598,124.00 | 18,155,534.00 | 18,177,990.00 | 18,377,934.00 | 18,377,972.00 |
5 | 0.00 | 22,911.55 | 177,779.52 | 431,647.50 | 872,806.00 | 1,619,887.12 | 2,862,273.00 | 3,462,464.75 | 4,956,769.00 | 6,237,362.50 | ... | 19,066,432.00 | 19,121,176.00 | 19,157,450.00 | 19,222,536.00 | 19,222,536.00 | 19,235,524.00 | 19,416,042.00 | 19,440,056.00 | 19,653,882.00 | 19,653,922.00 |
6 | 2,259.24 | 62,889.90 | 357,107.84 | 813,191.12 | 1,218,815.75 | 2,030,866.75 | 3,065,646.00 | 4,652,615.50 | 5,097,719.50 | 6,097,211.00 | ... | 21,011,932.00 | 21,511,248.00 | 21,511,248.00 | 21,511,248.00 | 21,611,440.00 | 21,709,188.00 | 21,912,920.00 | 21,940,022.00 | 22,181,346.00 | 22,181,390.00 |
7 | 25,200.05 | 212,996.06 | 303,872.44 | 522,915.66 | 1,180,820.75 | 3,314,907.25 | 3,931,794.50 | 4,110,920.00 | 5,112,089.00 | 5,522,927.00 | ... | 17,723,798.00 | 18,120,490.00 | 18,120,490.00 | 18,526,526.00 | 18,582,586.00 | 18,666,634.00 | 18,841,812.00 | 18,865,116.00 | 19,072,620.00 | 19,072,656.00 |
8 | 0.00 | 101,324.27 | 482,201.94 | 1,302,672.88 | 1,762,479.38 | 2,816,009.75 | 3,599,199.00 | 4,404,539.50 | 5,549,806.50 | 6,296,549.50 | ... | 27,771,816.00 | 27,771,816.00 | 27,839,500.00 | 28,154,842.00 | 28,240,034.00 | 28,367,762.00 | 28,633,984.00 | 28,669,398.00 | 28,984,740.00 | 28,984,798.00 |
9 | 0.00 | 21,563.11 | 207,902.11 | 1,019,339.94 | 1,467,485.12 | 2,183,192.50 | 2,792,763.75 | 3,155,922.75 | 4,188,714.25 | 5,294,733.50 | ... | 19,726,482.00 | 19,798,286.00 | 20,005,244.00 | 20,231,846.00 | 20,293,066.00 | 20,384,850.00 | 20,576,154.00 | 20,601,604.00 | 20,828,206.00 | 20,828,248.00 |
10 | 0.00 | 62,438.00 | 642,477.19 | 1,220,444.00 | 1,512,127.75 | 2,125,861.00 | 3,205,997.50 | 5,542,220.00 | 6,233,909.00 | 7,278,580.50 | ... | 23,628,224.00 | 23,911,102.00 | 24,161,056.00 | 24,434,730.00 | 24,508,666.00 | 24,619,518.00 | 24,850,564.00 | 24,881,298.00 | 25,154,974.00 | 25,155,024.00 |
11 | 6,573.56 | 233,421.70 | 858,658.44 | 1,487,226.75 | 2,326,138.75 | 4,001,959.50 | 4,689,719.00 | 5,703,576.50 | 7,421,133.00 | 8,338,611.50 | ... | 25,062,908.00 | 25,362,962.00 | 25,628,092.00 | 25,918,384.00 | 25,996,808.00 | 26,114,392.00 | 26,359,466.00 | 26,392,068.00 | 26,682,360.00 | 26,682,414.00 |
12 | 0.00 | 337,785.12 | 602,649.94 | 947,861.81 | 1,960,324.25 | 2,438,331.25 | 3,263,283.25 | 4,443,729.00 | 4,600,126.00 | 4,950,643.00 | ... | 16,612,003.00 | 16,810,882.00 | 16,986,614.00 | 17,179,024.00 | 17,231,004.00 | 17,308,940.00 | 17,471,378.00 | 17,492,986.00 | 17,685,396.00 | 17,685,432.00 |
13 | 0.00 | 48,432.94 | 257,028.44 | 713,263.12 | 1,489,202.25 | 2,399,702.50 | 3,479,596.50 | 4,570,094.00 | 5,228,066.00 | 8,120,785.50 | ... | 25,494,660.00 | 25,799,882.00 | 26,069,580.00 | 26,364,872.00 | 26,444,650.00 | 26,564,258.00 | 26,813,552.00 | 26,846,716.00 | 27,142,010.00 | 27,142,064.00 |
14 | 18,463.75 | 352,189.09 | 1,015,863.62 | 1,451,495.75 | 2,553,104.50 | 3,266,427.50 | 5,020,334.50 | 6,462,674.50 | 8,515,544.00 | 9,761,135.00 | ... | 18,626,318.00 | 18,849,314.00 | 19,046,354.00 | 19,262,094.00 | 19,320,378.00 | 19,407,764.00 | 19,589,898.00 | 19,614,128.00 | 19,829,868.00 | 19,829,908.00 |
15 | 0.00 | 64,434.53 | 272,821.25 | 689,155.88 | 1,832,749.12 | 3,660,129.75 | 4,680,136.50 | 7,358,847.00 | 9,028,388.00 | 10,283,630.00 | ... | 29,044,080.00 | 29,391,798.00 | 29,699,042.00 | 30,035,448.00 | 30,126,330.00 | 30,262,590.00 | 30,546,594.00 | 30,584,374.00 | 30,920,780.00 | 30,920,842.00 |
16 | 172,118.23 | 276,003.81 | 1,468,529.00 | 2,473,900.50 | 3,320,296.00 | 4,172,269.00 | 6,421,418.50 | 7,663,898.50 | 9,201,556.00 | 9,852,889.00 | ... | 22,027,450.00 | 22,291,162.00 | 22,524,182.00 | 22,779,316.00 | 22,848,242.00 | 22,951,584.00 | 23,166,976.00 | 23,195,630.00 | 23,450,764.00 | 23,450,812.00 |
17 | 0.00 | 115,323.28 | 533,563.88 | 1,628,512.75 | 3,782,490.00 | 5,496,876.00 | 7,221,503.00 | 8,019,986.00 | 8,644,119.00 | 9,252,501.00 | ... | 24,161,342.00 | 24,450,602.00 | 24,706,194.00 | 24,986,044.00 | 25,061,650.00 | 25,175,002.00 | 25,411,260.00 | 25,442,690.00 | 25,722,540.00 | 25,722,592.00 |
18 | 0.00 | 113,693.24 | 534,180.81 | 1,126,199.38 | 2,086,124.25 | 2,636,258.50 | 3,415,784.25 | 4,550,892.50 | 6,384,181.50 | 7,276,585.00 | ... | 22,122,974.00 | 22,387,832.00 | 22,621,862.00 | 22,878,102.00 | 22,947,328.00 | 23,051,118.00 | 23,267,444.00 | 23,296,222.00 | 23,552,462.00 | 23,552,510.00 |
19 | 0.00 | 65,176.80 | 593,353.38 | 1,377,092.75 | 2,679,311.00 | 4,561,606.50 | 5,190,826.50 | 6,896,303.00 | 7,889,414.50 | 10,040,150.00 | ... | 25,454,734.00 | 25,759,478.00 | 26,028,754.00 | 26,323,584.00 | 26,403,236.00 | 26,522,656.00 | 26,771,562.00 | 26,804,674.00 | 27,099,504.00 | 27,099,558.00 |
20 | 3,269.31 | 303,522.59 | 789,422.50 | 1,556,370.62 | 2,431,901.50 | 4,251,400.00 | 6,718,960.50 | 9,005,604.00 | 10,659,464.00 | 12,031,561.00 | ... | 30,440,730.00 | 30,805,166.00 | 31,127,186.00 | 31,479,768.00 | 31,575,020.00 | 31,717,834.00 | 32,015,494.00 | 32,055,090.00 | 32,407,674.00 | 32,407,738.00 |
21 | 0.00 | 99,668.09 | 1,164,366.00 | 1,829,524.50 | 3,262,251.75 | 4,136,418.00 | 5,829,693.00 | 7,624,765.00 | 9,753,254.00 | 11,093,190.00 | ... | 31,515,314.00 | 31,892,618.00 | 32,226,004.00 | 32,591,032.00 | 32,689,648.00 | 32,837,502.00 | 33,145,670.00 | 33,186,666.00 | 33,551,694.00 | 33,551,760.00 |
22 | 0.00 | 73,295.09 | 370,612.00 | 1,025,605.62 | 2,284,124.00 | 3,889,548.50 | 5,347,188.00 | 6,029,389.00 | 7,281,125.00 | 8,422,002.00 | ... | 25,589,258.00 | 25,895,614.00 | 26,166,310.00 | 26,462,700.00 | 26,542,772.00 | 26,662,824.00 | 26,913,044.00 | 26,946,332.00 | 27,242,722.00 | 27,242,776.00 |
23 | 738,959.81 | 944,850.44 | 1,558,499.75 | 2,354,515.50 | 4,080,220.25 | 6,339,416.00 | 7,833,012.00 | 9,289,169.00 | 11,375,192.00 | 12,184,491.00 | ... | 31,142,902.00 | 31,515,746.00 | 31,845,194.00 | 32,205,908.00 | 32,303,358.00 | 32,449,466.00 | 32,753,990.00 | 32,794,502.00 | 33,155,218.00 | 33,155,284.00 |
24 | 32,878.80 | 139,590.50 | 722,637.12 | 2,000,021.75 | 3,699,119.00 | 4,275,938.00 | 5,317,856.00 | 7,233,628.00 | 8,291,626.50 | 8,943,028.00 | ... | 28,343,994.00 | 28,683,330.00 | 28,983,168.00 | 29,311,464.00 | 29,400,156.00 | 29,533,132.00 | 29,810,290.00 | 29,847,160.00 | 30,175,456.00 | 30,175,516.00 |
25 | 0.00 | 67,300.14 | 794,740.44 | 1,203,488.25 | 1,704,533.00 | 2,902,497.50 | 3,623,180.75 | 4,985,555.50 | 6,397,038.50 | 7,434,320.50 | ... | 22,882,390.00 | 23,156,338.00 | 23,398,400.00 | 23,663,438.00 | 23,735,040.00 | 23,842,392.00 | 24,066,144.00 | 24,095,910.00 | 24,360,948.00 | 24,360,996.00 |
26 | 0.00 | 222,579.27 | 724,894.44 | 1,461,998.50 | 2,063,614.62 | 3,727,648.00 | 6,192,158.00 | 7,523,791.00 | 8,246,266.00 | 10,140,250.00 | ... | 24,910,632.00 | 25,208,862.00 | 25,472,382.00 | 25,760,910.00 | 25,838,858.00 | 25,955,728.00 | 26,199,312.00 | 26,231,716.00 | 26,520,246.00 | 26,520,298.00 |
27 | 0.00 | 21,265.95 | 431,409.00 | 1,352,109.50 | 2,414,385.00 | 3,756,162.50 | 4,874,574.00 | 8,397,044.00 | 10,588,448.00 | 11,796,448.00 | ... | 36,532,016.00 | 36,969,380.00 | 37,355,836.00 | 37,778,968.00 | 37,893,284.00 | 38,064,672.00 | 38,421,896.00 | 38,469,416.00 | 38,892,552.00 | 38,892,628.00 |
28 | 0.00 | 281,462.16 | 1,313,574.88 | 2,223,132.00 | 3,606,104.50 | 5,205,782.00 | 8,583,559.00 | 9,386,136.00 | 12,726,494.00 | 14,633,823.00 | ... | 42,857,548.00 | 43,370,640.00 | 43,824,008.00 | 44,320,408.00 | 44,454,516.00 | 44,655,584.00 | 45,074,660.00 | 45,130,408.00 | 45,626,808.00 | 45,626,900.00 |
29 | 109,492.41 | 749,180.50 | 2,196,492.25 | 3,589,296.50 | 5,510,728.00 | 7,582,901.50 | 9,060,910.00 | 12,036,241.00 | 14,089,931.00 | 16,412,483.00 | ... | 41,941,452.00 | 42,443,580.00 | 42,887,260.00 | 43,373,048.00 | 43,504,288.00 | 43,701,056.00 | 44,111,176.00 | 44,165,732.00 | 44,651,524.00 | 44,651,612.00 |
30 | 0.00 | 152,130.56 | 1,171,098.38 | 2,392,989.50 | 4,038,955.50 | 5,796,676.00 | 6,487,593.00 | 7,935,940.00 | 9,448,145.00 | 12,658,112.00 | ... | 35,222,176.00 | 35,643,860.00 | 36,016,460.00 | 36,424,424.00 | 36,534,636.00 | 36,699,884.00 | 37,044,296.00 | 37,090,116.00 | 37,498,076.00 | 37,498,152.00 |
31 | 0.00 | 102,149.25 | 587,694.50 | 1,739,223.50 | 3,054,168.25 | 4,391,554.00 | 6,488,115.50 | 8,532,481.00 | 9,589,075.00 | 13,596,397.00 | ... | 39,249,012.00 | 39,718,904.00 | 40,134,100.00 | 40,588,704.00 | 40,711,520.00 | 40,895,656.00 | 41,279,448.00 | 41,330,504.00 | 41,785,108.00 | 41,785,192.00 |
32 | 0.00 | 137,841.00 | 841,636.31 | 1,810,914.25 | 3,203,533.50 | 4,339,657.00 | 7,193,541.50 | 8,792,350.00 | 11,048,168.00 | 12,931,037.00 | ... | 37,328,300.00 | 37,775,196.00 | 38,170,076.00 | 38,602,432.00 | 38,719,236.00 | 38,894,364.00 | 39,259,372.00 | 39,307,928.00 | 39,740,288.00 | 39,740,368.00 |
33 | 0.00 | 134,954.59 | 1,254,893.38 | 2,674,034.25 | 4,403,876.00 | 5,591,899.50 | 6,810,711.50 | 8,387,086.00 | 10,152,019.00 | 11,882,161.00 | ... | 34,300,500.00 | 34,711,148.00 | 35,074,000.00 | 35,471,288.00 | 35,578,616.00 | 35,739,540.00 | 36,074,940.00 | 36,119,560.00 | 36,516,848.00 | 36,516,920.00 |
34 | 38,893.38 | 652,741.81 | 1,456,564.38 | 2,050,356.00 | 3,044,004.00 | 4,796,483.00 | 7,570,764.00 | 9,698,433.00 | 11,739,319.00 | 13,739,974.00 | ... | 39,663,492.00 | 40,138,348.00 | 40,557,928.00 | 41,017,336.00 | 41,141,448.00 | 41,327,528.00 | 41,715,372.00 | 41,766,968.00 | 42,226,372.00 | 42,226,456.00 |
35 | 0.00 | 167,482.09 | 926,322.88 | 2,872,798.50 | 5,040,902.00 | 6,421,425.50 | 8,743,438.00 | 11,200,672.00 | 13,557,681.00 | 15,868,229.00 | ... | 45,807,164.00 | 46,355,568.00 | 46,840,140.00 | 47,370,704.00 | 47,514,044.00 | 47,728,948.00 | 48,176,864.00 | 48,236,452.00 | 48,767,016.00 | 48,767,112.00 |
36 | 21,154.29 | 320,832.62 | 1,568,433.12 | 2,393,324.00 | 4,121,508.50 | 6,130,129.00 | 8,346,808.00 | 10,692,576.00 | 12,942,664.00 | 15,148,399.00 | ... | 43,729,208.00 | 44,252,736.00 | 44,715,328.00 | 45,221,824.00 | 45,358,660.00 | 45,563,816.00 | 45,991,416.00 | 46,048,296.00 | 46,554,796.00 | 46,554,888.00 |
37 | 83,818.48 | 819,248.56 | 1,881,916.25 | 3,489,761.75 | 5,992,953.00 | 8,913,622.00 | 12,136,824.00 | 15,547,728.00 | 18,819,508.00 | 22,026,796.00 | ... | 63,585,224.00 | 64,346,468.00 | 65,019,108.00 | 65,755,588.00 | 65,954,552.00 | 66,252,864.00 | 66,874,624.00 | 66,957,336.00 | 67,693,816.00 | 67,693,952.00 |
38 | 28,038.49 | 216,292.89 | 1,041,738.50 | 2,063,425.25 | 3,543,510.50 | 5,270,441.50 | 7,176,253.50 | 9,193,051.00 | 11,127,587.00 | 13,023,989.00 | ... | 37,596,636.00 | 38,046,744.00 | 38,444,464.00 | 38,879,928.00 | 38,997,576.00 | 39,173,960.00 | 39,541,592.00 | 39,590,500.00 | 40,025,964.00 | 40,026,044.00 |
39 | 0.00 | 237,799.91 | 910,953.75 | 1,804,373.88 | 3,098,643.00 | 4,608,768.00 | 6,275,317.00 | 8,038,917.00 | 9,730,584.00 | 11,388,904.00 | ... | 32,876,592.00 | 33,270,192.00 | 33,617,980.00 | 33,998,776.00 | 34,101,648.00 | 34,255,892.00 | 34,577,368.00 | 34,620,136.00 | 35,000,932.00 | 35,001,000.00 |
40 | 0.00 | 166.40 | 726.71 | 1,470.36 | 2,547.66 | 3,804.63 | 5,191.81 | 6,659.76 | 8,067.84 | 9,448.17 | ... | 27,333.73 | 27,661.35 | 27,950.83 | 28,267.79 | 28,353.42 | 28,481.80 | 28,749.39 | 28,784.99 | 29,101.95 | 29,102.01 |
40 rows × 40 columns
Check the difference with the chain ladder. The differences seem quite small, except for occurrence period 40, where the log link cannot force the prediction all the way to zero.
= "occurrence_period", columns = "development_period", values = "payment_size_cumulative") -
(triangle_glm_agg.pivot(index = "occurrence_period", columns = "development_period", values = "payment_size_cumulative")) triangle_cl.pivot(index
development_period | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
occurrence_period | |||||||||||||||||||||
1 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | -2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 31.00 |
3 | 0.00 | 0.00 | 0.00 | -0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -2.00 | 31.00 |
4 | 0.00 | 0.00 | -0.02 | -0.06 | -0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 | 0.00 | 38.00 |
5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.06 | 0.12 | 0.00 | 0.00 | -0.50 | 0.00 | ... | -2.00 | -2.00 | 0.00 | -2.00 | -2.00 | 0.00 | -2.00 | -2.00 | -4.00 | 36.00 |
6 | 0.00 | -0.00 | 0.00 | 0.00 | 0.00 | -0.12 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 | -2.00 | -2.00 | -4.00 | 40.00 |
7 | 0.00 | 0.00 | 0.00 | 0.00 | -0.12 | 0.00 | 0.25 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -4.00 | -4.00 | -4.00 | 32.00 |
8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | -2.00 | -2.00 | -4.00 | -4.00 | -6.00 | 52.00 |
9 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -2.00 | 0.00 | -2.00 | -2.00 | -2.00 | -2.00 | -4.00 | -2.00 | -4.00 | 38.00 |
10 | 0.00 | 0.00 | 0.00 | 0.12 | 0.00 | 0.25 | 0.25 | 0.00 | -0.50 | -0.50 | ... | 0.00 | 0.00 | 0.00 | -2.00 | -2.00 | 0.00 | -2.00 | -4.00 | -6.00 | 44.00 |
11 | 0.00 | 0.00 | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | -2.00 | -4.00 | -2.00 | -6.00 | -6.00 | -8.00 | 46.00 |
12 | 0.00 | 0.00 | 0.00 | 0.00 | -0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -2.00 | -4.00 | -4.00 | -4.00 | -6.00 | -4.00 | -6.00 | -6.00 | -8.00 | 28.00 |
13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.25 | 0.00 | 0.00 | 0.00 | ... | -2.00 | -4.00 | -4.00 | -6.00 | -6.00 | -6.00 | -12.00 | -10.00 | -12.00 | 42.00 |
14 | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.00 | 0.50 | 0.50 | 0.00 | 1.00 | ... | -2.00 | -2.00 | -2.00 | -2.00 | -2.00 | 0.00 | -4.00 | -2.00 | -4.00 | 36.00 |
15 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ... | 0.00 | 0.00 | -2.00 | -2.00 | -4.00 | -2.00 | -4.00 | -4.00 | -6.00 | 56.00 |
16 | 0.00 | 0.00 | 0.00 | 0.00 | -0.25 | -0.25 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 2.00 | 0.00 | 0.00 | 0.00 | -2.00 | 0.00 | -2.00 | 0.00 | -2.00 | 46.00 |
17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 2.00 | 0.00 | -2.00 | -2.00 | -2.00 | -2.00 | -6.00 | -4.00 | -6.00 | 46.00 |
18 | 0.00 | 0.00 | 0.00 | 0.00 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | -2.00 | -2.00 | 0.00 | -4.00 | -4.00 | -6.00 | 42.00 |
19 | 0.00 | 0.00 | 0.00 | 0.00 | -0.25 | 0.00 | 0.00 | 0.50 | 0.00 | 0.00 | ... | 2.00 | 0.00 | 0.00 | -2.00 | -2.00 | -2.00 | -4.00 | -2.00 | -6.00 | 48.00 |
20 | 0.00 | 0.00 | 0.00 | 0.12 | 0.00 | 0.00 | 0.00 | -1.00 | 0.00 | 0.00 | ... | 2.00 | 0.00 | 0.00 | 0.00 | -2.00 | 0.00 | -4.00 | -4.00 | -4.00 | 60.00 |
21 | 0.00 | 0.00 | 0.00 | 0.12 | 0.25 | 0.25 | 0.00 | 0.00 | -1.00 | 0.00 | ... | 4.00 | 4.00 | 2.00 | 2.00 | 2.00 | 4.00 | 0.00 | 2.00 | 0.00 | 66.00 |
22 | 0.00 | 0.00 | 0.00 | 0.06 | 0.00 | -0.25 | 0.00 | 0.00 | 0.50 | 0.00 | ... | 8.00 | 8.00 | 6.00 | 6.00 | 4.00 | 6.00 | 2.00 | 4.00 | 2.00 | 56.00 |
23 | 0.00 | 0.00 | -0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 6.00 | 6.00 | 6.00 | 4.00 | 2.00 | 4.00 | -2.00 | 0.00 | -2.00 | 64.00 |
24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.50 | 0.00 | 0.50 | 0.00 | 0.00 | ... | 4.00 | 4.00 | 2.00 | 2.00 | 0.00 | 2.00 | -2.00 | -2.00 | -4.00 | 56.00 |
25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 2.00 | 0.00 | -2.00 | -2.00 | -2.00 | -2.00 | -4.00 | -2.00 | -2.00 | 46.00 |
26 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.50 | 0.50 | 0.00 | ... | 4.00 | 2.00 | 2.00 | 0.00 | -2.00 | 0.00 | -4.00 | -4.00 | -6.00 | 46.00 |
27 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | ... | 8.00 | 8.00 | 8.00 | 4.00 | 4.00 | 4.00 | 0.00 | 0.00 | 0.00 | 76.00 |
28 | 0.00 | 0.00 | 0.00 | -0.25 | -0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 12.00 | 12.00 | 8.00 | 8.00 | 8.00 | 12.00 | 8.00 | 8.00 | 4.00 | 96.00 |
29 | 0.00 | 0.00 | 0.00 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 12.00 | 16.00 | 12.00 | 8.00 | 4.00 | 4.00 | 0.00 | 0.00 | 0.00 | 88.00 |
30 | 0.00 | 0.00 | 0.00 | 0.25 | 0.00 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 20.00 | 20.00 | 20.00 | 20.00 | 16.00 | 20.00 | 12.00 | 16.00 | 12.00 | 88.00 |
31 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 8.00 | 8.00 | 4.00 | 4.00 | 4.00 | 4.00 | 0.00 | 0.00 | -4.00 | 80.00 |
32 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ... | 16.00 | 16.00 | 16.00 | 16.00 | 12.00 | 16.00 | 12.00 | 12.00 | 12.00 | 92.00 |
33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1.00 | -2.00 | ... | 16.00 | 16.00 | 16.00 | 16.00 | 12.00 | 16.00 | 8.00 | 12.00 | 8.00 | 80.00 |
34 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1.00 | -2.00 | ... | 24.00 | 28.00 | 24.00 | 28.00 | 28.00 | 28.00 | 24.00 | 28.00 | 24.00 | 108.00 |
35 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1.00 | -2.00 | -2.00 | ... | 12.00 | 8.00 | 4.00 | 4.00 | 4.00 | 4.00 | -4.00 | 0.00 | -4.00 | 92.00 |
36 | 0.00 | -0.03 | 0.00 | 0.00 | 0.00 | 2.00 | 2.50 | 5.00 | 5.00 | 7.00 | ... | 36.00 | 36.00 | 36.00 | 36.00 | 36.00 | 40.00 | 36.00 | 36.00 | 36.00 | 128.00 |
37 | 0.00 | 0.00 | 0.00 | 0.00 | 3.50 | 6.00 | 9.00 | 12.00 | 14.00 | 18.00 | ... | 72.00 | 72.00 | 68.00 | 68.00 | 64.00 | 68.00 | 60.00 | 60.00 | 56.00 | 192.00 |
38 | 0.00 | 0.00 | -0.06 | 1.12 | 3.00 | 4.50 | 5.50 | 8.00 | 9.00 | 10.00 | ... | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 36.00 | 40.00 | 36.00 | 116.00 |
39 | 0.00 | 0.00 | 2.81 | 7.38 | 13.75 | 20.50 | 28.00 | 36.00 | 44.00 | 51.00 | ... | 156.00 | 158.00 | 160.00 | 160.00 | 156.00 | 160.00 | 156.00 | 160.00 | 160.00 | 228.00 |
40 | 0.00 | 166.40 | 726.71 | 1,470.36 | 2,547.66 | 3,804.63 | 5,191.81 | 6,659.76 | 8,067.84 | 9,448.17 | ... | 27,333.73 | 27,661.35 | 27,950.83 | 28,267.79 | 28,353.42 | 28,481.80 | 28,749.39 | 28,784.99 | 29,101.95 | 29,102.01 |
40 rows × 40 columns
(triangle_glm_agg
    .loc[lambda df: df.occurrence_period != cutoff]  # Exclude occurrence period == 40 since there's no data there at cutoff
    .pivot(index="development_period", columns="occurrence_period", values="payment_size_cumulative")
    .plot(logy=True)
)
plt.title("GLM Chain Ladder Prediction of Training Set, Cumulative Claims by Development Period, Log Scale")
plt.legend(loc="lower center", bbox_to_anchor=(0.5, -0.8), ncol=5)
# MSE / 10^12
# Exclude occurrence period == 40 since there's no data there at cutoff
np.sum((
    triangle_glm_agg.loc[lambda df: df.occurrence_period != cutoff].payment_size_cumulative.values -
    triangle.loc[lambda df: df.occurrence_period != cutoff].payment_size_cumulative.values
)**2) / 10**12
27656.19636207616
GLM, Occurrence/Development Only, Individual Data
What if we just fit the same “chain ladder” model on the individual data?
One limitation of moving to individual data is that we would be predicting ultimates that include claims we would not normally be aware of yet. This is, however, a limitation of most individual claims models. One way of overcoming it would be to start at the policy rather than the claim (following the same process of one record per period); however, our dataset is at the claim level.
GLM_CL_ind = Pipeline(
    steps=[
        ("keep", ColumnKeeper(["occurrence_period", "development_period"])),
        ("one_hot", OneHotEncoder(sparse_output=False)),  # OneHot to get one factor per period
        ("model", TabularNetRegressor(LogLinkGLM, has_bias=True, max_iter=glm_iter, max_lr=0.05))
    ]
)

GLM_CL_ind.fit(
    dat.loc[dat.train_ind == 1],
    dat.loc[dat.train_ind, ["payment_size"]]
)
Train RMSE: 55491.80859375 Train Loss: -76383.375
Train RMSE: 55302.078125 Train Loss: -77458.4453125
Train RMSE: 55260.9375 Train Loss: -77752.9296875
Train RMSE: 55260.1640625 Train Loss: -77769.5703125
Train RMSE: 55260.1015625 Train Loss: -77773.015625
Train RMSE: 55260.0859375 Train Loss: -77774.140625
Train RMSE: 55260.08203125 Train Loss: -77774.5859375
Train RMSE: 55260.078125 Train Loss: -77774.7890625
Train RMSE: 55260.078125 Train Loss: -77774.8828125
Train RMSE: 55260.08203125 Train Loss: -77774.9140625
Pipeline(steps=[('keep', ColumnKeeper(cols=['occurrence_period', 'development_period'])), ('one_hot', OneHotEncoder(sparse_output=False)), ('model', TabularNetRegressor(max_iter=500, max_lr=0.05, module=<class '__main__.LogLinkGLM'>))])
# Function to make a dataset with train payments and test predictions, and resulting triangle
def make_pred_set_and_triangle(individual_model, train, test):
    dat_model_pred = pd.concat(
        [
            train,
            test.assign(payment_size=individual_model.predict(test))
        ],
        axis="rows"
    )
    dat_model_pred["payment_size_cumulative"] = (
        dat_model_pred[["claim_no", "payment_size"]].groupby('claim_no').cumsum()
    )

    triangle_model_ind = (dat_model_pred
        .groupby(["occurrence_period", "development_period", "payment_period"], as_index=False)
        .agg({"payment_size_cumulative": "sum", "payment_size": "sum"})
        .sort_values(by=["occurrence_period", "development_period"])
    )

    (triangle_model_ind
        .loc[lambda df: (df.occurrence_period != cutoff) & (df.development_period < cutoff)]
        .pivot(index="development_period", columns="occurrence_period", values="payment_size_cumulative")
        .plot(logy=True)
    )
    plt.legend(loc="lower center", bbox_to_anchor=(0.5, -0.8), ncol=5)
    plt.title("Cumulative Claims by Development - Predictions Lower Half Triangle")

    return dat_model_pred, triangle_model_ind

dat_glm_pred, triangle_glm_ind = make_pred_set_and_triangle(GLM_CL_ind, dat.loc[dat.train_ind == 1], dat.loc[dat.train_ind == 0])
Looks to be giving a reasonable curve.
Differences with Chain Ladder
Differences in GLM coefficients with aggregated data:
start = cutoff
end = cutoff + num_dev_periods + 1
start, end
(40, 80)
GLM_CL_agg["model"].module_.linear.weight.data[:, start:end].cpu().numpy().ravel()
array([-2.6085851 , -0.9452279 , 0.26885098, 0.55193317, 0.92257833,
1.0768248 , 1.175387 , 1.2319895 , 1.1903464 , 1.1704375 ,
1.1738193 , 0.88819265, 1.0728831 , 1.0463339 , 0.9118993 ,
0.7312811 , 0.9362656 , 0.8054807 , 1.1134231 , 0.89804524,
0.63513625, 0.19344859, 0.19643542, 0.3392283 , 0.33730403,
0.45425934, 0.2924858 , 0.5794589 , -0.3311628 , 0.55835927,
0.17237537, -0.26778826, -0.39153227, -0.30086178, -1.6096059 ,
-1.2046075 , -0.47019288, -2.487373 , -0.30085984, -8.907836 ],
dtype=float32)
"model"].module_.linear.weight.data[: ,start:end].cpu().numpy().ravel() GLM_CL_ind[
array([-0.95821226, -0.6363852 , 0.12603194, 0.20968342, 0.48075658,
0.5889297 , 0.6586112 , 0.7034973 , 0.6573421 , 0.6352269 ,
0.6381907 , 0.35272628, 0.5351275 , 0.5085579 , 0.37453473,
0.1936421 , 0.3985621 , 0.268308 , 0.57655615, 0.361657 ,
0.0992958 , -0.34339836, -0.34055626, -0.19771653, -0.19970702,
-0.08316672, -0.24393895, 0.04285933, -0.86934584, 0.0212998 ,
-0.36557752, -0.806808 , -0.9307192 , -0.83957726, -2.048075 ,
-1.721573 , -1.0065743 , -2.4330187 , -0.8382782 , -2.7173047 ],
dtype=float32)
pd.DataFrame.from_dict({"aggregate": GLM_CL_agg["model"].module_.linear.weight.data[: ,start:end].cpu().numpy().ravel(),
"individual": GLM_CL_ind["model"].module_.linear.weight.data[: ,start:end].cpu().numpy().ravel(),
"log_incr_perc_ultimate_shifted": np.log(df_cdf.incr_perc_of_ultimate * num_dev_periods)
}).plot()
/Users/jacky/GitRepos/NNs-on-Individual-Claims/.conda/lib/python3.10/site-packages/pandas/core/arraylike.py:402: RuntimeWarning: divide by zero encountered in log
result = getattr(ufunc, method)(*inputs, **kwargs)
Difference in predictions in direct calculation:
= "occurrence_period", columns = "development_period", values = "payment_size_cumulative") -
(triangle_glm_ind.pivot(index = "occurrence_period", columns = "development_period", values = "payment_size_cumulative")) triangle_cl.pivot(index
development_period | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
occurrence_period | |||||||||||||||||||||
1 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 26,118.00 |
3 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -541.00 | 26,439.00 |
4 | 0.00 | -0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 17,844.00 | 16,446.00 | 46,772.00 |
5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 2.00 | 0.00 | 0.00 | 0.00 | 1,940.00 | 21,744.00 | 23,818.00 | 56,794.00 |
6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1,718.00 | 1,304.00 | 23,032.00 | 22,294.00 | 59,042.00 |
7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.25 | -0.50 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 5,974.00 | 7,912.00 | 8,500.00 | 27,408.00 | 27,892.00 | 59,658.00 |
8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | -180.00 | 8,748.00 | 11,484.00 | 11,950.00 | 40,584.00 | 40,812.00 | 89,012.00 |
9 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 32.00 | 176.00 | 6,672.00 | 8,750.00 | 9,316.00 | 29,948.00 | 30,384.00 | 65,062.00 |
10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.00 | 0.00 | 0.00 | ... | 0.00 | -698.00 | -1,358.00 | -1,952.00 | 5,668.00 | 7,862.00 | 7,896.00 | 32,658.00 | 32,418.00 | 74,182.00 |
11 | 0.00 | 0.00 | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 0.00 | ... | -2,130.00 | -3,820.00 | -5,360.00 | -6,908.00 | 898.00 | 2,844.00 | 2,102.00 | 28,182.00 | 27,008.00 | 71,168.00 |
12 | 0.00 | 0.00 | 0.00 | 0.00 | -0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -5,279.00 | -6,972.00 | -8,496.00 | -10,074.00 | -5,064.00 | -4,002.00 | -4,960.00 | 12,214.00 | 10,882.00 | 40,066.00 |
13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -1,548.00 | -2,314.00 | -3,036.00 | -3,688.00 | 4,528.00 | 6,890.00 | 6,916.00 | 33,634.00 | 33,364.00 | 78,426.00 |
14 | 0.00 | 0.00 | 0.00 | -0.12 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -12,584.00 | -14,744.00 | -16,686.00 | -18,710.00 | -13,170.00 | -12,086.00 | -13,376.00 | 5,828.00 | 4,078.00 | 36,764.00 |
15 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 35,176.00 | 38,468.00 | 41,322.00 | 44,608.00 | 55,172.00 | 59,532.00 | 62,974.00 | 94,228.00 | 97,954.00 | 149,906.00 |
16 | 0.00 | 0.00 | 0.00 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -23,084.00 | -25,688.00 | -28,028.00 | -30,468.00 | -23,930.00 | -22,666.00 | -24,230.00 | -1,528.00 | -3,644.00 | 35,004.00 |
17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -5,364.00 | -6,332.00 | -7,230.00 | -8,078.00 | -362.00 | 1,780.00 | 1,608.00 | 26,880.00 | 26,392.00 | 69,062.00 |
18 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -10,792.00 | -12,038.00 | -13,176.00 | -14,304.00 | -7,342.00 | -5,524.00 | -5,974.00 | 17,092.00 | 16,296.00 | 55,314.00 |
19 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -14,284.00 | -15,752.00 | -17,094.00 | -18,426.00 | -10,426.00 | -8,348.00 | -8,896.00 | 17,642.00 | 16,690.00 | 61,578.00 |
20 | 0.00 | 0.00 | 0.00 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -55,784.00 | -59,628.00 | -63,078.00 | -66,690.00 | -57,724.00 | -56,078.00 | -58,444.00 | -27,120.00 | -30,280.00 | 23,090.00 |
21 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 69,346.00 | 71,976.00 | 74,242.00 | 76,900.00 | 88,090.00 | 92,444.00 | 95,408.00 | 129,136.00 | 132,270.00 | 188,502.00 |
22 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.50 | 0.00 | 0.00 | 0.00 | ... | 55,238.00 | 56,924.00 | 58,370.00 | 60,092.00 | 69,048.00 | 72,404.00 | 74,442.00 | 101,742.00 | 103,848.00 | 149,442.00 |
23 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 54,880.00 | 56,070.00 | 57,066.00 | 58,326.00 | 68,976.00 | 72,712.00 | 74,488.00 | 107,540.00 | 109,268.00 | 164,628.00 |
24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 1.00 | ... | 116,230.00 | 119,328.00 | 122,014.00 | 125,112.00 | 135,388.00 | 139,596.00 | 142,860.00 | 173,338.00 | 176,866.00 | 227,548.00 |
25 | 0.00 | 0.00 | 0.00 | -0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -24,200.00 | -25,480.00 | -26,652.00 | -27,812.00 | -20,608.00 | -18,724.00 | -19,184.00 | 4,678.00 | 3,862.00 | 44,220.00 |
26 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.50 | 0.00 | 0.00 | -1.00 | ... | -46,666.00 | -48,544.00 | -50,248.00 | -51,978.00 | -44,276.00 | -42,422.00 | -43,318.00 | -17,438.00 | -18,798.00 | 25,066.00 |
27 | 0.00 | 0.00 | 0.00 | -0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | ... | 110,416.00 | 112,332.00 | 113,960.00 | 115,940.00 | 128,584.00 | 133,176.00 | 135,684.00 | 174,560.00 | 177,092.00 | 242,108.00 |
28 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -2,208.00 | -3,240.00 | -4,228.00 | -5,076.00 | 8,812.00 | 12,884.00 | 13,140.00 | 58,104.00 | 57,896.00 | 133,684.00 |
29 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -747,996.00 | -764,748.00 | -779,632.00 | -795,700.00 | -786,664.00 | -788,988.00 | -801,632.00 | -760,728.00 | -776,188.00 | -704,352.00 |
30 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -22,220.00 | -23,488.00 | -24,672.00 | -25,780.00 | -14,488.00 | -11,312.00 | -11,448.00 | 25,420.00 | 24,844.00 | 87,068.00 |
31 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 375,908.00 | 381,800.00 | 386,940.00 | 392,784.00 | 407,476.00 | 413,948.00 | 419,784.00 | 462,300.00 | 468,740.00 | 539,160.00 |
32 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | -10,088.00 | ... | -182,780.00 | -186,864.00 | -190,536.00 | -194,356.00 | -183,184.00 | -180,912.00 | -183,300.00 | -144,764.00 | -148,032.00 | -82,492.00 |
33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 27,233.00 | 50,059.00 | ... | 529,056.00 | 537,300.00 | 544,524.00 | 552,624.00 | 566,356.00 | 573,252.00 | 580,884.00 | 618,652.00 | 627,272.00 | 689,272.00 |
34 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -2,418.00 | -13,918.00 | -29,573.00 | ... | -276,912.00 | -282,420.00 | -287,356.00 | -292,544.00 | -281,008.00 | -279,064.00 | -282,552.00 | -241,836.00 | -246,440.00 | -176,972.00 |
35 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -70,556.00 | -146,515.00 | -229,660.00 | -291,366.00 | ... | -1,141,304.00 | -1,157,984.00 | -1,172,796.00 | -1,188,772.00 | -1,178,432.00 | -1,180,320.00 | -1,192,800.00 | -1,147,804.00 | -1,163,116.00 | -1,084,416.00 |
36 | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 85,090.00 | 229,827.50 | 384,379.00 | 638,293.00 | 881,795.00 | ... | 3,982,064.00 | 4,037,648.00 | 4,086,680.00 | 4,140,624.00 | 4,171,164.00 | 4,198,008.00 | 4,244,652.00 | 4,301,668.00 | 4,356,340.00 | 4,442,052.00 |
37 | 0.00 | 0.00 | 0.00 | 0.00 | -74,768.50 | -122,564.00 | -158,118.00 | -235,035.00 | -287,442.00 | -310,780.00 | ... | -684,548.00 | -692,960.00 | -700,508.00 | -708,428.00 | -689,816.00 | -686,528.00 | -691,788.00 | -626,436.00 | -633,412.00 | -521,992.00 |
38 | 0.00 | 0.00 | 0.00 | 66,269.62 | 262,007.75 | 452,814.00 | 603,134.50 | 736,867.00 | 882,875.00 | 1,021,512.00 | ... | 2,772,288.00 | 2,803,356.00 | 2,830,736.00 | 2,860,940.00 | 2,882,360.00 | 2,898,740.00 | 2,925,152.00 | 2,970,880.00 | 3,001,680.00 | 3,072,900.00 |
39 | 0.00 | 0.00 | 105,522.88 | 372,281.00 | 825,593.25 | 1,354,328.50 | 1,996,585.50 | 2,647,787.00 | 3,261,998.00 | 3,859,120.00 | ... | 11,545,788.00 | 11,685,482.00 | 11,808,832.00 | 11,944,132.00 | 11,995,388.00 | 12,054,788.00 | 12,170,032.00 | 12,232,152.00 | 12,368,124.00 | 12,447,052.00 |
40 | 0.00 | 11,206.78 | 56,149.67 | 110,068.73 | 188,510.28 | 280,837.16 | 385,106.44 | 496,923.50 | 605,015.06 | 710,742.38 | ... | 2,078,335.00 | 2,103,333.75 | 2,125,419.25 | 2,149,612.25 | 2,156,837.25 | 2,166,852.00 | 2,187,324.25 | 2,192,240.75 | 2,216,465.00 | 2,220,165.25 |
40 rows × 40 columns
# MSE / 10^12
# Exclude occurrence period == 40 since there's no data there at cutoff
np.sum((
    triangle_glm_ind.loc[lambda df: df.occurrence_period != cutoff].payment_size_cumulative.values -
    triangle.loc[lambda df: df.occurrence_period != cutoff].payment_size_cumulative.values
)**2) / 10**12
24148.42140950528
GLM with Splines, Occur/Dev, Individual Data
In Grainne McGuire’s article she uses splines in the GLM to model a parabolic trend in occurrence period. In this example data, there is also a curve over development periods. Can we do something similar?
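As a quick illustration of what the SplineTransformer step below does (not part of the notebook): it expands each numeric feature into a set of B-spline basis columns, so the GLM fits a smooth curve rather than one coefficient per period. With scikit-learn's defaults (degree 3, 5 knots), each feature becomes 7 columns.
import numpy as np
from sklearn.preprocessing import SplineTransformer

# Illustration only: expand development periods 0..39 into B-spline basis functions
dev_periods = np.arange(0, 40).reshape(-1, 1)
basis = SplineTransformer().fit_transform(dev_periods)
print(basis.shape)  # (40, 7) with the default degree=3, n_knots=5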
GLM_CL_spline = Pipeline(
    steps=[
        ("keep", ColumnKeeper(["occurrence_period", "development_period"])),
        ("one_hot", SplineTransformer()),  # Spline transform
        ("model", TabularNetRegressor(LogLinkGLM, has_bias=True, max_iter=glm_iter, max_lr=0.05))
    ]
)

GLM_CL_spline.fit(
    dat.loc[dat.train_ind == 1],
    dat.loc[dat.train_ind, ["payment_size"]]
)
Train RMSE: 55492.2421875 Train Loss: -76383.375
Train RMSE: 55338.0625 Train Loss: -77265.875
Train RMSE: 55294.43359375 Train Loss: -77546.9765625
Train RMSE: 55289.453125 Train Loss: -77583.4375
Train RMSE: 55286.875 Train Loss: -77601.484375
Train RMSE: 55285.2890625 Train Loss: -77612.5078125
Train RMSE: 55284.37109375 Train Loss: -77618.8984375
Train RMSE: 55283.875 Train Loss: -77622.3984375
Train RMSE: 55283.62890625 Train Loss: -77624.1328125
Train RMSE: 55283.5390625 Train Loss: -77624.78125
Pipeline(steps=[('keep', ColumnKeeper(cols=['occurrence_period', 'development_period'])), ('one_hot', SplineTransformer()), ('model', TabularNetRegressor(max_iter=500, max_lr=0.05, module=<class '__main__.LogLinkGLM'>))])
dat_spline_pred, triangle_spline_ind = make_pred_set_and_triangle(GLM_CL_spline, dat.loc[dat.train_ind == 1], dat.loc[dat.train_ind == 0])
# Diagnostic model subplots
# We will be repeating this logic across multiple models, so put logic into function for repeatability
def make_model_subplots(model, dat):
    fig, axes = plt.subplots(3, 2, sharex='all', sharey='all', figsize=(15, 15))

    (dat
        .assign(payment_size_pred=model.predict(dat))
        .loc[lambda df: df.train_ind]
        .groupby(["occurrence_period"])
        .agg({"payment_size": "mean", "payment_size_pred": "mean"})
    ).plot(ax=axes[0,0], logy=True)
    axes[0,0].title.set_text("Train, Occur")

    (dat
        .assign(payment_size_pred=model.predict(dat))
        .loc[lambda df: df.train_ind]
        .groupby(["development_period"])
        .agg({"payment_size": "mean", "payment_size_pred": "mean"})
    ).plot(ax=axes[0,1], logy=True)
    axes[0,1].title.set_text("Train, Dev")

    (dat
        .assign(payment_size_pred=model.predict(dat))
        .loc[lambda df: ~df.train_ind]
        .groupby(["occurrence_period"])
        .agg({"payment_size": "mean", "payment_size_pred": "mean"})
    ).plot(ax=axes[1,0], logy=True)
    axes[1,0].title.set_text("Test, Occ")

    (dat
        .assign(payment_size_pred=model.predict(dat))
        .loc[lambda df: ~df.train_ind]
        .groupby(["development_period"])
        .agg({"payment_size": "mean", "payment_size_pred": "mean"})
    ).plot(ax=axes[1,1], logy=True)
    axes[1,1].title.set_text("Test, Dev")

    (dat
        .assign(payment_size_pred=model.predict(dat))
        .groupby(["occurrence_period"])
        .agg({"payment_size": "mean", "payment_size_pred": "mean"})
    ).plot(ax=axes[2,0], logy=True)
    axes[2,0].title.set_text("All, Occ")

    (dat
        .assign(payment_size_pred=model.predict(dat))
        .groupby(["development_period"])
        .agg({"payment_size": "mean", "payment_size_pred": "mean"})
    ).plot(ax=axes[2,1], logy=True)
    axes[2,1].title.set_text("All, Dev")
make_model_subplots(GLM_CL_spline, dat)
This should look reasonable.
Neural Network, Occurrence/Development Only
Can we have deep learning just figure out this curve by using the periods as numeric inputs, rather than the one-hot encoding?
Here we have a ResNet pretty similar to the ones previously used in the other notebooks. There is more functionality built in than a base ResNet - some observations on the architecture:
- This is the concatenation variant rather than the addition variant of the ResNet, but for a single hidden layer it does not make much difference.
- Batch normalisation: we use a variant of FixUp initialisation, so batch normalisation is not needed for numerical stability. Our experience is that including it leads to a higher degree of overfitting.
- L1 and L2 regularisation are built into the scikit-learn wrapper. L1 enforces sparsity but also leads to some underfitting of effects.
- A dropout layer is available for optimisation.
- A One Cycle learning rate schedule with Adam is used as the optimiser in the scikit-learn wrapper above. It seems to work well in getting a usable model quickly (a minimal sketch of this schedule follows after this list).
- ELU is used as the activation. Sigmoid does not appear to work well, and ReLU occasionally has dead neurons, which impacts model results given the low neuron count.
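The sketch below illustrates the One Cycle + Adam combination referred to above. It is not the TabularNetRegressor internals (those live in the wrapper defined earlier); the dummy model and data are placeholders.
import torch
import torch.nn.functional as F

# Dummy stand-ins: two scaled inputs (occurrence/development period) and a payment target
X = torch.rand(100, 2)
y = torch.rand(100, 1) * 1000
model = torch.nn.Linear(2, 1)   # a single linear layer, with a log link applied below
n_iter = 100

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.05, total_steps=n_iter)

for step in range(n_iter):
    optimizer.zero_grad()
    y_pred = torch.exp(model(X))                           # log link, as in LogLinkGLM
    loss = F.poisson_nll_loss(y_pred, y, log_input=False)  # Poisson loss
    loss.backward()
    optimizer.step()
    scheduler.step()                                       # learning rate follows the one-cycle shape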
class LogLinkResNet(nn.Module):
    # Define the parameters in __init__
    def __init__(
        self,
        n_input,      # number of inputs
        n_output,     # number of outputs
        init_bias,    # init mean value to speed up convergence
        n_hidden,     # hidden layer size
        batch_norm,   # whether to do batch norm (boolean)
        dropout,      # dropout percentage
        **kwargs,
    ):
        super(LogLinkResNet, self).__init__()
        self.hidden = nn.Linear(n_input, n_hidden)  # Hidden layer
        # Batchnorm layer
        self.batch_norm = batch_norm
        if batch_norm:
            self.batchn = nn.BatchNorm1d(n_hidden)
        self.dropout = nn.Dropout(dropout)
        self.linear = torch.nn.Linear(n_input, n_output)  # Linear coefficients
        # Neural net coefficients - no bias - glm has a bias already
        self.neural = torch.nn.Linear(n_hidden, n_output, bias=False)
        nn.init.zeros_(self.linear.weight)   # Initialise to zero
        nn.init.zeros_(self.neural.weight)   # n.b. do not initialise hidden layers to zero
        self.linear.bias.data = torch.tensor(init_bias)

    # The forward function defines how you get y from X.
    def forward(self, x):
        h = F.elu(self.hidden(x))  # Apply hidden layer
        if self.batch_norm:
            h = self.batchn(h)     # Apply batchnorm
        h = self.dropout(h)
        return torch.exp(self.linear(x) + self.neural(h))  # Add GLM to NN
Development Period Only Model
So here we min-max scale the data rather than one-hot encode it, and leave the ResNet to figure out the effect.
First, we test a model with only one feature - development_period.
model_NN = Pipeline(
    steps=[
        ("keep", ColumnKeeper(["development_period"])),  # Just development period!
        ("zero_to_one", MinMaxScaler()),  # Important! Standardize deep learning inputs.
        ("model", TabularNetRegressor(LogLinkResNet, n_hidden=20, max_iter=nn_iter, max_lr=0.05))
    ]
)

model_NN.fit(
    dat.loc[dat.train_ind == 1],
    dat.loc[dat.train_ind == 1, ["payment_size"]]
)
Train RMSE: 55491.70703125 Train Loss: -76383.375
Train RMSE: 55438.2109375 Train Loss: -76925.859375
Train RMSE: 55411.64453125 Train Loss: -77013.7578125
Train RMSE: 55366.9921875 Train Loss: -77228.1640625
Train RMSE: 55359.65234375 Train Loss: -77283.15625
Train RMSE: 55358.203125 Train Loss: -77291.875
Train RMSE: 55357.453125 Train Loss: -77296.515625
Train RMSE: 55356.91015625 Train Loss: -77299.859375
Train RMSE: 55356.59375 Train Loss: -77301.8515625
Train RMSE: 55356.45703125 Train Loss: -77302.671875
Pipeline(steps=[('keep', ColumnKeeper(cols=['development_period'])), ('zero_to_one', MinMaxScaler()), ('model', TabularNetRegressor(max_iter=500, max_lr=0.05, module=<class '__main__.LogLinkResNet'>))])
make_model_subplots(model_NN, dat)
So far, this approach does fine. It seems to reflect the development pattern for the training and holdout datasets. But to replicate chain ladder, we need occurrence too.
Development and Occurrence Model
Now we use both development and occurrence.
model_NN = Pipeline(
    steps=[
        ("keep", ColumnKeeper(["occurrence_period", "development_period"])),  # both
        ("zero_to_one", MinMaxScaler()),  # Important! Standardize deep learning inputs.
        ("model", TabularNetRegressor(LogLinkResNet, n_hidden=20, max_iter=nn_iter, max_lr=0.05))
    ]
)

model_NN.fit(
    dat.loc[dat.train_ind == 1],
    dat.loc[dat.train_ind == 1, ["payment_size"]]
)
Train RMSE: 55491.5390625 Train Loss: -76383.375
Train RMSE: 55372.53125 Train Loss: -77204.9765625
Train RMSE: 55350.5703125 Train Loss: -77285.359375
Train RMSE: 55331.859375 Train Loss: -77372.9140625
Train RMSE: 55316.28515625 Train Loss: -77478.140625
Train RMSE: 55297.3515625 Train Loss: -77553.765625
Train RMSE: 55292.02734375 Train Loss: -77582.234375
Train RMSE: 55289.3984375 Train Loss: -77595.484375
Train RMSE: 55288.16796875 Train Loss: -77602.0078125
Train RMSE: 55287.69140625 Train Loss: -77604.6171875
Pipeline(steps=[('keep', ColumnKeeper(cols=['occurrence_period', 'development_period'])), ('zero_to_one', MinMaxScaler()), ('model', TabularNetRegressor(max_iter=500, max_lr=0.05, module=<class '__main__.LogLinkResNet'>))])
dat_nn_pred, triangle_resnet_do = make_pred_set_and_triangle(model_NN, dat.loc[dat.train_ind == 1], dat.loc[dat.train_ind == 0])
make_model_subplots(model_NN, dat)
The model fits the training data well, but it sometimes appears to overfit, particularly on the occurrence period trend for dataset 1, which leads to a deterioration in performance on the test data. Performance appears to be better when batch normalisation is not used.
“SplineNet” Architecture, Occurrence/Development Only
An observation on the above model, compared with the spline GLM, is that all of the hidden layer components are naturally interactions, rather than the one-way splines of the GLM. If only we could fit the model as a linear combination of one-way effects.
We re-introduce ideas from our earlier paper [1]. This model architecture includes:
- a log-link, through the exponential transformation at the end,
- one-way neural network splines, linearly combined, to ensure single-variable effects are well-captured, similar to the spline-GLM earlier,
- a neural interaction component, to which L1, weight decay and dropout penalties are applied to put a cost on multi-way effects and discourage overfitting,
- an interaction parameter, which can be set between 0.0 and 1.0, to govern the impact of this interaction component. This is a trainable parameter with a sigmoid gate, based on Residual Gates and Highway Networks [3], so that the model can identify to what extent interactions are required (see the formula after this list).
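In formula terms, matching the forward method of the class below, the gated combination is
\[
\hat{y}(x) \;=\; g^{-1}\Big( (1-\gamma)\, f_{\text{splines}}(x) + \gamma\, f_{\text{interactions}}(x) \Big), \qquad \gamma = \operatorname{sigmoid}(w_{\text{interactions}}),
\]
where \(g^{-1}\) is the inverse of the link function (here \(\exp\)), \(f_{\text{splines}}\) is the linear combination of one-way splines, and \(f_{\text{interactions}}\) is the regularised hidden-layer component.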
class SplineNet(nn.Module):
    # Define the parameters in __init__
    def __init__(
        self,
        n_input,       # number of inputs
        n_output,      # number of outputs
        init_bias,     # init mean value to speed up convergence
        n_hidden,      # hidden layer size
        dropout,       # dropout percentage
        interactions=0.0,             # 0.0 - 1.0 initial weight to apply interactions
        interactions_trainable=True,  # Whether the model can adapt the interactions weight over time.
        inverse_of_link_fn=torch.exp,
        **kwargs
    ):
        # Initialise
        super(SplineNet, self).__init__()
        self.interactions = nn.Parameter(
            torch.logit(torch.tensor(interactions), eps=1e-5),
            requires_grad=interactions_trainable
        )
        self.n_input = n_input
        self.inverse_of_link_fn = inverse_of_link_fn

        # One-way layers
        self.oneways = nn.ModuleList([nn.Linear(1, n_hidden) for i in range(0, n_input)])
        self.oneway_linear = nn.Linear(n_hidden * n_input, n_output)
        self.oneway_linear.bias.data = torch.tensor(init_bias)  # Initialise bias to speed up convergence
        nn.init.zeros_(self.oneway_linear.weight)  # n.b. do not initialise hidden layers to zero

        # Optional Interactions Layer
        self.hidden = nn.Linear(n_input, n_hidden)               # Hidden layer - interactions
        self.linear = nn.Linear(n_hidden, n_output, bias=False)  # Neural net coefficients
        nn.init.zeros_(self.linear.weight)  # n.b. do not initialise hidden layers to zero

        # Dropout
        self.dropout = nn.Dropout(dropout)

    # The forward function defines how you get y from X.
    def forward(self, x):
        # Apply one-ways
        chunks = torch.split(x, [1 for i in range(0, self.n_input)], dim=1)
        splines = torch.cat([self.oneways[i](chunks[i]) for i in range(0, self.n_input)], dim=1)

        # Sigmoid gate
        interact_gate = torch.sigmoid(self.interactions)

        splines_out = self.oneway_linear(F.elu(splines)) * (1 - interact_gate)
        interact_out = self.linear(F.elu(self.hidden(self.dropout(x)))) * (interact_gate)

        # Add ResNet style
        return self.inverse_of_link_fn(splines_out + interact_out)
model_spline_net = Pipeline(
    steps=[
        ("keep", ColumnKeeper(["occurrence_period", "development_period"])),  # both
        ("zero_to_one", MinMaxScaler()),  # Important! Standardize deep learning inputs.
        ("model", TabularNetRegressor(SplineNet, n_hidden=20, max_iter=nn_iter, max_lr=0.05))
    ]
)

model_spline_net.fit(
    dat.loc[dat.train_ind == 1],
    dat.loc[dat.train_ind == 1, ["payment_size"]]
)
Train RMSE: 55489.55078125 Train Loss: -76383.359375
Train RMSE: 55360.18359375 Train Loss: -77245.1328125
Train RMSE: 55355.15234375 Train Loss: -77276.9140625
Train RMSE: 55335.62109375 Train Loss: -77348.296875
Train RMSE: 55298.7421875 Train Loss: -77537.8125
Train RMSE: 55291.6015625 Train Loss: -77573.96875
Train RMSE: 55289.71875 Train Loss: -77584.609375
Train RMSE: 55281.45703125 Train Loss: -77640.9296875
Train RMSE: 55279.375 Train Loss: -77654.7109375
Train RMSE: 55278.79296875 Train Loss: -77659.125
Pipeline(steps=[('keep', ColumnKeeper(cols=['occurrence_period', 'development_period'])), ('zero_to_one', MinMaxScaler()), ('model', TabularNetRegressor(max_iter=500, max_lr=0.05, module=<class '__main__.SplineNet'>))])
dat_nn_pred, triangle_nn_spline = make_pred_set_and_triangle(model_spline_net, dat.loc[dat.train_ind == 1], dat.loc[dat.train_ind == 0])
make_model_subplots(model_spline_net, dat)
Mixture Density Network
In Probabilistic Forecasting with Neural Networks Applied to Loss Reserving [2], Muhammed Al-Mudafer introduced the idea of a “ResMDN” model which produces results similar to a more advanced stochastic chain ladder that has a deep learning component. The ResMDN extends the one-hot GLM into a ResNet, then predicts a full probability distribution using Mixture Density Networks.
We note that our “SplineNet” architecture appears to compare well with the one-hot GLM, since it fits curves to the features rather than one-hot encoding each occurrence and development period. So we will build the MDN output and loss on the “SplineNet” model instead of the GLM.
Distribution Assumption
The Mixture Density Network models the output as a mixture of normal distributions. However, it quickly becomes apparent that the normal distribution is not a good fit for individual claims data. The SPLICE claim payments, like typical insurance claims data, are heavily skewed:
== 1) & (dat.payment_size > 0), ["payment_size"]])) plt.hist((dat.loc[(dat.train_ind
(array([1.2441e+04, 2.5600e+02, 4.8000e+01, 2.5000e+01, 7.0000e+00,
4.0000e+00, 3.0000e+00, 0.0000e+00, 1.0000e+00, 1.0000e+00]),
array([5.43023949e+01, 3.13449125e+05, 6.26843938e+05, 9.40238750e+05,
1.25363362e+06, 1.56702838e+06, 1.88042325e+06, 2.19381800e+06,
2.50721275e+06, 2.82060775e+06, 3.13400250e+06]),
<BarContainer object of 10 artists>)
Given the skew, plot the log-transformed histogram:
== 1) & (dat.payment_size > 0), ["payment_size"]])) plt.hist(np.log(dat.loc[(dat.train_ind
(array([ 13., 27., 58., 1162., 4850., 4191., 902., 1315., 233.,
35.]),
array([ 3.99456835, 5.09089375, 6.18721914, 7.28354454, 8.37986946,
9.47619534, 10.57252026, 11.66884613, 12.76517105, 13.86149693,
14.95782185]),
<BarContainer object of 10 artists>)
It is clear from the plot that the log-normal distribution should be more appropriate than the normal for the distribution of payments in individual claims data.
Consequently, we propose an adjusted log-normal MDN approach, which also takes into account the observations with zero payments:
1. Calculate point estimates as \(\exp(\mu + \sigma^2/2)\), being the mean of a log-normal (see the identity below),
2. Apply the loss to \(\log(y + \epsilon)\) rather than \(y\) (with \(\epsilon\) being some small value),
3. Use \(\sqrt{\mathrm{ReLU}(\cdot) + \epsilon}\) as the activation for \(\sigma\) rather than \(\exp\).
This design, which appears to be novel, addresses the numerical stability challenges arising from observations with zero payment:
- The mean of the zero records is zero, so with a log link we would typically have \(\log(0) \rightarrow -\infty\). Adding the \(\epsilon\) value avoids this.
- The \(\sigma\) of the zero records is also zero. MDNs typically use an exponential or ELU activation for \(\sigma\), so again \(\log(0) \rightarrow -\infty\). Using ReLU allows \(\sigma\) to be extremely small without numerical instability.
- The square root is applied when forming \(\sigma\) to avoid any unintended quadratic effects in \(\exp(\mu + \sigma^2/2)\).
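For reference, the point estimate in step 1 uses the standard log-normal mean, extended to the mixture:
\[
\log Y \sim \mathcal{N}(\mu, \sigma^2) \;\Rightarrow\; \mathbb{E}[Y] = \exp\!\left(\mu + \tfrac{\sigma^2}{2}\right),
\qquad
\mathbb{E}\big[Y_{\text{mixture}}\big] = \sum_{k} \alpha_k \exp\!\left(\mu_k + \tfrac{\sigma_k^2}{2}\right).
\]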
Loss function
Our adjusted Mixture Density Network loss function is as follows. It uses the torch.logsumexp function for numerical stability.
SMALL = 1e-7

def log_mdn_loss_fn(y_dists, y):
    y = torch.log(y + SMALL)  # log(y) ~ Normal
    alpha, mu, sigma = y_dists
    m = torch.distributions.Normal(loc=mu, scale=sigma)  # Normal
    loss = -torch.logsumexp(m.log_prob(y) + torch.log(alpha + 1e-15), dim=-1)
    return torch.mean(loss)  # Average over dataset
Meanwhile, a similar result to the point estimate model can be obtained by using this loss function instead, which calculates the Poisson NLL loss:
def point_poisson_nll_loss(y_dists, y):
    alpha, mu, sigma = y_dists
    yhat = torch.sum(alpha * torch.exp(mu + 0.5 * sigma * sigma), dim=1).unsqueeze(1)
    assert yhat.size() == y.size()
    return F.poisson_nll_loss(yhat, y, log_input=False, full=False, eps=1e-8, reduction='mean')
It is possible to train for both objectives by summing the two losses, as below. This seems to give a more reliably stable result for the point estimates than relying solely on the MDN loss; however, we found the distributional predictions became less accurate, so we abandoned this approach.
two_loss = lambda y_dists, y: log_mdn_loss_fn(y_dists, y) + point_poisson_nll_loss(y_dists, y)
Architecture
We take the “SplineNet” architecture presented earlier and use one instance to provide the point estimates, exactly as before, then apply a second SplineNet module to predict the distributional parameters. The \(\alpha\), \(\mu\) and \(\sigma\) outputs are calculated such that:
- \(\mu\) is calculated with scaling factors to ensure the mixture mean, \(\sum_{k=1}^{n} \alpha_k \exp(\mu_k + 0.5\sigma_k^2)\), is always equal to the point estimate output,
- \(\sigma\) is calculated using the ReLU and square root transformation described above.
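To make the rescaling explicit (writing \(\eta\) for the point-estimate output, \(m_k\) for the raw location outputs and \(c\) for the log scaling factor, i.e. predictor, median0 and log_scaling_factor in the code below), the forward pass sets
\[
c = \log\sum_k \alpha_k e^{m_k} - \eta, \qquad \mu_k = m_k - c - \tfrac{1}{2}\sigma_k^2,
\]
so that
\[
\sum_k \alpha_k \exp\!\big(\mu_k + \tfrac{1}{2}\sigma_k^2\big) = e^{-c}\sum_k \alpha_k e^{m_k} = e^{\eta},
\]
i.e. the mixture mean always reproduces the point estimate.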
We found that whilst optimising for the MDN loss function was often effective, there were training runs in which the MSE and point estimates behaved oddly during training. Since the estimate for the mean comes directly from a SplineNet sub-module, it is possible to reuse the previously trained model and weights, which avoids another source of instability. An interesting consequence is that a similar approach could be used to add distributional predictions to any point estimate model.
class LognormalSplineMDN(nn.Module):  # V3
    # Define the parameters in __init__
    def __init__(
        self,
        n_input,           # number of inputs
        n_output,          # number of outputs
        init_bias,         # init mean value to speed up convergence
        init_extra,        # init mean value to speed up convergence
        n_hidden,          # hidden layer size
        dropout,           # dropout percentage
        n_gaussians,       # number of gaussians
        interactions=0.0,  # 0.0 - 1.0 whether to apply interactions
        point_estimates=False,
        **kwargs
    ):
        assert n_output == 1, "Not implemented for n_output > 1"

        # Initialise
        super(LognormalSplineMDN, self).__init__()
        self.n_gaussians = n_gaussians
        self.point_estimates = point_estimates
        self.point_estimator = SplineNet(
            n_output=1, init_bias=init_bias, inverse_of_link_fn=lambda x: x,
            # the other parameters are passed straight through:
            n_input=n_input, n_hidden=n_hidden, dropout=dropout, interactions=interactions,
        )
        self.point_estimator.load_state_dict(init_extra[0])
        self.point_estimator.requires_grad_(False)
        self.distribution_estimator = SplineNet(
            n_output=3 * n_gaussians, init_bias=init_extra[1], inverse_of_link_fn=lambda x: x,
            # the other parameters are passed straight through:
            n_input=n_input, n_hidden=n_hidden, dropout=dropout, interactions=interactions,
        )

    # The forward function defines how you get y from X.
    def forward(self, x):
        predictor = self.point_estimator(x)

        if self.point_estimates:
            return torch.exp(predictor)
        else:
            alpha0, median0, sigma0 = torch.split(
                self.distribution_estimator(x),
                [self.n_gaussians, self.n_gaussians, self.n_gaussians],
                dim=1
            )

            # Alpha: Softmax so it sums to 100%
            alpha = F.softmax(alpha0, dim=-1)
            log_mix_prob = torch.log(alpha + 1e-15)

            # Sigma: ReLU with minimum of SMALL
            s2 = F.relu(sigma0) + SMALL
            sigma = torch.sqrt(s2)

            # Linear predictors. sum of alpha * exp(mdn_lin_preds) = exp(lin_pred)
            # Adjust to make sure they equate.
            log_scaling_factor = torch.logsumexp(log_mix_prob + median0, dim=-1).unsqueeze(1) - predictor

            # Mean is exp(mu + 0.5 * sigma^2)
            mu = median0 - log_scaling_factor - 0.5 * s2

            return alpha, mu, sigma
Number of distributions
We will make a generic assumption of three component distributions:
1. zeroes (most records have zero payment at this granular claim/period level),
2. minor payments, and
3. major payments.
More distributions can be used e.g. when there are several peril categories, but assuming the presence of these three seems generally reasonable.
Initialisation
We will need to initialise our model intelligently for numerical stability and faster convergence.
For the initial values, we will assume large claims make up 5% of claims and are 10x larger than the average; the model will then train these to the right levels.
percent_zeroes = (dat.loc[(dat.train_ind == 1) & (dat.payment_size < 0.001)].claim_no.count()) / (dat.loc[(dat.train_ind == 1)].claim_no.count())
mean_log_payment = np.log(dat.loc[(dat.train_ind == 1) & (dat.payment_size > 0), ["payment_size"]].values).mean()

mean_payment_encl_zeroes = dat.loc[(dat.train_ind == 1) & (dat.payment_size > 0), ["payment_size"]].values.mean()
mean_payment_incl_zeroes = dat.loc[(dat.train_ind == 1), ["payment_size"]].values.mean()

sigma_payment = np.sqrt((np.log(mean_payment_encl_zeroes) - mean_log_payment) * 2)

mdn_init_bias = np.array([
    # Alpha
    np.log(percent_zeroes),
    np.log((1-percent_zeroes) * 0.95),
    np.log((1-percent_zeroes) * 0.05),
    # Mu
    np.log(SMALL),
    mean_log_payment + np.log(0.526),
    mean_log_payment + np.log(10),
    # Sigma
    -1.0,
    sigma_payment,
    sigma_payment,
], dtype=np.float32)
{"Percent of records zeroes": percent_zeroes,
"Alpha bias after softmax": F.softmax(torch.tensor(mdn_init_bias[0:3]),dim=-1), # This should return the correct zero percentage
"Mean log payment": mean_log_payment,
"Sigma (non-zero claims)": sigma_payment,
"Mean payment (excl zeroes)": mean_payment_encl_zeroes,
"Mean Payment (incl zeroes)": mean_payment_incl_zeroes
}
{'Percent of records zeroes': 0.8141110448802756,
'Alpha bias after softmax': tensor([0.8141, 0.1766, 0.0093]),
'Mean log payment': 9.755892,
'Sigma (non-zero claims)': 1.4647239534347847,
'Mean payment (excl zeroes)': 50443.03,
'Mean Payment (incl zeroes)': 9376.803}
Model fitting
Fit the MDN below, with some validation checks that the module and initialisation have been set up correctly.
model_spline_mdn = Pipeline(
    steps=[
        ("keep", ColumnKeeper(["occurrence_period", "development_period"])),  # both
        ('zero_to_one', MinMaxScaler()),  # Important! Standardize deep learning inputs.
        ("model", TabularNetRegressor(
            LognormalSplineMDN,
            n_hidden=20,
            max_iter=mdn_iter,
            max_lr=0.025,
            n_gaussians=3,
            criterion=log_mdn_loss_fn,
            init_extra=[model_spline_net["model"].module_.state_dict(), mdn_init_bias],
            # clip_value=3.0,
            keep_best_model=True  # Noting numerical stability, keep the best model instead
        ))
    ]
)

# Fit model
model_spline_mdn.fit(
    dat.loc[dat.train_ind == 1],
    dat.loc[dat.train_ind == 1, ["payment_size"]]
)
Train RMSE: 55278.71875 Train Loss: 181.297119140625
Train RMSE: 55278.71875 Train Loss: 47.793006896972656
Train RMSE: 55278.71875 Train Loss: 4.475003242492676
Train RMSE: 55278.71875 Train Loss: 2.6302762031555176
Train RMSE: 55278.71875 Train Loss: 2.598949432373047
Train RMSE: 55278.71875 Train Loss: 2.5837700366973877
Train RMSE: 55278.71875 Train Loss: 2.5708096027374268
Train RMSE: 55278.71875 Train Loss: 2.5611422061920166
Train RMSE: 55278.71875 Train Loss: 2.555274724960327
Train RMSE: 55278.71875 Train Loss: 2.5528206825256348
Pipeline(steps=[('keep', ColumnKeeper(cols=['occurrence_period', 'development_period'])), ('zero_to_one', MinMaxScaler()), ('model', TabularNetRegressor(criterion=<function log_mdn_loss_fn at 0x2bc8eeb00>, init_extra=[OrderedDict([('interactions', tensor(-5.0061)), ('oneways.0.weight', tensor([[-0.0981], [-0.6153], [-0.0221], [-0.5114], [ 0.0978], [ 0.6668], [ 0.1500], [... tensor([[ 3.2120, -6.1099, -6.2423, -6.5094, -3.1338, 7.8484, 2.6359, 6.3528, 4.7452, -5.0441, 7.7757, -3.3306, 3.7796, 4.8792, 6.3781, -4.6370, -0.9826, -4.2656, -6.8903, -4.4056]]))]), array([ -0.20565851, -1.7338991 , -4.678338 , -16.118095 , 9.113438 , 12.058476 , -1. , 1.464724 , 1.464724 ], dtype=float32)], keep_best_model=True, max_iter=500, max_lr=0.025, module=<class '__main__.LognormalSplineMDN'>))])
Check point estimates:
# Lower half triangle predictions
dat_mdn_pred, triangle_mdn_spline = make_pred_set_and_triangle(model_spline_mdn, dat.loc[dat.train_ind == 1], dat.loc[dat.train_ind == 0])
make_model_subplots(model_spline_mdn, dat)
Check probabilistic predictions
def make_distribution_subplots(model):
    fig, axes = plt.subplots(2, 3, sharex='col', sharey=False, figsize=(15, 7))

    axes[0,0].title.set_text("All claims")
    axes[0,1].title.set_text("Non-zero claims")
    axes[0,2].title.set_text("Log non-zero claims")

    axes[0,0].hist((dat.loc[(dat.train_ind == 1), ["payment_size"]]))
    axes[0,1].hist((dat.loc[(dat.train_ind == 1) & (dat.payment_size > 0), ["payment_size"]]))
    axes[0,2].hist(np.log(dat.loc[(dat.train_ind == 1) & (dat.payment_size > 0), ["payment_size"]]))

    alpha, mu, sigma = model.predict(dat.loc[(dat.train_ind == 1)], point_estimates=False)

    categorical = torch.distributions.Categorical(alpha)
    alpha_sample = categorical.sample().unsqueeze(1)
    n_01 = Variable(sigma.data.new(sigma.size(0), 1).normal_())
    sample = mu.gather(1, alpha_sample) + n_01 * sigma.gather(1, alpha_sample)
    sample = np.exp(sample.cpu().detach().numpy())

    axes[1,0].hist(sample)
    axes[1,1].hist(sample[sample > 0.01])
    axes[1,2].hist(np.log(sample[sample > 0.01]))
make_distribution_subplots(model_spline_mdn)
We should hopefully see a resemblance between the distributions.
Cross Validation and Hyperparameter Search
Rolling Origin Validation
The problem with using a random split for cross validation is that training and validation records may come from the same time periods, whereas in practice we are predicting records from time periods entirely outside the training data.
It's fairly simple to implement a payment period based cross validation using PredefinedSplit:
ps = PredefinedSplit(nn_train.cv_ind)
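The notebook does not show how the cv_ind column is constructed; a hypothetical construction (an assumption for illustration, not the author's code) would be to bin each training record's payment period into folds:

# Hypothetical: assign each training record a fold number by payment period,
# so every record appears in a test fold exactly once.
nn_train = nn_train.assign(cv_ind=pd.qcut(nn_train.payment_period, 5, labels=False))
ps = PredefinedSplit(nn_train.cv_ind)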
However, this did not lead to good quality results in our testing, so we will implement the rolling origin validation.
class RollingOriginSplit:
    def __init__(self, start_cut, n_splits):
        self.start_cut = start_cut
        self.n_splits = n_splits

    def split(self, X=None, y=None, groups=None):
        """Generate indices to split data into training and test set.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data, where `n_samples` is the number of samples
            and `n_features` is the number of features.
        y : array-like of shape (n_samples,)
            Always ignored, exists for compatibility.
        groups : array-like of shape (n_samples,)
            Payment period for splits

        Yields
        ------
        train : ndarray
            The training set indices for that split.
        test : ndarray
            The testing set indices for that split.
        """
        quantiles = pd.qcut(groups, self.start_cut + self.n_splits + 1, labels=False)

        for split_value in range(self.start_cut, self.start_cut + self.n_splits):
            yield np.where(quantiles <= split_value)[0], np.where(quantiles == split_value + 1)[0]
ps = RollingOriginSplit(5, 5).split(groups=dat.loc[dat.train_ind == 1].payment_period)
for tr, te in ps:
    print(len(tr), len(te), len(tr) + len(te))
37921 8376 46297
46297 6001 52298
52298 6336 58634
58634 6687 65321
65321 3462 68783
len(dat.loc[dat.train_ind == 1].index)
68783
Random Search:
Use Random Search - noting comments here about the limited benefits of Bayesian Search.
- l1 penalty, weight decay: Cap at 0.1 to avoid underfitting problems.
- dropout: try higher percentages.
- number of neurons in hidden layer: keep relatively small; the data is not that complex.
- interactions: test with and without.
- max iter, max lr: keep at these values or similar to get something close to convergence in a short number of iterations.
parameters_nn = {
    "l1_penalty": [0.0, 0.001, 0.01, 0.1],
    "weight_decay": [0.0, 0.001, 0.01, 0.1],
    "n_hidden": [5, 10, 20],
    # "interactions": [0.0, 0.25, 0.5, 0.75, 1.0],
    "dropout": [0, 0.25, 0.5],
    "max_iter": [nn_cv_iter],
    "max_lr": [0.05],
    "verbose": [0],
    "clip_value": [None, 3.0],
    "keep_best_model": [True]
}
model_NN_CV = Pipeline(
    steps=[
        ("keep", ColumnKeeper(["occurrence_period", "development_period"])),
        ('zero_to_one', MinMaxScaler()),  # Important! Standardize deep learning inputs.
        ("model", RandomizedSearchCV(
            TabularNetRegressor(SplineNet),
            parameters_nn,
            n_jobs=4,  # Run in parallel (small model)
            n_iter=cv_runs,  # Models train slowly, so try only a few models
            cv=RollingOriginSplit(5,5).split(groups=dat.payment_period.loc[dat.train_ind == 1]),
            random_state=42,
            refit=False
        )),
    ]
)

model_NN_CV.fit(
    dat.loc[dat.train_ind == 1],
    dat.loc[dat.train_ind == 1, ["payment_size"]]
)
bst_nn = model_NN_CV["model"].best_params_
print("best parameters:", bst_nn)
cv_results = pd.DataFrame(model_NN_CV["model"].cv_results_)

# Refit best model for longer iters
model_NN_CV = Pipeline(
    steps=[
        ("keep", ColumnKeeper(["occurrence_period", "development_period"])),
        ('zero_to_one', MinMaxScaler()),  # Important! Standardize deep learning inputs.
        ("model", TabularNetRegressor(
            SplineNet,
            l1_penalty=bst_nn["l1_penalty"],
            weight_decay=bst_nn["weight_decay"],
            n_hidden=bst_nn["n_hidden"],
            # interactions=bst_nn["interactions"],
            dropout=bst_nn["dropout"],
            max_iter=nn_iter,
            max_lr=bst_nn["max_lr"],
        ))
    ]
)

model_NN_CV.fit(
    dat.loc[dat.train_ind == 1],
    dat.loc[dat.train_ind == 1, ["payment_size"]]
)
best parameters: {'weight_decay': 0.01, 'verbose': 0, 'n_hidden': 10, 'max_lr': 0.05, 'max_iter': 100, 'l1_penalty': 0.0, 'keep_best_model': True, 'dropout': 0.5, 'clip_value': None}
Train RMSE: 55491.87890625 Train Loss: -76383.359375
Train RMSE: 55388.3046875 Train Loss: -77174.5
Train RMSE: 55351.515625 Train Loss: -77285.703125
Train RMSE: 55333.53515625 Train Loss: -77376.625
Train RMSE: 55301.0390625 Train Loss: -77523.671875
Train RMSE: 55292.10546875 Train Loss: -77570.9921875
Train RMSE: 55290.66796875 Train Loss: -77579.3359375
Train RMSE: 55289.921875 Train Loss: -77583.3203125
Train RMSE: 55289.45703125 Train Loss: -77585.7421875
Train RMSE: 55289.2578125 Train Loss: -77586.828125
Pipeline(steps=[('keep', ColumnKeeper(cols=['occurrence_period', 'development_period'])), ('zero_to_one', MinMaxScaler()), ('model', TabularNetRegressor(dropout=0.5, max_iter=500, max_lr=0.05, module=<class '__main__.SplineNet'>, n_hidden=10, weight_decay=0.01))])
make_model_subplots(model_NN_CV, dat)
More predictions and diagnostics
dat_nn_pred, triangle_nn_cv = make_pred_set_and_triangle(model_NN_CV, dat.loc[dat.train_ind == 1], dat.loc[dat.train_ind == 0])

(triangle_nn_cv
    .loc[lambda df: df.occurrence_period != cutoff]
    .pivot(index="development_period", columns="occurrence_period", values="payment_size_cumulative")
    .plot(logy=True)
)
plt.legend(loc="lower center", bbox_to_anchor=(0.5, -0.8), ncol=5)
(dat_nn_pred.loc[lambda df: ~df.train_ind]
    .groupby(["occurrence_period", "development_period", "payment_period"], as_index=False)
    .agg({"payment_size_cumulative": "sum", "payment_size": "sum"})
    .sort_values(by=["occurrence_period", "development_period"])
    .loc[lambda df: df.occurrence_period != cutoff]
    .pivot(index="development_period", columns="occurrence_period", values="payment_size_cumulative")
    .plot(logy=True)
)
plt.legend(loc="lower center", bbox_to_anchor=(0.5, -0.8), ncol=5)
(dat_nn_pred.loc[lambda df: ~df.train_ind]
    .groupby(["occurrence_period", "development_period", "payment_period"], as_index=False)
    .agg({"payment_size_cumulative": "sum", "payment_size": "sum"})
    .sort_values(by=["occurrence_period", "development_period"])
    .loc[lambda df: df.occurrence_period != cutoff]
    .pivot(index="development_period", columns="occurrence_period", values="payment_size")
    .plot(logy=True)
)
plt.legend(loc="lower center", bbox_to_anchor=(0.5, -0.8), ncol=5)
= "occurrence_period", columns = "development_period", values = "payment_size_cumulative") -
(triangle_nn_cv.pivot(index = "occurrence_period", columns = "development_period", values = "payment_size_cumulative")) triangle_cl.pivot(index
development_period | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
occurrence_period | |||||||||||||||||||||
1 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 93,131.00 |
3 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -64,598.00 | 38,252.00 |
4 | 0.00 | -0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 88,980.00 | -9,110.00 | 83,940.00 |
5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 2.00 | 0.00 | 0.00 | 0.00 | -40,644.00 | 63,272.00 | -33,622.00 | 73,202.00 |
6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 88,402.00 | 55,026.00 | 183,736.00 | 84,830.00 | 214,934.00 |
7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.25 | -0.50 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 123,190.00 | 203,302.00 | 178,360.00 | 292,464.00 | 210,556.00 | 325,294.00 |
8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | -68,508.00 | 72,516.00 | 151,964.00 | 75,346.00 | 213,344.00 | 56,508.00 | 201,310.00 |
9 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 19,458.00 | 530.00 | 129,638.00 | 212,160.00 | 180,378.00 | 300,830.00 | 207,586.00 | 329,414.00 |
10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 8,376.00 | 25,792.00 | -2,652.00 | 148,164.00 | 243,146.00 | 200,474.00 | 342,026.00 | 225,828.00 | 369,690.00 |
11 | 0.00 | 0.00 | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 0.00 | ... | -110,566.00 | -84,060.00 | -49,410.00 | -64,744.00 | 108,826.00 | 222,028.00 | 188,162.00 | 348,732.00 | 235,008.00 | 396,310.00 |
12 | 0.00 | 0.00 | 0.00 | 0.00 | -0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -239,897.00 | -208,423.00 | -172,688.00 | -171,140.00 | -45,364.00 | 39,498.00 | 26,048.00 | 140,704.00 | 72,844.00 | 186,628.00 |
13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -324,870.00 | -311,592.00 | -288,906.00 | -316,024.00 | -150,024.00 | -44,542.00 | -87,842.00 | 67,404.00 | -55,680.00 | 101,642.00 |
14 | 0.00 | 0.00 | 0.00 | -0.12 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -8,586.00 | 86,876.00 | 182,180.00 | 234,580.00 | 422,040.00 | 559,716.00 | 583,550.00 | 747,702.00 | 704,150.00 | 861,450.00 |
15 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -982,604.00 | -1,005,588.00 | -1,014,728.00 | -1,077,710.00 | -918,006.00 | -824,770.00 | -898,744.00 | -744,430.00 | -905,256.00 | -744,854.00 |
16 | 0.00 | 0.00 | 0.00 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -282,120.00 | -224,990.00 | -163,476.00 | -148,460.00 | 30,198.00 | 153,602.00 | 145,724.00 | 306,866.00 | 225,208.00 | 383,688.00 |
17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -256,492.00 | -197,700.00 | -133,780.00 | -120,570.00 | 72,406.00 | 205,028.00 | 193,880.00 | 368,340.00 | 276,680.00 | 448,600.00 |
18 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -370,818.00 | -338,750.00 | -300,200.00 | -306,432.00 | -146,528.00 | -40,474.00 | -64,756.00 | 82,110.00 | -13,586.00 | 133,080.00 |
19 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -449,984.00 | -430,272.00 | -401,694.00 | -423,334.00 | -252,610.00 | -142,730.00 | -181,784.00 | -22,964.00 | -142,364.00 | 17,900.00 |
20 | 0.00 | 0.00 | 0.00 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 164,274.00 | 214,646.00 | 273,424.00 | 270,108.00 | 494,952.00 | 645,292.00 | 615,920.00 | 821,702.00 | 693,404.00 | 898,298.00 |
21 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 62,204.00 | 104,174.00 | 155,680.00 | 143,678.00 | 368,602.00 | 517,058.00 | 480,062.00 | 687,088.00 | 548,758.00 | 755,854.00 |
22 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.50 | 0.00 | 0.00 | 0.00 | ... | 723,910.00 | 796,366.00 | 873,416.00 | 895,982.00 | 1,108,224.00 | 1,255,884.00 | 1,250,668.00 | 1,441,464.00 | 1,349,894.00 | 1,537,004.00 |
23 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -1,136,422.00 | -1,123,384.00 | -1,098,592.00 | -1,134,396.00 | -934,076.00 | -807,472.00 | -862,422.00 | -674,666.00 | -826,740.00 | -636,134.00 |
24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 1.00 | ... | -492,014.00 | -459,922.00 | -418,790.00 | -434,346.00 | -236,420.00 | -106,900.00 | -143,830.00 | 39,014.00 | -88,454.00 | 95,010.00 |
25 | 0.00 | 0.00 | 0.00 | -0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 323,934.00 | 374,614.00 | 430,562.00 | 438,862.00 | 617,768.00 | 739,836.00 | 726,048.00 | 888,316.00 | 798,804.00 | 959,154.00 |
26 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.50 | 0.00 | 0.00 | -1.00 | ... | 1,089,664.00 | 1,157,726.00 | 1,230,464.00 | 1,250,350.00 | 1,455,058.00 | 1,597,054.00 | 1,590,378.00 | 1,774,652.00 | 1,684,174.00 | 1,865,102.00 |
27 | 0.00 | 0.00 | 0.00 | -0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | ... | -1,407,048.00 | -1,401,292.00 | -1,380,964.00 | -1,430,996.00 | -1,203,368.00 | -1,061,596.00 | -1,132,224.00 | -917,620.00 | -1,101,164.00 | -882,288.00 |
28 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -3,867,864.00 | -3,915,048.00 | -3,940,716.00 | -4,044,824.00 | -3,819,404.00 | -3,691,204.00 | -3,808,944.00 | -3,589,088.00 | -3,833,576.00 | -3,603,444.00 |
29 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -2,378,108.00 | -2,385,600.00 | -2,375,208.00 | -2,444,524.00 | -2,194,076.00 | -2,041,280.00 | -2,131,484.00 | -1,893,444.00 | -2,111,792.00 | -1,867,468.00 |
30 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -997,216.00 | -986,856.00 | -962,840.00 | -1,007,028.00 | -783,848.00 | -643,760.00 | -708,744.00 | -498,988.00 | -673,348.00 | -459,944.00 |
31 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | -3,378,252.00 | -3,408,740.00 | -3,420,564.00 | -3,505,188.00 | -3,288,928.00 | -3,162,528.00 | -3,262,124.00 | -3,053,256.00 | -3,270,276.00 | -3,053,236.00 |
32 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | -141,523.00 | ... | -1,893,108.00 | -1,893,272.00 | -1,878,052.00 | -1,934,264.00 | -1,706,344.00 | -1,565,756.00 | -1,641,832.00 | -1,426,128.00 | -1,616,940.00 | -1,396,280.00 |
33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 296,923.00 | 620,535.00 | ... | 5,173,244.00 | 5,295,080.00 | 5,421,048.00 | 5,472,108.00 | 5,775,676.00 | 5,991,068.00 | 6,000,060.00 | 6,270,432.00 | 6,161,048.00 | 6,424,068.00 |
34 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -216,282.00 | -316,790.00 | -384,694.00 | ... | -1,173,348.00 | -1,152,364.00 | -1,116,768.00 | -1,158,684.00 | -900,172.00 | -735,840.00 | -802,992.00 | -561,276.00 | -752,588.00 | -507,672.00 |
35 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -373,199.00 | -773,315.00 | -1,042,005.00 | -1,249,327.00 | ... | -3,836,456.00 | -3,845,296.00 | -3,834,548.00 | -3,910,796.00 | -3,637,768.00 | -3,471,352.00 | -3,570,292.00 | -3,310,700.00 | -3,549,524.00 | -3,283,008.00 |
36 | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | -393,696.50 | -786,715.50 | -1,205,360.00 | -1,403,664.00 | -1,565,677.00 | ... | -3,568,540.00 | -3,567,764.00 | -3,549,048.00 | -3,614,084.00 | -3,346,332.00 | -3,180,952.00 | -3,269,448.00 | -3,016,180.00 | -3,239,192.00 | -2,980,216.00 |
37 | 0.00 | 0.00 | 0.00 | 0.00 | -922,144.00 | -1,988,635.00 | -3,151,961.00 | -4,412,670.00 | -5,477,120.00 | -6,461,321.00 | ... | -19,109,192.00 | -19,300,120.00 | -19,449,212.00 | -19,705,488.00 | -19,464,360.00 | -19,359,616.00 | -19,612,516.00 | -19,357,860.00 | -19,785,976.00 | -19,504,272.00 |
38 | 0.00 | 0.00 | 0.00 | 89,275.38 | 93,749.00 | 50,939.50 | -46,986.50 | -176,625.00 | -170,175.00 | -133,220.00 | ... | 436,856.00 | 482,736.00 | 540,336.00 | 522,492.00 | 787,584.00 | 961,720.00 | 914,880.00 | 1,159,376.00 | 992,084.00 | 1,237,080.00 |
39 | 0.00 | 0.00 | 160,790.56 | 608,276.75 | 1,066,243.25 | 1,582,669.00 | 2,213,820.00 | 2,848,667.00 | 3,591,825.00 | 4,358,758.00 | ... | 14,410,724.00 | 14,639,306.00 | 14,862,680.00 | 15,005,756.00 | 15,382,996.00 | 15,668,460.00 | 15,749,392.00 | 16,074,676.00 | 16,030,288.00 | 16,337,608.00 |
40 | 0.00 | 405,663.06 | 1,325,837.00 | 2,517,204.25 | 4,020,799.00 | 5,775,195.50 | 7,758,965.00 | 9,882,142.00 | 12,064,130.00 | 14,237,541.00 | ... | 42,502,060.00 | 43,059,632.00 | 43,571,484.00 | 44,040,956.00 | 44,471,220.00 | 44,865,264.00 | 45,225,892.00 | 45,555,720.00 | 45,857,196.00 | 46,132,608.00 |
40 rows × 40 columns
# MSE / 10^12
# Exclude occurrence period == 40 since there's no data there at cutoff
np.sum((
    triangle_nn_cv.loc[lambda df: df.occurrence_period != cutoff].payment_size_cumulative.values -
    triangle.loc[lambda df: df.occurrence_period != cutoff].payment_size_cumulative.values
)**2) / 10**12
10739.951196962817
Leveraging individual data
So far we have only used Occurrence Period and Development Period, not any individual claim features. Does the dataset enable us to improve predictions by using claims level information?
First let us look at the features.
fig, axs = plt.subplots(len(data_cols), sharex=False, sharey=True, figsize=(15, 20))

for i, f in enumerate(data_cols):
    axs[i].scatter(dat[f], dat["payment_size"])
    m, b = np.polyfit(dat[f], dat["payment_size"], 1)
    axs[i].plot(dat[f], m*dat[f] + b, color='orange')
    axs[i].set_title(f)
Results are very noisy, so look at data with features in a decile format:
fig, axs = plt.subplots(len(data_cols), sharex=False, sharey=True, figsize=(7, 40))

for i, f in enumerate(data_cols):
    dat_copy = dat.copy()
    dat_copy["decile"] = pd.qcut(dat[f], 10, labels=False, duplicates='drop')
    X_sum = dat_copy.groupby("decile").agg("mean").reset_index()

    axs[i].scatter(X_sum.index, X_sum.payment_size)
    axs[i].set_title(f)
Data Augmentation
Idea: Claim information as at every balance date
Consider claim number 2000 (as an example):
sample_df = dat.loc[dat.claim_no==2000, ["claim_no", "occurrence_period", "train_ind", "payment_size"] + data_cols]
sample_df
claim_no | occurrence_period | train_ind | payment_size | notidel | occurrence_time | development_period | payment_period | has_payment_to_prior_period | log1_payment_to_prior_period | has_incurred_to_prior_period | log1_incurred_to_prior_period | has_outstanding_to_prior_period | log1_outstanding_to_prior_period | payment_count_to_prior_period | transaction_count_to_prior_period | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
79961 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 1 | 23 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00 | 0.00 | 0.00 |
79962 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 2 | 24 | 0 | 0.00 | 1 | 10.46 | 1 | 10.46 | 0.00 | 1.00 |
79963 | 2000 | 22 | True | 12,924.67 | 1.69 | 21.23 | 3 | 25 | 0 | 0.00 | 1 | 10.46 | 1 | 10.46 | 0.00 | 1.00 |
79964 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 4 | 26 | 1 | 9.47 | 1 | 10.72 | 1 | 10.38 | 1.00 | 2.00 |
79965 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 5 | 27 | 1 | 9.47 | 1 | 10.72 | 1 | 10.38 | 1.00 | 2.00 |
79966 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 6 | 28 | 1 | 9.47 | 1 | 10.72 | 1 | 10.38 | 1.00 | 2.00 |
79967 | 2000 | 22 | True | 14,697.76 | 1.69 | 21.23 | 7 | 29 | 1 | 9.47 | 1 | 10.93 | 1 | 10.67 | 1.00 | 3.00 |
79968 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 8 | 30 | 1 | 10.23 | 1 | 10.93 | 1 | 10.25 | 2.00 | 4.00 |
79969 | 2000 | 22 | True | 13,963.72 | 1.69 | 21.23 | 9 | 31 | 1 | 10.23 | 1 | 10.94 | 1 | 10.27 | 2.00 | 5.00 |
79970 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 10 | 32 | 1 | 10.64 | 1 | 10.94 | 1 | 9.61 | 3.00 | 6.00 |
79971 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 11 | 33 | 1 | 10.64 | 1 | 10.94 | 1 | 9.61 | 3.00 | 6.00 |
79972 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 12 | 34 | 1 | 10.64 | 1 | 10.94 | 1 | 9.61 | 3.00 | 6.00 |
79973 | 2000 | 22 | True | 14,848.13 | 1.69 | 21.23 | 13 | 35 | 1 | 10.64 | 1 | 10.94 | 1 | 9.61 | 3.00 | 6.00 |
79974 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 14 | 36 | 1 | 10.94 | 1 | 10.99 | 1 | 8.00 | 4.00 | 7.00 |
79975 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 15 | 37 | 1 | 10.94 | 1 | 12.49 | 1 | 12.25 | 4.00 | 8.00 |
79976 | 2000 | 22 | True | 15,405.11 | 1.69 | 21.23 | 16 | 38 | 1 | 10.94 | 1 | 12.49 | 1 | 12.25 | 4.00 | 8.00 |
79977 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 17 | 39 | 1 | 11.18 | 1 | 12.36 | 1 | 12.00 | 5.00 | 9.00 |
79978 | 2000 | 22 | True | 0.00 | 1.69 | 21.23 | 18 | 40 | 1 | 11.18 | 1 | 12.33 | 1 | 11.95 | 5.00 | 10.00 |
79979 | 2000 | 22 | False | 460,153.69 | 1.69 | 21.23 | 19 | 41 | 1 | 11.18 | 1 | 13.29 | 1 | 13.16 | 5.00 | 11.00 |
79980 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 20 | 42 | 1 | 13.18 | 1 | 13.29 | 1 | 10.95 | 6.00 | 12.00 |
79981 | 2000 | 22 | False | 57,363.51 | 1.69 | 21.23 | 21 | 43 | 1 | 13.18 | 1 | 13.29 | 1 | 10.96 | 6.00 | 13.00 |
79982 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 22 | 44 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79983 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 23 | 45 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79984 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 24 | 46 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79985 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 25 | 47 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79986 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 26 | 48 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79987 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 27 | 49 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79988 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 28 | 50 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79989 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 29 | 51 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79990 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 30 | 52 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79991 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 31 | 53 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79992 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 32 | 54 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79993 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 33 | 55 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79994 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 34 | 56 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79995 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 35 | 57 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79996 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 36 | 58 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79997 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 37 | 59 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79998 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 38 | 60 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
79999 | 2000 | 22 | False | 0.00 | 1.69 | 21.23 | 39 | 61 | 1 | 13.29 | 1 | 13.29 | 0 | 0.00 | 7.00 | 14.00 |
Each record represents one development period. The features are based on the history up to the development period, so at every point, we are predicting the next payment without using any future information.
However, "future" here is only relative to each individual period. The challenge is that actuarial models are often used to predict ultimate claims, not just the payment in the following period. Consequently, we need to predict all future payments based only on information up to some date.
def backdate(df, backdate_periods, keep_cols):
    dedupe = [*set(["claim_no", "occurrence_period", "development_period", "payment_period"] + keep_cols)]
    bd = df.loc[:, dedupe].copy()
    bd["development_period"] = bd.development_period + backdate_periods
    bd.rename(columns={"payment_period": "payment_period_as_at"}, inplace=True)
    df = df[["claim_no", "occurrence_period", "development_period", "train_ind", "payment_size", "payment_period", "occurrence_time", "notidel"]].assign(
        data_as_at_development_period = lambda df: df.development_period - backdate_periods,
        backdate_periods = backdate_periods
    ).merge(
        bd,
        how='left',
        on=["claim_no", "occurrence_period", "development_period"],
        suffixes=[None, "_backdatedrop"]
    )
    return df.drop(df.filter(regex='_backdatedrop').columns, axis=1)

backdate(sample_df, backdate_periods=1, keep_cols=data_cols)
claim_no | occurrence_period | development_period | train_ind | payment_size | payment_period | occurrence_time | notidel | data_as_at_development_period | backdate_periods | log1_outstanding_to_prior_period | log1_payment_to_prior_period | has_incurred_to_prior_period | payment_period_as_at | has_payment_to_prior_period | has_outstanding_to_prior_period | transaction_count_to_prior_period | payment_count_to_prior_period | log1_incurred_to_prior_period | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2000 | 22 | 1 | True | 0.00 | 23 | 21.23 | 1.69 | 0 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 2000 | 22 | 2 | True | 0.00 | 24 | 21.23 | 1.69 | 1 | 1 | 0.00 | 0.00 | 0.00 | 23.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2 | 2000 | 22 | 3 | True | 12,924.67 | 25 | 21.23 | 1.69 | 2 | 1 | 10.46 | 0.00 | 1.00 | 24.00 | 0.00 | 1.00 | 1.00 | 0.00 | 10.46 |
3 | 2000 | 22 | 4 | True | 0.00 | 26 | 21.23 | 1.69 | 3 | 1 | 10.46 | 0.00 | 1.00 | 25.00 | 0.00 | 1.00 | 1.00 | 0.00 | 10.46 |
4 | 2000 | 22 | 5 | True | 0.00 | 27 | 21.23 | 1.69 | 4 | 1 | 10.38 | 9.47 | 1.00 | 26.00 | 1.00 | 1.00 | 2.00 | 1.00 | 10.72 |
5 | 2000 | 22 | 6 | True | 0.00 | 28 | 21.23 | 1.69 | 5 | 1 | 10.38 | 9.47 | 1.00 | 27.00 | 1.00 | 1.00 | 2.00 | 1.00 | 10.72 |
6 | 2000 | 22 | 7 | True | 14,697.76 | 29 | 21.23 | 1.69 | 6 | 1 | 10.38 | 9.47 | 1.00 | 28.00 | 1.00 | 1.00 | 2.00 | 1.00 | 10.72 |
7 | 2000 | 22 | 8 | True | 0.00 | 30 | 21.23 | 1.69 | 7 | 1 | 10.67 | 9.47 | 1.00 | 29.00 | 1.00 | 1.00 | 3.00 | 1.00 | 10.93 |
8 | 2000 | 22 | 9 | True | 13,963.72 | 31 | 21.23 | 1.69 | 8 | 1 | 10.25 | 10.23 | 1.00 | 30.00 | 1.00 | 1.00 | 4.00 | 2.00 | 10.93 |
9 | 2000 | 22 | 10 | True | 0.00 | 32 | 21.23 | 1.69 | 9 | 1 | 10.27 | 10.23 | 1.00 | 31.00 | 1.00 | 1.00 | 5.00 | 2.00 | 10.94 |
10 | 2000 | 22 | 11 | True | 0.00 | 33 | 21.23 | 1.69 | 10 | 1 | 9.61 | 10.64 | 1.00 | 32.00 | 1.00 | 1.00 | 6.00 | 3.00 | 10.94 |
11 | 2000 | 22 | 12 | True | 0.00 | 34 | 21.23 | 1.69 | 11 | 1 | 9.61 | 10.64 | 1.00 | 33.00 | 1.00 | 1.00 | 6.00 | 3.00 | 10.94 |
12 | 2000 | 22 | 13 | True | 14,848.13 | 35 | 21.23 | 1.69 | 12 | 1 | 9.61 | 10.64 | 1.00 | 34.00 | 1.00 | 1.00 | 6.00 | 3.00 | 10.94 |
13 | 2000 | 22 | 14 | True | 0.00 | 36 | 21.23 | 1.69 | 13 | 1 | 9.61 | 10.64 | 1.00 | 35.00 | 1.00 | 1.00 | 6.00 | 3.00 | 10.94 |
14 | 2000 | 22 | 15 | True | 0.00 | 37 | 21.23 | 1.69 | 14 | 1 | 8.00 | 10.94 | 1.00 | 36.00 | 1.00 | 1.00 | 7.00 | 4.00 | 10.99 |
15 | 2000 | 22 | 16 | True | 15,405.11 | 38 | 21.23 | 1.69 | 15 | 1 | 12.25 | 10.94 | 1.00 | 37.00 | 1.00 | 1.00 | 8.00 | 4.00 | 12.49 |
16 | 2000 | 22 | 17 | True | 0.00 | 39 | 21.23 | 1.69 | 16 | 1 | 12.25 | 10.94 | 1.00 | 38.00 | 1.00 | 1.00 | 8.00 | 4.00 | 12.49 |
17 | 2000 | 22 | 18 | True | 0.00 | 40 | 21.23 | 1.69 | 17 | 1 | 12.00 | 11.18 | 1.00 | 39.00 | 1.00 | 1.00 | 9.00 | 5.00 | 12.36 |
18 | 2000 | 22 | 19 | False | 460,153.69 | 41 | 21.23 | 1.69 | 18 | 1 | 11.95 | 11.18 | 1.00 | 40.00 | 1.00 | 1.00 | 10.00 | 5.00 | 12.33 |
19 | 2000 | 22 | 20 | False | 0.00 | 42 | 21.23 | 1.69 | 19 | 1 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
20 | 2000 | 22 | 21 | False | 57,363.51 | 43 | 21.23 | 1.69 | 20 | 1 | 10.95 | 13.18 | 1.00 | 42.00 | 1.00 | 1.00 | 12.00 | 6.00 | 13.29 |
21 | 2000 | 22 | 22 | False | 0.00 | 44 | 21.23 | 1.69 | 21 | 1 | 10.96 | 13.18 | 1.00 | 43.00 | 1.00 | 1.00 | 13.00 | 6.00 | 13.29 |
22 | 2000 | 22 | 23 | False | 0.00 | 45 | 21.23 | 1.69 | 22 | 1 | 0.00 | 13.29 | 1.00 | 44.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
23 | 2000 | 22 | 24 | False | 0.00 | 46 | 21.23 | 1.69 | 23 | 1 | 0.00 | 13.29 | 1.00 | 45.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
24 | 2000 | 22 | 25 | False | 0.00 | 47 | 21.23 | 1.69 | 24 | 1 | 0.00 | 13.29 | 1.00 | 46.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
25 | 2000 | 22 | 26 | False | 0.00 | 48 | 21.23 | 1.69 | 25 | 1 | 0.00 | 13.29 | 1.00 | 47.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
26 | 2000 | 22 | 27 | False | 0.00 | 49 | 21.23 | 1.69 | 26 | 1 | 0.00 | 13.29 | 1.00 | 48.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
27 | 2000 | 22 | 28 | False | 0.00 | 50 | 21.23 | 1.69 | 27 | 1 | 0.00 | 13.29 | 1.00 | 49.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
28 | 2000 | 22 | 29 | False | 0.00 | 51 | 21.23 | 1.69 | 28 | 1 | 0.00 | 13.29 | 1.00 | 50.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
29 | 2000 | 22 | 30 | False | 0.00 | 52 | 21.23 | 1.69 | 29 | 1 | 0.00 | 13.29 | 1.00 | 51.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
30 | 2000 | 22 | 31 | False | 0.00 | 53 | 21.23 | 1.69 | 30 | 1 | 0.00 | 13.29 | 1.00 | 52.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
31 | 2000 | 22 | 32 | False | 0.00 | 54 | 21.23 | 1.69 | 31 | 1 | 0.00 | 13.29 | 1.00 | 53.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
32 | 2000 | 22 | 33 | False | 0.00 | 55 | 21.23 | 1.69 | 32 | 1 | 0.00 | 13.29 | 1.00 | 54.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
33 | 2000 | 22 | 34 | False | 0.00 | 56 | 21.23 | 1.69 | 33 | 1 | 0.00 | 13.29 | 1.00 | 55.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
34 | 2000 | 22 | 35 | False | 0.00 | 57 | 21.23 | 1.69 | 34 | 1 | 0.00 | 13.29 | 1.00 | 56.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
35 | 2000 | 22 | 36 | False | 0.00 | 58 | 21.23 | 1.69 | 35 | 1 | 0.00 | 13.29 | 1.00 | 57.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
36 | 2000 | 22 | 37 | False | 0.00 | 59 | 21.23 | 1.69 | 36 | 1 | 0.00 | 13.29 | 1.00 | 58.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
37 | 2000 | 22 | 38 | False | 0.00 | 60 | 21.23 | 1.69 | 37 | 1 | 0.00 | 13.29 | 1.00 | 59.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
38 | 2000 | 22 | 39 | False | 0.00 | 61 | 21.23 | 1.69 | 38 | 1 | 0.00 | 13.29 | 1.00 | 60.00 | 1.00 | 0.00 | 14.00 | 7.00 | 13.29 |
Idea: Pinning the Tail
One challenge with the deep learning models is the tail.
We expect claims to generally tail off to zero in later development periods and eventually all be settled.
The training data shows claims tailing off to zero, but only for the early occurrence periods. With the payment period cut-off, we have no tail data for the more recent occurrence periods.
This is not a problem for the GLM, which is parametrised to share the development period effect across all occurrence periods, but more complex models like the neural networks can extrapolate the tail wildly.
The proposed solution below is to augment the data with tail development periods with zero payments.
extra_data = (
    dat.loc[dat.train_ind == 1, [*set(["claim_no", "occurrence_period", "development_period", "payment_period", "train_ind"] + data_cols)]]  # Training data
    .groupby("claim_no").last()  # Last training record per claim
    .rename(columns={"payment_period": "payment_period_as_at", "development_period": "data_as_at_development_period"})
    .assign(
        development_period = num_dev_periods + 1,  # set dev period to be tail
        payment_period = lambda df: df.occurrence_period + num_dev_periods + 1,
        backdate_periods = lambda df: num_dev_periods + 1 - df.payment_period,
        payment_size = 0
    )
    .reset_index()
)
extra_data.head()
claim_no | data_as_at_development_period | log1_outstanding_to_prior_period | log1_payment_to_prior_period | has_incurred_to_prior_period | payment_period_as_at | has_payment_to_prior_period | has_outstanding_to_prior_period | log1_incurred_to_prior_period | transaction_count_to_prior_period | payment_count_to_prior_period | occurrence_time | occurrence_period | notidel | train_ind | development_period | payment_period | backdate_periods | payment_size | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 39 | 0.00 | 12.50 | 1 | 40 | 1 | 0 | 12.50 | 9.00 | 5.00 | 0.73 | 1 | 1.13 | True | 40 | 41 | -1 | 0 |
1 | 2 | 39 | 0.00 | 12.50 | 1 | 40 | 1 | 0 | 12.50 | 9.00 | 5.00 | 0.33 | 1 | 1.26 | True | 40 | 41 | -1 | 0 |
2 | 3 | 39 | 0.00 | 11.72 | 1 | 40 | 1 | 0 | 11.72 | 9.00 | 6.00 | 0.52 | 1 | 1.86 | True | 40 | 41 | -1 | 0 |
3 | 4 | 39 | 0.00 | 12.41 | 1 | 40 | 1 | 0 | 12.41 | 8.00 | 5.00 | 0.74 | 1 | 0.72 | True | 40 | 41 | -1 | 0 |
4 | 5 | 39 | 0.00 | 11.91 | 1 | 40 | 1 | 0 | 11.91 | 8.00 | 5.00 | 0.62 | 1 | 3.18 | True | 40 | 41 | -1 | 0 |
The dataset gets quite large with the backdating, as we repeat each claim roughly development_period squared times: claim 2000, for example, has 18 training development periods, giving 18 × 19 / 2 = 171 backdated rows plus one tail record, i.e. the 172 rows shown further below. We will therefore create an nn_train that samples one record from each (claim_no, development_period) combination.
backdated_data = [backdate(dat, backdate_periods=i, keep_cols=data_cols) for i in range(0, num_dev_periods)]
all_data = pd.concat(backdated_data + [extra_data], axis="rows")

a = set(all_data.columns.to_list())
b = set(extra_data.columns.to_list())

assert list(b - a) == []
assert list(a - b) == []

nn_train_full = all_data.loc[all_data.train_ind == 1].loc[lambda df: ~np.isnan(df.payment_period_as_at)]  # Filter out invalid payment period as at
nn_test = all_data.loc[all_data.train_ind == 0].loc[lambda df: df.payment_period_as_at==(cutoff + 1)].fillna(0)  # As at balance date
features = data_cols + ["backdate_periods"]

nn_train = nn_train_full.groupby(["claim_no", "development_period"]).sample(n=1, random_state=42)
nn_train.index.size == dat[dat.train_ind==1].index.size

nn_dat = pd.concat([nn_train.assign(train_ind=True), nn_test.assign(train_ind=False)])
# Run below instead to not use these ideas for now:
# nn_train = dat.loc[dat.train_ind == 1]
# nn_test = dat.loc[dat.train_ind == 0]
features = data_cols + ["backdate_periods"]
A more sophisticated approach could be to sample different records per epoch. We create a batch sampling function to use in our training loop.
def claim_sampler(X, y):
    indices = torch.tensor(
        nn_train_full[["claim_no", "development_period", "data_as_at_development_period"]]
        .reset_index()
        .groupby(["claim_no", "development_period"])
        .sample(n=1)
        .index
    )
    return torch.index_select(X, 0, indices), torch.index_select(y, 0, indices)

use_batching_logic = True  # Set to False to omit this logic.
nn_train_full.loc[lambda df: df.claim_no == 2000].sort_values(["development_period", "data_as_at_development_period"])
claim_no | occurrence_period | development_period | train_ind | payment_size | payment_period | occurrence_time | notidel | data_as_at_development_period | backdate_periods | log1_outstanding_to_prior_period | log1_payment_to_prior_period | has_incurred_to_prior_period | payment_period_as_at | has_payment_to_prior_period | has_outstanding_to_prior_period | transaction_count_to_prior_period | payment_count_to_prior_period | log1_incurred_to_prior_period | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
75716 | 2000 | 22 | 1 | True | 0.00 | 23 | 21.23 | 1.69 | 1 | 0 | 0.00 | 0.00 | 0.00 | 23.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
75717 | 2000 | 22 | 2 | True | 0.00 | 24 | 21.23 | 1.69 | 1 | 1 | 0.00 | 0.00 | 0.00 | 23.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
75717 | 2000 | 22 | 2 | True | 0.00 | 24 | 21.23 | 1.69 | 2 | 0 | 10.46 | 0.00 | 1.00 | 24.00 | 0.00 | 1.00 | 1.00 | 0.00 | 10.46 |
75718 | 2000 | 22 | 3 | True | 12,924.67 | 25 | 21.23 | 1.69 | 1 | 2 | 0.00 | 0.00 | 0.00 | 23.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
75718 | 2000 | 22 | 3 | True | 12,924.67 | 25 | 21.23 | 1.69 | 2 | 1 | 10.46 | 0.00 | 1.00 | 24.00 | 0.00 | 1.00 | 1.00 | 0.00 | 10.46 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
75733 | 2000 | 22 | 18 | True | 0.00 | 40 | 21.23 | 1.69 | 15 | 3 | 12.25 | 10.94 | 1.00 | 37.00 | 1.00 | 1.00 | 8.00 | 4.00 | 12.49 |
75733 | 2000 | 22 | 18 | True | 0.00 | 40 | 21.23 | 1.69 | 16 | 2 | 12.25 | 10.94 | 1.00 | 38.00 | 1.00 | 1.00 | 8.00 | 4.00 | 12.49 |
75733 | 2000 | 22 | 18 | True | 0.00 | 40 | 21.23 | 1.69 | 17 | 1 | 12.00 | 11.18 | 1.00 | 39.00 | 1.00 | 1.00 | 9.00 | 5.00 | 12.36 |
75733 | 2000 | 22 | 18 | True | 0.00 | 40 | 21.23 | 1.69 | 18 | 0 | 11.95 | 11.18 | 1.00 | 40.00 | 1.00 | 1.00 | 10.00 | 5.00 | 12.33 |
1999 | 2000 | 22 | 40 | True | 0.00 | 62 | 21.23 | 1.69 | 18 | -22 | 11.95 | 11.18 | 1.00 | 40.00 | 1.00 | 1.00 | 10.00 | 5.00 | 12.33 |
172 rows × 19 columns
nn_test.loc[lambda df: df.claim_no == 2000]
claim_no | occurrence_period | development_period | train_ind | payment_size | payment_period | occurrence_time | notidel | data_as_at_development_period | backdate_periods | log1_outstanding_to_prior_period | log1_payment_to_prior_period | has_incurred_to_prior_period | payment_period_as_at | has_payment_to_prior_period | has_outstanding_to_prior_period | transaction_count_to_prior_period | payment_count_to_prior_period | log1_incurred_to_prior_period | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
75734 | 2000 | 22 | 19 | False | 460,153.69 | 41 | 21.23 | 1.69 | 19 | 0 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75735 | 2000 | 22 | 20 | False | 0.00 | 42 | 21.23 | 1.69 | 19 | 1 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75736 | 2000 | 22 | 21 | False | 57,363.51 | 43 | 21.23 | 1.69 | 19 | 2 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75737 | 2000 | 22 | 22 | False | 0.00 | 44 | 21.23 | 1.69 | 19 | 3 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75738 | 2000 | 22 | 23 | False | 0.00 | 45 | 21.23 | 1.69 | 19 | 4 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75739 | 2000 | 22 | 24 | False | 0.00 | 46 | 21.23 | 1.69 | 19 | 5 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75740 | 2000 | 22 | 25 | False | 0.00 | 47 | 21.23 | 1.69 | 19 | 6 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75741 | 2000 | 22 | 26 | False | 0.00 | 48 | 21.23 | 1.69 | 19 | 7 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75742 | 2000 | 22 | 27 | False | 0.00 | 49 | 21.23 | 1.69 | 19 | 8 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75743 | 2000 | 22 | 28 | False | 0.00 | 50 | 21.23 | 1.69 | 19 | 9 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75744 | 2000 | 22 | 29 | False | 0.00 | 51 | 21.23 | 1.69 | 19 | 10 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75745 | 2000 | 22 | 30 | False | 0.00 | 52 | 21.23 | 1.69 | 19 | 11 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75746 | 2000 | 22 | 31 | False | 0.00 | 53 | 21.23 | 1.69 | 19 | 12 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75747 | 2000 | 22 | 32 | False | 0.00 | 54 | 21.23 | 1.69 | 19 | 13 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75748 | 2000 | 22 | 33 | False | 0.00 | 55 | 21.23 | 1.69 | 19 | 14 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75749 | 2000 | 22 | 34 | False | 0.00 | 56 | 21.23 | 1.69 | 19 | 15 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75750 | 2000 | 22 | 35 | False | 0.00 | 57 | 21.23 | 1.69 | 19 | 16 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75751 | 2000 | 22 | 36 | False | 0.00 | 58 | 21.23 | 1.69 | 19 | 17 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75752 | 2000 | 22 | 37 | False | 0.00 | 59 | 21.23 | 1.69 | 19 | 18 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75753 | 2000 | 22 | 38 | False | 0.00 | 60 | 21.23 | 1.69 | 19 | 19 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
75754 | 2000 | 22 | 39 | False | 0.00 | 61 | 21.23 | 1.69 | 19 | 20 | 13.16 | 11.18 | 1.00 | 41.00 | 1.00 | 1.00 | 11.00 | 5.00 | 13.29 |
nn_test.loc[nn_test.occurrence_period == 40]
claim_no | occurrence_period | development_period | train_ind | payment_size | payment_period | occurrence_time | notidel | data_as_at_development_period | backdate_periods | log1_outstanding_to_prior_period | log1_payment_to_prior_period | has_incurred_to_prior_period | payment_period_as_at | has_payment_to_prior_period | has_outstanding_to_prior_period | transaction_count_to_prior_period | payment_count_to_prior_period | log1_incurred_to_prior_period | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
135553 | 3582 | 40 | 1 | False | 0.00 | 41 | 39.26 | 0.22 | 1 | 0 | 10.53 | 0.00 | 1.00 | 41.00 | 0.00 | 1.00 | 1.00 | 0.00 | 10.53 |
135630 | 3584 | 40 | 1 | False | 0.00 | 41 | 39.29 | 1.48 | 1 | 0 | 0.00 | 0.00 | 0.00 | 41.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
135707 | 3586 | 40 | 1 | False | 32,340.10 | 41 | 39.95 | 0.28 | 1 | 0 | 0.00 | 0.00 | 0.00 | 41.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
135781 | 3588 | 40 | 1 | False | 0.00 | 41 | 39.67 | 0.87 | 1 | 0 | 0.00 | 0.00 | 0.00 | 41.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
135820 | 3589 | 40 | 1 | False | 0.00 | 41 | 39.94 | 0.27 | 1 | 0 | 0.00 | 0.00 | 0.00 | 41.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
138157 | 3650 | 40 | 39 | False | 0.00 | 79 | 39.31 | 0.22 | 1 | 38 | 9.59 | 0.00 | 1.00 | 41.00 | 0.00 | 1.00 | 1.00 | 0.00 | 9.59 |
138344 | 3655 | 40 | 39 | False | 0.00 | 79 | 39.14 | 1.64 | 1 | 38 | 0.00 | 0.00 | 0.00 | 41.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
138494 | 3659 | 40 | 39 | False | 0.00 | 79 | 39.54 | 0.16 | 1 | 38 | 12.63 | 0.00 | 1.00 | 41.00 | 0.00 | 1.00 | 1.00 | 0.00 | 12.63 |
138534 | 3660 | 40 | 39 | False | 0.00 | 79 | 39.43 | 0.51 | 1 | 38 | 13.08 | 0.00 | 1.00 | 41.00 | 0.00 | 1.00 | 1.00 | 0.00 | 13.08 |
138574 | 3661 | 40 | 39 | False | 0.00 | 79 | 39.36 | 0.46 | 1 | 38 | 11.11 | 0.00 | 1.00 | 41.00 | 0.00 | 1.00 | 1.00 | 0.00 | 11.11 |
1209 rows × 19 columns
dat.loc[dat.occurrence_period == 40, nn_test.columns]
claim_no | occurrence_period | development_period | train_ind | payment_size | payment_period | occurrence_time | notidel | data_as_at_development_period | backdate_periods | log1_outstanding_to_prior_period | log1_payment_to_prior_period | has_incurred_to_prior_period | payment_period_as_at | has_payment_to_prior_period | has_outstanding_to_prior_period | transaction_count_to_prior_period | payment_count_to_prior_period | log1_incurred_to_prior_period | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
143240 | 3582 | 40 | 0 | True | 0.00 | 40 | 39.26 | 0.22 | 0 | 0 | 0.00 | 0.00 | 0 | 40 | 0 | 0 | 0.00 | 0.00 | 0.00 |
143241 | 3582 | 40 | 1 | False | 0.00 | 41 | 39.26 | 0.22 | 1 | 0 | 10.53 | 0.00 | 1 | 41 | 0 | 1 | 1.00 | 0.00 | 10.53 |
143242 | 3582 | 40 | 2 | False | 15,226.59 | 42 | 39.26 | 0.22 | 2 | 0 | 10.53 | 0.00 | 1 | 42 | 0 | 1 | 1.00 | 0.00 | 10.53 |
143243 | 3582 | 40 | 3 | False | 0.00 | 43 | 39.26 | 0.22 | 3 | 0 | 10.43 | 9.63 | 1 | 43 | 1 | 1 | 2.00 | 1.00 | 10.80 |
143244 | 3582 | 40 | 4 | False | 16,196.52 | 44 | 39.26 | 0.22 | 4 | 0 | 10.43 | 9.63 | 1 | 44 | 1 | 1 | 2.00 | 1.00 | 10.80 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
146515 | 3663 | 40 | 35 | False | 0.00 | 75 | 39.87 | 5.27 | 35 | 0 | 0.00 | 13.61 | 1 | 75 | 1 | 0 | 9.00 | 6.00 | 13.61 |
146516 | 3663 | 40 | 36 | False | 0.00 | 76 | 39.87 | 5.27 | 36 | 0 | 0.00 | 13.61 | 1 | 76 | 1 | 0 | 9.00 | 6.00 | 13.61 |
146517 | 3663 | 40 | 37 | False | 0.00 | 77 | 39.87 | 5.27 | 37 | 0 | 0.00 | 13.61 | 1 | 77 | 1 | 0 | 9.00 | 6.00 | 13.61 |
146518 | 3663 | 40 | 38 | False | 0.00 | 78 | 39.87 | 5.27 | 38 | 0 | 0.00 | 13.61 | 1 | 78 | 1 | 0 | 9.00 | 6.00 | 13.61 |
146519 | 3663 | 40 | 39 | False | 0.00 | 79 | 39.87 | 5.27 | 39 | 0 | 0.00 | 13.61 | 1 | 79 | 1 | 0 | 9.00 | 6.00 | 13.61 |
3092 rows × 19 columns
Detailed Neural Network
This follows a similar pipeline to the basic neural network, but now includes the individual claim features.
ResNet:
model_resnet_detailed = Pipeline(
    steps=[
        ("keep", ColumnKeeper(features)),
        ('zero_to_one', MinMaxScaler()),  # Important! Standardize deep learning inputs.
        ("model", RandomizedSearchCV(
            TabularNetRegressor(LogLinkResNet),
            parameters_nn,
            n_jobs=4,  # Run in parallel (small model)
            n_iter=cv_runs,  # Models train slowly, so try only a few models
            cv=RollingOriginSplit(5,5).split(groups=nn_train.payment_period),
            random_state=42
        )),
    ]
)

model_resnet_detailed.fit(
    nn_train,
    nn_train.loc[:, ["payment_size"]]
)

bst_res_det = model_resnet_detailed["model"].best_params_
print("best parameters:", bst_res_det)
cv_results_res_detailed = pd.DataFrame(model_resnet_detailed["model"].cv_results_)

# Refit best model for longer iters
model_resnet_detailed = Pipeline(
    steps=[
        ("keep", ColumnKeeper(features)),
        ('zero_to_one', MinMaxScaler()),  # Important! Standardize deep learning inputs.
        ("model", TabularNetRegressor(
            SplineNet,
            l1_penalty=bst_res_det["l1_penalty"],
            weight_decay=bst_res_det["weight_decay"],
            n_hidden=bst_res_det["n_hidden"],
            dropout=bst_res_det["dropout"],
            max_iter=nn_iter,
            max_lr=bst_res_det["max_lr"],
            batch_function=claim_sampler if use_batching_logic else None,
            rebatch_every_iter=mdn_iter/10,  # takes over 1s to resample so iterate a few epochs per resample
        ))
    ]
)

if use_batching_logic:
    model_resnet_detailed.fit(
        nn_train_full,
        nn_train_full.loc[:, ["payment_size"]]
    )
else:
    model_resnet_detailed.fit(
        nn_train,
        nn_train.loc[:, ["payment_size"]]
    )
best parameters: {'weight_decay': 0.01, 'verbose': 0, 'n_hidden': 10, 'max_lr': 0.05, 'max_iter': 100, 'l1_penalty': 0.0, 'keep_best_model': True, 'dropout': 0.5, 'clip_value': None}
Train RMSE: 54727.0859375 Train Loss: -72119.28125
refreshing batch on epoch 0
Train RMSE: 54085.41796875 Train Loss: -76427.2890625
refreshing batch on epoch 50
Train RMSE: 54353.5859375 Train Loss: -77095.7421875
refreshing batch on epoch 100
Train RMSE: 53831.265625 Train Loss: -77777.6484375
refreshing batch on epoch 150
Train RMSE: 53840.56640625 Train Loss: -77741.328125
refreshing batch on epoch 200
Train RMSE: 53813.30078125 Train Loss: -77861.6484375
refreshing batch on epoch 250
Train RMSE: 53815.609375 Train Loss: -77779.140625
refreshing batch on epoch 300
Train RMSE: 53772.734375 Train Loss: -78098.1171875
refreshing batch on epoch 350
Train RMSE: 53779.421875 Train Loss: -78083.203125
refreshing batch on epoch 400
Train RMSE: 53768.2265625 Train Loss: -78098.3671875
refreshing batch on epoch 450
make_model_subplots(model_resnet_detailed, dat)
make_model_subplots(model_resnet_detailed, nn_dat)
dat_resnet_det_pred, triangle_resnet_detailed = make_pred_set_and_triangle(model_resnet_detailed, nn_train, nn_test)
"pred_resnet_claims"] = model_resnet_detailed.predict(dat)
dat["pred_resnet_claims_decile"] = pd.qcut(dat["pred_resnet_claims"], 10, labels=False, duplicates='drop') dat[
# Train only
X_sum = dat.loc[dat.train_ind == 1].groupby("pred_resnet_claims_decile").agg("mean").reset_index()
X_sum = dat.groupby("pred_resnet_claims_decile").agg("mean").reset_index()
plt.scatter(X_sum.pred_resnet_claims, X_sum.payment_size)
m = max(X_sum.pred_resnet_claims.max(), X_sum.payment_size.max())
plt.plot([0, m], [0, m])
# Test only
X_sum = dat.loc[dat.train_ind == 0].groupby("pred_resnet_claims_decile").agg("mean").reset_index()
plt.scatter(X_sum.pred_resnet_claims, X_sum.payment_size)
m = max(X_sum.pred_resnet_claims.max(), X_sum.payment_size.max())
plt.plot([0, m], [0, m])
SplineNet:
model_splinenet_detailed = Pipeline(
    steps=[
        ("keep", ColumnKeeper(features)),
        ('zero_to_one', MinMaxScaler()),  # Important! Standardize deep learning inputs.
        ("model", RandomizedSearchCV(
            TabularNetRegressor(SplineNet),
            parameters_nn,
            n_jobs=4,  # Run in parallel (small model)
            n_iter=cv_runs,  # Models train slowly, so try only a few models
            cv=RollingOriginSplit(5, 5).split(groups=nn_train.payment_period),
            random_state=42
        )),
    ]
)

model_splinenet_detailed.fit(
    nn_train,
    nn_train.loc[:, ["payment_size"]]
)

bst_det = model_splinenet_detailed["model"].best_params_
print("best parameters:", bst_det)
cv_results_detailed = pd.DataFrame(model_splinenet_detailed["model"].cv_results_)
# Refit best model for longer iters
model_splinenet_detailed = Pipeline(
    steps=[
        ("keep", ColumnKeeper(features)),
        ('zero_to_one', MinMaxScaler()),  # Important! Standardize deep learning inputs.
        ("model", TabularNetRegressor(
            SplineNet,
            l1_penalty=bst_det["l1_penalty"],
            weight_decay=bst_det["weight_decay"],
            n_hidden=bst_det["n_hidden"],
            dropout=bst_det["dropout"],
            max_iter=nn_iter,
            max_lr=bst_det["max_lr"],
            batch_function=claim_sampler if use_batching_logic else None,
            rebatch_every_iter=mdn_iter / 10,  # takes over 1s to resample so iterate a few epochs per resample
        ))
    ]
)

if use_batching_logic:
    model_splinenet_detailed.fit(
        nn_train_full,
        nn_train_full.loc[:, ["payment_size"]]
    )
else:
    model_splinenet_detailed.fit(
        nn_train,
        nn_train.loc[:, ["payment_size"]]
    )
best parameters: {'weight_decay': 0.01, 'verbose': 0, 'n_hidden': 10, 'max_lr': 0.05, 'max_iter': 100, 'l1_penalty': 0.0, 'keep_best_model': True, 'dropout': 0.5, 'clip_value': None}
Train RMSE: 54725.71484375 Train Loss: -72119.28125
refreshing batch on epoch 0
Train RMSE: 54101.71484375 Train Loss: -76451.703125
refreshing batch on epoch 50
Train RMSE: 53882.55859375 Train Loss: -77524.34375
refreshing batch on epoch 100
Train RMSE: 53936.40234375 Train Loss: -76640.3984375
refreshing batch on epoch 150
Train RMSE: 53838.6875 Train Loss: -77731.515625
refreshing batch on epoch 200
Train RMSE: 53799.36328125 Train Loss: -77964.8125
refreshing batch on epoch 250
Train RMSE: 53773.82421875 Train Loss: -78052.8828125
refreshing batch on epoch 300
Train RMSE: 53774.40234375 Train Loss: -77963.5390625
refreshing batch on epoch 350
Train RMSE: 53782.2109375 Train Loss: -78030.90625
refreshing batch on epoch 400
Train RMSE: 53766.69921875 Train Loss: -78186.703125
refreshing batch on epoch 450
make_model_subplots(model_splinenet_detailed, dat)
make_model_subplots(model_splinenet_detailed, nn_dat)
dat_spline_det_pred, triangle_spline_detailed = make_pred_set_and_triangle(model_splinenet_detailed, nn_train, nn_test)
dat["pred_nn_claims"] = model_splinenet_detailed.predict(dat)
dat["pred_nn_claims_decile"] = pd.qcut(dat["pred_nn_claims"], 10, labels=False, duplicates='drop')
# Train only
X_sum = dat.loc[dat.train_ind == 1].groupby("pred_nn_claims_decile").agg("mean").reset_index()
X_sum = dat.groupby("pred_nn_claims_decile").agg("mean").reset_index()
plt.scatter(X_sum.pred_nn_claims, X_sum.payment_size)
m = max(X_sum.pred_nn_claims.max(), X_sum.payment_size.max())
plt.plot([0, m], [0, m])
# Test only
X_sum = dat.loc[dat.train_ind == 0].groupby("pred_nn_claims_decile").agg("mean").reset_index()
plt.scatter(X_sum.pred_nn_claims, X_sum.payment_size)
m = max(X_sum.pred_nn_claims.max(), X_sum.payment_size.max())
plt.plot([0, m], [0, m])
Detailed MDN
As we move to the larger MDN model, cross-validation adds considerably to the run-time, and based on the CV performance of the point-estimate models it is not clear that it would improve results.
So here the approach is:
- Single model run
- Add the features
- Bring across the interactions and dropout parameters from the point estimate CV
- Apply a low learning rate and gradient clipping for numerical stability.
model_MDN_detailed = Pipeline(
    steps=[
        ("keep", ColumnKeeper(features)),
        ('zero_to_one', MinMaxScaler()),  # Important! Standardize deep learning inputs.
        ("model", TabularNetRegressor(
            LognormalSplineMDN,
            n_hidden=bst_det["n_hidden"],
            max_iter=mdn_iter,
            max_lr=0.025,
            # weight decay and l1 may lead to unexpected behaviour, leave it off
            dropout=bst_det["dropout"],
            n_gaussians=3,
            criterion=log_mdn_loss_fn,
            init_extra=[model_splinenet_detailed["model"].module_.state_dict(), mdn_init_bias],
            keep_best_model=True,
            # clip_value=3.0,
            batch_function=claim_sampler if use_batching_logic else None,
            rebatch_every_iter=mdn_iter / 10,  # takes over 1s to resample so iterate a few epochs per resample
        ))
    ]
)

# Fit model
if use_batching_logic:
    model_MDN_detailed.fit(
        nn_train_full,
        nn_train_full.loc[:, ["payment_size"]]
    )
else:
    model_MDN_detailed.fit(
        nn_train,
        nn_train.loc[:, ["payment_size"]]
    )
Train RMSE: 53765.6328125 Train Loss: 155.98707580566406
refreshing batch on epoch 0
Train RMSE: 53765.6328125 Train Loss: 5.832543849945068
refreshing batch on epoch 50
Train RMSE: 53765.6328125 Train Loss: 2.8419244289398193
refreshing batch on epoch 100
Train RMSE: 53765.6328125 Train Loss: 2.398144006729126
refreshing batch on epoch 150
Train RMSE: 53765.6328125 Train Loss: 2.3014259338378906
refreshing batch on epoch 200
Train RMSE: 53765.6328125 Train Loss: 2.2549655437469482
refreshing batch on epoch 250
Train RMSE: 53765.6328125 Train Loss: 2.219618797302246
refreshing batch on epoch 300
Train RMSE: 53765.6328125 Train Loss: 2.192033290863037
refreshing batch on epoch 350
Train RMSE: 53765.6328125 Train Loss: 2.1750428676605225
refreshing batch on epoch 400
Train RMSE: 53765.6328125 Train Loss: 2.1675562858581543
refreshing batch on epoch 450
make_model_subplots(model_MDN_detailed, dat)
make_model_subplots(model_MDN_detailed, nn_dat)
make_distribution_subplots(model_MDN_detailed)
"pred_mdn_claims"] = model_MDN_detailed.predict(dat)
dat["pred_mdn_claims_decile"] = pd.qcut(dat["pred_mdn_claims"], 10, labels=False, duplicates='drop') dat[
# Train only
X_sum = dat.loc[dat.train_ind == 1].groupby("pred_mdn_claims_decile").agg("mean").reset_index()
X_sum = dat.groupby("pred_mdn_claims_decile").agg("mean").reset_index()
plt.scatter(X_sum.pred_mdn_claims, X_sum.payment_size)
m = max(X_sum.pred_mdn_claims.max(), X_sum.payment_size.max())
plt.plot([0, m], [0, m])
# Test only
X_sum = dat.loc[dat.train_ind == 0].groupby("pred_mdn_claims_decile").agg("mean").reset_index()
plt.scatter(X_sum.pred_mdn_claims, X_sum.payment_size)
m = max(X_sum.pred_mdn_claims.max(), X_sum.payment_size.max())
plt.plot([0, m], [0, m])
# for later comparison
dat_mdn_pred, triangle_mdn_detailed = make_pred_set_and_triangle(model_MDN_detailed, nn_train, nn_test)
dat_mdn_pred.loc[np.abs(dat_spline_det_pred.payment_size_cumulative - dat_mdn_pred.payment_size_cumulative) > 1]
claim_no | occurrence_period | development_period | train_ind | payment_size | payment_period | occurrence_time | notidel | data_as_at_development_period | backdate_periods | log1_outstanding_to_prior_period | log1_payment_to_prior_period | has_incurred_to_prior_period | payment_period_as_at | has_payment_to_prior_period | has_outstanding_to_prior_period | transaction_count_to_prior_period | payment_count_to_prior_period | log1_incurred_to_prior_period | payment_size_cumulative |
---|
dat_spline_det_pred.loc[lambda df: (df.claim_no == 349) & (df.train_ind == False)]
claim_no | occurrence_period | development_period | train_ind | payment_size | payment_period | occurrence_time | notidel | data_as_at_development_period | backdate_periods | log1_outstanding_to_prior_period | log1_payment_to_prior_period | has_incurred_to_prior_period | payment_period_as_at | has_payment_to_prior_period | has_outstanding_to_prior_period | transaction_count_to_prior_period | payment_count_to_prior_period | log1_incurred_to_prior_period | payment_size_cumulative | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13238 | 349 | 4 | 37 | False | 10,822.77 | 41 | 3.86 | 1.84 | 37 | 0 | 13.83 | 11.21 | 1.00 | 41.00 | 1.00 | 1.00 | 8.00 | 3.00 | 13.90 | 84,392.68 |
13239 | 349 | 4 | 38 | False | 10,290.16 | 42 | 3.86 | 1.84 | 37 | 1 | 13.83 | 11.21 | 1.00 | 41.00 | 1.00 | 1.00 | 8.00 | 3.00 | 13.90 | 94,682.83 |
13240 | 349 | 4 | 39 | False | 9,778.47 | 43 | 3.86 | 1.84 | 37 | 2 | 13.83 | 11.21 | 1.00 | 41.00 | 1.00 | 1.00 | 8.00 | 3.00 | 13.90 | 104,461.31 |
dat_mdn_pred.loc[lambda df: (df.claim_no == 349) & (df.train_ind == False)]
claim_no | occurrence_period | development_period | train_ind | payment_size | payment_period | occurrence_time | notidel | data_as_at_development_period | backdate_periods | log1_outstanding_to_prior_period | log1_payment_to_prior_period | has_incurred_to_prior_period | payment_period_as_at | has_payment_to_prior_period | has_outstanding_to_prior_period | transaction_count_to_prior_period | payment_count_to_prior_period | log1_incurred_to_prior_period | payment_size_cumulative | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13238 | 349 | 4 | 37 | False | 10,822.77 | 41 | 3.86 | 1.84 | 37 | 0 | 13.83 | 11.21 | 1.00 | 41.00 | 1.00 | 1.00 | 8.00 | 3.00 | 13.90 | 84,392.68 |
13239 | 349 | 4 | 38 | False | 10,290.16 | 42 | 3.86 | 1.84 | 37 | 1 | 13.83 | 11.21 | 1.00 | 41.00 | 1.00 | 1.00 | 8.00 | 3.00 | 13.90 | 94,682.83 |
13240 | 349 | 4 | 39 | False | 9,778.47 | 43 | 3.86 | 1.84 | 37 | 2 | 13.83 | 11.21 | 1.00 | 41.00 | 1.00 | 1.00 | 8.00 | 3.00 | 13.90 | 104,461.31 |
Gradient Boosting Individual Model
We create a GBM model to use as a baseline. An advantage of GBMs is that it is easy to get good, reasonable results: the model cannot extrapolate wildly in unexpected ways. The corresponding disadvantage is that it is unlikely to correctly extrapolate payment period trends. Since there are no payment period trends in this simulated data, the GBM should do well:
parameters_gbm = {
    "max_iter": [500, 1000, 2000, 5000],
    "l2_regularization": [0, 0.001, 0.01, 0.1, 0.3],
    "loss": ["poisson"],
    "learning_rate": [0.03, 0.07, 0.1]
}

gradient_boosting = Pipeline(
    steps=[
        ("keep", ColumnKeeper(features)),
        ('gbm', RandomizedSearchCV(
            HistGradientBoostingRegressor(),
            parameters_gbm,
            n_jobs=-1,  # Run in parallel
            n_iter=cv_runs,  # Models train slowly, so try only a few models
            cv=RollingOriginSplit(5, 5).split(groups=nn_train.payment_period),
            random_state=0
        )),
    ]
)

gradient_boosting.fit(
    nn_train,
    nn_train.payment_size
)
print(gradient_boosting["gbm"].best_params_)
{'max_iter': 2000, 'loss': 'poisson', 'learning_rate': 0.03, 'l2_regularization': 0.01}
make_model_subplots(gradient_boosting, dat)
make_model_subplots(gradient_boosting, nn_dat)
dat_gbm_pred, triangle_gbm_ind = make_pred_set_and_triangle(gradient_boosting, nn_train, nn_test)
Individual Claims Predictiveness
Check how the individual models perform on train and test data.
"pred_gbm_claims"] = gradient_boosting.predict(dat)
dat["pred_gbm_claims_decile"] = pd.qcut(dat["pred_gbm_claims"], 10, labels=False, duplicates='drop') dat[
# Train only
X_sum = dat.loc[dat.train_ind == 1].groupby("pred_gbm_claims_decile").agg("mean").reset_index()
X_sum = dat.groupby("pred_gbm_claims_decile").agg("mean").reset_index()
plt.scatter(X_sum.pred_gbm_claims, X_sum.payment_size)
m = max(X_sum.pred_gbm_claims.max(), X_sum.payment_size.max())
plt.plot([0, m], [0, m])
# Test only
X_sum = dat.loc[dat.train_ind == 0].groupby("pred_gbm_claims_decile").agg("mean").reset_index()
plt.scatter(X_sum.pred_gbm_claims, X_sum.payment_size)
m = max(X_sum.pred_gbm_claims.max(), X_sum.payment_size.max())
plt.plot([0, m], [0, m])
Ultimate Summary
Compare Ultimate projections from the different models
pd.options.display.float_format = '{:,.0f}'.format

results = pd.concat(
    [(
        triangle.loc[lambda df: df.payment_period == cutoff, ["occurrence_period", "payment_size_cumulative"]]
        .set_index("occurrence_period")
        .rename(columns={"payment_size_cumulative": "Paid to Date"})
    )] +
    [(
        df.loc[lambda df: df.development_period == num_dev_periods, ["occurrence_period", "payment_size_cumulative"]]
        .set_index("occurrence_period")
        .rename(columns={"payment_size_cumulative": l})
    ) for df, l in zip(
        [triangle, triangle_cl, triangle_glm_agg, triangle_glm_ind, triangle_spline_ind, triangle_resnet_do, triangle_nn_spline, triangle_mdn_spline, triangle_nn_cv, triangle_resnet_detailed, triangle_spline_detailed, triangle_mdn_detailed, triangle_gbm_ind],
        ["True Ultimate", "Chain Ladder", "GLM Chain Ladder", "GLM Individual", "GLM Spline", "D/O ResNet", "D/O SplineNet", "D/O SplineMDN", "D/O SplineNet CV", "Detailed ResNet", "Detailed SplineNet", "Detailed SplineMDN", "Detailed GBM"]
    )],
    axis="columns"
).sort_index()

results
Paid to Date | True Ultimate | Chain Ladder | GLM Chain Ladder | GLM Individual | GLM Spline | D/O ResNet | D/O SplineNet | D/O SplineMDN | D/O SplineNet CV | Detailed ResNet | Detailed SplineNet | Detailed SplineMDN | Detailed GBM | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
occurrence_period | ||||||||||||||
1 | 17,485,892 | 17,485,892 | 17,485,892 | 17,485,892 | 17,485,892 | 17,485,892 | 17,485,892 | 17,485,892 | 17,485,892 | 17,485,892 | 17,485,892 | 17,485,892 | 17,485,892 | 17,485,892 |
2 | 15,795,103 | 15,795,103 | 15,795,103 | 15,795,134 | 15,821,221 | 15,866,280 | 15,888,352 | 15,906,488 | 15,906,488 | 15,888,234 | 15,800,222 | 15,801,500 | 15,801,500 | 15,821,457 |
3 | 16,108,434 | 16,111,727 | 16,285,616 | 16,285,647 | 16,312,055 | 16,280,810 | 16,322,010 | 16,356,091 | 16,356,091 | 16,323,868 | 16,121,592 | 16,126,862 | 16,126,862 | 16,190,916 |
4 | 18,155,534 | 18,935,352 | 18,377,934 | 18,377,972 | 18,424,706 | 18,410,586 | 18,456,274 | 18,493,792 | 18,493,792 | 18,461,874 | 18,265,570 | 18,292,784 | 18,292,784 | 18,396,958 |
5 | 19,235,524 | 19,250,872 | 19,653,886 | 19,653,922 | 19,710,680 | 19,659,068 | 19,713,468 | 19,756,592 | 19,756,592 | 19,727,088 | 19,337,093 | 19,361,392 | 19,361,392 | 19,562,238 |
6 | 21,611,440 | 22,786,944 | 22,181,350 | 22,181,390 | 22,240,392 | 22,307,848 | 22,367,292 | 22,409,770 | 22,409,770 | 22,396,284 | 21,736,698 | 21,765,160 | 21,765,160 | 21,985,755 |
7 | 18,526,526 | 18,861,376 | 19,072,624 | 19,072,656 | 19,132,282 | 19,319,268 | 19,357,838 | 19,376,774 | 19,376,774 | 19,397,918 | 18,608,061 | 18,630,599 | 18,630,599 | 18,771,394 |
8 | 27,839,500 | 30,036,106 | 28,984,746 | 28,984,798 | 29,073,758 | 29,090,414 | 29,112,122 | 29,099,470 | 29,099,470 | 29,186,056 | 28,016,788 | 28,051,537 | 28,051,537 | 28,195,846 |
9 | 19,798,286 | 22,849,618 | 20,828,210 | 20,828,248 | 20,893,272 | 21,082,996 | 21,071,046 | 21,017,720 | 21,017,720 | 21,157,624 | 20,225,237 | 20,304,972 | 20,304,972 | 20,458,580 |
10 | 23,628,224 | 24,720,896 | 25,154,980 | 25,155,024 | 25,229,162 | 25,445,416 | 25,387,422 | 25,258,898 | 25,258,898 | 25,524,670 | 24,207,924 | 24,322,480 | 24,322,480 | 24,405,912 |
11 | 24,596,934 | 26,525,608 | 26,682,368 | 26,682,414 | 26,753,536 | 27,000,192 | 26,877,728 | 26,641,880 | 26,641,880 | 27,078,678 | 24,948,550 | 25,011,901 | 25,011,901 | 25,498,735 |
12 | 15,848,811 | 16,394,228 | 17,685,404 | 17,685,432 | 17,725,470 | 17,822,516 | 17,690,942 | 17,446,042 | 17,446,042 | 17,872,032 | 16,254,583 | 16,304,961 | 16,304,961 | 16,587,953 |
13 | 24,036,898 | 25,194,832 | 27,142,022 | 27,142,064 | 27,220,448 | 27,177,784 | 26,929,468 | 26,462,016 | 26,462,016 | 27,243,664 | 24,487,006 | 24,558,175 | 24,558,175 | 24,908,239 |
14 | 17,040,986 | 18,981,944 | 19,829,872 | 19,829,908 | 19,866,636 | 20,619,532 | 20,302,944 | 19,686,090 | 19,686,090 | 20,691,322 | 18,282,593 | 18,478,140 | 18,478,140 | 18,637,493 |
15 | 25,963,162 | 28,375,866 | 30,920,786 | 30,920,842 | 31,070,692 | 30,085,996 | 29,692,482 | 28,889,976 | 28,889,976 | 30,175,932 | 27,319,403 | 27,525,937 | 27,525,937 | 28,493,595 |
16 | 19,147,944 | 27,107,038 | 23,450,766 | 23,450,812 | 23,485,770 | 23,717,874 | 23,257,828 | 22,272,546 | 22,272,546 | 23,834,454 | 21,004,365 | 21,249,098 | 21,249,098 | 22,053,924 |
17 | 20,473,130 | 22,796,338 | 25,722,546 | 25,722,592 | 25,791,608 | 26,003,652 | 25,423,924 | 24,123,716 | 24,123,716 | 26,171,146 | 21,622,966 | 21,741,627 | 21,741,627 | 22,269,693 |
18 | 18,259,918 | 22,533,702 | 23,552,468 | 23,552,510 | 23,607,782 | 23,501,260 | 22,931,582 | 21,606,382 | 21,606,382 | 23,685,548 | 20,825,912 | 21,201,268 | 21,201,268 | 22,299,554 |
19 | 20,525,112 | 23,686,100 | 27,099,510 | 27,099,558 | 27,161,088 | 26,869,550 | 26,152,032 | 24,449,804 | 24,449,804 | 27,117,410 | 22,708,353 | 23,091,859 | 23,091,859 | 22,941,089 |
20 | 23,967,498 | 31,330,774 | 32,407,678 | 32,407,738 | 32,430,768 | 32,939,240 | 31,872,852 | 29,356,400 | 29,356,400 | 33,305,976 | 28,383,144 | 28,863,218 | 28,863,218 | 27,752,390 |
21 | 23,882,838 | 29,962,468 | 33,551,694 | 33,551,760 | 33,740,196 | 33,911,348 | 32,639,526 | 29,759,118 | 29,759,118 | 34,307,548 | 29,412,685 | 30,428,752 | 30,428,752 | 28,776,498 |
22 | 18,409,002 | 24,089,654 | 27,242,720 | 27,242,776 | 27,392,162 | 28,433,204 | 27,057,726 | 24,183,322 | 24,183,322 | 28,779,724 | 24,999,340 | 25,750,220 | 25,750,220 | 23,957,820 |
23 | 20,920,498 | 23,758,476 | 33,155,220 | 33,155,284 | 33,319,848 | 32,220,078 | 30,529,388 | 27,370,516 | 27,370,516 | 32,519,086 | 24,946,902 | 25,192,274 | 25,192,274 | 24,446,417 |
24 | 18,047,780 | 20,501,736 | 30,175,460 | 30,175,516 | 30,403,008 | 30,080,486 | 28,113,480 | 24,897,982 | 24,897,982 | 30,270,470 | 22,429,679 | 22,805,165 | 22,805,165 | 21,000,008 |
25 | 13,656,915 | 23,132,220 | 24,360,950 | 24,360,996 | 24,405,170 | 25,282,098 | 23,211,626 | 20,291,212 | 20,291,212 | 25,320,104 | 25,585,984 | 26,500,339 | 26,500,339 | 22,561,164 |
26 | 14,057,512 | 25,597,836 | 26,520,252 | 26,520,298 | 26,545,318 | 28,534,582 | 25,743,200 | 22,373,132 | 22,373,132 | 28,385,354 | 25,949,932 | 26,467,517 | 26,467,517 | 24,389,678 |
27 | 19,192,766 | 34,954,040 | 38,892,552 | 38,892,628 | 39,134,660 | 38,475,344 | 34,486,064 | 30,378,120 | 30,378,120 | 38,010,264 | 41,257,998 | 41,837,961 | 41,837,961 | 34,845,121 |
28 | 20,606,538 | 39,625,828 | 45,626,804 | 45,626,900 | 45,760,488 | 42,853,872 | 37,968,644 | 33,677,224 | 33,677,224 | 42,023,360 | 44,954,913 | 45,815,149 | 45,815,149 | 46,081,342 |
29 | 18,247,140 | 32,446,106 | 44,651,524 | 44,651,612 | 43,947,172 | 44,051,356 | 38,111,388 | 33,644,020 | 33,644,020 | 42,784,056 | 40,330,884 | 40,731,268 | 40,731,268 | 39,661,250 |
30 | 13,984,095 | 28,976,184 | 37,498,064 | 37,498,152 | 37,585,132 | 38,474,268 | 32,646,268 | 28,865,580 | 28,865,580 | 37,038,120 | 36,976,778 | 37,538,512 | 37,538,512 | 38,027,176 |
31 | 13,596,397 | 29,792,210 | 41,785,112 | 41,785,192 | 42,324,272 | 40,486,768 | 33,973,032 | 30,283,838 | 30,283,838 | 38,731,876 | 40,403,462 | 40,892,232 | 40,892,232 | 41,443,538 |
32 | 11,048,168 | 29,870,178 | 39,740,276 | 39,740,368 | 39,657,784 | 40,362,984 | 33,247,632 | 29,671,058 | 29,671,058 | 38,343,996 | 43,678,392 | 43,867,986 | 43,867,986 | 43,662,527 |
33 | 8,387,086 | 35,015,104 | 36,516,840 | 36,516,920 | 37,206,112 | 45,514,788 | 36,620,288 | 32,558,432 | 32,558,432 | 42,940,908 | 48,447,343 | 48,162,239 | 48,162,239 | 51,146,504 |
34 | 7,570,764 | 34,747,012 | 42,226,348 | 42,226,456 | 42,049,376 | 44,158,132 | 35,653,984 | 32,045,872 | 32,045,872 | 41,718,676 | 51,030,670 | 50,925,791 | 50,925,791 | 51,942,173 |
35 | 6,421,426 | 35,800,224 | 48,767,020 | 48,767,112 | 47,682,604 | 48,012,896 | 38,757,744 | 34,973,528 | 34,973,528 | 45,484,012 | 52,921,648 | 52,658,770 | 52,658,770 | 58,484,879 |
36 | 4,121,508 | 42,207,016 | 46,554,760 | 46,554,888 | 50,996,812 | 45,708,396 | 36,968,172 | 33,365,534 | 33,365,534 | 43,574,544 | 44,521,851 | 44,091,826 | 44,091,826 | 62,297,948 |
37 | 3,489,762 | 41,677,012 | 67,693,760 | 67,693,952 | 67,171,768 | 49,969,316 | 41,016,080 | 37,088,308 | 37,088,308 | 48,189,488 | 44,231,205 | 43,262,513 | 43,262,513 | 65,567,004 |
38 | 1,041,739 | 33,073,546 | 40,025,928 | 40,026,044 | 43,098,828 | 42,130,068 | 35,091,852 | 31,576,854 | 31,576,854 | 41,263,008 | 31,153,941 | 30,061,124 | 30,061,124 | 53,524,457 |
39 | 237,800 | 45,185,632 | 35,000,772 | 35,001,000 | 47,447,824 | 51,326,744 | 43,612,884 | 39,077,556 | 39,077,556 | 51,338,380 | 25,019,690 | 23,720,650 | 23,720,650 | 43,477,737 |
40 | 0 | 45,785,588 | 0 | 29,102 | 2,220,165 | 44,997,796 | 39,246,008 | 35,037,048 | 35,037,048 | 46,132,608 | 12,676,553 | 11,527,513 | 11,527,513 | 22,612,619 |
pd.options.display.float_format = '{:,.1f}'.format

leaderboard = []
for col in results.columns:
    leaderboard += [{
        "Method": col,
        "Outstanding Claims Liability": (results.loc[results.index < cutoff, col] - results.loc[results.index < cutoff, "Paid to Date"]).sum(),
        "Period Level MSE": np.sqrt(((results.loc[results.index < cutoff, col] - results.loc[results.index < cutoff, "True Ultimate"])**2).sum()),
        "Period Level Absolute Error": (np.abs(results.loc[results.index < cutoff, col] - results.loc[results.index < cutoff, "True Ultimate"]).sum()),
        "Total OCL Absolute Error": np.abs((results.loc[results.index < cutoff, col] - results.loc[results.index < cutoff, "True Ultimate"]).sum()),
        "Total OCL Absolute Percent Error": (
            100 * np.abs((results.loc[results.index < cutoff, col] - results.loc[results.index < cutoff, "True Ultimate"]).sum()) /
            (results.loc[results.index < cutoff, "True Ultimate"] - results.loc[results.index < cutoff, "Paid to Date"]).sum()
        )
    }]

pd.DataFrame(leaderboard).sort_values("Total OCL Absolute Percent Error")
Method | Outstanding Claims Liability | Period Level MSE | Period Level Absolute Error | Total OCL Absolute Error | Total OCL Absolute Percent Error | |
---|---|---|---|---|---|---|
1 | True Ultimate | 415,208,224.0 | 0.0 | 0.0 | 0.0 | 0.0 |
7 | D/O SplineNet | 377,212,928.0 | 17,353,778.0 | 72,897,200.0 | 37,995,252.0 | 9.2 |
8 | D/O SplineMDN | 377,212,928.0 | 17,353,778.0 | 72,897,200.0 | 37,995,252.0 | 9.2 |
6 | D/O ResNet | 456,777,888.0 | 16,902,854.0 | 75,666,032.0 | 41,569,676.0 | 10.0 |
10 | Detailed ResNet | 478,930,661.1 | 41,708,402.1 | 157,432,251.7 | 63,722,452.3 | 15.3 |
11 | Detailed SplineNet | 483,915,060.2 | 42,700,794.0 | 161,107,670.6 | 68,706,851.4 | 16.5 |
12 | Detailed SplineMDN | 483,915,060.2 | 42,700,794.0 | 161,107,670.6 | 68,706,851.4 | 16.5 |
9 | D/O SplineNet CV | 548,784,064.0 | 31,238,146.0 | 146,933,376.0 | 133,575,840.0 | 32.2 |
2 | Chain Ladder | 553,335,232.0 | 43,015,804.0 | 174,280,848.0 | 138,127,024.0 | 33.3 |
3 | GLM Chain Ladder | 553,337,792.0 | 43,016,100.0 | 174,282,560.0 | 138,129,600.0 | 33.3 |
13 | Detailed GBM | 563,046,264.0 | 54,976,179.2 | 193,207,856.1 | 147,838,055.2 | 35.6 |
5 | GLM Spline | 565,708,352.0 | 35,842,056.0 | 164,710,784.0 | 150,500,096.0 | 36.2 |
4 | GLM Individual | 574,341,184.0 | 42,662,360.0 | 174,327,264.0 | 159,132,944.0 | 38.3 |
0 | Paid to Date | 0.0 | 102,023,488.0 | 415,208,224.0 | 415,208,224.0 | 100.0 |
The SplineMDN performs identically to the SplineNet on point-estimate error measures. This is by design: the fitted SplineNet weights are loaded into the MDN's point-estimate component and set to non-trainable.
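As an illustration only, the weight copy and freeze step can be sketched as follows; the attribute name point_estimate_net is a placeholder rather than the notebook's actual module structure:

def load_and_freeze_point_estimate(mdn_module, splinenet_module):
    # Both arguments are torch.nn.Module instances.
    # Copy the fitted SplineNet weights into the MDN's point-estimate sub-module...
    mdn_module.point_estimate_net.load_state_dict(splinenet_module.state_dict())
    # ...then switch off gradients so these weights stay fixed while the MDN trains.
    for param in mdn_module.point_estimate_net.parameters():
        param.requires_grad = False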
Summary
We have taken a simple chain ladder model and added successive enhancements, expanding it first into a GLM, then into an individual-claims neural network, and finally into a probabilistic mixture density network with a specialised architecture and loss function. Key steps, including the data transformations, are all transparent in this notebook. We also implemented, in Python, a framework for wrapping PyTorch modules as scikit-learn models (a minimal illustration follows below) and a rolling origin cross-validation approach.
Resulting models perform well compared to benchmarks of the chain ladder and GBM, across all 5 SPLICE datasets.
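To illustrate the PyTorch-to-scikit-learn wrapper idea, here is a minimal sketch. It is not the notebook's TabularNetRegressor (which adds the log-link initialisation, batching, keep-best-model and regularisation logic described below); it only shows the fit/predict plumbing, and it assumes the wrapped module class takes the number of input features as its only constructor argument and outputs the log of the expected response:

import numpy as np
import torch
from sklearn.base import BaseEstimator, RegressorMixin

class MinimalTorchRegressor(BaseEstimator, RegressorMixin):
    """Barebones sklearn-style wrapper around a PyTorch module class."""

    def __init__(self, module_class, max_iter=100, lr=0.01):
        self.module_class = module_class
        self.max_iter = max_iter
        self.lr = lr

    def fit(self, X, y):
        X_t = torch.tensor(np.asarray(X), dtype=torch.float32)
        y_t = torch.tensor(np.asarray(y).reshape(-1, 1), dtype=torch.float32)
        self.module_ = self.module_class(X_t.shape[1])
        optimiser = torch.optim.AdamW(self.module_.parameters(), lr=self.lr)
        # Poisson negative log-likelihood on a log-link output.
        loss_fn = torch.nn.PoissonNLLLoss(log_input=True)
        for _ in range(self.max_iter):
            optimiser.zero_grad()
            loss = loss_fn(self.module_(X_t), y_t)
            loss.backward()
            optimiser.step()
        return self

    def predict(self, X):
        X_t = torch.tensor(np.asarray(X), dtype=torch.float32)
        with torch.no_grad():
            # Undo the log-link to return predictions on the original scale.
            return torch.exp(self.module_(X_t)).numpy().ravel()

Because the wrapper exposes fit and predict with the scikit-learn conventions, it can be dropped into Pipeline and RandomizedSearchCV as in the cells above.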
A summary of the final “Spline-based Individual Mixture Density Network” model design is as follows:
Architecture
- One-way effects - spline based architecture
- Log-link
- ELU activation due to low neuron count
- Adjusted log-normal MDN approach, which also takes into account the observations with zero payments (see the sketch after this list):
  - Calculate point estimates from a separate module, first fit using a Poisson loss
  - Back-solve \(\mu\) from the point estimate and \(\sigma\), using the fact that the mean of a log-normal is \(\exp(\mu + \sigma^2/2)\)
  - Apply the loss to \(\log(y + \epsilon)\) rather than \(y\) (with \(\epsilon\) being some small value)
  - Use \(\sqrt{\mathrm{ReLU}(x)} + \epsilon\) rather than \(\exp\) as the activation for \(\sigma\).
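To make these steps concrete, a sketch of the log-normal component calculations is shown below. The names (sigma_activation, point_estimate, EPS) are illustrative only, and the notebook's log_mdn_loss_fn additionally handles the mixture weights and the zero-payment component:

import torch

EPS = 1e-6  # small constant keeping logs and divisions finite

def sigma_activation(sigma_raw):
    # sqrt(ReLU(x)) + eps instead of exp(x): stays positive but grows slowly, for stability.
    return torch.sqrt(torch.relu(sigma_raw)) + EPS

def mu_from_point_estimate(point_estimate, sigma):
    # The mean of a log-normal is exp(mu + sigma^2 / 2), so given a point
    # estimate m of the mean we back-solve mu = log(m) - sigma^2 / 2.
    return torch.log(point_estimate + EPS) - 0.5 * sigma ** 2

def lognormal_nll(y, point_estimate, sigma_raw):
    # Negative log-likelihood (up to constants) applied to log(y + eps) rather than y.
    sigma = sigma_activation(sigma_raw)
    mu = mu_from_point_estimate(point_estimate, sigma)
    log_y = torch.log(y + EPS)
    return 0.5 * ((log_y - mu) / sigma) ** 2 + torch.log(sigma)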
Initialisation:
- Final-layer coefficients generally start at zero, which improves numerical stability
- Initialising the bias at log(mean) for the point-estimate models improves convergence (see the sketch after this list)
- For the MDN, initialise the biases at values representative of zero, small and large claims
- Initialise the weights of the MDN point-estimate component to be identical to the fitted SplineNet, and make them non-trainable.
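For example, the log(mean) bias initialisation for a log-link point-estimate network can be sketched as below; output_layer is a placeholder name for the final linear layer:

import numpy as np
import torch

def init_output_bias(module, y_train):
    # With a log-link and all other weights at zero, exp(bias) is the prediction,
    # so starting the bias at log(mean(y)) starts the model at the overall average.
    with torch.no_grad():
        module.output_layer.bias.fill_(float(np.log(np.asarray(y_train).mean() + 1e-6)))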
Regularisation:
- L1 penalty, weight decay (similar to L2), dropout, and a constant multiplier factor
- Rolling origin cross-validation, set up in Python (see the sketch after this list)
- Optimal combination found via random grid search; Bayesian optimisation was decided against due to its added complexity.
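A simplified version of the rolling origin splitter, yielding train/validation index pairs based on payment period, might look like the following (the notebook's RollingOriginSplit may differ in its exact windowing):

import numpy as np

class SimpleRollingOriginSplit:
    """Yield train/validation index pairs; validation is a later block of payment periods."""

    def __init__(self, n_splits=5, validation_periods=5):
        self.n_splits = n_splits
        self.validation_periods = validation_periods

    def split(self, groups):
        periods = np.asarray(groups)
        max_period = periods.max()
        for i in range(self.n_splits, 0, -1):
            # Each fold trains on everything up to the origin and validates on the next block.
            val_end = max_period - (i - 1) * self.validation_periods
            val_start = val_end - self.validation_periods
            train_idx = np.where(periods <= val_start)[0]
            val_idx = np.where((periods > val_start) & (periods <= val_end))[0]
            if len(train_idx) and len(val_idx):
                yield train_idx, val_idx

The resulting generator can be passed directly as the cv argument of RandomizedSearchCV, as in the pipelines above.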
Optimiser:
- AdamW with OneCycle Learning rate scheduler provides fairly fast convergence
Batch size:
- Use entire dataset per batch. Data is sparse so large batch size is preferred.
Calculation:
- GPU acceleration using Nvidia GPUs where available (see the sketch after this list).
- Apple M1 GPUs are also usable via MPS, but the time savings versus CPU appear limited on our current choice of machine.
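The device selection itself follows the usual PyTorch pattern; a sketch of that pattern, not the notebook's exact code:

import torch

# Prefer CUDA, then Apple silicon via MPS, otherwise fall back to CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")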
Export
# Export code to see the values in Excel etc.
export = True
if export:
    triangle.pivot(index="occurrence_period", columns="development_period", values="payment_size_cumulative").to_csv(f"payment_triangle_{dataset_no}.csv")
    triangle_train.pivot(index="occurrence_period", columns="development_period", values="payment_size_cumulative").to_csv(f"payment_triangle_train_{dataset_no}.csv")
    triangle_cl.pivot(index="occurrence_period", columns="development_period", values="payment_size_cumulative").to_csv(f"payment_triangle_cl_{dataset_no}.csv")
    triangle_glm_agg.pivot(index="occurrence_period", columns="development_period", values="payment_size_cumulative").to_csv(f"payment_triangle_glm_agg_{dataset_no}.csv")
    triangle_glm_ind.pivot(index="occurrence_period", columns="development_period", values="payment_size_cumulative").to_csv(f"payment_triangle_glm_ind_{dataset_no}.csv")
    triangle_nn_cv.pivot(index="occurrence_period", columns="development_period", values="payment_size_cumulative").to_csv(f"payment_triangle_nn_cv_{dataset_no}.csv")
    triangle_spline_detailed.pivot(index="occurrence_period", columns="development_period", values="payment_size_cumulative").to_csv(f"payment_triangle_nn_det_{dataset_no}.csv")
    triangle_mdn_detailed.pivot(index="occurrence_period", columns="development_period", values="payment_size_cumulative").to_csv(f"payment_triangle_mdn_det_{dataset_no}.csv")
    pd.DataFrame(leaderboard).sort_values("Total OCL Absolute Error").to_csv(f"leaderboard_{dataset_no}.csv")
    results.to_csv(f"payment_results_{dataset_no}.csv")