This vignette illustrates the practical use of Generalized Additive Models (GAMs) for analyzing insurance data, using the beMTPL dataset from Charpentier (2014). The dataset describes the contracts and claims of a Belgian motor third-party liability insurance portfolio, that is, public liability cover for drivers. Our objective is to develop a model of the factors that affect claim occurrence, with a special focus on elderly drivers.
The data used in this vignette come from a Belgian motor third-party liability insurance portfolio.
The dataset, beMTPL, contains contract and client details obtained from a Belgian insurance company.
For convenience, the beMTPL table will be referred to as CLAIMS.
Dictionaries
The list of the 22 variables from the beMTPL dataset is reported in Table 1.
Table 1: Content of the beMTPL dataset: CLAIMS
| Attribute | Type | Description |
|---|---|---|
| insurance_contract | Numeric | Unique identifier for the contract |
| policy_year | Numeric | Year of study or observation for the insured person |
| insured_year_birth | Numeric | Insured’s year of birth |
| exposure | Numeric | Exposure duration in years |
| vehicle_age | Numeric | Age of the vehicle in years |
| policy_holder_age | Numeric | Seniority of the insured at the insurance agency |
| driver_license_age | Numeric | Age of the driver’s licence |
| vehicle_brand | Character | Brand of the vehicle |
| mileage | Numeric | Mileage of the vehicle |
| vehicle_power | Numeric | Power value of the vehicle |
| catalog_value | Numeric | Catalog value of the vehicle |
| claim_value | Numeric | Value of the claim |
| number_of_liability_claims | Numeric | Number of liability claims |
| number_of_bodily_injury_liability_claims | Numeric | Number of bodily injury liability claims |
| claim_time | Numeric | Time of the accident |
| claim_responsibility_rate | Numeric | Rate of responsibility for the claim (100% = full responsibility, 0% = no responsibility) |
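Table 1 stores the insured's year of birth rather than an age, so an age variable has to be derived before the elderly portfolio can be studied. The sketch below shows one way to do this on a few made-up rows that reuse the column names from Table 1. The formula (policy year minus year of birth), the 60-year threshold, and all values are assumptions for illustration, not the vignette's actual preprocessing.

```r
# Toy rows using column names from Table 1 (values are made up for illustration).
claims <- data.frame(
  insurance_contract         = c(1, 2, 3),
  policy_year                = c(2016, 2016, 2016),
  insured_year_birth         = c(1950, 1934, 1988),
  exposure                   = c(1.0, 0.5, 0.25),
  number_of_liability_claims = c(0, 1, 0)
)

# Assumption: the insured's age is the policy year minus the year of birth.
claims$insured_age <- claims$policy_year - claims$insured_year_birth

# Keep the elderly subset; the 60-year threshold is an illustrative choice.
elderly <- claims[claims$insured_age >= 60, ]

# Observed claim frequency per year of exposure in that subset.
freq <- sum(elderly$number_of_liability_claims) / sum(elderly$exposure)
```

Dividing by total exposure rather than by the number of contracts keeps partial-year policies from understating the claim frequency.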
In motor third-party liability insurance, and for elderly drivers in particular, Generalized Additive Models (GAMs) are a flexible tool for modeling and predicting accident frequencies, repair costs, and claim patterns.
With a GAM, insurers can refine pricing strategies and quantify the specific risks associated with elderly drivers.
In this analysis, we explore the relationship between the response variable target and the explanatory variables DriverAge and vehicle_age. This modeling framework follows Agresti (2013), who emphasizes the importance of considering multiple explanatory factors in regression analysis.
To model the frequency of insurance claims, we fit a Generalized Additive Model (GAM) to the response variable ClaimNB, the count of insurance claims, which is assumed to follow a quasi-Poisson distribution:
\[\text{ClaimNB}_i \sim \text{Quasi-Poisson}(\lambda_i)\]
where \(\lambda_i\) is the mean rate of claims for observation \(i\). The GAM approach allows for flexible, nonlinear relationships between \(\lambda_i\) and the predictor variables through the use of smooth functions. Specifically, we express the natural logarithm of \(\lambda_i\) as a sum of these smooth functions and an additional term accounting for exposure:
\[\log(\lambda_i) = \beta_0 + f_1(\text{insured age}_i) + f_2(\text{vehicle age}_i) + \log(\text{exposure}_i)\]
where \(f_1(\text{insured age}_i)\) and \(f_2(\text{vehicle age}_i)\) are smooth functions of the predictor variables.
In this model, DriverAge represents the age of the insured individual, vehicle_age denotes the age of the vehicle, and \(\log(\text{exposure}_i)\) is an offset that adjusts for the exposure duration. The intercept \(\beta_0\) and the smooth functions \(f_1\) and \(f_2\) are estimated from the data to quantify their impact on the expected rate of claims. The smooth functions let the model capture complex, nonlinear relationships between the predictors and the response variable that a purely linear predictor would miss.
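As a minimal sketch of how such a model can be specified, the block below fits a quasi-Poisson GAM with a log-exposure offset using the mgcv package. It runs on simulated data: the data frame `sim`, its column names, and the simulated effect sizes are assumptions made for illustration and are not the vignette's CLAIMS table or its fitted object.

```r
library(mgcv)  # recommended package shipped with standard R installations

set.seed(1)
n <- 2000
# Simulated stand-in for CLAIMS; column names and effect sizes are assumptions.
sim <- data.frame(
  insured_age = runif(n, 60, 95),
  vehicle_age = runif(n, 0, 20),
  exposure    = runif(n, 0.1, 1)
)
# Simulated log rate: flat up to age 80, then rising; mild vehicle-age effect.
log_rate <- -2 + 0.05 * pmax(sim$insured_age - 80, 0) + 0.02 * sim$vehicle_age
sim$claims <- rpois(n, sim$exposure * exp(log_rate))

# log(lambda) = beta_0 + f1(insured_age) + f2(vehicle_age) + log(exposure)
fit <- gam(
  claims ~ s(insured_age) + s(vehicle_age) + offset(log(exposure)),
  family = quasipoisson(link = "log"),
  data   = sim
)

# Baseline claim rate per unit of exposure implied by the intercept.
baseline_rate <- exp(coef(fit)[["(Intercept)"]])
```

Entering `log(exposure)` through `offset()` fixes its coefficient at 1, so the expected count scales proportionally with the exposure duration and the smooth terms describe the rate per year insured.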
The estimated \(\lambda\) parameter, which represents the mean number of claims, is 0.37.
This generalized additive model (GAM) predicts the number of claims based on insured_age and vehicle_age as predictors. The smooth terms in the model are statistically significant, indicating that both insured_age and vehicle_age have a meaningful effect on the number of claims.
A smooth term such as s(insured_age) is not summarized by a single coefficient, so the direction of its effect is read from the shape of the estimated curve rather than from a sign. An increasing curve for s(insured_age) means that a higher insured age is associated with a higher expected log count of total liability claims. Similarly, an increasing curve for s(vehicle_age) means that an older vehicle is linked to a higher expected log count of claims.
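The significance of the smooth terms is usually checked in the model summary, which reports an effective degrees of freedom (edf) and an approximate p-value for each smooth. The sketch below shows where these quantities sit in an mgcv summary object. It uses simulated data, so the data frame `sim` and its columns are assumptions and the numbers it produces are not the vignette's results.

```r
library(mgcv)

set.seed(2)
n <- 1500
# Simulated stand-in data; names and effect sizes are assumptions.
sim <- data.frame(
  insured_age = runif(n, 60, 95),
  vehicle_age = runif(n, 0, 20),
  exposure    = runif(n, 0.1, 1)
)
sim$claims <- rpois(
  n,
  sim$exposure * exp(-2 + 0.05 * pmax(sim$insured_age - 80, 0) + 0.02 * sim$vehicle_age)
)

fit <- gam(
  claims ~ s(insured_age) + s(vehicle_age) + offset(log(exposure)),
  family = quasipoisson, data = sim
)

sm <- summary(fit)
s_tab <- sm$s.table  # one row per smooth: edf, reference df, test statistic, p-value
p_tab <- sm$p.table  # parametric part: here only the intercept

# An edf close to 1 indicates a nearly linear effect; larger values mean more curvature.
edf_by_term <- s_tab[, "edf"]
```

The table has no signed slope for a smooth, which is why the direction of each effect has to be read from the plotted curve.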
The plot displays the smooth function of insured_age from the Generalized Additive Model (GAM). The solid line represents the estimated effect of insured_age on the log count of claims, and the dashed lines mark an approximate 95% confidence band around it.
Trend: The effect of insured_age is relatively flat for ages 60 to 70, suggesting minimal impact on claims. However, as age increases beyond 80, the smooth function rises, indicating a higher expected log count of claims with increasing age.
Nonlinearity: The curve demonstrates a nonlinear relationship, capturing the increasing risk associated with older insured individuals.
Significance: The upward trend, especially beyond age 80, suggests a significant increase in claim risk for older drivers, with the confidence intervals indicating reliability in these estimates.
This plot suggests that insurers should consider the increasing risk with age, particularly for insured individuals over 80 years old, when assessing premiums and risk.
Code to create the following graph:
# Plot the second smooth term of the fitted GAM object reg
plot(reg, select = 2)
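The curve and its band can also be extracted as numbers, which is convenient for reporting the age effect in a table or redrawing it with another plotting tool. The sketch below uses `predict()` with `type = "terms"` on a model fitted to simulated data. The object `fit`, the data frame `sim`, and the age grid are assumptions for illustration and are not the vignette's `reg` object. The band is taken as two standard errors on either side of the estimate, mirroring the default band drawn by mgcv's plot method.

```r
library(mgcv)

set.seed(3)
n <- 1500
# Simulated stand-in data; names and effect sizes are assumptions.
sim <- data.frame(
  insured_age = runif(n, 60, 95),
  exposure    = runif(n, 0.1, 1)
)
sim$claims <- rpois(n, sim$exposure * exp(-2 + 0.05 * pmax(sim$insured_age - 80, 0)))

fit <- gam(
  claims ~ s(insured_age) + offset(log(exposure)),
  family = quasipoisson, data = sim
)

# Evaluate the smooth on a grid of ages; exposure is needed because of the offset.
grid <- data.frame(insured_age = seq(60, 95, by = 1), exposure = 1)
pr <- predict(fit, newdata = grid, type = "terms", se.fit = TRUE)

effect <- pr$fit[, "s(insured_age)"]     # centred effect on the log scale
se     <- pr$se.fit[, "s(insured_age)"]  # pointwise standard error

band <- data.frame(
  age    = grid$insured_age,
  effect = effect,
  lower  = effect - 2 * se,
  upper  = effect + 2 * se
)
```

Because the smooth is centred, `effect` is relative to the average log rate; `exp(effect)` gives the multiplicative change in expected claims at each age.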
References
Agresti, Alan. 2013. Categorical Data Analysis, 3rd Edition.
For similar claim frequency datasets with a Poisson-like distribution, see freMTPL (French automobile dataset, import with data("freMTPLfreq")), norauto (Norwegian automobile dataset, import with data("norauto")), ausprivauto0405 (Australian automobile dataset, import with data("ausprivauto0405")), and pg17trainpol (import with data("pg17trainpol")).