Theory Behind Pareto/NBD Part 1
1 Introduction
The Pareto/NBD model was developed by Schmittlein et al. (1987) to describe repeat-buying behavior in a noncontractual setting.
The model is meant to answer 4 key questions about a firm's list of customers:
How many “alive” customers does the firm now have?
How has this customer base grown over the past year?
Which individuals on this list most likely represent active customers? Inactive customers?
What level of transactions should be expected next year by those on the list, both individually and collectively?
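In order to answer these questions, we need to build up a model that estimates two quantities:
What is the probability that a customer is still “alive”, given the observed purchase history?
What is the expected number of future transactions of a customer, given the observed purchase history?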
2 PNBD Model Assumptions
2.1 Two stages in the lifetime
Customers go through 2 stages in their “lifetime”: they are “alive” for some period of time, then become permanently inactive.
2.2 Poisson Purchase
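While a customer is alive, the number of transactions $X(t)$ made in a time period of length $t$ follows a Poisson distribution with parameter $\lambda t$:
$$P(X(t) = x \mid \lambda) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}, \quad x = 0, 1, 2, \ldots$$
This is equivalent to assuming that the time between transactions is exponentially distributed with transaction rate $\lambda$:
$$f(t_j - t_{j-1} \mid \lambda) = \lambda e^{-\lambda (t_j - t_{j-1})}, \quad t_j > t_{j-1} \geq 0,$$
where $t_j$ is the time of the $j$-th purchase.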
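The link between exponential gaps and Poisson counts can be checked with a small simulation. This is only a sketch: the values of `lam` and `t` are illustrative assumptions, not taken from the paper. If the gaps between purchases are exponential with rate `lam`, the number of purchases in a window of length `t` should have mean and variance both close to `lam * t`.

```python
import random

def count_purchases(lam, t, rng):
    """Count purchases in (0, t] when gaps between purchases are Exponential(lam)."""
    n = 0
    clock = rng.expovariate(lam)      # time of the first purchase
    while clock <= t:
        n += 1
        clock += rng.expovariate(lam)  # add the next inter-purchase gap
    return n

rng = random.Random(42)
lam, t = 2.0, 3.0                      # illustrative values (assumption)
counts = [count_purchases(lam, t, rng) for _ in range(20000)]

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(f"lam*t = {lam * t:.2f}, sample mean = {mean:.2f}, sample variance = {var:.2f}")
```

A Poisson distribution has equal mean and variance, so seeing both sample statistics near `lam * t` is consistent with the equivalence stated above.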
2.3 Exponential Lifetime
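A customer’s unobserved “lifetime” of length $\tau$ (after which the customer is permanently inactive) is exponentially distributed with dropout rate $\mu$:
$$f(\tau \mid \mu) = \mu e^{-\mu \tau}, \quad \tau > 0.$$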
2.4 Gamma transaction rate
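Heterogeneity in transaction rates across customers follows a gamma distribution with shape parameter $r$ and rate parameter $\alpha$:
$$g(\lambda \mid r, \alpha) = \frac{\alpha^r \lambda^{r-1} e^{-\alpha \lambda}}{\Gamma(r)}, \quad \lambda > 0.$$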
2.5 Gamma dropout rate
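Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter $s$ and rate parameter $\beta$:
$$g(\mu \mid s, \beta) = \frac{\beta^s \mu^{s-1} e^{-\beta \mu}}{\Gamma(s)}, \quad \mu > 0.$$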
2.6 Two processes are Independent
The transaction rate $\lambda$ and the dropout rate $\mu$ vary independently across customers.
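Taken together, the assumptions in this section describe a simple generative process, sketched below for illustration. The function name `simulate_customer`, the observation window `T`, and all parameter values are hypothetical. The sketch assumes the usual Pareto/NBD symbols: transaction rate gamma distributed with shape `r` and rate `alpha`, dropout rate gamma distributed with shape `s` and rate `beta`. Note that Python's `random.gammavariate` takes a shape and a scale, so the rate is inverted.

```python
import random

def simulate_customer(r, alpha, s, beta, T, rng):
    """Simulate one customer's purchases on (0, T] under the assumptions above."""
    lam = rng.gammavariate(r, 1.0 / alpha)   # transaction rate (scale = 1 / rate)
    mu = rng.gammavariate(s, 1.0 / beta)     # dropout rate, drawn independently
    tau = rng.expovariate(mu)                # unobserved lifetime
    horizon = min(tau, T)                    # purchases stop at dropout or at T
    times = []
    clock = rng.expovariate(lam)
    while clock <= horizon:
        times.append(clock)
        clock += rng.expovariate(lam)
    return {"lam": lam, "mu": mu, "tau": tau,
            "purchase_times": times, "alive_at_T": tau > T}

rng = random.Random(1)
T = 10.0                                      # hypothetical observation window
cohort = [simulate_customer(r=2.0, alpha=4.0, s=1.5, beta=10.0, T=T, rng=rng)
          for _ in range(5000)]

share_alive = sum(c["alive_at_T"] for c in cohort) / len(cohort)
avg_purchases = sum(len(c["purchase_times"]) for c in cohort) / len(cohort)
print(f"share alive at T: {share_alive:.3f}, average purchases: {avg_purchases:.2f}")
```

In real data only the purchase times are observed. The values of `lam`, `mu`, `tau` and the alive flag are latent, which is why the model is needed in the first place.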
3 Why named Pareto/NBD?
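Short answer: the “NBD” part comes from the Poisson-gamma mixture for purchases, and the “Pareto” part comes from the exponential-gamma mixture for lifetimes.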
3.1 Poisson Gamma Mixture
Theorem 3.1 If we assume the Poisson purchase and the Gamma transaction rate, then the distribution of the number of transactions while the customer is alive is Negative Binomial (NBD).
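Proof. Let $X(t)$ be the number of transactions in a period of length $t$ while the customer is alive, with $X(t) \mid \lambda \sim \text{Poisson}(\lambda t)$ and $\lambda \sim \text{Gamma}(r, \alpha)$ (shape $r$, rate $\alpha$). Integrating out $\lambda$ gives
$$\begin{aligned}
P(X(t) = x \mid r, \alpha) &= \int_0^\infty \frac{(\lambda t)^x e^{-\lambda t}}{x!} \cdot \frac{\alpha^r \lambda^{r-1} e^{-\alpha \lambda}}{\Gamma(r)} \, d\lambda \\
&= \frac{\alpha^r t^x}{x! \, \Gamma(r)} \int_0^\infty \lambda^{r+x-1} e^{-(\alpha + t)\lambda} \, d\lambda \\
&= \frac{\alpha^r t^x}{x! \, \Gamma(r)} \cdot \frac{\Gamma(r+x)}{(\alpha + t)^{r+x}} \\
&= \frac{\Gamma(r+x)}{\Gamma(r) \, x!} \left(\frac{\alpha}{\alpha + t}\right)^r \left(\frac{t}{\alpha + t}\right)^x .
\end{aligned}$$
Note that the last line is the probability mass function of the negative binomial. It looks a little different from the familiar version, $\binom{x+r-1}{x} p^r (1-p)^x$: the parameter $p$ corresponds to $\alpha / (\alpha + t)$, and the gamma functions replace the binomial coefficient because $r$ need not be an integer.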
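As a sanity check on Theorem 3.1, the sketch below compares the closed-form NBD probability $\frac{\Gamma(r+x)}{\Gamma(r)\,x!}\left(\frac{\alpha}{\alpha+t}\right)^r\left(\frac{t}{\alpha+t}\right)^x$ with a direct numerical integration of the Poisson probability against the gamma density (shape `r`, rate `alpha`). The parameter values, the integration grid, and the function names `nbd_pmf` and `mixture_pmf` are assumptions made for illustration.

```python
import math

def nbd_pmf(x, t, r, alpha):
    """Closed-form NBD probability of x transactions in a period of length t."""
    log_p = (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
             + r * math.log(alpha / (alpha + t))
             + x * math.log(t / (alpha + t)))
    return math.exp(log_p)

def mixture_pmf(x, t, r, alpha, upper=10.0, steps=20000):
    """Midpoint-rule integral of Poisson(x | lam * t) times Gamma(lam | r, alpha)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_poisson = x * math.log(lam * t) - lam * t - math.lgamma(x + 1)
        log_gamma = (r * math.log(alpha) + (r - 1) * math.log(lam)
                     - alpha * lam - math.lgamma(r))
        total += math.exp(log_poisson + log_gamma) * h
    return total

t, r, alpha = 3.0, 2.0, 4.0   # illustrative values (assumption)
for x in range(6):
    print(x, round(nbd_pmf(x, t, r, alpha), 6), round(mixture_pmf(x, t, r, alpha), 6))
```

The two columns should agree up to integration error. The closed form also implies an expected count of $rt/\alpha$ while the customer is alive, which is the gamma mean $r/\alpha$ multiplied by $t$.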
3.2 Exponential Gamma Mixture
Theorem 3.2 If we assume the Exponential lifetime and the Gamma dropout rate, then the distribution of the “lifetime” is “Pareto distribution of the second kind”.
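Proof. Let $\tau \mid \mu \sim \text{Exponential}(\mu)$ and $\mu \sim \text{Gamma}(s, \beta)$ (shape $s$, rate $\beta$). Integrating out $\mu$ gives
$$\begin{aligned}
f(\tau \mid s, \beta) &= \int_0^\infty \mu e^{-\mu \tau} \cdot \frac{\beta^s \mu^{s-1} e^{-\beta \mu}}{\Gamma(s)} \, d\mu \\
&= \frac{\beta^s}{\Gamma(s)} \int_0^\infty \mu^{s} e^{-(\beta + \tau)\mu} \, d\mu \\
&= \frac{\beta^s}{\Gamma(s)} \cdot \frac{\Gamma(s+1)}{(\beta + \tau)^{s+1}} \\
&= \frac{s}{\beta} \left(\frac{\beta}{\beta + \tau}\right)^{s+1} .
\end{aligned}$$
Note that the last step uses $\Gamma(s+1) = s\,\Gamma(s)$, and that the last line is the density of a Pareto distribution of the second kind with shape $s$ and scale $\beta$, whose survival function is $P(\tau > t) = \left(\frac{\beta}{\beta + t}\right)^s$.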
Therefore, if we assume an exponential lifetime and a gamma dropout rate, we end up with a Pareto Type II distribution or, more specifically, a Lomax distribution.
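The exponential-gamma mixture can also be checked by simulation. The sketch below assumes the dropout rate is gamma distributed with shape `s` and rate `beta` (the usual Pareto/NBD symbols) and compares the empirical share of lifetimes exceeding a few thresholds with the Lomax survival function $\left(\frac{\beta}{\beta+\tau}\right)^s$. The parameter values and the helper name `lomax_survival` are illustrative assumptions.

```python
import random

def lomax_survival(tau, s, beta):
    """P(lifetime > tau) for a Lomax (Pareto Type II) with shape s and scale beta."""
    return (beta / (beta + tau)) ** s

rng = random.Random(7)
s, beta = 1.5, 10.0                        # illustrative values (assumption)

lifetimes = []
for _ in range(50000):
    mu = rng.gammavariate(s, 1.0 / beta)   # gammavariate takes shape and scale
    lifetimes.append(rng.expovariate(mu))  # exponential lifetime given mu

for tau in (1.0, 5.0, 20.0):
    empirical = sum(life > tau for life in lifetimes) / len(lifetimes)
    print(f"tau = {tau}: empirical {empirical:.3f}, Lomax {lomax_survival(tau, s, beta):.3f}")
```

The heavy tail is the practical point here: averaging exponential lifetimes over heterogeneous dropout rates leaves noticeably more long-lived customers than a single exponential distribution with the same median would.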
In conclusion, the NBD and Pareto labels for the two sub-models naturally lead to the name of the integrated model.
In the next post we will talk about the likelihood, the mean of the Pareto/NBD model, and other related derivations, e.g., the probability of the customer being alive.
4 References
Schmittlein DC, Morrison DG, Colombo R (1987). “Counting Your Customers: Who Are They and What Will They Do Next?” Management Science, 33(1), 1-24.
Fader PS, Hardie BGS (2005). “A Note on Deriving the Pareto/NBD Model and Related Expressions.”
Fader PS, Hardie BGS (2007). “Incorporating time-invariant covariates into the Pareto/NBD and BG/NBD models.”
Fader PS, Hardie BGS (2020). “Deriving an Expression for P(X(t)=x) Under the Pareto/NBD Model.”