Theory Behind Pareto/NBD Part 2

Jan 31, 2021 4 min read PredictiveAnalytics, Probability, tutorial

1 Deriving the Likelihood Function
2 Derivations
3 Reference

1 Deriving the Likelihood Function

Last time we talked about the Pareto/NBD model. Today we’ll derive the model’s likelihood function.

1.1 Some notation

For a customer,

Define:

$x$ = the number of purchases,

$t_{i}$ = the time of the $i$th purchase, where $0 < t_{i} \leq t_{x}$ for $i = 1, \dots, x$,

$t_{x}$ = the time of the last purchase in the history,

$T$ = the total length of time being observed.

Next, we’ll show that an individual’s $(x, t_{x}, T)$ is sufficient to describe his/her purchase behavior in the Pareto/NBD model.

1.2 Conditional on $λ$ and $μ$

Assume a customer’s $x$ transactions occurred during the period $(0, T]$ ; we denote these times by $t_{1}, t_{2}, t_{3}, \dots, t_{x}$ .

There are two possible ways this pattern of transactions could arise:

The customer is still alive at the end of the observation period (i.e., $τ > T$), in which case the individual-level likelihood function is simply the product of the (inter-transaction-time) exponential pdfs and the associated survivor function:

$\begin{aligned} L (λ ∣ t_{1}, \dots, t_{x}, T, τ > T) & = λ e^{- λ t_{1}} \, λ e^{- λ (t_{2} - t_{1})} \cdots λ e^{- λ (t_{x} - t_{x - 1})} \, e^{- λ (T - t_{x})} \\ & = λ^{x} e^{- λ T} \end{aligned}$

The customer became inactive at some time $τ$ in the interval $(t_{x}, T]$, in which case the individual-level likelihood function is

$\begin{aligned} & L (λ ∣ t_{1}, \dots, t_{x}, T, \text{inactive at } τ \in (t_{x}, T]) \\ & = λ e^{- λ t_{1}} \, λ e^{- λ (t_{2} - t_{1})} \cdots λ e^{- λ (t_{x} - t_{x - 1})} \, e^{- λ (τ - t_{x})} \\ & = λ^{x} e^{- λ τ} \end{aligned}$

Note that in both cases, information on when each of the $x$ transactions occurred is not required.

We can replace $t_{1}, \dots, t_{x}, T$ with $(x, t_{x}, T)$ where, by definition, $t_{x} = 0$ when $x = 0$. In other words, $t_{x}$, $x$ and $T$ are sufficient summaries of a customer’s transaction history. Using direct marketing terminology, $t_{x}$ is recency and $x$ is frequency.

From the two facts above, we do not need to know when each purchase occurred: the time of the first purchase, the time of the last purchase, and the purchase frequency are sufficient statistics, and they are all we need to derive the likelihood function!
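To make the sufficiency claim concrete, here is a minimal Python sketch that reduces a list of purchase times to $(x, t_{x}, T)$. The function name `summarize_history` and the sample purchase times are our own illustration, not something taken from the paper or from any package.

```python
from typing import Sequence, Tuple


def summarize_history(purchase_times: Sequence[float], T: float) -> Tuple[int, float, float]:
    """Reduce a customer's purchase times in (0, T] to the summary (x, t_x, T)."""
    times = sorted(purchase_times)
    if any(t <= 0 or t > T for t in times):
        raise ValueError("purchase times must lie in (0, T]")
    x = len(times)                       # frequency: number of purchases
    t_x = times[-1] if x > 0 else 0.0    # recency: last purchase time, 0 when x = 0
    return x, t_x, T


# Two hypothetical customers with different purchase timing but the same summary.
a = summarize_history([1.0, 2.5, 7.0], T=10.0)
b = summarize_history([4.0, 6.5, 7.0], T=10.0)
```

Both customers made three purchases with the last one at time 7, so `a` and `b` are identical and the two customers contribute the same likelihood.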

Removing the conditioning on $τ$ gives us the following expression for the individual-level likelihood function:

$\begin{aligned} L (λ, μ ∣ x, t_{x}, T) & = L (λ ∣ x, T, τ > T) P (τ > T ∣ μ) + \int_{t_{x}}^{T} L (λ ∣ x, T, \text{inactive at } τ \in (t_{x}, T]) f (τ ∣ μ) \, d τ \\ & = λ^{x} e^{- λ T} e^{- μ T} + λ^{x} \int_{t_{x}}^{T} e^{- λ τ} μ e^{- μ τ} \, d τ \\ & = λ^{x} e^{- (λ + μ) T} + \frac{λ^{x} μ}{λ + μ} e^{- (λ + μ) t_{x}} - \frac{λ^{x} μ}{λ + μ} e^{- (λ + μ) T} \\ & = \frac{λ^{x} μ}{λ + μ} e^{- (λ + μ) t_{x}} + \frac{λ^{x + 1}}{λ + μ} e^{- (λ + μ) T} \end{aligned}$
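As a sanity check on the algebra, the sketch below evaluates the closed form of $L (λ, μ ∣ x, t_{x}, T)$ and compares it with a direct numerical integration over $τ$. The function names and the choice of Simpson's rule are ours; any quadrature rule would do.

```python
import math


def individual_likelihood(lam, mu, x, t_x, T):
    """Closed form of L(lambda, mu | x, t_x, T) from the final line of the derivation."""
    total = lam + mu
    died = (lam ** x) * mu / total * math.exp(-total * t_x)
    alive = (lam ** (x + 1)) / total * math.exp(-total * T)
    return died + alive


def individual_likelihood_numeric(lam, mu, x, t_x, T, n=2000):
    """Same quantity, integrating over tau with the composite Simpson rule (n even)."""
    alive = (lam ** x) * math.exp(-(lam + mu) * T)

    def integrand(tau):
        # e^{-lambda tau} times the exponential density of the death time tau
        return math.exp(-lam * tau) * mu * math.exp(-mu * tau)

    h = (T - t_x) / n
    acc = integrand(t_x) + integrand(T)
    for k in range(1, n):
        acc += (4 if k % 2 else 2) * integrand(t_x + k * h)
    return alive + (lam ** x) * acc * h / 3
```

For example, `individual_likelihood(0.4, 0.1, 3, 7.0, 10.0)` and `individual_likelihood_numeric(0.4, 0.1, 3, 7.0, 10.0)` should agree to many decimal places.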

1.3 Removing the Conditioning on $λ$ and $μ$

We remove the conditioning on $λ$ and $μ$ by taking the expectation of $L (λ, μ | x, t_{x}, T)$ over the distributions of $λ$ and $μ$ :

$L (r, α, s, β ∣ x, t_{x}, T) = \int_{0}^{\infty} \int_{0}^{\infty} L (λ, μ ∣ x, t_{x}, T) g (λ ∣ r, α) g (μ ∣ s, β) d λ d μ$
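The double integral can also be approximated by simulation, which is a handy way to check a closed-form implementation. The sketch below is not the paper's derivation: it draws $λ$ and $μ$ from their gamma distributions and averages the individual-level likelihood. It assumes $g (λ ∣ r, α)$ and $g (μ ∣ s, β)$ are gamma densities with shape $r$, $s$ and rate $α$, $β$ (note that Python's `random.gammavariate` takes a scale, hence `1.0 / alpha`). When $t_{x} = T$ the "became inactive" term vanishes and the integral factorises into two gamma integrals, which gives an exact value to compare against.

```python
import math
import random


def individual_likelihood(lam, mu, x, t_x, T):
    """L(lambda, mu | x, t_x, T) in its closed form."""
    total = lam + mu
    return (lam ** x) / total * (
        mu * math.exp(-total * t_x) + lam * math.exp(-total * T)
    )


def mixed_likelihood_mc(r, alpha, s, beta, x, t_x, T, n_draws=100_000, seed=0):
    """Monte Carlo estimate of L(r, alpha, s, beta | x, t_x, T)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_draws):
        lam = rng.gammavariate(r, 1.0 / alpha)   # shape r, rate alpha (assumed)
        mu = rng.gammavariate(s, 1.0 / beta)     # shape s, rate beta (assumed)
        acc += individual_likelihood(lam, mu, x, t_x, T)
    return acc / n_draws


def alive_term_closed_form(r, alpha, s, beta, x, T):
    """Exact double integral of lambda^x e^{-(lambda+mu)T} against both gamma densities."""
    log_value = (
        math.lgamma(r + x) - math.lgamma(r)
        + r * math.log(alpha) - (r + x) * math.log(alpha + T)
        + s * math.log(beta) - s * math.log(beta + T)
    )
    return math.exp(log_value)
```

With $t_{x} = T$, `mixed_likelihood_mc(...)` should land within Monte Carlo error of `alive_term_closed_form(...)`.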

The computation is tedious; see the paper “A Note on Deriving the Pareto/NBD Model and Related Expressions” for the details.

1.4 MLE for $r, α, s, β$

Since we have derived the likelihood function $L (r, α, s, β ∣ x, t_{x}, T)$, the four Pareto/NBD model parameters $(r, α, s, β)$ can be estimated by maximum likelihood estimation (MLE). Specifically, suppose we have a sample of $N$ customers, where customer $i$ had $x_{i}$ transactions in the period $(0, T_{i}]$, with the last transaction occurring at $t_{x_{i}}$. The sample log-likelihood function is given by

$L L (r, α, s, β) = \sum_{i = 1}^{N} \ln [L (r, α, s, β ∣ x_{i}, t_{x_{i}}, T_{i})] .$

This can be maximized using standard numerical optimization routines, which yields the four maximum likelihood estimates $(\hat{r}, \hat{α}, \hat{s}, \hat{β})$.
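The shape of the computation can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the per-customer likelihood is approximated by Monte Carlo instead of the closed form from the reference paper, a tiny hand-picked list of candidates stands in for a real numerical optimizer, and both the customer data and the candidate values are made up.

```python
import math
import random


def individual_likelihood(lam, mu, x, t_x, T):
    """L(lambda, mu | x, t_x, T) in its closed form."""
    total = lam + mu
    return (lam ** x) / total * (
        mu * math.exp(-total * t_x) + lam * math.exp(-total * T)
    )


def sample_log_likelihood(params, customers, n_draws=5_000, seed=0):
    """LL(r, alpha, s, beta): sum over customers of ln L(r, alpha, s, beta | x_i, t_xi, T_i).

    Each per-customer likelihood is approximated by averaging the
    individual-level likelihood over gamma draws of (lambda, mu).
    """
    r, alpha, s, beta = params
    rng = random.Random(seed)
    draws = [
        (rng.gammavariate(r, 1.0 / alpha), rng.gammavariate(s, 1.0 / beta))
        for _ in range(n_draws)
    ]
    ll = 0.0
    for x, t_x, T in customers:
        mixed = sum(individual_likelihood(lam, mu, x, t_x, T) for lam, mu in draws)
        ll += math.log(mixed / n_draws)
    return ll


# Hypothetical toy data: one (x, t_x, T) triple per customer.
customers = [(2, 30.4, 38.9), (1, 1.7, 38.9), (0, 0.0, 38.9), (6, 29.4, 38.1), (0, 0.0, 37.0)]

# Hypothetical candidate parameter vectors (r, alpha, s, beta).
candidates = [(0.5, 10.0, 0.6, 12.0), (1.0, 5.0, 1.0, 10.0), (2.0, 2.0, 2.0, 5.0)]
scores = {p: sample_log_likelihood(p, customers) for p in candidates}
best = max(scores, key=scores.get)   # candidate with the highest log-likelihood
```

In practice one would plug the closed-form likelihood into a proper optimizer; the point here is only that the objective is a sum of per-customer log-likelihoods evaluated at a common $(r, α, s, β)$.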

2 Derivations

2.1 Mean of the Pareto/NBD Model

Given that the number of transactions follows a Poisson process while the customer is alive,

if $τ > t$ , the expected number of transactions is simply $λ t$ .
if $τ \leq t$, the expected number of transactions in the interval $(0, τ]$ is $λ τ$.

Removing the conditioning on the time at which the customer becomes inactive, it follows that the expected number of transactions in the time interval $(0, t]$ , conditional on $λ$ and $μ$ , is

$\begin{aligned} E [X (t) ∣ λ, μ] & = λ t P (τ > t ∣ μ) + \int_{0}^{t} λ τ f (τ ∣ μ) \, d τ \\ & = λ t e^{- μ t} + λ \int_{0}^{t} μ τ e^{- μ τ} \, d τ \\ & = λ t e^{- μ t} + \frac{λ}{μ} \int_{0}^{t} μ^{2} τ e^{- μ τ} \, d τ \quad \text{(the integrand is an Erlang-2 pdf)} \\ & = λ t e^{- μ t} + \frac{λ}{μ} \left\{ 1 - e^{- μ t} - μ t e^{- μ t} \right\} \\ & = \frac{λ}{μ} - \frac{λ}{μ} e^{- μ t} \end{aligned}$
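The result is easy to check by simulation. The sketch below (function names and parameter values are ours) simulates customers who purchase according to a Poisson process with rate $λ$ until an exponentially distributed death time with rate $μ$, and compares the average number of purchases in $(0, t]$ with $\frac{λ}{μ} (1 - e^{- μ t})$.

```python
import math
import random


def expected_transactions(lam, mu, t):
    """Closed form of E[X(t) | lambda, mu]."""
    return lam / mu * (1.0 - math.exp(-mu * t))


def simulate_transactions(lam, mu, t, rng):
    """Count purchases in (0, min(tau, t)] for one simulated customer."""
    tau = rng.expovariate(mu)           # exponential lifetime with rate mu
    horizon = min(tau, t)
    count = 0
    clock = rng.expovariate(lam)        # time of the first purchase
    while clock <= horizon:
        count += 1
        clock += rng.expovariate(lam)   # exponential inter-purchase time
    return count


rng = random.Random(42)
lam, mu, t = 0.5, 0.1, 10.0             # illustrative values
n = 20_000
simulated_mean = sum(simulate_transactions(lam, mu, t, rng) for _ in range(n)) / n
closed_form = expected_transactions(lam, mu, t)
```

With these values `closed_form` is $5 (1 - e^{-1})$, and `simulated_mean` should sit within simulation error of it.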

Now, removing the conditioning on $λ$ and $μ$,

$\begin{aligned} E [X (t) ∣ r, α, s, β] & = \int_{0}^{\infty} \int_{0}^{\infty} E [X (t) ∣ λ, μ] g (λ ∣ r, α) g (μ ∣ s, β) \, d λ \, d μ \\ & = \frac{r β}{α (s - 1)} - \frac{r β^{s}}{α (s - 1) (β + t)^{s - 1}} \\ & = \frac{r β}{α (s - 1)} \left[ 1 - \left( \frac{β}{β + t} \right)^{s - 1} \right] \qquad (2.1) \end{aligned}$
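Equation (2.1) is a one-liner in code. The sketch below also includes a Monte Carlo cross-check that averages $E [X (t) ∣ λ, μ]$ over gamma draws; it assumes shape/rate gamma densities for $λ$ and $μ$, and the function names are ours.

```python
import math
import random


def pareto_nbd_mean(r, alpha, s, beta, t):
    """E[X(t) | r, alpha, s, beta] from equation (2.1); requires s != 1."""
    return r * beta / (alpha * (s - 1.0)) * (1.0 - (beta / (beta + t)) ** (s - 1.0))


def pareto_nbd_mean_mc(r, alpha, s, beta, t, n_draws=100_000, seed=1):
    """Average of (lambda / mu) * (1 - e^{-mu t}) over gamma draws of lambda and mu."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_draws):
        lam = rng.gammavariate(r, 1.0 / alpha)   # shape r, rate alpha (assumed)
        mu = rng.gammavariate(s, 1.0 / beta)     # shape s, rate beta (assumed)
        acc += lam / mu * -math.expm1(-mu * t)   # -expm1(-a) equals 1 - e^{-a}
    return acc / n_draws
```

For instance, with $r = 2$, $α = 5$, $s = 3$, $β = 10$ and $t = 20$, the closed form gives $2 [1 - (1/3)^{2}]$, and the Monte Carlo average should be close to it.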

2.2 Derivation of PAlive

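The probability that a customer with purchase history $(x, t_{x}, T)$ is “alive” at time $T$ is $P (τ > T ∣ λ, μ, x, t_{x}, T)$. Conditional on $λ$ and $μ$, it is the share of the individual-level likelihood contributed by the “still alive” case: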

$\begin{aligned} P (τ > T ∣ λ, μ, x, t_{x}, T) & = \frac{L (λ ∣ x, T, τ > T) P (τ > T ∣ μ)}{L (λ, μ ∣ x, t_{x}, T)} \\ & = \frac{λ^{x} e^{- (λ + μ) T}}{L (λ, μ ∣ x, t_{x}, T)} \end{aligned}$
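Substituting the closed form of $L (λ, μ ∣ x, t_{x}, T)$ from section 1.2 makes this ratio explicit. The following worked step is our own simplification, using only the expressions already derived above:

$\begin{aligned} P (τ > T ∣ λ, μ, x, t_{x}, T) & = \frac{λ^{x} e^{- (λ + μ) T}}{\frac{λ^{x} μ}{λ + μ} e^{- (λ + μ) t_{x}} + \frac{λ^{x + 1}}{λ + μ} e^{- (λ + μ) T}} \\ & = \frac{λ + μ}{μ e^{(λ + μ) (T - t_{x})} + λ} \\ & = \frac{1}{1 + \frac{μ}{λ + μ} \left[ e^{(λ + μ) (T - t_{x})} - 1 \right]} \end{aligned}$

Conditional on $λ$ and $μ$, the probability depends on the history only through $T - t_{x}$, the time since the last purchase, and it equals 1 when $t_{x} = T$.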

As $λ$ and $μ$ are unobserved, we compute $P (\text{alive} ∣ x, t_{x}, T)$ for a randomly chosen individual by taking the expectation of the above result over the distribution of $λ$ and $μ$, updated to take account of the information $(x, t_{x}, T)$:

$\begin{array}{l} P (\text{alive} ∣ r, α, s, β, x, t_{x}, T) \\ = \int_{0}^{\infty} \int_{0}^{\infty} P (τ > T ∣ λ, μ, x, t_{x}, T) \, g (λ, μ ∣ r, α, s, β, x, t_{x}, T) \, d λ \, d μ \end{array}$

By Bayes’ theorem, the joint posterior distribution of $λ$ and $μ$ is

$g (λ, μ ∣ r, α, s, β, x, t_{x}, T) = \frac{L (λ, μ ∣ x, t_{x}, T) g (λ ∣ r, α) g (μ ∣ s, β)}{L (r, α, s, β ∣ x, t_{x}, T)}$

Thus,

$\begin{array}{l} P (\text{alive} ∣ r, α, s, β, x, t_{x}, T) \\ = \int_{0}^{\infty} \int_{0}^{\infty} λ^{x} e^{- (λ + μ) T} g (λ ∣ r, α) g (μ ∣ s, β) \, d λ \, d μ \Big/ L (r, α, s, β ∣ x, t_{x}, T) \\ = \frac{Γ (r + x) α^{r} β^{s}}{Γ (r) (α + T)^{r + x} (β + T)^{s}} \Big/ L (r, α, s, β ∣ x, t_{x}, T) \\ = \left\{ 1 + \left( \frac{s}{r + s + x} \right) (α + T)^{r + x} (β + T)^{s} A_{0} \right\}^{- 1} \end{array}$


For details, including the definition of $A_{0}$, see the reference paper. Note that the above result is the formula used to calculate PAlive in the BTYD package for R.
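For intuition, P(alive) can also be approximated without the closed form. The sketch below is not how the BTYD package computes it and does not reproduce $A_{0}$; it estimates the ratio of the two double integrals in the derivation above by Monte Carlo, reusing the same gamma draws in numerator and denominator. It assumes shape/rate gamma densities, and the function name is ours.

```python
import math
import random


def p_alive_mc(r, alpha, s, beta, x, t_x, T, n_draws=50_000, seed=0):
    """Monte Carlo estimate of P(alive | r, alpha, s, beta, x, t_x, T)."""
    rng = random.Random(seed)
    num = 0.0   # accumulates lambda^x e^{-(lambda+mu)T}: the "still alive" case
    den = 0.0   # accumulates L(lambda, mu | x, t_x, T): both cases
    for _ in range(n_draws):
        lam = rng.gammavariate(r, 1.0 / alpha)   # shape r, rate alpha (assumed)
        mu = rng.gammavariate(s, 1.0 / beta)     # shape s, rate beta (assumed)
        total = lam + mu
        alive = (lam ** x) * math.exp(-total * T)
        died = (lam ** x) * mu / total * (
            math.exp(-total * t_x) - math.exp(-total * T)
        )
        num += alive
        den += alive + died
    return num / den
```

A customer whose last purchase was long ago relative to $T$ gets a lower estimate than one who bought recently, and a customer with $t_{x} = T$ gets exactly 1.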

2.3 Conditional Expectation of Transactions

Let the random variable $Y (t)$ be the number of purchases made in $(T, T + t]$. We are interested in computing $E (Y (t) ∣ x, t_{x}, T)$, the expected number of purchases in the period $(T, T + t]$ for a customer with purchase history $(x, t_{x}, T)$.

If the customer is active at $T$ ,

$\begin{array}{l} E [Y (t) ∣ λ, μ, \text{alive at } T] \\ = λ t P (τ > T + t ∣ μ, τ > T) + \int_{T}^{T + t} λ (τ - T) f (τ ∣ μ, τ > T) \, d τ \\ = λ t e^{- μ t} + λ \int_{0}^{t} μ τ e^{- μ τ} \, d τ \\ = \frac{λ}{μ} - \frac{λ}{μ} e^{- μ t} \end{array}$

Of course we don’t know whether a customer is alive at $T$ ; therefore

$E [Y (t) ∣ λ, μ, x, t_{x}, T] = E [Y (t) ∣ λ, μ, \text{alive at } T] \, P (τ > T ∣ λ, μ, x, t_{x}, T)$

Also, since $λ$ and $μ$ are unobserved, we need to integrate them out:

$\begin{aligned} E [Y (t) ∣ r, α, s, β, x, t_{x}, T] = \int_{0}^{\infty} \int_{0}^{\infty} \big\{ & E [Y (t) ∣ λ, μ, \text{alive at } T] \, P (τ > T ∣ λ, μ, x, t_{x}, T) \\ & \times g (λ, μ ∣ r, α, s, β, x, t_{x}, T) \big\} \, d λ \, d μ \end{aligned}$

After some tedious computation, we get

$\begin{aligned} & E [Y (t) ∣ r, α, s, β, x, t_{x}, T] \\ & = \left\{ \frac{Γ (r + x) α^{r} β^{s}}{Γ (r) (α + T)^{r + x} (β + T)^{s}} \Big/ L (r, α, s, β ∣ x, t_{x}, T) \right\} \\ & \quad \times \frac{(r + x) (β + T)}{(α + T) (s - 1)} \left[ 1 - \left( \frac{β + T}{β + T + t} \right)^{s - 1} \right] \\ & = P (\text{alive} ∣ x, t_{x}, T) \times \text{updated mean of the Pareto/NBD} \end{aligned}$

Note that:

The first part, the bracketed term, is our expression for $P (\text{alive} ∣ x, t_{x}, T)$.
The remaining part is the mean of the Pareto/NBD (2.1), with “updated” parameters that reflect the individual’s behavior up to time $T$ (assuming no “death” in $(0, T]$).
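The “updated parameters” reading can be made explicit in code: the second factor is equation (2.1) evaluated at $(r + x, α + T, s, β + T)$. In the sketch below the function names are ours, and `p_alive` is taken as an input, since computing it requires $A_{0}$ from the reference paper.

```python
def pareto_nbd_mean(r, alpha, s, beta, t):
    """E[X(t) | r, alpha, s, beta] from equation (2.1); requires s != 1."""
    return r * beta / (alpha * (s - 1.0)) * (1.0 - (beta / (beta + t)) ** (s - 1.0))


def conditional_expected_transactions(p_alive, r, alpha, s, beta, x, T, t):
    """E[Y(t) | r, alpha, s, beta, x, t_x, T], given P(alive | x, t_x, T).

    t_x enters only through p_alive; the second factor depends on x and T.
    """
    # Equation (2.1) with r -> r + x, alpha -> alpha + T, beta -> beta + T.
    updated_mean = pareto_nbd_mean(r + x, alpha + T, s, beta + T, t)
    return p_alive * updated_mean


# Hypothetical inputs: a customer with x = 4 purchases observed for T = 30,
# an assumed P(alive) of 0.8, and a forecast horizon of t = 10.
forecast = conditional_expected_transactions(0.8, 2.0, 5.0, 3.0, 10.0, x=4, T=30.0, t=10.0)
```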

Next time, we’ll finally talk about the prediction of CLV.

3 Reference

Schmittlein DC, Morrison DG, Colombo R (1987). “Counting Your Customers: Who Are They and What Will They Do Next?” Management Science, 33(1), 1-24.
Fader PS, Hardie BGS (2005). “A Note on Deriving the Pareto/NBD Model and Related Expressions.”
Fader PS, Hardie BGS (2007). “Incorporating Time-Invariant Covariates into the Pareto/NBD and BG/NBD Models.”
Fader PS, Hardie BGS (2020). “Deriving an Expression for P(X(t)=x) Under the Pareto/NBD Model.”


Chen Xing

Founder & Data Scientist
