Theory Behind Pareto/NBD Part 2

1 Deriving the Likelihood Function

Last time we talked about the Pareto/NBD model. Today we’ll derive the model’s likelihood function.

1.1 Some notation

For a customer, define:

$x$ = the number of purchases,

$t_i$ = the time of the $i$-th purchase, where $0 < t_1 < \dots < t_x \le T$,

$t_x$ = the time of the last purchase in the history,

$T$ = the total length of time the customer is observed.

Next, we’ll show that an individual’s $(x, t_x, T)$ is sufficient to describe his or her purchase behavior in the Pareto/NBD model.

1.2 Conditional on λ and μ

Assume a customer’s $x$ transactions occurred during the period $(0,T]$; we denote these times by $t_1, t_2, \dots, t_x$.

There are two possible ways this pattern of transactions could arise:

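  1. The customer is still alive at the end of the observation period (i.e., $\tau > T$), in which case the individual-level likelihood function is simply the product of the (inter-transaction-time) exponential pdfs and the associated survivor function:
$$L(\lambda \mid t_1,\dots,t_x,T,\tau>T) = \lambda e^{-\lambda t_1}\,\lambda e^{-\lambda(t_2-t_1)}\cdots\lambda e^{-\lambda(t_x-t_{x-1})}\,e^{-\lambda(T-t_x)} = \lambda^x e^{-\lambda T}$$
  2. The customer became inactive at some time $\tau$ in the interval $(t_x,T]$ (i.e., $\tau\in(t_x,T]$), in which case the individual-level likelihood function is
$$L(\lambda \mid t_1,\dots,t_x,T,\text{ inactive at }\tau\in(t_x,T]) = \lambda e^{-\lambda t_1}\,\lambda e^{-\lambda(t_2-t_1)}\cdots\lambda e^{-\lambda(t_x-t_{x-1})}\,e^{-\lambda(\tau-t_x)} = \lambda^x e^{-\lambda\tau}$$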

Note that in both cases, information on when each of the x transactions occurred is not required.

We can replace $t_1,\dots,t_x,T$ with $(x,t_x,T)$ where, by definition, $t_x=0$ when $x=0$. In other words, $t_x$, $x$ and $T$ are sufficient summaries of a customer’s transaction history. Using direct marketing terminology, $t_x$ is recency and $x$ is frequency.

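From the two facts above, we do not need to know the time of every purchase: the time of the first purchase, the time of the last purchase, and the purchase frequency, taken together as sufficient statistics, are enough for us to derive the likelihood function!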
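
The sufficiency claim can be checked numerically. The sketch below is only an illustration: the function names and the two purchase histories are made up, and it covers the "still alive at $T$" case. It multiplies out the exponential densities for two histories that share the same $x$ and $T$ and compares the result with $\lambda^x e^{-\lambda T}$.

```python
import math

def likelihood_from_times(lam, times, T):
    """Product of exponential inter-purchase densities and the survivor term.

    Assumes the customer is alive at T; `times` are the sorted purchase times.
    """
    value = 1.0
    previous = 0.0
    for t in times:
        value *= lam * math.exp(-lam * (t - previous))  # pdf of one gap
        previous = t
    value *= math.exp(-lam * (T - previous))  # no purchase in (t_x, T]
    return value

def likelihood_from_summary(lam, x, T):
    """Closed form lambda^x * exp(-lambda * T); only x and T are needed."""
    return lam ** x * math.exp(-lam * T)

# Two hypothetical histories: same x = 3 and T = 10, different timings.
history_a = [1.0, 2.5, 7.0]
history_b = [0.3, 6.1, 7.0]
lam, T = 0.4, 10.0

full_a = likelihood_from_times(lam, history_a, T)
full_b = likelihood_from_times(lam, history_b, T)
summary = likelihood_from_summary(lam, 3, T)
print(full_a, full_b, summary)
```

All three printed values should agree up to floating-point error, because the intermediate purchase times cancel in the exponent.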

Removing the conditioning on τ gives us the following expression for the individual-level likelihood function:

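$$\begin{aligned}
L(\lambda,\mu \mid x,t_x,T) &= L(\lambda \mid x,T,\tau>T)\,P(\tau>T \mid \mu) + \int_{t_x}^{T} L(\lambda \mid x,T,\text{ inactive at }\tau\in(t_x,T])\,f(\tau \mid \mu)\,d\tau \\
&= \lambda^x e^{-\lambda T}e^{-\mu T} + \lambda^x\int_{t_x}^{T} e^{-\lambda\tau}\,\mu e^{-\mu\tau}\,d\tau \\
&= \lambda^x e^{-(\lambda+\mu)T} + \frac{\lambda^x\mu}{\lambda+\mu}e^{-(\lambda+\mu)t_x} - \frac{\lambda^x\mu}{\lambda+\mu}e^{-(\lambda+\mu)T} \\
&= \frac{\lambda^x\mu}{\lambda+\mu}e^{-(\lambda+\mu)t_x} + \frac{\lambda^{x+1}}{\lambda+\mu}e^{-(\lambda+\mu)T}
\end{aligned}$$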
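
As a sanity check on the algebra, the first and last lines of this derivation can be compared numerically. The sketch below is illustrative only: the function names are made up, and a hand-written Simpson rule stands in for the integral over $\tau$ so that nothing beyond the standard library is needed.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def likelihood_closed_form(lam, mu, x, t_x, T):
    """Last line of the derivation of L(lambda, mu | x, t_x, T)."""
    rate = lam + mu
    dead = lam ** x * mu / rate * math.exp(-rate * t_x)
    alive = lam ** (x + 1) / rate * math.exp(-rate * T)
    return dead + alive

def likelihood_by_integration(lam, mu, x, t_x, T):
    """First line: the 'alive' term plus the integral over the death time."""
    alive = lam ** x * math.exp(-lam * T) * math.exp(-mu * T)

    def integrand(tau):
        # lambda^x e^{-lambda tau} times the exponential pdf of tau
        return lam ** x * math.exp(-lam * tau) * mu * math.exp(-mu * tau)

    return alive + simpson(integrand, t_x, T)

# Hypothetical values chosen only for illustration.
closed = likelihood_closed_form(0.4, 0.1, 3, 7.0, 10.0)
numeric = likelihood_by_integration(0.4, 0.1, 3, 7.0, 10.0)
print(closed, numeric)
```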

1.3 Removing the Conditioning on λ and μ

We remove the conditioning on $\lambda$ and $\mu$ by taking the expectation of $L(\lambda,\mu \mid x,t_x,T)$ over the distributions of $\lambda$ and $\mu$:

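$$L(r,\alpha,s,\beta \mid x,t_x,T) = \int_0^\infty\!\!\int_0^\infty L(\lambda,\mu \mid x,t_x,T)\,g(\lambda \mid r,\alpha)\,g(\mu \mid s,\beta)\,d\lambda\,d\mu$$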

The computation is tedious; see the paper “A Note on Deriving the Pareto/NBD Model and Related Expressions” for the details.
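
For readers who want a number rather than the closed form, here is one possible numerical route. It is not the paper's derivation. It assumes the usual Pareto/NBD priors, $\lambda \sim$ Gamma(shape $r$, rate $\alpha$) and $\mu \sim$ Gamma(shape $s$, rate $\beta$), and swaps the order of integration so that the gamma integrals are done analytically and only a one-dimensional integral over $\tau$ is left:

$$L(r,\alpha,s,\beta \mid x,t_x,T) = \frac{\Gamma(r+x)\alpha^r\beta^s}{\Gamma(r)}\left\{\frac{1}{(\alpha+T)^{r+x}(\beta+T)^s} + s\int_{t_x}^{T}\frac{d\tau}{(\alpha+\tau)^{r+x}(\beta+\tau)^{s+1}}\right\}$$

The function names below are hypothetical, and the Simpson rule is a simple stand-in for a proper quadrature routine.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def marginal_likelihood(r, alpha, s, beta, x, t_x, T):
    """L(r, alpha, s, beta | x, t_x, T) via a 1-D integral over tau."""
    # log of Gamma(r + x) * alpha^r * beta^s / Gamma(r)
    log_c = (math.lgamma(r + x) - math.lgamma(r)
             + r * math.log(alpha) + s * math.log(beta))
    # Customer still alive at T.
    alive = (alpha + T) ** -(r + x) * (beta + T) ** -s

    def integrand(tau):
        # Customer died at tau, somewhere in (t_x, T].
        return (alpha + tau) ** -(r + x) * (beta + tau) ** -(s + 1)

    dead = s * simpson(integrand, t_x, T)
    return math.exp(log_c) * (alive + dead)

# Hypothetical parameter values and purchase history.
print(marginal_likelihood(0.5, 10.0, 0.6, 12.0, 2, 30.0, 39.0))
```

When $\alpha=\beta$ the integral has an elementary antiderivative, which gives a convenient check on the implementation.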

1.4 MLE for r,α,s,β

Since we have derived the likelihood function $L(r,\alpha,s,\beta \mid x,t_x,T)$, the four Pareto/NBD model parameters $(r,\alpha,s,\beta)$ can be estimated by maximum likelihood. Specifically, suppose we have a sample of $N$ customers, where customer $i$ had $x_i$ transactions in the period $(0,T_i]$, with the last transaction occurring at $t_{x_i}$. The sample log-likelihood function is given by

$$LL(r,\alpha,s,\beta) = \sum_{i=1}^{N}\ln\left[L(r,\alpha,s,\beta \mid x_i,t_{x_i},T_i)\right].$$

This can be maximized using standard numerical optimization routines, which yields the four maximum likelihood estimates $(\hat r,\hat\alpha,\hat s,\hat\beta)$.
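
A minimal sketch of the objective function follows. It assumes gamma priors with shape/rate parameters $(r,\alpha)$ and $(s,\beta)$ and evaluates each customer's likelihood with a one-dimensional numerical integral over the death time instead of the paper's closed form. The function names and the three customers are made up for illustration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def marginal_likelihood(r, alpha, s, beta, x, t_x, T):
    """Per-customer likelihood, integrating numerically over the death time."""
    log_c = (math.lgamma(r + x) - math.lgamma(r)
             + r * math.log(alpha) + s * math.log(beta))
    alive = (alpha + T) ** -(r + x) * (beta + T) ** -s
    dead = s * simpson(
        lambda tau: (alpha + tau) ** -(r + x) * (beta + tau) ** -(s + 1),
        t_x, T)
    return math.exp(log_c) * (alive + dead)

def log_likelihood(params, customers):
    """Sample log-likelihood: sum of ln L over (x, t_x, T) triples."""
    r, alpha, s, beta = params
    return sum(math.log(marginal_likelihood(r, alpha, s, beta, x, t_x, T))
               for x, t_x, T in customers)

# Made-up (x, t_x, T) triples and arbitrary parameter values.
customers = [(2, 30.4, 38.9), (0, 0.0, 38.9), (5, 20.0, 25.0)]
params = (0.55, 10.6, 0.61, 11.7)
ll = log_likelihood(params, customers)
print(ll)
```

To estimate the parameters, the negative of `log_likelihood` would be handed to a numerical optimizer. Optimizing over the logarithms of the four parameters is one common way to keep them positive.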

2 Derivations

2.1 Mean of the Pareto/NBD Model

Given that the number of transactions follows a Poisson process while the customer is alive,

  1. if $\tau > t$, the expected number of transactions is simply $\lambda t$.
  2. if $\tau \le t$, the expected number of transactions in the interval $(0,\tau]$ is $\lambda\tau$.

Removing the conditioning on the time at which the customer becomes inactive, it follows that the expected number of transactions in the time interval (0,t], conditional on λ and μ, is

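$$\begin{aligned}
E[X(t) \mid \lambda,\mu] &= \lambda t\,P(\tau>t \mid \mu) + \int_0^t \lambda\tau\,f(\tau \mid \mu)\,d\tau \\
&= \lambda t e^{-\mu t} + \lambda\int_0^t \mu\tau e^{-\mu\tau}\,d\tau \\
&= \lambda t e^{-\mu t} + \frac{\lambda}{\mu}\int_0^t \mu^2\tau e^{-\mu\tau}\,d\tau, \quad\text{where the integrand is an Erlang-2 pdf} \\
&= \lambda t e^{-\mu t} + \frac{\lambda}{\mu}\left\{1 - e^{-\mu t} - \mu t e^{-\mu t}\right\} \\
&= \frac{\lambda}{\mu} - \frac{\lambda}{\mu}e^{-\mu t}
\end{aligned}$$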
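
The Erlang-2 step is easy to get wrong by a sign, so a quick numerical comparison of the first and last lines may be reassuring. This is only a sketch with made-up function names and arbitrary values of $\lambda$, $\mu$ and $t$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def expected_transactions_closed_form(lam, mu, t):
    """Last line: (lambda / mu) * (1 - exp(-mu * t))."""
    return lam / mu * (1.0 - math.exp(-mu * t))

def expected_transactions_by_integration(lam, mu, t):
    """First line: alive-at-t term plus the integral over the death time."""
    alive = lam * t * math.exp(-mu * t)
    dead = simpson(lambda tau: lam * tau * mu * math.exp(-mu * tau), 0.0, t)
    return alive + dead

closed = expected_transactions_closed_form(0.4, 0.1, 10.0)
numeric = expected_transactions_by_integration(0.4, 0.1, 10.0)
print(closed, numeric)
```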

Now, removing the conditioning on $\lambda$ and $\mu$,

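$$\begin{aligned}
E[X(t) \mid r,\alpha,s,\beta] &= \int_0^\infty\!\!\int_0^\infty E[X(t) \mid \lambda,\mu]\,g(\lambda \mid r,\alpha)\,g(\mu \mid s,\beta)\,d\lambda\,d\mu \\
&= \frac{r\beta}{\alpha(s-1)} - \frac{r\beta^s}{\alpha(s-1)(\beta+t)^{s-1}} \\
&= \frac{r\beta}{\alpha(s-1)}\left[1 - \left(\frac{\beta}{\beta+t}\right)^{s-1}\right]
\end{aligned}\tag{2.1}$$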
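
Equation (2.1) can be cross-checked by simulation. The sketch below assumes $\lambda \sim$ Gamma(shape $r$, rate $\alpha$) and $\mu \sim$ Gamma(shape $s$, rate $\beta$), draws both rates, and averages the conditional mean $\frac{\lambda}{\mu}(1-e^{-\mu t})$. The function names and parameter values are hypothetical. Note that Python's `random.gammavariate` takes a shape and a scale, so the rate is inverted.

```python
import math
import random

def pareto_nbd_mean(r, alpha, s, beta, t):
    """Equation (2.1); the expression requires s != 1."""
    return r * beta / (alpha * (s - 1)) * (1 - (beta / (beta + t)) ** (s - 1))

def monte_carlo_mean(r, alpha, s, beta, t, draws=100_000, seed=0):
    """Average of E[X(t) | lambda, mu] over simulated (lambda, mu) pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        lam = rng.gammavariate(r, 1.0 / alpha)  # shape r, scale 1/alpha
        mu = rng.gammavariate(s, 1.0 / beta)    # shape s, scale 1/beta
        # -expm1(-mu t) equals 1 - exp(-mu t) but is more accurate for small mu
        total += lam / mu * -math.expm1(-mu * t)
    return total / draws

# Hypothetical parameter values.
exact = pareto_nbd_mean(2.0, 4.0, 3.0, 10.0, 5.0)
simulated = monte_carlo_mean(2.0, 4.0, 3.0, 10.0, 5.0)
print(exact, simulated)
```

The two numbers should agree to within Monte Carlo error, which shrinks as `draws` grows.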

2.2 Derivation of P(alive)

The probability that a customer with purchase history $(x,t_x,T)$ is “alive” at time $T$ is the probability that $\tau>T$. Conditional on $\lambda$ and $\mu$, Bayes’ theorem gives

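$$P(\tau>T \mid \lambda,\mu,x,t_x,T) = \frac{L(\lambda \mid x,T,\tau>T)\,P(\tau>T \mid \mu)}{L(\lambda,\mu \mid x,t_x,T)} = \frac{\lambda^x e^{-(\lambda+\mu)T}}{L(\lambda,\mu \mid x,t_x,T)}$$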
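
Plugging the closed form of $L(\lambda,\mu \mid x,t_x,T)$ into this ratio and cancelling $\lambda^x$ leaves $\frac{\lambda+\mu}{\lambda+\mu e^{(\lambda+\mu)(T-t_x)}}$, so for known rates the answer depends on the history only through $T-t_x$. The sketch below, with made-up function names and values, evaluates both forms.

```python
import math

def p_alive_given_rates(lam, mu, x, t_x, T):
    """Ratio of the 'alive' likelihood term to the full likelihood."""
    rate = lam + mu
    alive = lam ** x * math.exp(-rate * T)
    full = (lam ** x * mu / rate * math.exp(-rate * t_x)
            + lam ** (x + 1) / rate * math.exp(-rate * T))
    return alive / full

def p_alive_simplified(lam, mu, t_x, T):
    """Same quantity after cancelling lambda^x; x no longer appears."""
    rate = lam + mu
    return rate / (lam + mu * math.exp(rate * (T - t_x)))

# Hypothetical rates and history.
p_ratio = p_alive_given_rates(0.4, 0.1, 3, 7.0, 10.0)
p_short = p_alive_simplified(0.4, 0.1, 7.0, 10.0)
print(p_ratio, p_short)
```

A customer whose last purchase falls exactly at $T$ gets probability 1, and the probability falls as the gap $T-t_x$ grows.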

As $\lambda$ and $\mu$ are unobserved, we compute $P(\text{alive} \mid x,t_x,T)$ for a randomly chosen individual by taking the expectation of the above result over the distribution of $\lambda$ and $\mu$, updated to take account of the information $(x,t_x,T)$:

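$$P(\text{alive} \mid r,\alpha,s,\beta,x,t_x,T) = \int_0^\infty\!\!\int_0^\infty P(\tau>T \mid \lambda,\mu,x,t_x,T)\,g(\lambda,\mu \mid r,\alpha,s,\beta,x,t_x,T)\,d\lambda\,d\mu$$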

By Bayes’ theorem, the joint posterior distribution of λ and μ is

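$$g(\lambda,\mu \mid r,\alpha,s,\beta,x,t_x,T) = \frac{L(\lambda,\mu \mid x,t_x,T)\,g(\lambda \mid r,\alpha)\,g(\mu \mid s,\beta)}{L(r,\alpha,s,\beta \mid x,t_x,T)}$$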

Thus,

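$$\begin{aligned}
P(\text{alive} \mid r,\alpha,s,\beta,x,t_x,T) &= \int_0^\infty\!\!\int_0^\infty \lambda^x e^{-(\lambda+\mu)T}\,g(\lambda \mid r,\alpha)\,g(\mu \mid s,\beta)\,d\lambda\,d\mu \Big/ L(r,\alpha,s,\beta \mid x,t_x,T) \\
&= \frac{\Gamma(r+x)\alpha^r\beta^s}{\Gamma(r)(\alpha+T)^{r+x}(\beta+T)^s} \Big/ L(r,\alpha,s,\beta \mid x,t_x,T) \\
&= \left\{1 + \left(\frac{s}{r+s+x}\right)(\alpha+T)^{r+x}(\beta+T)^s A_0\right\}^{-1}
\end{aligned}$$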

For details, including the definition of $A_0$, check the reference paper. Note that the above result is the formula used to calculate P(alive) in the BTYD package implemented in R.
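
A rough way to evaluate this probability without the closed-form term $A_0$ is sketched below. It is not the BTYD implementation. It assumes gamma priors with shape/rate parameters $(r,\alpha)$ and $(s,\beta)$, writes the marginal likelihood as an "alive" term plus a one-dimensional integral over the death time $\tau$, and divides the "alive" term by the total:

$$P(\text{alive} \mid r,\alpha,s,\beta,x,t_x,T) = \left\{1 + s\,(\alpha+T)^{r+x}(\beta+T)^s\int_{t_x}^{T}\frac{d\tau}{(\alpha+\tau)^{r+x}(\beta+\tau)^{s+1}}\right\}^{-1}$$

The function names and inputs are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def p_alive(r, alpha, s, beta, x, t_x, T):
    """P(alive at T | x, t_x, T), integrating numerically over tau."""
    def integrand(tau):
        # Contribution of "died at tau" relative to "alive at T".
        return (((alpha + T) / (alpha + tau)) ** (r + x)
                * ((beta + T) / (beta + tau)) ** s
                / (beta + tau))

    return 1.0 / (1.0 + s * simpson(integrand, t_x, T))

# Hypothetical parameters; the second customer has been silent for longer.
recent = p_alive(0.5, 10.0, 0.6, 12.0, 2, 35.0, 39.0)
lapsed = p_alive(0.5, 10.0, 0.6, 12.0, 2, 5.0, 39.0)
print(recent, lapsed)
```

The integrand is written as ratios so that large powers of $(\alpha+T)$ and $(\beta+T)$ never appear on their own.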

2.3 Conditional Expectation of Transactions

Let the random variable $Y(t)$ be the number of purchases made in $(T,T+t]$. We are interested in computing $E[Y(t) \mid x,t_x,T]$, the expected number of purchases in the period $(T,T+t]$ for a customer with purchase history $(x,t_x,T)$.

If the customer is active at T,

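$$\begin{aligned}
E[Y(t) \mid \lambda,\mu,\text{ alive at }T] &= \lambda t\,P(\tau>T+t \mid \mu,\tau>T) + \int_T^{T+t}\lambda(\tau-T)\,f(\tau \mid \mu,\tau>T)\,d\tau \\
&= \lambda t e^{-\mu t} + \lambda\int_0^t \mu u e^{-\mu u}\,du \quad\text{(substituting } u=\tau-T\text{)} \\
&= \frac{\lambda}{\mu} - \frac{\lambda}{\mu}e^{-\mu t}
\end{aligned}$$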

Of course we don’t know whether a customer is alive at T; therefore

$$E[Y(t) \mid \lambda,\mu,x,t_x,T] = E[Y(t) \mid \lambda,\mu,\text{ alive at }T]\times P(\tau>T \mid \lambda,\mu,x,t_x,T)$$

Also, since λ and μ are unobserved, we need to integrate them out:

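$$E[Y(t) \mid r,\alpha,s,\beta,x,t_x,T] = \int_0^\infty\!\!\int_0^\infty \Big\{E[Y(t) \mid \lambda,\mu,\text{ alive at }T]\,P(\tau>T \mid \lambda,\mu,x,t_x,T)\,g(\lambda,\mu \mid r,\alpha,s,\beta,x,t_x,T)\Big\}\,d\lambda\,d\mu$$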

After some tedious computation, we get

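$$\begin{aligned}
E[Y(t) \mid r,\alpha,s,\beta,x,t_x,T] &= \left\{\frac{\Gamma(r+x)\alpha^r\beta^s}{\Gamma(r)(\alpha+T)^{r+x}(\beta+T)^s} \Big/ L(r,\alpha,s,\beta \mid x,t_x,T)\right\} \times \frac{(r+x)(\beta+T)}{(\alpha+T)(s-1)}\left[1 - \left(\frac{\beta+T}{\beta+T+t}\right)^{s-1}\right] \\
&= \left\{P(\text{alive} \mid x,t_x,T)\right\} \times \text{updated mean of Pareto/NBD}
\end{aligned}$$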

Note that:

  1. The first part, the bracketed term, is our expression for $P(\text{alive} \mid x,t_x,T)$.

  2. The remaining part is the mean of the Pareto/NBD (2.1), with “updated” parameters that reflect the individual’s behavior up to time $T$ (assuming no “death” in $(0,T]$): $r+x$, $\alpha+T$ and $\beta+T$ take the place of $r$, $\alpha$ and $\beta$.
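
Putting the two factors together gives a small forecasting function. The sketch below is one possible arrangement, not the BTYD code. It assumes gamma priors with shape/rate parameters $(r,\alpha)$ and $(s,\beta)$ and computes P(alive) with a numerical integral over the death time instead of the closed form. The function names and inputs are made up.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def p_alive(r, alpha, s, beta, x, t_x, T):
    """P(alive at T | x, t_x, T) via a 1-D integral over the death time."""
    def integrand(tau):
        return (((alpha + T) / (alpha + tau)) ** (r + x)
                * ((beta + T) / (beta + tau)) ** s
                / (beta + tau))
    return 1.0 / (1.0 + s * simpson(integrand, t_x, T))

def updated_mean(r, alpha, s, beta, x, T, t):
    """Equation (2.1) with r + x, alpha + T, beta + T; requires s != 1."""
    return ((r + x) * (beta + T) / ((alpha + T) * (s - 1))
            * (1 - ((beta + T) / (beta + T + t)) ** (s - 1)))

def conditional_expected_transactions(r, alpha, s, beta, x, t_x, T, t):
    """E[Y(t) | x, t_x, T]: P(alive) times the updated mean."""
    return (p_alive(r, alpha, s, beta, x, t_x, T)
            * updated_mean(r, alpha, s, beta, x, T, t))

# Hypothetical parameters and history; forecast horizons of 10 and 20.
e10 = conditional_expected_transactions(0.5, 10.0, 0.6, 12.0, 2, 30.0, 39.0, 10.0)
e20 = conditional_expected_transactions(0.5, 10.0, 0.6, 12.0, 2, 30.0, 39.0, 20.0)
print(e10, e20)
```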

Next time, we’ll finally talk about the prediction of CLV.

3 References

  1. Schmittlein DC, Morrison DG, Colombo R (1987). “Counting Your Customers: Who Are They and What Will They Do Next?” Management Science, 33(1), 1-24.

  2. Fader PS, Hardie BGS (2005). “A Note on Deriving the Pareto/NBD Model and Related Expressions.”

  3. Fader PS, Hardie BGS (2007). “Incorporating time-invariant covariates into the Pareto/NBD and BG/NBD models.”

  4. Fader PS, Hardie BGS (2020). “Deriving an Expression for P(X(t)=x) Under the Pareto/NBD Model.”

Chen Xing
Founder & Data Scientist
