Change Point Detection in R

Dec 31, 2021 2 min read Forecast, R

What is the difference between a change point and an outlier?

To answer this question, we first need to understand what a change point in a time series is.

Changepoints are also known as:

breakpoints
segmentation
structural breaks
regime switching
detecting disorder

and can be found in a wide range of literature including

quality control
economics
medicine
environment
linguistics
$\dots$

For data $y_{1}, \dots, y_{n}$, if a changepoint exists at $τ$, then $y_{1}, \dots, y_{τ}$ differ from $y_{τ + 1}, \dots, y_{n}$ in some way.

There are many different types of change, for example a change in mean, variance, or trend.

With $m$ changepoints at $τ_{1} < \dots < τ_{m}$, a changepoint model for a change in mean has the following formulation:

$y_{t} = \left\{ \begin{array}{lcl} μ_{1} & \text{if} & 1 \leq t \leq τ_{1} \\ μ_{2} & \text{if} & τ_{1} < t \leq τ_{2} \\ \vdots & & \vdots \\ μ_{m + 1} & \text{if} & τ_{m} < t \leq τ_{m + 1} = n \end{array} \right.$
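To make the formulation concrete, here is a minimal sketch that simulates a series from a change-in-mean model. The changepoint locations, segment means, and the added Gaussian noise are all illustrative assumptions; the formula above only specifies the piecewise-constant mean.

```r
# Simulate a series whose mean changes at known changepoints.
# All numbers below are illustrative assumptions, not values from the post.
set.seed(1)

tau <- c(100, 250, 400)  # tau_1, tau_2 and tau_3 = n, so m = 2 changes
mu  <- c(0, 3, -1)       # mu_1, ..., mu_{m+1}

seg_len <- diff(c(0, tau))                 # segment lengths: 100, 150, 150
signal  <- rep(mu, times = seg_len)        # piecewise-constant mean
y       <- signal + rnorm(length(signal))  # assumed unit-variance noise

# The sample mean of each segment should sit close to the matching mu
segment   <- rep(seq_along(mu), times = seg_len)
seg_means <- tapply(y, segment, mean)
print(round(seg_means, 2))
```

Plotting `y` with `plot(y, type = "l")` should show two visible level shifts, at `t = 100` and `t = 250`.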

What is the goal?

Has a change occurred?
If yes, where is the change?
What is the difference between the pre- and post-change data?
- Maybe this is the type of change
- Maybe it is the parameter values before and after the change
What is the probability that a change has occurred?
How certain are we of the changepoint location?
How many changes have occurred (+ all the above for each change)?
Why has there been a change?
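The first few questions can be illustrated for the simplest case, a single change in mean. The sketch below is a plain least-squares search written in base R, not the method of any particular package. The helper name `find_single_change` and the `min_seg` argument are hypothetical. It estimates where the change is and what the means are before and after it. It does not give a probability or a confidence statement, which would need a formal test or a penalty on top of the cost comparison.

```r
# Locate a single change in mean by minimising the within-segment
# residual sum of squares over every admissible split point.
find_single_change <- function(y, min_seg = 5) {
  n <- length(y)
  candidates <- min_seg:(n - min_seg)
  rss <- function(x) sum((x - mean(x))^2)

  # Cost of splitting after observation k, for every candidate k
  cost <- vapply(
    candidates,
    function(k) rss(y[1:k]) + rss(y[(k + 1):n]),
    numeric(1)
  )

  best    <- which.min(cost)
  tau_hat <- candidates[best]
  list(
    tau            = tau_hat,                     # estimated changepoint location
    mean_pre       = mean(y[1:tau_hat]),          # estimated mean before the change
    mean_post      = mean(y[(tau_hat + 1):n]),    # estimated mean after the change
    cost_no_change = rss(y),                      # fit with no changepoint
    cost_change    = cost[best]                   # fit with one changepoint
  )
}

# Illustrative data: the mean moves from 0 to 4 after observation 60
set.seed(42)
y   <- c(rnorm(60, mean = 0), rnorm(40, mean = 4))
fit <- find_single_change(y)
str(fit)
```

A large drop from `cost_no_change` to `cost_change` suggests a change, but how large is large enough depends on a threshold that this sketch leaves open.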

Online vs Offline

Online
- Processes data as it arrives or in batches
- Goal is quickest detection of a change
- Often used in process control, intrusion detection
Offline
- Processes all the data in one go
- Goal is accurate detection of a change
- Often used in genome analysis, audiology
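To show what "processing data as it arrives" can look like, here is a sketch of a one-sided CUSUM detector, a classic online scheme from process control. It is one possible online method, not something this post's package is known to use. The function name `cusum_online` and the values of `mu0`, `k`, and `h` are assumptions chosen for illustration.

```r
# One-sided CUSUM detector for an upward shift in mean, processing one
# observation at a time. mu0, k and h are assumed tuning values.
cusum_online <- function(stream, mu0 = 0, k = 0.5, h = 5) {
  s <- 0
  for (t in seq_along(stream)) {
    # Accumulate evidence above the reference level, never below zero
    s <- max(0, s + stream[t] - mu0 - k)
    if (s > h) return(t)  # alarm: first index where the statistic exceeds h
  }
  NA_integer_             # no alarm raised
}

# Illustrative stream: the mean moves from 0 to 2 after observation 200
set.seed(7)
stream <- c(rnorm(200, mean = 0), rnorm(100, mean = 2))
alarm  <- cusum_online(stream)
print(alarm)
```

The threshold `h` controls the trade-off in the online setting: a lower value detects a change sooner but raises more false alarms before the change. An offline method would look at all 300 observations at once and can place the change more accurately.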

Change point detection function using the `ecp` package

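# Load the forecasting package and apply the zetaEDA ggplot theme
library(zeta.forecast)
zetaEDA::enable_zeta_ggplot_theme()

# Plot the example series with its detected change points;
# the output below lists the detected points
plot_ts_change_point(eg_diamond_ts)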

## # A tibble: 2 × 3
##   time        value cpt  
##   <date>      <dbl> <fct>
## 1 2013-12-01 652769 yes  
## 2 2019-09-01 305745 yes
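Since `zeta.forecast` may not be available to every reader, here is a sketch that calls `ecp` directly on simulated data. It uses `ecp::e.divisive()`, which is one of the estimators in that package. Whether `plot_ts_change_point()` uses this function internally is not stated in the post, so treat this as an assumption. The simulated series and the parameter values are illustrative, and the code is skipped if `ecp` is not installed.

```r
# Sketch: divisive change point estimation with ecp::e.divisive().
# The simulated series and parameter values are illustrative assumptions.
if (requireNamespace("ecp", quietly = TRUE)) {
  set.seed(123)
  y <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))

  # e.divisive() expects a matrix with one row per observation
  fit <- ecp::e.divisive(
    matrix(y, ncol = 1),
    sig.lvl  = 0.05,  # significance level of the permutation test
    R        = 199,   # number of permutations
    min.size = 30     # minimum number of observations between change points
  )

  # `estimates` includes the boundaries 1 and n + 1; drop them to keep
  # only the interior change point locations
  cpts <- fit$estimates[-c(1, length(fit$estimates))]
  print(cpts)
}
```

For a real series, the reported indices can be mapped back to dates by indexing the time column, which would give a table similar in spirit to the output above.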


Chen Xing

Founder & Data Scientist
