Change Point Detection in R
What is the difference between a change point and an outlier?
To answer this question, we first need to understand what a change point is for a time series.
Changepoints are also known as:
- breakpoints
- segmentation
- structural breaks
- regime switching
- detecting disorder
They appear in the literature of a wide range of fields, including:
- quality control
- economics
- medicine
- environment
- linguistics
- \(\cdots\)
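For data \(y_1, \ldots, y_n\), if a changepoint exists at \(\tau\), then \(y_1,\ldots,y_{\tau}\) differ from \(y_{\tau+1},\ldots,y_n\) in some way. The difference persists until the next changepoint or the end of the series, whereas an outlier is an isolated observation after which the series returns to its previous behavior.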
There are many different types of change, such as a change in mean, variance, or trend.
For example, a changepoint model for a change in mean has the following formulation:
\[ y_t = \left\{ \begin{array}{lcl} \mu_1 & \mbox{if} & 1\leq t \leq \tau_1 \\ \mu_2 & \mbox{if} & \tau_1 < t \leq \tau_2 \\ \vdots & & \vdots \\ \mu_{m+1} & \mbox{if} & \tau_m < t \leq \tau_{m+1}=n \end{array} \right. \]
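The change-in-mean model can be made concrete with a small simulation. The sketch below assumes Gaussian noise around each segment mean (the formulation above shows only the means), a single changepoint, and illustrative values for `n`, `tau`, and `mu`; the helper name `split_cost` is hypothetical. It estimates the changepoint as the split that minimizes the within-segment sum of squares, which is one simple approach rather than the only one.

```r
# Simulate a series with one change in mean (all values are illustrative).
set.seed(1)
n   <- 200
tau <- 120                 # assumed true changepoint
mu  <- c(0, 3)             # assumed segment means mu_1 and mu_2
y   <- c(rnorm(tau, mean = mu[1]), rnorm(n - tau, mean = mu[2]))

# Residual sum of squares when the series is split after position k.
split_cost <- function(k, y) {
  left  <- y[seq_len(k)]
  right <- y[(k + 1):length(y)]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
}

# Evaluate every admissible split and keep the cheapest one.
candidates <- 1:(n - 1)
costs      <- vapply(candidates, split_cost, numeric(1), y = y)
tau_hat    <- candidates[which.min(costs)]

# Estimated segment means on either side of the estimated changepoint.
mu_hat <- c(mean(y[1:tau_hat]), mean(y[(tau_hat + 1):n]))
```

Here `tau_hat` estimates \(\tau_1\) and `mu_hat` estimates \(\mu_1\) and \(\mu_2\). With several changepoints, the same cost would have to be minimized over all segmentations, which is what dedicated packages are designed to do efficiently.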
What is the goal?
- Has a change occurred?
- If yes, where is the change?
- What is the difference between the pre and post change data?
  - Maybe this is the type of change
  - Maybe it is the parameter values before and after the change
- What is the probability that a change has occurred?
- How certain are we of the changepoint location?
- How many changes have occurred (+ all the above for each change)?
- Why has there been a change?
Online vs Offline
- Online
  - Processes data as it arrives or in batches
  - Goal is quickest detection of a change
  - Often used in process control, intrusion detection
- Offline
  - Processes all the data in one go
  - Goal is accurate detection of a change
  - Often used in genome analysis, audiology
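The contrast between the two settings can be sketched in base R. The example below is an illustration under stated assumptions, not a reference implementation: the data are simulated, the online detector is a one-sided CUSUM with a known pre-change mean `mu0`, and the allowance `k` and threshold `h` are arbitrary tuning values. The function name `online_cusum` is hypothetical.

```r
set.seed(2)
# Illustrative data: the mean shifts from 0 to 2 after observation 100.
y <- c(rnorm(100, mean = 0), rnorm(50, mean = 2))

# Online: one-sided CUSUM, updated as each observation arrives.
# mu0 (pre-change mean), k (allowance) and h (alarm threshold) are assumed
# tuning values, not recommendations.
online_cusum <- function(y, mu0 = 0, k = 0.5, h = 8) {
  s <- 0
  for (t in seq_along(y)) {
    s <- max(0, s + (y[t] - mu0 - k))
    if (s > h) return(t)   # first time the statistic crosses the threshold
  }
  NA_integer_              # no alarm raised
}
alarm_time <- online_cusum(y)

# Offline: use the whole series and pick the split with the smallest
# within-segment sum of squares.
n <- length(y)
rss <- vapply(1:(n - 1), function(k) {
  left  <- y[1:k]
  right <- y[(k + 1):n]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
}, numeric(1))
offline_tau <- which.min(rss)
```

The online detector can only raise its alarm some observations after the change, so `alarm_time` measures detection delay. The offline estimate `offline_tau` uses data on both sides of the change, so it aims at locating the change accurately.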
Change point detection function using the ecp package
library(zeta.forecast)
zetaEDA::enable_zeta_ggplot_theme()
plot_ts_change_point(eg_diamond_ts)
## # A tibble: 2 × 3
## time value cpt
## <date> <dbl> <fct>
## 1 2013-12-01 652769 yes
## 2 2019-09-01 305745 yes
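The internals of `plot_ts_change_point()` are not shown here, so the following is only a sketch of how ecp could be called directly on a univariate series. It assumes the ecp package is installed, uses simulated data rather than `eg_diamond_ts`, and picks `sig.lvl`, `R`, and `min.size` purely for illustration. `e.divisive()` expects a matrix with one row per observation.

```r
set.seed(3)
# Illustrative series: the mean shifts from 0 to 4 after observation 60.
x <- matrix(c(rnorm(60, mean = 0), rnorm(60, mean = 4)), ncol = 1)

if (requireNamespace("ecp", quietly = TRUE)) {
  # Divisive, nonparametric changepoint estimation with a permutation test.
  fit <- ecp::e.divisive(x, sig.lvl = 0.05, R = 199, min.size = 30)

  # fit$estimates also contains the boundaries 1 and n + 1; drop them to
  # keep only the interior changepoint locations.
  n    <- nrow(x)
  cpts <- setdiff(fit$estimates, c(1, n + 1))
  print(cpts)
}
```

Each value in `cpts` is the index at which a new segment starts. Mapping those indices back to the dates of a time series would give a table like the one printed above.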