# TIPM: Pattern mining and anomaly detection in multi-dimensional time series and event logs

Implementation of _A framework for pattern mining and anomaly detection in multi-dimensional time series and event logs_,
by Len Feremans and Vincent Vercruyssen.

Presented at [New Frontiers in Mining Complex Patterns workshop](http://www.di.uniba.it/~loglisci/NFMCP2019/index.html), at *ECML-PKDD 2019*, the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2019.

Abstract:
> In the present-day, sensor data and textual logs are generated by many devices. Analyzing these time series data leads to the discovery of interesting patterns and anomalies. In recent years, numerous algorithms have been developed to discover interesting patterns in time series data as well as detect periods of anomalous behaviour. However, these algorithms are challenging to apply in real-world settings. We propose a framework, consisting of generic transformations, that allows to combine state-of-the-art time series representation, pattern mining, and pattern-based anomaly detection algorithms. Using an early- or late integration, our framework handles a mix of multi-dimensional continuous series and event logs. Finally we present an open-source, lightweight, interactive tool that assists both pattern mining and domain experts to select algorithms, specify parameters, and visually inspect the results, while shielding them from the underlying technical complexity of implementing our framework.

[Full paper](http://adrem.uantwerpen.be//bibrem/pubs/framework.pdf)

## Summary

**TIPM** takes *univariate*, *multivariate*, and *mixed-type* time series as input.
Using **TIPM** end-users can interactively compute an anomaly score for each window without the need for *labels*,
by specifying options for time series representation, pattern mining, reduction of patterns, and anomaly detection.

**TIPM** consists of four major steps:

1. Preprocessing univariate, multivariate, and mixed-type time series.
2. Mining a (non-redundant) set of *itemsets* and *sequential patterns* from each time series (using [SPMF](www.philippe-fournier-viger.com/spmf/)).
3. Computing an anomaly score using a generalisation of [PBAD: Pattern based anomaly detection](http://adrem.uantwerpen.be/bibrem/pubs/pbad.pdf) and [FP-outlier: Frequent pattern based outlier detection](https://www.researchgate.net/profile/Zengyou_He/publication/220117736_FP-outlier_Frequent_pattern_based_outlier_detection/links/53d9dec60cf2e38c63363c05/FP-outlier-Frequent-pattern-based-outlier-detection.pdf).
4. Visualising time series, pattern occurrences, labels and predicted anomaly scores.
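
Steps 2 and 3 above can be illustrated with a minimal Python sketch. It is not TIPM's Java code and does not use SPMF: a brute-force itemset miner stands in for the pattern mining step, and the score is one minus the frequent-pattern outlier factor, in the spirit of FP-outlier. All function names and the toy windows are assumptions made for illustration.

```python
from itertools import combinations


def mine_frequent_itemsets(windows, min_support, max_size=2):
    """Brute-force stand-in for a pattern miner: {itemset: relative support}."""
    n = len(windows)
    counts = {}
    for window in windows:
        items = sorted(set(window))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def fpof_anomaly_scores(windows, patterns):
    """Score = 1 - (sum of supports of patterns found in the window / number of patterns)."""
    scores = []
    for window in windows:
        items = set(window)
        matched = sum(s for p, s in patterns.items() if items.issuperset(p))
        scores.append(1.0 - matched / len(patterns) if patterns else 0.0)
    return scores


# Toy windows of discretised symbols (hypothetical data); the last one is unusual.
windows = [["a", "b"], ["a", "b"], ["a", "b", "c"], ["a", "b"], ["d", "e"]]
patterns = mine_frequent_itemsets(windows, min_support=0.6)
scores = fpof_anomaly_scores(windows, patterns)
```

A window that contains none of the frequent patterns receives the highest score, so no labels are needed to rank windows.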

## Framework
**TIPM** (Time Series Pattern Mining) is an open-source, web-based application. It can import any dataset that contains a datetime column and at least one value column, either continuous or discrete. **TIPM** visualises the histogram and summary statistics for each column and lets users transform continuous time series using our framework. For subsequent _pattern mining_ and _anomaly detection_ we apply generic transformations and existing pattern mining algorithms. For visualisation, **TIPM** can plot continuous time series values, discrete event logs, labels, and segmentation at different levels of time granularity (raw, hourly, daily, yearly, etc.). To validate pattern mining, it can render pattern occurrences and anomaly scores. After each transformation, **TIPM** saves intermediate files, so end-users can _undo_ any transformation. Most transformations in our framework are implemented using _streaming_ techniques that load only a small set of rows at a time instead of loading all data into main memory. Because data is loaded and processed in a streaming, or paginated, way, the interface and many preprocessing and postprocessing transformations can handle large time series with millions of samples. For pattern mining, resource usage can be managed by setting the support threshold to a relatively high value and choosing appropriate transformations during preprocessing.
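
The streaming, or paginated, processing described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not TIPM's Java implementation; the helper names, the `value` column, and the page size are assumptions.

```python
import csv
import io
from itertools import islice


def iter_pages(rows, page_size):
    """Yield lists of at most page_size rows, reading lazily from the iterator."""
    rows = iter(rows)
    while True:
        page = list(islice(rows, page_size))
        if not page:
            return
        yield page


def streaming_mean(csv_file, column, page_size=1000):
    """Compute the mean of one column while holding only one page in memory."""
    reader = csv.DictReader(csv_file)
    total, count = 0.0, 0
    for page in iter_pages(reader, page_size):
        values = [float(row[column]) for row in page]
        total += sum(values)
        count += len(values)
    return total / count if count else float("nan")


# Hypothetical in-memory CSV; a real run would pass an open file handle instead.
sample = io.StringIO("Time,value\n0,1.0\n1,2.0\n2,3.0\n3,4.0\n4,5.0\n")
mean_value = streaming_mean(sample, "value", page_size=2)
```

Memory use depends on the page size rather than on the length of the series, which is what lets this style of transformation scale to millions of rows.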

Workflow of our **framework**:
![Overview](https://bitbucket.org/len_feremans/tipm_pub/raw/ade25ad35609c11e70f1474900932d7e460a4c75/doc/img/architecture.png | width=400)


## Usage

See the demo [video](https://bitbucket.org/len_feremans/tipm_pub/raw/ea97c64d54ece43b227623d914cc37dca4168b5e/video/TIPM%20tutorial.mp4) (recorded with a slightly older version).

![Screens](https://bitbucket.org/len_feremans/tipm_pub/raw/ade25ad35609c11e70f1474900932d7e460a4c75/doc/img/screens.png | width=800)

1. Upload a CSV file containing a multivariate time series.
2. Two fields are special: `Label`, and the first column, which is assumed to be a `Time` column encoded either as an integer or in ISO datetime format.
3. Transform continuous time series using discretisation and a sliding window.
4. Compute pattern mining and anomaly detection.
5. Visualise time series, patterns, and anomaly score (including AUC and AP).
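
The input conventions and the transformation in step 3 can be sketched as follows. This is a hedged Python illustration, not TIPM's Java code: equal-width binning is only one possible discretisation, and the function names, bin count, window length, and step are assumptions.

```python
from datetime import datetime


def parse_time(value):
    """Parse a Time cell that is either an integer or an ISO datetime string."""
    try:
        return int(value)
    except ValueError:
        return datetime.fromisoformat(value)


def discretise_equal_width(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def sliding_windows(symbols, window, step):
    """Cut the symbol sequence into fixed-length, possibly overlapping windows."""
    return [symbols[i:i + window] for i in range(0, len(symbols) - window + 1, step)]


# Hypothetical continuous series with ten samples.
series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
symbols = discretise_equal_width(series, n_bins=3)
windows = sliding_windows(symbols, window=4, step=2)
```

Each resulting window is a short sequence of discrete symbols, which is the kind of input that itemset and sequential pattern miners expect.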

## Installation
Remark: The current version was tested with `Java` `jdk1.8.0_60.jdk` and `jdk-9.0.4.jdk`, and `Apache Maven 3.6.3` on `macOS 10.15.2`.
It was also tested with `java Openjdk-11.0.15` and `maven 3.8.6` on `archlinux`.
If you run into any issues, please contact me.

1. Clone the repository
2. The code is implemented in `Java` on top of the `Spring` framework for web application development.
   The user interface is programmed in `JavaScript`. Use `Maven` to compile and run the web app.
3. Go to [http://localhost:8080](http://localhost:8080) with your browser.

```bash
cd ~
git clone git@bitbucket.org:len_feremans/tipm_pub.git
# Enter the cloned repository (assumes the Maven project sits at its root)
cd tipm_pub
# Build the project and start the web application
mvn clean install spring-boot:run
```

To run the `PBAD` anomaly detection method, `PBAD` must also be installed. It is implemented in `Python` (and `C` using `Cython`).
Compile and install `PBAD` in the same parent directory as `TIPM`; the name of the folder has to be specified in the `Settings.java` file:

```bash
cd ~
git clone git@bitbucket.org:len_feremans/pbad.git
cd pbad/src/utils/cython_utils/
python setup.py build_ext --inplace
```

## More information for researchers and contributors ###
The current version is 1.01, last updated in February 2020. The main implementation is written in `Java 1.8`.
For mining closed, maximal and minimal infrequent itemsets and sequential patterns we depend on the `Java`-based [SPMF](www.philippe-fournier-viger.com/spmf/) library.
Java dependencies are specified in `Maven`: `org.springframework.boot==1.1.8`, `com.h2database==1.4.187` (in-memory database), `com.google.guava==18.0`, `org.apache.commons==3.2`, `nz.ac.waikato.cms.weka==3.6.11` and `xstream==1.2.2`.

Some example datasets are provided in _/data_:

- `univariate` *New York taxi*, *ambient temperature*, and *request latency*. Origin is the [Numenta repository](https://github.com/numenta).
- `multivariate` *Indoor physical exercises* dataset captured using a Microsoft Kinect camera. Origin is [AMIE: Automatic Monitoring of Indoor Exercises](https://dtai.cs.kuleuven.be/software/amie).


## Contributors
- Len Feremans, Adrem Data Labs research group, University of Antwerp, Belgium.

## Licence ###
Copyright (c) [2019] [Len Feremans]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.