Update authored by VERNEREY Charles
......@@ -2,6 +2,16 @@
Data mining with Choco Solver.
To use the constraints in a Maven project, add the following dependency to your **pom.xml** file:
```xml
<dependency>
<groupId>io.gitlab.chaver</groupId>
<artifactId>data-mining</artifactId>
<version>1.0.1</version>
</dependency>
```
## Read a transactional database
A transactional database is a file of the following format:
......@@ -22,9 +32,9 @@ Database database = new DatReader(path, 0, true).readFiles()
`DatReader` takes 3 arguments in the constructor:
- path: a String which represents the path of the transactional database
- nbValueFiles: an Integer which represents the number of value files that we want to read (in this example, 0)
- noClasses: a Boolean which is true if we want to ignore the class of each transaction (in this example, true)
- **path**: a String which represents the path of the transactional database
- **nbValueFiles**: an Integer which represents the number of value files that we want to read (in this example, 0)
- **noClasses**: a Boolean which is true if we want to ignore the class of each transaction (in this example, true)
If `noClasses` is set to false, the class item of each transaction is ignored during mining. The first item of each transaction is treated as the class item. In the example above, the class items are `{1,2}` when `noClasses` is set to false.
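The "first item is the class item" convention can be illustrated with a small plain-Java sketch. It does not use `DatReader` or any other class of the library; the class name `ClassItemSketch`, its methods and the sample transactions are hypothetical and only show how the class item is separated from the items that take part in the mining.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: plain Java, not the library's DatReader.
class ClassItemSketch {

    // Collects the class items, i.e. the first item of each non-empty transaction.
    static Set<Integer> classItems(int[][] transactions) {
        Set<Integer> classes = new TreeSet<>();
        for (int[] t : transactions) {
            if (t.length > 0) {
                classes.add(t[0]);
            }
        }
        return classes;
    }

    // Returns the items of a transaction without its leading class item.
    static int[] withoutClass(int[] transaction) {
        int from = Math.min(1, transaction.length);
        return Arrays.copyOfRange(transaction, from, transaction.length);
    }

    public static void main(String[] args) {
        // Hypothetical transactions, not the database shown on this page.
        int[][] transactions = { {1, 3, 4}, {2, 3, 5}, {1, 4, 5} };
        System.out.println("class items: " + classItems(transactions));
        System.out.println("mined items of t0: " + Arrays.toString(withoutClass(transactions[0])));
    }
}
```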
......@@ -43,8 +53,6 @@ It is not mandatory to have n consecutive Integer as items. For example, the fol
The following constraints are available:
`AdequateClosureDC(Database database, List<Measure> measures, BoolVar[] items)`
**Parameters**:
......@@ -55,9 +63,7 @@ The following constraints are available:
**Description**: Ensure that the pattern represented by `items` is closed w.r.t. the set of measures `measures` (Domain Consistency version)
**References**: `Vernerey et al. - Threshold-free Pattern Mining Meets Multi-Objective Optimization: Application to Association Rules`
**References**: *Vernerey et al. - Threshold-free Pattern Mining Meets Multi-Objective Optimization: Application to Association Rules*
`AdequateClosureWC(Database database, List<Measure> measures, BoolVar[] items)`
......@@ -69,9 +75,7 @@ The following constraints are available:
**Description**: Ensure that the pattern represented by `items` is closed w.r.t. the set of measures `measures` (Weak Consistency version)
**References**: `Vernerey et al. - Threshold-free Pattern Mining Meets Multi-Objective Optimization: Application to Association Rules`
**References**: *Vernerey et al. - Threshold-free Pattern Mining Meets Multi-Objective Optimization: Application to Association Rules*
`CoverClosure(Database database, BoolVar[] items)`
......@@ -82,9 +86,7 @@ The following constraints are available:
**Description**: Ensure that the pattern represented by `items` is closed w.r.t. the support
**References**: `Schaus et al. - CoverSize : A Global Constraint for Frequency-Based Itemset Mining`
**References**: *Schaus et al. - CoverSize : A Global Constraint for Frequency-Based Itemset Mining*
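To make "closed w.r.t. the support" concrete, here is a brute-force check in plain Java. It is a sketch of the definition only, not of the propagator used by `CoverClosure`: a pattern is closed when no item outside the pattern appears in every transaction of its cover, since adding such an item would give a superset with the same frequency. The class `ClosureSketch`, the boolean-matrix encoding (`db[t][i]` is true when transaction `t` contains item `i`) and the sample data are assumptions made for the illustration.

```java
import java.util.BitSet;

// Illustrative sketch only: checks the closure definition by enumeration.
class ClosureSketch {

    // Cover of a pattern: indices of the transactions that contain all its items.
    static BitSet cover(boolean[][] db, int[] pattern) {
        BitSet cov = new BitSet(db.length);
        for (int t = 0; t < db.length; t++) {
            boolean containsAll = true;
            for (int i : pattern) {
                if (!db[t][i]) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                cov.set(t);
            }
        }
        return cov;
    }

    // A pattern is closed if no outside item is present in every covering transaction.
    static boolean isClosed(boolean[][] db, int[] pattern) {
        int nbItems = db.length == 0 ? 0 : db[0].length;
        boolean[] inPattern = new boolean[nbItems];
        for (int i : pattern) {
            inPattern[i] = true;
        }
        BitSet cov = cover(db, pattern);
        for (int i = 0; i < nbItems; i++) {
            if (inPattern[i]) {
                continue;
            }
            boolean inWholeCover = true;
            for (int t = cov.nextSetBit(0); t >= 0; t = cov.nextSetBit(t + 1)) {
                if (!db[t][i]) {
                    inWholeCover = false;
                    break;
                }
            }
            if (inWholeCover) {
                return false; // adding item i would not change the cover
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical database with 3 items: t0 = {0,1}, t1 = {0,1,2}, t2 = {2}.
        boolean[][] db = {
            {true, true, false},
            {true, true, true},
            {false, false, true}
        };
        System.out.println(isClosed(db, new int[] {0}));
        System.out.println(isClosed(db, new int[] {0, 1}));
    }
}
```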
`CoverSize(Database database, IntVar freq, BoolVar[] items)`
......@@ -96,9 +98,7 @@ The following constraints are available:
**Description**: Ensure that the variable `freq` is equal to the frequency of the pattern represented by `items` variables
**References**: `Schaus et al. - CoverSize : A Global Constraint for Frequency-Based Itemset Mining`
**References**: *Schaus et al. - CoverSize : A Global Constraint for Frequency-Based Itemset Mining*
`FrequentSubs(Database database, int freq, BoolVar[] x)`
......@@ -110,9 +110,7 @@ The following constraints are available:
**Description**: Ensure that all the subsets of `x` are frequent w.r.t. the `freq` threshold (i.e. frequency(y) >= freq for all y subsets of x)
**References**: `Belaid et al. - Constraint Programming for Mining Borders of Frequent Itemsets`
**References**: *Belaid et al. - Constraint Programming for Mining Borders of Frequent Itemsets*
`Generator(Database database, BoolVar[] items)`
......@@ -123,9 +121,7 @@ The following constraints are available:
**Description**: Ensure that the pattern represented by `items` is a generator (i.e. has no subset with the same frequency)
**References**: `Belaid et al. - Constraint Programming for Association Rules`
**References**: *Belaid et al. - Constraint Programming for Association Rules*
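The generator property can also be checked by hand on a small database. The sketch below is plain Java and only mirrors the definition; it is not how the `Generator` constraint is implemented. Because frequency can only grow when items are removed, it is enough to test the subsets obtained by dropping a single item. The class `GeneratorSketch`, the boolean-matrix encoding and the sample data are hypothetical.

```java
// Illustrative sketch only: checks the generator definition by enumeration.
class GeneratorSketch {

    // Frequency of a pattern: number of transactions that contain all its items.
    static int freq(boolean[][] db, int[] pattern) {
        int count = 0;
        for (boolean[] transaction : db) {
            boolean containsAll = true;
            for (int i : pattern) {
                if (!transaction[i]) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                count++;
            }
        }
        return count;
    }

    // A pattern is a generator if removing any single item strictly increases the frequency.
    static boolean isGenerator(boolean[][] db, int[] pattern) {
        int f = freq(db, pattern);
        for (int skip = 0; skip < pattern.length; skip++) {
            int[] subset = new int[pattern.length - 1];
            for (int k = 0, j = 0; k < pattern.length; k++) {
                if (k != skip) {
                    subset[j++] = pattern[k];
                }
            }
            if (freq(db, subset) == f) {
                return false; // a subset has the same frequency
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical database with 3 items: t0 = {0,1}, t1 = {0,1,2}, t2 = {2}.
        boolean[][] db = {
            {true, true, false},
            {true, true, true},
            {false, false, true}
        };
        System.out.println(isGenerator(db, new int[] {0, 1}));
        System.out.println(isGenerator(db, new int[] {0, 2}));
    }
}
```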
`InfrequentSupers(Database database, int freq, BoolVar[] x)`
......@@ -137,7 +133,7 @@ The following constraints are available:
**Description**: Ensure that all the supersets of `x` are infrequent w.r.t. the `freq` threshold (i.e. frequency(y) < freq for all y supersets of x)
**References**: `Belaid et al. - Constraint Programming for Mining Borders of Frequent Itemsets`
**References**: *Belaid et al. - Constraint Programming for Mining Borders of Frequent Itemsets*
......@@ -146,19 +142,19 @@ The following constraints are available:
The following measures are available (see the package `io.gitlab.chaver.mining.patterns.measure`):
- Pattern measures:
- `AllConf`: All-confidence
- `Area`
- `Freq`
- `Freq1`
- `Freq2`
- `FreqNeg`
- `GrowthRate`
- `Length`
- `MaxFreq`
- `AllConf`: All-confidence, `AllConf(x) = Freq(x) / MaxFreq(x)`
- `Area`: `Area(x) = Freq(x) * Length(x)`
- `Freq`: Frequency of the pattern, also known as **support**
- `Freq1`: Frequency of the pattern w.r.t. the subset of transactions that have class 1 (denoted D1)
- `Freq2`: Frequency of the pattern w.r.t. the subset of transactions that do not have class 1 (denoted D2)
- `FreqNeg`: `FreqNeg(x) = -Freq(x)`
- `GrowthRate`: `GrowthRate(x) = (|D1| * Freq2(x)) / (|D2| * Freq1(x))`
- `Length`: Number of items that the pattern contains, also known as **size**
- `MaxFreq`: Maximum frequency of the pattern, i.e. the maximum frequency of its items
- Attribute measures:
- `Max`
- `Min`
- `Mean`
- `Max`: Maximum value of the items
- `Min`: Minimum value of the items
- `Mean`: `Mean(x) = (Min(x) + Max(x)) / 2`
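As a worked example of the formulas above, the following plain-Java sketch computes a few of the measures on a toy database. It does not use the classes of `io.gitlab.chaver.mining.patterns.measure`; the class `MeasureSketch`, the boolean-matrix encoding (`db[t][i]` is true when transaction `t` contains item `i`), the `values` array and the sample data are all assumptions made for the illustration, and patterns are assumed to be non-empty.

```java
// Illustrative sketch only: recomputes some measures from their definitions.
class MeasureSketch {

    // Freq(x): number of transactions that contain every item of the pattern.
    static int freq(boolean[][] db, int[] pattern) {
        int count = 0;
        for (boolean[] transaction : db) {
            boolean containsAll = true;
            for (int i : pattern) {
                if (!transaction[i]) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                count++;
            }
        }
        return count;
    }

    // Area(x) = Freq(x) * Length(x)
    static int area(boolean[][] db, int[] pattern) {
        return freq(db, pattern) * pattern.length;
    }

    // MaxFreq(x): maximum frequency among the items of the pattern.
    static int maxFreq(boolean[][] db, int[] pattern) {
        int max = 0;
        for (int i : pattern) {
            max = Math.max(max, freq(db, new int[] {i}));
        }
        return max;
    }

    // AllConf(x) = Freq(x) / MaxFreq(x)
    static double allConf(boolean[][] db, int[] pattern) {
        return (double) freq(db, pattern) / maxFreq(db, pattern);
    }

    // Mean(x) = (Min(x) + Max(x)) / 2, where values[i] is the value of item i.
    static double mean(int[] values, int[] pattern) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i : pattern) {
            min = Math.min(min, values[i]);
            max = Math.max(max, values[i]);
        }
        return (min + max) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical database with 3 items: t0 = {0,1}, t1 = {0,1,2}, t2 = {2}.
        boolean[][] db = {
            {true, true, false},
            {true, true, true},
            {false, false, true}
        };
        int[] values = {10, 4, 7}; // hypothetical item values
        int[] pattern = {0, 2};
        System.out.println("Freq = " + freq(db, pattern));
        System.out.println("Area = " + area(db, pattern));
        System.out.println("AllConf = " + allConf(db, pattern));
        System.out.println("Mean = " + mean(values, pattern));
    }
}
```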
## Code examples
......
......