Update doc authored by Charles Vernerey's avatar Charles Vernerey
......@@ -211,7 +211,7 @@ model.post(new Constraint("Cover Size", new CoverSize(database, freq, x)));
model.post(new Constraint("Adequate Closure", new AdequateClosureDC(database, measures, x)));
```
We post two constraints, **CoverSize** to compute the frequency of the pattern and **AdequateClosure** to ensure that `x` is closed w.r.t. the set of measures `{freq(x),max(x.freq)}`.
We post two constraints, **CoverSize** to compute the frequency of the pattern and **AdequateClosure** to ensure that `x` is closed w.r.t. the set of measures `{freq(x),max(x.freq)}`. Note that in this example we use the Domain Consistency version of **AdequateClosure**. If you want to use the Weak Consistency version, just replace AdequateClosureDC by AdequateClosureWC in the code.
```java
List<Pattern> closedPatterns = new LinkedList<>();
......@@ -228,4 +228,103 @@ for (Pattern closed : closedPatterns) {
}
```
Finally, we solve our model and find all the closed patterns.
\ No newline at end of file
Finally, we solve our model and find all the closed patterns with a frequency and length >= 1.
**ExampleCoverClosure**
In this example, we want to mine the set of closed patterns w.r.t. the set of measures `{freq}`. The code is similar to ExampleAdequateClosure.
**ExampleGenerator**
In this example, we want to mine the set of generators (a generator is an itemset which has no subset with the same frequency). We can analyse the code of the main method:
```java
String dataPath = "src/test/resources/contextPasquier99/contextPasquier99.dat";
Model model = new Model("generator test");
Database database = new DatReader(dataPath, 0, true).readFiles();
IntVar freq = model.intVar("freq", 1, database.getNbTransactions());
IntVar length = model.intVar("length", 1, database.getNbItems());
BoolVar[] x = model.boolVarArray("x", database.getNbItems());
model.sum(x, "=", length).post();
```
First, we create the data structures to build our model. We create two integer variables `freq` and `length` which represent the frequency and the length of the pattern, and a boolean variables array `x` where `x[i] = 1` means that item `i` belongs to the pattern.
```java
model.post(new Constraint("Cover Size", new CoverSize(database, freq, x)));
model.post(new Constraint("Generator", new Generator(database, x)));
```
We post two constraints, **CoverSize** to compute the frequency of the pattern and **Generator** to ensure that `x` is a generator.
```java
List<Pattern> generators = new LinkedList<>();
while (model.getSolver().solve()) {
int[] itemset = IntStream.range(0, x.length)
.filter(i -> x[i].getValue() == 1)
.map(i -> database.getItems()[i])
.toArray();
generators.add(new Pattern(itemset, new int[]{freq.getValue()}));
}
for (Pattern generator : generators) {
System.out.println(Arrays.toString(generator.getItems()) + ", freq=" + generator.getMeasures()[0]);
}
```
Finally, we solve our model and we find all the generators with a frequency and length >= 1.
**MFI_MII_Mining**
In this example, we want to mine **Maximal Frequent Itemsets** (MFIs) and **Minimal Infrequent Itemsets** (MIIs). First, we can analyse the code of the method createModel, that takes three arguments:
- dataPath: a String which represents the path of the dataset
- mfi: a Boolean which is true if we want to find MFIs, else MIIs
- s: an integer that represents the threshold for the mining
```java
Database database = new DatReader(dataPath, 0, true).readFiles();
Model model = new Model();
BoolVar[] x = model.boolVarArray("x", database.getNbItems());
```
First, we create the required data structures.
```java
int freqLB = mfi ? s : 0;
int freqUB = mfi ? database.getNbTransactions() : (s - 1);
IntVar freq = model.intVar("freq", freqLB, freqUB);
```
Now, we have to compute the bounds of the freq variable that represents the frequency of the pattern `x`. If `x` is an MFI, then its bounds are `[s,database.getNbTransactions()]`. If `x` is an MII, then its bounds are `[0,s-1]`.
```java
model.post(new Constraint("FrequentSubs", new FrequentSubs(database, s, x)));
model.post(new Constraint("InfrequentSupers", new InfrequentSupers(database, s, x)));
model.post(new Constraint("CoverSize", new CoverSize(database, freq, x)));
if (mfi) {
model.post(new Constraint("CoverClosure", new CoverClosure(database, x)));
}
else {
model.post(new Constraint("Generator", new Generator(database, x)));
}
```
Then we post all the necessary constraints (see the paper of Belaid et al. for more information). Interestingly, if `x` is an MFI, then we have the guarantee that `x` is closed w.r.t. the frequency (since `x` is frequent and its supersets are not frequent) and we can post **CoverClosure** constraint. If `x` is an MII, then we have the guarantee that `x` is a generator (since `x` is infrequent and its subsets are frequent) and we can post **Generator** constraint.
```java
int[] itemFreq = database.computeItemFreq();
BoolVar[] xSorted = IntStream
.range(0, database.getNbItems())
.boxed()
.sorted(Comparator.comparingInt(i -> itemFreq[i]))
.map(i -> x[i])
.toArray(BoolVar[]::new);
Solver solver = model.getSolver();
solver.setSearch(Search.intVarSearch(
new InputOrder<>(model),
new IntDomainMax(),
xSorted
));
```
As heuristic, we order items by increasing frequency and we instantiate them first to 1.
\ No newline at end of file