- nbValueFiles: an Integer which represents the number of value files that we want to read (in this example, 0)
- noClasses: a Boolean which is true if we want to ignore class of the transactions (in this example, true)
If the noClasses argument is set to false, then the class item of each transaction will be ignored during the mining. We consider the first item of each transaction as the class item. In the example above, class items are `{1,2}`.
If the noClasses argument is set to false, then the class item of each transaction will be ignored during the mining. We consider the first item of each transaction as the class item. In the example above, class items are `{1,2}` if noClasses is set to false.
If nbValueFiles is set to an Integer n > 0, then we read the `*.val0`, `*.val1`, ..., `*.valn` files, where each line of a file gives the value of one item. See the `data` directory for examples of value files.
...
...
@@ -273,7 +273,7 @@ for (Pattern generator : generators) {
Finally, we solve our model and find all the generators whose frequency and length are both >= 1.
**MFI_MII_Mining**
**ExampleMFIsMIIsMining**
In this example, we want to mine **Maximal Frequent Itemsets** (MFIs) and **Minimal Infrequent Itemsets** (MIIs). First, we look at the code of the createModel method, which takes three arguments:
Next, we create variables to compute the value of each measure m in M for the pattern x. Note that the aconf is converted to an integer variable by multiplying it by 10000.
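The scaling of aconf to an integer can be illustrated with a minimal sketch. It assumes that aconf denotes the all-confidence measure, i.e. freq(x) divided by the largest frequency of any single item of x, and that the scaled value is rounded down. The class and method names are hypothetical and are not part of the library.

```java
/** Hypothetical helper: maps a ratio in [0, 1] to an integer in [0, 10000]. */
final class ScaledMeasureSketch {
    // Scaling factor mentioned in the text.
    static final int SCALE = 10_000;

    /**
     * Assumption: aconf(x) = freq(x) / max item frequency in x.
     * Returns floor(SCALE * freq / maxItemFreq) so that the measure
     * can be stored in an integer variable.
     */
    static int scaledAllConfidence(int freq, int maxItemFreq) {
        if (maxItemFreq <= 0) {
            throw new IllegalArgumentException("maxItemFreq must be positive");
        }
        // Use long arithmetic to avoid overflow before the division.
        return (int) ((long) freq * SCALE / maxItemFreq);
    }
}
```

With this convention, an aconf of 0.75 is represented by the integer 7500.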
Next, we add the **CoverSize** constraint to link the freq variable to the frequency of x, and the **AdequateClosure** constraint to ensure that x is closed w.r.t. M'. We also add the **Pareto** constraint to ensure that x is not dominated by any previously found pattern. Note that we also plug the Pareto maximizer into the solver so that each time a new solution is discovered, it is added to the archive and all the solutions it dominates are removed.
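The archive update described above can be sketched independently of the solver. This is an illustrative stand-alone version, not the library's Pareto maximizer: it assumes every measure is maximized and represented as an integer, and it chooses to reject a candidate that is equal to an archived point. The `ParetoArchiveSketch` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical archive of non-dominated measure vectors (all measures maximized). */
final class ParetoArchiveSketch {
    private final List<int[]> points = new ArrayList<>();

    /** a dominates b if a is >= b on every measure and > b on at least one. */
    static boolean dominates(int[] a, int[] b) {
        boolean strictlyBetter = false;
        for (int k = 0; k < a.length; k++) {
            if (a[k] < b[k]) return false;
            if (a[k] > b[k]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    /** Adds the candidate unless it is dominated or duplicated; drops the points it dominates. */
    boolean add(int[] candidate) {
        for (int[] p : points) {
            if (dominates(p, candidate) || Arrays.equals(p, candidate)) {
                return false;
            }
        }
        points.removeIf(p -> dominates(candidate, p));
        points.add(candidate.clone());
        return true;
    }

    int size() {
        return points.size();
    }
}
```

For example, after adding (5, 2) and (3, 3), the point (4, 1) is rejected because (5, 2) dominates it, whereas adding (6, 3) evicts both archived points.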
The search heuristic is **MinCov**: it selects the item i for which freq(x ∪ {i}) is minimal and first tries to assign it the value 0. This heuristic is interesting for the following reasons:
- non-frequent patterns are filtered quickly
- the pattern with the maximal frequency is instantiated first, which is useful to prune efficiently the search space
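The MinCov selection rule can be sketched outside the solver as follows. This is only an illustration under stated assumptions: covers are stored as `java.util.BitSet` objects indexed by transaction, and the `MinCovSketch` class with its parameters is hypothetical rather than the library's implementation.

```java
import java.util.BitSet;

/** Hypothetical stand-alone version of the MinCov item selection. */
final class MinCovSketch {
    /**
     * Returns the index of the free item i that minimizes freq(x U {i}),
     * computed as |cover(x) AND cover(i)|, or -1 if no item is free.
     * Ties are broken in favour of the smallest index.
     */
    static int selectItem(BitSet coverOfX, BitSet[] itemCovers, boolean[] isFree) {
        int best = -1;
        int bestFreq = Integer.MAX_VALUE;
        for (int i = 0; i < itemCovers.length; i++) {
            if (!isFree[i]) continue;
            // Cover of x extended with item i.
            BitSet extended = (BitSet) coverOfX.clone();
            extended.and(itemCovers[i]);
            int freq = extended.cardinality();
            if (freq < bestFreq) {
                bestFreq = freq;
                best = i;
            }
        }
        return best;
    }
}
```

A search strategy built on this rule would then branch on the selected item with the value 0 first, as described above.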
Next, we add **CoverSize** constraints to compute the frequency of z and x, and a constraint on the min confidence of the rule. We also post two constraints **Generator** and **CoverClosure** to ensure that the rule is an MNR.
Finally, we order the variables such that x is instantiated before y and y is instantiated before z and we select the minimum value first. Each time we find a new MNR, we print it.
**ExampleDiversity**
In this example, we want to find the set of diverse patterns with a min frequency of 10% and a Jmax of 0.05.
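To make the role of Jmax concrete, here is a minimal sketch of a diversity test. It assumes that diversity is measured with the Jaccard index between the covers (sets of supporting transactions) of two patterns, and that a candidate is kept only if its Jaccard index with every previously found pattern is at most Jmax. The `JaccardSketch` class is hypothetical and does not reproduce the library's constraint.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical diversity check based on the Jaccard index of covers. */
final class JaccardSketch {
    /** Jaccard index |a AND b| / |a OR b|, defined as 0 when both covers are empty. */
    static double jaccard(BitSet a, BitSet b) {
        BitSet intersection = (BitSet) a.clone();
        intersection.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        int unionSize = union.cardinality();
        return unionSize == 0 ? 0.0 : (double) intersection.cardinality() / unionSize;
    }

    /** True if the candidate cover is at most jmax-similar to every cover in the history. */
    static boolean isDiverse(BitSet candidate, List<BitSet> history, double jmax) {
        for (BitSet previous : history) {
            if (jaccard(candidate, previous) > jmax) {
                return false;
            }
        }
        return true;
    }
}
```

Under this reading, a Jmax of 0.05 is a strict threshold: two patterns may share at most 5% of the transactions in the union of their covers.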