\n", "\n", "Given a set of attributes $M = \\{m_i\\}_{i\\in[|M|]}$, a dataset is a collection of examples $(\\bm{x}^i)_i$ where $\\bm{x}^i=(x^i_1,\\dots, x^i_{|M|}) \\in \\mathbb{R}^{|M|}$ is a tuple. An interval pattern of dimension $M$ is denoted $p=([l_i,u_i])_{i\\in[|M|]}$ where $l_i, u_i \\in \\bar{\\mathbb{R}}$.\n", "\n", "**Q2.** Define the relation $\\sqsubseteq$ of containment of a pattern $p$ in an example $\\bm{x}$.\n", "\n", "**Q3.** Define the notion of support for the domain of interval patterns of dimension $M$\n", "\n", "**Q4.** Proposed an definition of a partial order $\\triangleleft$ on the set of the interval patterns of dimension $M$ such that the support measure is anti-monotonic w.r.t. this order.\n", "\n", "\n", "

\n", "We now define a closure operator as follows:\n", "\n", "$cl(p) = \\left( \\left[ min(\\{ x_i \\mid \\bm{x}\\in \\mathcal{D} \\wedge p\\sqsubseteq\\bm{x}\\}), max(\\{ x_i \\mid \\bm{x}\\in \\mathcal{D} \\wedge p\\sqsubseteq\\bm{x}\\}) \\right] \\right)_{i\\in[|M|]} $\n", "\n", "**Q5.** What is $cl(p)$ when $p=([0.1,0.3],[1.0,4.0],[10.0,30.0])$? Comment?\n", "\n", " \n", "By construction, $cl(p)$ is the *smaller* interval patterns that matches the same examples as $p$. The pattern such that $p=cl(p)$ are called *closed patterns*.\n", "\n", "**Q6.** Illustrate a Hasse diagram of the closed interval patterns that match some examples in the dataset above (do at most two levels starting from the degenerated pattern $([-\\infty,+\\infty],[-\\infty,+\\infty],[-\\infty,+\\infty])$ ). \n", "Why do not draw the Hasse diagram of the whole interval patterns?\n", "\n", "\n", "**Q7.** We want to implement a Depth-First Search algorithm that extracts frequent closed interval patterns from a dataset (frequent = support above a threshold $\\sigma$). Propose the function that evaluates the support and extends a pattern $p$ to explore further the search space.\n", "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Implementation of frequent interval pattern extraction\n", "We now implement the algorithm. \n", "For sake of simplicity, you can implement a version of the algorithm specific to a dataset with 3 dimensions ... but it is not so complicated to generalize it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset=[ (0.7, 0.5, 10.5), \n", " (0.2, 3.5, 20.3),\n", " (0.4, 0.6, 14.7),\n", " (0.2, 2.5, 15.89),\n", " (0.6, 6.5, 23.5),\n", " (0.5, 7.4, 10.4) ]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def matches(x, p):\n", " \"\"\"\n", " x: example in dimension 1\n", " p: interval pattern\n", " \"\"\"\n", " \n", " return True" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#example of interval pattern in dimension 3\n", "# -> this is just a proposal for interval-pattern representation: if can change it, if you want\n", "p=([0.1,0.3],[1.0,4.0],[10.0,30.0])\n", "\n", "print(matches(dataset[2], p))\n", "print(matches(dataset[1], p))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def support(dataset,p):\n", " \"\"\"Compute the support of pattern `p` in the `dataset`\"\"\"\n", " s=0\n", " \n", " return s\n", "\n", "# test\n", "support(dataset,p)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#TODO ... depth first search algorithm" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Q8.** Assuming you implement a naive approach of the algorithm, what is the limit/default of the proposed algorithm?\n", "\n", "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Improved version\n", "\n", "In this improved version, we propose to tranverse the search space without redundancy. \n", "The problem we had in the former version is that there is not a unique path for reach a pattern of the search space.\n", "\n", "The solution we propose to avoid exploring twice the same interval pattern is is known as *lectic order*, that is to understand as an order on the possible actions (or \"generating the rightmost pattern\"). It is based on the definition of total order on the extentions alternatives. In this case, we assume that there is an order on the attributes $m_1<_a m_2<_a m_3$, and that the left change is *lower* than the right change.\n", "Then, \n", "\n", "In practice, this means that:\n", "1. after applying a change to the right boundary of an interval, then one cannot apply a change to the left boundary\n", "2. after applying a change to attribute $m$, one cannot apply a change to attribute $m'<_a m$.\n", "\n", "This rules induce a new partial order on the set of interval patterns. We denote $\\blacktriangleleft$ this order (note that it is a bit technical to formalize).\n", "\n", "\n", "**Q9.** Considering interval patterns in dimension 1 and the set of interval patterns $\\mathcal{P}=\\{[4,4], [4,5], [5,4], [4,6], [5,5]\\}$. Draw the Hasse diagram of $(\\mathcal{P},\\triangleleft)$ to represent the search paths used by the previous algorithm. Then, draw the Hasse diagram of $(\\mathcal{P},\\blacktriangleleft)$ and comment.\n", "\n", "**Q10.** Are all patterns accessible with it? (we do not ask for a formal proof)\n", "\n", "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Q11.** (**) Propose an implementation with the new search space strategy and compare the number of output patterns \n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This exercice has been inspired by the work of M. Kaytoue et al:\n", "\n", "> Mehdi Kaytoue, Sergei O. Kuznetsov, and Amedeo Napoli. Revisiting numerical\n", "> pattern mining with formal concept analysis. In Proceedings of International Join\n", "> Conference on Artificial Intelligence (IJCAI), pages 1342–1347, 2011." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Second Exercice: SetChronicle mining\n", "\n", "This section introduces a new type of patterns we named *SetChronicles* that is dedicated to temporal sequences. The first objective of this algorithm is to design a new frequent pattern mining algorithm to extract SetChronicles from a collection of temporal sequences. The second objective is to apply this algorithm on the \n", "\n", "Let's start with some definitions. Let us first define a finite alphabet $(\\Sigma,<)$ equipped with a total order $<$.\n", "\n", "A **temporal sequence** of length $n$ over $\\Sigma$ is a finite sequence $\\rho = \\langle\n", "(\\sigma_1,\\tau_1), \\dots, (\\sigma_{n},\\tau_{n})\\rangle$ in $(\\Sigma\\times \\mathbb{R}_{\\geq 0})^{n}$ where for all $1\\leq i