Commit 134f6446 authored by Helene Coullon

TP3 correction and new TP4

parent 370b0823
FROM quay.io/jupyter/datascience-notebook
USER root
ARG openjdk_version="17"
# Install a headless JRE (needed by PySpark) and the build dependencies of mysqlclient
RUN apt-get update --yes && \
    apt-get install --yes --no-install-recommends \
    "openjdk-${openjdk_version}-jre-headless" \
    ca-certificates-java \
    default-libmysqlclient-dev \
    build-essential \
    pkg-config && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip
# Install the Python dependencies of the lab, then restore the permissions expected by the Jupyter base image
COPY requirements.txt /home/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /home/requirements.txt && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"
# Drop back to the unprivileged notebook user
USER ${NB_UID}
\ No newline at end of file
version: '3.1'
services:
  spark:
    image: docker.io/bitnami/spark:3.5
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
    ports:
      - '8080:8080'
      - '7077:7077'
  spark-worker:
    image: docker.io/bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G # can be changed
      - SPARK_WORKER_CORES=1 # can be changed
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
  minio:
    image: minio/minio
    container_name: minio
    environment:
      MINIO_ROOT_USER: root
      MINIO_ROOT_PASSWORD: password
    command: server /data --console-address ":9001"
    ports:
      - "19000:9000"
      - "19001:9001"
  notebook:
    image: grosinosky/bigdata_fila3_jupyter
    build:
      context: .
      dockerfile: Dockerfile
    container_name: jupyter
    ports:
      - "8888:8888"
      - "4040:4040"
    environment:
      JUPYTER_ENABLE_LAB: "yes"
    command: start-notebook.py --NotebookApp.token=''
\ No newline at end of file
#pandas==2.2.3 # add locally
mysqlclient==2.2.4
jupysql==0.10.14
#seaborn==0.13.2 # add locally
grpcio==1.59.0
pymongo==4.10.1
pyspark==3.5.3
hdfs==2.7.3
minio==7.2.10
docker==7.1.0
kafka-python==2.0.2; python_version < '3.12'
kafka-python @ git+https://github.com/dpkp/kafka-python.git ; python_version >= '3.12'
\ No newline at end of file
# MapReduce and Spark

You will need to shut down the Docker services from the previous lab sessions (TPs). You can do so with the `docker compose down` command in the relevant directories, or with the Docker plugin of VS Code.

In this lab we first use Python's native map/reduce facilities, in the notebook [tp-python](tp-python.ipynb). The code of the functions presented in the lecture is also available there.

The second part of the lab is about Spark, in the notebook [tp-spark](tp-spark.ipynb) (to be done after the corresponding part of the lecture!). Besides the PageRank illustration from the lecture, it contains a few exercises to get started with Spark using the [PySpark](https://spark.apache.org/docs/latest/api/python/index.html) library.
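Before diving into the notebooks, here is a minimal sketch (not taken from the lab material; the application name and the toy computation are purely illustrative) of how a Spark session can be opened from the notebook container against the master defined in the Docker Compose file shown above:

```python
from pyspark.sql import SparkSession

# "spark://spark:7077" matches SPARK_MASTER_URL in the compose file; the hostname
# "spark" resolves because the notebook runs on the same compose network.
spark = (SparkSession.builder
         .master("spark://spark:7077")
         .appName("tp-spark-check")   # illustrative name
         .getOrCreate())

# Tiny sanity check: distribute a range of integers, square them, sum the results.
rdd = spark.sparkContext.parallelize(range(10))
print(rdd.map(lambda x: x * x).reduce(lambda a, b: a + b))

spark.stop()
```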
%% Cell type:markdown id: tags:
# MapReduce
The code below is the one presented in the lecture. Of course, in this setting we are limited by the computing power and memory of the machine being used.
%% Cell type:code id: tags:
``` python
from itertools import chain
from functools import reduce
```
%% Cell type:markdown id: tags:
## Map
%% Cell type:code id: tags:
``` python
# Define flat_map using map and itertools.chain
def flat_map(func, iterable):
    return list(chain.from_iterable(map(func, iterable)))
# Initial list of words
words = ["The", "Dark", "Knight", "Rises"]
# Map to get the length of each word
lengths = list(map(len, words))
print("Lengths:", lengths)
# Map to get each word as a list of characters
list_of_chars = list(map(list, words))
print("List of characters:", list_of_chars)
# Map to get the ASCII value of the first character of each word
list_of_asciis = list(map(lambda word: ord(word[0]), words))
print("List of ASCII values:", list_of_asciis)
# FlatMap to flatten all characters
chars = flat_map(list, words)
print("Flattened characters:", chars)
# Map to increment each word length by 1
incs = list(map(lambda length: length + 1, lengths))
print("Incremented lengths:", incs)
```
%% Cell type:markdown id: tags:
## Reduce
%% Cell type:code id: tags:
``` python
words = ["The", "Dark", "Knight", "Rises"]
lengths = list(map(len, words)) # Creates a list of lengths: [3, 4, 6, 5]
# Concatenates all elements in `words` into a single string.
res1 = reduce(lambda x, y: x + y, words) if words else None
print("res1:", res1)
# Concatenates all elements in `words`, then adds "AndFalls" at the end.
res2 = reduce(lambda x, y: x + y, words + ["AndFalls"])
print("res2:", res2)
# Concatenates "NaNa" at the beginning, then adds all elements in `words`.
res3 = reduce(lambda x, y: x + y, ["NaNa"] + words)
print("res3:", res3)
# Takes the first letter of each word in `words` and concatenates them.
res4 = reduce(lambda x, y: x + y, map(lambda word: word[0], words))
print("res4:", res4)
# Sums up all the elements in `lengths`, which represents the total length of all words.
res5 = reduce(lambda x, y: x + y, lengths)
print("res5:", res5)
```
%% Cell type:markdown id: tags:
## Exercise
%% Cell type:markdown id: tags:
Write a program that computes the total distance along a series of 2D points connected sequentially, as a route.
You are given a list of 2D points, each point being represented by a tuple (x, y).
Use map to compute the distance between each pair of consecutive points.
Then use reduce to compute the total distance of the route connecting all the points; a sketch of one possible solution is given after the example data below.
```python
points = [(0, 0), (3, 4), (7, 1), (10, 10)]
```
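One possible sketch, assuming the straight-line (Euclidean) distance between consecutive points; the intermediate names `pairs` and `distances` are only illustrative:
``` python
from functools import reduce
from math import sqrt

points = [(0, 0), (3, 4), (7, 1), (10, 10)]

# map step: turn each pair of consecutive points into the distance between them
pairs = zip(points, points[1:])
distances = map(lambda p: sqrt((p[1][0] - p[0][0]) ** 2 + (p[1][1] - p[0][1]) ** 2), pairs)

# reduce step: sum the segment lengths to obtain the total length of the route
total = reduce(lambda x, y: x + y, distances)
print("Total distance:", total)
```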