mini-projet-intro-ml

    CHOUMMIKH Meriam authored

    Binary Classification Workflow

    Introduction

    This project applies a binary classification workflow to two datasets: Banknote Authentication and Chronic Kidney Disease. It covers the full machine learning workflow, from data import and preprocessing through model training and validation to analysis of the results.

    Datasets

    1. Banknote Authentication Dataset: UCI Machine Learning Repository
    2. Chronic Kidney Disease Dataset: Kaggle

    Installation

    To run this project, you need Python installed on your machine along with the following libraries:

    • Pandas
    • NumPy
    • Scikit-learn
    • Matplotlib
    • Seaborn

    You can install these packages using pip:

    pip install pandas numpy scikit-learn matplotlib seaborn

    File Description

    • binary_classification_workflow.py: Contains all functions for data preprocessing, model training, validation, and result display.
    • main.ipynb: Jupyter Notebook demonstrating the application of the workflow to the datasets.

    Usage

    1. Clone the repository from GitLab: https://gitlab.imt-atlantique.fr/m21aouad/mini-projet-intro-ml.git
    2. Download the datasets and place them in the project directory.
    3. Run the Jupyter Notebook to see the workflow in action.
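As a rough illustration of step 2, the sketch below shows one way the Banknote Authentication file could be read with Pandas once it is in the project directory. The helper name `load_banknote` and the column names are assumptions (the column names follow the UCI description of the dataset, whose raw file has no header row); the notebook may load the data differently. The two rows used here are an in-memory stand-in, not the real file.

```python
import io

import pandas as pd

# Assumed column names, taken from the UCI dataset description.
# The raw file is comma-separated and has no header row.
BANKNOTE_COLUMNS = ["variance", "skewness", "curtosis", "entropy", "class"]


def load_banknote(source):
    """Read the banknote data from a path or file-like object (hypothetical helper)."""
    return pd.read_csv(source, header=None, names=BANKNOTE_COLUMNS)


# Illustrative stand-in rows so the example runs without the downloaded file.
sample = io.StringIO(
    "3.6216,8.6661,-2.8073,-0.44699,0\n"
    "-1.3971,3.3191,-1.3927,-1.9948,1\n"
)
df = load_banknote(sample)
print(df.shape)
print(df.dtypes)
```

With the real dataset, the same call would take the path of the downloaded file in place of `sample`.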

    Functions Overview

    • Data Preprocessing: Functions for cleaning, filling missing values, scaling, and normalizing data.
    • Model Training and Validation: Includes functions for splitting data, handling categorical features, feature selection, and model performance comparison.
    • Utility Functions: For tasks like checking skewness, identifying outliers, and visualizing data.
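The three groups of functions above could be sketched as follows. This is a minimal illustration on synthetic data, not the code in binary_classification_workflow.py: the names `iqr_outlier_mask` and `train_and_score`, the median imputation, the 1.5 × IQR outlier rule, and the choice of logistic regression are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def iqr_outlier_mask(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical utility)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def train_and_score(X, y, seed=0):
    """Split, preprocess, fit, and return (model, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Preprocessing lives inside the pipeline so it is fitted on the
    # training split only, which avoids leaking test-set statistics.
    model = make_pipeline(
        SimpleImputer(strategy="median"),   # fill missing values
        StandardScaler(),                   # zero mean, unit variance
        LogisticRegression(max_iter=1000),  # assumed classifier
    )
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic stand-in data: three numeric features, a binary target,
# and a few missing values in the third feature.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])
y = (X["f1"] + X["f2"] > 0).astype(int)
X.iloc[::20, 2] = np.nan

print(X.skew())                         # per-column skewness
print(iqr_outlier_mask(X["f1"]).sum())  # number of flagged values in f1
model, accuracy = train_and_score(X, y)
print(accuracy)
```

Comparing several models would amount to calling a function like `train_and_score` with a different final estimator and collecting the scores.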

    Acknowledgements

    This project is part of the course "Intro ML" at IMT Atlantique.