Binary Classification Workflow
Introduction
This project applies a binary classification model to two datasets: Banknote Authentication and Chronic Kidney Disease. It covers the entire machine learning workflow, from data import and preprocessing to model training, validation, and analysis.
Datasets
- Banknote Authentication Dataset: UCI Machine Learning Repository
- Chronic Kidney Disease Dataset: Kaggle
Installation
To run this project, you need Python installed on your machine along with the following libraries:
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
You can install these packages using pip:
pip install pandas numpy scikit-learn matplotlib seaborn
File Description
- binary_classification_workflow.py: Contains all functions for data preprocessing, model training, validation, and result display.
- main.ipynb: Jupyter Notebook demonstrating the application of the workflow to the datasets.
Usage
- Clone the repository from GitLab: https://gitlab.imt-atlantique.fr/m21aouad/mini-projet-intro-ml.git
- Download the datasets and place them in the project directory.
- Run the Jupyter Notebook (main.ipynb) to see the workflow in action.
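Once the datasets are in the project directory, loading the banknote data might look like the sketch below. The helper name `load_banknote` and the file name in the usage note are assumptions, not part of this repository. The column names follow the attribute description published with the UCI dataset, which ships as a comma-separated file without a header row.

```python
import io

import pandas as pd

# Attribute names as documented for the UCI banknote authentication data.
BANKNOTE_COLUMNS = ["variance", "skewness", "curtosis", "entropy", "class"]


def load_banknote(source):
    """Read a headerless, comma-separated banknote file into a DataFrame.

    `source` can be a file path or any file-like object.
    Hypothetical helper, not a function from binary_classification_workflow.py.
    """
    return pd.read_csv(source, header=None, names=BANKNOTE_COLUMNS)


if __name__ == "__main__":
    # Made-up rows in the same layout, used here instead of the real file.
    sample = io.StringIO("1.0,2.0,-3.0,0.5,0\n-1.5,0.2,4.0,-0.7,1\n")
    print(load_banknote(sample))
```

With the real file in place, the call would be something like `load_banknote("data_banknote_authentication.txt")`, assuming the file keeps the name it has on the UCI repository.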
Functions Overview
- Data Preprocessing: Functions for cleaning, filling missing values, scaling, and normalizing data.
- Model Training and Validation: Includes functions for splitting data, handling categorical features, feature selection, and model performance comparison.
- Utility Functions: For tasks like checking skewness, identifying outliers, and visualizing data.
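The function groups above can be illustrated with a minimal scikit-learn sketch. It is not the code in binary_classification_workflow.py: the names `skewness_report`, `iqr_outlier_mask`, `build_pipeline`, and `compare_models` are hypothetical, and the two models compared (logistic regression and a random forest) are assumptions chosen only for illustration. Feature selection and plotting are left out to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def skewness_report(df, numeric_cols):
    """Return the sample skewness of each numeric column."""
    return df[numeric_cols].skew()


def iqr_outlier_mask(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def build_pipeline(numeric_cols, categorical_cols, model):
    """Impute and scale numeric columns, impute and one-hot encode categorical ones."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


def compare_models(df, target, numeric_cols, categorical_cols, seed=0):
    """Fit each candidate on a stratified train split; return test accuracy per model."""
    X = df[numeric_cols + categorical_cols]
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Assumed candidates; the project may compare different models.
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        pipe = build_pipeline(numeric_cols, categorical_cols, model)
        pipe.fit(X_train, y_train)
        scores[name] = pipe.score(X_test, y_test)
    return scores
```

Wrapping the preprocessing steps in a `Pipeline` means the imputer and scaler are fitted on the training split only, so no statistics from the test split leak into training. A call such as `compare_models(df, "class", ["variance", "skewness", "curtosis", "entropy"], [])` would fit both models on the banknote columns, assuming the data has been loaded into `df` with those column names.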
Acknowledgements
This project is part of the course "Intro ML" at IMT Atlantique.