
Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source revision: main

Target: m21aouad/mini-projet-intro-ml, revision main
Commits on Source (2)
Source diff could not be displayed: it is too large. Options to address this: view the blob.
......@@ -3,7 +3,7 @@
Spyder Editor
This file contains the preprocessing functions needed to clean
and prepare the data. We first consider the data related to kidney diseases.
and prepare the data.
"""
import seaborn as sns
......@@ -24,6 +24,7 @@ from binary_classification_workflow import *
"""
kidney data
data description: 25 features (11 numeric, 14 nominal)
Numerical Data (11):
1. age: Age in years
......@@ -400,7 +401,8 @@ def split(df, target,alpha=0.2,n=5):
def convert_categorical_feats(df, categorical_cols):
"""
Encode the categorical features of the dataset using OrdinalEncoder and OneHotEncoder.
Encode the categorical features of the dataset using OrdinalEncoder
and OneHotEncoder.
Parameters:
----------
......
%% Cell type:code id: tags:
 
``` python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from binary_classification_workflow import *
```
 
%% Cell type:code id: tags:
 
``` python
df_kidney = pd.read_csv('./data/kidney_disease.csv')
df_kidney.info()
 
nan_count = df_kidney[df_kidney.isna().any(axis=1)].shape[0]
print(f"Number of rows : {len(df_kidney)}")
print(f"Number of rows with at least one NAN value: {nan_count}")
print(f"{round(nan_count/len(df_kidney) * 100)}% of our rows have at least one"
f" missing value")
```
 
%% Output
 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 400 non-null int64
1 age 391 non-null float64
2 bp 388 non-null float64
3 sg 353 non-null float64
4 al 354 non-null float64
5 su 351 non-null float64
6 rbc 248 non-null object
7 pc 335 non-null object
8 pcc 396 non-null object
9 ba 396 non-null object
10 bgr 356 non-null float64
11 bu 381 non-null float64
12 sc 383 non-null float64
13 sod 313 non-null float64
14 pot 312 non-null float64
15 hemo 348 non-null float64
16 pcv 330 non-null object
17 wc 295 non-null object
18 rc 270 non-null object
19 htn 398 non-null object
20 dm 398 non-null object
21 cad 398 non-null object
22 appet 399 non-null object
23 pe 399 non-null object
24 ane 399 non-null object
25 classification 400 non-null object
dtypes: float64(11), int64(1), object(14)
memory usage: 81.4+ KB
Number of rows : 400
Number of rows with at least one NAN value: 242
60% of our rows have at least one missing value
 
%% Cell type:code id: tags:
 
``` python
df_kidney.sample(5)
```
 
%% Output
 
id age bp sg al su rbc pc pcc \
367 367 68.0 60.0 1.025 0.0 0.0 normal normal notpresent
100 100 34.0 70.0 1.015 4.0 0.0 abnormal abnormal notpresent
67 67 45.0 80.0 1.020 3.0 0.0 normal abnormal notpresent
76 76 48.0 80.0 1.005 4.0 0.0 abnormal abnormal notpresent
133 133 70.0 100.0 1.015 4.0 0.0 normal normal notpresent
ba ... pcv wc rc htn dm cad appet pe ane \
367 notpresent ... 50 6700 6.1 no no no good no no
100 notpresent ... NaN NaN NaN no no no good yes no
67 notpresent ... NaN NaN NaN no no no poor no no
76 present ... 36 \t6200 4 no yes no good yes no
133 notpresent ... 37 \t8400 8.0 yes no no good no no
classification
367 notckd
100 ckd
67 ckd
76 ckd
133 ckd
[5 rows x 26 columns]
 
%% Cell type:code id: tags:
 
``` python
numerical_columns = get_numerical_columns(df_kidney)
nominal_columns = get_categorical_columns(df_kidney)
```
 
%% Cell type:code id: tags:
 
``` python
##
print(numerical_columns,
nominal_columns)
```
 
%% Output
 
['id', 'age', 'bp', 'sg', 'al', 'su', 'bgr', 'bu', 'sc', 'sod', 'pot', 'hemo'] ['rbc', 'pc', 'pcc', 'ba', 'pcv', 'wc', 'rc', 'htn', 'dm', 'cad', 'appet', 'pe', 'ane', 'classification']
 
%% Cell type:code id: tags:
 
``` python
# visualise_numerical_data(df_kidney)
visualise_numerical_data(df_kidney,columns=numerical_columns)
```
 
%% Output
 
[Figure output of visualise_numerical_data for the numerical columns; images not shown]
 
%% Cell type:code id: tags:
 
``` python
fill_categorical_kidney(df_kidney,nominal_columns)
df_kidney.info()
nan_count = df_kidney[df_kidney.isna().any(axis=1)].shape[0]
print(f"Number of rows : {len(df_kidney)}")
print(f"Number of rows with at least one NaN value: {nan_count}")
print(f"{round(nan_count/len(df_kidney) * 100)}% of our rows have at least one"
f" missing value")
```
 
%% Output
 
Going through each categorical feature...: 100%|██████████| 14/14 [00:00<00:00, 353.70it/s]
 
Processing column: rbc
Possible categories and their frequencies:
rbc
normal 0.810484
abnormal 0.189516
Name: proportion, dtype: float64
Processing column: pc
Possible categories and their frequencies:
pc
normal 0.773134
abnormal 0.226866
Name: proportion, dtype: float64
Processing column: pcc
Possible categories and their frequencies:
pcc
notpresent 0.893939
present 0.106061
Name: proportion, dtype: float64
Processing column: ba
Possible categories and their frequencies:
ba
notpresent 0.944444
present 0.055556
Name: proportion, dtype: float64
Processing column: pcv
Possible categories and their frequencies:
pcv
41 0.063830
52 0.063830
44 0.057751
48 0.057751
40 0.048632
43 0.045593
42 0.039514
45 0.039514
32 0.036474
50 0.036474
36 0.036474
33 0.036474
28 0.036474
34 0.033435
37 0.033435
30 0.027356
29 0.027356
35 0.027356
46 0.027356
31 0.024316
24 0.021277
39 0.021277
26 0.018237
38 0.015198
53 0.012158
51 0.012158
49 0.012158
47 0.012158
54 0.012158
25 0.009119
27 0.009119
22 0.009119
19 0.006079
23 0.006079
15 0.003040
21 0.003040
20 0.003040
17 0.003040
9 0.003040
18 0.003040
14 0.003040
16 0.003040
Name: proportion, dtype: float64
Processing column: wc
Possible categories and their frequencies:
wc
9800 0.037415
6700 0.034014
9600 0.030612
7200 0.030612
9200 0.030612
...
19100 0.003401
12300 0.003401
16700 0.003401
14900 0.003401
2600 0.003401
Name: proportion, Length: 89, dtype: float64
Processing column: rc
Possible categories and their frequencies:
rc
5.2 0.066914
4.5 0.059480
4.9 0.052045
4.7 0.040892
4.8 0.037175
3.9 0.037175
4.6 0.033457
3.4 0.033457
5.9 0.029740
5.5 0.029740
6.1 0.029740
5.0 0.029740
3.7 0.029740
5.3 0.026022
5.8 0.026022
5.4 0.026022
3.8 0.026022
5.6 0.022305
4.3 0.022305
4.2 0.022305
3.2 0.018587
4.4 0.018587
5.7 0.018587
6.4 0.018587
5.1 0.018587
6.2 0.018587
6.5 0.018587
4.1 0.018587
3.6 0.014870
6.0 0.014870
6.3 0.014870
4.0 0.011152
3.5 0.011152
3.3 0.011152
4 0.011152
5 0.007435
3.1 0.007435
2.6 0.007435
2.1 0.007435
2.9 0.007435
2.5 0.007435
3.0 0.007435
2.7 0.007435
2.8 0.007435
2.3 0.003717
2.4 0.003717
3 0.003717
8.0 0.003717
Name: proportion, dtype: float64
Processing column: htn
Possible categories and their frequencies:
htn
no 0.630653
yes 0.369347
Name: proportion, dtype: float64
Processing column: dm
Possible categories and their frequencies:
dm
no 0.655779
yes 0.344221
Name: proportion, dtype: float64
Processing column: cad
Possible categories and their frequencies:
cad
no 0.914573
yes 0.085427
Name: proportion, dtype: float64
Processing column: appet
Possible categories and their frequencies:
appet
good 0.794486
poor 0.205514
Name: proportion, dtype: float64
Processing column: pe
Possible categories and their frequencies:
pe
no 0.809524
yes 0.190476
Name: proportion, dtype: float64
Processing column: ane
Possible categories and their frequencies:
ane
no 0.849624
yes 0.150376
Name: proportion, dtype: float64
Processing column: classification
Possible categories and their frequencies:
classification
ckd 0.625
notckd 0.375
Name: proportion, dtype: float64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 400 non-null int64
1 age 391 non-null float64
2 bp 388 non-null float64
3 sg 353 non-null float64
4 al 354 non-null float64
5 su 351 non-null float64
6 rbc 400 non-null object
7 pc 400 non-null object
8 pcc 400 non-null object
9 ba 400 non-null object
10 bgr 356 non-null float64
11 bu 381 non-null float64
12 sc 383 non-null float64
13 sod 313 non-null float64
14 pot 312 non-null float64
15 hemo 348 non-null float64
16 pcv 400 non-null object
17 wc 400 non-null object
18 rc 400 non-null object
19 htn 400 non-null object
20 dm 400 non-null object
21 cad 400 non-null object
22 appet 400 non-null object
23 pe 400 non-null object
24 ane 400 non-null object
25 classification 400 non-null object
dtypes: float64(11), int64(1), object(14)
memory usage: 81.4+ KB
Number of rows : 400
Number of rows with at least one NaN value: 172
43% of our rows have at least one missing value
 
 
%% Cell type:code id: tags:
 
``` python
# Example usage
scale_normalize(df_kidney,numerical_columns)
```
 
%% Output
 
#######BEFORE SCALING AND NORMALIZING########
id age bp sg al su \
count 400.000000 391.000000 388.000000 353.000000 354.000000 351.000000
mean 199.500000 51.483376 76.469072 1.017408 1.016949 0.450142
std 115.614301 17.169714 13.683637 0.005717 1.352679 1.099191
min 0.000000 2.000000 50.000000 1.005000 0.000000 0.000000
25% 99.750000 42.000000 70.000000 1.010000 0.000000 0.000000
50% 199.500000 55.000000 80.000000 1.020000 0.000000 0.000000
75% 299.250000 64.500000 80.000000 1.020000 2.000000 0.000000
max 399.000000 90.000000 180.000000 1.025000 5.000000 5.000000
bgr bu sc sod pot hemo
count 356.000000 381.000000 383.000000 313.000000 312.000000 348.000000
mean 148.036517 57.425722 3.072454 137.528754 4.627244 12.526437
std 79.281714 50.503006 5.741126 10.408752 3.193904 2.912587
min 22.000000 1.500000 0.400000 4.500000 2.500000 3.100000
25% 99.000000 27.000000 0.900000 135.000000 3.800000 10.300000
50% 121.000000 42.000000 1.300000 138.000000 4.400000 12.650000
75% 163.000000 66.000000 2.800000 142.000000 4.900000 15.000000
max 490.000000 391.000000 76.000000 163.000000 47.000000 17.800000
#######AFTER SCALING AND NORMALIZING########
id age bp sg al \
count 4.000000e+02 3.910000e+02 3.880000e+02 3.530000e+02 3.540000e+02
mean -1.421085e-16 1.272071e-16 2.197555e-16 3.220590e-16 8.028731e-17
std 1.001252e+00 1.001281e+00 1.001291e+00 1.001419e+00 1.001415e+00
min -1.727726e+00 -2.885708e+00 -1.936857e+00 -2.173584e+00 -7.528679e-01
25% -8.638630e-01 -5.530393e-01 -4.733701e-01 -1.297699e+00 -7.528679e-01
50% -9.540979e-17 2.050779e-01 2.583733e-01 4.540705e-01 -7.528679e-01
75% 8.638630e-01 7.590867e-01 2.583733e-01 4.540705e-01 7.277723e-01
max 1.727726e+00 2.246163e+00 7.575807e+00 1.329955e+00 2.948733e+00
su bgr bu sc sod \
count 3.510000e+02 3.560000e+02 3.810000e+02 3.830000e+02 3.130000e+02
mean 2.024338e-17 1.596725e-16 5.594825e-17 1.855203e-17 -1.021547e-15
std 1.001428e+00 1.001407e+00 1.001315e+00 1.001308e+00 1.001601e+00
min -4.101061e-01 -1.591967e+00 -1.108830e+00 -4.661019e-01 -1.280094e+01
25% -4.101061e-01 -6.193803e-01 -6.032459e-01 -3.788971e-01 -2.433340e-01
50% -4.101061e-01 -3.414983e-01 -3.058433e-01 -3.091332e-01 4.534651e-02
75% -4.101061e-01 1.890038e-01 1.700008e-01 -4.751867e-02 4.302539e-01
max 4.145186e+00 4.319341e+00 6.613723e+00 1.271927e+01 2.451017e+00
pot hemo
count 3.120000e+02 3.480000e+02
mean -4.554761e-17 -2.858505e-16
std 1.001606e+00 1.001440e+00
min -6.671023e-01 -3.241109e+00
25% -2.594231e-01 -7.655198e-01
50% -7.126345e-02 4.248496e-02
75% 8.553625e-02 8.504897e-01
max 1.328807e+01 1.813219e+00
 
%% Cell type:code id: tags:
 
``` python
nominal_columns = get_categorical_columns(df_kidney)
df_kidney = convert_categorical_feats(df_kidney, nominal_columns)
```
 
%% Output
 
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/sklearn/preprocessing/_encoders.py:972: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
warnings.warn(
 
%% Cell type:code id: tags:
 
``` python
fill_numerical_columns(df_kidney, skew_threshold=0.5)
```
 
%% Output
 
id done !
age done !
bp done !
sg done !
al done !
su done !
rbc done !
pc done !
pcc done !
ba done !
bgr done !
bu done !
sc done !
sod done !
pot done !
hemo done !
htn done !
dm done !
cad done !
appet done !
pe done !
ane done !
classification done !
pcv_14 done !
pcv_15 done !
pcv_16 done !
pcv_17 done !
pcv_18 done !
pcv_19 done !
pcv_20 done !
pcv_21 done !
pcv_22 done !
pcv_23 done !
pcv_24 done !
pcv_25 done !
pcv_26 done !
pcv_27 done !
pcv_28 done !
pcv_29 done !
pcv_30 done !
pcv_31 done !
pcv_32 done !
pcv_33 done !
pcv_34 done !
pcv_35 done !
pcv_36 done !
pcv_37 done !
pcv_38 done !
pcv_39 done !
pcv_40 done !
pcv_41 done !
pcv_42 done !
pcv_43 done !
pcv_44 done !
pcv_45 done !
pcv_46 done !
pcv_47 done !
pcv_48 done !
pcv_49 done !
pcv_50 done !
pcv_51 done !
pcv_52 done !
pcv_53 done !
pcv_54 done !
pcv_9 done !
wc_10200 done !
wc_10300 done !
wc_10400 done !
wc_10500 done !
wc_10700 done !
wc_10800 done !
wc_10900 done !
wc_11000 done !
wc_11200 done !
wc_11300 done !
wc_11400 done !
wc_11500 done !
wc_11800 done !
wc_11900 done !
wc_12000 done !
wc_12100 done !
wc_12200 done !
wc_12300 done !
wc_12400 done !
wc_12500 done !
wc_12700 done !
wc_12800 done !
wc_13200 done !
wc_13600 done !
wc_14600 done !
wc_14900 done !
wc_15200 done !
wc_15700 done !
wc_16300 done !
wc_16700 done !
wc_18900 done !
wc_19100 done !
wc_21600 done !
wc_2200 done !
wc_2600 done !
wc_26400 done !
wc_3800 done !
wc_4100 done !
wc_4200 done !
wc_4300 done !
wc_4500 done !
wc_4700 done !
wc_4900 done !
wc_5000 done !
wc_5100 done !
wc_5200 done !
wc_5300 done !
wc_5400 done !
wc_5500 done !
wc_5600 done !
wc_5700 done !
wc_5800 done !
wc_5900 done !
wc_6000 done !
wc_6200 done !
wc_6300 done !
wc_6400 done !
wc_6500 done !
wc_6600 done !
wc_6700 done !
wc_6800 done !
wc_6900 done !
wc_7000 done !
wc_7100 done !
wc_7200 done !
wc_7300 done !
wc_7400 done !
wc_7500 done !
wc_7700 done !
wc_7800 done !
wc_7900 done !
wc_8000 done !
wc_8100 done !
wc_8200 done !
wc_8300 done !
wc_8400 done !
wc_8500 done !
wc_8600 done !
wc_8800 done !
wc_9000 done !
wc_9100 done !
wc_9200 done !
wc_9300 done !
wc_9400 done !
wc_9500 done !
wc_9600 done !
wc_9700 done !
wc_9800 done !
wc_9900 done !
rc_2.1 done !
rc_2.3 done !
rc_2.4 done !
rc_2.5 done !
rc_2.6 done !
rc_2.7 done !
rc_2.8 done !
rc_2.9 done !
rc_3 done !
rc_3.0 done !
rc_3.1 done !
rc_3.2 done !
rc_3.3 done !
rc_3.4 done !
rc_3.5 done !
rc_3.6 done !
rc_3.7 done !
rc_3.8 done !
rc_3.9 done !
rc_4 done !
rc_4.0 done !
rc_4.1 done !
rc_4.2 done !
rc_4.3 done !
rc_4.4 done !
rc_4.5 done !
rc_4.6 done !
rc_4.7 done !
rc_4.8 done !
rc_4.9 done !
rc_5 done !
rc_5.0 done !
rc_5.1 done !
rc_5.2 done !
rc_5.3 done !
rc_5.4 done !
rc_5.5 done !
rc_5.6 done !
rc_5.7 done !
rc_5.8 done !
rc_5.9 done !
rc_6.0 done !
rc_6.1 done !
rc_6.2 done !
rc_6.3 done !
rc_6.4 done !
rc_6.5 done !
 
%% Cell type:code id: tags:
 
``` python
df_kidney.sample()
```
 
%% Output
 
id age bp sg al su rbc pc pcc \
84 -1.000262 0.438345 -0.47337 -1.297699 1.468092 -0.410106 1 0 0
ba ... rc_5.7 rc_5.8 rc_5.9 rc_6.0 rc_6.1 rc_6.2 rc_6.3 rc_6.4 \
84 0 ... 0 0 0 0 0 0 0 0
rc_6.5 rc_8.0
84 0 0
[1 rows x 202 columns]
 
%% Cell type:code id: tags:
 
``` python
df_kidney.isna().sum()
```
 
%% Output
 
id 0
age 0
bp 0
sg 0
al 0
..
rc_6.2 0
rc_6.3 0
rc_6.4 0
rc_6.5 0
rc_8.0 0
Length: 202, dtype: int64
 
%% Cell type:code id: tags:
 
``` python
##
df_kidney.sample()
```
 
%% Output
 
id age bp sg al su rbc pc pcc \
354 1.338013 -1.136206 -1.205114 1.329955 -0.752868 -0.410106 1 1 0
ba ... rc_5.7 rc_5.8 rc_5.9 rc_6.0 rc_6.1 rc_6.2 rc_6.3 rc_6.4 \
354 0 ... 0 0 0 0 0 0 0 0
rc_6.5 rc_8.0
354 0 0
[1 rows x 202 columns]
 
%% Cell type:code id: tags:
 
``` python
df_kidney['classification'].value_counts()
```
 
%% Output
 
classification
0 250
1 150
Name: count, dtype: int64
 
%% Cell type:code id: tags:
 
``` python
result_df, explainable_ratios= feature_selection(df_kidney, 'classification', threshold_variance_ratio=0.90)
```
 
%% Cell type:code id: tags:
 
``` python
result_df['classification'].value_counts()
```
 
%% Output
 
classification
0 250
1 150
Name: count, dtype: int64
 
%% Cell type:code id: tags:
 
``` python
X_train, X_test, y_train, y_test, cv = split(result_df, 'classification',alpha=0.2,n=5)
```
 
%% Cell type:code id: tags:
 
``` python
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(result_df.loc[result_df['classification'] == 0, 'PCA1'], result_df.loc[result_df['classification'] == 0, 'PCA2'], color='red', label="CKD")
axes[0].scatter(result_df.loc[result_df['classification'] == 1, 'PCA1'], result_df.loc[result_df['classification'] == 1, 'PCA2'], color='green', label="NOT CKD")
axes[0].set_xlabel("PCA1: First component")
axes[0].set_ylabel("PCA2: Second component")
axes[0].legend()
 
axes[1].plot(range(1, len(explainable_ratios) + 1), explainable_ratios)
axes[1].axhline(y=0.90, linestyle='--', color='red', label='Threshold ratio')
axes[1].set_xlabel('Number of components')
axes[1].set_ylabel('Cumulative explained variance')
axes[1].legend()  # without this call the threshold label is never drawn
 
plt.tight_layout()
plt.subplots_adjust(wspace=0.4)
plt.show()
 
# The two classes are distinguishable when the feature space is projected onto the first two principal components
```
 
%% Output
 
 
%% Cell type:code id: tags:
 
``` python
dict_models = {
'RandomForestClassifier': {
'model': RandomForestClassifier(),
'param_grid': {
'n_estimators': [50, 100, 150, 200],
'criterion': ['gini', 'entropy'],
'max_depth': [None, 10, 20],
'bootstrap': [True, False]
}
},
'Logistic Regression': {
'model': LogisticRegression(),
'param_grid': {
'C': [0.001, 0.01, 0.1, 1, 10, 100],
'fit_intercept': [True, False],
'intercept_scaling': [1, 10, 100]
}
},
'AdaBoostClassifier': {
'model': AdaBoostClassifier(),
'param_grid': {
'n_estimators': [50, 100, 150, 200],
'algorithm': ['SAMME', 'SAMME.R'],
'learning_rate': [0.01, 0.1, 0.5, 1]
}
}
}
```
 
%% Cell type:code id: tags:
 
``` python
display_results(dict_models, X_train, y_train, X_test, y_test, cv, 'f1 scoring on Kidney data(%)')
```
 
%% Output
 
Going through each model defined in the dictionnary...: 0%| | 0/3 [00:00<?, ?it/s]
 
Fitting 5 folds for each of 48 candidates, totalling 240 fits
 
/Users/ilyaschahed/git/mini-projet-intro-ml/binary_classification_workflow.py:500: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df_results = pd.concat([df_results, pd.DataFrame([new_row])], ignore_index=True)
 
 
Model: RandomForestClassifier
Accuracy: 0.975
Precision: 1.0
Recall: 0.9333333333333333
ROC-AUC: 1.0
 
 
Going through each model defined in the dictionnary...: 33%|███▎ | 1/3 [00:46<01:32, 46.18s/it]
 
Fitting 5 folds for each of 36 candidates, totalling 180 fits
 
 
Model: Logistic Regression
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
ROC-AUC: 1.0
 
 
Going through each model defined in the dictionnary...: 67%|██████▋ | 2/3 [00:48<00:20, 20.11s/it]
 
Fitting 5 folds for each of 32 candidates, totalling 160 fits
 
 
Model: AdaBoostClassifier
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
ROC-AUC: 1.0
 
 
Going through each model defined in the dictionnary...: 100%|██████████| 3/3 [00:49<00:00, 16.64s/it]
 
<pandas.io.formats.style.Styler at 0x12e47c450>
 
%% Cell type:markdown id: tags:
# Good coding practices
%% Cell type:markdown id: tags:
Programming is not just about writing code that works; it's also about writing code that is maintainable, readable, and efficient. Good programming practices contribute to the overall quality of code, making it easier to understand, modify, and collaborate on. Here are some essential good programming practices that we tried to follow in our work.
%% Cell type:markdown id: tags:
### Code readability
%% Cell type:markdown id: tags:
- Use meaningful variable and function names: Choose names that clearly convey the purpose of the variable or function.
- Maintain a consistent line length: Keep lines under a fixed limit, typically between 80 and 120 characters.
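
As a small illustration of both points, the sketch below counts rows with missing values using descriptive names and short lines. The function `missing_row_ratio` and the list-of-dicts row format are assumptions made for this example; they are not part of the project's code.

```python
def missing_row_ratio(rows):
    """Return the fraction of rows holding at least one missing value.

    `rows` is assumed to be a list of dicts (hypothetical format), where
    a missing value is represented by None.
    """
    if not rows:
        return 0.0
    rows_with_missing = sum(
        1 for row in rows
        if any(value is None for value in row.values())
    )
    return rows_with_missing / len(rows)


sample_rows = [
    {"age": 48.0, "bp": None},
    {"age": 62.0, "bp": 80.0},
]
print(f"{missing_row_ratio(sample_rows):.0%} of rows have a missing value")
```

Names such as `rows_with_missing` state what is being counted, so the expression needs no extra comment.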
%% Cell type:markdown id: tags:
### Modularity
%% Cell type:markdown id: tags:
- Break code into functions or classes: Divide your code into smaller, reusable modules. This promotes code reuse and makes it easier to understand.
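
A minimal sketch of this idea, using only the standard library: two small functions each do one preprocessing step, and a third composes them. The function names are hypothetical, the column is assumed to be a plain list with `None` for missing entries, and at least one observed value is assumed.

```python
from statistics import mean, median, pstdev


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill_value = median(observed)
    return [fill_value if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit population standard deviation."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - center) / spread for v in values]


def prepare_column(values):
    """Compose the two steps; each can be tested and reused on its own."""
    return standardize(fill_missing_with_median(values))
```

Because each step is its own function, a change to the imputation rule does not touch the scaling code, and either step can be reused on another dataset.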
%% Cell type:markdown id: tags:
### Comments and documentation
%% Cell type:markdown id: tags:
- Write clear comments: Use comments to explain complex logic, assumptions, or any non-obvious aspects of your code.
- Provide documentation: Include docstrings to describe the purpose, parameters, and return values of functions or methods.
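
The sketch below shows a docstring in the `Parameters:` layout used in this project's files, together with comments on the non-obvious steps. The function `choose_fill_value` and its median-versus-mean rule are assumptions for illustration only; they do not describe how the project's own functions work.

```python
from statistics import mean, median


def choose_fill_value(values, skew_threshold=0.5):
    """
    Pick a replacement for missing entries in a numeric column.

    Parameters:
    ----------
    values : list of float
        Observed (non-missing) values of the column. Must not be empty.
    skew_threshold : float, default 0.5
        Absolute skewness above which the column is treated as skewed.

    Returns:
    -------
    float
        The median if the column is skewed, otherwise the mean.
    """
    center = mean(values)
    second_moment = mean((v - center) ** 2 for v in values)
    if second_moment == 0:
        # A constant column has no skew; the mean equals every value.
        return center
    third_moment = mean((v - center) ** 3 for v in values)
    skewness = third_moment / second_moment ** 1.5
    # The mean is pulled toward a long tail, so prefer the median there.
    return median(values) if abs(skewness) > skew_threshold else center
```

The docstring tells a caller what to pass and what comes back, while the inline comments explain why a branch exists.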
%% Cell type:markdown id: tags:
### Other good practices
%% Cell type:markdown id: tags:
- Implement proper error handling: Anticipate and handle exceptions gracefully to prevent unexpected crashes.
- Use version control systems (e.g., Git): Keep track of changes, collaborate with others, and easily revert to previous versions if needed.
- Optimize when necessary: Identify bottlenecks and optimize critical sections of your code. However, prioritize readability over premature optimization.
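
As one way to apply the error-handling point, the following sketch loads a CSV file and fails early with an explicit message instead of crashing later in the pipeline. The function `load_rows` and its `required_columns` parameter are hypothetical; only the standard `csv` and `pathlib` modules are used.

```python
import csv
from pathlib import Path


def load_rows(csv_path, required_columns=("classification",)):
    """Read a CSV file into a list of dicts, failing early and clearly."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Data file not found: {path}")
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in required_columns if name not in header]
        if missing:
            raise ValueError(f"{path} lacks required columns: {missing}")
        return list(reader)
```

A caller can then catch `FileNotFoundError` or `ValueError` at the point where it knows how to recover, for example by asking for another path.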
%% Cell type:markdown id: tags:
### Conclusion
%% Cell type:markdown id: tags:
Adhering to good programming practices is crucial for writing code that is not only functional but also maintainable, scalable, and easy to collaborate on. By following these practices, you contribute to the creation of high-quality software that stands the test of time. Remember, writing code is not just about solving a problem; it's about solving it in the best possible way.
......