Preprocess data
Finish the preprocess data function. Things to explore in the future if we have time:
- For categorical columns that contain more than 2 distinct values, all of which can be cast to integers/floats, try replacing NaN values with the median of that column.
- Try to replace NaN values in a "smarter" way (predict the missing values of a column from its most correlated columns).
- Check whether certain columns seem unhelpful for predicting the target variable, and if so drop them. If you do this for categorical columns, remember not to use drop_first when one-hot encoding.
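A possible sketch of the median-imputation idea from the first item, assuming pandas DataFrames. The function name `fill_numeric_like_with_median`, the `min_unique=3` cutoff (meaning "more than 2 distinct values") and the toy data are all hypothetical. A column is only touched if every non-null value casts cleanly to a number.

```python
import pandas as pd


def fill_numeric_like_with_median(df: pd.DataFrame, min_unique: int = 3) -> pd.DataFrame:
    """Fill NaNs with the column median in columns whose non-null values
    can all be cast to numbers and that have at least `min_unique` distinct values."""
    out = df.copy()
    for col in out.columns:
        # Cast to numbers; anything that cannot be cast becomes NaN.
        numeric = pd.to_numeric(out[col], errors="coerce")
        # Skip columns where a non-null value failed to cast (real text categories).
        if numeric[out[col].notna()].isna().any():
            continue
        # Skip columns with 2 or fewer distinct values (e.g. binary flags).
        if numeric.nunique(dropna=True) < min_unique:
            continue
        out[col] = numeric.fillna(numeric.median())
    return out


# Toy example (hypothetical data).
df = pd.DataFrame({
    "grade": ["1", "2", None, "3", "5"],  # numeric-like strings, 4 distinct values
    "flag": [0, 1, None, 1, 0],           # only 2 distinct values: left untouched
    "city": ["a", "b", None, "a", "c"],   # not castable: left untouched
})
filled = fill_numeric_like_with_median(df)
print(filled["grade"].tolist())  # [1.0, 2.0, 2.5, 3.0, 5.0]
```

Binary columns are skipped on purpose: the median of a 0/1 flag would just pick one of the two classes, which is closer to mode imputation than to the idea in the note.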
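For the "smarter" imputation in the second item, one simple version is to fit a straight line from the single most correlated numeric column and predict the missing values from it. This is a deliberately simplified stand-in (one predictor, linear fit via `numpy.polyfit`), not a settled design; the helper name `impute_from_most_correlated` and the data are hypothetical, and the target column is assumed to be numeric.

```python
import numpy as np
import pandas as pd


def impute_from_most_correlated(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Fill NaNs in the numeric column `target` with a linear fit on the
    single numeric column most correlated with it (a simplification)."""
    out = df.copy()
    numeric = out.select_dtypes(include="number")
    # |Pearson correlation| with the target, computed on pairwise-complete rows.
    corr = numeric.corr()[target].drop(target).abs().dropna()
    if corr.empty:
        return out
    predictor = corr.idxmax()
    known = out[[target, predictor]].dropna()
    # Least-squares line: target ~ slope * predictor + intercept.
    slope, intercept = np.polyfit(known[predictor], known[target], deg=1)
    # Only rows where the target is missing but the predictor is present can be filled.
    to_fill = out[target].isna() & out[predictor].notna()
    out.loc[to_fill, target] = slope * out.loc[to_fill, predictor] + intercept
    return out


# Toy example (hypothetical data): y = 2 * x, so the missing y should be 6.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, np.nan, 8.0, 10.0],
    "z": [5.0, 1.0, 4.0, 2.0, 3.0],
})
result = impute_from_most_correlated(df, "y")
print(round(result.loc[2, "y"], 6))  # 6.0
```

Using several correlated columns at once (e.g. scikit-learn's `IterativeImputer`) would be the natural next step if a single predictor turns out to be too weak.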
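The third item can be sketched as follows, assuming "unhelpful" means a low absolute correlation with the target. That criterion, the `threshold=0.05` value, the name `drop_weak_columns` and the toy data are hypothetical; correlation only captures linear relationships. The reason to avoid `drop_first`: with `drop_first=True` the first category is already encoded as all zeros, so dropping another dummy column afterwards would give a second category that same all-zeros pattern.

```python
import pandas as pd


def drop_weak_columns(df: pd.DataFrame, target: str, threshold: float = 0.05) -> pd.DataFrame:
    """One-hot encode, then drop columns whose |correlation| with `target` is below `threshold`."""
    # drop_first=False: no category is implicitly all zeros yet, so removing a weak
    # dummy does not collide with a category that drop_first already removed.
    encoded = pd.get_dummies(df, drop_first=False, dtype=float)
    # Constant columns give a NaN correlation; treat them as unhelpful (0).
    corr = encoded.corr()[target].drop(target).abs().fillna(0.0)
    weak = corr[corr < threshold].index
    return encoded.drop(columns=weak)


# Toy example (hypothetical data).
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "noise": [1, 0, 0, 1, 1, 0],   # zero correlation with the target
    "constant": [1] * 6,           # no variance at all
    "target": [1, 0, 1, 0, 0, 0],
})
kept = drop_weak_columns(df, "target")
print(sorted(kept.columns))  # ['color_blue', 'color_green', 'color_red', 'target']
```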