Preprocess data

Finishing the data preprocessing function. Future things to explore if we have time:

  • For categorical columns that contain more than 2 distinct values, all of which can be cast to integers or floats, try replacing NaN values with the median of the data.
  • Try to replace NaN values in a "smarter" way (predict the missing values of a column from the columns most correlated with it).
  • See whether certain columns seem unhelpful for determining the target variable and could be dropped. If you do so for categorical columns, do not forget to not use drop_first while one-hot encoding.
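
As a rough illustration of the first and last ideas above, here is a minimal sketch assuming the data lives in a pandas DataFrame. The helper name `fill_numeric_like_with_median`, the `min_distinct` parameter, and the example columns are hypothetical and not part of the existing preprocessing function.

```python
import pandas as pd


def fill_numeric_like_with_median(df: pd.DataFrame, min_distinct: int = 3) -> pd.DataFrame:
    """Hypothetical helper: median-fill columns whose non-missing values all parse as numbers."""
    out = df.copy()
    for col in out.columns:
        non_missing = out[col].dropna()
        if non_missing.empty:
            continue
        as_numbers = pd.to_numeric(non_missing, errors="coerce")
        # Skip columns where any non-missing value fails to parse as a number.
        if as_numbers.isna().any():
            continue
        # Skip binary-like columns (fewer than min_distinct distinct values).
        if as_numbers.nunique() < min_distinct:
            continue
        numeric_col = pd.to_numeric(out[col], errors="coerce")
        out[col] = numeric_col.fillna(numeric_col.median())
    return out


# Made-up example data, for illustration only.
example = pd.DataFrame({
    "rooms": ["1", "2", None, "4", "3"],          # numeric-like, more than 2 values
    "color": ["red", None, "blue", "red", "green"],  # truly categorical
    "flag": ["0", "1", None, "1", "0"],           # only 2 distinct values, left alone
})
filled = fill_numeric_like_with_median(example)

# One-hot encode the remaining categorical column without drop_first,
# so that every category keeps its own indicator column.
encoded = pd.get_dummies(filled, columns=["color"], drop_first=False)
```

Only `rooms` is filled here; `color` and `flag` keep their missing values, which would still need a separate strategy.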