Anonymous
The data should be cleaned, preprocessed, and converted into the format required by the specific ML algorithm being used. For example, numeric or categorical data may contain missing values that should be imputed. Values in certain columns may need to be bucketed. Unstructured data has to be converted into the required input format. For text data, we may have to tokenize, lemmatize, and perhaps also create an embedding of the text. The possibilities are many and depend on the specific task at hand.
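As a rough illustration of a few of these steps, here is a minimal sketch using only the Python standard library. The function names, the mean/mode imputation strategy, the age bucket edges, and the regex tokenizer are all assumptions chosen for the example, not a recommended recipe. Lemmatization and embeddings are left out because they normally rely on an NLP library.

```python
import re
from collections import Counter
from statistics import mean


def impute_numeric(values):
    """Replace missing (None) entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_categorical(values):
    """Replace missing (None) entries with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def bucket(value, edges, labels):
    """Return the label of the first bucket whose upper edge exceeds value.

    Expects len(labels) == len(edges) + 1; the last label catches
    every value at or above the final edge.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical toy data, for illustration only.
ages = impute_numeric([25, None, 70, 10])
colors = impute_categorical(["red", None, "red", "blue"])
age_groups = [bucket(a, [18, 65], ["minor", "adult", "senior"]) for a in ages]
tokens = tokenize("The data should be cleaned!")
```

In practice, which of these steps apply, and in what order, depends on the data and on what the chosen algorithm expects as input.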