Not all input data in machine learning is numerical. We often have to deal with categorical data, which is typically stored as strings. A categorical variable is a variable that takes one of a limited set of two or more categories. For example:

  • ‘colours’, which can be red, blue or yellow.
  • ‘grades’, which can be A, B, C, D or E.
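
As a minimal sketch of what such data can look like in practice, the Python snippet below stores the two example variables as lists of strings and collects the distinct categories in each. The sample values and the names `colours`, `grades` and `categories` are made up for illustration and do not come from any particular dataset or library.

```python
# Hypothetical sample data: each categorical variable is a list of strings.
colours = ["red", "blue", "yellow", "blue", "red"]
grades = ["A", "C", "B", "E", "D", "A"]


def categories(values):
    """Return the distinct categories found in a column, sorted alphabetically."""
    return sorted(set(values))


colour_categories = categories(colours)
grade_categories = categories(grades)

print(colour_categories)  # ['blue', 'red', 'yellow']
print(grade_categories)   # ['A', 'B', 'C', 'D', 'E']
```

Although each list holds several observations, every value is drawn from the same short set of labels, which is what makes the variable categorical.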