Puspa Subedi 2 years ago

Day 1. Data Preprocessing with Scikit-learn


Scikit-learn is a free software machine-learning library for the Python programming language. It features various classification, regression, and clustering algorithms including support-vector machines, random forests, gradient boosting, k-means, and DBSCAN. It is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Scikit-learn is a NumFOCUS fiscally sponsored project.

Standardizing Data

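Data can contain all sorts of different values, measured in different units such as kilograms, grams, or calories. When features span such different ranges, they are hard to compare and interpret. Standardizing puts each feature on a common scale, with a mean of 0 and a standard deviation of 1.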

A standardized value can be obtained through this formula:  z = (x - u) / s, where x is a data value, u is the mean of the data, and s is its standard deviation.
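To make the formula concrete, here is a minimal sketch that applies it by hand using only the Python standard library. The weights_kg list is made-up sample data for illustration, and the population standard deviation (pstdev) is assumed for s.

```python
from statistics import mean, pstdev

# Hypothetical sample data: weights in kilograms (illustrative values only)
weights_kg = [50.0, 60.0, 70.0, 80.0, 90.0]

u = mean(weights_kg)    # mean of the data
s = pstdev(weights_kg)  # population standard deviation (assumed here)

# Apply z = (x - u) / s to every value
z_scores = [(x - u) / s for x in weights_kg]

print(z_scores)
```

After standardizing, the values have a mean of 0 and a standard deviation of 1, whatever unit they started in.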

In scikit-learn, we can do this with the scale function from the sklearn.preprocessing module.
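A short sketch of how scale might be used is shown here. The data array is a made-up example with two features on very different ranges, not data from this post. By default, scale standardizes each column independently.

```python
import numpy as np
from sklearn.preprocessing import scale

# Hypothetical data: each row is a sample, each column is a feature
# (for example, calories in the first column and kilograms in the second)
data = np.array([
    [1000.0, 2.0],
    [2000.0, 4.0],
    [3000.0, 6.0],
])

# Standardize each column to mean 0 and standard deviation 1
scaled = scale(data)

print(scaled)
print(scaled.mean(axis=0))  # column means, approximately 0
print(scaled.std(axis=0))   # column standard deviations, approximately 1
```

Both columns end up on the same scale even though their original ranges differ by orders of magnitude.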