In machine learning, different features often have very different ranges — height and weight are a typical example. For some classifiers, whether or not the features are rescaled to comparable ranges can have a noticeable effect on recognition accuracy. The following example uses scikit-learn's preprocessing.StandardScaler to normalise each feature dimension to zero mean and unit standard deviation:

import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

dataset = load_wine()
total_data_num = dataset.data.shape[0]
print('Data shapes:', dataset.data.shape, dataset.target.shape)

# Hold out part of the data for testing (25% by default);
# pass random_state=... if you need a reproducible split
X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target)
print('Training data shapes:', X_train.shape, y_train.shape)
print('Test data shapes:', X_test.shape, y_test.shape)

# Train a 1-nearest-neighbour classifier on the raw (unscaled) features
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print('Acc using raw features:', 100 * np.mean(pred == y_test))

# Fit the scaler on the training set only, then apply the same
# transformation to the test set to avoid information leakage
ss = StandardScaler()
X_train_norm = ss.fit_transform(X_train)
X_test_norm = ss.transform(X_test)
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train_norm, y_train)
pred = model.predict(X_test_norm)
print('Acc using normalised features:', 100 * np.mean(pred == y_test))
print('Training data mean (by sklearn):', ss.mean_)
print('Training data mean (by numpy):', np.mean(X_train, axis=0))
print('Training data var (by sklearn):', ss.var_)
print('Training data var (by numpy):', np.std(X_train, axis=0) ** 2)
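Per scikit-learn's documentation, StandardScaler standardises each column as z = (x − mean) / std, using the biased (ddof=0) standard deviation. As a sanity check, the same transformation can be reproduced with plain NumPy (a minimal sketch; the variable names here are my own):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X = load_wine().data

ss = StandardScaler()
X_sk = ss.fit_transform(X)

# Manual z-score: subtract the per-column mean and divide by the
# per-column (biased, ddof=0) standard deviation -- the same
# convention StandardScaler uses internally
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print('Max abs difference:', np.abs(X_sk - X_manual).max())
```

The two results should agree to within floating-point error, confirming that the scaler is nothing more than a column-wise z-score.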

In the example above: