Mahalanobis Distance

Contents

The main point of using the Mahalanobis metric

1. Terms and definitions

2. Mahalanobis distance between two points and between a point and a class

2.1. Theoretical background

2.2. Algorithm for computing the distance between two points and between a point and a class

2.3. An example of computing the distance between two points and between a point and a class

3. Mahalanobis distance between two classes

3.1. Theoretical background

3.2. Algorithm for computing the distance between two classes

3.3. An example of computing the distance between two classes

4. Mahalanobis distance and the k-nearest-neighbors method

5. Weighted Mahalanobis distance

6. Conclusion





If you have any comments or find any errors, please write to quwarm@gmail.com or in the comments.






The main point of using the Mahalanobis distance

In Figure 1, two observations are shown as red points.

The class center is shown as a blue point.





Figure 1. 2D data with prediction ellipses

The question is: which observation is closer to the class center?

The answer depends on how the distance is measured.





If we measure the distance with the Euclidean metric, the distance from the class center (0, 0) to the point (-4, 4) equals \sqrt{32}, and to the point (5, 5) it equals \sqrt{50}; that is, the point (-4, 4) is closer to the class center.





However, the data are stretched along the direction Y = X: the point (-4, 4) lies "across" this spread, while (5, 5) lies along it.





Intuitively, taking the spread of the data into account, the point (5, 5) should be considered closer to the class than (-4, 4). Indeed, if the distance is measured with respect to the covariance of the data, the distance from the class center (0, 0) to (-4, 4) comes out to 0.15686, and to (5, 5) to 0.07519, i.e., the point (5, 5) is closer to the class center. The metric that behaves this way is the Mahalanobis distance.





The rest of the article introduces the necessary terms and then examines the Mahalanobis distance and ways to compute it in detail.






1. Terms and definitions

Feature space — here, the real space \mathbb{R}^n, where n is the number of features.





C — a class: a set of objects C = \{X_1, \ldots, X_m\}, where m is the number of objects in class C.





X — an object (observation) with n features: X = (x_1, \ldots, x_n).





n — the number of features; i — the index of a feature (the i-th feature).





The words "object" and "point" both denote an n-dimensional vector and are used interchangeably.





Vectors are treated as row vectors; {\square}^T denotes transposition (turning a row vector into a column vector).





The i-th feature of an object X from class k is denoted X_{(k)i}.





Variances and covariances are always computed over the objects of a class (sample (co)variance), unless stated otherwise.





2. Mahalanobis distance between two points and between a point and a class

This section covers the distance between two points (objects) and the distance between a point and a class.





2.1. Theoretical background

The Mahalanobis distance between two points U and V, computed with respect to the covariance matrix COV of some class (sample) C, is defined as:





d_M(U, V, COV^{-1}) = \sqrt{(U - V) \cdot COV^{-1} \cdot (U - V)^T}

where {\square}^T denotes transposition and COV^{-1} is the inverse of the covariance matrix.





For the formula to be applicable, the covariance matrix must be non-degenerate (invertible).

If the features are uncorrelated (all covariances equal 0) and have unit variances (all variances equal 1), the covariance matrix is the identity matrix and the Mahalanobis distance coincides with the Euclidean distance.
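As a quick illustration (a minimal sketch of my own, not the article's code; the covariance matrix below is made up), the formula can be evaluated directly with numpy; with the identity matrix in place of COV the result coincides with the Euclidean distance:

import numpy as np

def mahalanobis(u, v, inv_cov):
    # Mahalanobis distance between row vectors u and v
    delta = u - v
    return np.sqrt(delta @ inv_cov @ delta)

center = np.array([0., 0.])
# A covariance matrix with strong positive correlation between the two features
cov = np.array([[9., 8.],
                [8., 9.]])
inv_cov = np.linalg.inv(cov)

print(mahalanobis(center, np.array([-4., 4.]), inv_cov))  # larger: (-4, 4) lies "across" the spread
print(mahalanobis(center, np.array([5., 5.]), inv_cov))   # smaller: (5, 5) lies along the spread

# With COV = E (identity matrix) the Mahalanobis distance equals the Euclidean distance
print(mahalanobis(center, np.array([-4., 4.]), np.identity(2)),
      np.linalg.norm(center - np.array([-4., 4.])))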





The Mahalanobis distance is a metric.





As shown in the referenced proof ([internet archive] link in the original), for any points U and V and a positive definite matrix COV it satisfies the metric axioms:

1. Identity of indiscernibles: d_M(U, V, COV^{-1}) = 0 \iff U = V;

2. Symmetry: d_M(U, V, COV^{-1}) = d_M(V, U, COV^{-1});

3. Triangle inequality: d_M(U,W,COV^{-1}) \le d_M(U,V,COV^{-1})+d_M(V,W,COV^{-1}).

These properties imply non-negativity: d_M(U,V,COV^{-1}) \ge 0.





In practice it is worth clamping the value under the root at zero from below (max(0.0, value)), so that the square root (sqrt, or raising to the power 0.5) does not return NaN when floating-point error makes the value slightly negative (e.g., \mathrm {-1e^{-17}} \approx 0). This is done in all code listings below.





Now for the distance between a point and a class.





The distance between a point and a class is defined as the distance between the point and the class center — the mean vector of the class objects (the so-called "centroid", see below).





The covariance matrix in this case is the covariance matrix of the class to which the distance is measured.

(For example, in the k-nearest-neighbors method, section 4, the distance from a test point to the objects of a class is computed using that class's covariance matrix.)





So, to compute the distance between a point and a class, we need the class center* and the covariance matrix of the class.





*Class center (mean vector).

The mean of the i-th feature over the class is computed as:





\mu_i = \frac {1} {|C|} \sum_{X \in C} {X_i}

where \mu_i is the mean of the i-th feature over class C, |C| is the number of objects in C, and X_i is the i-th feature of object X.





The mean vector \mu of class C:





\mu = ( \mu_1, \ldots, \mu_n ) = \left ( \frac {1} {|C|} \sum_{X \in C} {X_i} \middle| i=1 \ldots n \right )

This vector is also called the centroid of the class.

Next, the covariance matrix (and covariance itself).





For a class whose objects have n features, the covariance matrix is an n \times n matrix of pairwise covariances:





COV= \begin{pmatrix} cov_{1,1} & cov_{1,2} & \cdots & cov_{1,n} \\ cov_{2,1} & cov_{2,2} & \cdots & cov_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ cov_{n,1} & cov_{n,2} & \cdots & cov_{n,n} \end{pmatrix}

Each element cov_{a,b} is the covariance of features a and b over the objects of the class — the corrected sample covariance (Eng. "sample covariance"):





cov_{a,b} = \frac {1} {|C|-1} \sum_{X \in C} {(X_a - \mu_a) \cdot (X_b - \mu_b)} \tag {SC}

where \mu_a and \mu_b are the means of features a and b, and |C| is the number of objects in class C.





Formula \mathrm {(SC)} is used when the true expectations \operatorname E_a and \operatorname E_b are unknown and are estimated by the sample means. If the expectations are known, the uncorrected covariance is used (Eng. "population covariance"):





cov_{a,b} = \frac {1} {|C|} \sum_{X \in C} {(X_a - \operatorname E_a) \cdot (X_b - \operatorname E_b)} \tag {PC}
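A small sketch (mine, not from the article) comparing the two estimates with numpy; ddof=1 gives the corrected sample covariance (SC), ddof=0 the uncorrected form (PC) — with the caveat that (PC) proper uses the true expectations, while here the sample means stand in for them:

import numpy as np

# A toy class with two features
C = np.array([[1., 2.],
              [2., 4.],
              [3., 5.],
              [4., 9.]])

mu = C.mean(axis=0)  # means of the features (the class center)

cov_sc = (C - mu).T @ (C - mu) / (len(C) - 1)  # divide by |C| - 1, formula (SC)
cov_pc = (C - mu).T @ (C - mu) / len(C)        # divide by |C|, formula (PC)

# The same results via numpy
print(np.allclose(cov_sc, np.cov(C, rowvar=False, ddof=1)))  # True
print(np.allclose(cov_pc, np.cov(C, rowvar=False, ddof=0)))  # True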

Basic properties of covariance:





  1. If features a and b tend to increase (and decrease) together, then cov_{a,b}>0;





  2. If feature b tends to decrease as feature a increases (and vice versa), then cov_{a,b}<0;





  3. If features a and b are independent, then cov_{a,b}=0 (the converse is not true*);





  4. Covariance is symmetric: cov_{a,b} = cov_{b,a};





  5. The Cauchy–Bunyakovsky–Schwarz inequality holds: \left|{cov_{a,b}}\right| \leq \sigma_a \sigma_b.





*A classic counterexample to the converse of property 3.

Let X be uniformly distributed on [-1, 1] and Y=X^2.

Then Y is completely determined by X (so the variables are dependent), yet their covariance is zero:





\operatorname {cov}(X, Y) = \operatorname {cov}(X, X^2) = M[X \cdot X^2] - M[X] \cdot M[X^2] = \\ = M[X^3] - M[X] \cdot M[X^2] =0-0 \cdot M[X^2] = 0

where M denotes the expectation (mean value).
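A quick numerical check of this counterexample (my own sketch; X is sampled on a symmetric grid around 0):

import numpy as np

x = np.linspace(-1., 1., 1001)  # values of X, symmetric around 0
y = x ** 2                      # Y = X^2 is completely determined by X

# The covariance of X and Y is (numerically) zero even though they are dependent
print(np.cov(x, y, ddof=0)[0, 1])  # ~0, up to floating-point error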





This situation is illustrated in Figure 2.





Figure 2. The dependence between X and Y (Y = X^2)

The covariance matrix COV is symmetric and positive semi-definite. If it is degenerate (its determinant equals 0), it has no inverse, and the Mahalanobis distance in the form given above cannot be computed. Such cases do occur in practice.





In particular, the covariance matrix is degenerate in the following cases:

1. Some feature i is constant within the class, i.e., the variance of feature i equals 0.

Example: \{ (1, 1), (2, 1), (3, 1) \} — the second feature is constant.

2. Some features are perfectly (linearly) correlated (\forall a \forall b \space |cov_{a,b}|=\sigma_a \sigma_b, "perfect covariance"). Examples:

\{(1, 1), (2, 2), (3, 3)\} — perfect positive correlation;

\{(1, 3), (2, 2), (3, 1)\} — perfect negative correlation.

3. The number of objects in the class is less than the number of features plus one:

|C|<n+1

In this case the sample covariance matrix is always degenerate.
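The following sketch (mine, not part of the article) checks each of these situations by looking at the determinant of the covariance matrix:

import numpy as np

classes = {
    "constant feature":    np.array([[1., 1.], [2., 1.], [3., 1.]]),
    "perfect correlation": np.array([[1., 1.], [2., 2.], [3., 3.]]),
    "too few objects":     np.array([[1., 2.], [3., 4.]]),  # |C| = 2 < n + 1 = 3
    "non-degenerate":      np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]]),
}

for name, C in classes.items():
    cov = np.cov(C, rowvar=False, ddof=1)
    det = np.linalg.det(cov)
    # A (near-)zero determinant means the matrix cannot be reliably inverted
    print(f"{name}: det = {det:.4f}, degenerate = {abs(det) < 1e-12}")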





So what can be done if the covariance matrix turns out to be degenerate?

There are several ways out.

Each has its advantages and drawbacks.

The options are the following:





1. Removing "problematic" features.
  1. If a feature i is constant within the class (its variance equals 0), remove feature i from consideration.

  2. If a feature i is perfectly correlated with another feature (one is a linear function of the other), remove one of the two.





2. The Euclid–Mahalanobis distance.

A simple workaround (the name is informal) is to add the identity matrix to the covariance matrix before inverting it, obtaining a hybrid of the Euclidean and Mahalanobis distances:





d_{E-M}(U, V, (COV+E)^{-1}) = \sqrt {(U - V) \cdot {(COV+E)}^{-1} \cdot (U - V)^T}

where E is the identity matrix of the same size as COV.





The matrix COV + E is always invertible: COV is positive semi-definite, so all eigenvalues of COV + E are at least 1.
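A minimal check (my own sketch) that adding the identity matrix makes even a degenerate covariance matrix invertible:

import numpy as np

cov = np.array([[2.5, 2.5],
                [2.5, 2.5]])      # degenerate: the determinant is 0
print(np.linalg.det(cov))         # 0.0

cov_e = cov + np.identity(2)      # COV + E
print(np.linalg.det(cov_e))       # ~6.0, nonzero, so the inverse exists
print(np.linalg.inv(cov_e))       # the matrix used later in section 2.3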





3. The pseudo-inverse matrix.

Instead of the inverse, use the pseudo-inverse of the covariance matrix.

{\square}^{+} denotes the (Moore–Penrose) pseudo-inverse.

Functions that compute it:

ginv from the MASS package (R);

pinv in numpy (Python);

pinv in MATLAB;

pinv in Octave.





Among other things, the pseudo-inverse A^{+} gives the "best" approximate solution of a linear system in the least-squares sense: Ax=b \implies x=A^{+}b, where A may be rectangular or degenerate; among all vectors minimizing the norm of the residual Ax - b, x=A^{+}b is the one of smallest norm (the "best" x whose image Ax is "closest" to b).

For our purposes the key point is that the pseudo-inverse exists for any matrix.

If the ordinary inverse A^{-1} exists (i.e., A is square and non-degenerate), the pseudo-inverse coincides with it: \mathrm {det}(A_{n \times n}) \ne 0 \iff A^{+}=A^{-1}.





With it, the distance takes the form:





d_M^+ (U, V, COV^{+}) = \sqrt {(U - V) \cdot COV^{+} \cdot (U - V)^T}

One caveat: the pseudo-inverse is not a continuous function of the matrix — an arbitrarily small change in the matrix (one that changes its rank) can change the pseudo-inverse abruptly.





Because of this, and because of how the pseudo-inverse treats the degenerate directions, distances computed with it can behave counter-intuitively (see the example in section 2.3).
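The discontinuity is easy to observe (a sketch of mine): a tiny perturbation that changes the rank of the matrix changes the pseudo-inverse dramatically:

import numpy as np

A = np.array([[2.5, 2.5],
              [2.5, 2.5]])   # rank 1, degenerate
B = A.copy()
B[1, 1] += 1e-8              # a tiny perturbation makes the matrix full-rank

print(np.linalg.pinv(A))     # entries around 0.1
print(np.linalg.pinv(B))     # entries of huge magnitude (about 1e8)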





4. Shrinkage of the covariance matrix.

Shrinkage is a regularization of the covariance matrix: it is pulled toward a well-conditioned target matrix.

The matrix COV is replaced by COV_{(*)} = \left ((1 - \lambda) COV + \lambda T \right), where T is the target matrix and \lambda \in (0,1] is the shrinkage coefficient; the larger \lambda is, the closer COV_{(*)} is to T.

The distance then becomes:





d_{M{(*)}}(U, V, COV^{-1}_{(*)}) = \sqrt {(U - V) \cdot COV^{-1}_{(*)} \cdot (U - V)^T}

Olivier Ledoit and Michael Wolf proposed the shrinkage ((1 - \lambda) COV + \lambda \mu E), where \mu=\mathrm{trace}(COV)/n is the average of the diagonal elements of COV (the average variance), E is the identity matrix, and \lambda is estimated from the data by an analytic formula.

This estimator, together with the formula for \lambda, is implemented in the Python library scikit-learn (sklearn.covariance.LedoitWolf, sklearn.covariance.ledoit_wolf, sklearn.covariance.ledoit_wolf_shrinkage).





The authors show that the resulting estimator is well-conditioned and invertible. More generally, for any positive definite target T and \lambda \in (0,1], the matrix COV_{(*)} = ((1 - \lambda) COV + \lambda T) is positive definite and therefore invertible — which is why \lambda is taken from (0,1] and not from [0,1].





For example, take the class C=\{ (1, 1), (2, 2) \} and compute distances in Python with \lambda=0, i.e., with no shrinkage (the covariance matrix is degenerate, but floating-point arithmetic may still "invert" it):

from (1,1) to (1.5,1.5): \approx 0.7071;

from (2,2) to (1.5,1.5): \approx 0.7071;

from (1,1) to (1.51,1.5): \approx 671088.64 \ldots {63} \ldots;

from (2,2) to (1.51,1.5): \approx 671088.64 \ldots 04 \ldots.

Shifting the target point by only 0.01 blows the distance up by six orders of magnitude.

Another possible shrinkage target is the diagonal of the covariance matrix:

T=\mathrm {diag}(COV) \implies COV_{(*)}= ((1 - \lambda) COV + \lambda \mathrm {diag}(COV))

where \mathrm {diag}(COV) is the diagonal matrix formed from the diagonal of COV.
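A small sketch of shrinkage toward the diagonal target (mine; the value of \lambda here is arbitrary, purely for illustration):

import numpy as np

cov = np.array([[2.5, 2.5],
                [2.5, 2.5]])           # degenerate covariance matrix
lam = 0.1                              # shrinkage coefficient (chosen arbitrarily)

T = np.diag(np.diag(cov))              # target: the diagonal of COV (variances kept as is)
cov_shrunk = (1 - lam) * cov + lam * T

print(np.linalg.det(cov_shrunk))       # ~1.1875, nonzero, so the matrix is invertible
print(np.linalg.inv(cov_shrunk))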





scikit-learn also provides Shrunk Covariance (sklearn.covariance.ShrunkCovariance, sklearn.covariance.shrunk_covariance), where \lambda is set by the user (the default is \lambda_{SC}=0.1).

Its target is the same as in the Ledoit–Wolf estimator: ((1 - \lambda) COV + \lambda \mu E).





In the examples below, the Ledoit–Wolf estimator is used.





Note that LedoitWolf and ShrunkCovariance estimate the base covariance matrix with empirical_covariance (by default), i.e., with the uncorrected "population covariance" \mathrm {(PC)}.





5. The standardized Euclidean distance.

If the covariances are ignored and only the variances are kept, we obtain the "standardized Euclidean distance":





d_{std}(U, V, \sigma) = \sqrt {\sum_{i=1}^n {\left (\frac {U_i - V_i} {\sigma_i} \right)^2}}

where \sigma_i is the standard deviation of the i-th feature of the class (the class U and/or V relate to) — the corrected sample standard deviation (Eng. "corrected sample standard deviation"):





\sigma_i = \sqrt {\frac {1} {|C|-1} \sum_{X \in C} {(X_i - \mu_i)^2}} \tag {CSSD}

where \mu_i is the mean of the i-th feature of that class.

It is computed, as before, as:





\mu_i = \frac {1} {|C|} \sum_{X \in C} {X_i}

Formula \mathrm {(CSSD)} is used when the true expectation \operatorname E_i is unknown. If it is known, the uncorrected sample standard deviation is used (Eng. "uncorrected sample standard deviation"):





\sigma_i = \sqrt {\frac {1} {|C|} \sum_{X \in C} {(X_i - \operatorname E_i)^2}} \tag {USSD}

The drawback: covariances between features are not taken into account.





6. The standardized Euclidean distance scaled by the geometric mean of the variances.

It is defined as:





d_{diag}(U, V, \sigma) = d_{std}(U, V, \sigma) \cdot \sqrt[n] {\prod^n_{i=1} \sigma^2_i}

Or, written out in full:





d_{diag}(U, V, \sigma) = \sqrt {\sum_{i=1}^n {\left (\frac {U_i - V_i} {\sigma_i} \right)^2}} \cdot \sqrt[n] {\prod^n_{i=1} \sigma^2_i}

The drawback is the same: covariances between features are not taken into account.





Unlike d_{std}, however, this distance keeps information about the absolute spread of the class, which matters when a point is compared against several classes.





For example, if a point is the same number of standard deviations away from the centers of two classes, but one class is spread 10 times wider than the other, d_{std} cannot distinguish between the classes, whereas d_{diag} can.





2.2. Algorithm for computing the distance between two points and between a point and a class

1. Compute the class center (mean vector).





2. Compute the standard deviations of the features.





3. Compute the covariance matrix of the class.





4. Compute the distance between the points, or between the point and the class center. If the covariance matrix is degenerate, use one of the workarounds from section 2.1.





2.3. An example of computing the distance between two points and between a point and a class

Let us compute the distance from the point (4, 2) to each of two classes (Fig. 3):





C_{(1)} = \{ ( 1 , 1 ) , ( 2 , 2 ) , ( 3 , 3 ) , ( 4 , 4 ) , ( 5 , 5 ) \} \\ C_{(2)} = \{ ( 3 , 1 ) , ( 4 , 0 ) , ( 6 , 0 ) , ( 6 , 2 ) , ( 5 , 3 ) \}
Figure 3. Classes C_{(1)}, C_{(2)} and the test point (4, 2)

1. Compute the class centers.





\mu_{(1)} = \left (\frac {1 + 2 + 3 + 4 + 5} {5}, \frac {1 + 2 + 3 + 4 + 5} {5} \right) = (3, 3) \\ \mu_{(2)} = \left (\frac {3 + 4 + 6 + 6 + 5} {5}, \frac {1 + 0 + 0 + 2 + 3} {5} \right) = (4.8, 1.2)

2. Compute the standard deviations.





For the first class:





\sigma_{(1)1} = \sqrt {\frac {(1-3)^2+(2-3)^2+(3-3)^2+(4-3)^2+(5-3)^2} {5 - 1}} = \sqrt {2.5} \\ \sigma_{(1)2} = \sqrt {\frac {(1-3)^2+(2-3)^2+(3-3)^2+(4-3)^2+(5-3)^2} {5 - 1}} = \sqrt {2.5}

For the second class:





\sigma_{(2)1} = \sqrt {\frac {(3-4.8)^2+(4-4.8)^2+(6-4.8)^2+(6-4.8)^2+(5-4.8)^2} {5 - 1}} = \sqrt {1.7} \\ \sigma_{(2)2} = \sqrt {\frac {(1-1.2)^2+(0-1.2)^2+(0-1.2)^2+(2-1.2)^2+(3-1.2)^2} {5 - 1}} = \sqrt {1.7}

3. Compute the covariance matrices.





For the first class:





cov_{(1)1,1} = \sigma^2_{(1)1} = 2.5 \quad cov_{(1)1,2} = cov_{(1)2,1} \quad cov_{(1)2,2} = \sigma^2_{(1)2} = 2.5 \\ cov_{(1)1,2} = \frac {1} {5-1} \sum_{X \in C_{(1)}} {(X_1 - \mu_1) \cdot (X_2 - \mu_2)} = \\  \frac {1} {4} \left ( (1 - 3) (1 - 3) + (2 - 3) (2 - 3) + (3 - 3) (3 - 3) + \\ + (4 - 3) (4 - 3) + (5 - 3) (5 - 3) \right) = \frac {10} {4} = 2.5

The covariance matrix of the first class:





COV_{(1)} = \begin{pmatrix} cov_{(1)1,1} & cov_{(1)1,2} \\ cov_{(1)2,1} & cov_{(1)2,2} \end{pmatrix} = \begin{pmatrix} 2.5 & 2.5 \\ 2.5 & 2.5 \end{pmatrix}

Its determinant: 2.5 \cdot 2.5 - 2.5 \cdot 2.5 = 0. Hence, COV_{(1)} is degenerate and has no inverse.





For the second class:





cov_{(2)1,1} = \sigma^2_{(2)1} = 1.7 \quad cov_{(2)1,2} = cov_{(2)2,1} \quad cov_{(2)2,2} = \sigma^2_{(2)2} = 1.7 \\ cov_{(2)1,2} = \frac {1} {5-1} \sum_{X \in C_{(2)}} {(X_1 - \mu_1) \cdot (X_2 - \mu_2)} = \\ = \frac {1} {4} \left ( (3 - 4.8) (1 - 1.2) + (4 - 4.8) (0 - 1.2) + (6 - 4.8) (0 - 1.2) + \\ + (6 - 4.8) (2 - 1.2) + (5 - 4.8) (3 - 1.2) \right) = \frac {1.2} {4} = 0.3

The covariance matrix of the second class:





COV_{(2)} = \begin{pmatrix} cov_{(2)1,1} & cov_{(2)1,2} \\ cov_{(2)2,1} & cov_{(2)2,2} \end{pmatrix} = \begin{pmatrix} 1.7 & 0.3 \\ 0.3 & 1.7 \end{pmatrix}

Its determinant: 1.7 \cdot 1.7 - 0.3 \cdot 0.3 = 2.8 \neq 0. Hence, COV_{(2)} is non-degenerate and can be inverted.





Python 3.6 numpy 1.19.5
import numpy as np

classes = [
    np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]),
    np.array([[3, 1], [4, 0], [6, 0], [6, 2], [5, 3]])
]

centroids = [class_.mean(axis=0) for class_ in classes]
standard_deviations = [class_.std(axis=0, ddof=1) for class_ in classes]
covariance_matrices = np.array([np.cov(class_, rowvar=False, ddof=1) for class_ in classes])
det_covariance_matrices = [np.linalg.det(cov) for cov in covariance_matrices]

print("Centroids:", *centroids)
print("Standard deviations:", *standard_deviations)
print("Covariance matrices:", *covariance_matrices.tolist())
print("Determinants of covariance matrices:", det_covariance_matrices)

      
      



Output:





Centroids: [3. 3.] [4.8 1.2]
Standard deviations: [1.58113883 1.58113883] [1.30384048 1.30384048]
Covariance matrices: [[2.5, 2.5], [2.5, 2.5]] [[1.7, 0.3], [0.3, 1.7]]
Determinants of covariance matrices: [0.0, 2.8]
      
      



4. Compute the distances. The covariance matrix of the first class is degenerate, so the ordinary Mahalanobis distance cannot be used for it; the covariance matrix of the second class is non-degenerate.





For each method we compute the distance from the test point both to the class center and to every object of the class.

Let us start with the first class, the one with the degenerate covariance matrix, and the point (4, 2).





Since the plain Mahalanobis distance is unavailable here, we compute the distance in five ways: (1) the Euclid–Mahalanobis distance, (2) shrinkage (LedoitWolf), (3) the pseudo-inverse, (4) the standardized Euclidean distance, (5) the standardized Euclidean distance multiplied by the geometric mean of the variances:





1. The Euclid–Mahalanobis distance.





d_{E-M}\left((4,2), (1,1), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.8257 \\ d_{E-M}\left((4,2), (2,2), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.5275 \\ d_{E-M}\left((4,2), (3,3), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.4142 \\ d_{E-M}\left((4,2), (4,4), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.5275 \\ d_{E-M}\left((4,2), (5,5), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.8257
Python 3.6 numpy 1.19.5
import numpy as np


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
covariance_matrix = np.cov(class_, rowvar=False, ddof=1)
inverse_covariance_matrix = np.linalg.inv(covariance_matrix + np.identity(covariance_matrix.shape[0]))
print("  :\n", inverse_covariance_matrix, sep='')

for point_to in [class_.mean(axis=0), *class_]:
    print("d_E-M (", test_point, ", ", point_to, ", (COV+E)^(-1)) = ",
          mahalanobis(test_point, point_to, inverse_covariance_matrix), sep='')

      
      



Output:





Inverse matrix (COV+E)^(-1):
[[ 0.58333333 -0.41666667]
 [-0.41666667  0.58333333]]
d_E-M ([4. 2.], [3. 3.], (COV+E)^(-1)) = 1.414213562373095
d_E-M ([4. 2.], [1. 1.], (COV+E)^(-1)) = 1.8257418583505538
d_E-M ([4. 2.], [2. 2.], (COV+E)^(-1)) = 1.5275252316519465
d_E-M ([4. 2.], [3. 3.], (COV+E)^(-1)) = 1.414213562373095
d_E-M ([4. 2.], [4. 4.], (COV+E)^(-1)) = 1.5275252316519465
d_E-M ([4. 2.], [5. 5.], (COV+E)^(-1)) = 1.8257418583505536
      
      



The smallest distance is to the class center (3, 3).





2. Shrinkage (LedoitWolf).





d_{M{(*)}}\left((4,2), (1,1), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.4284 \\ d_{M{(*)}}\left((4,2), (2,2), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.0378 \\ d_{M{(*)}}\left((4,2), (3,3), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 1.8898 \\ d_{M{(*)}}\left((4,2), (4,4), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.0378 \\ d_{M{(*)}}\left((4,2), (5,5), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.4284
Python 3.6 numpy 1.19.5 scikit-learn 0.24.1
import numpy as np
from sklearn.covariance import LedoitWolf


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
lw = LedoitWolf().fit(class_)
lw_covariance_matrix = lw.covariance_
lw_lambda = lw.shrinkage_
covariance_matrix = np.cov(class_, rowvar=False, ddof=0)
mu = np.sum(np.trace(covariance_matrix)) / class_.shape[1]  # mu = trace(COV) / n, n = number of features
T = mu * np.identity(class_.shape[1])
print("T:", *T)
print("COV(*):", *lw_covariance_matrix)
print("Lambda:", lw_lambda)

# First condition: the target matrix T must be positive definite
# (here we simply check that all eigenvalues of T are positive)
# ddof=0, because LedoitWolf uses empirical_covariance (population covariance, see above)
first_condition = (np.linalg.eig(T)[0] > approx(0., sign=+1)).all()
print("All(", np.linalg.eig(T)[0], ") > 0 ? -> ", first_condition, sep='')

# Second condition: lambda must lie in (0, 1]
second_condition = approx(0., sign=+1) < lw_lambda <= 1
print("Lambda =", lw_lambda, "in (0, 1] ? ->", second_condition)

# Third condition: the smallest eigenvalue of COV(*) must be at least
# lambda times the smallest eigenvalue of the target matrix T
cov_eig = min(np.linalg.eig(lw_covariance_matrix)[0])
lambda_t_eig = lw_lambda * min(np.linalg.eig(T)[0])
third_condition = cov_eig >= lambda_t_eig
print(cov_eig, ">=", lambda_t_eig, "? ->", third_condition)
conditions = [first_condition, second_condition, third_condition]

if all(conditions):
    print("   ")
    #  
    inverse_lw_covariance_matrix = np.linalg.inv(lw_covariance_matrix)
    print("  :\n", inverse_lw_covariance_matrix, sep='')
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_M(*) (", test_point, ", ", point_to, ", COV(*)) = ",
              mahalanobis(test_point, point_to, inverse_lw_covariance_matrix), sep='')
else:
    print("  (1-3): ", [i for i, x in enumerate(conditions, 1) if not x])

      
      



Output:





T: [2. 0.] [0. 2.]
COV(*): [2.   1.44] [1.44 2.  ]
Lambda: 0.27999999999999997
All([2. 2.]) > 0 ? -> True
Lambda = 0.27999999999999997 in (0, 1] ? -> True
0.56 >= 0.5599999999999999 ? -> True
All conditions are satisfied
Inverse matrix COV(*)^(-1):
[[ 1.03820598 -0.74750831]
 [-0.74750831  1.03820598]]
d_M(*) ([4. 2.], [3. 3.], COV(*)) = 1.889822365046136
d_M(*) ([4. 2.], [1. 1.], COV(*)) = 2.4283759936997833
d_M(*) ([4. 2.], [2. 2.], COV(*)) = 2.037847864848056
d_M(*) ([4. 2.], [3. 3.], COV(*)) = 1.889822365046136
d_M(*) ([4. 2.], [4. 4.], COV(*)) = 2.037847864848056
d_M(*) ([4. 2.], [5. 5.], COV(*)) = 2.4283759936997833
      
      



Again, the smallest distance is to the class center (3, 3).





3. The pseudo-inverse matrix.





We compute the pseudo-inverse of the covariance matrix and the corresponding distances.





d_{M^+}\left((4,2), (1,1), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 1.2649 \\ d_{M^+}\left((4,2), (2,2), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 0.6324 \\ d_{M^+}\left((4,2), (3,3), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) = 0.0000 \\ d_{M^+}\left((4,2), (4,4), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 0.6324 \\ d_{M^+}\left((4,2), (5,5), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 1.2649

Note that the distance from (4, 2) to the point (3, 3) (which is also the class center) comes out as 0, although the points do not coincide — exactly the counter-intuitive behavior mentioned above.





Python 3.6 numpy 1.19.5
import numpy as np


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
covariance_matrix = np.cov(class_, rowvar=False, ddof=1)
#  The pseudo-inverse is computed via singular value decomposition (SVD)
#  of the covariance matrix
pseudo_inverse_covariance_matrix = np.linalg.pinv(covariance_matrix)
print("  :\n", pseudo_inverse_covariance_matrix, sep='')

for point_to in [class_.mean(axis=0), *class_]:
    print("d_M+ (", test_point, ", ", point_to, ", COV+) = ",
          mahalanobis(test_point, point_to, pseudo_inverse_covariance_matrix), sep='')

      
      



Output:





Pseudo-inverse of the covariance matrix:
[[0.1 0.1]
 [0.1 0.1]]
d_M+ ([4. 2.], [3. 3.], COV+) = 0.0
d_M+ ([4. 2.], [1. 1.], COV+) = 1.2649110640673513
d_M+ ([4. 2.], [2. 2.], COV+) = 0.6324555320336757
d_M+ ([4. 2.], [3. 3.], COV+) = 0.0
d_M+ ([4. 2.], [4. 4.], COV+) = 0.6324555320336757
d_M+ ([4. 2.], [5. 5.], COV+) = 1.2649110640673513
      
      



The smallest distance (zero) is to the class center (3, 3).





4. The standardized Euclidean distance.





d_{std}((4,2), (1,1), (\sqrt {2.5}, \sqrt {2.5})) = 2.0000 \\ d_{std}((4,2), (2,2), (\sqrt {2.5}, \sqrt {2.5})) \approx 1.2649 \\ d_{std}((4,2), (3,3), (\sqrt {2.5}, \sqrt {2.5})) \approx 0.8944 \\ d_{std}((4,2), (4,4), (\sqrt {2.5}, \sqrt {2.5})) \approx 1.2649 \\ d_{std}((4,2), (5,5), (\sqrt {2.5}, \sqrt {2.5})) = 2.0000
Python 3.6 numpy 1.19.5
import numpy as np


def euclid_std(point_from, point_to, standard_deviations):
    return sum(((point_from - point_to) / standard_deviations) ** 2) ** 0.5


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
standard_deviations = class_.std(axis=0, ddof=1)

#  Check that no standard deviation is (close to) 0
std_le_0 = standard_deviations <= approx(0., sign=+1, epsilon=1e-6)
print(" :\n", standard_deviations, sep='')

if std_le_0.any():
    print("     0: ", np.where(std_le_0)[0])
else:
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_std (", test_point, ", ", point_to, ", sigma) = ",
              euclid_std(test_point, point_to, standard_deviations), sep='')

      
      



Output:





Standard deviations:
[1.58113883 1.58113883]
d_std ([4. 2.], [3. 3.], sigma) = 0.8944271909999159
d_std ([4. 2.], [1. 1.], sigma) = 1.9999999999999998
d_std ([4. 2.], [2. 2.], sigma) = 1.2649110640673518
d_std ([4. 2.], [3. 3.], sigma) = 0.8944271909999159
d_std ([4. 2.], [4. 4.], sigma) = 1.2649110640673518
d_std ([4. 2.], [5. 5.], sigma) = 1.9999999999999998
      
      



The smallest distance is to the class center (3, 3).





5. The standardized Euclidean distance multiplied by the geometric mean of the variances.





d_{diag}((4,2), (1,1), (\sqrt {2.5}, \sqrt {2.5})) = 5.0000 \\ d_{diag}((4,2), (2,2), (\sqrt {2.5}, \sqrt {2.5})) \approx 3.1623 \\ d_{diag}((4,2), (3,3), (\sqrt {2.5}, \sqrt {2.5})) \approx 2.2360 \\ d_{diag}((4,2), (4,4), (\sqrt {2.5}, \sqrt {2.5})) \approx 3.1623 \\ d_{diag}((4,2), (5,5), (\sqrt {2.5}, \sqrt {2.5})) = 5.0000
Python 3.6 numpy 1.19.5
import numpy as np


def euclid_std(point_from, point_to, standard_deviations):
    return sum(((point_from - point_to) / standard_deviations) ** 2) ** 0.5


def euclid_diag(point_from, point_to, standard_deviations):
    return euclid_std(point_from, point_to, standard_deviations) \
           * (np.prod(standard_deviations ** 2)) ** (1. / point_from.shape[0])


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
standard_deviations = class_.std(axis=0, ddof=1)

#  Check that no standard deviation is (close to) 0
std_le_0 = standard_deviations <= approx(0., sign=+1, epsilon=1e-6)
print(" :\n", standard_deviations, sep='')

if std_le_0.any():
    print("     0: ", np.where(std_le_0)[0])
else:
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_diag (", test_point, ", ", point_to, ", sigma) = ",
              euclid_diag(test_point, point_to, standard_deviations), sep='')

      
      



Output:





Standard deviations:
[1.58113883 1.58113883]
d_diag ([4. 2.], [3. 3.], sigma) = 2.2360679774997902
d_diag ([4. 2.], [1. 1.], sigma) = 5.0
d_diag ([4. 2.], [2. 2.], sigma) = 3.16227766016838
d_diag ([4. 2.], [3. 3.], sigma) = 2.2360679774997902
d_diag ([4. 2.], [4. 4.], sigma) = 3.16227766016838
d_diag ([4. 2.], [5. 5.], sigma) = 5.0
      
      



The smallest distance is to the class center (3, 3).





Now for the second class, whose covariance matrix is non-degenerate, so the ordinary Mahalanobis distance applies (and the Euclid–Mahalanobis distance, for comparison). First, invert the covariance matrix:





COV^{-1}_{(2)} = \frac {1} {\Delta_{(2)}} A^{T}_{(2)}

where \Delta_{(2)} is the determinant of COV_{(2)} and A^{T}_{(2)} is the transposed matrix of cofactors (the adjugate).





A^{T}_{(2)} = \begin{pmatrix} A_{(2)1,1} & A_{(2)2,1} \\ A_{(2)1,2} & A_{(2)2,2} \end{pmatrix} = \begin{pmatrix} 1.7 & -0.3 \\ -0.3 & 1.7 \end{pmatrix} \\ COV^{-1}_{(2)} = \frac {1} {2.8} \begin{pmatrix} 1.7 & -0.3 \\ -0.3 & 1.7 \end{pmatrix} = \begin{pmatrix} 0.6071 & -0.1071 \\ -0.1071 & 0.6071 \end{pmatrix}

One distance written out in full:

d_{M}((4,2), (3,1), COV^{-1}_{(2)}) = \sqrt {((4,2)-(3,1)) \cdot COV^{-1}_{(2)} \cdot ((4,2)-(3,1))^T} = \\ \sqrt {(4-3, 2-1) \cdot \begin{pmatrix} 0.6071 & -0.1071 \\ -0.1071 & 0.6071 \end{pmatrix} \cdot (4-3, 2-1)^T} = \\ \sqrt {(0.5, 0.5) \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix}} = 1

All the Mahalanobis distances:

d_{M}((4,2), (4.8,1.2), COV^{-1}_{(2)}) \approx 0.9562 \\ d_{M}((4,2), (3,1), COV^{-1}_{(2)}) = 1.0000 \\ d_{M}((4,2), (4,0), COV^{-1}_{(2)}) \approx 1.5584 \\ d_{M}((4,2), (6,0), COV^{-1}_{(2)}) \approx 2.3905 \\ d_{M}((4,2), (6,2), COV^{-1}_{(2)}) \approx 1.5584 \\ d_{M}((4,2), (5,3), COV^{-1}_{(2)}) = 1.0000

And the Euclid–Mahalanobis distances:

d_{E-M}((4,2), (4.8,1.2), (COV_{(2)}+E)^{-1}) \approx 0.7303 \\ d_{E-M}((4,2), (3,1), (COV_{(2)}+E)^{-1}) \approx 0.8165 \\ d_{E-M}((4,2), (4,0), (COV_{(2)}+E)^{-1}) \approx 1.2247 \\ d_{E-M}((4,2), (6,0), (COV_{(2)}+E)^{-1}) \approx 1.8257 \\ d_{E-M}((4,2), (6,2), (COV_{(2)}+E)^{-1}) \approx 1.2247 \\ d_{E-M}((4,2), (5,3), (COV_{(2)}+E)^{-1}) \approx 0.8165
Python 3.6 numpy 1.19.5
import numpy as np


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
covariance_matrix = np.cov(class_, rowvar=False, ddof=1)
if abs(np.linalg.det(covariance_matrix)) <= approx(0., sign=+1):
    print("  0.  .")
else:
    inverse_covariance_matrix = np.linalg.inv(covariance_matrix)
    print("   (d_M):\n", inverse_covariance_matrix, sep='')
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_M (", test_point, ", ", point_to, ", COV^(-1)) = ",
              mahalanobis(test_point, point_to, inverse_covariance_matrix), sep='')

covariance_matrix = covariance_matrix + np.identity(class_.shape[1])
inverse_covariance_matrix = np.linalg.inv(covariance_matrix)
print("   (d_E-M):\n", inverse_covariance_matrix, sep='')

for point_to in [class_.mean(axis=0), *class_]:
    print("d_E-M (", test_point, ", ", point_to, ", (COV+E)^(-1)) = ",
          mahalanobis(test_point, point_to, inverse_covariance_matrix), sep='')

      
      



Output:





Inverse covariance matrix (d_M):
[[ 0.60714286 -0.10714286]
 [-0.10714286  0.60714286]]
d_M ([4. 2.], [4.8 1.2], COV^(-1)) = 0.9561828874675149
d_M ([4. 2.], [3. 1.], COV^(-1)) = 1.0
d_M ([4. 2.], [4. 0.], COV^(-1)) = 1.5583874449479593
d_M ([4. 2.], [6. 0.], COV^(-1)) = 2.3904572186687876
d_M ([4. 2.], [6. 2.], COV^(-1)) = 1.5583874449479593
d_M ([4. 2.], [5. 3.], COV^(-1)) = 1.0
Inverse matrix (COV+E)^(-1) (d_E-M):
[[ 0.375      -0.04166667]
 [-0.04166667  0.375     ]]
d_E-M ([4. 2.], [4.8 1.2], (COV+E)^(-1)) = 0.7302967433402214
d_E-M ([4. 2.], [3. 1.], (COV+E)^(-1)) = 0.8164965809277259
d_E-M ([4. 2.], [4. 0.], (COV+E)^(-1)) = 1.224744871391589
d_E-M ([4. 2.], [6. 0.], (COV+E)^(-1)) = 1.8257418583505536
d_E-M ([4. 2.], [6. 2.], (COV+E)^(-1)) = 1.224744871391589
d_E-M ([4. 2.], [5. 3.], (COV+E)^(-1)) = 0.8164965809277259
      
      



The smallest distance is to the class center (4.8, 1.2).









To summarize, the distance from the point (4, 2) to the nearest object of the first class, by each method:

1. Euclid–Mahalanobis: \approx 1.4142;

2. Shrinkage (LedoitWolf): \approx 1.8898;

3. Pseudo-inverse: 0.0000;

4. Standardized Euclidean: \approx 0.8944;

5. Standardized Euclidean \times geometric mean of the variances: \approx 2.2360.





To the nearest object of the second class (Mahalanobis): 1.0000.

To the nearest object of the second class (Euclid–Mahalanobis): \approx 0.8165.





Thus, judging by the nearest object, the point (4, 2) looks closer to the second class by every method except method 3 (the pseudo-inverse), which gives a zero distance to the first class.









The distance from the point (4, 2) to the farthest object of the first class, by each method:

1. Euclid–Mahalanobis: \approx 1.8257;

2. Shrinkage (LedoitWolf): \approx 2.4284;

3. Pseudo-inverse: \approx 1.2649;

4. Standardized Euclidean: 2.0000;

5. Standardized Euclidean \times geometric mean of the variances: 5.0000.





To the farthest object of the second class (Mahalanobis): \approx 2.3905.

To the farthest object of the second class (Euclid–Mahalanobis): \approx 1.8257.





Judging by the farthest object, the picture is similar: only method 3 (the pseudo-inverse) clearly favors the first class.









The distance from the point (4, 2) to the center of the first class, by each method:

1. Euclid–Mahalanobis: \approx 1.4142;

2. Shrinkage (LedoitWolf): \approx 1.8898;

3. Pseudo-inverse: 0.0000;

4. Standardized Euclidean: \approx 0.8944;

5. Standardized Euclidean \times geometric mean of the variances: \approx 2.2360.





To the center of the second class (Mahalanobis): \approx 0.9562.

To the center of the second class (Euclid–Mahalanobis): \approx 0.7303.





Judging by the class centers, most methods again place the point (4, 2) closer to the second class; method 3 (the pseudo-inverse), with its zero distance, is the exception.









Which of the methods to prefer depends on the data and on the task at hand.





3. Mahalanobis distance between two classes

The distance between two classes is defined as the distance between their centers (mean vectors).





Strictly speaking, this "distance" between classes is not a metric but a pseudometric, i.e.:

d_M(C_i, C_j, COV_0^{-1}) \ge 0, \quad d_M(C_i, C_j, COV_0^{-1})=0 \impliedby C_i=C_j, \quad d_M(C_i, C_j, COV_0^{-1})=d_M(C_j, C_i, COV_0^{-1}).

What fails is the converse of the second property (identity of indiscernibles): d_M(C_i, C_j, COV_0^{-1})=0 \implies C_i=C_j does not hold, since two different classes can have the same center.

Keep this in mind when interpreting the results.





3.1. Theoretical background

The Mahalanobis distance between classes C_1 and C_2 with centers \overline {C_1} and \overline {C_2} and covariance matrices COV_1 and COV_2 is computed as:





d_M \left(\overline {C_1}, \overline {C_2}, COV^{-1}_0 \right) = \sqrt {\left (\overline {C_1} - \overline {C_2} \right) \cdot COV^{-1}_0 \cdot \left (\overline {C_1} - \overline {C_2} \right)^T} \\ COV_0 = \frac {1} {|C_1| + |C_2| - 2} \left (COV_{(1)} + COV_{(2)} \right)

where COV_0 is the pooled covariance matrix, COV^{-1}_0 is its inverse, COV_1 and COV_2 are the covariance matrices of the two classes, and |C_1| and |C_2| are the numbers of objects in them.





Another variant found in practice is simply the sum:





COV_0 = COV_{(1)} + COV_{(2)}

without the normalizing factor.

Both variants are computed in the example below, for comparison.





Note that if one of the "classes" consists of a single point X_{(1)}, its covariance matrix is zero and the formula reduces to the distance between a point and a class (cf. section 2, "distance between a point and a class"):





d_M \left (X_{(1)}, \overline C_2, COV^{-1}_0 \right) = \sqrt {\left(X_{(1)} - \overline {C_2} \right) \cdot COV^{-1}_0 \cdot \left (X_{(1)} - \overline {C_2} \right)^T} \\ COV_0 = 0 + COV_{(2)} = COV_{(2)}

The Euclid–Mahalanobis variant for two classes:





d_{E-M} \left(\overline {C_1}, \overline {C_2}, \left (COV_0+E \right)^{-1} \right) = \sqrt {\left (\overline {C_1} - \overline {C_2} \right) \cdot (COV_0+E)^{-1} \cdot \left (\overline {C_1} - \overline {C_2} \right)^T}

where E is the identity matrix of the same size as COV_0.





3.2. Algorithm for computing the distance between two classes

1. Compute the centers of both classes.





2. Compute the covariance matrices of both classes.





3. Compute the combined (pooled) covariance matrix COV_0.





4. Compute the distance between the class centers. If COV_0 is degenerate, use one of the workarounds from section 2.1, e.g., the Euclid–Mahalanobis distance.





3.3. An example of computing the distance between two classes

We use the same classes as in the example in section 2.3.





C_{(1)} = \{ ( 1 , 1 ) , ( 2 , 2 ) , ( 3 , 3 ) , ( 4 , 4 ) , ( 5 , 5 ) \} \\ C_{(2)} = \{ ( 3 , 1 ) , ( 4 , 0 ) , ( 6 , 0 ) , ( 6 , 2 ) , ( 5 , 3 ) \}

Steps 1–3 (the class centers and the covariance matrices) were already carried out in section 2.3.

Step 4 remains.





4. Compute the distance between the class centers; if COV_0 is degenerate, use one of the workarounds (e.g., the Euclid–Mahalanobis distance).





The pooled covariance matrix:





COV_0 = \frac {1} {5 + 5 - 2} \left (\begin{pmatrix} 2.5 & 2.5 \\ 2.5 & 2.5 \end{pmatrix} + \begin{pmatrix} 1.7 & 0.3 \\ 0.3 & 1.7 \end{pmatrix} \right) = \\ = \frac {1} {8} \begin{pmatrix} 4.2 & 2.8 \\ 2.8 & 4.2 \end{pmatrix} = \begin{pmatrix} 0.525 & 0.35 \\ 0.35 & 0.525 \end{pmatrix}

Its inverse:





COV^{-1}_0 \approx \begin{pmatrix} 3.42857143 & -2.28571429 \\ -2.28571429 & 3.42857143 \end{pmatrix}





The Mahalanobis distance between the class centers is \approx 6.0851:





d_M \left((3, 3), (4.8, 1.2), COV^{-1}_0 \right) = \\ = \sqrt {\left ((3, 3) - (4.8, 1.2) \right) \cdot COV^{-1}_0 \cdot \left ((3, 3) - (4.8, 1.2) \right)^T} = \\ = \sqrt {(-1.8, 1.8) \cdot \begin{pmatrix} 3.42857143 & -2.28571429 \\ -2.28571429 & 3.42857143 \end{pmatrix} \cdot (-1.8, 1.8)^T} \approx 6.0851

The Euclid–Mahalanobis distance between the centers (with COV_0 + E) is \approx 2.3484.





With the alternative COV_0 = COV_{(1)} + COV_{(2)}:

the Mahalanobis distance between the centers is \approx 2.1514;

the Euclid–Mahalanobis distance is \approx 1.6432.









For comparison, the smallest Mahalanobis distance between objects of different classes (with the pooled COV_0) is \approx 3.3806, attained between (2,2) and (3,1) and between (4,4) and (5,3).





The corresponding Euclid–Mahalanobis distance is \approx 1.3047, for the same pairs (2,2)–(3,1) and (4,4)–(5,3).





With COV_0 = COV_{(1)} + COV_{(2)}:

the smallest distance is \approx 1.1952, again between (2,2) and (3,1) and between (4,4) and (5,3);

the Euclid–Mahalanobis variant gives \approx 0.9129 for the same pairs.









The largest Mahalanobis distance between objects of different classes (pooled COV_0) is \approx 10.5830, attained between (1,1) and (6,0) and between (5,5) and (6,0).





The corresponding Euclid–Mahalanobis distance is \approx 4.4256, for the same pairs (1,1)–(6,0) and (5,5)–(6,0).





With COV_0 = COV_{(1)} + COV_{(2)}:

the largest distance is \approx 3.7417, for the pairs (1,1)–(6,0) and (5,5)–(6,0);

the Euclid–Mahalanobis variant gives \approx 2.9155 for the same pairs.





The Python code (Python 3.6, numpy 1.19.5) is analogous to the listings above and is omitted here.
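For completeness, here is a possible sketch of step 4 in code (my own, not the author's listing), reproducing the two distances between the class centers computed above:

import numpy as np

classes = [
    np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]]),
    np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
]

centroids = [class_.mean(axis=0) for class_ in classes]
cov_matrices = [np.cov(class_, rowvar=False, ddof=1) for class_ in classes]

# Pooled covariance matrix: COV_0 = (COV_1 + COV_2) / (|C_1| + |C_2| - 2)
cov_0 = (cov_matrices[0] + cov_matrices[1]) / (len(classes[0]) + len(classes[1]) - 2)

delta = centroids[0] - centroids[1]

# Mahalanobis distance between the class centers
d_m = np.sqrt(max(0.0, delta @ np.linalg.inv(cov_0) @ delta))
# Euclid-Mahalanobis variant, with COV_0 + E
d_em = np.sqrt(max(0.0, delta @ np.linalg.inv(cov_0 + np.identity(2)) @ delta))

print(d_m)   # ~6.0851
print(d_em)  # ~2.3484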





4. Mahalanobis distance and the k-nearest-neighbors method

The Mahalanobis distance can be used in the k-nearest-neighbors method: an object is assigned to the class that is most common among its k nearest (by the chosen metric) training objects.





Below is an implementation of k nearest neighbors with the Mahalanobis distance.

Implementation details:

— each class uses its own inverse matrix (COV+E)^{-1} (the Euclid–Mahalanobis variant, since a class covariance matrix may be degenerate);

— the squared distance is used (the square root is monotone and does not change the ordering of neighbors).





Python 3.6 numpy 1.19.5
#  Classification by k nearest neighbors with the Mahalanobis distance

import heapq
from collections import Counter
from operator import itemgetter

import numpy as np


class MkNN:
    def __init__(self, k, classes, inv_cov_matrices):
        self.n = len(classes)
        self.k = k
        self.classes = classes
        self.inv_cov_matrices = inv_cov_matrices

    @staticmethod
    def mahalanobis_sqr(point_from, point_to, inverse_covariance_matrix):
        delta = point_from - point_to
        return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta))

    def _get_k_smallest(self, test_point):
        generator = (
            (MkNN.mahalanobis_sqr(test_point, point, inv_cov), i)
            for i, (class_, inv_cov) in enumerate(zip(self.classes, self.inv_cov_matrices))
            for point in class_
        )
        return heapq.nsmallest(self.k, generator, key=itemgetter(0))

    def predict(self, test_point):
        return heapq.nlargest(1, Counter((i for _, i in self._get_k_smallest(test_point))).items(),
                              key=lambda t: (t[1], -t[0]))[0][0]

    def predict_proba(self, test_point):
        most_common = Counter((i for _, i in self._get_k_smallest(test_point)))
        classes_proba = np.array([most_common.get(i, 0) for i in range(self.n)])
        return classes_proba / classes_proba.sum()

    def predict_all_max(self, test_point):
        p = self.predict_proba(test_point)
        return np.where(p == max(p))[0]


def main():
    #  Test points (they do not belong to any class in `classes`)
    test_points = np.array([[4., 2.]])
    #  Training data: a list of classes
    classes = [
        np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]]),
        np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
    ]
    #  Total number of training points
    n_train_points = sum(class_.shape[0] for class_ in classes)
    #  Covariance matrix of each class
    cov_matrices = [np.cov(class_, rowvar=False, ddof=1) for class_ in classes]
    #  Inverse of (COV + E) for each class -- the Euclid-Mahalanobis variant
    inv_cov_matrices = [np.linalg.inv(cov + np.identity(cov.shape[0])) for cov in cov_matrices]
    for test_point in test_points:
        print("Point:", test_point)
        # k from 1 up to (but not including) the number of training points
        for i in range(1, n_train_points):
            classifier = MkNN(i, classes, inv_cov_matrices)
            print(f"{i}nn:",
                  1 + classifier.predict(test_point),
                  classifier.predict_proba(test_point),
                  classifier.predict_all_max(test_point))


if __name__ == "__main__":
    main()

      
      



Output:

The format is: "knn: [predicted class (numbered from 1)] [class probabilities] [classes with the maximum probability (numbered from 0)]".





Point: [4. 2.]
1nn: 2 [0. 1.] [1]
2nn: 2 [0. 1.] [1]
3nn: 2 [0. 1.] [1]
4nn: 2 [0. 1.] [1]
5nn: 2 [0.2 0.8] [1]
6nn: 2 [0.33333333 0.66666667] [1]
7nn: 2 [0.42857143 0.57142857] [1]
8nn: 1 [0.5 0.5] [0 1]
9nn: 2 [0.44444444 0.55555556] [1]
      
      



The decision regions of the classifier for a given k can be visualized; an example is shown in Figure 4.





Figure 4. Decision regions of the k-nearest-neighbors classifier with the Mahalanobis distance
Python 3.6
# ...

from operator import sub

import numpy as np  # 1.19.5
from matplotlib import colors, pyplot as plt  # 3.3.4

#  Visualization of the decision regions on a mesh
def show_data_on_mesh(k, classes, inv_cov_matrices):
    #  Mesh boundaries
    min_ = np.min([np.min(class_, axis=0) for class_ in classes], axis=1) - 1
    max_ = np.max([np.max(class_, axis=0) for class_ in classes], axis=1) + 1
    min_c = min(min_[0], min_[1])
    max_c = max(max_[0], max_[1])
    h = 0.05
    test_mesh = np.meshgrid(np.arange(min_c, max_c, h), np.arange(min_c, max_c, h))
    test_points = np.c_[test_mesh[0].ravel(), test_mesh[1].ravel()]
    #  Classify every mesh point
    classifier = MkNN(k, classes, inv_cov_matrices)
    test_mesh_labels = [sub(*classifier.predict_proba(x)) for x in test_points]
    #  Plot the result
    plt.figure(figsize=(6, 5), dpi=90)
    class_colormap = colors.ListedColormap(['#070648', '#480607'])
    plt.pcolormesh(test_mesh[0], test_mesh[1],
                   np.asarray(test_mesh_labels).reshape(test_mesh[0].shape),
                   cmap='coolwarm', shading='nearest')
    plt.colorbar()
    plt.scatter([point[0] for class_ in classes for point in class_],
                [point[1] for class_ in classes for point in class_],
                c=[-i for i, class_ in enumerate(classes) for _ in class_],
                cmap=class_colormap)
    plt.axis([min_c, max_c, min_c, max_c])
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.title("k=" + str(k))
    plt.show()

# ...

      
      



The shape of the decision regions depends on k.





5. Weighted Mahalanobis distance

If the features are of different importance, feature weights can be introduced via a diagonal matrix \Lambda:





d_{M-\Lambda}(U, V, COV^{-1}, \Lambda) = \sqrt{(U - V) \cdot \Lambda \cdot COV^{-1} \cdot \Lambda^T \cdot (U - V)^T}

For example, for two features with equal (unit) weights the weight matrix is the identity:





\Lambda = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}

The weighted Euclid–Mahalanobis variant:





d_{EM-\Lambda}(U, V, \left(COV + E\right)^{-1}, \Lambda) = \sqrt{(U - V) \cdot \Lambda \cdot \left(COV + E\right)^{-1} \cdot \Lambda^T \cdot (U - V)^T}
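A minimal sketch of the weighted variant (mine; the weights below are arbitrary and purely illustrative):

import numpy as np

def weighted_mahalanobis(u, v, inv_cov, weights):
    # Mahalanobis distance with a diagonal matrix of feature weights
    L = np.diag(weights)
    delta = u - v
    return np.sqrt(max(0.0, delta @ L @ inv_cov @ L.T @ delta))

class_ = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
inv_cov = np.linalg.inv(np.cov(class_, rowvar=False, ddof=1))

u, v = np.array([4., 2.]), class_.mean(axis=0)
print(weighted_mahalanobis(u, v, inv_cov, [1., 1.]))   # equal weights: the ordinary d_M (~0.9562)
print(weighted_mahalanobis(u, v, inv_cov, [2., 0.5]))  # the first feature weighted more heavily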

6. Conclusion

We have covered the Mahalanobis distance between two points, between a point and a class, and between two classes, ways to handle a degenerate covariance matrix, and the use of the distance in the k-nearest-neighbors method.





Where can you go from here?

1. Other applications of the Mahalanobis distance (for example, outlier detection).

2. The book "Metric Learning" (authors: Aurélien Bellet, Amaury Habrard, Marc Sebban).

3. Clustering (for example, k-means: links 1, 2, 3).

4. ().

5. Related articles (article 1, article 2, article 3).






If you have any comments or find any errors, please write to quwarm@gmail.com or in the comments.







