The motivation for the Mahalanobis distance can be found on Wikipedia:

> The Mahalanobis distance is a measure of the distance between a point P and a distribution D, introduced by P. C. Mahalanobis in 1936. Mahalanobis's definition was prompted by the problem of identifying the similarities of skulls based on measurements in 1927.
In this definition, the goal is to measure how far a sample is from the mean of a multivariate Gaussian distribution, expressed in standard deviations.
$$ d(x,y,P) = \sqrt{(x-y)^\mathsf{T} P (x-y)} $$
where $x,y \in \mathbb{R}^n$ and $P \in \textbf{S}_+$ (in practice $P$ must be positive definite, so that $P^{-1}$ exists). In the Gaussian distribution context, $P$ is the inverse of the covariance matrix $\Sigma$ (i.e. $P=\Sigma^{-1}$) and $y$ plays the role of the mean of the distribution.
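To make the definition concrete, here is a minimal numpy sketch that evaluates $d(x,y,P)$ directly; the points and the covariance matrix are made up for illustration:

```python
import numpy as np

# toy data: a sample, a mean, and an illustrative covariance matrix
x = np.array([1.0, 2.0])
y = np.array([0.0, 0.0])          # e.g. the mean of the distribution
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])    # covariance, positive definite
P = np.linalg.inv(Sigma)          # P = Sigma^{-1}

# d(x, y, P) = sqrt((x - y)^T P (x - y))
d = np.sqrt((x - y) @ P @ (x - y))
print(d)
```

If you prefer a library call, `scipy.spatial.distance.mahalanobis(x, y, P)` computes the same quantity.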
Our motivation, on the other hand, is different: given a vector $y$ and a distance matrix $P$, find the nearest vector $x$ that respects a sum constraint. In our case, given $y$, we want to find the $x$ whose elements sum to some target $t$. We can write this as the optimization problem:
$$ \begin{align} \begin{split} \text{minimize} & \quad \frac{1}{2}(x-y)^\mathsf{T} P (x-y)\\ \text{s.t.} & \quad \textbf{1}^\mathsf{T}x=t \end{split} \end{align} $$
where $\textbf{1}$ is the vector of ones of size $n$. We can solve directly for $x$ using Lagrange multipliers. Recall that the Lagrangian function can be expressed as $\mathcal{L} = f(x) - \lambda g(x)$, so we rewrite our problem as follows:
$$ \begin{align} \begin{split} f(x) = \frac{1}{2}(x-y)^\mathsf{T} P (x-y)\\ g(x) = \textbf{1}^\mathsf{T}x-t \\ \end{split} \end{align} $$
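Putting these together, the full Lagrangian for our problem reads:

$$ \mathcal{L}(x,\lambda) = \frac{1}{2}(x-y)^\mathsf{T} P (x-y) - \lambda (\textbf{1}^\mathsf{T}x-t) $$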
We need to find the stationary point, so we differentiate $\mathcal{L}$ w.r.t. $x$ and $\lambda$ and set the results to zero.
$$ \begin{align} \nabla_x \mathcal{L} = P (x-y) - \lambda \textbf{1} = 0 \\ \nabla_\lambda \mathcal{L} = -g(x) = t - \textbf{1}^\mathsf{T}x =0 \end{align} $$
Let’s now expand and solve for $x$ in (3):
$$ \begin{align} \begin{split} Px-Py-\lambda \textbf{1} =0 \\ x = P^{-1}(Py+\lambda \textbf{1}) \\ x = y+\lambda P^{-1} \textbf{1} \end{split} \end{align} $$
With (5), let’s now expand (4) and solve for $\lambda$:
$$ \begin{align} \begin{split} \textbf{1}^\mathsf{T}x-t = 0 \\ \textbf{1}^\mathsf{T}(y+\lambda P^{-1} \textbf{1})-t=0 \\ \textbf{1}^\mathsf{T}y+\lambda\textbf{1}^\mathsf{T}P^{-1} \textbf{1}-t=0 \\ \lambda = \frac{t-\textbf{1}^\mathsf{T}y}{\textbf{1}^\mathsf{T}P^{-1}\textbf{1}} \end{split} \end{align} $$
Disclaimer: we used the property $\textbf{1}^\mathsf{T}x=x^\mathsf{T}\textbf{1}$ to make our lives easier during the derivatives and expansions.
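Substituting (6) back into (5), the whole solution collapses into a single closed-form expression:

$$ x^\star = y + \frac{t-\textbf{1}^\mathsf{T}y}{\textbf{1}^\mathsf{T}P^{-1}\textbf{1}} P^{-1}\textbf{1} $$

which is exactly what the code below computes.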
Now that the modeling is done, we can translate it to code. We only need the solutions (5) and (6).
```python
import numpy as np
import numba as nb  # you can skip this line


@nb.njit  # you can skip this line
def mahalanobis_rectifier(P, y, t):
    n_size = len(y)
    ones = np.ones((n_size, 1))
    # solve P @ P_ones = 1 instead of computing P^{-1} explicitly
    P_ones = np.linalg.solve(P, ones)
    # λ from equation (6)
    λ = (t - ones.T @ y) / (ones.T @ P_ones)
    # x from equation (5)
    w = λ * P_ones + y
    return w
```
That's it!
You can see that there is no explicit inverse in the code, because we use `solve` instead. Thus, `P_ones = np.linalg.solve(P, ones)` could be replaced with `np.linalg.inv(P) @ ones`, but with slower performance. Also, the code uses numba, but you don't actually need it; it's just there to make the function faster, since numba compiles it.
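If you want to convince yourself that the two variants agree, a quick check looks like this (the matrix here is an arbitrary made-up positive definite one):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4))
P = A @ A.T + 4 * np.eye(4)       # a positive definite matrix
ones = np.ones((4, 1))

via_solve = np.linalg.solve(P, ones)
via_inv = np.linalg.inv(P) @ ones
print(np.allclose(via_solve, via_inv))  # True
```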
Finally, here are some examples of the function with inputs and outputs.
```python
np.random.seed(0)

n_size = 2
t = 2.
r = np.random.rand(n_size)
P = np.outer(r, r)
y = np.array([[0.4, 0.9]]).T
w = mahalanobis_rectifier(P, y, t)
print("first")
print(y.ravel(), w.ravel())

P = np.eye(n_size)
w = mahalanobis_rectifier(P, y, t)
print("second")
print(y.ravel(), w.ravel())

n_size = 5
r = np.random.rand(n_size)
P = np.outer(r, r)
y = np.random.rand(n_size, 1)
y /= y.sum()*1.25
w = mahalanobis_rectifier(P, y, t)
print("third")
print(y.ravel(), w.ravel())

# result
# first
# [0.4 0.9] [ 3.4090456 -1.4090456]
# second
# [0.4 0.9] [0.75 1.25]
# third
# [0.20042673 0.21658402 0.0861788 0.17794087 0.11886958]
# [ 2.07710124 0.30703598 3.26683829 -3.66483831 0.01386279]
```
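Note that in every case the returned vector sums to the target $t=2$. Continuing from the example above, you can verify the constraint directly:

```python
# the solution satisfies the sum constraint (up to floating point error)
print(np.isclose(w.sum(), t))  # True
```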
As a sanity check, you could test that `mahalanobis_rectifier(P, y, y.sum())` returns $y$ itself, because $y$ already satisfies the constraint and is therefore its own nearest point. The test can be `np.allclose(mahalanobis_rectifier(P, y, y.sum()), y)` (prefer `allclose` over `==` to tolerate floating point error).
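A runnable version of that check, with a made-up positive definite $P$ and an arbitrary $y$:

```python
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])  # made-up positive definite matrix
y = np.array([[0.4, 0.9]]).T
print(np.allclose(mahalanobis_rectifier(P, y, y.sum()), y))  # True
```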