A set of vectors \(\vec{v}_1, \vec{v}_2, \dots, \vec{v}_n\) in a vector space is linearly independent if the only linear combination of them that equals the zero vector is the one in which every coefficient is zero.
Example: Vectors \((1, 0)\) and \((0, 1)\) are linearly independent.
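In symbols, for scalar coefficients \(c_1, \dots, c_n\):
\[
c_1 \vec{v}_1 + c_2 \vec{v}_2 + \dots + c_n \vec{v}_n = \vec{0}
\quad \Longrightarrow \quad
c_1 = c_2 = \dots = c_n = 0.
\]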
Random variables \(X_1, X_2, \dots, X_n\) are statistically independent if their joint distribution equals the product of their individual (marginal) distributions.
Example: The outcomes of two dice rolls are independent.
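In symbols, independence means the joint distribution factorizes:
\[
P(X_1 = x_1, \dots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i)
\quad \text{for all } x_1, \dots, x_n.
\]
For two fair dice, \(P(X_1 = a, X_2 = b) = \tfrac{1}{6} \cdot \tfrac{1}{6} = \tfrac{1}{36}\) for every pair \((a, b)\).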
Key Difference: Linear independence is a property of vectors in a vector space, while statistical independence is a property of random variables and their joint probability distribution.
The Central Limit Theorem states that the suitably normalized sum (or average) of many independent random variables with finite mean and variance tends toward a normal distribution, regardless of the original distribution.
Example: The average of a large number of dice rolls is approximately normally distributed, even though a single roll is uniformly distributed.
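For independent, identically distributed \(X_1, \dots, X_n\) with mean \(\mu\) and variance \(\sigma^2\), the classical statement is:
\[
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \;\xrightarrow{d}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty,
\qquad \text{where } \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.
\]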
Mathematical Implication: Mixed signals tend to be more Gaussian, which ICA exploits to recover independent, non-Gaussian sources.
The Central Limit Theorem implies that linear mixtures of independent, non-Gaussian variables tend to be more Gaussian.
Key Insight: When independent sources mix linearly, the result looks more Gaussian than the original sources.
Takeaway: ICA recovers the independent sources by searching for linear combinations of the observed mixtures that are as non-Gaussian as possible, since those are closest to the original sources.
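A quick numerical illustration of this effect (a minimal sketch, separate from the FastICA code below): the excess kurtosis of a uniform source is clearly non-zero, while a fixed linear mixture of two such sources is already closer to the Gaussian value of zero.
import numpy as np

rng = np.random.default_rng(0)

def excess_kurtosis(x):
    # Fourth standardized moment minus 3; zero for a Gaussian
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2 - 3

s1 = rng.uniform(-1, 1, 100_000)   # Independent non-Gaussian (uniform) sources
s2 = rng.uniform(-1, 1, 100_000)
mix = 0.6 * s1 + 0.8 * s2          # A simple linear mixture of the two sources

print(excess_kurtosis(s1))   # about -1.2: clearly non-Gaussian
print(excess_kurtosis(mix))  # closer to 0: more Gaussian than the sources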
Linear transformations (e.g., rotations or scaling) preserve Gaussianity. Nonlinear transformations distort the data in ways that highlight deviations from Gaussianity.
Nonlinear functions like \( \tanh(u) \) or \( u^3 \) probe higher-order statistics of the data: \( u^3 \) relates to kurtosis and reacts strongly to outliers, while \( \tanh(u) \) is a more robust choice; both make non-Gaussian features more prominent.
The FastICA algorithm estimates expectations of these nonlinear functions applied to projections of the data, which helps it find directions that are far from Gaussian.
Nonlinear transformations amplify the non-Gaussian properties in the data, making it easier to separate independent components.
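Concretely, for whitened data \(\vec{x}\) and a weight vector \(\vec{w}\), the standard FastICA one-unit fixed-point update uses expectations of a nonlinearity \(g\) and its derivative \(g'\):
\[
\vec{w}^{+} = \mathbb{E}\{\vec{x}\, g(\vec{w}^{\top}\vec{x})\} - \mathbb{E}\{g'(\vec{w}^{\top}\vec{x})\}\, \vec{w},
\qquad
\vec{w} \leftarrow \vec{w}^{+} / \|\vec{w}^{+}\|.
\]
When several components are estimated in parallel, the rows of the weight matrix are additionally decorrelated after each update, as in the implementation below.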
import numpy as np
import matplotlib.pyplot as plt

# Step 1: Center and whiten the data
def whiten(X):
    X = X - np.mean(X, axis=0)              # Center the data
    cov = np.cov(X, rowvar=False)           # Covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # Eigen-decomposition
    D = np.diag(1.0 / np.sqrt(eigvals))     # Inverse square root of eigenvalues
    return X @ eigvecs @ D @ eigvecs.T      # ZCA-whitened data

# Step 2: Nonlinear function for maximizing non-Gaussianity
def g(u):
    return np.tanh(u)                       # Hyperbolic tangent nonlinearity

def g_derivative(u):
    return 1 - np.tanh(u) ** 2              # Derivative of tanh

# Step 3: FastICA iteration
def fastica(X, n_components, max_iter=100, tol=1e-5):
    X = whiten(X)
    n_samples, n_features = X.shape
    W = np.random.randn(n_components, n_features)  # Random initial weights

    for _ in range(max_iter):
        # Fixed-point update E{x g(w^T x)} - E{g'(w^T x)} w, one row per component
        GX = g(X @ W.T)                     # Shape (n_samples, n_components)
        W_new = GX.T @ X / n_samples
        W_new -= np.diag(np.mean(g_derivative(X @ W.T), axis=0)) @ W
        # Symmetric decorrelation: W <- (W W^T)^{-1/2} W keeps the rows orthonormal
        vals, vecs = np.linalg.eigh(W_new @ W_new.T)
        W_new = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W_new
        # Check for convergence: the rows stop changing direction
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if converged:
            break
    return W @ X.T                          # Recovered signals, shape (n_components, n_samples)

# Step 4: Example usage with synthetic data
np.random.seed(0)
t = np.linspace(0, 8, 1000)
S = np.array([np.sin(2 * t),                # Sinusoidal source
              np.sign(np.sin(3 * t))]).T    # Square-wave source at a different frequency
A = np.array([[1, 1], [0.5, 2]])            # Mixing matrix
X = S @ A.T                                 # Mixed signals

# Apply FastICA
S_estimated = fastica(X, n_components=2)

# Plot results
fig, axs = plt.subplots(3, 1, figsize=(8, 6))
axs[0].plot(S)
axs[0].set_title('Original Signals')
axs[1].plot(X)
axs[1].set_title('Mixed Signals')
axs[2].plot(S_estimated.T)
axs[2].set_title('Recovered Signals (FastICA)')
plt.tight_layout()
plt.show()
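For comparison, scikit-learn ships a FastICA implementation based on the same idea; a minimal sketch, assuming the mixed array X from the example above:
from sklearn.decomposition import FastICA

ica = FastICA(n_components=2, random_state=0)
S_sklearn = ica.fit_transform(X)   # Recovered sources, shape (n_samples, n_components)
A_estimated = ica.mixing_          # Estimated mixing matrix
As with any ICA method, the recovered components are only determined up to permutation and scaling of the original sources.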
Nonnegative Matrix Factorization (NMF): find a low-rank non-negative approximation to a matrix, i.e., factor \(X \approx WH\) with \(W \ge 0\) and \(H \ge 0\).
Advantages: the non-negativity constraint tends to produce parts-based, easily interpretable factors.
Reconstruction error: typically the squared Frobenius norm \(\|X - WH\|_F^2\), minimized subject to the non-negativity constraints.
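A minimal sketch of the classic multiplicative update rules for this objective (illustrative only; X_nonneg, nmf_multiplicative, and the chosen rank are placeholder names invented here):
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10):
    # Multiplicative updates for X ~= W @ H with W, H >= 0 (Frobenius objective)
    rng = np.random.default_rng(0)
    n, m = X.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # Update H; stays non-negative
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # Update W; stays non-negative
    return W, H

X_nonneg = np.abs(np.random.default_rng(1).normal(size=(50, 20)))  # Synthetic non-negative data
W, H = nmf_multiplicative(X_nonneg, rank=5)
error = np.linalg.norm(X_nonneg - W @ H, 'fro') ** 2                # Reconstruction error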
Related matrix factorization methods, differing mainly in the constraints they place on the factors:
Principal Component Analysis (orthogonal directions of maximal variance)
Independent Component Analysis (statistically independent, non-Gaussian components)
Nonnegative Matrix Factorization (non-negative factors)
Dictionary Learning (sparse coefficients over a learned dictionary)
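As a rough side-by-side sketch (X_demo is synthetic data invented here; assumes scikit-learn's standard estimators), all four methods expose the same fit/transform interface:
import numpy as np
from sklearn.decomposition import PCA, FastICA, NMF, DictionaryLearning

rng = np.random.default_rng(0)
X_demo = np.abs(rng.normal(size=(200, 10)))   # Non-negative synthetic data so NMF also applies

models = {
    'PCA': PCA(n_components=3),
    'ICA': FastICA(n_components=3, random_state=0),
    'NMF': NMF(n_components=3, init='random', random_state=0, max_iter=500),
    'Dictionary Learning': DictionaryLearning(n_components=3, random_state=0),
}

for name, model in models.items():
    Z = model.fit_transform(X_demo)           # Low-dimensional representation
    print(name, Z.shape)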