# | date | topic | description |
---|---|---|---|
1 | 22-Aug-2022 | Introduction | |
2 | 24-Aug-2022 | Foundations of learning | |
3 | 29-Aug-2022 | PAC learnability | |
4 | 31-Aug-2022 | Linear algebra (recap) | hw1 released |
| | 05-Sep-2022 | Holiday | |
5 | 07-Sep-2022 | Linear learning models | |
6 | 12-Sep-2022 | Principal Component Analysis | project ideas |
7 | 14-Sep-2022 | Curse of Dimensionality | hw1 due |
8 | 19-Sep-2022 | Bayesian Decision Theory | hw2 released |
9 | 21-Sep-2022 | Parameter estimation: MLE | |
10 | 26-Sep-2022 | Parameter estimation: MAP & NB | finalize teams |
11 | 28-Sep-2022 | Logistic Regression | |
12 | 03-Oct-2022 | Kernel Density Estimation | |
13 | 05-Oct-2022 | Support Vector Machines | hw3 released, hw2 due |
| | 10-Oct-2022 | Mid-point projects checkpoint | |
| | 12-Oct-2022 | Midterm: Semester Midpoint | exam |
14 | 17-Oct-2022 | Matrix Factorization | |
15 | 19-Oct-2022 | Stochastic Gradient Descent | |
16 | 24-Oct-2022 | k-means clustering | |
17 | 26-Oct-2022 | Expectation Maximization | hw4 released, hw3 due |
18 | 31-Oct-2022 | Automatic Differentiation | |
19 | 02-Nov-2022 | Nonlinear embedding approaches | |
20 | 07-Nov-2022 | Model comparison I | |
21 | 09-Nov-2022 | Model comparison II | hw5 released, hw4 due |
22 | 14-Nov-2022 | Model Calibration | |
23 | 16-Nov-2022 | Convolutional Neural Networks | |
| | 21-Nov-2022 | Fall break | |
| | 23-Nov-2022 | Fall break | |
24 | 28-Nov-2022 | Word Embedding | hw5 due |
| | 30-Nov-2022 | Presentation and exam prep day | |
| | 02-Dec-2022 | Project Final Presentations | |
| | 07-Dec-2022 | Project Final Presentations | |
| | 12-Dec-2022 | Final Exam | |
| | 15-Dec-2022 | Grades due | |
Everyone believes in the Gaussian law of errors. The experimentalists think it is a mathematical theorem, and the mathematicians think it is an established experimental fact.
-Henri Poincaré
Consider a linear transform, ${\bf P}_{\cal B}$, and its inverse, ${\bf P}^{-1}_{\cal B}$, which map a vector back and forth between its representation in the standard basis and its representation in the basis, ${\cal B}$:
$$ \begin{array}{ccc} & {\bf P}_{\cal B} & \\ \vec{u} & \stackrel{\longrightarrow}{\longleftarrow} & \left[\vec{u}\right]_{\cal B}\\ & {\bf P}^{-1}_{\cal B} & \end{array} $$
Let ${\cal B}$ consist of $N$ basis vectors, $\vec{b}_1 \dots \vec{b}_N$. Since $\left[\vec{u}\right]_{\cal B}$ is the representation of $\vec{u}$ in ${\cal B}$, it follows that\[ \vec{u} = \left(\left[\vec{u}\right]_{\cal B}\right)_1 \vec{b}_1 + \left(\left[\vec{u}\right]_{\cal B}\right)_2 \vec{b}_2 + \dots + \left(\left[\vec{u}\right]_{\cal B}\right)_N \vec{b}_N. \]
But this is just the matrix-vector product $\vec{u} = {\bf B}\left[\vec{u}\right]_{\cal B}$, where ${\bf B}$ is the matrix whose columns are the basis vectors $\vec{b}_1 \dots \vec{b}_N$. Consequently,
\[ \begin{array}{lcl } {\bf P}_{\cal B} &=& {\bf B}^{-1}\\ {\bf P}^{-1}_{\cal B} &=& {\bf B} \end{array} \]
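A minimal NumPy sketch of this round trip; the basis ${\bf B}$ and the vector below are made up purely for illustration:

```python
import numpy as np

# Hypothetical basis B for R^2: the columns are the basis vectors b_1, b_2.
B = np.array([[1.0, 1.0],
              [0.0, 2.0]])

u = np.array([3.0, 4.0])      # a vector in the standard basis

P_B = np.linalg.inv(B)        # P_B = B^{-1}: standard basis -> coordinates in B
u_B = P_B @ u                 # [u]_B
u_back = B @ u_B              # P_B^{-1} = B: coordinates in B -> standard basis

assert np.allclose(u, u_back)
print(u_B)                    # coordinates of u with respect to b_1 and b_2
```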
Consider a linear transformation represented in the standard basis by the matrix ${\bf A}$. We seek $\left[{\bf A}\right]_{\cal B}$, i.e., the representation of the corresponding linear transformation in the basis ${\cal B}$.
Let ${\bf A}$ be a representation of a transform in the standard basis and let the columns of ${\bf X}$ be the orthonormal eigenvectors of the symmetric matrix ${\bf A}$. Then ${\bf X}$ and ${\bf X}^{\rm T}={\bf X}^{-1}$ take us back and forth between the standard basis and ${\cal X}$:\[ \begin{array}{ccc} & {\bf X}^{\rm T} & \\ {\bf u} & \stackrel{\longrightarrow}{\longleftarrow} & \left[{\bf u}\right]_{\cal X}\\ & {\bf X} & \end{array} \]
The matrix we seek maps $\left[{\bf u}\right]_{\cal X}$ into $\left[{\bf A}{\bf u}\right]_{\cal X}$:\[ \begin{array}{ccc} {\bf u} & \stackrel{{\bf A}}{\longrightarrow} & {\bf A}{\bf u}\\ \uparrow {\bf X} & & \downarrow {\bf X}^{\rm T}\\ \left[{\bf u}\right]_{\cal X} & \stackrel{\left[{\bf A}\right]_{\cal X}\;\;\;}{\longrightarrow} & \left[{\bf A}{\bf u}\right]_{\cal X} \end{array} \] Following the diagram around the square, $\left[{\bf A}{\bf u}\right]_{\cal X} = {\bf X}^{\rm T}{\bf A}{\bf X}\left[{\bf u}\right]_{\cal X}$, so $\left[{\bf A}\right]_{\cal X} = {\bf X}^{\rm T}{\bf A}{\bf X}$.
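A short NumPy check of this change of basis, using a small symmetric matrix chosen only for illustration (for a symmetric input, `np.linalg.eigh` returns orthonormal eigenvectors as the columns of ${\bf X}$):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])       # a symmetric transform in the standard basis

lam, X = np.linalg.eigh(A)       # columns of X: orthonormal eigenvectors of A

u = np.array([1.0, -2.0])
u_X = X.T @ u                    # [u]_X
Au_X = X.T @ (A @ u)             # [Au]_X

A_X = X.T @ A @ X                # [A]_X: the same transform, expressed in X
assert np.allclose(A_X @ u_X, Au_X)
print(np.round(A_X, 6))          # diagonal, with the eigenvalues of A on the diagonal
```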
Any symmetric $N \times N$ matrix, ${\bf A}$, with $N$ distinct eigenvalues, can be factored as follows: \[ {\bf A} = {\bf X} \Lambda {\bf X}^{\rm T} \] where $\Lambda$ is $N \times N$ and diagonal, ${\bf X}$ and ${\bf X}^{\rm T}$ are $N \times N$ matrices, and the $i$-th column of ${\bf X}$ (equal to the $i$-th row of ${\bf X}^{\rm T}$) is an eigenvector of ${\bf A}$: \[ \lambda_i {\bf x}_i = {\bf A}{\bf x}_i \] with eigenvalue $\Lambda_{ii} = \lambda_i$.
Note that the eigenvectors can be chosen to be orthonormal, i.e., ${\bf x}_i$ is orthogonal to ${\bf x}_j$ when $i \neq j$ and each has unit length: \[ \left({\bf X}^{\rm T}{\bf X}\right)_{ij} = {\bf x}_i^{\rm T}{\bf x}_j = \delta_{ij} = \left\{\begin{array}{ll}1 & {\rm if}\;\;i=j\\ 0 & {\rm otherwise.}\end{array} \right. \] In other words, ${\bf X}^{\rm T}{\bf X} = {\bf I}$. Consequently, \[ {\bf X}^{\rm T} = {\bf X}^{-1}. \]
Let's rewrite ${\bf A} = {\bf X} \Lambda {\bf X}^{\rm T}$ using the definition of matrix product and the fact that $\Lambda$ is diagonal:\begin{eqnarray*} \left({\bf A}\right)_{ij} & = & \sum_{k=1}^N \left({\bf X}\right)_{ik} \Lambda_{kk} \left({\bf X}^{\rm T}\right)_{kj} = \sum_{k=1}^N \lambda_k \left({\bf x}_k\right)_i \left({\bf x}_k\right)_j. \end{eqnarray*} In other words, ${\bf A} = \sum_{k=1}^N \lambda_k\, {\bf x}_k {\bf x}_k^{\rm T}$ is a weighted sum of rank-one outer products of the eigenvectors.
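A numerical sketch of this factorization, with an arbitrary symmetric matrix chosen for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])              # symmetric

lam, X = np.linalg.eigh(A)                   # A = X diag(lam) X^T

assert np.allclose(X.T @ X, np.eye(3))       # X^T = X^{-1}
assert np.allclose(X @ np.diag(lam) @ X.T, A)

# The element-wise sum above is equivalent to A = sum_k lambda_k x_k x_k^T.
A_rebuilt = sum(lam[k] * np.outer(X[:, k], X[:, k]) for k in range(3))
assert np.allclose(A, A_rebuilt)
```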
Consider an $N \times K$ matrix, ${\bf X}$, whose $n$-th row is the $n$-th sample of a Gaussian distributed vector random variable. For example, when $K=2$ and ${\bf x} = \left[x, y\right]^{\rm T}$: \[ {\bf X} = \left[ \begin{array}{cc} x_1 & y_1\\ x_2 & y_2\\ \vdots & \vdots\\ x_N & y_N \end{array} \right] \]
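Such a data matrix can be simulated directly; a small sketch with a made-up covariance matrix ${\bf C}$:

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 1000, 2
C = np.array([[3.0, 1.2],
              [1.2, 1.0]])       # hypothetical covariance of [x, y]^T

# Each row of X is one sample of the zero-mean Gaussian random vector [x, y]^T.
X = rng.multivariate_normal(mean=np.zeros(K), cov=C, size=N)
print(X.shape)                   # (N, K)
```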
The level surfaces of the multivariate Gaussian distribution, $G({\bf x})$, consist of those points where $G({\bf x})$ has constant value:\[ G({\bf x}) = \frac{1}{(2\pi)^{K/2} |{\bf C}|^{1/2}} e^{-\frac{1}{2}{\bf x}^{\;{\rm T}} {\bf C}^{-1} {\bf x}} = P \]
The principal axes of the ellipsoids forming the level surfaces of the multivariate Gaussian distribution are given by the eigenvectors of its covariance matrix: \[ \lambda_k {\bf w}_k = {\bf C} {\bf w}_k \] where $|\lambda_1| > |\lambda_2| > \cdots > |\lambda_K|$.
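A sketch of recovering the principal axes from samples: estimate the covariance from a data matrix (here simulated with the same made-up ${\bf C}$ as above) and take its eigenvectors, sorted by decreasing eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
C_true = np.array([[3.0, 1.2],
                   [1.2, 1.0]])
X = rng.multivariate_normal(np.zeros(2), C_true, size=5000)   # N x K data matrix

C = np.cov(X, rowvar=False)          # sample covariance (K x K)
lam, W = np.linalg.eigh(C)           # lam ascending; columns of W are the w_k

order = np.argsort(lam)[::-1]        # sort so |lambda_1| > |lambda_2| > ...
lam, W = lam[order], W[:, order]

print(np.round(lam, 2))              # variances along the principal axes
print(np.round(W, 3))                # each column is a principal axis
```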
Question: What is the distribution of ${\bf u} = {\bf W}^{\rm T}{\bf x}$, where the columns of ${\bf W}$ are the eigenvectors ${\bf w}_1, \dots, {\bf w}_K$ of ${\bf C}$?
Answer: It is the multivariate Gaussian with covariance matrix ${\bf D} = {\bf W}^{\rm T}{\bf C}\,{\bf W}$, which is diagonal with entries $D_{kk} = \lambda_k$: \[ \hat{G}({\bf u}) = \frac{1}{(2\pi)^{K/2} |{\bf D}|^{1/2}} e^{-\frac{1}{2}{\bf u}^{\;{\rm T}} {\bf D}^{-1} {\bf u}}. \]
Because ${\bf D}$ is diagonal, the joint distribution of $u$ and $v$ can be expressed as the product of the distribution of $u$ and the distribution of $v$; we say that $u$ and $v$ are independent.
Knowing the value of $u$ tells you nothing about the value of $v$!
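A numerical illustration of this decorrelation, assuming (as above) that ${\bf u} = {\bf W}^{\rm T}{\bf x}$ with the columns of ${\bf W}$ the eigenvectors of ${\bf C}$: the sample covariance of the transformed data is approximately the diagonal matrix ${\bf D}$, and for a Gaussian, uncorrelated components are independent.

```python
import numpy as np

rng = np.random.default_rng(0)
C_true = np.array([[3.0, 1.2],
                   [1.2, 1.0]])
X = rng.multivariate_normal(np.zeros(2), C_true, size=100_000)

C = np.cov(X, rowvar=False)
lam, W = np.linalg.eigh(C)           # columns of W: eigenvectors of C

U = X @ W                            # each row is u = W^T x
D = np.cov(U, rowvar=False)

print(np.round(D, 3))                # approximately diag(lambda_1, lambda_2)
print(np.round(np.corrcoef(U[:, 0], U[:, 1])[0, 1], 4))   # ~ 0: u and v uncorrelated
```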