| # | date | topic | description |
|---|---|---|---|
| 1 | 25-Aug-2025 | Introduction | |
| 2 | 27-Aug-2025 | Foundations of learning | Drop/Add |
| 3 | 01-Sep-2025 | Labor Day Holiday | Holiday |
| 4 | 03-Sep-2025 | Linear algebra (self-recap) | HW1 |
| 5 | 08-Sep-2025 | PAC learnability | |
| 6 | 10-Sep-2025 | Linear learning models | |
| 7 | 15-Sep-2025 | Principal Component Analysis | Project ideas |
| 8 | 17-Sep-2025 | Curse of Dimensionality | |
| 9 | 22-Sep-2025 | Bayesian Decision Theory | HW2, HW1 due |
| 10 | 24-Sep-2025 | Parameter estimation: MLE | |
| 11 | 29-Sep-2025 | Parameter estimation: MAP & NB | finalize teams |
| 12 | 01-Oct-2025 | Logistic Regression | |
| 13 | 06-Oct-2025 | Kernel Density Estimation | |
| 14 | 08-Oct-2025 | Support Vector Machines | HW3, HW2 due |
| 15 | 13-Oct-2025 | * Midterm | Exam |
| 16 | 15-Oct-2025 | Matrix Factorization | |
| 17 | 20-Oct-2025 | * Mid-point projects checkpoint | * |
| 18 | 22-Oct-2025 | k-means clustering | |
| 19 | 27-Oct-2025 | Expectation Maximization | |
| 20 | 29-Oct-2025 | Stochastic Gradient Descent | HW4, HW3 due |
| 21 | 03-Nov-2025 | Automatic Differentiation | |
| 22 | 05-Nov-2025 | Nonlinear embedding approaches | |
| 23 | 10-Nov-2025 | Model comparison I | |
| 24 | 12-Nov-2025 | Model comparison II | HW5, HW4 due |
| 25 | 17-Nov-2025 | Model Calibration | |
| 26 | 19-Nov-2025 | Convolutional Neural Networks | |
| 27 | 24-Nov-2025 | Thanksgiving Break | Holiday |
| 28 | 26-Nov-2025 | Thanksgiving Break | Holiday |
| 29 | 01-Dec-2025 | Word Embedding | |
| 30 | 03-Dec-2025 | * Project Final Presentations | HW5 due, P |
| 31 | 08-Dec-2025 | Extra prep day | Classes End |
| 32 | 10-Dec-2025 | * Final Exam | Exam |
| 34 | 17-Dec-2025 | Project Reports | due |
| 35 | 19-Dec-2025 | Grades due | 5 p.m. |
If we know the posteriors exactly, this is the optimal strategy!
We have assumed that either the densities were fully known, or that they had a known parametric form whose parameters we could estimate (MLE, MAP)
What if all we have is the data itself?
Ooh! How ~~challenging~~ exciting!
The simplest form of non-parametric density estimation is the histogram:
$$\prob{P}{\vec{x}} = \frac{1}{N}\frac{\text{# of } \vec{x}^i \text{ in the same bin as }\vec{x}}{\text{bin width}}$$
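A minimal sketch of this estimator in Python (the function name `histogram_density`, the bin origin, and the toy Gaussian data are illustrative choices, not from the lecture):

```python
import numpy as np

def histogram_density(x, data, bin_width=0.5, origin=0.0):
    """Histogram estimate of P(x): fraction of samples in x's bin, divided by the bin width."""
    data = np.asarray(data)
    N = len(data)
    # Bins are [origin + i*bin_width, origin + (i+1)*bin_width); find the bin containing x.
    bin_idx = np.floor((x - origin) / bin_width)
    in_same_bin = np.floor((data - origin) / bin_width) == bin_idx
    return in_same_bin.sum() / (N * bin_width)

# Toy check on N(0, 1) samples: the true density at x = 0 is 1/sqrt(2*pi) ≈ 0.399.
rng = np.random.default_rng(0)
samples = rng.standard_normal(10_000)
print(histogram_density(0.0, samples, bin_width=0.2))  # roughly 0.40
```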
What are we trying to accomplish?
We obtain a more accurate estimate by increasing $N$ and shrinking $V$
\[ \prob{P}{\vec{x}} \simeq \frac{k/N}{V} \text{, where } \begin{cases} V & \text{volume surrounding } \vec{x} \\ N & \text{total # of examples}\\ k & \text{# of examples inside } V \end{cases} \]
For the estimate to converge to the true density, we need
\begin{align} &\underset{N\to\infty}{\lim} V = 0\\ &\underset{N\to\infty}{\lim} k = \infty\\ &\underset{N\to\infty}{\lim} k/N = 0 \end{align}
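A small simulation of these conditions, assuming a standard normal target density, evaluation at $x = 0$, and a shrinking schedule $V \propto N^{-1/2}$ (all choices mine): as $N$ grows, $(k/N)/V$ approaches the true density.

```python
import numpy as np

rng = np.random.default_rng(1)
true_p = 1.0 / np.sqrt(2 * np.pi)   # true N(0, 1) density at x = 0, ≈ 0.399

# Shrink V as N grows (V ∝ N^(-1/2)) so that V -> 0 while k -> infinity and k/N -> 0.
for N in (100, 10_000, 1_000_000):
    samples = rng.standard_normal(N)
    V = N ** -0.5                          # width of the interval centered at x = 0
    k = np.sum(np.abs(samples) < V / 2)    # number of samples falling inside that interval
    print(N, (k / N) / V, true_p)
```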
- Fix $V$ and estimate $k$ - kernel density estimation (KDE)
- Fix $k$ and estimate $V$ - k-nearest neighbor (kNN); see the sketch below
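A minimal 1-D sketch of the second option, assuming Euclidean distance and toy Gaussian data (the function `knn_density` and its defaults are illustrative):

```python
import numpy as np

def knn_density(x, data, k=10):
    """1-D k-NN density estimate: fix k, take V as the smallest interval around x
    that contains k samples, and return P(x) ≈ (k/N) / V."""
    data = np.asarray(data)
    N = len(data)
    dists = np.sort(np.abs(data - x))
    radius = dists[k - 1]        # distance to the k-th nearest neighbor
    V = 2 * radius               # interval "volume" in 1-D
    return (k / N) / V

rng = np.random.default_rng(2)
samples = rng.standard_normal(5_000)
print(knn_density(0.0, samples, k=50))   # roughly 0.40; the true N(0, 1) density at 0 is ≈ 0.399
```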
Note that the Parzen window resembles a histogram, but with the bin locations determined by the data
\[ \prob{P_{KDE}}{\vec{x}} = \frac{1}{Nh^d} \sum_{n=1}^N \prob{K}{\frac{\vec{x} - \vec{x}^n}{h}} \]
\[ \prob{K}{\vec{u}} = \begin{cases} 1 & |u_j| \lt \frac{1}{2} \;\; \forall j = 1,\dots,d\\ 0 & \text{otherwise} \end{cases} \]
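A minimal sketch of this Parzen-window estimator with the hypercube kernel above, assuming NumPy and toy 2-D Gaussian data (function names are illustrative):

```python
import numpy as np

def hypercube_kernel(u):
    """K(u) = 1 if |u_j| < 1/2 for every coordinate j, else 0."""
    return np.all(np.abs(u) < 0.5, axis=-1).astype(float)

def parzen_kde(x, data, h=0.5):
    """P_KDE(x) = 1/(N h^d) * sum_n K((x - x_n) / h)."""
    data = np.atleast_2d(data)   # shape (N, d)
    x = np.atleast_1d(x)         # shape (d,)
    N, d = data.shape
    return hypercube_kernel((x - data) / h).sum() / (N * h ** d)

# Toy check: the 2-D standard normal density at the origin is 1/(2*pi) ≈ 0.159.
rng = np.random.default_rng(3)
samples = rng.standard_normal((2_000, 2))
print(parzen_kde(np.zeros(2), samples, h=0.4))   # roughly 0.16
```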
Subjective choice
Assuming everything is Normal
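One common version of this idea is the normal-reference (Silverman) rule of thumb for the bandwidth $h$; a minimal 1-D sketch, assuming the data really are Gaussian (that this is the exact rule intended here is my assumption):

```python
import numpy as np

def silverman_bandwidth(data):
    """Normal-reference rule of thumb (1-D): h = 1.06 * sigma_hat * N^(-1/5)."""
    data = np.asarray(data)
    return 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)

rng = np.random.default_rng(4)
samples = rng.standard_normal(1_000)
print(silverman_bandwidth(samples))   # ≈ 0.27 for 1000 standard-normal samples
```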
Things to watch out for
A couple of "hacks" to fix this
Use it if "hacky" solutions are not your thing