| # | date | topic | description |
|---|------|-------|-------------|
| 1 | 22-Aug-2022 | Introduction | |
| 2 | 24-Aug-2022 | Foundations of learning | |
| 3 | 29-Aug-2022 | PAC learnability | |
| 4 | 31-Aug-2022 | Linear algebra (recap) | hw1 released |
| | 05-Sep-2022 | Holiday | |
| 5 | 07-Sep-2022 | Linear learning models | |
| 6 | 12-Sep-2022 | Principal Component Analysis | project ideas |
| 7 | 14-Sep-2022 | Curse of Dimensionality | hw1 due |
| 8 | 19-Sep-2022 | Bayesian Decision Theory | hw2 released |
| 9 | 21-Sep-2022 | Parameter estimation: MLE | |
| 10 | 26-Sep-2022 | Parameter estimation: MAP & NB | finalize teams |
| 11 | 28-Sep-2022 | Logistic Regression | |
| 12 | 03-Oct-2022 | Kernel Density Estimation | |
| 13 | 05-Oct-2022 | Support Vector Machines | hw3 released, hw2 due |
| | 10-Oct-2022 | **Mid-point project checkpoint** | |
| | 12-Oct-2022 | **Midterm: Semester Midpoint** | exam |
| 14 | 17-Oct-2022 | Matrix Factorization | |
| 15 | 19-Oct-2022 | Stochastic Gradient Descent | |
| 16 | 24-Oct-2022 | k-means clustering | |
| 17 | 26-Oct-2022 | Expectation Maximization | hw4 released, hw3 due |
| 18 | 31-Oct-2022 | Automatic Differentiation | |
| 19 | 02-Nov-2022 | Nonlinear embedding approaches | |
| 20 | 07-Nov-2022 | Model comparison I | |
| 21 | 09-Nov-2022 | Model comparison II | hw5 released, hw4 due |
| 22 | 14-Nov-2022 | Model Calibration | |
| 23 | 16-Nov-2022 | Convolutional Neural Networks | |
| | 21-Nov-2022 | Fall break | |
| | 23-Nov-2022 | Fall break | |
| 24 | 28-Nov-2022 | Word Embedding | hw5 due |
| | 30-Nov-2022 | Presentation and exam prep day | |
| | 02-Dec-2022 | **Project Final Presentations** | |
| | 07-Dec-2022 | **Project Final Presentations** | |
| | 12-Dec-2022 | **Final Exam** | |
| | 15-Dec-2022 | Grades due | |
Independent random variables: \begin{align} \prob{P}{X,Y} &= \prob{P}{X}\prob{P}{Y}\\ \prob{P}{X|Y} &= \prob{P}{X} \end{align}
Conditionally independent:
$$\prob{P}{X,Y|Z} = \prob{P}{X|Z}\prob{P}{Y|Z}$$
Knowing $Z$ makes $X$ and $Y$ independent.
- Examples:
    - Dependent: shoe size and reading skills in kids
    - Conditionally independent: shoe size and reading skills given age (see the simulation sketch below)
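The shoe-size example is easy to simulate. Below is a minimal NumPy sketch (the generative model, coefficients, and seed are all invented for illustration): age drives both shoe size and reading skill, so the two are strongly correlated overall but nearly uncorrelated once age is fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generative model: age (the confounder Z) drives both
# shoe size (X) and reading skill (Y); X and Y share no direct link.
age = rng.integers(6, 13, size=10_000)                 # kids aged 6-12
shoe = 0.8 * age + rng.normal(0, 0.5, size=age.size)   # X depends only on Z
read = 10.0 * age + rng.normal(0, 5.0, size=age.size)  # Y depends only on Z

# Marginally, X and Y look strongly dependent...
print(np.corrcoef(shoe, read)[0, 1])   # ~0.9: "shoe size predicts reading"

# ...but conditioning on Z removes the dependence.
mask = age == 9
print(np.corrcoef(shoe[mask], read[mask])[0, 1])   # ~0: independent given age
```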
Storks deliver babies: a highly statistically significant correlation ($p=0.008$) exists between stork populations and human birth rates across Europe.
London taxi drivers: a survey found a positive and significant correlation between the number of accidents and the wearing of coats. It concluded that coats could hinder drivers' movements and cause accidents, and a new law was drafted to prohibit drivers from wearing coats when driving. (The confounder, of course, is rain: bad weather makes drivers wear coats and also makes accidents more likely.)
I have a coin; if I flip it, what's the probability it will land heads up?
Suppose I flip it 5 times and observe 3 heads. The estimated probability is $\frac{3}{5}$: the "frequency of heads".
Data: $D = \{x_i\}_{i=1}^n$, $x_i \in \{\text{H}, \text{T}\}$, with $\prob{P}{\text{Heads}} = \theta$ and $\prob{P}{\text{Tails}} = 1-\theta$.
Flips are i.i.d.:
- Independent events
- Identically distributed according to a Bernoulli distribution
MLE: Choose $\theta$ that maximizes the probability of the observed data:
$$J(\theta) = \prob{P}{D|\theta} = \theta^{\alpha_H} (1-\theta)^{\alpha_T},$$
where $\alpha_H$ and $\alpha_T$ count the heads and tails in $D$. Setting the derivative of $\log J(\theta)$ to zero gives
$$\hat{\theta}_{MLE} = \frac{\alpha_H}{\alpha_H + \alpha_T}$$
That's exactly the "frequency of heads": with 3 heads in 5 flips, $\hat{\theta}_{MLE} = \frac{3}{5}$.
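A quick numerical check of the coin-flip MLE. This is a toy sketch: `theta_true`, the seed, and the sample size are chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy experiment: n i.i.d. Bernoulli(theta_true) coin flips (True = heads).
theta_true = 0.6
flips = rng.random(5) < theta_true

alpha_H = int(np.sum(flips))      # number of heads
alpha_T = flips.size - alpha_H    # number of tails

# The MLE maximizes J(theta) = theta^alpha_H * (1 - theta)^alpha_T;
# its closed form is the frequency of heads.
theta_mle = alpha_H / (alpha_H + alpha_T)
print(theta_mle)                  # 3/5 = 0.6 if we saw 3 heads in 5 flips
```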
Which estimator should we trust more?
Hoeffding's inequality bounds the probability that the estimate $\hat{\theta}$ from $n$ flips deviates from the true parameter $\theta^*$ by more than $\epsilon$:
\begin{align} \prob{P}{|\hat{\theta} - \theta^*| \ge \epsilon} \le 2e^{-2n\epsilon^2} \end{align}
The bound tightens exponentially with $n$: the same estimate $\frac{3}{5}$ deserves far more trust when it comes from many flips than from few.
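Hoeffding's bound can also be inverted to answer "how many flips do I need?": requiring $2e^{-2n\epsilon^2} \le \delta$ gives $n \ge \frac{\ln(2/\delta)}{2\epsilon^2}$. A small sketch (the helper name `hoeffding_n` is ours, not a library function):

```python
import math

def hoeffding_n(eps: float, delta: float) -> int:
    """Smallest n such that 2 * exp(-2 * n * eps**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))

# Flips needed to pin theta down to within 0.1 with 95% confidence:
print(hoeffding_n(eps=0.1, delta=0.05))   # 185
```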
Machine Learning is the study of algorithms that
- improve their performance
- at some task
- with experience
Let us try Gaussians:
\begin{align} \prob{p}{x|\mu,\sigma} &= \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} = {\cal N}_x(\mu, \sigma) \end{align}
Maximizing the log-likelihood over $\mu$ and $\sigma^2$ gives
\begin{align} \hat{\mu}_{MLE} &= \frac{1}{n} \displaystyle\sum_{i=1}^n x_i\\ \hat{\sigma}^2_{MLE} &= \frac{1}{n} \displaystyle\sum_{i=1}^n (x_i - \hat{\mu}_{MLE})^2 \end{align}
MLE for $\sigma^2$ of a Gaussian is biased: the expected value of the estimator is not the true parameter! In fact $\EE\left[\hat{\sigma}^2_{MLE}\right] = \frac{n-1}{n}\sigma^2$, which motivates the corrected estimator $$\hat{\sigma}^2_{unbiased} = \frac{1}{n-1} \displaystyle\sum_{i=1}^n (x_i - \hat{\mu}_{MLE})^2$$
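The bias is easy to see empirically. A minimal simulation sketch (true parameters, sample size, and seed are arbitrary): averaged over many repeated small samples, the $\frac{1}{n}$ estimator comes out low by a factor of $\frac{n-1}{n}$, while the $\frac{1}{n-1}$ version matches the true $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2, n = 0.0, 4.0, 5   # true parameters; deliberately small sample

# 100,000 repeated experiments of n Gaussian draws each.
x = rng.normal(mu, np.sqrt(sigma2), size=(100_000, n))

mle = x.var(axis=1, ddof=0)   # divides by n   (the MLE, biased)
unb = x.var(axis=1, ddof=1)   # divides by n-1 (bias-corrected)

print(mle.mean())             # ~ (n-1)/n * sigma2 = 3.2
print(unb.mean())             # ~ 4.0
```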
The only function that satisfies these requirements is the logarithm: \[ \ell \log(s) = \log(s^\ell) \]
Let $X$ be a discrete random variable with $n$ outcomes, $\{x_1,\ldots,x_n\}$. The probability that the outcome will be $x_i$ is $p(x_i)$. The average information (or entropy) contained in a message about the outcome of $X$ is:
\[ H_p = -\sum_{i=1}^n p_X(x_i) \log p_X(x_i) \]
The cross-entropy of $q$ relative to $p$ is: \[ H_{p,q} = -\sum_{i=1}^n p_X(x_i) \log q_X(x_i) \]
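Both quantities are one-liners to compute for a discrete distribution. A small NumPy sketch (natural log, so the results are in nats; `p` and `q` are made-up example distributions):

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_i p_i log p_i, with 0 * log 0 treated as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log q_i."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(q[nz]))

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(entropy(p))           # ~1.040 nats
print(cross_entropy(p, q))  # ~1.099 nats; always >= entropy(p)
```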
The Kullback–Leibler (KL) divergence between distributions $P$ and $Q$ is: \[ D_{\rm KL} (P\|Q) = \int P(x) \log \frac{P(x)}{Q(x)} \, dx \] or, written as an expectation: \[ D_{\rm KL} (P\|Q) = \EE_{X\sim P} \left[ \log \frac{P(x)}{Q(x)} \right] \]
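For discrete distributions the integral becomes a sum, which connects the three quantities: $D_{\rm KL}(P\|Q) = H_{p,q} - H_p$. A quick check, reusing the toy `p` and `q` from above (`scipy.stats.entropy` returns exactly this KL divergence when given two distributions):

```python
import numpy as np
from scipy.stats import entropy as scipy_entropy

p = np.array([0.5, 0.25, 0.25])
q = np.array([1/3, 1/3, 1/3])

# D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.
dkl = np.sum(p * np.log(p / q))
print(dkl)                  # ~0.0589 = cross_entropy(p, q) - entropy(p)

print(scipy_entropy(p, q))  # scipy agrees
```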