1. Mathematical Foundations -- Refresher
- Linear Algebra
- Applied Probability
- Differential Equations and Calculus
- Optimization Theory

1. Foundations of Learning
- Formal Learning Model
- Generalization and Overfitting
- Empirical Risk Minimization (ERM) -- stated formally below
- ERM with Inductive Bias
- Bounding the probability of error -- confidence and accuracy
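
For reference, the ERM rule named above can be stated in one line; this is the standard formulation (notation mine, and it may differ from the course slides):

```latex
L_S(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\!\left[ h(x_i) \neq y_i \right],
\qquad
h_{\mathrm{ERM}} \in \operatorname*{arg\,min}_{h \in \mathcal{H}} L_S(h)
```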

2. PAC Learning
- PAC Learning Framework
- VC Dimension
- Sample Complexity -- a reference bound is stated below
- Learnability Conditions
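
As a reference point for the sample-complexity bullet above, the standard realizable-case PAC bound for a finite hypothesis class (a standard result; the lecture's notation may differ):

```latex
m \;\ge\; \frac{1}{\epsilon} \left( \ln\lvert\mathcal{H}\rvert + \ln\frac{1}{\delta} \right)
\;\;\Longrightarrow\;\;
\Pr\!\left[ L_{\mathcal{D}}(h_{\mathrm{ERM}}) \le \epsilon \right] \ge 1 - \delta
```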

3. Linear Learning Models
- Linear decision boundary -- binary classifier
- Perceptron algorithm -- batch & stochastic (sketch below)
- Proof of convergence
- Inseparable case
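
A minimal sketch of the stochastic perceptron update (NumPy; labels in {-1, +1}; function and variable names are mine, not the lecture's):

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Stochastic perceptron; X is (n, d), labels y in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi             # rotate w toward the example
                b += yi
                mistakes += 1
        if mistakes == 0:                # separable case: converged
            break
    return w, b
```

In the inseparable case the inner loop never reaches zero mistakes, which is why the epoch cap (and, in lecture, the convergence proof's margin assumption) matters.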

4. Principal Component Analysis
- Linear Regression (LR), Least Mean Squares (LMS)
- Spectral theorem, similarity transform, eigenvectors, diagonalization, spectral factorization
- PCA -- quadratic forms, multivariate Gaussian density, isodensity surfaces, Principal Axes Theorem
- Eigenvectors and Eigenvalues
- Covariance Matrix -- diagonalizing the covariance matrix (sketch below), KL-transform, e.g. the bivariate case
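
A compact sketch of PCA as diagonalization of the sample covariance matrix, tying together the bullets above (NumPy; names are mine):

```python
import numpy as np

def pca(X, k):
    """Project X (n, d) onto its top-k principal axes."""
    Xc = X - X.mean(axis=0)            # center the data
    C = np.cov(Xc, rowvar=False)       # (d, d) sample covariance
    vals, vecs = np.linalg.eigh(C)     # eigh: C is symmetric
    order = np.argsort(vals)[::-1]     # eigenvalues in descending order
    W = vecs[:, order[:k]]             # top-k eigenvectors = principal axes
    return Xc @ W, vals[order]         # projected data, full spectrum
```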

5. Curse of Dimensionality
- The curse of dimensionality
- Volume in high-dimensional space -- Stirling approximation, hypersphere volume (sketch below)
- Example: Gaussian distributions in high-dimensional space
- Notes: CMU, Princeton
- Books: Bishop Ch. 1 (pp. 33-37), Hastie Ch. 2 (pp. 22-23), Hamming Ch. 9 (pp. 58-66)
- Slides: Lecture Slides
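
A quick illustration of the hypersphere-volume point above: the unit ball fills a vanishing fraction of its bounding cube as the dimension grows (here via the exact Gamma function rather than its Stirling approximation):

```python
from math import gamma, pi

def unit_ball_volume(d):
    """Volume of the unit d-ball: pi^(d/2) / Gamma(d/2 + 1)."""
    return pi ** (d / 2) / gamma(d / 2 + 1)

for d in (1, 2, 3, 10, 20, 50):
    # fraction of the enclosing cube [-1, 1]^d filled by the unit ball
    print(d, unit_ball_volume(d) / 2 ** d)
```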

6. Bayesian Decision Theory
- Review of probability distributions, random variables, joint and marginal probabilities
- Bayes Theorem, prior and posterior distributions (stated below)
- Decision Rules -- Continuous and Discrete Features
- Bayesian vs Frequentist Approaches
- General Bayesian Decision Theory
- Maximum A Posteriori (MAP) -- conjugate priors
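
For reference, Bayes' theorem and the resulting two-class decision rule (standard statement; notation may differ from the slides):

```latex
P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)},
\qquad
\text{decide } \omega_1 \text{ iff } P(\omega_1 \mid x) > P(\omega_2 \mid x)
```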

7. Parameter Estimation -- MLE
- Maximum Likelihood Estimation (MLE) -- conditional independence, MLE for Bernoulli and Gaussian distributions (sketch below), sample complexity & PAC learning
- MLE and KL-divergence -- Hartley's Information and Shannon's entropy, cross-entropy, KL-divergence minimization
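
The Bernoulli and Gaussian MLEs above have closed forms: the sample frequency and the sample mean/variance, obtained by setting the gradient of the log-likelihood to zero. A sketch on synthetic data (NumPy; the true parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Bernoulli: the MLE of theta is the sample frequency of ones
coin = rng.binomial(1, 0.3, size=500)
theta_hat = coin.mean()

# Gaussian: MLEs are the sample mean and the (biased, 1/n) sample variance
z = rng.normal(loc=2.0, scale=1.5, size=500)
mu_hat, var_hat = z.mean(), z.var()    # np.var divides by n by default
```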

7. Parameter Estimation -- MAP & NB
- Maximum A Posteriori (MAP) Estimation
- MLE vs. MAP
- MAP for binomial and multinomial distributions
- Bayes rule -- AIDS test example (worked below with illustrative numbers)
- Naive Bayes classifier -- continuous and discrete features, text classification example
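
A worked version of the Bayes-rule test example above. The lecture's actual numbers aren't given here, so the prevalence and test accuracies below are illustrative assumptions only:

```python
prior = 0.001          # assumed disease prevalence (illustrative)
sensitivity = 0.99     # P(positive | infected), assumed
specificity = 0.99     # P(negative | healthy), assumed

# total probability of a positive test
p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)
# Bayes rule: P(infected | positive)
posterior = sensitivity * prior / p_pos
print(posterior)       # ~0.09: a positive test is still probably a false alarm
```

The punchline is the base-rate effect: with a rare condition, even a 99%-accurate test yields a posterior under 10%.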

8. Logistic Regression
- Naive Bayes recap -- Gaussian NB as a linear classifier, generative vs. discriminative classifiers
- Defining Logistic Regression -- Linear Fit to Log-Odds, softmax
- Solving Logistic Regression -- an alternative perspective on log-odds, the logistic sigmoid, MLE and the negative log-likelihood; Taylor expansion and the Newton-Raphson update for linear and logistic regression (sketch below)
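
A sketch of the Newton-Raphson (IRLS) update for logistic regression referenced above (NumPy; y in {0, 1}; the small damping term is my addition for numerical stability, not part of the derivation):

```python
import numpy as np

def logreg_newton(X, y, iters=10, damp=1e-6):
    """Newton-Raphson for logistic regression; X is (n, d), y in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # logistic sigmoid
        g = X.T @ (p - y)                    # gradient of the negative log-likelihood
        S = p * (1.0 - p)                    # diagonal Hessian weights
        H = X.T @ (X * S[:, None]) + damp * np.eye(X.shape[1])  # X^T S X
        w -= np.linalg.solve(H, g)           # Newton step: w <- w - H^{-1} g
    return w
```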

9. Kernel Density Estimation
- Density Estimation Basics -- non-parametric density estimation, histogram-based estimates, Parzen windows (sketch below), smooth kernels
- Bandwidth Selection, Bias-variance tradeoff (digression)
- Multivariate density estimation, Product kernels, Unimodal and Bimodal distribution KDE
- Applications of KDE
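
A minimal Parzen-window sketch with a Gaussian kernel and bandwidth h, as in the first bullet (NumPy; names mine):

```python
import numpy as np

def gaussian_kde(x_grid, samples, h):
    """f_hat(x) = (1 / nh) * sum_i K((x - x_i) / h), K = standard normal pdf."""
    diffs = (x_grid[:, None] - samples[None, :]) / h
    K = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return K.mean(axis=1) / h   # average kernel mass, scaled by 1/h
```

The bandwidth h drives the bias-variance tradeoff in the second bullet: small h gives a spiky, high-variance estimate, large h an over-smoothed, biased one.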

10. Support Vector Machines
- Maximum Margin Classifier -- Bayes decision boundary, restricted Bayes optimal classifier, linear SVM classifier; primal formulation (sketch below), problems and solutions
- Lagrange Duality -- Karush-Kuhn-Tucker (KKT) conditions, Quadratic programming
- Dual Formulation of SVM
- Kernel Tricks -- Mapping to Higher Dimensions, Mercer's theorem
- Soft Margin
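
The lecture's emphasis is the primal/dual derivation and the kernel trick; as a runnable complement, here is a soft-margin linear SVM trained by stochastic subgradient descent (a Pegasos-style sketch, not the QP formulation from lecture; labels in {-1, +1}):

```python
import numpy as np

def svm_sgd(X, y, lam=0.01, epochs=50, seed=0):
    """Primal soft-margin linear SVM via stochastic subgradient descent."""
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)        # decreasing step size
            w *= 1.0 - eta * lam         # shrinkage from the L2 regularizer
            if y[i] * (w @ X[i]) < 1:    # inside the margin: hinge subgradient
                w += eta * y[i] * X[i]
    return w
```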

11. Matrix Factorization
- Singular Value Decomposition (SVD) (sketch below); the cocktail party problem
- Independent Component Analysis (ICA) -- linear vs. statistical independence
- Methods: projection pursuit, infomax (mutual information), and MLE
- FastICA
- Non-negative Matrix Factorization (NMF)
- Dictionary Learning
- Autoencoders
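
A quick SVD sketch in NumPy: factor, reconstruct, and form the best rank-2 approximation (Eckart-Young). The data here is a random placeholder:

```python
import numpy as np

X = np.random.default_rng(0).random((6, 4))
U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U @ diag(s) @ Vt
print(np.allclose(X, (U * s) @ Vt))                # exact reconstruction: True
X2 = (U[:, :2] * s[:2]) @ Vt[:2]                   # best rank-2 approximation
```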

15. Stochastic Gradient Descent (SGD)
- Gradient Descent Basics
- Stochastic vs Batch Gradient Descent
- Learning Rate Scheduling
- Convergence Properties
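
A minimal SGD loop on the squared loss with a decaying learning-rate schedule (a sketch; the particular schedule is illustrative, not the one from lecture):

```python
import numpy as np

def sgd_least_squares(X, y, lr0=0.1, epochs=100, seed=0):
    """SGD for least squares: one example per update, decaying step size."""
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):    # stochastic: one example at a time
            t += 1
            lr = lr0 / (1.0 + 0.01 * t)      # learning-rate schedule
            w -= lr * (X[i] @ w - y[i]) * X[i]   # single-example gradient
    return w
```

Batch gradient descent replaces the inner loop with one full-gradient step per epoch; the decaying step size is what the convergence results require.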

16. k-means Clustering
- Clustering Basics
- Hard k-means
- Soft k-means
- Gaussian Mixture Models (GMM)
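
A sketch of hard k-means (Lloyd's algorithm) in NumPy; soft k-means and GMMs replace the hard argmin assignment with graded responsibilities:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Hard k-means: alternate nearest-center assignment and mean update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: nearest center for each point
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # update step: mean of each cluster (empty clusters keep old center)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```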

17. Expectation Maximization (EM)
- EM Algorithm
- Gaussian Mixture Models (GMM)
- Convergence Properties
- Applications of EM
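
A compact EM loop for a one-dimensional Gaussian mixture, alternating the E and M steps (a sketch; initialization and the stopping rule are simplified):

```python
import numpy as np

def em_gmm_1d(x, k, iters=50, seed=0):
    """EM for a 1-D Gaussian mixture model on samples x of shape (n,)."""
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                    # mixing weights
    mu = rng.choice(x, size=k, replace=False)   # initialize means at data points
    var = np.full(k, x.var())
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted re-estimates of weights, means, variances
        n = r.sum(axis=0)
        pi = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n
    return pi, mu, var
```

Each iteration is guaranteed not to decrease the data log-likelihood, which is the core of the convergence discussion above.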

18. Automatic Differentiation
- Forward Mode AD
- Reverse Mode AD
- Backpropagation
- Applications in Deep Learning
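
Forward-mode AD in miniature via dual numbers (a sketch; reverse mode / backpropagation instead propagates adjoints backward through the computation graph):

```python
class Dual:
    """Dual number a + b*eps with eps^2 = 0: forward-mode AD in one class."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.val * o.dot + self.dot * o.val)  # product rule
    __rmul__ = __mul__

x = Dual(3.0, 1.0)       # seed the derivative: dx/dx = 1
y = x * x + 2 * x + 1    # f(x) = x^2 + 2x + 1
print(y.val, y.dot)      # 16.0 and f'(3) = 2*3 + 2 = 8.0
```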

19. Nonlinear Embedding Approaches
- Manifold Learning
- t-SNE
- UMAP
- Applications in Visualization
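
A typical t-SNE invocation for visualization, assuming scikit-learn is available (the course materials may use different tooling; the data here is a random placeholder):

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).normal(size=(200, 50))   # placeholder features
emb = TSNE(n_components=2, perplexity=30.0).fit_transform(X)
print(emb.shape)   # (200, 2): one 2-D point per input row
```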

20. Model Comparison I
- Bias-Variance Tradeoff
- No Free Lunch Theorem
- Confusion Matrix
- Cross-Validation
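
A from-scratch binary confusion matrix, as covered above (a sketch; rows index the actual class, columns the predicted class):

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 counts for binary labels in {0, 1}: m[actual, predicted]."""
    m = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

print(confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1]))
```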

21. Model Comparison II
- Cross-Validation and Hyperopt
- Expected Value Framework
- Visualizing Model Performance
- Receiver Operating Characteristics (ROC)
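
A minimal ROC computation: sweep the decision threshold down the sorted scores and accumulate true- and false-positive rates (a sketch; names mine):

```python
import numpy as np

def roc_points(scores, labels):
    """(FPR, TPR) pairs as the threshold sweeps down; labels in {0, 1}."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    order = np.argsort(-scores)                        # descending by score
    labels = labels[order]
    tpr = np.cumsum(labels) / labels.sum()             # true-positive rate
    fpr = np.cumsum(1 - labels) / (1 - labels).sum()   # false-positive rate
    return fpr, tpr
```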

22. Model Calibration
- Calibration Techniques
- Reliability Diagrams
- Platt Scaling
- Isotonic Regression
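
A sketch of Platt scaling: fit sigmoid(a*s + b) to held-out classifier scores by gradient descent on the log loss (Platt's original method uses a regularized Newton fit; this plain gradient version is a simplification):

```python
import numpy as np

def platt_scale(scores, labels, lr=0.1, iters=5000):
    """Fit P(y=1 | s) = sigmoid(a*s + b) on held-out (score, label) pairs."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    a, b = 1.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        err = p - labels                  # gradient of the log loss w.r.t. a*s+b
        a -= lr * (err * scores).mean()
        b -= lr * err.mean()
    return a, b
```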

23. Convolutional Neural Networks (CNNs)
- Building Blocks
- Skip Connections
- Fully Convolutional Networks
- Semantic Segmentation
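
The basic CNN building block, a 2-D convolution with no padding, written out explicitly (a sketch; deep-learning libraries implement this far more efficiently and, strictly speaking, compute cross-correlation):

```python
import numpy as np

def conv2d(img, kernel, stride=1):
    """Valid 2-D cross-correlation of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    H = (img.shape[0] - kh) // stride + 1   # output height
    W = (img.shape[1] - kw) // stride + 1   # output width
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            patch = img[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * kernel).sum()
    return out
```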

24. Word Embedding
- Bag of Words
- Word2Vec
- GloVe
- Applications in NLP
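
A from-scratch bag-of-words vectorizer, the baseline representation that Word2Vec and GloVe improve on with dense, learned vectors (a sketch; tokenization is naive whitespace splitting):

```python
from collections import Counter

def bag_of_words(docs):
    """Term-count vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    counts = [Counter(d.lower().split()) for d in docs]
    vectors = [[c[w] for w in vocab] for c in counts]   # missing words count 0
    return vocab, vectors

vocab, vecs = bag_of_words(["the cat sat", "the dog sat down"])
```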