CS8850 - Advanced Machine Learning - Spring 2025

Georgia State University, Atlanta, GA

Instructor

Prof. Sergey Plis

Email: splis@gsu.edu

Office Hours: TBA

Office: 55 Park Place, Room 1821

Teaching Assistant

Mansoor Ahmed

Email: mahmed76@gsu.edu

Office Hours: TBA

Office: TBA

Course Description

Machine learning studies algorithms that build models from data for subsequent use in prediction, inference, and decision-making tasks. Although machine learning has been an active field for the last 60 years, both the demand for it and the trust in it have exploded recently, as ever more data have become available and the problems that need to be addressed have become effectively impossible to program directly. In this advanced course, we will cover the essential algorithms, concepts, and principles of machine learning. Along with the traditional exposition, we will learn how these principles are currently being revisited in light of recent discoveries in the field.

Prerequisites

Reading List

I will use parts of the following textbooks, accompanied by mandatory research papers and optional reading material that I may recommend to supplement the lectures.

Course Schedule and Resources

Below is the detailed course schedule with subtopics, lecture videos (with timestamps), and additional reading materials.

S.No Topic Subtopics Lecture Videos Reading References
1 Mathematical Foundations -- Refresher
  • Linear Algebra
  • Applied Probability
  • Differential Equations and Calculus
  • Optimization Theory
1 Foundations of Learning
  • Formal Learning Model
  • Generalization and Overfitting
  • Empirical Risk Minimization (ERM)
  • ERM with Inductive Bias
  • Bounding the probability of error -- confidence and accuracy
2 PAC Learning
  • PAC Learning Framework
  • VC Dimension
  • Sample Complexity
  • Learnability Conditions
3 Linear Learning Models
  • Linear decision boundary -- binary classifier
  • Perceptron algorithm -- batch & stochastic
  • Proof of convergence
  • Inseparable case
4 Principal Component Analysis
  • Linear Regression (LR), Least Mean Squares (LMS)
  • Spectral theorem, similarity transform, eigenvectors, diagonalization, spectral factorization
  • PCA -- quadratic forms, multivariate Gaussian density, isodensity surfaces, Principal Axes Theorem
  • Eigenvectors and Eigenvalues
  • Covariance Matrix -- diagonalizing the covariance matrix, Karhunen-Loève (KL) transform, example: bivariate case
5 Curse of Dimensionality
  • The curse of dimensionality
  • Volume in high-dimensional space -- Stirling's approximation, hypersphere volume
  • Example: Gaussian distributions in high-dimensional space
  • Notes: CMU, Princeton
  • Books: Bishop Chapt. 1 (pp. 33-37), Hastie Chapt. 2 (pp. 22-23), Hamming Chapt. 9 (pp. 58-66)
  • Slides: Lecture Slides
6 Bayesian Decision Theory
  • Review of probability distributions, random variables, joint and marginal probabilities
  • Bayes Theorem, Prior and Posterior Distributions
  • Decision Rules -- Continuous and Discrete Features
  • Bayesian vs Frequentist Approaches
  • General Bayesian Decision Theory
  • Maximum A Posteriori (MAP) -- conjugate priors
7 Parameter Estimation -- MLE
  • Maximum Likelihood Estimation (MLE) -- conditional independence, MLE for Bernoulli and Gaussian distributions, sample complexity & PAC learning
  • MLE and KL-divergence -- Hartley's information and Shannon's entropy, cross-entropy, KL-divergence minimization
7 Parameter Estimation -- MAP & NB
  • Maximum A Posteriori (MAP) Estimation
  • MLE vs. MAP
  • MAP for binomial and multinomial distributions
  • Bayes' rule -- AIDS test example
  • Naive Bayes classifier -- continuous and discrete features, text classification example
8 Logistic Regression
  • Naive Bayes recap -- Gaussian NB as a linear classifier, generative vs. discriminative classifiers
  • Defining Logistic Regression -- Linear Fit to Log-Odds, softmax
  • Solving Logistic Regression -- an alternative perspective on log-odds, the logistic sigmoid, MLE and the negative log-likelihood; Taylor expansion and the Newton-Raphson update for linear and logistic regression
9 Kernel Density Estimation
  • Density Estimation Basics -- Non-parametric density estimation, histogram-based, Parzen windows, smooth kernels
  • Bandwidth Selection, Bias-variance tradeoff (digression)
  • Multivariate density estimation, product kernels, KDE for unimodal and bimodal distributions
  • Applications of KDE
10 Support Vector Machines
  • Maximum Margin Classifier -- Bayes decision boundary, Restricted Bayes optimal classifier, Linear SVM Classifier, Linear SVM: primal formulation, problems and solutions
  • Lagrange Duality -- Karush-Kuhn-Tucker (KKT) conditions, Quadratic programming
  • Dual Formulation of SVM
  • Kernel Trick -- Mapping to Higher Dimensions, Mercer's theorem
  • Soft Margin
11 Matrix Factorization
  • Singular Value Decomposition (SVD), Cocktail party problem
  • Independent Component Analysis (ICA) -- Linear vs statistical independence
    • Methods: projection pursuit, infomax (mutual information), and MLE
    • FastICA
  • Non-negative Matrix Factorization (NMF)
  • Dictionary Learning
  • Autoencoders
15 Stochastic Gradient Descent (SGD)
  • Gradient Descent Basics
  • Stochastic vs Batch Gradient Descent
  • Learning Rate Scheduling
  • Convergence Properties
16 k-means Clustering
  • Clustering Basics
  • Hard k-means
  • Soft k-means
  • Gaussian Mixture Models (GMM)
17 Expectation Maximization (EM)
  • EM Algorithm
  • Gaussian Mixture Models (GMM)
  • Convergence Properties
  • Applications of EM
18 Automatic Differentiation
  • Forward Mode AD
  • Reverse Mode AD
  • Backpropagation
  • Applications in Deep Learning
19 Nonlinear Embedding Approaches
  • Manifold Learning
  • t-SNE
  • UMAP
  • Applications in Visualization
20 Model Comparison I
  • Bias-Variance Tradeoff
  • No Free Lunch Theorem
  • Confusion Matrix
  • Cross-Validation
21 Model Comparison II
  • Cross-Validation and Hyperopt
  • Expected Value Framework
  • Visualizing Model Performance
  • Receiver Operating Characteristic (ROC) Curves
22 Model Calibration
  • Calibration Techniques
  • Reliability Diagrams
  • Platt Scaling
  • Isotonic Regression
23 Convolutional Neural Networks (CNNs)
  • Building Blocks
  • Skip Connections
  • Fully Convolutional Networks
  • Semantic Segmentation
24 Word Embedding
  • Bag of Words
  • Word2Vec
  • GloVe
  • Applications in NLP

I manage this page; errors and omissions excepted.

  • Tools and Libraries: