Notes for CS181: Machine Learning
1 Introduction
2 What is CS181?
2.1 Why Is AI a Big Deal?
2.1.1 But Is Accuracy Enough?
2.1.2 What Happens When Machine Learning Models are Catastrophically Wrong?
2.1.3 Are Machine Learning Models Right for the Right Reasons?
2.1.4 What is the Role of the Human Decision Maker?
2.1.5 What are the Broader Impacts of Tech?
2.2 Machine Learning is Much More Than Accuracy
2.3 What is CS181?
2.4 What We are Offering You
2.5 What We are Asking From You
2.6 Grading & Evaluation
3 What is Regression?
3.1 What is Machine Learning?
3.2 What is Regression?
3.3 (Almost) Everything is Linear Regression
3.4 What is Model Evaluation?
3.5 What is Model Critique?
3.6 Limitations and Connections
4 What are Probabilistic and Non-Probabilistic Regression?
4.1 What is Probabilistic Regression?
4.2 (Almost) Everything is Linear Regression
4.3 The Cube: A Model Comparison Paradigm
5 What Matters in ML Besides Prediction?
5.1 What is Machine Learning? Revisited
5.2 What Are We Uncertain About?
5.3 Where is Uncertainty Coming From?
5.4 How Do We Compute Uncertainty?
5.5 Mathematizing Uncertainty: Starting with Bias and Variance
5.6 The Bias-Variance Trade-off in Machine Learning
5.6.1 Examples of the Bias-Variance Trade-off
6 What is Logistic Regression?
6.1 Logistic Regression and Soft-Classification
6.2 Logistic Regression and Bernoulli Likelihood
6.3 How to Perform Maximum Likelihood Inference for Logistic Regression
6.4 How (Not) to Evaluate Classifiers
6.5 How to Interpret Logistic Regression
7 How Do We Responsibly Use Conditional Models?
7.1 Everything We’ve Done So Far in Probabilistic ML
8 Case Study: Responsibly Using Logistic Regression
8.1 Case Study: Machine Learning Model for Loan Approval
8.1.1 The Big Vague Question
8.1.2 The Concrete and Rigorous Process of Post-Inference Analysis of Machine Learning Models
9 The Math of Training and Interpreting Logistic Regression Models
9.1 The Math of Convex Optimization
9.1.1 Convexity of the Logistic Regression Negative Log-Likelihood
9.2 Important Mathy Details of Gradient Descent
9.2.1 Does It Converge?
9.2.2 How Quickly Can We Get There?
9.2.3 Does It Scale?
9.3 Interpreting a Logistic Regression Model: Log-Odds
10 What are Neural Networks?
10.1 Neural Networks as Universal Function Approximators
10.2 Neural Networks as Regression on a Learned Feature Map
10.3 Everything is a Neural Network
10.3.1 Architecture Zoo
10.3.2 ChatGPT
10.3.3 Stable Diffusion
10.4 Neural Network Optimization
10.5 Bias-Variance Trade-off for Neural Networks
10.6 Interpretation of Neural Networks
10.6.1 Example 1: Can Neural Network Models Make Use of Human Concepts?
10.6.2 Example 2: Can Neural Network Models Learn to Explore Hypothetical Scenarios?
10.6.3 Example 3: A Powerful Generalization of Feature Importance for Neural Network Models
10.7 The Difficulty with Interpretable Machine Learning
10.7.1 Example 4: Not All Explanations are Created Equal
10.7.2 Example 5: Explanations Can Lie
10.7.3 Example 6: The Perils of Explanations in Socio-Technical Systems
11 The Math and Interpretation of Neural Network Models
11.1 Neural Network Regression
11.1.1 Why It’s Hard to Differentiate a Neural Network
11.1.2 Differentiating Neural Networks: Backpropagation
11.2 Interpreting Neural Networks
11.2.1 Example 1: Can Neural Network Models Make Use of Human Concepts?
11.2.2 Example 2: Can Neural Network Models Learn to Explore Hypothetical Scenarios?
11.2.3 Example 3: A Powerful Generalization of Feature Importance for Neural Network Models
11.2.4 Example 4: The Perils of Explanations
11.3 Neural Network Models and Generalization
12 The Math Behind Bayesian Regression
12.1 Bayesian Linear Regression
12.2 Bayesian Linear Regression over Arbitrary Bases
13 Bayesian Modeling Framework
13.1 Components of Machine Learning Reasoning
13.2 Bayesian Modeling Paradigm
14 Bayesian vs Frequentist Inference?
14.1 The Bayesian Modeling Process
14.2 Bayesian vs Frequentist Inference
15 The Math of Posterior Inference
15.1 The Bayesian Modeling Process
15.2 Point Estimates from the Posterior
15.2.1 Comparison of Posterior Point Estimates and MLE
15.2.2 Law of Large Numbers for Bayesian Inference
15.3 Bayesian Logistic Regression
16 What’s Hard About Sampling?
16.1 Bayesian vs Frequentist Inference
16.2 What is Sampling and Why do We Care?
17 The Math of Principal Component Analysis
17.1 PCA as Dimensionality Reduction to Maximize Variance
17.1.1 Finding a Single PCA Component
17.2 PCA as Dimensionality Reduction to Minimize Reconstruction Loss
17.3 A Latent Variable Model for PCA
17.3.1 One Principal Component
17.4 Autoencoders and Nonlinear PCA
17.5 A Probabilistic Latent Variable Model for PCA
18 The Math of Expectation Maximization
19 Motivation for Latent Variable Models
19.1 Latent Variable Models
19.1.1 Example: Gaussian Mixture Models (GMMs)
19.1.2 Example: Item-Response Models
19.1.3 Example: Factor Analysis Models
19.2 Maximum Likelihood Estimation for Latent Variable Models: Expectation Maximization
19.3 The Expectation Maximization Algorithm
19.4 Monotonicity and Convergence of EM
20 Review of Latent Variables, Compression and Clustering
20.0.1 Example: Gaussian Mixture Models (GMMs)
20.0.2 Example: Item-Response Models
20.0.3 Example: Factor Analysis Models
20.1 PCA Versus Probabilistic PCA (pPCA)
20.1.1 What to Know About Expectation Maximization
20.2 Non-Probabilistic Clustering Versus Probabilistic Clustering
21 Topic Models
21.1 Our First Latent Variable Model
21.2 Reasoning About Text Corpora Using Topic Modeling
21.3 Our Second Latent Variable Model: pLSA
21.4 Our Third Latent Variable Model: LDA
22 Math and Intuition of Hidden Markov Models
22.1 Markov Models
22.1.1 Transition Matrices and Kernels
22.1.2 Applications of Markov Models
22.2 Hidden Markov Models
22.3 Learning and Inference for HMMs
23 The Intuition of Markov Decision Processes
23.1 Review: Modeling Sequential Data
23.1.1 Why Model Sequential Data (Dynamics)?
23.2 Modeling Sequential Data and Sequential Actions
23.2.1 Describing a Dynamic World
23.2.2 Acting in a Dynamic World
23.2.3 Describing Worlds as MDPs
23.3 Modeling Sequential Decisions: Planning
23.3.1 Modeling Action Choice
23.3.2 Modeling Cumulative Reward
23.3.3 Planning: Optimizing Action Choice for Cumulative Reward