AM207 - Stochastic Methods for Data Analysis, Inference and Optimization

This is an introductory graduate course on probabilistic modeling and inference for machine learning.

Course Syllabus

What is this Course?

The aim of this course is to help students develop skills for computational research with a focus on stochastic approaches, emphasizing implementation and examples. Stochastic methods make it feasible to tackle very diverse problems when the solution space is too large to explore systematically, or when microscopic rules are known, but not the macroscopic behavior of a complex system. Methods are illustrated with examples from a wide variety of fields, such as demography, health care, and finance. We tackle Bayesian methods of data analysis as well as various stochastic optimization methods. Topics include stochastic optimization such as stochastic gradient descent (SGD) and simulated annealing, Bayesian data analysis, Markov chain Monte Carlo (MCMC), and variational analysis. In this course we also study the broader social impact of statistical models and algorithms when deployed in real-life downstream applications. While the technical content of this course connects theory with implementation/engineering, students are also required to connect the technical materials to downstream tasks, especially focusing on assessing potential negative real-life impacts.

Learning Outcomes

After successful completion of this course, you will be able to:

  1. Build basic Bayesian and non-Bayesian statistical models for continuous, ordinal, categorical and sequential data
  2. Learn point estimates of model parameters using stochastic optimization methods
  3. Perform inference on models using sampling methods as well as variational inference approaches
  4. Evaluate the effectiveness of your inference methods
  5. Evaluate the usefulness/appropriateness of your models
  6. Implement inference methods from scratch in Python
  7. Build statistical models and perform inference using Python libraries
  8. Think broadly and critically about the entire modeling pipeline: from data collection to assessing broader downstream impact
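
To give a flavor of what "implementing an inference method from scratch" can look like, here is a minimal sketch of a random-walk Metropolis sampler, one simple instance of MCMC. It is an illustration only and is not taken from the course materials: the target density (a standard normal), the function names `log_target` and `metropolis`, and the step size are all assumptions chosen for the example.

```python
import math
import random
import statistics


def log_target(x):
    """Unnormalized log-density of a standard normal (illustrative target only)."""
    return -0.5 * x * x


def metropolis(log_p, x0, n_samples, step_size, seed=0):
    """Random-walk Metropolis: returns the chain and the fraction of accepted proposals."""
    rng = random.Random(seed)
    x = x0
    log_px = log_p(x)
    samples = []
    n_accepted = 0
    for _ in range(n_samples):
        # Propose a symmetric Gaussian step around the current state.
        proposal = x + step_size * rng.gauss(0.0, 1.0)
        log_p_proposal = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_p_proposal - log_px:
            x, log_px = proposal, log_p_proposal
            n_accepted += 1
        # The current state is recorded whether or not the proposal was accepted.
        samples.append(x)
    return samples, n_accepted / n_samples


samples, acceptance_rate = metropolis(log_target, x0=0.0, n_samples=20000, step_size=1.0)
sample_mean = statistics.fmean(samples)
sample_std = statistics.pstdev(samples)
```

For this target, `sample_mean` should be close to 0 and `sample_std` close to 1. Checking such summaries against known values is one basic way to evaluate whether a sampler is behaving sensibly.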

General Information:

This course follows a flipped classroom structure. The lectures are pre-recorded and made available at the beginning of each week. Students are expected to watch the relevant lecture videos and study the lecture materials before the class meeting. For each video, there is a concept quiz for you to check your understanding - the quizzes will be graded holistically.

For each lecture, there are two alternate time slots: TTh 09:45 AM - 11:00 AM and TTh 2:15 PM - 3:30 PM. Students should register for only one time slot.

Each class meeting will consist of 1) a discussion portion where students discuss the materials that they have studied, and 2) a practical exercise portion where students work in small teams on a coding or qualitative analysis exercise applying the concepts from lecture/readings to a small example. Students are expected to actively participate in both the class discussion and the practical exercise.

You will not be able to complete the exercise if you do not study the lecture videos and materials before class!

The in-class practical exercises will be collected at the end of each class and graded for effort. 

There will be 9 weekly individual assignments and a team project. All assignments (including the project) will emphasize both the mastery of theoretical concepts and Python implementation. There will be a Canvas website for this course; assignments, lecture notes and all course-related information/announcements will be posted online. Regular class attendance and participation are essential for this course and are expected of all students.

Course Schedule:

Course schedule

Course Materials:

We recommend that you get the textbook Bayesian Data Analysis by Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin (3rd Edition). We will be using this text as a reference for statistical modeling but the course will not follow the content or structure of the text faithfully! In addition to readings from the textbook, relevant reference papers will be recommended for topics. To complete the assignments, you must either install (on your own machine) Jupyter Notebook with Python 3.7 or familiarize yourself with Deepnote.

Grading:

Your lowest homework grade will be down-weighted by half. 
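
As a rough illustration of this down-weighting, the sketch below computes a homework average in which the lowest score counts half as much as each of the others. The exact formula used by the course is not stated here, so the weighted-mean interpretation, the function name `homework_average`, and the example scores are all assumptions made for the example.

```python
def homework_average(scores):
    """Weighted mean of homework scores, with the single lowest score at half weight.

    Assumption: "down-weighted by half" means the lowest score gets weight 0.5
    while every other score gets weight 1.0, and the weights are renormalized.
    """
    if not scores:
        raise ValueError("at least one homework score is required")
    # If several scores tie for lowest, only the first one is down-weighted.
    lowest_index = scores.index(min(scores))
    weights = [0.5 if i == lowest_index else 1.0 for i in range(len(scores))]
    weighted_sum = sum(w * s for w, s in zip(weights, scores))
    return weighted_sum / sum(weights)


# Hypothetical scores for nine assignments (made up for illustration).
example_scores = [90, 80, 100, 60, 95, 85, 90, 100, 70]
plain_average = sum(example_scores) / len(example_scores)
weighted_average = homework_average(example_scores)
```

With these made-up scores, the plain average is 770 / 9 (about 85.56), while the down-weighted average is 740 / 8.5 (about 87.06), so a single weak assignment pulls the homework grade down less than it otherwise would.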

The In-class Participation grade is based both on your grade for the in-class exercises as well as your contribution to the small group discussions during class. Students are expected to actively engage with their discussion groups: both sharing their own ideas as well as proactively soliciting ideas from team members.

Homework:

Homework will be assigned weekly. You are welcome to seek help on the individual homework assignments from other students, your TFs and your instructor. While collaboration is encouraged, copying is strictly forbidden. Submissions that are highly similar will be flagged and all such submissions may be returned ungraded.

Late submission policy: Each student is allowed 3 late days over the semester, to be applied to any one or two homework assignments. Outside of these allotted late days, late homework will not be accepted. Homework, like all assignments in this class, will be graded for correctness as well as clarity of exposition and presentation (a “right” answer that is given without an explanation, or that is presented in a difficult-to-follow format, will receive no credit).

Project:

During the semester, you will work on a project that involves reading, understanding and implementing a model or inference method from a staff-approved research paper. The deliverable is a Jupyter notebook tutorial containing a summary of the main ideas of the paper (with concrete pedagogical examples) and code implementing the main methods of the paper. You must work in a team of 2 to 4 people.

Details on the course project.

Expectations and Policies:

Attendance, Participation and Timeliness: Since we believe that learning is facilitated by discussion, debate and the free exchange of ideas, we expect students to attend all class meetings and participate actively during small group discussions. We also expect students to engage with the teaching staff and their peers during class and during office hours: proactively soliciting feedback, sharing their own perspectives/expertise as well as encouraging others in their groups to participate in the discussion.

All meetings (class meetings and OHs) will start promptly at the times listed, so we expect students to arrive on time. Entering the classroom late is distracting and can be disruptive, and you will most likely miss important announcements. Please plan for sufficient travel time between your classes, and if you anticipate that you will be arriving late to any meeting, please let the teaching staff know beforehand.

Respect for Diversity: It is the mission of the teaching staff that students from all diverse backgrounds and perspectives be well served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. We aim to create a learning environment that is inclusive and respectful of diversity: gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. Your suggestions for how to better our classroom community are always encouraged and appreciated.

As a large part of this course requires students to work in groups, in alignment with our teaching mission, we ask that students explicitly reflect on and implement practices for building teams that are diverse along many axes. Students who enroll in AM207 traditionally come from a wide range of technical, cultural and other demographic backgrounds, and we hope that each student group can benefit from these diverse perspectives and experiences. The teaching staff is happy to help you brainstorm how to create an inclusive and productive working culture for your team.

Help for the Course:

Office Hours & Office Hour Policy:

There will be two weekly instructor office hours for the course. Please feel free to take full advantage of my office hours. If you wish to meet with me outside of office hours please contact me via email or speak to me in person. 

In addition to instructor office hours, there will be at least one TF office hour each day, Friday through Wednesday.

Overall, there will be at least two office hours per day (except on Thursdays).

The office hours are themed, each focusing on a different aspect of the homework completion process. Earlier office hours are devoted to understanding background concepts and setting up problems, while office hours closer to the due date are devoted to interpretation and broader impact analysis:

  1. Friday OHs: focus on understanding materials from the week

  2. Saturday & Sunday OHs: focus on background concepts and homework problem setup

  3. Monday & Tuesday OHs: focus on trouble-shooting and interpretation

  4. Wednesday OHs: focus on interpretation and broader impact analysis

Questions that fall outside the focus of a given OH will be given lower priority and will be answered only as time allows (e.g. questions about how to set up homework problems will be given low priority during Wednesday office hours).

To maximize the benefit you get from office hours, we require students to submit their questions (anonymously if you so wish) prior to each office hour. The staff can then structure their answers in the most productive and pedagogical way that they can see. For this reason, office hours are not drop-in. Students should arrive on time at the beginning of each office hour. Drop-in questions and questions that were not previously submitted on Piazza will be given lower priority and answered only as time allows.

Piazza & Piazza Policy:

There will be a course Piazza, where students are encouraged to discuss their questions and ideas about the course material. Discussions will be moderated by the teaching staff, but the staff is not in charge of answering Piazza questions! To get your questions answered by the teaching staff you must attend office hours or make a separate appointment.

Email Policy:

The best way to get help directly from the teaching staff is through office hours and individual appointments. Due to the large size of the class, we ask that students not email staff with content questions or specific grading questions, that is, technical questions regarding class materials, homework assignments or projects, or questions about grading details (e.g. “what does lambda mean in Question 2 of Problem 1?”, “why did I get 0.5 points deducted here?”). I welcome urgent questions about class policy, grading and logistics (e.g. absences, catastrophic tech failures, your TF has accidentally given you a zero for an assignment you submitted, etc.). These questions can be submitted directly via email to the instructor.

Grading Questions:

The grading scheme for every component of the course is formative; that is, your grade is not the result of adding up a bunch of little points that can be deducted for minor mistakes/omissions. Rather, you should treat your grade like a general signal beacon (what did you do well on, where can you improve). As such, grades are not appealable. In fact, it is not worth your time and generally not beneficial to your final course grade to argue for every point. If you feel, however, that a significant grading mistake has been made on an assignment, you can request that the grading TF reevaluate your work. Such requests should be sent directly to the instructor.