Course description:
This is an introductory course on Deep Learning.
We will cover both the theoretical aspects of designing neural networks and how to train and use them in practice.
We will start by recalling classical machine learning tasks, and then proceed to Deep Learning proper.
We will present Deep Learning architectures and approaches while learning how to use the PyTorch library.
We will also study well-known models and learning algorithms such as Deep Sets, ResNet, Proximal Policy Optimization (PPO),
and the Generative Pre-trained Transformer (GPT). We will finish with student presentations on additional topics (see Final presentation topics below).
Topics:
- Basic Machine Learning tasks: Linear regression and classification. Basic concepts: losses, gradient descent, train-test-validation split, under- and overfitting, grid and random search. (See the first sketch after this list.)
- Deep Neural Networks: Architecture (e.g., Multilayer Perceptrons, that is, MLPs), activation functions, backpropagation, momentum, Adaptive Moment Estimation (Adam), exploding and vanishing gradients, initialization, dropout. (Sketch below.)
- Principles of Geometric Deep Learning: Exploiting symmetries in data to boost efficiency via parameter sharing and to obtain equivariant / invariant models. Deep Sets. (Sketch below.)
- Convolutional Neural Networks (CNN): Definition as translation equivariant / invariant models, valid convolution, padding, and stride. Pooling layers. Normalization layers: batch norm, layer norm, instance norm. Skip connections. Optional: Object detection, Group Equivariant CNNs. (Sketch below.)
- Transformers: Text preprocessing, tokenization, embeddings. Multihead attention. Absolute and relative position embeddings. Encoders and decoders. Pretraining and finetuning tasks. Alignment with human and AI preferences (e.g., ChatGPT). Optional: Efficient transformers, LoRA, bitsandbytes. (Sketch below.)
- Deep Reinforcement Learning: Markov Decision Processes, Bellman equation, temporal difference learning. Policy gradient, Proximal Policy Optimization. (Sketch below.)
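
To give a taste of the first topic, here is a minimal PyTorch sketch of linear regression trained by plain gradient descent. The synthetic data and hyperparameters are illustrative choices only.

import torch

# Synthetic data from y = 3x + 1 plus noise (illustrative values).
torch.manual_seed(0)
x = torch.randn(100, 1)
y = 3 * x + 1 + 0.1 * torch.randn(100, 1)

model = torch.nn.Linear(1, 1)                  # learnable weight w and bias b
loss_fn = torch.nn.MSELoss()                   # squared-error loss
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)                # mean squared error on the batch
    loss.backward()                            # backpropagation computes gradients
    opt.step()                                 # one gradient descent update

print(model.weight.item(), model.bias.item())  # should approach 3 and 1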
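
For the Deep Neural Networks topic, a minimal sketch of a multilayer perceptron in PyTorch, with dropout and the Adam optimizer; the layer sizes are arbitrary illustrative choices.

import torch
import torch.nn as nn

# An MLP: linear layers with ReLU activations and dropout for regularization.
mlp = nn.Sequential(
    nn.Linear(784, 256),   # e.g., flattened 28x28 images as input
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 10),    # e.g., 10 class scores as output
)

x = torch.randn(32, 784)   # a batch of 32 inputs
logits = mlp(x)            # shape (32, 10)

# Adam combines momentum with per-parameter adaptive learning rates.
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)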
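
For Geometric Deep Learning, a minimal Deep Sets sketch: a shared network phi is applied to each set element, the results are summed (which makes the model permutation invariant), and a network rho produces the output. All dimensions here are illustrative.

import torch
import torch.nn as nn

class DeepSet(nn.Module):
    def __init__(self, in_dim=2, hidden=64, out_dim=1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.rho = nn.Linear(hidden, out_dim)

    def forward(self, x):                        # x: (batch, set_size, in_dim)
        return self.rho(self.phi(x).sum(dim=1))  # sum over set elements

model = DeepSet()
x = torch.randn(8, 5, 2)                         # 8 sets of 5 points in the plane
perm = x[:, torch.randperm(5), :]                # shuffle the elements of each set
assert torch.allclose(model(x), model(perm), atol=1e-5)  # permutation invariance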
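
For the CNN topic, a minimal residual block in the spirit of ResNet: convolutions with padding, batch norm, pooling, and a skip connection. Channel counts and input sizes are illustrative.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)                 # the skip connection

block = ResidualBlock()
x = torch.randn(4, 16, 32, 32)                   # batch of 16-channel 32x32 inputs
y = block(x)                                     # padding=1 keeps the size: (4, 16, 32, 32)
pooled = nn.MaxPool2d(2)(y)                      # pooling halves it: (4, 16, 16, 16)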
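
For Transformers, a minimal sketch of multihead self-attention using PyTorch's built-in layer; the embedding size, number of heads, and sequence length are illustrative.

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)

x = torch.randn(2, 10, 64)        # batch of 2 sequences of 10 token embeddings
out, weights = attn(x, x, x)      # self-attention: query = key = value
print(out.shape, weights.shape)   # (2, 10, 64) and (2, 10, 10)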
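
For Deep Reinforcement Learning, a minimal tabular temporal-difference (TD(0)) sketch on a toy random-walk chain; the environment, gamma, and alpha are hypothetical illustrative choices.

import random

gamma, alpha = 0.9, 0.1
V = [0.0] * 5                    # value estimates for states 0..4; state 4 is terminal

for episode in range(1000):
    s = 0
    while s != 4:
        s_next = max(0, s + random.choice([-1, 1]))   # random walk on the chain
        r = 1.0 if s_next == 4 else 0.0               # reward only at the goal
        # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print(V)                         # estimates are higher for states closer to the goal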
Final presentation topics:
- Recurrent Neural Networks (RNN): Stateful neural networks, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU)
- Diffusion models
- Graph Neural Networks (GNN)
- Metric Learning: Triplet loss, hard negative mining, circle loss
- Point Cloud Learning
- Tensor deep learning: Tensor decompositions, bottlenecks
- Imbalanced datasets: Under- and oversampling, focal loss
- Grokking
- Mechanistic Interpretability
Note: You need to have a modern laptop, but it does not need a good GPU (Graphics
Processing Unit): if your laptop cannot handle a computation, you can use, for
example, Google Colab (https://colab.research.google.com/), Kaggle
(https://www.kaggle.com/), or CoCalc (https://cocalc.com/) instead.