Course description:
This is an introductory course on Deep Learning.
We will cover both the theoretical aspects of designing neural networks and how to train and use them in practice.
We will start by recalling classical machine learning tasks, and then proceed to Deep Learning proper.
We will present Deep Learning architectures and approaches while learning how to use the PyTorch library.
We will also study well-known models and learning algorithms such as Deep Sets, ResNet, Proximal Policy Optimization (PPO),
and the Generative Pre-trained Transformer (GPT). We will finish with student presentations on additional topics (see Final presentation topics below).
Topics:
- Basic Machine Learning tasks: Linear regression and classification. Basic concepts: losses, gradient descent, train-test-validation split, under- and overfitting, grid and random search. (See the first sketch after this list.)
- Deep Neural Networks: Architecture (e.g., Multilayer Perceptrons, that is, MLPs), activation functions, backpropagation, momentum, Adaptive Moment Estimation (Adam), exploding and vanishing gradients, initialization, dropout. (Sketch below.)
- Principles of Geometric Deep Learning: Exploiting symmetries in data to boost efficiency via parameter sharing and to obtain equivariant / invariant models. Deep Sets. (Sketch below.)
- Convolutional Neural Networks (CNN): Definition as translation equivariant / invariant models, valid convolution, padding, and stride. Pooling layers. Normalization layers: batch norm, layer norm, instance norm. Skip connections. Optional: Object detection, Group Equivariant CNNs. (Sketch below.)
- Transformers: Text preprocessing, tokenization, embeddings. Multihead attention. Absolute and relative position embeddings. Encoders and decoders. Pretraining and finetuning tasks. Alignment with human and AI preferences (e.g., ChatGPT). Optional: Efficient transformers, LoRA, bitsandbytes. (Sketch below.)
- Deep Reinforcement Learning: Markov Decision Processes, Bellman equation, temporal difference learning. Policy gradient, Proximal Policy Optimization. (Sketch below.)
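
To give a taste of the first topic, here is a minimal PyTorch sketch of linear regression trained by plain gradient descent. The synthetic data and hyperparameters are illustrative choices only.

import torch

# Synthetic data from y = 3x + 1 plus noise (illustrative values).
torch.manual_seed(0)
x = torch.randn(100, 1)
y = 3 * x + 1 + 0.1 * torch.randn(100, 1)

model = torch.nn.Linear(1, 1)                  # learnable weight w and bias b
loss_fn = torch.nn.MSELoss()                   # squared-error loss
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)                # mean squared error on the batch
    loss.backward()                            # backpropagation computes gradients
    opt.step()                                 # one gradient descent update

print(model.weight.item(), model.bias.item())  # should approach 3 and 1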
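
For the Deep Neural Networks topic, a minimal sketch of a multilayer perceptron in PyTorch, with dropout and the Adam optimizer; the layer sizes are arbitrary illustrative choices.

import torch
import torch.nn as nn

# An MLP: linear layers with ReLU activations and dropout for regularization.
mlp = nn.Sequential(
    nn.Linear(784, 256),   # e.g., flattened 28x28 images as input
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 10),    # e.g., 10 class scores as output
)

x = torch.randn(32, 784)   # a batch of 32 inputs
logits = mlp(x)            # shape (32, 10)

# Adam combines momentum with per-parameter adaptive learning rates.
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)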
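
For Geometric Deep Learning, a minimal Deep Sets sketch: a shared network phi is applied to each set element, the results are summed (which makes the model permutation invariant), and a network rho produces the output. All dimensions here are illustrative.

import torch
import torch.nn as nn

class DeepSet(nn.Module):
    def __init__(self, in_dim=2, hidden=64, out_dim=1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.rho = nn.Linear(hidden, out_dim)

    def forward(self, x):                        # x: (batch, set_size, in_dim)
        return self.rho(self.phi(x).sum(dim=1))  # sum over set elements

model = DeepSet()
x = torch.randn(8, 5, 2)                         # 8 sets of 5 points in the plane
perm = x[:, torch.randperm(5), :]                # shuffle the elements of each set
assert torch.allclose(model(x), model(perm), atol=1e-5)  # permutation invariance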
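
For the CNN topic, a minimal residual block in the spirit of ResNet: convolutions with padding, batch norm, pooling, and a skip connection. Channel counts and input sizes are illustrative.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)                 # the skip connection

block = ResidualBlock()
x = torch.randn(4, 16, 32, 32)                   # batch of 16-channel 32x32 inputs
y = block(x)                                     # padding=1 keeps the size: (4, 16, 32, 32)
pooled = nn.MaxPool2d(2)(y)                      # pooling halves it: (4, 16, 16, 16)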
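
For Transformers, a minimal sketch of multihead self-attention using PyTorch's built-in layer; the embedding size, number of heads, and sequence length are illustrative.

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)

x = torch.randn(2, 10, 64)        # batch of 2 sequences of 10 token embeddings
out, weights = attn(x, x, x)      # self-attention: query = key = value
print(out.shape, weights.shape)   # (2, 10, 64) and (2, 10, 10)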
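
For Deep Reinforcement Learning, a minimal tabular temporal-difference (TD(0)) sketch on a toy random-walk chain; the environment, gamma, and alpha are hypothetical illustrative choices.

import random

gamma, alpha = 0.9, 0.1
V = [0.0] * 5                    # value estimates for states 0..4; state 4 is terminal

for episode in range(1000):
    s = 0
    while s != 4:
        s_next = max(0, s + random.choice([-1, 1]))   # random walk on the chain
        r = 1.0 if s_next == 4 else 0.0               # reward only at the goal
        # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print(V)                         # estimates are higher for states closer to the goal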
Final presentation topics:
- Recurrent Neural Networks (RNN): Stateful neural networks, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU)
- Diffusion models
- Graph Neural Networks (GNN)
- Metric Learning: Triplet loss, hard negative mining, circle loss
- Point Cloud Learning
- Tensor deep learning: Tensor decompositions, bottlenecks
- Imbalanced datasets: Under- and oversampling, focal loss
- Grokking
- Mechanistic Interpretability
Note: You need to have a modern laptop, but it does not need a good GPU (Graphics
Processing Unit): if your laptop cannot handle a computation, you can use, for
example, Google Colab (https://colab.research.google.com/), Kaggle
(https://www.kaggle.com/), or CoCalc (https://cocalc.com/) instead.