Data Analysis: Theory and Applications — DATA

  • Instructor: Zoltán Madari
  • Contact: amadarizoli@gmail.com
  • Prerequisites: calculus and basic probability
  • Course materials: lecture notes, slides and R scripts
  • Textbook: Wooldridge, J. M. (2016). Introductory Econometrics: A Modern Approach. South-Western Cengage Learning.

Course description:

“It is easy to lie with statistics; it is easier to lie without them.” /Frederick Mosteller/

Nowadays, thanks to digitalization, we are bombarded with vast amounts of data and information. It is often difficult to navigate this information and to decide which data and statistics we can trust and which we cannot. The aim of this course is to provide a statistical foundation for data analysis, moving from simple descriptive analyses to advanced regression models. An important feature of the course is that there are no black boxes: we examine each method with the appropriate depth of mathematical-statistical background and then apply it in practice.

A secondary goal of the course is to enable participants to apply the methods they have learned in an appropriate IT environment. To this end, we use R and RStudio throughout the course, not only for calculations but also for testing various theoretical results through simulation. Alongside Python, R is one of the leading tools for data analysis, so participants gain data analysis knowledge and skills that are competitive on the job market. Participants are asked to bring their own laptops to the sessions.

Topics:

  • Week 1: Basic statistical concepts (variable types, measurement scales, data sources, ratios) + introduction to R
  • Week 2: Descriptive statistics and their properties + introduction to R, part 2
  • Week 3: Data visualization in R with the ggplot2 package
  • Week 4: Analysis of distributions: properties and simulation
  • Week 5: Sampling techniques (IID samples and their properties) – mathematical background (laws of large numbers, Central Limit Theorem) + simulation
  • Week 6: Estimation theory in practice
  • Week 7: Hypothesis testing I – theory and practice (one sample tests)
  • Week 8: Hypothesis testing II – tests for two or more samples (comparing group means)
  • Week 9: Testing relationships between variables (test of independence and ANOVA)
  • Week 10: Correlation and bivariate regression: causality and confounding effects
  • Week 11: Multiple regression model – OLS properties, estimation and interpretation
  • Week 12: Nonlinear effects (logarithmic and quadratic terms) in regression models
  • Week 13: Testing the assumptions of OLS – multicollinearity and heteroskedasticity