31 December 2022 - 30 January 2023

Data Science 1

Date 31 Dec 2022 - 30 Jan 2023
Time 23:00 CET - 23:00 CET
Instructor
Adjunct Associate Professor Ann Maharaj
Prof. Delia North
Prof. Edward L. Boone

 

This course is presented by the ISI Statistical Capacity Development Committee. 
It is available for free to everyone.

The course includes an introduction to (descriptive) statistics, and modules on sampling, probability, statistical inference, experimental design, categorical data, non-parametric methods, and linear regression.

We thank Ann Maharaj (Australia), Delia North and her three collaborators (South Africa), and Edward Boone (United States) for their efforts in developing it.

Capacity building is a strategic priority for ISI, and the Statistical Capacity Development Committee has been very active in organising courses and supporting participation in events, sometimes in cooperation with other organisations. The Committee is now working to provide online courses and webinars, starting with this course.

The Committee is also looking into providing sessions devoted to Questions and Answers. More information will be provided as soon as we have it available.

Data Science 1 modules

Module 1: Introduction to Statistics and Descriptive Statistics 
Module 2: Probability 
Module 3: Sampling 
Module 4: Statistical Inference for One Population 
Module 5: Statistical Inference for Two Populations 
Module 6: Experimental Design and Analysis of Variance 
Module 7: Analysis of Categorical Data 
Module 8: Non-Parametric Methods 
Module 9: Simple Linear Regression (coming soon) 
Module 10: Multiple Linear Regression (coming soon) 
Module 11: Time Series Analysis and Forecasting (Link to online textbook)

Data Science Modules 1+2 
Data Science Modules 3+9+10 
Data Science Modules 4-8

 


Introduction and Descriptive Statistics

1.1 An introduction to statistics: Basic terminology and concepts 
1.2 An introduction to statistics: Types of variables and their measurement scales 
1.3 Introducing descriptive statistics and its role in extracting essential information 
1.4 Introducing Excel basics and its ability to perform calculations 
1.5 Using Excel to determine the descriptive statistics of a data set



 

Probability

Introduction to Probability

Counting Methods

Random Variables and Probability Distributions (Discrete)

Properties of Discrete Probability Distributions

Continuous Probability Distributions

Bivariate Distributions

Expectation Rules

The Central Limit Theorem and Sums of Random Variables

Inference from data using probability



 

Sampling



 

Statistical Inference for One Population

Introduction

Sample Statistics and Sampling Distributions

The material for this topic consists of slides and 2 videos. The videos are based on the material in the slides. Three additional YouTube videos which demonstrate the concept of the Central Limit Theorem are available in the Additional Materials section.

Confidence Interval Estimation

The material for this topic consists of 5 videos and the accompanying slides. Also available is the R script containing the code for all the examples. There is also a set of exercises. Two additional YouTube videos which demonstrate the simulation of confidence intervals are available in the Additional Materials section.

Hypothesis Testing

The material for this topic consists of 10 short videos and the accompanying slides. Also available are the R scripts containing the code for all the examples. There is also a set of exercises.

Data Sets and Exercises

Data Sets for Hypothesis Testing: Examples 2 and 4

Exercises

Data Sets for Exercises C2 and C4

References

The proofs of the concepts referred to in this module can be found in several mathematical statistics textbooks. In particular, some references are listed below. Proofs of some of these concepts can also be found in various YouTube videos and on various websites.

Sample Size Determination

The material for this topic consists of 7 videos and the accompanying slides. Also available is the R script containing the code for all the examples. There is also a set of exercises.

Additional Materials

1) The Central Limit Theorem was discussed in the topic on Sample Statistics and Sampling Distributions. The first three videos demonstrate the concept of the Central Limit Theorem.

2) Confidence interval estimation of the population mean was discussed in the topic on Confidence Interval Estimation.
The last two videos demonstrate the simulation of confidence intervals.
Many thanks to Ed Boone for making these videos available.

3) Estimation of the Population Total.



 

Statistical Inference for Two Populations

Introduction

Statistical Inference: Two Means

Statistical Inference: Two Variances

Statistical Inference: Two Proportions

Data Sets and Exercises

Data sets for Examples 1 and 2 on differences between means, and for Examples 1 and 2 on equality of variances; Exercises; Data sets for Exercises A1, A2, B1 and B2.

References

The proofs of the concepts referred to in this module can be found in several mathematical statistics textbooks. In particular, some references are listed below. Proofs of some of these concepts can also be found in various YouTube videos and on various websites.

Additional Materials

Further examples on inference for population means



 

Experimental Design and Analysis of Variance

Introduction

Completely Randomised Design

Slides, R script and 5 short videos.

Randomised Block Design

Five videos, accompanying slides, R script

2-Factor Factorial Design

Five videos, accompanying slides, R script

Data Sets and Exercises

Data sets for examples: fabric.csv, tools.csv, lifetime.csv
Exercises; Data sets for exercises: sales.csv, tyre.csv, products.csv

References

The proofs of the concepts referred to in this module can be found in several textbooks on experimental design. In particular, some references are listed below. Proofs of some of these concepts can also be found in various YouTube videos and on various websites.

Additional Materials

Further Examples on One-Way ANOVA



 

Analysis of Categorical Data

Introduction

Introduction to the analysis of categorical data

Categorical Data Analysis: Part 1

Six videos with accompanying slides and R scripts for:
Chi-square and Binomial tests for proportion of one population
Chi-square test and Fisher's exact test for equality of proportions of two independent populations.

Categorical Data Analysis: Part 2

Three videos with accompanying slides and R scripts for:
Chi-square test for equality of proportions of more than two independent populations.
Chi-square test of independence.

Categorical Data Analysis: Part 3

Two videos with accompanying slides and R scripts for:
McNemar test for equality of proportions of two related populations.
Cochran's Q test for equality of proportions of more than two related populations.

Data Sets and Exercises

Data for example on Cochran's Q test
Exercises on analysis of categorical data
Data for exercise on Cochran's Q test

References

The technical concepts referred to in this module can be found in several statistics textbooks or, more specifically, textbooks on the analysis of categorical data. In particular, some references are listed below. Proofs of some of these concepts can also be found in various YouTube videos and on various websites.

Additional Materials

Further examples on the chi-square test for proportion and the chi-square test of independence.



 

Non-Parametric Methods

Introduction

Introduction to Non-Parametric Methods

Non-Parametric Tests

Five videos and accompanying slides and R Scripts for:
Test for the Median of One Population
Tests for Equality of Two Population Medians
Tests for Equality of More Than Two Population Medians

Data Sets and Exercises

Data sets for examples; Exercises; Data sets for exercises

References

The technical concepts referred to in this module can be found in several statistics textbooks or, more specifically, textbooks on non-parametric methods. In particular, some references are listed below. Proofs of some of these concepts can also be found in various YouTube videos and on various websites.

Additional Materials

Further examples on non-parametric tests for the:
Median of one population.
Equality of medians of two independent populations.
Equality of medians of more than two independent populations.



 

Time Series Analysis and Forecasting

All the relevant material for this module is available in the book: Forecasting: Principles and Practice (3rd ed)
Many thanks to Rob Hyndman and George Athanasopoulos, the authors of this book, for making their material available.



 

Instructors

Adjunct Associate Professor Ann Maharaj

About the instructor

Ann is an Adjunct Associate Professor in the Department of Econometrics and Business Statistics at Monash University in Melbourne, Australia where she lectured for 30 years from February 1990 until her retirement in July 2020. 

Prior to that she lectured for 11 years, first in the Department of Mathematics and then in the Department of Statistics at the University of Durban-Westville in South Africa (now part of the University of KwaZulu-Natal). 

Ann is an elected member of the International Statistical Institute (ISI), a member of the International Association for Statistical Computing (IASC), and a member and accredited statistician of the Statistical Society of Australia (SSA).

Prof. Delia North

About the instructor

Delia is Professor in Statistics at, and Dean and Head of, the School of Mathematics, Statistics and Computer Science at the University of KwaZulu-Natal in South Africa. 

She has over 30 years of experience in teaching Statistics, and has become a leading figure in Statistics Education circles, nationally and internationally (Chair: South African Statistical Association Education Committee for 17 years; Vice President: International Association for Statistical Education; Master Trainer: Maths4stats for schools).

She has received national and international awards for excellence in building capacity through advocating for studies in data analytics by holding fun activities and workshops for teachers and learners.

Delia’s contribution to this course is a Capacity Building project she led at her school. A set of videos was made by academics from the Statistics sector.

Prof. Edward L. Boone

About the instructor

Edward is Professor of Statistics at Virginia Commonwealth University in Richmond, Virginia, USA, and has been teaching Statistics for 17 years.

He has been very active in research in Statistics and has a passion for teaching. Learning should be fun, and complex ideas should be presented first from an intuitive approach.

He is currently the Publications Officer for the Section on Statistical Consulting of the American Statistical Association and the President-Elect for the Southern Regional Council on Statistics.