Data Science 1
| Date | 31 Dec 2022 - 30 Jan 2023 |
| Time | 23:00 CET - 23:00 CET |
| Level of instruction | |
| Instructor | Adjunct Associate Professor Ann Maharaj, Prof. Delia North, Prof. Edward L. Boone |
| Registration fee | |
This course is presented by the ISI Statistical Capacity Development Committee.
It is available for free to everyone.
The course includes an introduction to (descriptive) statistics, and modules on sampling, probability, statistical inference, experimental design, categorical data, non-parametric methods, and linear regression.
We thank Ann Maharaj (Australia), Delia North and her three collaborators (South Africa), and Edward Boone (United States) for their efforts in developing it.
Capacity building is a strategic priority for ISI, and the Statistical Capacity Development Committee has been very active in organising courses and supporting participation in events, sometimes in cooperation with other organisations. The Committee is now working to provide online courses and webinars, starting with this course.
The Committee is also looking into providing sessions devoted to Questions and Answers. More information will be provided as soon as we have it available.
Data Science 1 modules
Module 1: Introduction to Statistics and Descriptive Statistics
Module 2: Probability
Module 3: Sampling
Module 4: Statistical Inference for One Population
Module 5: Statistical Inference for Two Populations
Module 6: Experimental Design and Analysis of Variance
Module 7: Analysis of Categorical Data
Module 8: Non-Parametric Methods
Module 9: Simple Linear Regression (coming soon)
Module 10: Multiple Linear Regression (coming soon)
Module 11: Time Series Analysis and Forecasting (Link to online textbook)
Data Science Modules 1+2
Data Science Modules 3+9+10
Data Science Modules 4-8
Introduction and Descriptive Statistics
1.1 An introduction to statistics: Basic terminology and concepts
1.2 An introduction to statistics: Types of variables and their measurement scales
1.3 Introducing descriptive statistics and its role in extracting essential information
1.4 Introducing Excel basics and its ability to perform calculations
1.5 Using Excel to determine the descriptive statistics of a data set
Probability
Introduction to Probability
- Probability 1 - Beginnings (PDF)
- Probability 2 - Sample Spaces and Events (PDF)
- Probability 3 - Relative Frequency (PDF)
- Probability 4 - Equally Likely Approach (PDF)
- Probability 5 - Properties of Probabilities (PDF)
- Probability 6 - Handy Probability Rules (PDF)
- Probability 7 - Conditional Probability (PDF)
- Probability 8 - Independent Events (PDF)
- Probability 9 - Bayes’ Theorem (PDF)
- Data Science 1 - Probability 1 - Beginnings (YouTube)
- Data Science 1 - Probability 2 - Sample Spaces and Events (YouTube)
- Data Science 1 - Probability 3 - Empirical Probability (YouTube)
- Data Science 1 - Probability 4 - Equally Likely Approach (YouTube)
- Data Science 1 - Probability 5 - Properties of Probability (YouTube)
- Data Science 1 - Probability 6 - Handy Probability Rules (YouTube)
- Data Science 1 - Probability 7 - Conditional Probability (YouTube)
- Data Science 1 - Probability 8 - Independent Events (YouTube)
- Data Science 1 - Probability 9 - Bayes' Theorem (YouTube)
Counting Methods
- Data Science 1 - Probability 10 - Counting Methods 1 (YouTube)
- Data Science 1 - Probability 11 - Counting Methods 2 - Permutations (YouTube)
- Data Science 1 - Probability 12 - Counting Methods 3 - Combinations (YouTube)
- Data Science 1 - Probability 10 - Counting Methods 1 (PDF)
- Data Science 1 - Probability 11 - Counting Methods 2 - Permutations (PDF)
- Data Science 1 - Probability 12 - Counting Methods 3 - Combinations (PDF)
Random Variables and Probability Distributions (Discrete)
- Data Science 1 - Probability 13 - Random Variables (YouTube)
- Data Science 1 - Probability 13 - Random Variables (PDF)
- Data Science 1 - Probability 14 - Probability Distributions (YouTube)
- Data Science 1 - Probability 14 - Probability Distributions (PDF)
- Data Science 1 - Probability 15 - Bernoulli Experiments (YouTube)
- Data Science 1 - Probability 15 - Bernoulli Experiments (PDF)
- Data Science 1 - Probability 16 - Binomial Experiments 1 (YouTube)
- Data Science 1 - Probability 16 - Binomial Experiments 1 (PDF)
- Data Science 1 - Probability 17 - Binomial Experiments 2 (YouTube)
- Data Science 1 - Probability 17 - Binomial Experiments 2 (PDF)
- Data Science 1 - Probability 18 - Geometric Experiments (YouTube)
- Data Science 1 - Probability 18 - Geometric Experiments (PDF)
- Data Science 1 - Probability 19 - Negative Binomial Experiment (YouTube)
- Data Science 1 - Probability 19 - Negative Binomial Experiment (PDF)
- Data Science 1 - Probability 20 - Poisson Distribution (YouTube)
- Data Science 1 - Probability 20 - Poisson Distribution (PDF)
- Data Science 1 - Probability 21 - Working With Discrete Distributions Part 1 (YouTube)
- Data Science 1 - Probability 21 - Working With Discrete Distributions Part 1 (PDF)
- Data Science 1 - Probability 22 - Working with Discrete Distributions Part 2 (YouTube)
- Data Science 1 - Probability 22 - Working with Discrete Distributions Part 2 (PDF)
Properties of Discrete Probability Distributions
- Data Science 1 - Probability 23 - Discrete Cumulative Probability Distribution (YouTube)
- Data Science 1 - Probability 24 - Graphical Representations of Probability Distributions (YouTube)
- Data Science 1 - Probability 25 - Describing Probability Distributions (YouTube)
- Data Science 1 - Probability 26 - Expectations and the Mean (YouTube)
- Data Science 1 - Probability 27 - Expectation (Discrete) (YouTube)
- Data Science 1 - Probability 28 - Variance (Discrete) (YouTube)
- Data Science 1 - Probability 29 - Discrete Distribution Summary (YouTube)
Continuous Probability Distributions
- Data Science 1 - Probability 30 - Continuous Distributions (YouTube)
- Data Science 1 - Probability 31 - Probability Density Functions (YouTube)
- Data Science 1 - Probability 32 - Expectation in Continuous Distributions (YouTube)
- Data Science 1 - Probability 33 - The Uniform Distribution (YouTube)
- Data Science 1 - Probability 34 - The Exponential Distribution (YouTube)
- Data Science 1 - Probability 35 - The Gamma Distribution (YouTube)
- Data Science 1 - Probability 36 - The Normal Distribution (YouTube)
- Data Science 1 - Probability 37 - Standard Normal Distribution (YouTube)
- Data Science 1 - Probability 38 - Inverse CDFs (YouTube)
- Data Science 1 - Probability 39 - Chi-Square Distribution (YouTube)
- Data Science 1 - Probability 40 - T-Distribution (YouTube)
- Data Science 1 - Probability 41 - F-distribution (YouTube)
Bivariate Distributions
- Data Science 1 - Probability 42 - Discrete Bivariate Distributions (YouTube)
- Data Science 1 - Probability 43 - Bivariate Continuous Distributions (YouTube)
- Data Science 1 - Probability 44 - Marginal Distributions (YouTube)
- Data Science 1 - Probability 45 - Conditional Distributions (YouTube)
- Data Science 1 - Probability 46 - Independent Random Variables (YouTube)
Expectation Rules
- Data Science 1 - Probability 47 - Expectation Rules (YouTube)
- Data Science 1 - Probability 48 - Variance Rules (YouTube)
- Data Science 1 - Probability 49 - Covariance (YouTube)
- Data Science 1 - Probability 50 - Conditional Expectation (YouTube)
- Data Science 1 - Probability 51 - Discrete Bivariate Example (YouTube)
- Data Science 1 - Probability 52 - Continuous Bivariate Example (YouTube)
The Central Limit Theorem and Sums of Random Variables
- Data Science 1 - Probability 53 - The Bivariate Normal Distribution (YouTube)
- Data Science 1 - Probability 54 - Sums of Random Variables (YouTube)
- Data Science 1 - Probability 55 - Central Limit Theorem Part 1 (YouTube)
- Data Science 1 - Probability 56 - Central Limit Theorem Part 2 (YouTube)
- Data Science 1 - Probability 57 - More on Sums of Random Variables (YouTube)
- Data Science 1 - Probability 58 - Even More on Sums of Random Variables (YouTube)
Inference from data using probability
- Data Science 1 - Probability 59 - Linking Probability and Inference Part 1 (YouTube)
- Data Science 1 - Probability 60 - Linking Probability to Inference - Part 2 (YouTube)
- Data Science 1 - Probability 61 - Likelihood Ratio (YouTube)
Sampling
- 2.1 Basic Sampling Concepts, Sampling Types, Examples (Vimeo)
- 2.2 Variance of sampling statistic, accuracy & bias (Vimeo)
- 2.3 Simple Random Sampling (Vimeo)
- 2.4 Stratified Sampling (Vimeo)
- 2.5 Cluster Sampling (Vimeo)
- 2.2 PRACTICAL DEMO: Effect of varying sample size on accuracy (Vimeo)
- 2.3 PRACTICAL DEMO: With Replacement & Without Replacement Simple Random Sampling (Vimeo)
- 2.4 PRACTICAL DEMO: Stratified Sampling vs Simple Random Sampling (Vimeo)
- 2.5 PRACTICAL DEMO: Cluster Sampling vs other sampling techniques (Vimeo)
Statistical Inference for One Population
Introduction
- Introduction by Ann Maharaj (YouTube)
Sample Statistics and Sampling Distributions
The material for this topic consists of slides and 2 videos. The videos are based on the material in the slides. Three additional YouTube videos which demonstrate the concept of the Central Limit Theorem are available in the Additional Materials section.
- 1: 1 Sample Statistics and Distributions (YouTube)
- 2: 1Sampling Statistics and Distributions (YouTube)
- 1. Sampling Distributions (PDF)
Confidence Interval Estimation
The material for this topic consists of 5 videos and the accompanying slides. Also available is the R script containing the code for all the examples. There is also a set of exercises. Two additional YouTube videos which demonstrate the simulation of confidence intervals are available in the Additional Materials section.
- 1: 1Confidence Intervals:1 (YouTube)
- 2: 1Confidence Intervals:2 (YouTube)
- 3: 1Confidence Intervals:3 (YouTube)
- 4: 1Confidence Intervals:4 (YouTube)
- 5: 1Confidence Intervals:5 (YouTube)
- Data-Science-1-Statistical-Inference-Estimation-for-One-Population (PDF)
- 1CI-Examples-R.txt (Change the txt format to R)
Hypothesis Testing
The material for this topic consists of 10 short videos and the accompanying slides. Also available are the R scripts containing the code for all the examples. There is also a set of exercises.
- 1: 1Hypothesis Tests: 1 (YouTube)
- 2: 1Hypothesis Tests: 2 (YouTube)
- 3: 1Hypothesis Tests: 3 (YouTube)
- 4: 1Hypothesis Tests: 4 (YouTube)
- 5: 1Hypothesis Tests: 5 (YouTube)
- 6: 1Hypothesis Tests: 6 (YouTube)
- 7: 1Hypothesis Tests: 7 (YouTube)
- 8: 1Hypothesis Tests: 8 (YouTube)
- 9: 1Hypothesis Tests: 9 (YouTube)
- 10: 1Hypothesis Tests: 10 (YouTube)
- 1HYP_Example1.txt (Change the txt format to R)
- 1HYP_Example2.txt (Change the txt format to R)
- 1HYP_Example3.txt (Change the txt format to R)
- 1HYP_Example4.txt (Change the txt format to R)
- Data-Science-1-Statistical-Inference-Hypothesis-Testing-for-One-Population (PDF)
Data Sets and Exercises
Data Sets for Hypothesis Testing: Examples 2 and 4
Exercises
Data Sets for Exercises C2 and C4
- 1Exercises (PDF)
- hyp1example2-hyp1example2 (Change the txt format to .csv)
- hyp1example4-hyp1example4 (Change the txt format to .csv)
- hyp1exercise2-hyp1exercise2 (Change the txt format to .csv)
- hyp1exercise4-hyp1exercise4 (Change the txt format to .csv)
References
The proofs of the concepts referred to in this module can be found in several mathematical statistics textbooks. In particular, some references are listed below. Proofs of some of these concepts can also be found in various YouTube videos and on various websites.
Sample Size Determination
The material for this topic consists of 7 videos and the accompanying slides. Also available is the R script containing the code for all the examples. There is also a set of exercises.
- 1: 1Sample Size:1 (YouTube)
- 2: 1Sample Size:2 (YouTube)
- 3: 1Sample Size:3 (YouTube)
- 4: 1Sample Size:4 (YouTube)
- 5: 1Sample Size:5 (YouTube)
- 6: 1Sample Size:6 (YouTube)
- 7: 1Sample Size:7 (YouTube)
- Sample Sizes (PDF)
- ISS-Examples.txt (Change the txt format to R)
Additional Materials
1) The Central Limit Theorem was discussed in the topic on Sample Statistics and Sampling Distributions. The first three videos demonstrate the concept of the Central Limit Theorem.
2) Confidence interval estimation of the population mean was discussed in the topic on Confidence Interval Estimation.
The last two videos demonstrate the simulation of confidence intervals.
Many thanks to Ed Boone for making these videos available.
3) Estimation of the Population Total.
- Central Limit Theorem in R... Part 1 (YouTube)
- Central Limit Theorem in R... Part 2 (YouTube)
- Central Limit Theorem in R... Part 3 (YouTube)
- Confidence Intervals on the Mean in R... Via Simulation (YouTube)
- Confidence Intervals on the means in R... if( ){ }else{ } (YouTube)
- Data-Science-1-Statistical-Inference-Estimation-for-One-Population-Population-Total.pdf (PDF)
Statistical Inference for Two Populations
Introduction
- 2Statistical Inference: Introduction (YouTube)
Statistical Inference: Two Means
- 2Statistical Inference_ Means (PDF)
- 1: 2Statistical inference: Means1 (YouTube)
- 2: 2Statistical Inference: Means2 (YouTube)
- 3: 2Statistical Inference: Means3 (YouTube)
- 4: 2Statistical Inference: Means4 (YouTube)
- 5: 2Statistical Inference: Means5 (YouTube)
- 6: 2Statistical Inference: Means6 (YouTube)
- 7: 2Statistical Inference: Means7 (YouTube)
- 2Mean_Example2.txt (Change the txt format to R)
- 2Mean_Example2.txt (Change the txt format to R)
Statistical Inference: Two Variances
- 2Statistical-Inference-Variances (PDF)
- 1: 2Statistical Inference: Variances1 (YouTube)
- 2: 2Statistical Inference: Variances2 (YouTube)
- 3: 2Statistical Inference: Variances3 (YouTube)
- 2Var_Example1.txt (Change the txt format to R)
- 2Var_Example2.txt (Change the txt format to R)
Statistical Inference: Two Proportions
- 2Statistical-Inference-Proportions (PDF)
- 1: 2Statistical Inference: Proportions1 (YouTube)
- 2: 2Statistical Inference: Proportions2 (YouTube)
- 2Prop_Example.txt (Change the txt format to R)
Data Sets and Exercises
Data sets for Examples 1 and 2 on differences between means, and for Examples 1 and 2 on equality of variances; Exercises; Data sets for Exercises A1, A2, B1 and B2.
- reading.txt (Change the txt format to csv)
- skills.txt (Change the txt format to csv)
- sheets.txt (Change the txt format to csv)
- flights.txt (Change the txt format to csv)
- mercury.txt (Change the txt format to csv)
- store.txt (Change the txt format to csv)
- Exercises-Statistical-Inference-for-Two-Populations.pdf (PDF)
References
The proofs of the concepts referred to in this module can be found in several mathematical statistics textbooks. In particular, some references are listed below. Proofs of some of these concepts can also be found in various YouTube videos and on various websites.
Additional Materials
Further examples on inference for population means
- Inference on Means in R... with t.test() (YouTube)
Experimental Design and Analysis of Variance
Introduction
Completely Randomised Design
Slides, R script and 5 short videos.
- Completely-Randomised-Design.pdf (PDF)
- ANOVA1_example.txt (Change the txt format to R)
- 1: Completely Randomised Design (YouTube)
- 2: Completely Randomised Design (YouTube)
- 3: Completely Randomised Design (YouTube)
- 4: Completely Randomised Design (YouTube)
- 5: Completely Randomised Design (YouTube)
Randomised Block Design
Five videos, accompanying slides, R script
- Randomised-Block-Design.pdf (PDF)
- ANOVA2_example.txt (Change the txt format to R)
- 1: Randomised Block Design (YouTube)
- 2: Randomised Block Design (YouTube)
- 3: Randomised Block Design (YouTube)
- 4: Randomised Block Design (YouTube)
- 5: Randomised Block Design (YouTube)
2-Factor Factorial Design
Five videos, accompanying slides, R script
- 2-Factor-Factorial-Design.pdf (PDF)
- ANOVA2R_example.txt (Change the txt format to R)
- 1: 2-Factor Factorial Design (YouTube)
- 2: 2-Factor Factorial Design (YouTube)
- 3: 2-Factor Factorial Design (YouTube)
- 4: 2-Factor Factorial Design (YouTube)
- 5: 2-Factor Factorial Design (YouTube)
Data Sets and Exercises
Data sets for examples: fabric.csv, tools.csv, lifetime.csv
Exercises; Data sets for exercises: sales.csv, tyre.csv, products.csv
- lifetime.csv (Change the txt format to csv)
- tools.txt (Change the txt format to csv)
- fabric.txt (Change the txt format to csv)
- sales.txt (Change the txt format to csv)
- products.txt (Change the txt format to csv)
- tyre.txt (Change the txt format to csv)
- Exercises-Experimental-Design.pdf (PDF)
References
The proofs of the concepts referred to in this module can be found in several textbooks on experimental design. In particular, some references are listed below. Proofs of some of these concepts can also be found in various YouTube videos and on various websites.
Additional materials
Further Examples on One-Way ANOVA
- One-Way ANOVA in R... Part 1 (YouTube)
- One-Way ANOVA in R... Part 2 (TukeysHSD) (YouTube)
- One-Way ANOVA in R... Part 3 (YouTube)
- One-Way ANOVA in R... Part 4 (Assumptions) (YouTube)
Analysis of Categorical Data
Introduction
Introduction to the analysis of categorical data
- CDA_Introduction.pdf (PDF)
- CDA: Introduction (YouTube)
Categorical Data Analysis: Part 1
Six videos with accompanying slides and R scripts for:
Chi-square and Binomial tests for proportion of one population
Chi-square test and Fisher's test for equality of proportions of two independent populations.
- CDA-Part-1.pdf (PDF)
- 1CDA: Part 1 (YouTube)
- 2CDA: Part 1 (YouTube)
- 3CDA: Part 1 (YouTube)
- 4CDA: Part 1 (YouTube)
- 5CDA: Part 1 (YouTube)
Categorical Data Analysis: Part 2
Three videos with accompanying slides and R scripts for:
Chi-square test for equality of proportions of more than two independent populations.
Chi-square test of independence.
- CDA-Part-2.pdf (PDF)
- 1CDA: Part 2 (YouTube)
- 2CDA: Part 2 (YouTube)
- 3CDA: Part 2 (YouTube)
- Chisq_Pop3.R (Change the txt format to R)
- Chisq_Ind.txt (Change the txt format to R)
Categorical Data Analysis: Part 3
Two videos with accompanying slides and R scripts for:
McNemar test for equality of proportions of two related populations.
Cochran's Q test for equality of proportions of more than two related populations.
- CDA-Part-3.pdf (PDF)
- 1CDA: Part 3 (YouTube)
- 2CDA: Part 3 (YouTube)
- Mcnamar_Test.txt (Change the txt format to R)
- Cochrane_Test.txt (Change the txt format to R)
Data Sets and Exercises
Data for example on Cochran's Q test
Exercises on analysis of categorical data
Data for exercise on Cochran's Q test
- Exercises-Categorical-Data-Analysis.pdf (PDF)
- machines.txt (Change the txt format to csv)
- water.txt (Change the txt format to csv)
References
The technical concepts referred to in this module can be found in several statistics textbooks or, more specifically, in textbooks on the analysis of categorical data. In particular, some references are listed below. Proofs of some of these concepts can also be found in various YouTube videos and on various websites.
Additional materials
Further examples on the chi-square test for proportion and the chi-square test of independence.
- Inference on a Proportion in R... using prop.test() (YouTube)
- Chi Square Test in R (YouTube)
Non-Parametric Methods
Introduction
Introduction to Non-Parametric Methods
Non-Parametric Tests
Five videos and accompanying slides and R Scripts for:
Test for the Median of One Population
Tests for Equality of Two Population Medians
Tests for Equality of More Than Two Population Medians
- 1: Non-Parametric Tests (YouTube)
- 2: Non-Parametric Tests (YouTube)
- 3: Non-Parametric Tests (YouTube)
- 4: Non-Parametric Tests (YouTube)
- 5: Non-Parametric Tests (YouTube)
- Non-Parametric-Tests.pdf (PDF)
- Wilcoxon1.txt (Change the txt format to R)
- MannWhitney.txt (Change the txt format to R)
- Wilcoxon2.txt (Change the txt format to R)
- Kruskal.txt (Change the txt format to R)
- Friedman.txt (Change the txt format to R)
Data Sets and Exercises
Data sets for examples; Exercises; Data sets for exercises
- Exercises-Non-Parametric-Tests.pdf (PDF)
- drugs.txt (Change the txt format to csv)
- hwage.txt (Change the txt format to csv)
- ratings.txt (Change the txt format to csv)
- tours.txt (Change the txt format to csv)
- ihours.txt (Change the txt format to csv)
- mfsalary.txt (Change the txt format to csv)
- wfprod.txt (Change the txt format to csv)
- smjscores.txt (Change the txt format to csv)
- tyre.txt (Change the txt format to csv)
References
The technical concepts referred to in this module can be found in several statistics textbooks or, more specifically, in textbooks on non-parametric methods. In particular, some references are listed below. Proofs of some of these concepts can also be found in various YouTube videos and on various websites.
Additional Materials
Further examples on non-parametric tests for the:
Median of one population.
Equality of medians of two independent populations.
Equality of medians of more than two independent populations.
- Non-Parametric tests for center in R (YouTube)
Time Series Analysis and Forecasting
All the relevant material for this module is available in the book: Forecasting: Principles and Practice (3rd ed)
Many thanks to Rob Hyndman and George Athanasopoulos, the authors of this book, for making their material available.
Instructors
About the instructor
Ann is an Adjunct Associate Professor in the Department of Econometrics and Business Statistics at Monash University in Melbourne, Australia where she lectured for 30 years from February 1990 until her retirement in July 2020.
Prior to that she lectured for 11 years, first in the Department of Mathematics and then in the Department of Statistics at the University of Durban-Westville in South Africa (now part of the University of KwaZulu-Natal).
Ann is an elected member of the International Statistical Institute (ISI), a member of the International Association for Statistical Computing (IASC), and a member and accredited statistician of the Statistical Society of Australia (SSA).
About the instructor
Delia is Professor of Statistics and Dean and Head of the School of Mathematics, Statistics and Computer Science at the University of KwaZulu-Natal in South Africa.
She has over 30 years of experience in teaching Statistics, and has become a leading figure in Statistics Education circles, nationally and internationally (Chair: South African Statistical Association Education Committee for 17 years; Vice President: International Association for Statistical Education; Master Trainer: Maths4stats for schools).
She has received national and international awards for excellence in building capacity through advocating for studies in data analytics by holding fun activities and workshops for teachers and learners.
Delia’s contribution to this course is a Capacity Building project she led at her school. A set of videos was made by academics from the Statistics sector.
About the instructor
Edward is Professor of Statistics at Virginia Commonwealth University in Richmond, Virginia, USA, and has been teaching Statistics for 17 years.
He has been very active in research in Statistics and has a passion for teaching. Learning should be fun, and complex ideas should be presented first from an intuitive approach.
He is currently the Publications Officer for the Section on Statistical Consulting of the American Statistical Association and the President-Elect for the Southern Regional Council on Statistics.
