Statistics Terminology and Definitions

Kapil Pant
4 min read · Aug 25, 2020

Read this article to learn the basics of statistics, its two main types, and some of the vocabulary used in the field.

The pangolin is the only mammal in the world to be covered from head to toe in keratin scales (keratin is the same material as human fingernails).

What is statistics?

Definition — Statistics is a form of mathematical analysis that uses quantified models, representations and synopses for a given set of experimental data or real-life studies.

In layman’s terms — Statistics involves defining a research question and then translating that question into a statistical statement that can be tested using data. The first step is designing the study, followed by collecting data from the target population. Next, the data are summarized, and the summaries are used to analyse the data. The analysis can then be used to generalize to the target population. Finally, the findings are communicated to the general audience.

“There are three kinds of lies: lies, damned lies, and statistics.” — commonly attributed to Benjamin Disraeli

Statistics can generally be broken down into two categories:

  • Descriptive statistics
  • Inferential statistics

[Figure: Types of statistics]

What is descriptive statistics?

Definition — Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures.

In layman’s terms — Descriptive statistics is where we summarize a sample of data using plots, such as bar plots and histograms, or numeric summaries, such as the mean, median and standard deviation.
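As a rough illustration of numeric summaries and a simple plot, here is a minimal Python sketch using only the standard library. The `ages` values are made up for the example, and the helper names `describe` and `text_histogram` are hypothetical, not part of any library.

```python
import statistics

# Hypothetical sample: ages of 10 survey respondents (made-up values).
ages = [19, 21, 20, 22, 21, 23, 20, 21, 35, 21]


def describe(sample):
    """Return a few common numeric summaries of a sample."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),  # sample standard deviation (divides by n - 1)
    }


def text_histogram(sample):
    """Build a crude text histogram: one '#' per observation of each value."""
    counts = {}
    for value in sample:
        counts[value] = counts.get(value, 0) + 1
    return [f"{value:>3} | {'#' * counts[value]}" for value in sorted(counts)]


summary = describe(ages)
print(summary)
print("\n".join(text_histogram(ages)))
```

Note how the single value 35 pulls the mean above the median, which is one reason to look at more than one summary and at a plot.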

What is inferential statistics?

Definition — Inferential statistics use a random sample of data taken from a population to describe and make inferences about the population.

In layman’s terms — Inferential statistics is where we try to infer something about a population, using a sample to generalize to that population. It can be broken down into three areas: estimation, hypothesis testing and prediction. Before doing inference, we first summarize or describe the data, and then use those summaries to make the inference.

How can we explain estimation, hypothesis testing and prediction?

To understand the above terms, we can think of certain questions that arise under each term.

Estimation — What is the average salary of a CEO?

Hypothesis testing — Do CEOs who are 6 feet or taller earn more on average than those who are not?

Prediction — For a particular CEO, what does our model estimate the salary would be?
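To make the three questions concrete, the sketch below answers each one in Python on simulated data. Everything about the data is an assumption made for illustration: the 200 CEOs, the heights, and the salary formula are invented, and the permutation test and least-squares line are just one possible choice of method for each question.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Simulated, entirely hypothetical CEO data: (height in inches, salary in $1000s).
# The salary formula is an assumption made only to have numbers to work with.
ceos = []
for _ in range(200):
    height = random.gauss(70, 3)
    salary = 500 + 10 * (height - 70) + random.gauss(0, 50)
    ceos.append((height, salary))

heights = [h for h, _ in ceos]
salaries = [s for _, s in ceos]

# Estimation: the sample mean estimates the average CEO salary, and the
# standard error indicates how precise that estimate is.
mean_salary = statistics.fmean(salaries)
std_error = statistics.stdev(salaries) / len(salaries) ** 0.5

# Hypothesis testing: do CEOs who are 6 feet (72 in) or taller earn more on average?
tall = [s for h, s in ceos if h >= 72]
not_tall = [s for h, s in ceos if h < 72]
observed_diff = statistics.fmean(tall) - statistics.fmean(not_tall)


def permutation_p_value(group_a, group_b, observed, n_permutations=1000):
    """One-sided permutation test: the share of random relabellings whose
    difference in means is at least as large as the observed difference."""
    pooled = group_a + group_b
    count = 0
    for _ in range(n_permutations):
        random.shuffle(pooled)
        diff = (statistics.fmean(pooled[:len(group_a)])
                - statistics.fmean(pooled[len(group_a):]))
        if diff >= observed:
            count += 1
    return count / n_permutations


p_value = permutation_p_value(tall, not_tall, observed_diff)

# Prediction: fit a least-squares line salary = intercept + slope * height,
# then predict the salary of one particular (hypothetical) 74-inch-tall CEO.
mean_height = statistics.fmean(heights)
slope = (sum((h - mean_height) * (s - mean_salary) for h, s in ceos)
         / sum((h - mean_height) ** 2 for h in heights))
intercept = mean_salary - slope * mean_height
predicted_salary = intercept + slope * 74

print(f"Estimated mean salary: {mean_salary:.1f} (standard error {std_error:.1f})")
print(f"Tall minus not-tall difference: {observed_diff:.1f}, p-value {p_value:.3f}")
print(f"Predicted salary at 74 in: {predicted_salary:.1f}")
```

A small p-value here would suggest the observed salary gap is unlikely to arise from random labelling alone; it says nothing on its own about why the gap exists.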

Basic vocabulary of statistics:

  1. Unit/Subject — These are the entities on which data is collected. A unit is sometimes called a person or an individual.
  2. Variable — This is a characteristic recorded for each unit or person.
  3. Population — The population is the group of interest for a study, i.e., who or what we are interested in studying.
  4. Sample — The sample is the subset of the population that we actually study.
  5. Population parameter — The parameter is the quantity we’d like to know for the entire population.
  6. Sample statistic — The sample statistic is the estimate of the parameter computed from our sample (Example — We would like to know the mean age of the entire population; that’s our parameter. We take a sample of data, say a few hundred individuals, and calculate their mean age; that’s our sample statistic. The sample mean is our best guess at the population parameter).
  7. External validity — External validity refers to the question: Is the estimate we get from our sample generalizable to an external population? In other words, how well can our sample estimate step out and represent an external population?
  8. Internal validity — Internal validity refers to the question: Is our sample estimate (our sample statistic) biased, and in particular, is there any confounding?
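The difference between a population parameter and a sample statistic can be sketched in a few lines of Python. The population below is simulated, which is the only reason its true mean can be computed at all; in a real study the parameter is unknown. The population size, age range and sample size are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 100,000 people (simulated, not real data).
population = [random.randint(18, 90) for _ in range(100_000)]

# Population parameter: the mean age of the entire population.
population_mean = statistics.mean(population)

# Sample: a simple random sample of a few hundred individuals.
sample = random.sample(population, 300)

# Sample statistic: the mean age in the sample, our estimate of the parameter.
sample_mean = statistics.mean(sample)

# How far the estimate landed from the parameter in this particular sample.
estimation_error = sample_mean - population_mean

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
print(f"Estimation error:            {estimation_error:.2f}")
```

Drawing a different random sample would give a slightly different sample mean, while the population mean stays fixed.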

Illustration:

Suppose we’d like to know whether the risk of depression decreases if someone exercises regularly, and whether this holds for university students.

So, we take a sample of 5000 university students from one particular university and ask them survey questions about exercise and depression. We would then like to generalize back to the population of university students in general. The vocabulary maps onto this study as follows:

  • Unit: a university student.
  • Variables: exercise (yes or no) and depression (yes or no).
  • Population: university students in general, the group we’d like to generalize to.
  • Sample: the 5000 students from one particular university that we surveyed.
  • Population parameter: the difference in depression rates between those who exercise and those who don’t, in the entire population.
  • Sample statistic: the difference in depression rates between those who exercise and those who don’t, in our sample.
  • External validity: Is the result from our study generalizable to the general population of university students? Because all 5000 students come from one university, this is not guaranteed.
  • Internal validity: Is the estimate within our study biased or not, for example by confounding?
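The sample statistic in this illustration, the difference in depression rates, could be computed as in the following Python sketch. The counts are invented purely for the example (they only need to add up to the 5000 surveyed students), and the helper name `depression_rate` is hypothetical.

```python
# Hypothetical survey counts for the 5000 sampled students (made-up numbers).
counts = {
    ("exercises", "depressed"): 300,
    ("exercises", "not depressed"): 2200,
    ("does not exercise", "depressed"): 500,
    ("does not exercise", "not depressed"): 2000,
}


def depression_rate(group):
    """Share of students in the given exercise group who report depression."""
    depressed = counts[(group, "depressed")]
    total = depressed + counts[(group, "not depressed")]
    return depressed / total


rate_exercise = depression_rate("exercises")
rate_no_exercise = depression_rate("does not exercise")

# Sample statistic: difference in depression rates between the two groups.
rate_difference = rate_exercise - rate_no_exercise

print(f"Rate among those who exercise:       {rate_exercise:.2%}")
print(f"Rate among those who do not:         {rate_no_exercise:.2%}")
print(f"Difference (exercise - no exercise): {rate_difference:+.2%}")
```

A negative difference in the sample would be consistent with a lower depression rate among students who exercise, but whether it reflects the population parameter depends on the external and internal validity questions above.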

Summary

This article gave a simple explanation of statistics and its two main types. Some of the important vocabulary of statistics was illustrated with the help of an example.
