Dangers of combining averages

28 January 2022

Misinterpreting statistics is not rare and happens everywhere in the world. Very often it happens through the use of averages, since many understand statistics as the “science of averages,” but it also happens with percentages, indices and other statistics.

It is an opportunity to recall Galton’s words about the charm of statistics:

It is difficult to understand why statisticians commonly limit their inquiries to Averages, and do not revel in more comprehensive views. … An Average is but a solitary fact, whereas if a single other fact be added to it, an entire Normal Scheme, which nearly corresponds to the observed one, starts potentially into existence.

        Sir Francis Galton, Natural Inheritance (1889), 62-3

Statistics can be misinterpreted by journalists, various citizens’ associations, and independent researchers. We would hope that professional statistical organizations never do so, but unfortunately they too sometimes contribute to the problem.

The previous two posts from this blog (from 8 and 19 October last year) reminded me of one such event, which happened in Bosnia and Herzegovina less than a month before the posts were written.

It was a press release of the Institute for Statistics of the Federation of Bosnia and Herzegovina (FIS)[1] in which the average monthly consumption expenditure per household in the Federation of Bosnia and Herzegovina[2] was presented.

To make the concept of household consumption expenditure more concrete: it consists of the costs incurred by a resident household on individual goods and services that are used to satisfy its needs or wants. These expenditures cover things like food and beverages, clothing and footwear, leisure and restaurants, but also needs such as housing, water, electricity, transportation, health, education and other expenditures not covered by social transfers of the state. Also covered are goods and services that households produce and consume themselves (for example, self-produced vegetables from a garden consumed by the household). An estimate of household final consumption expenditure is calculated as the monetary value (in local currency) of all goods and services the household spent money on in a specific period of time (usually a month or a year).

In this press release, a number for an “average household” was simply constructed by combining (only) two averages: the average consumption per household and the average household size. The statistical institute reported that “…the average monthly household consumption with an average number of members (3.0 members) in the Federation of Bosnia and Herzegovina in 2015 amounted to 1,508.04 BAM.” At first glance, there does not appear to be anything wrong with information presented in this way. However, there is a problem with this way of creating and interpreting such statistics, which I will explain in the following paragraphs.

What kind of problem does this create for those interpreting the statistics?

Although it cannot be concluded from this press release that FIS directly misinterpreted the results of the survey, the way the results were delivered to the media quite certainly led interpreters of the statistics (first the media themselves, and then their readers) to a wrong conclusion: equating the consumption of a three-member household with the consumption averaged over all households in the population. These are not the same! Two correct survey estimates, the average monthly consumption per household (1,508.04 BAM) and the average household size (3.0 members), were combined to suggest a third quantity, the average monthly consumption of a three-member household, for which the survey's own estimate is 1,636.25 BAM. Several news agencies then reported the combined figure as the consumption of an average, three-member household. The mistake stemmed from the statistical institute mixing two populations: the population of all households and the population of three-member households. The quantity for the latter reported in several media outlets was 7.8% lower than the actual survey estimate published in the survey bulletin.
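The 7.8% figure can be checked with simple arithmetic. The short Python sketch below uses only the two amounts quoted in this post; the function name `percent_lower` is made up for illustration and is not part of any survey tooling.

```python
# Amounts quoted in this post, in BAM per month.
overall_mean = 1508.04       # average over all households (what the media reported)
three_member_mean = 1636.25  # survey estimate for three-member households


def percent_lower(reported: float, actual: float) -> float:
    """Return how much lower `reported` is than `actual`, in percent of `actual`."""
    return (actual - reported) / actual * 100


gap = percent_lower(overall_mean, three_member_mean)
print(f"Reported figure is {gap:.1f}% lower than the survey estimate")
```

The gap is measured relative to the actual three-member estimate, which is the comparison the sentence above describes.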

What is the source of this problem?

I see two sources of this misinterpretation problem. 

The first source of the problem is the inadvertent linking of two similar, but still different, populations. This led users of the reported statistics into a common trap: failing to distinguish between the two populations when combining already available statistics.

The second source of the problem has a broader context and relates to the level of statistical knowledge of data users. Journalists in Bosnia and Herzegovina have had basic statistics courses as part of their education to encourage appropriate use of statistics in their work. However, this is a rather subtle mistake for those not used to thinking about it, and it seems that the journalists in this case relied on the authority of the statistical office and simply transferred part of the press release into their reports. That is understandable, since the official statistics office should be a reliable source of statistics in every country. But in this particular case, journalists accidentally used the reported statistics out of context and conveyed a misleading conclusion. What is more, in order to avoid the “horror aequi” problem and to enrich their text compared with the stereotypical and sometimes boring wording of statistical press releases, they put the mistaken conclusion in the titles of their reports as the main message to readers.

Here are graphs (produced from the final 2015 HBS dataset) representing the distributions of household consumption expenditure of the two mentioned populations, along with basic summary statistics  describing central tendency and variability.

[Figures: distributions of monthly household consumption expenditure for all households and for three-member households, with summary statistics]

Source: 2015 Household Budget Survey in Bosnia and Herzegovina, author’s calculation

These distributions and statistics are very good evidence of the problem with collections of averages (or medians) that Megan Higgs explained in her recent post for this blog. It is important that, in describing a target population, presenters of statistics avoid combining multiple averages or medians, since this approach sometimes leads to wrong statistical inferences. And, in line with what Sir Francis Galton said a long time ago, using only the average is a poor way to describe a target population and can be misleading.

As a proficient user of data and statistics, it is easy for me to make the above plots to help understand differences between the two household populations – in terms of their consumption expenditure and number of members.  I recognize it is not so easy for most users interested in the statistics.  With the help of the plots and summary statistics, here are a few straightforward conclusions about the data from the Federation of Bosnia and Herzegovina:

Average consumption per household over all households, regardless of size, amounts to 1,508.04 BAM. The estimated number of households is 652,129.

Household sizes are between 1 and 13 members and the most common household size is 2 members.

Average consumption for three-member households amounts to 1,636.25 BAM. The estimated number of three-member households is 128,582.

Three-member households, by definition, have no variation in household size (they are all 3!), but have variability in consumption expenditures (standard deviation = 1,082.612 BAM).

Distributions of consumption for both household populations are skewed toward higher expenditures (some high consumption households show up in the right side, or tail, of the distribution).

The distribution of consumption over the population of all households shows more variability (is more spread out) than the distribution of consumption for three-member households only, which is not surprising given that three-member households are a subset of the larger population of all households.

Although the average household sizes are almost the same (3.0 members) for both populations, the average consumption per household is lower in the population of all households for the following reasons:

  • this average is calculated using the consumption expenditures of all households, sized from 1 member (minimum size) to 13 members (maximum size), as opposed to the restricted group of only 3-member households,
  • the consumption expenditure is related to the household size (the larger the household, the larger the consumption expenditure on average),
  • there are more households with fewer than 3 members than households with more than 3 members, so the consumption expenditures of smaller households play a bigger role in the calculation of the overall average and drag it towards lower values.
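The reasoning in the list above can be illustrated with a small Python sketch. All numbers in it are invented for illustration and are NOT taken from the Household Budget Survey: a toy population is built so that the average household size is exactly 3.0, smaller households outnumber larger ones, and consumption grows with size. The toy data also assume that each additional member adds less to consumption than the previous one.

```python
# Hypothetical toy population (NOT survey data):
# household size -> (number of households, mean monthly consumption in BAM)
toy = {
    1: (20, 800.0),
    2: (30, 1300.0),
    3: (20, 1650.0),
    4: (10, 1900.0),
    5: (10, 2100.0),
    7: (10, 2400.0),
}

total_households = sum(count for count, _ in toy.values())

# Average household size, weighted by the number of households of each size.
mean_size = sum(size * count for size, (count, _) in toy.items()) / total_households

# Average consumption over ALL households, weighted the same way.
overall_mean = sum(count * spend for count, spend in toy.values()) / total_households

# Average consumption of three-member households only.
three_member_mean = toy[3][1]

print(f"average household size:          {mean_size:.1f}")
print(f"average over all households:     {overall_mean:.2f}")
print(f"average of 3-member households:  {three_member_mean:.2f}")
```

In this made-up population the average household has 3.0 members, yet the average over all households (1,520 BAM) is lower than the average for three-member households (1,650 BAM). The "average household with the average number of members" is therefore not the same thing as a three-member household.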

In summary

If the producers of statistical data were aware of the recommendations of Sir Francis Galton (19th century) and Neil Sheldon (recently, in his post on this blog), they would certainly offer users at least the summary statistics of consumption by related household variables to allow for better communication of the data. In doing so, they should present survey results in a consistent and clear manner, taking into account the ability of readers to understand the information coming from the data, and should not leave readers to work out for themselves when statistical quantities have been combined in a way that results in misleading statistics.

From the other point of view, journalists face the “horror aequi” problem in their work. Is it their job to check results produced by an official statistical office? Probably not. Still, when explaining statistical results, journalists should give priority to accurate language over the attractiveness of the text they write.

In order to achieve such goals, it is certainly necessary to work on better communication between data producers and data users. Journalists play a very important role among the latter, since they mediate between statistical institutes and a wide range of individuals, many of whom are not proficient users of data and statistics.

[1] It is not the National Statistical Institute of Bosnia and Herzegovina, but the statistical office of one of the two Bosnian administrative entities.

[2] Administrative entity covering 51% of the territory of Bosnia and Herzegovina.

Edin Šabanović
Bosnia and Herzegovina