Significant fitness

Last September, in my efforts to recover from a knee injury, I hired a personal trainer. Our original plan ended on 31 December 2021, and he added the slogan “New Year, New You” to it. It’s now the new year and, thankfully, I have made big progress. I have just committed to another three months, and I’m really looking forward to every single upcoming workout. It’s fun! However, as with many new things, there are new insights and … surprise, surprise, some are statistical.

My first insight is that even though interpreting science is an essential activity for fitness professionals, statistical training does not seem to be required in many degree-granting programs (such as here and here). Why not? Personally, I hold the statistical community at least partially responsible: I think we have generally failed to market the value of statistics education to consumers of scientific information. We teach a lot of courses about producing information, data, statistics, models, and insight, but far fewer about interpreting and consuming them.

My second statistical insight is that there is a whole world of science consumers out there in the fitness industry. These professionals also have a unique role in science communication. Science that is consumed and communicated by fitness professionals shapes the day-to-day decision-making of millions of individuals through direct communication and word-of-mouth. The situation is similar to that of doctors and nurses, in that science is consumed directly from original research or indirectly through media summaries, and then communicated to individuals to aid in their personal decision-making: what to eat, what medications to choose, what to avoid, etc. It is also similar to that of journalists, who communicate scientific research to the masses, picking out the most interesting or relevant new findings for their audience. But in my experience, this large group of fitness professionals receives much less attention from the statistics community.

Photo by Philip Ackermann

Statisticians quietly agree (I think, right?) that better statistical understanding can make the world a better place, but it’s not always clear where to invest. The fitness industry includes an entire cadre of health enthusiasts who want to provide the best available decision-making guidance but who may have little statistical training (or enthusiasm) and may not even see themselves, explicitly, as science consumers and communicators. In reality, they are often the bridge between science and action, and they have the power to choose the best or the worst of what science has to offer. We are not going to reach them with a short course at a statistics conference, and we are unlikely to introduce a new statistics course into the curricula of all the relevant degree and certificate programs. We might, however, be able to design an influential lecture, article, or YouTube video. So I ask: if we had the attention of every fitness professional in the world for one hour, what would we want to share?

I’d start with some habits of efficient skepticism:

Just because it is published doesn’t make it true. This one is a gut punch. I don’t mind saying this to budding statisticians or over-confident scientists; I am asking them to imagine a higher standard than simple publication. But saying it so explicitly to science consumers puts our collective failure on display. Alas, it’s the truth and an essential starting point for professional science consumption.
If it is exactly what you wanted to hear, it probably isn’t true. Call this the “nope, eating chocolate does not lead to weight loss” principle.
If it is really surprising, it probably isn’t true. This is not to say that revolutions in understanding are impossible. Revolutions often begin with one surprising study, but there are many surprising studies and only a very few lead to revolutions in understanding. True revolutions in understanding happen through the cumulative impact of many studies. By the time we start feeling confident in a new finding, it is generally not surprising anymore. Additionally, the mathematics behind statistical testing, at the level of an individual study, implies that if a finding is very surprising, it probably is not true! Why? Imagine a study that finds a positive health impact of choice A over choice B. It could be a choice in any realm, such as eating, stretching, or taking supplements. The principles of statistical testing work well when there is about an even chance that the choice has an impact on health. But they crumble when our existing belief is that an impact of the choice on human health is very unlikely, because then most of the “statistically significant” results will be false positives: chance findings that pass the test even though there is no real effect. To say it another way, the principles of statistical testing crumble when the result is very surprising. Recent research also suggests that flashier headlines are simply waved through the review process more easily. If that effect turns out to be real, it would clearly indicate that scientific reviewers misunderstand this particular principle of efficient skepticism.
It is good practice to wait until several studies confirm the same idea before using that idea to provide decision-making guidance.
If it sounds definitive, be extra skeptical. In general, there are no silver bullets in the fitness or nutrition world. The more that is known, the less definitive things tend to become. Though there are plenty of habits which strongly impact risk (smoking, exercising, consumption of saturated fats), there is no one food or one exercise that will definitively change all people’s health outcomes.
Variation is part of the truth. Choice A may, on average, have a positive effect but, for a particular individual, have a negative effect. This idea is impressively difficult to communicate to high-level decision-makers such as managers and policy-makers. Happily, I expect it will be self-evident to fitness professionals, perhaps because they observe the outcomes of so many small decisions or perhaps because the idea that every body reacts differently to the same exercise or nutrition plan is well-ingrained in fitness training.
Follow the money. When a scientist is making a lot of money or promoting their findings in a for-profit context, be skeptical. It can be true that science leads to profit. But where there are profits, the odds of strong science are reduced. High-profile cases of corporate greed in the healthcare industry are not surprising. I have no specific evidence against particular scientists*, and linking to particular news articles could be slanderous, so I won’t do it here; but consider the longevity industry, the supplement industry, etc. Remember snake oil?
*Absence of evidence is not evidence of absence. This has two implications. First, a finding of “no statistically significant difference” does not imply that two things are the same. It is well known that statisticians and scientists must communicate this clearly, but mountains of miscommunication remain. Second, a lack of published research on a particular treatment does not indicate that the treatment is not beneficial. Take cupping, for example (yup – I have been trying a lot of things to get past this injury). An article in Forbes outraged me and could easily lead to its own post here on Statisticians React to the News. Suffice it to say, the story angrily describes a scientist stating that cupping “works for me” despite a lack of scientific evidence for its benefits. The risks, which are nearly negligible for occasional simple treatments, are then fairly dramatically overstated by focusing on rare effects, effects of excessive use, and effects of extreme variations. Do we need scientific evidence to express our experience? To state that a massage feels good or that a meal with friends generally lifts our mood? Nope. We can state our own experience with certainty even in the absence of statistical evidence of an effect on the larger population. And, yes, yes, when we state our own experience, we should be careful to avoid confirmation bias, to acknowledge the potential for placebo effects, and to resist extrapolation.
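For readers who like to see the arithmetic, here is a rough sketch of the “if it is really surprising, it probably isn’t true” principle in Python. It uses the standard false-positive (positive predictive value) reasoning rather than anything from a specific study, and the inputs are assumptions chosen for illustration: a study with 80% power (its chance of detecting a real effect) and the conventional 5% significance threshold. The function name prob_true_given_significant is hypothetical. The only thing we vary is the prior, our belief before the study that choice A really beats choice B.

```python
# A sketch of why surprising findings deserve extra skepticism.
# Assumed, illustrative inputs (not taken from any particular study):
#   prior: belief before the study that choice A truly beats choice B
#   power: chance a study detects a real effect (0.8 is a common design target)
#   alpha: false positive rate of the significance test (conventionally 0.05)


def prob_true_given_significant(prior: float, power: float = 0.8, alpha: float = 0.05) -> float:
    """Share of 'statistically significant' findings that reflect a real effect."""
    true_positives = power * prior          # real effects that the study detects
    false_positives = alpha * (1 - prior)   # no real effect, significant by chance
    return true_positives / (true_positives + false_positives)


for prior in (0.5, 0.1, 0.01):
    share = prob_true_given_significant(prior)
    print(f"prior belief {prior:>4}: {share:.0%} of significant results are real")
```

Under these assumed inputs, about 94% of significant results reflect a real effect when the prior is an even chance, about 64% when it is 1 in 10, and only about 14% when it is 1 in 100. That is the sense in which statistical testing works when there is about an even chance of an effect and crumbles when the result is very surprising.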

Photo by Andrea Piacquadio

So, with these habits of efficient skepticism, we have hopefully dispelled the idea that consuming science is a simple or passive process.

With our remaining time, we would then need to provide some straightforward guidance on what to do: how best to consume scientific information so as to minimize the odds of making (or encouraging) poor health decisions. I classify the full evaluation of scientific quality and the synthesis of scientific results as “higher scientific skills,” skills requiring more experience than even producing science. The gory details take time and experience and are well beyond a one-hour presentation or a blog post. It is, however, entirely reasonable to provide a few useful guiding questions for screening scientific quality and applicability.

Guiding questions for screening scientific results

If possible, evaluate the original published scientific study. It is not necessary to sit down and read every word. A lot of useful information can be gained by reading the abstract and scanning the methods and conclusions.

Is the journal prestigious? If the journal is highly prestigious, amazingly enough, you might want to be a little skeptical. Highly prestigious journals tend to publish highly surprising results (see above). If the journal charges high fees to publish, be highly skeptical.
What is the sample size? If the sample size is pretty small (< 100), be pretty skeptical. As the sample size goes up, so should your confidence in the results. As the sample size gets really large (>10,000), check that the reported difference between, say, choice A and choice B would be meaningful in your situation. The reporting of a small difference (even if labeled statistically significant!) does not imply that a large difference also exists.
Who were the study subjects? A study of mice does not necessarily pertain to people. A study of people provides results relevant to similar people. Look carefully at the study participants in terms of body size, gender, age, race, affluence, location, health status, access to healthy food, etc. A study of paid participants, for example, does not necessarily imply the same results in a fit, affluent population. If a study reports, for example, “we controlled for body size,” it means that among these study participants the reported effect of, say, choice A does not seem to be due to differences in body size. It does not mean that the results apply to body sizes that were not included in the study.
What is the effect? Excellent studies report the effect size explicitly. Is that effect size meaningful to an individual? For many studies, the effect of choice A versus choice B must be estimated by subtracting the average values for the two groups, reading a value off a graph, or using more complicated methods. It is worth the time to estimate it and think about how meaningful it might be to particular individuals.
What was the range of the results? Is the effect sometimes negative, implying a risk? Was the effect highly variable, indicating that personal choices based on the results might also have highly variable outcomes?
What do you think of the data? If the raw data are available (as they should be), take a look at them instead of relying on averages! If the data are plotted as a scatter of dots around a line, imagine those data without the modeled lines. Would other lines have been just as plausible? Imagine the data without each of the extreme observations – removing them one by one in your mind’s eye. Do the study conclusions hold up or are they dependent on an extreme data point?
What is the risk compared to? Poor science may report a value for choice A without explicitly stating what that effect was compared to. For example, “eating eggs improves health outcomes”. Is that compared to eating donuts? Or compared to a healthy diet which included fewer eggs? The results of the study are only relevant to personal decision-making if the comparison in risk is relevant.
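Two of these questions, sample size versus effect size and the “remove the extreme points one by one” exercise, can be sketched in a few lines of Python. Everything here is an assumption for illustration: the 0.2 kg difference, the 5 kg standard deviation, the simple two-sample z-test with a known standard deviation, and the small made-up data set in part 2. The function names (two_sided_p, slope, leave_one_out_slopes) are hypothetical, not from any study or library.

```python
from math import sqrt
from statistics import NormalDist, fmean


def two_sided_p(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sample z-test p-value, assuming both groups share a known SD."""
    se = sd * sqrt(2 / n_per_group)          # standard error of the difference
    z = abs(diff) / se
    return 2 * (1 - NormalDist().cdf(z))


def slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def leave_one_out_slopes(xs: list[float], ys: list[float]) -> list[float]:
    """Refit the slope once per observation, each time leaving that one out."""
    return [slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(len(xs))]


# Part 1 (hypothetical numbers): the same 0.2 kg average difference
# (SD 5 kg) evaluated at a small and a very large sample size.
for n in (50, 20_000):
    print(f"n = {n:>6} per group: p = {two_sided_p(0.2, 5.0, n):.5f}")

# Part 2 (made-up data): one extreme participant drives the trend.
xs = [1, 2, 3, 4, 5, 6, 20]
ys = [3.0, 2.5, 3.2, 2.8, 3.1, 2.9, 9.0]
print(f"slope with all points: {slope(xs, ys):.3f}")
for i, s in enumerate(leave_one_out_slopes(xs, ys)):
    print(f"  without point {i}: slope = {s:.3f}")
```

With these assumed numbers, the same 0.2 kg difference gives p ≈ 0.84 with 50 people per group and p < 0.001 with 20,000 per group; the “significant” label changes, but whether 0.2 kg matters to your client does not. In part 2, the slope is about 0.34 with all seven points and stays between roughly 0.34 and 0.37 when any one ordinary point is dropped, but falls to roughly 0.03 without the single extreme participant.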

If you are only able to read a summary, such as a report, a newspaper article, or a blog post:

Is the original scientific literature linked and/or cited?
How legitimate is the summary source? Do other sources confirm the same results?
Did the writer find quotations from scientific sources not associated with the original research? This is good practice.
Did the writer include a base of scientific information on which the new study results are founded and indicate efficient skepticism (as above)?
Did the writer communicate important information, as above, from the original article (sample size, effect size, study subjects, range of results, useful comparative risk estimates)?
Finally, check your own personal reaction. Does the story report science that you want to be true or that seems surprising? If so, be careful! Put the second and third principles of efficient skepticism (above) into action.

I believe that would be a pretty action-packed hour and could easily be extended to a full day, week, year, or beyond (like my fitness training!). But, if there were five minutes left, what would you add to the Significant Fitness Bootcamp?