18 Comments
The Great Santini

Well, the question matters too. And the hidden assumptions. Are dark-skinned players more likely to actually commit fouls because of playing style? Are there particular players who are much more likely to commit fouls and who happen to be dark-skinned? The implication is that the fouls are being called in a prejudiced manner. But perhaps not. I could run a hypothetical study on violent-crime arrest rates by sex. I'd find that men are much more likely to be arrested for violent crimes. Do I then argue that we need to arrest more women for violent crimes to achieve equity? Or that we should release men who are arrested for violent crimes to achieve equity? Or do I recognize that men are arrested more often for violent crimes because they are more likely to commit violent crimes, for a variety of reasons (physical strength, opportunity, societal expectation, testosterone, etc.)?
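A tiny simulation makes the base-rate point concrete (the foul and red-card rates below are purely hypothetical, chosen only to illustrate the confounding argument):

```python
import random

random.seed(0)

# Hypothetical: two groups of players with different underlying foul rates,
# refereed with NO bias (identical red-card probability per foul).
FOUL_RATE = {"group_a": 0.10, "group_b": 0.12}  # chance of a foul per match (assumed)
RED_PER_FOUL = 0.05                             # identical for both groups

def red_cards_per_match(group, n_matches=100_000):
    fouls = sum(random.random() < FOUL_RATE[group] for _ in range(n_matches))
    reds = sum(random.random() < RED_PER_FOUL for _ in range(fouls))
    return reds / n_matches

# Group B ends up with ~20% more red cards per match, with zero referee bias:
print(red_cards_per_match("group_a"), red_cards_per_match("group_b"))
```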

Kirsten

Thank you, this is great. I love it when different points of view, or interpretations of data and facts, are analyzed on Substack.

Matt Perri

Regarding statistical thresholds: Mudge et al., "Setting an Optimal Alpha That Minimizes Errors in Null Hypothesis Significance Tests," PLoS ONE. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0032734

Balancing Type I and Type II errors in the quest for optimal significance testing seems important, subject to the caveat that both error types matter to a specific investigation.
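A minimal sketch of the Mudge et al. idea, assuming a one-sided z-test and made-up values for the effect size, sample size, and the relative weight placed on Type I errors:

```python
from scipy.stats import norm

def optimal_alpha(effect_size, n, weight_type1=0.5):
    """Grid-search the alpha that minimizes the weighted sum of error rates."""
    noncentrality = effect_size * n ** 0.5
    best_alpha, best_cost = None, float("inf")
    for i in range(1, 1000):
        alpha = i / 1000
        z_crit = norm.ppf(1 - alpha)
        beta = norm.cdf(z_crit - noncentrality)  # Type II error rate at this alpha
        cost = weight_type1 * alpha + (1 - weight_type1) * beta
        if cost < best_cost:
            best_alpha, best_cost = alpha, cost
    return best_alpha

# With a modest effect and n = 50, the optimum can sit well above 0.05:
print(optimal_alpha(effect_size=0.3, n=50))
```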

Catalin Popescu

Aren't you supposed to decide which method of analysis you'll use BEFORE starting a study, and REGISTER it beforehand, precisely to avoid this?

Dell

Very good exercise! The studies show that statistics count, and that how they are interpreted matters.

However, IMO, the starting point is being avoided.

Were all teams from the UK?

Most of the judges I see on TV are not White.

What is the race of the referees? And spectators?

Is it Black, White, Hispanic, etc., referees who bias their refereeing?

Do Asian, Mediterranean, African, and Central and South American teams show the same findings?

What percentage of these teams' players and referees are White or otherwise? Any connections?

In hockey there is an appointed tough guy whose job is to rough up opposing players, and who might be penalized above average.

Does soccer have the same situation: one or two players on each team who play rough and are known to be penalized?

Are these players White, Black, etc., and what is the racial makeup of the spectators and refs for the team and their opponents?

Now analyze the 29 teams.

Bonnie Scheckenbach

These last 2.5 years have highlighted how important it is to find people who are smarter than you, more experienced than you, and whose judgment you implicitly trust to help sort through the sand that is modern medicine. I am grateful to have discovered the Sensible Medicine group.

The Great Santini

In God we trust; everyone else, bring data. And expect that data to get the "wire brush treatment": check and cross-check. Are the instruments calibrated? When? What do the people closest to the situation think about it? What have they seen? Are there confounders in the situation? Does this line up with our theoretical understanding, or contradict it? With prior experience, or contradict it? Can we replicate the results? When errors are found, are they admitted and corrected? Or covered up?

HardeeHo

As I recall from long ago, we try to estimate some effect in a population from which we extract samples. We often assume that the population has some underlying distribution (normal, mostly, but there are others as well). The various analytic techniques rely on assumptions about that population, and most of us simply take for granted that the assumptions are correct. In the red card case, perhaps there were not enough samples?
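On the "not enough samples" question, a rough power simulation (the baseline red-card rate and sample sizes here are assumptions, not the paper's numbers) shows how much data an odds ratio near 1.2 needs before it reliably reaches p < 0.05:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
p0 = 0.03                    # assumed baseline red-card rate
odds = p0 / (1 - p0) * 1.2   # apply a true odds ratio of 1.2
p1 = odds / (1 + odds)

def power(n, sims=2000):
    hits = 0
    for _ in range(sims):
        a, b = rng.binomial(n, p0), rng.binomial(n, p1)
        table = [[a, n - a], [b, n - b]]
        if chi2_contingency(table)[1] < 0.05:
            hits += 1
    return hits / sims

for n in (1_000, 10_000, 50_000):
    print(n, power(n))  # power stays low until n reaches the tens of thousands
```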

Lindy

To those who want the truth: the integrity of the researcher, motives and/or bias (recognized or not), outside influences, preconceived notions, and a well-designed, transparent process are just a few of the things that need to be considered. To assume the results of a study are 100% correct and then present the findings to large populations causes untold harm. A list could be made of today's commonly held "truths."

Dmitry

While it's true (and comforting) that the effect sizes are all in the same ballpark of 1.2, the practical difference between their being statistically significant or not is huge: in one case, the referees in question may well be suspended (or worse) for racial bias; in the other, no action will be taken. When Vinay reviews a paper online and the authors say something like "we observed a 2.5% decrease in mortality, though the result was not statistically significant," he always says: "You do not get to say that! Your results are not significant, meaning they may be due to random noise!"

Unfortunately, significance is far more fungible than the effect size (it is a second-order effect, while effect size is a first-order effect): different, often very subtle, assumptions about the error term in the model will lead to different standard error estimates, and thus different significance levels. These assumptions are very hard for a non-statistician to track and evaluate. Thus I agree with your original interpretation of the paper: the results are very troubling.

This is another, and subtler, form of p-hacking. In the "classic" form, we examine different effects until we find one that is significant; the significance measure (e.g., a t-test) stays the same, and the effect changes. An alternative is to keep the same effect (as in the cited paper) but keep adjusting modeling assumptions until you reach significance. Far harder to catch.
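To make this concrete, here is a small simulated sketch (the data-generating numbers and variable names are illustrative, not the paper's actual model): the logistic-regression coefficient is identical in both fits, but the standard error, and hence the p-value, changes depending on whether errors are assumed independent or clustered by referee:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n_refs, per_ref = 100, 30
groups = np.repeat(np.arange(n_refs), per_ref)     # which referee saw each decision
ref_effect = rng.normal(0.0, 1.0, n_refs)[groups]  # shared within-referee noise
x = rng.binomial(1, 0.5, n_refs * per_ref)         # hypothetical binary predictor
logit = -2.0 + 0.18 * x + ref_effect               # true log-odds ratio near log(1.2)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = sm.add_constant(x)
naive = sm.Logit(y, X).fit(disp=0)  # assumes i.i.d. errors
clustered = sm.Logit(y, X).fit(disp=0, cov_type="cluster",
                               cov_kwds={"groups": groups})  # referee-clustered SEs

# Same coefficient, different p-values:
print(naive.params[1], naive.pvalues[1])
print(clustered.params[1], clustered.pvalues[1])
```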

Chris

As someone with a graduate degree in one of the applied sciences (exercise science), I can tell you it is hopeless for the average American to interpret scientific studies. Heck, most people with undergraduate degrees in an applied science can't do it, never mind people with zero training in it. This is essentially how the CDC and the Biden administration were able to dupe most Americans into the idea that masks were effective and everyone needed to be vaccinated. The truth is murkier and requires the ability to analyze the studies rather than just take the CDC's word for it. The answer to almost every question in the applied sciences is almost always "it depends" and rarely a sweeping "yes" or "no."

Johnny Dollar

Not sure what to make of this. What is the 21% figure supposed to show or represent? Does it account for player and/or national traits (i.e., style of play, tactics, etc.)?

Aaron Abend

This article underscores that we should make teaching statistics a primary goal of our educational system. If our schools spent more time teaching statistics, and taught it to a broader swath of the school population, we would have better-educated patients and a better health system. The mathematics options in high school seem to focus on geometry, algebra, and calculus, all extremely important for the many people who will go on to become scientists or engineers. But everyone needs to understand statistics, and Bayesian reasoning about prior probabilities in particular.
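As a worked example of the Bayesian point (all numbers hypothetical): even an apparently accurate test is usually wrong when the prior probability is low.

```python
def posterior(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity
    p_pos = (prevalence * p_pos_given_disease
             + (1 - prevalence) * p_pos_given_healthy)
    return prevalence * p_pos_given_disease / p_pos

# 90% sensitivity, 95% specificity, 1% prevalence: a positive result
# still means only about a 15% chance of actually having the disease.
print(posterior(prevalence=0.01, sensitivity=0.90, specificity=0.95))
```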

Butch Skulsky

Research and journal reviews were a significant part of our residency program. What John Mandrola presented here, showing the large variation in odds ratios depending on the analytic approach, supports a critical view and a question we developed for reading peer-reviewed articles:

"It may be statistically significant, but is it clinically significant and relevant?"
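A quick illustration of that question, with made-up numbers: a large enough sample makes a clinically trivial difference "statistically significant."

```python
from scipy.stats import chi2_contingency

# 10.0% vs. 10.3% event rates in two arms of 100,000 patients each:
n = 100_000
events_a, events_b = 10_000, 10_300
table = [[events_a, n - events_a], [events_b, n - events_b]]
chi2, p, *_ = chi2_contingency(table)
print(f"p = {p:.4f}")  # below 0.05 despite a 0.3-percentage-point difference
```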

Dan Laub

Excellent breakdown of the challenges associated with relying on statistical analysis without considering alternative interpretations.

Zade

Very interesting. Thanks for this.

daniel corcos

In practice, doctors don't have enough time to examine the data. That effort is only necessary when the conclusions are contradictory or questionable.
