Open Access

Debunking mathematically the logical fallacy that cancer risk is just “bad luck”

EPJ Nonlinear Biomedical Physics 2015, 3:10

DOI: 10.1140/epjnbp/s40366-015-0026-0

Received: 24 September 2015

Accepted: 30 October 2015

Published: 1 December 2015

Abstract

Tomasetti and Vogelstein recently proposed that the majority of variation in cancer risk among tissues is due to “bad luck,” that is, random mutations arising during DNA replication in normal noncancerous stem cells. They generalize this finding to cancer overall, claiming that “the stochastic effects of DNA replication appear to be the major contributor to cancer in humans.” We show that this conclusion results from a logical fallacy based on ignoring the influence of population heterogeneity in correlations exhibited at the level of the whole population. Because environmental and genetic factors cannot explain the huge differences in cancer rates between different organs, it is wrong to conclude that these factors play a minor role in cancer rates. In contrast, we show that one can indeed measure huge differences in cancer rates between different organs and, at the same time, observe a strong effect of environmental and genetic factors in cancer rates.

Correspondence

Tomasetti and Vogelstein showed that the lifetime risk of cancers of many different types is strongly correlated (0.81) with the total number of divisions of the normal self-renewing cells maintaining organ-specific tissue’s homeostasis [1]. They conclude from this that the majority of variation in cancer risk among tissues is due to “bad luck,” that is, random mutations arising during DNA replication in normal noncancerous stem cells. Generalizing to cancer causation, they claim that “these stochastic influences are in fact the major contributors to cancer overall, often more important than either hereditary or external environmental factors.” In a review by Couzin-Frankel [2] of Tomasetti and Vogelstein’s article supported by an interview of Tomasetti, the above mentioned correlation is interpreted as excluding in large part the role of hereditary or environmental factors in the generation of cancers. Couzin-Frankel claims that Tomasetti and Vogelstein’s results “explained two-thirds of all cancers.”

Here, we show that this conclusion is fundamentally flawed, as it rests on neglecting the influence of population heterogeneity in correlations exhibited at the level of the whole population. Tomasetti and Vogelstein’s results quantify nicely that a large part of the differences in organ-specific cancer risk can be explained by the number of stem cell divisions in different tissues. But the logical fallacy is to extrapolate that, because environmental and genetic factors cannot explain the huge differences in cancer rates between different organs, then these factors play a minor role in cancer rates. In contrast, we show that one can indeed measure huge differences in cancer rates between different organs and at the same time observe a strong effect of environmental and genetic factors in cancer rates.

Tomasetti and Vogelstein’s article prompted a strong reaction in the scientific community (see, e.g., [3–5]), which in turn triggered a response from Tomasetti and Vogelstein [6]. To the best of our knowledge, the present article is the only one that addresses Tomasetti and Vogelstein’s work with a population model that rigorously deconstructs the statistical fallacy at the source of their conclusion.

To make our demonstration as clear as possible, we imagine a hypothetical population partitioned into two groups. The first group exhibits a much lower cancer rate than the second group. This may be due to hereditary and environmental factors playing an important role, in addition to the number of stem cell divisions in organs. We show that a correlation between lifetime cancer risk and the total number of stem cell divisions that holds within each group translates into an equal or higher correlation at the level of the whole population. This, however, says nothing about a possible heterogeneity in susceptibilities to external factors such as genetics or environment.

For each of the two groups we consider, we assume that the linear correlation of the type found in Ref. [1] holds:
$$ C_{i}^{(1)} = \beta^{(1)} S_{i}^{(1)} + \epsilon_{i}^{(1)}~, $$
(1)
$$ C_{i}^{(2)} = \beta^{(2)} S_{i}^{(2)} + \epsilon_{i}^{(2)}~. $$
(2)

\(C_{i}^{(1)}\) and \(C_{i}^{(2)}\) are the logarithms in base 10 of the lifetime cancer risks for group 1 and group 2, respectively, for organ tissue i. \(S_{i}^{(1)}\) and \(S_{i}^{(2)}\) are the logarithms in base 10 of the total numbers of divisions of stem cells in group 1 and group 2, respectively, for organ tissue i. \(\epsilon_{i}^{(1)}\) and \(\epsilon_{i}^{(2)}\) are the logarithms in base 10 of the contributions to lifetime cancer risks in the two groups in organ tissue i not explained by stem cell divisions (see Endnote 1). Finally, the coefficients \(\beta^{(1)}\) and \(\beta^{(2)}\) quantify the strength of the linear relation between \(C_{i}^{(j)}\) and \(S_{i}^{(j)}\), j=1,2, across all organ tissues.

The correlation between \(C_{i}^{(j)}\) and \(S_{i}^{(j)}\) is given by
$$ \text{Corr}[C_{i}^{(j)}, S_{i}^{(j)}] := { \beta^{(j)} \text{Var}\left[S_{i}^{(j)}\right] \over \sqrt{\text{Var}\left[C_{i}^{(j)}\right] \text{Var}\left[S_{i}^{(j)}\right]} } $$
(3)
We also introduce the covariance between \(C_{i}^{(j)}\) and \(S_{i}^{(j)}\) defined by
$$ \text{Cov}\left[C_{i}^{(j)}, S_{i}^{(j)}\right] := \beta^{(j)} \text{Var}\left[S_{i}^{(j)}\right] ~. $$
(4)
The variances of \(C_{i}^{(j)}\) are
$$ \text{Var}\left[C_{i}^{(j)}\right] := \left[\beta^{(j)}\right]^{2} \text{Var}\left[S_{i}^{(j)}\right] + \text{Var}\left[\epsilon_{i}^{(j)}\right]~. $$
(5)
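
Relations (3)–(5) can be checked numerically on synthetic data. The following is a minimal sketch, not part of the original analysis: it assumes that \(S_{i}\) and \(\epsilon_{i}\) are independent Gaussian variables, and the parameter values and helper names (`variance`, `covariance`, `correlation`) are arbitrary choices made for illustration only.

```python
import math
import random


def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def covariance(xs, ys):
    """Population covariance of two equally long lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


# Illustrative (assumed) parameters for a single group.
beta, sd_S, sd_eps = 0.5, 2.0, 0.7
n = 100_000  # many synthetic "tissues", only to reduce sampling noise

rng = random.Random(0)
S = [rng.gauss(10.0, sd_S) for _ in range(n)]
eps = [rng.gauss(0.0, sd_eps) for _ in range(n)]
C = [beta * s + e for s, e in zip(S, eps)]  # linear model of Eqs. (1)-(2)

var_S, var_eps = sd_S ** 2, sd_eps ** 2
cov_theory = beta * var_S                                    # Eq. (4)
var_C_theory = beta ** 2 * var_S + var_eps                   # Eq. (5)
corr_theory = cov_theory / math.sqrt(var_C_theory * var_S)   # Eq. (3)

print("Cov  sample vs theory:", covariance(C, S), cov_theory)
print("Var  sample vs theory:", variance(C), var_C_theory)
print("Corr sample vs theory:", correlation(C, S), corr_theory)
```

With these assumed values, the sample estimates should agree with the closed-form expressions up to sampling noise.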
We assume that the correlations
$$ \text{Corr}\left[C_{i}^{(1)}, S_{i}^{(1)}\right] = \text{Corr}\left[C_{i}^{(2)}, S_{i}^{(2)}\right] := \rho~, $$
(6)
are the same in both groups, while the incidence of cancers is much higher in the second group. How is this possible? To make the example simple, we assume that the rate of divisions of the normal self-renewing cells maintaining the homeostasis of a given tissue i is approximately the same for all members of our population, and thus the same in both groups. This amounts to assuming
$$ S_{i}^{(1)}=S_{i}^{(2)} := S_{i}~. $$
(7)
To keep our derivation simple, we assume that the logarithm in base 10 of the contribution to lifetime cancer risks not explained by stem cell divisions, namely \(\epsilon_{i}^{(j)}\) (j=1,2), has a mean value equal to zero and is solely characterised by its variance \(\text{Var}\left[\epsilon_{i}^{(j)}\right]\), as is the case for a Gaussian variable. Then, by definition, the corresponding contribution to the lifetime risk of cancers is \(\tilde{\epsilon}_{i}^{(j)}=10^{\epsilon_{i}^{(j)}}\), j=1,2. The mean value of \(\tilde{\epsilon}_{i}^{(j)}\) is then \(10^{{\ln 10 \over 2} \text{Var}\left[\epsilon_{i}^{(j)}\right]}, j=1,2\). This shows that the magnitude of lifetime cancer risks not explained by the number of stem cell divisions is controlled only by the variance \(\text{Var}\left[\epsilon_{i}^{(j)}\right]\), for j=1,2. Then, group 2 exhibits many more cancers than group 1 \(\left(C_{i}^{(2)} \gg C_{i}^{(1)}\right)\) in the following cases:
  (a) \(\beta^{(2)} \gg \beta^{(1)}\) (much larger sensitivity to stem cell divisions) while \(\text{Var}\left[\epsilon_{i}^{(1)}\right]\) and \(\text{Var}\left[\epsilon_{i}^{(2)}\right]\) remain of the same order of magnitude;

  (b) \(\text{Var}\left[\epsilon_{i}^{(2)}\right] \gg \text{Var}\left[\epsilon_{i}^{(1)}\right]\), while the sensitivities \(\beta^{(1)}\) and \(\beta^{(2)}\) to stem cell divisions remain similar;

  (c) \(\beta^{(2)} \gg \beta^{(1)}\) and \(\text{Var}\left[\epsilon_{i}^{(2)}\right] \gg \text{Var}\left[\epsilon_{i}^{(1)}\right]\).

     
Consider the identity linking \(\text{Corr}\left[C_{i}^{(j)}, S_{i}^{(j)}\right]\) and \(\text{Var}\left[\epsilon_{i}^{(j)}\right]\) versus \(\beta^{(j)}\) derived from (3) and (5),
$$ \text{Corr}[C_{i}^{(j)}, S_{i}^{(j)}] = \left[ 1 + {\text{Var}\left[\epsilon_{i}^{(j)}\right] \over (\beta^{(j)})^{2} ~\text{Var}[S_{i}]}\right]^{-{1 \over 2}}~. $$
(8)

Case (a) leads to \(\text{Corr}\left[C_{i}^{(1)}, S_{i}^{(1)}\right] \ll \text{Corr}\left[C_{i}^{(2)}, S_{i}^{(2)}\right]\), in contradiction with our assumption (6). Case (b) leads to \(\text{Corr}\left[C_{i}^{(1)}, S_{i}^{(1)}\right] \gg \text{Corr}\left[C_{i}^{(2)}, S_{i}^{(2)}\right]\), again in contradiction with (6). In fact, expression (8) implies that \(\text{Corr}\left[C_{i}^{(j)}, S_{i}^{(j)}\right]\) remains unchanged when \(\beta^{(j)}\) is increased arbitrarily while \(\text{Var}\left[\epsilon_{i}^{(j)}\right]\) is also increased proportionally to \(\left(\beta^{(j)}\right)^{2}\), since \(\text{Var}[S_{i}]\) is assumed to be the same in the two groups. Thus, the assumption (6) together with the identity (8) imposes case (c) as the only general possibility for \(C_{i}^{(2)} \gg C_{i}^{(1)}\).
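
The behaviour of identity (8) in the three cases can be illustrated with a few lines of code. This is only a sketch under stated assumptions: the numerical values, the scaling factor `k`, and the function names `corr_from_params` and `mean_unexplained_factor` are hypothetical, and the mean of \(10^{\epsilon}\) uses the expression given above, which presumes a zero-mean Gaussian \(\epsilon\).

```python
import math


def corr_from_params(beta, var_eps, var_S):
    """Correlation between C and S according to identity (8)."""
    return (1.0 + var_eps / (beta ** 2 * var_S)) ** -0.5


def mean_unexplained_factor(var_eps):
    """Mean of 10**eps for zero-mean Gaussian eps: 10**((ln 10 / 2) * Var[eps])."""
    return 10.0 ** (0.5 * math.log(10.0) * var_eps)


# Assumed illustrative values; Var[S] is shared by both groups, as in Eq. (7).
var_S = 4.0
beta1, var_eps1 = 0.5, 0.5   # group 1
k = 3.0                      # hypothetical factor by which group 2 differs

rho_1 = corr_from_params(beta1, var_eps1, var_S)
rho_a = corr_from_params(k * beta1, var_eps1, var_S)           # case (a)
rho_b = corr_from_params(beta1, k ** 2 * var_eps1, var_S)      # case (b)
rho_c = corr_from_params(k * beta1, k ** 2 * var_eps1, var_S)  # case (c)

print("group 1:", rho_1)
print("case (a):", rho_a, "-> correlation higher than in group 1")
print("case (b):", rho_b, "-> correlation lower than in group 1")
print("case (c):", rho_c, "-> correlation unchanged")
print("mean unexplained factor, group 1 vs group 2 in case (c):",
      mean_unexplained_factor(var_eps1),
      mean_unexplained_factor(k ** 2 * var_eps1))
```

Only case (c), with \(\text{Var}[\epsilon]\) scaled in proportion to \(\beta^{2}\), leaves the correlation unchanged while increasing both the sensitivity to stem cell divisions and the unexplained contribution.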

The analysis of Tomasetti and Vogelstein [1] does not distinguish between groups exhibiting different cancer rates. This amounts to considering the total population of the two groups put together. Then, in our hypothetical population, Tomasetti and Vogelstein would observe
$$ C_{i}^{(1)} + C_{i}^{(2)}= [\beta^{(1)} + \beta^{(2)}] S_{i} + \epsilon_{i}^{(1)} + \epsilon_{i}^{(2)}~, $$
(9)
using our assumption (7). In this meta-population, the correlation studied by Tomasetti and Vogelstein [1] is that between \(C_{i}^{(1)} + C_{i}^{(2)}\) and \(S_{i}\):
$$ \text{Corr}\left[C_{i}^{(1)} + C_{i}^{(2)}, S_{i}\right] = { \text{Cov}\left[C_{i}^{(1)}, S_{i}\right] + \text{Cov}\left[C_{i}^{(2)}, S_{i}\right] \over \sqrt{\left(\text{Var}\left[C_{i}^{(1)}\right] + \text{Var}\left[C_{i}^{(2)}\right] + 2 \beta^{(1)}\beta^{(2)} \text{Var}[S_{i}]\right) \text{Var}[S_{i}]} } $$
(10)
From (3), (4), (6) and (7), we deduce
$$ \text{Cov}\left[C_{i}^{(j)}, S_{i}\right] = \rho \sqrt{\text{Var}\left[C_{i}^{(j)}\right] \text{Var}[S_{i}]}~, $$
(11)
which we insert in (10) to obtain
$$ \text{Corr}\left[C_{i}^{(1)} + C_{i}^{(2)}, S_{i}\right] = \rho ~ { \sqrt{\text{Var}\left[C_{i}^{(1)}\right]} + \sqrt{\text{Var}\left[C_{i}^{(2)}\right]} \over \sqrt{\text{Var}\left[C_{i}^{(1)}\right] + \text{Var}\left[C_{i}^{(2)}\right] + 2 \beta^{(1)}\beta^{(2)} \text{Var}[S_{i}]} } $$
(12)
By (5), we have
$$ \text{Var}\left[C_{i}^{(j)}\right] \geq [\beta^{(j)}]^{2} \text{Var}[S_{i}]~~, $$
(13)
which implies
$$ \text{Corr}\left[C_{i}^{(1)} + C_{i}^{(2)}, S_{i}\right] \geq \text{Corr}\left[C_{i}^{(j)}, S_{i}\right] ~, ~~~~~j =1 ~\text{or}~2~, $$
(14)

using definition (6).
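
The step from (12) and (13) to (14) can be spelled out as follows. This is a sketch that assumes, as is implicit in (10), that \(\epsilon_{i}^{(1)}\) and \(\epsilon_{i}^{(2)}\) are uncorrelated with each other and with \(S_{i}\), and that \(\beta^{(j)} \geq 0\) and \(\rho \geq 0\); the shorthand \(V_{j} := \text{Var}\left[C_{i}^{(j)}\right]\) is introduced here only for compactness. Multiplying the two inequalities (13) for j=1 and j=2 and taking the square root gives \(\sqrt{V_{1} V_{2}} \geq \beta^{(1)}\beta^{(2)} \text{Var}[S_{i}]\), so that
$$ \left(\sqrt{V_{1}} + \sqrt{V_{2}}\right)^{2} = V_{1} + V_{2} + 2 \sqrt{V_{1} V_{2}} \geq V_{1} + V_{2} + 2 \beta^{(1)}\beta^{(2)} \text{Var}[S_{i}]~. $$
The ratio multiplying \(\rho\) in (12) is therefore at least 1, which yields (14). Equality holds only when \(\text{Var}\left[\epsilon_{i}^{(1)}\right] = \text{Var}\left[\epsilon_{i}^{(2)}\right] = 0\), that is, when \(\rho = 1\).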

The inequality (14), which recovers a standard result in statistics, constitutes our main lever to falsify Tomasetti and Vogelstein’s claim: the correlation between stem cell divisions and cancer risks at the level of the total population is in fact no lower than that found at the individual group level. In plain words, a strong correlation at the population level over all group types is blind to the existence of strong differences in group susceptibilities to cancer associated with other (i.e. environmental or hereditary) factors. In our hypothetical population, one group shows a much higher cancer rate than the other, in the presence of a strong correlation between number of stem cell divisions and total cancer rate, but this does not allow one to conclude that the total number of stem cell divisions is the dominant factor responsible for cancer in both groups (hence making cancer “bad luck”). On the contrary, this result is compatible with a possibly strong influence from other environmental and genetic factors, here embodied in the variable \(\epsilon_{i}^{(j)}\) as well as the possible dependence of \(\beta^{(j)}\) on the same factors. The fundamental point that we are making here relates to the distinction between individual and group risks; for a discussion of this and how it applies to epidemiology and genetics (including a discussion of cancer), see [7].
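
The two-group argument can also be reproduced in a small simulation. The sketch below is illustrative only: it assumes Gaussian \(S_{i}\) and \(\epsilon_{i}^{(j)}\), independent noise terms in the two groups, and arbitrary parameter values chosen so that both groups share the same within-group correlation, as in case (c). It follows Eq. (9) as written, and the helper `pearson` is defined here rather than taken from any library.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
n_tissues = 50_000  # far more than real tissue types, only to limit sampling noise

# Assumed parameters: group 2 has k times the slope and k**2 times the noise variance.
var_S = 4.0
beta1, var_eps1 = 0.5, 0.5
k = 3.0
beta2, var_eps2 = k * beta1, k ** 2 * var_eps1

S = [rng.gauss(9.0, math.sqrt(var_S)) for _ in range(n_tissues)]   # shared, Eq. (7)
C1 = [beta1 * s + rng.gauss(0.0, math.sqrt(var_eps1)) for s in S]  # Eq. (1)
C2 = [beta2 * s + rng.gauss(0.0, math.sqrt(var_eps2)) for s in S]  # Eq. (2)
C_sum = [a + b for a, b in zip(C1, C2)]                            # Eq. (9)

rho1, rho2 = pearson(C1, S), pearson(C2, S)
rho_pooled = pearson(C_sum, S)

# Closed-form values from Eqs. (3), (5) and (12).
v1 = beta1 ** 2 * var_S + var_eps1
v2 = beta2 ** 2 * var_S + var_eps2
rho = beta1 * var_S / math.sqrt(v1 * var_S)
rho_pooled_theory = rho * (math.sqrt(v1) + math.sqrt(v2)) / math.sqrt(
    v1 + v2 + 2.0 * beta1 * beta2 * var_S
)

print("within-group correlations:", rho1, rho2, "(theory:", rho, ")")
print("pooled correlation:", rho_pooled, "(theory:", rho_pooled_theory, ")")
```

In this synthetic setting the pooled correlation is no lower than the within-group one, even though the two groups differ strongly in both \(\beta^{(j)}\) and \(\text{Var}\left[\epsilon_{i}^{(j)}\right]\).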

We stress that our conclusion remains robust when relaxing the simple assumptions used in our hypothetical population. For instance, the demonstration generalizes straightforwardly to more than two groups and even to a continuum. The condition (6) of equal correlations within the two groups can be generalized to different values. And our argument and conclusion remain valid if the rate of divisions of the normal self-renewing stem cells turns out to vary between groups.

Part of the conclusion that Couzin-Frankel [2] and Tomasetti and Vogelstein [1] draw is thus unwarranted: Tomasetti and Vogelstein’s analysis does not allow one to conclude that the majority of cancers are due to unpreventable “bad luck.” We have just demonstrated that the existence of possibly strong differences in susceptibility to cancers, for instance due to environmental and genetic factors, has no effect on Tomasetti and Vogelstein’s result that a large fraction of the variation in cancer risk among tissues, that is, differences in cancer incidence among different organs, can be explained by the number of stem cell divisions. Tomasetti and Vogelstein’s findings point naturally to the prevalence of mutations during replications. This can explain why certain organs are more affected by cancer than others, but does not address the question of why certain populations or individuals are more affected by cancer than others.

We have demonstrated that the coexistence of several populations with very different cancer rates, for instance due to environmental and genetic causes, is compatible with the empirical evidence of a strong correlation between the total number of cell divisions and cancer risks [1]. One may ask whether our hypothetical population made of two groups with \(\beta^{(2)} \gg \beta^{(1)}\) and \(\text{Var}\left[\epsilon_{i}^{(2)}\right] \gg \text{Var}\left[\epsilon_{i}^{(1)}\right]\) (case (c)) has anything to do with reality. The answer is empirical and requires extending Tomasetti and Vogelstein’s analysis to different cohorts under various environmental stressors, as in the Framingham Heart Study of NIH [8], the China-Cornell-Oxford Project [9] and others [10–13]. Case (c) corresponds to a consistently large correlation between number of stem cell divisions and cancer risk and provides an interesting testable hypothesis, namely that controllable environmental factors and/or genetic traits impact both the cancer risks related to stem cell divisions and those that seem unrelated to stem cell divisions. Testing it requires studying conditional correlations, thus extending the unconditional correlation study of Tomasetti and Vogelstein (since no condition on separate groups or cohorts is imposed in their study).

Indications of strong environmental factors are actually observed in figure 1 of Ref. [1]: (i) lifetime lung cancer risk is multiplied by 12 by smoking; (ii) lifetime head and neck cancer risk is multiplied by 6 after human papillomavirus infection; (iii) hepatocellular carcinoma risk is multiplied by 10 after hepatitis C virus infection; (iv) colorectal cancer risk is multiplied by 12 in the presence of familial adenomatous polyposis. A possible source of confusion may be the existence of more than 200 different kinds of cancers according to present taxonomy, with many more subtypes coming in month by month. For the well-known cancer types, epidemiology shows a strong link with environmental and lifestyle factors. For the many other so-called sporadic cancers, epidemiological studies are much less advanced. We hope that the present note will help refocus on the importance of environmental and predisposing genetic factors [9, 14–16] and not miss the forest for the trees.

We acknowledge very helpful feedback from Thomas Cerny, Jean-Yves Henry, and Christine Sadeghi, and also thank two anonymous reviewers for helpful comments.

Endnote

1 Given the range of lifetime cancer risks from \(10^{-5}\) to 0.3 and of the total numbers of divisions of stem cells from \(10^{6}\) to \(10^{13}\), for a linear correlation analysis (Pearson correlation coefficient), Tomasetti and Vogelstein [1] used these logarithmic variables (see their supplementary materials). The relevance of the use of log-variables is further suggested by their definition of the “extra risk score” [1].

Declarations

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.

Authors’ Affiliations

(1)
Department of Management, Technology and Economics, ETH Zürich (Swiss Federal Institute of Technology)

References

  1. Tomasetti C, Vogelstein B. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science. 2015; 347(6217):78–81.
  2. Couzin-Frankel J. The bad luck of cancer: analysis suggests most cases can’t be prevented. Science. 2015; 347(6217):12.
  3. Wild C, Brennan P, Plummer M, Bray F, Straif K, Zavadil J. Cancer risk: Role of chance overstated. Science. 2015; 2015:728.
  4. Ashford NA, Bauman P, Brown HS, Clapp RW, Finkel AM, Gee D, et al. Cancer risk: Role of environment. Science. 2015; 2015:727.
  5. Song M, Giovannucci EL. Cancer risk: Many factors contribute. Science. 2015; 2015:728–9.
  6. Tomasetti C, Vogelstein B. Musings on the theory that variation in cancer risk among tissues can be explained by the number of divisions of normal stem cells (http://arxiv.org/abs/1501.05035).
  7. Davey Smith G. Epidemiology, epigenetics and the ’Gloomy Prospect’: embracing randomness in population health research and practice. Int J Epidemiol. 2011; 40:537–62.
  8. Levy D, Brink S. A Change of Heart: How the People of Framingham, Massachusetts, Helped Unravel the Mysteries of Cardiovascular Disease. 1st edition. Knopf; 2005.
  9. Campbell TC, Campbell TM. The China Study (the most comprehensive study of nutrition ever conducted and the startling implications for diet, weight loss and long-term health). Dallas, Texas: BenBella Books; 2006.
  10. Cairns J. The cancer problem. Scientific American. 1975; 233(5):64–78. doi:10.1038/scientificamerican1175-64.
  11. Pisani P, Bray F, Parkin DM. Estimates of the world-wide prevalence of cancer for 25 sites in the adult population. Int J Cancer. 2002; 97:72.
  12. Calle EE, Rodriguez C, Walker-Thurmond K, Thun MJ. Overweight, obesity, and mortality from cancer in a prospectively studied cohort of US adults. New Engl J Med. 2003; 348:1625.
  13. Montesano R, Hill J. Environmental causes of human cancers. Eur J Cancer. 2001; 37:S67.
  14. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, et al. Environmental and Heritable Factors in the Causation of Cancer – Analyses of Cohorts of Twins from Sweden, Denmark, and Finland. New Engl J Med. 2000; 343(2):78–85.
  15. Lanzmann-Petithory D. CANCERALCOOL Consommation de boissons alcoolisées (vin, bière et alcools forts) et mortalité par differents types de cancers sur une cohorte de 100 000 sujets suivie depuis 25 ans. In: Premier Colloque Final–Programme National de Recherche en Alimentation et Nutrition Humaine (PNRA). Paris: Agence Nationale de la Recherche et INRA; 2009.
  16. Servan-Schreiber D. Anticancer: A New Way of Life. New York: Viking Penguin, Penguin Group (USA) Inc.; 2009.

Copyright

© Sornette and Favre. 2015