MuST 9: Evidence, Inference, and Risk


Day 1 (31 March, 2016)

10:00 - 11:15 Plenary Session
10:00 - 10:15 Welcome: Stephan Hartmann and Barbara Osimani
10:15 - 11:15 Glenn Shafer: "Probability Judgement" (Room M203)
Chair: Jan Sprenger 
11:15 - 11:45 Coffee Break
Session I (Room M203) Session II (Room M209)
11:45 - 12:30 Christoph Jansen and Thomas Augustin: "Probabilistic Evaluation of Preference Aggregation Functions".
Chair: Stephan Hartmann 
Paul Griffiths: "Holding Pathology Hostage".
Chair: Phyllis Illari
12:30 - 13:15 Anne-Laure Boulesteix, Roman Hornung, Willi Sauerbrei: "On Fishing for Significance and Statistician's Degree of Freedom in Biomedical Applications".
Chair: Stephan Hartmann 
Jeffrey Aronson: "Do Mechanisms Constitute Clinical Evidence?".
Chair: Phyllis Illari 
13:15 - 14:45 Lunch
14:45 - 15:30 Daniel Malinsky: "Decision Making under Causal Uncertainty".
Chair: Vincenzo Crupi
Phyllis Illari: "Who's Afraid of Mechanisms?".
Chair: Jürgen Landes 
15:30 - 16:15 Justin Dallman: "Evidence Principles, Belief, and Credence".
Chair: Vincenzo Crupi
David Teira, Brendan Clarke, Maël Lemoine: "Taking the Risks of Testing Personalized Treatments".
Chair: Jürgen Landes 
16:15 - 16:45 Coffee Break
16:45 - 17:30 Jan Sprenger: “Applying a Measure of Corroboration in Statistical Inference”.
Chair: Brendan Clarke
Daria Jadreškić: "Some Social Aspects of the Discovery, Synthesis and Production of Cortisone in the 1930s-1950s".
Chair: David Teira 

Day 2 (1 April, 2016)

09:45 - 10:45 Plenary Session: Julian Reiss: "In Defence of Statistical Minimalism" (Room M203)
Chair: Paul Griffiths 
10:45 - 11:15 Coffee Break
Session I (Room M203) Session II (Room M209)
11:15 - 12:00 Nevin Climenhaga: "Epistemic Probabilities: A Guide for the Perplexed".
Chair: Seamus Bradley 
Wolfgang Pietsch: "A Causal Account of Analogical Inference".
Chair: Mark Colyvan 
12:00 - 12:45 Bengt Autzen: "A Popperian Doctrine on Probability Revisited".
Chair: Seamus Bradley 
Roland Poellinger: "Confirmation via In Silico Simulation".
Chair: Mark Colyvan 
12:45 - 14:15 Lunch
14:15 - 15:00 Ludwig Fahrbach: "Evidence Amalgamation and Atomism".
Chair: Lorenzo Casini 
Momme von Sydow, Dennis Hebbelmann, Björn Meder: "How Causal Reasoning Can Distort Evidence".
Chair: Roland Poellinger
15:00 - 15:45 Jürgen Landes: "A Multi-Criterial Approach to Amalgamating Evidence".
Chair: Lorenzo Casini 
Thomas Boyer-Kassem: "A Defense of the Precautionary Principle".
Chair: Roland Poellinger 
15:45 - 16:30 Barbara Osimani: "Causal Inference in Pharmacology: towards a Framework for Evidence Amalgamation".
Chair: Lorenzo Casini 
Pierrick Bourrat: "Challenging the Evidence for Species Selection and the Export-of-Fitness Model of Evolutionary Transitions in Individuality".
Chair: Roland Poellinger 
16:30 - 17:00 Coffee Break
17:00 - 17:45 Lorenzo Casini: "A Bayesian Theory of Constitution".
Chair: Barbara Osimani 
Inge de Bal: "Evidence and Extrapolation in Failure Analysis".
Chair: Phyllis Illari
20:00 Conference Dinner (Seehaus im Englischen Garten)

Day 3 (2 April, 2016)

10:00 - 11:00 Plenary Session: Jon Williamson: "Establishing Causal Claims in Medicine" (Room M203)
Chair: Mark Colyvan 
11:00 - 11:30 Coffee Break
Session I (Room M203) Session II (Room M209)
11:30 - 12:15 Christian Wallmann: “Three Methods for Solving the Problem of Inconsistent Marginals in Data Integration”.
Chair: Brendan Clarke 
Bennet Holman: "Pharmacology: An Asymmetric Arms Race".
Chair: Lorenzo Casini 
12:15 - 13:00 Alexander Hapfelmeier: "Exploratory Subgroup Analysis by Recursive Segmentation".
Chair: Brendan Clarke 
Momme von Sydow, Niels Braus: "On Biased Contingency Assessment and Inner-Organizational Dilemmas of Personnel Evaluation".
Chair: Lorenzo Casini 


Jeffrey Aronson: “Do Mechanisms Constitute Clinical Evidence?”

“Expert opinion” was included as the lowest form of clinical evidence in early forms of evidence hierarchies. This was a category error. When expert opinion in clinical practice is based on poor evidence it is a poor opinion; when based on high-quality evidence it is a good opinion. It is not in itself evidence. Likewise, to regard mechanisms and “mechanism-based reasoning” as evidential (so-called “mechanistic evidence”) is a category error. This is reflected, for example, in the fact that one can adduce evidence for mechanisms using forms of evidence such as systematic reviews and randomized and observational studies (“evidence of mechanisms”). Thus, to the extent that one views mechanisms in an evidential context, they provide a different class of evidence, if at all, to that provided by systematic reviews and formal studies. The relation between mechanisms and forms of evidence is shown in the figure below.


Different forms of clinical evidence (top orange box) can lead to conclusions of clinical relevance (ochre box); both clinical and non-clinical evidence (orange boxes) can provide evidence of mechanisms (orange arrows).
However, mechanisms (right-hand blue box) do not provide this class of evidence. Instead, they have two roles:

  • to generate hypotheses (blue arrows);
  • to help in interpreting evidence that arises from other sources (purple arrow).

Clinical expertise (left-hand blue box) is the ability of experts to apply their clinical skills, clinical experience, and evidence-based knowledge to the management of practical problems, modified in the light of the views and preferences of others (e.g. patients and carers); it is distinct from expert opinion, although it may rely on opinion based on high-quality evidence.
The higher the quality of the evidence underpinning the mechanism, the more useful mechanism-based reasoning becomes. Evidence for mechanisms may include clinical or non-clinical evidence; the latter may include in vitro evidence, ex vivo or in vivo animal evidence, in silico evidence (e.g. computer simulations), and in cerebro evidence (thought experiments). Mechanism-based reasoning, like expert opinion, should therefore be placed outside schemes of levels of evidence. It could be replaced by an item such as “Non-evidential information incorrectly treated as if it were evidence”. This would include mechanisms and expert opinion as well as, for example, formal consensus, reports of expert committees, and basic principles. All this raises the more general question of the definition of evidence and different classes

Bengt Autzen: “A Popperian Doctrine on Probability Revisited”

Can simpler, nested models have higher probabilities than more complex models? Popper (1959) denies this possibility. In response to Jeffreys’ (1931) postulate that simpler theories should have higher prior probability than more complex theories in a Bayesian analysis, Popper argues that such an assignment of prior probabilities violates the probability calculus. Popper notes that if the event A is a subset of event B, then the probability of A, P(A), cannot be larger than the probability of B, P(B), for any probability measure P. Applying this observation to Jeffreys’ simplicity postulate, Popper argues that a simpler and nested model can therefore not have a higher prior probability than the more complex and encompassing model. A similar point has recently been made, without reference to Popper, in the scientific literature. Templeton (2010) observes that in some prominent phylogeographic studies nested models have higher posterior probability than their encompassing models. The fact that the practice of Bayesian model comparison seems to conflict with Popper’s doctrine on how to assign probabilities to nested models suggests that there is still something to be said on this issue. In this paper I will therefore revisit Popper’s argument and evaluate its relevance for the discussion of Jeffreys’ simplicity postulate. I will argue that Popper’s doctrine relies on a particular understanding of a statistical model and will propose an alternative reading of a model that not only makes sense of the practice of Bayesian model comparison in phylogeography but also offers a novel defence of Jeffreys’ simplicity postulate.
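
The monotonicity observation that Popper relies on can be checked numerically. The following is only a minimal sketch under stated assumptions: the parameter grid, the event names `nested` and `encompassing`, and the helper functions are hypothetical illustrations, not part of the talk. It draws random probability measures on a finite set and confirms that a subset never receives more probability than its superset.

```python
import random


def random_measure(outcomes, rng):
    """Return a random probability measure (outcome -> mass) on a finite set."""
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


def prob(event, measure):
    """Probability of an event (a set of outcomes) under the given measure."""
    return sum(measure[o] for o in event)


# Hypothetical parameter grid. The nested model fixes theta = 0; the
# encompassing model allows theta in {-1, 0, 1}. Read as events, the
# nested model is a subset of the encompassing one.
grid = [-2, -1, 0, 1, 2]
nested = {0}
encompassing = {-1, 0, 1}


def monotonicity_holds(trials=1000, seed=0):
    """Check P(nested) <= P(encompassing) for many random measures."""
    rng = random.Random(seed)
    for _ in range(trials):
        measure = random_measure(grid, rng)
        if prob(nested, measure) > prob(encompassing, measure) + 1e-12:
            return False
    return True


if __name__ == "__main__":
    print(monotonicity_holds())
```

The check only illustrates the set-theoretic reading of nested models; it says nothing about the alternative reading of a statistical model that the talk proposes.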


  • Jeffreys, H. (1931). Scientific Inference. Cambridge: Cambridge University Press.
  • Popper, K. R. (1959). The Logic of Scientific Discovery. London: Hutchinson.
  • Templeton, A. R. (2010). Coherent and incoherent inference in phylogeography and human evolution. Proceedings of the National Academy of Sciences of the United States of America 107, 6376–

Lisa Bero: “Systematic Review Methods: How Far Can We Go?”

The methods of systematic review and meta-analysis developed during the 1930s were used to synthesize research in psychology and education. These methods advanced rapidly during the 1980s and 90s as systematic reviews became the foundation for evidence-based medicine. The strengths of systematic review methods are that they identify all evidence relevant to a particular question, assess the bias in the evidence, and summarize the results. Today, systematic reviews are being required as the “evidence base” for decisions related to animal toxicology, environmental risk assessment, dietary guidelines and public policy. Systematic reviews of medical interventions face the challenge of incorporating health information from data sources as varied as surveillance and mobile health apps. This talk will explore issues that arise when expanding systematic review methods into new areas, including how to define evidence, determine what biases are relevant and assess them, and combine diverse data

Pierrick Bourrat: “Challenging the Evidence for Species Selection and the Export-of-Fitness Model of Evolutionary Transitions in Individuality”

In this talk, I identify one major problem with the view that individual-level selection and species-level selection are two ontologically distinct processes of selection acting independently from each other. After having presented the evidence for species selection using a simple example, I demonstrate that it is in fact an artefact created by measuring fitness at two levels over different periods of time. Once fitness, at each level, is measured over the same period of time, individual-level selection and species-level selection merely represent one and the same process occurring over different periods of time.
Using the same argument, I argue against the model of evolutionary transitions in individuality (such as the transition from uni- to multicellular organisms) developed by Michod and colleagues, and extended by Okasha, commonly referred to as the “export-of-fitness view”. In this model, as a transition from a uni- to multicellular mode occurs, Okasha argues that one could not measure the fitness of the multicellular organisms in terms of the fitness of the cells they are composed of. I show that once fitness at both the unicellular and the multicellular level is measured over the same period of time, one could in principle measure the fitness of multicellular organisms in terms of the fitness of the cells composing them.
I conclude, contrary to Okasha, that the notion of levels of selection, when concerning both species selection and evolutionary transitions in individuality, is an epistemic rather than an ontological

Thomas Boyer-Kassem: “A Defense of the Precautionary Principle”

How should one rationally take decisions under risk when faced with serious damage outcomes? A famous decision rule is the precautionary principle (PP), for instance expressed in the 1992 UN Rio Declaration: “Where there are threats of serious or irreversible damage, lack of full scientific certainty shall not be used as a reason for postponing cost-effective measures to prevent environmental degradation”. In this formulation or others, the PP has been the subject of a large critical literature from scholars in many fields. Here, I will defend it against two recent arguments that have not yet been rebutted and that are among the most crippling ones: (1) the PP is incoherent (Peterson 2006), and (2) the PP is subject to a contextualist puzzle (Carter and Peterson 2015).
(1) Peterson (2006) considers several (formal) explications of the PP that, according to him, any advocate of the PP would at least endorse. He then shows that each of these explications is logically inconsistent with well-established general principles of decision theory. I claim that his explications of the PP do not match our intuitions about what the PP says, i.e. that his explications are inadequate. To show this, I present a counterexample with two possible actions X and Y that lead to various outcomes according to the state of the world that actually obtains. Some of these outcomes are fatal in Peterson’s sense, some are not. I contend that anyone following the PP would intuitively judge Y preferable to X, but that Peterson’s explications of the PP require that X should be preferred to Y. More generally, I trace the failure of Peterson’s explications to the fact that his grid of analysis does not allow for several levels of fatality in outcomes, whereas we intuitively do so and use them to rank outcomes and decide on actions. Overall, Peterson’s incoherence argument against the PP does not get off the ground because it relies on an inadequate explication of the PP.
(2) Carter and Peterson (2015) point to the fact that the standard interpretation of the PP is epistemologically contextualist: the knowledge condition that must be met (e.g. “lack of scientific certainty” in the Rio Declaration) depends on the severity of the anticipated damage: the more severe it is, the less certain we need to be. The problem, according to Carter and Peterson, is that the severity of the damage can be evaluated differently by opposing parties, like an industry and an environmental body. A “favouring rule” has to be added to resolve the conflict, but Carter and Peterson argue that none is suitable. I reply instead that the conflict is resolved by considering an impersonal evaluation, and I defend this evaluation against possible objections, namely that it is ill-defined and cannot be implemented.


  • Carter, J. Adam and Martin Peterson (2015), “On the Epistemology of the Precautionary Principle”, Erkenntnis 80:1–13.
  • Peterson, Martin (2006), “The Precautionary Principle Is Incoherent”, Risk Analysis, 26(3):595–601.
  • UN General Assembly (1992), “Rio Declaration on Environment and Development”.

Anne-Laure Boulesteix, Roman Hornung, Willi Sauerbrei: “On Fishing for Significance and Statistician's Degree of Freedom in Biomedical Applications”

In this talk I discuss a major general problem related to practical statistical analysis with a particular focus on biomedical applications. There are usually plenty of conceivable approaches to statistically analyze data that both make sense from a substantive point of view and are defensible from a theoretical perspective. The data analyst has to make a lot of choices, a problem sometimes referred to as “researcher’s degree of freedom”. This leaves much room for (conscious or subconscious) fishing for significance: the researcher (data analyst) sometimes applies several analysis approaches successively and reports only the results that seem in some sense more satisfactory, for example in terms of statistical significance. This may lead to apparently interesting but false research findings that fail to get validated in independent studies. In this talk I discuss different strategies to address this major problem. A possible strategy, validation, is applied at the researcher level. The researcher may essentially apply as many approaches as he/she wants and possibly select one of them based on its positive results, as long as he/she validates these results using independent validation data. The price to pay is a loss of statistical power when performing statistical tests. Another strategy, increased development of guidance documents, may be applied at the level of the scientific community. It calls for more studies whose aim is to help researchers choose their statistical analysis approach. Such studies may reduce the multiplicity problem beforehand, instead of correcting for it afterwards as in the validation strategy. In reality, guidance documents for analysis will probably never eliminate the researcher’s degree of freedom, and validation will probably never be applicable/sensible everywhere. 
A mixture of both strategies may be a happy medium, while careful reporting following guidelines, increased acceptance of negative research findings by journals, and publication of pre-specified analysis protocols as well as data and code for the purpose of reproducibility may also contribute to combating fishing for significance. This talk is based on a paper forthcoming in: Ott, Max; Pietsch, Wolfgang; Wernecke, Jörg, “Berechenbarkeit der Welt? Philosophie und Wissenschaft im Zeitalter von Big Data”, Wiesbaden:
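
The multiplicity problem and the validation strategy described in this abstract can be illustrated with a small simulation. This is a hedged sketch, not the authors' method: it assumes that the analysis approaches are statistically independent (in practice they are usually correlated), that no real effect exists, and that each analysis yields a p-value that is uniform under the null. The function names and default parameters are hypothetical.

```python
import random


def chance_of_false_positive(num_analyses, alpha=0.05):
    """Probability that at least one of several independent null tests is significant."""
    return 1.0 - (1.0 - alpha) ** num_analyses


def simulate(num_analyses=10, alpha=0.05, trials=20000, seed=1):
    """Simulate fishing under a true null, with and without a validation step.

    Under the null hypothesis a p-value is uniform on [0, 1], so each
    analysis is modelled as one uniform draw (independence is assumed).
    Returns (rate of fished findings, rate of fished and validated findings).
    """
    rng = random.Random(seed)
    fished = 0     # trials in which the best p-value looks significant
    validated = 0  # trials in which that finding also survives validation
    for _ in range(trials):
        best_p = min(rng.random() for _ in range(num_analyses))
        if best_p < alpha:
            fished += 1
            # Independent validation data: one new test of the selected analysis.
            if rng.random() < alpha:
                validated += 1
    return fished / trials, validated / trials


if __name__ == "__main__":
    fished_rate, validated_rate = simulate()
    print(f"analytic rate without validation: {chance_of_false_positive(10):.3f}")
    print(f"simulated rate without validation: {fished_rate:.3f}")
    print(f"simulated rate with validation: {validated_rate:.3f}")
```

The sketch covers only false positives under the null; the loss of statistical power that validation entails would need a model with a real effect.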

Annamaria Carusi: "The Socio-Technological Epistemology of In Silico Medicine"

In silico medicine is the translational edge of systems biomedicine, which takes forward and develops further the computational resources associated with systems biology, rather than its focus on gaining understanding of complex systems. The computational resources of large databases and supercomputing for data processing, modelling and simulation bring into scientific possibility systems characterised by dynamic non-linear causality, feedback, and cross-level networks of interactions. The programme of research and development that is labelled ‘in silico medicine’ has emerged out of drives to develop informational and computational resources and infrastructure for biological and physiological applications. This paper focuses on one such project, the Virtual Physiological Human (VPH). Initially funded by the European Commission FP7, the VPH has consistently described itself as a platform for developing methodologies and technologies ‘to enable collaborative investigations of the human body as a single complex system’. From the outset, potential medical applications in diagnosis and treatment were highlighted as motivations for investment in this area of technology development; drug and device development and safety testing are domains that have emerged as particularly strong areas for in silico medicine, in part because of the interest of the pharmaceutical industry and regulatory bodies in computational methods and models. Developing computational methods and models for drug safety and toxicity testing raises several epistemological, pragmatic and ethical issues, many of which are well-known from other areas of pharmacology. In this talk, I focus on the epistemological issues raised specifically by the construction and validation of computational models; in fact, I aim to show that the challenges faced by in silico modelling for drug testing are in the domain of social epistemology as much as they are in that of formal epistemology.
Two of the main challenges faced by computational modelling and simulation are variability and uncertainty; however, these two challenges have aspects that have to do with the nature of the biological phenomena being modelled, and aspects that have to do with the social frameworks within which modelling and simulation are carried out. I aim to show how these are connected in the conceptual framework of the Model-Simulation-Experiment (MSE) system, which is a system of interconnected data, methods and models, but also a system of interconnected scientists and engineers, with their own cultures, priorities and biases. I discuss the implications of the MSE system for the conceptualisations of validation that will enable in silico modelling and simulation to be a viable contributor to drug development and testing.


  • Carusi, A. (2014) ‘Validation and Variability: Dual Challenges on the Path from Systems Biology to Systems Medicine’, Studies in the History and Philosophy of Science, Part C Biological and Biomedical Sciences, 48, 28-37.
  • Carusi, Burrage and Rodriguez (2013) Model Systems in Computational Systems Biology. Juan Duran and Eckhart Arnold (Eds.): Computer Simulations and the Changing Face of Scientific Experimentation, Cambridge Scholars Publishing, p.118-144.
  • Carusi, Burrage & Rodriguez (2012) Bridging Experiments, Models and Simulations: An Integrative Approach to Validation in Computational Cardiac Electrophysiology, American Journal of Physiology-Heart and Circulatory Physiology, vol. 303 no. 2 H144-H155


Lorenzo Casini: "A Bayesian Theory of Constitution"

We develop a Bayesian theory of constitution that identifies as constituents those spatiotemporal parts of a phenomenon whose causal roles contain the phenomenon’s causal role. The proposal accomplishes two goals: first, it formally analyses the notion of constitution present in theories of mechanistic explanation in a way that avoids the pitfalls of the currently dominant theory of constitution, viz. Craver’s (2007) mutual manipulability theory; second, by drawing on the conceptual resources of Bayesian networks, it paves the way for a Bayesian methodology for constitutional

Nevin Climenhaga: "Epistemic Probabilities: A Guide for the Perplexed"

The epistemic probability of A given B – notated P(A|B) – is the degree to which B supports A, or makes A plausible. It constrains rational degrees of belief, in that, if P(A|B) = n, then someone with B as their evidence ought to be confident that A to degree n. In this paper I address the question of what determines the values of epistemic probabilities. I divide this question into two parts, which I call the structural part and the substantive part.

The structural question asks which probabilities’ values are determined by the values of other probabilities, and which probabilities’ values are not. These latter probabilities are the basic quantities out of which other probabilities are built, the ‘atoms’ of probability theory. Given values for them, we can compute values for all non-basic probabilities. The substantive question then asks how the values of basic probabilities are determined.

I defend an answer to the structural question I call Explanationism. This view identifies as basic the conditional probability of an atomic proposition given relevant explanatorily prior propositions. We first construct a Bayesian network in which we draw a node for each variable (i.e., a partition of atomic possibilities) in our language, and draw an arrow from one variable to another iff the value of the first directly influences the value of the second. A conditional probability P(A|B) is then basic iff A is an atomic proposition and B is a conjunction of values for all parents of A in our Bayesian network.
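
The criterion for basic probabilities can be sketched in code. This is an illustrative sketch only: the three-variable network (`Rain`, `Sprinkler`, `WetGrass`) and the helper `is_basic` are hypothetical and do not come from the paper. The sketch works at the level of variables, treating a conditional probability as basic exactly when its condition fixes values for all and only the parents of the target variable.

```python
# Hypothetical Bayesian network, given as a map from each variable to its parents.
# An arrow X -> Y means that the value of X directly influences the value of Y.
parents = {
    "Rain": [],
    "Sprinkler": ["Rain"],
    "WetGrass": ["Rain", "Sprinkler"],
}


def is_basic(target_var, given_vars, network):
    """Return True iff P(target | given) conditions on exactly the parents of target."""
    return set(given_vars) == set(network[target_var])


def basic_probabilities(network):
    """List the (target, condition) pairs that count as basic in this network."""
    return [(var, tuple(pars)) for var, pars in network.items()]


if __name__ == "__main__":
    for target, condition in basic_probabilities(parents):
        print(target, "given", condition or "nothing")
```

For a variable without parents, the condition is empty, so on this sketch its unconditional probability counts as basic.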

I defend Explanationism against a view I call Orthodoxy, on which the unconditional probabilities of world-states are basic. (World-states are conjunctions in which one member of each variable appears once.) Orthodoxy is implicit in orthodox mathematical probability theory, in which the set of world-states is treated as the sample space, and values between 0 and 1 are assigned to each world-state such that the sum of these values is 1. Kolmogorov’s axioms then allow us to determine any other probabilities in terms of the probabilities of world-states.
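
The Orthodox picture can be made concrete with a toy sample space. The following is a minimal sketch under stated assumptions: the two binary variables and the numbers assigned to the four world-states are invented for illustration. Every other probability, conditional or not, is then computed from the world-state values.

```python
from fractions import Fraction

# Hypothetical binary variables; a world-state fixes one value for each.
variables = ("rain", "wet")

# Hypothetical world-state probabilities (the basic quantities on Orthodoxy).
world_states = {
    (True, True): Fraction(9, 50),
    (True, False): Fraction(1, 50),
    (False, True): Fraction(4, 50),
    (False, False): Fraction(36, 50),
}

# The values must sum to 1 over the whole sample space.
assert sum(world_states.values()) == 1


def prob(event):
    """Probability of an event, given as a predicate on a world-state dict."""
    return sum(
        mass
        for state, mass in world_states.items()
        if event(dict(zip(variables, state)))
    )


def cond_prob(event, given):
    """Conditional probability P(event | given) as a ratio of unconditional ones."""
    return prob(lambda s: event(s) and given(s)) / prob(given)


if __name__ == "__main__":
    print(prob(lambda s: s["wet"]))
    print(cond_prob(lambda s: s["rain"], lambda s: s["wet"]))
```

Exact fractions are used so that the sum-to-one requirement can be checked without rounding error.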

I present two arguments against Orthodoxy and in favor of Explanationism. The first is that the probabilities that Explanationism identifies as basic are precisely those which we find ourselves better able to perceive the values of, both in idealized thought experiments and in real-life applications of probability. This perception seems to be based on a grasp of the immediate relations between the propositions involved in the probability, rather than an implicit grasp of world-state probabilities. The second argument is that plausible substantive methods such as the Principle of Indifference deliver more accurate results when combined with Explanationism than when combined with Orthodoxy.

After presenting these arguments in favor of Explanationism, I explore some implications of the view. In particular, I argue that it helps us make progress in several debates in philosophy of science, including the problem of the priors and the relation between inference to the best explanation and probability

Justin Dallman: "Evidence Principles, Belief, and Credence"

One widely held view in the literature on our world-directed mental states is that a high degree of confidence in a proposition is not enough to secure outright or "full" belief. It has gone unnoticed that endorsing this **separation** thesis poses a problem for the theory of evidence. This paper develops the problem and possible responses.

The puzzle becomes apparent when we consider two natural sufficient conditions on all-things-considered evidence that have been advanced and defended in the evidence theory literature. Roughly stated for the sake of conciseness, it follows from every Bayesian confirmation measure that if learning proposition e rationally requires that one raise (lower) one's credence in p, then e is evidence for (against) p. On the other hand, it has also been suggested that a sufficient condition on e's being evidence for (against) p is that learning proposition e rationally permits one to come to believe (cease believing) p.

These are both plausible conditions on all-things-considered evidence. However, under separation they do not seem co-tenable. Separation, and the cases that typically motivate it, support the idea that one can learn some e that rationally raises (lowers) one's credence in some p while rationally permitting a subject to cease believing (come to believe) p. In cases like these, however, the sufficient conditions entail that e is *both* all-things-considered evidence for p and all-things-considered evidence against p. Since propositions can be evidentially relevant in at most one of those ways, these cases constitute a real challenge to our common understanding of the notion of *evidence* under which both conditions initially seemed plausible.

The full length paper examines the motivations for **separation** in detail including considerations of the logic of rational belief like rational belief agglomeration, pragmatic encroachment, and the difficulty of forming rational belief on the basis of statistical evidence alone. In each case, the same considerations that motivate **separation** are shown to support the kinds of cases that give rise to the puzzle.

It is also argued that both sufficient conditions are well motivated. Under **separation**, belief might entail having a high confidence, but some further constraints need to be met in order to believe. Some constraints that have been suggested in the literature are that the confidence be *stable*, that it *resist reconsideration*, or that the confidence-haver be *willing to act as if the proposition obtained in the context*. But each of these is valuable for different applications, and in the case of the former constraints for **epistemic applications**, for example to aid in maximizing expected accuracy as bounded agents. So, I argue that the best solution to the puzzle involves distinguishing multiple notions of evidence for which the sufficient conditions respectively

Inge de Bal: "Evidence and Extrapolation in Failure Analysis"

This paper is about extrapolation in failure analysis and the evidence needed to warrant it. Failure analysis is a part of engineering that deals with the analysis of (causes of) failures in artefacts. When an artefact is not able to perform its intended function, failure analysts study the specific circumstances that led to this failure. They do not simply look to explain what happened in this specific situation. They also aim to produce knowledge that helps to prevent similar problems in the future. As Petroski says:

“When failures do occur, engineers necessarily want to learn the causes. Understanding of the reason for repeated failures — structural or otherwise — that jeopardize the satisfactory use and therefore the reputation of a product typically leads to a redesigned product.” (Petroski 2001, p.13).

In other words, they look for ways to use the knowledge about causal relations in one specific context, to draw conclusions regarding causal relations in other contexts. These situations range from other instances of the same artefact, over similar artefacts, and even to very distinct artefacts. One of their goals is furthermore to find ways to alter design plans.
This type of inference, where we start from knowledge of causal relations in one context and, based on this, draw conclusions regarding relations that hold in other contexts, is a type of extrapolation. A lot of work has been done on extrapolation in the philosophy of social and biomedical sciences. One important work on this topic is Daniel Steel’s “Across the Boundaries. Extrapolation in Biology and Social Science”. Steel defines extrapolation as follows:

“[...]one begins with knowledge of a causal relationship in one population and endeavour to reliably draw a conclusion concerning a relationship in a distinct population.” (Steel 2008, p.3).

Steel is specifically referring to extrapolation in biology, where researchers start from knowledge about causal relations about animal organisms (base) and draw conclusions regarding causal relations about human organisms (target). In his book, Steel presents an account of the (type of) evidence needed to warrant these extrapolations.
In this paper, I will adapt Steel’s framework to fit the extrapolations made in failure analysis and shed light on the evidence they need. Clearly failure analysis does not deal with animals and humans, but with artefacts. So Steel’s definition is not straightforwardly applicable to engineering, nor is the notion of ‘population’. Still, in order to evaluate whether the inferences are warranted, we need to know the base and target of extrapolations in failure analysis. Using several case studies, I will present an account of what this type of extrapolation entails, how we can characterize it and what its goal is. Based on these findings, I explore how the extrapolation is possible and what evidence failure analysts put forward to warrant this inference. I will argue that, as in the biomedical sciences, failure analysts use mechanistic evidence to warrant their extrapolations. Finally, I evaluate whether this evidence is always sufficient and discuss relevant limitations and

Ludwig Fahrbach: “Evidence Amalgamation and Atomism”

It took chemists and physicists more than a century to gather enough empirical evidence to convince the scientific community of the truth of atomism. By 1916 the evidence was so strong that reasonable scientific doubt was no longer an option (Perrin 1916). The confirmational relation between the evidence of that time and atomism has a rich structure and constitutes an important testing ground for accounts of scientific inference. A number of philosophers have provided reconstructions of various aspects of this confirmational episode (see Glymour 1980, Salmon 1984, Mayo 1996, Achinstein 2001). These are valuable accounts, but none of them offers a reconstruction of what was arguably the most important good-making feature of the evidence for atomism: the existence of around nine entirely different methods that led to the very same value for Avogadro’s number N of 6 × 10^23. (Perrin reports 13 different methods, but some of them are similar to each other.) These methods refer to entirely different phenomena such as Brownian motion, the blueness of the sky, radioactive decay, black body radiation, electrolysis, and so on. As Perrin (1916) puts it: “Our wonder is aroused at the very remarkable agreement found between values [of N] derived from the consideration of such widely different phenomena… The real existence of molecules is given a probability bordering on certainty.”
For this case of amalgamation of evidence I aim to offer a Bayesian analysis. As a measure of the strength of evidence I use the so-called Bayes factor Pr(E|A)/Pr(E|~A). Here “A” denotes atomism, which states that matter consists of particles and that the number of particles in one gram mole is 6 × 10^23. E is the conjunction of the results E1, …, E8 of eight of the nine methods. (One method serves to fix the free parameter N.)
Values of the two likelihoods can then be determined as follows. Regarding Pr(E|A), it is plausible that each Ei is entailed by atomism plus the respective auxiliaries, hence Pr(E|A) equals one. (The auxiliaries can be taken to be part of the background knowledge). Determining a value for Pr(E|~A) is more involved. Two steps have to be distinguished. First, I interpret the independence of the eight different methods to imply Pr(E|~A) = Pr(E1|~A) x … x Pr(E8|~A). In my paper, I compare this probabilistic interpretation of independence of evidence with some other probabilistic interpretations found in the literature (e.g., Myrvold 2003, Lange 2004). Second, each Pr(Ei|~A) can be estimated to be lower than 1/10. In my paper, I justify this upper bound with an argument that uses Benford’s law.
Put together, we arrive at a value for the Bayes factor of at least 10^8: Pr(E|A) equals one, and each of the eight factors Pr(Ei|~A) is lower than 1/10. A Bayes factor of this magnitude constitutes extremely strong evidence. It means, for example, that all priors of atomism larger than around one in a million lead to posteriors very near to 1. To compare, Harold Jeffreys (1961) suggests that values of the Bayes factor between 10 and 100 correspond to “strong” evidence and values bigger than 100 correspond to “decisive” evidence. So, given the evidence of 1915, the Bayesian model suggests very strong confidence in atomism. The model is very simple, but it relies on several modelling assumptions and idealizations, the justification of which requires careful discussion.
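
The arithmetic behind these figures can be checked with a minimal Python sketch. It takes the abstract's numbers at face value (Pr(E|A) = 1, the bound 1/10 for each Pr(Ei|~A), eight independent methods, a prior of one in a million); the function names are illustrative and are not taken from the paper.

```python
from fractions import Fraction


def bayes_factor(pr_e_given_a, pr_ei_given_not_a, n_methods):
    """Pr(E|A) / Pr(E|~A), where Pr(E|~A) is a product of n equal, independent terms."""
    pr_e_given_not_a = pr_ei_given_not_a ** n_methods
    return pr_e_given_a / pr_e_given_not_a


def posterior(prior, bf):
    """Update a prior probability by a Bayes factor, working on the odds scale."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * bf
    return posterior_odds / (1 + posterior_odds)


# Values stated in the abstract; 1/10 is an upper bound, so bf is a lower bound.
bf = bayes_factor(Fraction(1), Fraction(1, 10), 8)
post = posterior(Fraction(1, 10**6), bf)

print(bf)           # lower bound on the Bayes factor
print(float(post))  # posterior for a prior of one in a million
```

With these inputs the Bayes factor is 10^8 and the posterior comes out at roughly 0.99. Exact fractions are used only to keep the small prior free of rounding error.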


Paul Griffiths: “Holding Pathology Hostage”

Tim Lewens (2015) comments that the concept of pathology should not be “hostage to evolutionary enquiry”. In this talk I disagree, and discuss some ways in which recent advances in biology ought to change ideas about health and disease. In particular, the health of an organism and the quality of its environment are entwined conceptually, as well as causally.

Alexander Hapfelmeier: “Exploratory Subgroup Analysis by Recursive Segmentation”

The identification of patient subgroups with notable clinical outcome is a matter of high concern in the analysis of clinical data. For example, prolonged or shortened survival could be observed for patients with specific characteristics and biomarkers under no or standard care (= prognostic) or under a certain treatment (= predictive). A method that aims at the identification of such subgroups and provides an explicit subgroup definition is the Patient Rule Induction Method (PRIM). However, it requires a lot of user involvement and prior knowledge about the subgroup of interest (i.e., subgroup size and average outcome), which may be hard to provide. Alternatively, recursive partitioning (trees) is a fully automated method but only indirectly defines subgroups through the description of the conditional distribution of the outcome. A new method called ‘recursive segmentation’ is proposed to combine the advantageous features and to lessen the deficiencies of the aforementioned algorithms. It is essentially a PRIM-like method for exploratory subgroup analysis/identification based on iterative application of recursive partitioning. Therefore, it does not require prior knowledge, directly aims at subgroups with notable outcomes, and is embedded in the elaborate framework of recursive partitioning. The latter includes methods such as CART, conditional inference trees and the globally optimal evolutionary trees. Simulation and application studies show that recursive segmentation is able to uncover relevant subgroups with models of decreased

Bennet Holman: “Pharmacology: An Asymmetric Arms Race”

The significance of the 1962 Kefauver-Harris amendments, passed in response to the Thalidomide disaster, is typically seen as updating the regulatory mandate in the USA to include considerations of effectiveness. Because the American model was adopted in nearly every developed country by 1978, understanding this development in pharmaceutical regulation is indispensable for understanding current practices in assessing whether new drugs should gain market entry.

I first discuss this from the perspective of the history of pharmacology to show that the standard account makes both errors of commission and omission. First, I show that pharmacologists at the FDA were already taking efficacy into consideration prior to 1962. I then argue that the true significance of the legislation was mandating that decisions of safety and efficacy must be made by “experts qualified by scientific training and experience,” to declare (by regulatory practice) that pharmacologists (rather than doctors) were the qualified experts, and to codify the three phase trial as the required evidential standard. Additionally, I shall argue that far from being merely a sophistication of research methodology, the act was specifically constructed in response to undesired promotional methods and the resulting erosion of the reliability of drug evaluation.

Having established these historical points, I will turn to what I take to be their philosophical import. Namely, that any revision to scientific methodology is part of a dynamic process that affects both epistemic and commercial interests. Thus, any methodological alteration that significantly reduces profits of pharmaceutical companies (as the 1962 legislation did), in turn creates a huge financial incentive for pharmaceutical companies to counteract the effect of the reforms.

I argue that this dynamic is best conceived of as an example of what game-theorists call an "asymmetric arms race". Such a dynamic is typified by a series of moves and countermoves between competing parties who are adjusting to one another's behavior, in this case between those who seek to make medical practice more responsive to good evidence and those whose primary motivations are instead commercial in character. This model stresses that in addition to the standard philosophical worry of avoiding inferential errors, reformers must also seek to minimize manipulation.

In closing, I show how the frustration of commercial aims by the passage of the 1962 legislation has led to another major aspect of the current landscape of professional pharmacology: the contract research organization (CRO). Studies show that evidence produced by CROs is more methodologically rigorous (by standard measures of assessing rigor), but systematically favors the interests of industry (and is thus plausibly less epistemically reliable). Just as the 1962 legislation was a measure to frustrate commercial ends of industry that were not serving the veritistic aims of reformers, the rise of CROs is a countermeasure to circumvent these restrictions. Assuming the arms race model is correct, an appreciation of this dynamic becomes crucial for any evaluation of evidence or comparison of inferential regimes, as it sets the stage for understanding the current state of play and provides new criteria for assessing potential

Phyllis Illari: "Who´s Afraid of Mechanisms?"

The aim of this paper is to present the core motivation for attempting to characterise quality of evidence of mechanism linking C and E, when attempting to establish whether C causes E in the health sciences. I will also begin to identify a useful approach to such a characterisation.

The first part of the paper will explain why one might – epistemically – value evidence of mechanism linking C and E, as complementary to evidence of correlations between C and E in populations, such as is gained from Randomised Controlled Trials or observational studies. Based on the work of Clarke, Illari, Gillies, Russo and Williamson (2014), the crucial idea is that evidence of mechanism helps address the major weakness of evidence of correlation, i.e., the problem of confounding, or the possibility that C and E are in fact common effects of a third variable, D. In reverse, if you are unsure whether the effect of the mechanism you have identified might be ‘masked’ by the effects of unidentified mechanisms also linking C and E – the major weakness of evidence of mechanism – seeking evidence of a correlation in a population can help. This means that evidence of correlation and evidence of linking mechanism are complementary in an important way.

The second part of the paper will then focus on two sources of scepticism about evidence of mechanism. The overriding aim is to identify important insights to inform a positive characterisation of evidence of mechanism. The first source is the startling absence of evidence of mechanism from most current medical evidence 'hierarchies', with particular reference to the GRADE (2009) system of the National Institute for Health and Care Excellence in the UK. I will examine a mixture of political, historical, and methodological concerns, to disentangle two important lessons: evidence of mechanism should not mean either a mere story about mechanism, which is merely a hypothesis, or knowledge of a complete mechanism, which we almost never have. The second source is various pluralist philosophers, of whom I focus on Dupre (2012, 2013). Again, I disentangle two insights: that a characterisation of evidence of mechanism must not be vacuous, nor rigid in ways inappropriate to the life sciences.

In the final part of the paper, I use the four insights to criticise Craver’s (2007) mutual manipulability account of constitutive relevance, interpreted as a story about mechanism discovery. I argue instead that a heuristic approach, deriving from Bechtel and Richardson (1993) is valuable, and indeed offers a reinterpretation of other work by Craver (2006), and by Darden (2006).

Daria Jadreškić: "Some Social Aspects of The Discovery, Synthesis and Production of Cortisone in the 1930s-1950s"

Since their first medical use in 1948, cortisone and its synthetic analogues remain one of the most widely prescribed medications in the world. Cortisone belongs to the group of steroid hormones of the adrenal cortex, first isolated from bovine glands in the 1930s by biochemist Edward C. Kendall in the US and, independently, Tadeus Reichstein in Switzerland at about the same time. Physiologic research of an unknown therapeutic agent began in 1929 at the Mayo Clinic with rheumatologist Philip S. Hench who hypothesized the emergence of an antirheumatic substance in conditions such as jaundice and pregnancy. It was in 1941 with the American involvement in World War II when adrenal research became an internationally competitive effort. In 1942 Lewis H. Sarett from Merck and Company worked in Kendall's laboratory with the goal of developing large-scale synthetic methods for the compounds that were chosen for the initial studies because of their structural simplicity. When federal funding waned, as the effectiveness for war-related uses became doubtful, only Kendall and his associates at Mayo and Merck continued the research. Lewis Sarett synthesised compound E (cortisone) in 1946 in 37 steps and by 1948 sufficient dosage was available for clinical testing conducted by Hench. It showed great success in patients with rheumatoid arthritis and subsequent tests were equally successful in reducing inflammation. It is now known that intracrine metabolism of cortisone to cortisol sustains local amplification of glucocorticoid action at sites of inflammation throughout the body.
Discovery, synthesis and therapeutic application of cortisone present a paradigm for modern translational medicine.
Edward Kendall noted in his 1950 Nobel lecture that there had been one good reason why the manufacture of cortisone should be expanded and five reasons why it should not. The one good reason was the demand for cortisone, and opposed to this were (1) the uncertainty of the future, (2) the uniqueness: no compound as complex as cortisone had ever been made on a factory scale, (3) the cost, (4) the patent situation, and (5) the quest for possible alternatives. In this paper I will focus on the conditions that made this basic/applied/clinical interface possible: the rise of steroid chemistry, simultaneous individual accomplishments, continuous cooperation among scientists, military competitiveness, and cooperation among pharmaceutical companies. I will contextualize the listed reasons and explain the social background of the cortisone

Christoph Jansen and Thomas Augustin: "Probabilistic Evaluation of Preference Aggregation Functions"

One of the big issues in Social Choice Theory is the question of how to combine the preferences of a group into one "fair" social preference.

Unfortunately, Kenneth Arrow's (in)famous result from 1951 startlingly demonstrates the impossibility of determining general rules for aggregating the individual preferences of a number of group members (such as medical experts discussing a plausibility ranking of a number of possible diagnoses) into a group preference consistently with a set of three (very appealing) axioms (namely "Pareto's principle", "Independence of irrelevant alternatives" and "No dictatorship").

Ever since, a lot of effort has been put into avoiding this impossibility. Two main classes of approaches have been followed: restricting the domain of the aggregation rule to "feasible" preference profiles (e.g. to single-peaked preferences or preferences in accordance with an expected utility model), or weakening/modifying the axioms by taking into account some underlying cardinal notion of utility (e.g. value difference functions).

In our contribution, we intend to follow a different approach: Taking the (potentially) ordinal structure of the preferences and the free will of the involved decision makers seriously, we avoid making any restricting assumptions concerning the domain of the aggregation rule or the cardinal structure of the individual preferences. Instead, we set up a probabilistic model on the space of all possible preference profiles that is driven by the estimated (data or experience-based) degree of similarity (with respect to some similarity measure). Using this model then makes it possible to evaluate and compare different preference aggregation functions via their expected similarity to the individual preferences. In this way, our approach allows us to explicitly take into account additional information on the correlation structure on the space of preference profiles.
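
The evaluation idea described above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration and is not taken from the paper: three alternatives, three voters, a uniform probability model over all profiles, a pairwise-agreement similarity measure, alphabetical tie-breaking, and Borda and plurality as the two rules being compared.

```python
from fractions import Fraction
from itertools import combinations, permutations, product

ALTERNATIVES = ("a", "b", "c")  # hypothetical set of alternatives


def similarity(r1, r2):
    """Share of alternative pairs that two rankings (best first) order the same way."""
    pairs = list(combinations(ALTERNATIVES, 2))
    agree = sum(
        (r1.index(x) < r1.index(y)) == (r2.index(x) < r2.index(y)) for x, y in pairs
    )
    return Fraction(agree, len(pairs))


def borda(profile):
    """Borda ranking: position i of m earns m-1-i points; ties broken alphabetically."""
    m = len(ALTERNATIVES)
    score = {x: sum(m - 1 - r.index(x) for r in profile) for x in ALTERNATIVES}
    return tuple(sorted(ALTERNATIVES, key=lambda x: (-score[x], x)))


def plurality(profile):
    """Rank alternatives by number of first places; ties broken alphabetically."""
    firsts = {x: sum(r[0] == x for r in profile) for x in ALTERNATIVES}
    return tuple(sorted(ALTERNATIVES, key=lambda x: (-firsts[x], x)))


def expected_similarity(rule, model):
    """Expected mean similarity between the group ranking and the individual rankings.

    `model` maps each profile (a tuple of rankings) to its probability.
    """
    return sum(
        p * sum(similarity(rule(profile), r) for r in profile) / len(profile)
        for profile, p in model.items()
    )


rankings = list(permutations(ALTERNATIVES))
profiles = list(product(rankings, repeat=3))  # all profiles for three voters
uniform = {profile: Fraction(1, len(profiles)) for profile in profiles}

print(float(expected_similarity(borda, uniform)))
print(float(expected_similarity(plurality, uniform)))
```

Replacing `uniform` with a model that puts more weight on profiles of similar individual rankings is where information about the correlation structure would enter.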

Against this background, we first discuss the appropriateness of different (known as well as new) similarity measures for preference profiles (e.g. (generalized) intersection-similarity, leave-one-out similarity) and propose a set of axioms characterizing some minimal requirements for such similarity measures.

Afterwards, we apply our model in order to compare some common preference aggregation functions known from social choice theory (e.g. Borda election, Condorcet election) and demonstrate how the quality of an aggregation function depends upon the underlying similarity structure. Particularly, this allows us to choose between different aggregation functions by additionally taking into account subject matter

Jürgen Landes: "A Multi-Criterial Approach to Amalgamating Evidence"

Evidence in medicine has been a hot topic in recent years. Special interest has been paid to the questions "What constitutes evidence in medicine?", "How does one aggregate evidence for medical decision making?" and "What kind of evidence is most important?". This talk introduces a ranking method for comparing health interventions termed HIerarchical Decision AiD (HiDAD) and addresses, to various degrees, all these three questions.

Based on a hierarchical understanding of evidence and a multi-criteria decision perspective, HiDAD facilitates the amalgamation of an entire diverse body of evidence. Scope and limitations of HiDAD will be discussed.


Yang Liu: “Choice is a Wild Card”

One controversial subject in the philosophy of probability concerns the issue of whether or not it is meaningful for a Bayesian decision maker to assign subjective probabilities to his or her own pending actions. I tend to agree with Gaifman (1999), Ismael (2012), Levi (1996), Price (2012), Spohn (1977, 2012), among others, in saying that it is not. Call this the standing view. As is widely acknowledged, the Bayesian subjective interpretation of probability is behavioristic in character: numerical probabilities (and utilities) are usually constructed through a systematic representation theorem, in which the agent’s preferences over various actions are represented by their expected utilities. An important aspect of this construction involves how the agent’s beliefs are viewed from a different/third-person perspective: the agent’s probabilistic estimations (and utility judgments) are often said to be elicited from his or her coherent choosing. This of course does not mean that the agents cannot apply this mechanism to themselves. In ascertaining her genuine belief about, say, tomorrow’s weather condition, an agent may take a detached view and entertain a series of thought experiments to see what maximum price pX she is willing to pay to enter a bet which pays $X if it rains and nothing if not. Her probabilistic belief p about the weather condition can be read off from her opinion about the bet just as a third person would read it off. However, this betting method breaks down when it comes to predicting her own pending actions (Levi, 1996; Spohn, 1977). The key difference between the two scenarios is that, unlike in the first case where the weather condition is external to both parties, the agent’s pending actions are internal and under her full volitional control.
There is a clear asymmetry between the two epistemic stances when it comes to evaluating the agent’s own performances, in which case the agent can no longer take a detached/third-person view in assigning probabilities to her pending actions, at least, not in the classical Bayesian framework. In other words, the traditional third-person betting interpretation of probability does not apply to the agent’s own actions. The epistemic asymmetry hence creates a probabilistic gap for first-person choice credences. Choice is indeed a wild card. In this talk, I will provide a brief survey of the debate with a focus on the phenomena of self-reference and first-person/third-person asymmetry involved in the context of rational decision making. I will also attempt some responses to the criticisms against the standing view given by Hájek (2015); Rabinowicz (2002).


  • Gaifman, H. (1999). Self-reference and the acyclicity of rational choice. Annals of Pure and Applied Logic 96(1-3), 117 – 140.
  • Hájek, A. (2015). Deliberation welcomes prediction. Manuscript.
  • Ismael, J. (2012). Decision and the open future. In A. Bardon (Ed.), The Future of the Philosophy of Time, pp. 149–168. Routledge.
  • Levi, I. (1996). Prediction, deliberation and correlated equilibrium. In The Covenant of Reason : rationality and the commitments of thought, Chapter 5. Cambridge University Press. 1997.
  • Price, H. (2012). Causation, chance, and the rational significance of supernatural evidence. Philosophical Review 121(4), 483–538.
  • Rabinowicz, W. (2002). Does practical deliberation crowd out self-prediction? Erkenntnis 57(1), 91–122.
  • Spohn, W. (1977). Where Luce and Krantz do really generalize Savage’s decision model. Erkenntnis 11(1), 113–134.
  • Spohn, W. (2012). Reversing 30 years of discussion: Why causal decision theorists should one-box. Synthese 187(1), 95–

Daniel Malinsky: "Decision Making under Causal Uncertainty"

Much of the mathematical machinery of causal effect estimation has been developed with the intent, at least in part, of guiding decisions about interventions. Which interventions are worth doing, and what can we expect of the results? These questions are particularly salient in policy-related sciences - including pharmacology, epidemiology, economics, sociology, the environmental sciences, and various other areas - where policy goals can be achieved by multiple different interventions, each of which has a different associated cost. Over the years various techniques have been developed for estimating causal structure from observational data, which can subsequently be used to make point estimates for causal effects of interest.

This paper discusses the idea of causal uncertainty, as distinct from the more familiar statistical uncertainty. Usually statistical uncertainty for some quantity of interest is represented by confidence intervals or, if Bayesian techniques are used, by credible intervals. In both cases, the uncertainty should decrease as the number of observations increases. Causal uncertainty is different, because it does not typically decrease with larger sample-sizes; causal uncertainty is a ubiquitous feature of causal inference because of the underdetermination of causal structure by observational data.

We may distinguish between two kinds of causal uncertainty which crop up in learning causal structure from observational studies: uncertainty in measured structure (UMS) and uncertainty in hidden structure (UHS). UMS arises because conditional independence facts can usually only identify a Markov equivalence class of causal structures over the measured variables -- some causal facts like whether X is a cause of Z or whether Z is a cause of X will be impossible to disambiguate from the observed statistical independencies. UHS arises because in many domains researchers have substantial uncertainty about hidden causal mechanisms, i.e., factors which are not measured but which are causally relevant to multiple variables of interest. Sometimes, we can identify the presence or absence of hidden confounders from conditional independence information, but often we can only learn that some variable X is a cause of Y, without being able to rule out the possibility that there is also some hidden factor U which is a common cause of both X and Y. This hinders estimation of intervention effects, because the data cannot isolate the strength of the causal effect of X on Y from the correlation possibly induced by U.
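
A small Python sketch may make the hidden-structure case concrete. It assumes a textbook linear model with zero-mean Gaussian noise, X = a·U + e_x and Y = b·X + c·U + e_y with Var(U) = 1, which is an illustrative choice and not the paper's own setup; the parameter names and values are hypothetical.

```python
from fractions import Fraction as F


def observed_covariance(a, b, c, var_ex, var_ey):
    """(Var(X), Cov(X, Y), Var(Y)) implied by X = a*U + e_x, Y = b*X + c*U + e_y.

    U is the hidden factor with Var(U) = 1; b is the causal effect of X on Y.
    """
    var_x = a**2 + var_ex
    cov_xy = b * var_x + a * c
    var_y = b**2 * var_x + 2 * a * b * c + c**2 + var_ey
    return var_x, cov_xy, var_y


# Two hypothetical parameter settings with different causal effects b.
no_confounding = dict(a=F(0), b=F(1, 2), c=F(0), var_ex=F(1), var_ey=F(1))
pure_confounding = dict(a=F(1, 2), b=F(0), c=F(1), var_ex=F(3, 4), var_ey=F(1, 4))

print(observed_covariance(**no_confounding))
print(observed_covariance(**pure_confounding))
```

Both settings imply the same variances and covariance for the measured variables, so under the Gaussian assumption they generate the same observational distribution, although the causal effect of X on Y is 1/2 in one and 0 in the other. Collecting more observations of X and Y cannot separate them.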

So, causal uncertainty leaves open the possibility of a range of different values for the causal effect of X on Y, and the range does not shrink with larger sample sizes because of a fundamental underdetermination problem. Which causal effect estimate should a researcher report then? I will consider what kind of decision rule(s) may be appropriate in the face of causal uncertainty. I argue that reasoning from indifference to “average” causal effects is a bad idea, because such a strategy depends on exactly how the researcher defines the space she is indifferent over. Alternatively, I consider a kind of “worst-case” reasoning which has some attractive properties in the context of cost-benefit

Mousa Mohammadian: "From Peirce´s Abduction to Lipton´s Inference to the Best Explanation: How Two Historical Developments Fill the Gap"

The relationship between Peirce’s abduction and Lipton’s inference to the best explanation (henceforth IBE) has been viewed in opposite ways. Some have argued that they are basically the same and others think they are utterly different. I think that neither of these extremes is true. In this paper I argue that Lipton’s IBE is the natural result of Peirce’s abduction after two historic developments in the philosophy of science, i.e., emergence of the notions of underdetermination and the distinction between the context of discovery and the context of justification. The paper consists of four sections.

§1. I briefly discuss three main interpretations of Peirce’s abductive inference:

I. Abduction is a way of discovering new explanatory hypotheses.
II. Abduction is a method of justification of explanatory hypotheses.
III. Abduction leads to judgments about the comparative pursuitworthiness of rival explanations.

§2. I propose my own reading of Peirce’s abduction. I argue that abduction is a double-phase inferential process. Phase one is inventing candidate explanations for the observed phenomenon. This is done by what Peirce calls insight and describes as a mysterious inborn mental faculty. Phase two is ranking the invented explanations based on economic considerations for further studies and tests. Then it is time for induction, i.e., empirical testing (and probable rejecting) of the explanation with the highest rank which is continued until we find the unique true explanation.

§3. Here I show that:
1. My view, with some qualifications, embraces the first and the third abovementioned interpretations. In particular, I argue that
1.1. Reichenbach’s view of discovery—mentioned in his famous ‘context of discovery’—is to a great extent similar to the phase one of Peirce’s abductive inference.
1.2. The goal of phase two—i.e. ranking candidate explanations—is determining the comparative pursuitworthiness of rival explanations.
2. Justification is the function of induction and cannot be done by abduction at all.

§4. I argue that by introducing Reichenbach’s context distinction and the Duhem-Quine underdetermination of theory by empirical data to Peirce’s inductive inference, we naturally come to Lipton’s IBE.
Reichenbach famously argues that the context of discovery is a subject-matter of psychology but not epistemology. Now if we admit his context distinction and its immediate ramification, and if I am right that phase one of Peirce’s abduction corresponds with Reichenbach’s context of discovery, then it should be eliminated from epistemology of science altogether. This is exactly what happens in Lipton’s IBE where no discussion about the invention of explanations can be found.
And if we introduce the problem of underdetermination to Peirce’s account, empirical test cannot eliminate all rival explanations but one. Therefore, in contrast with what Peirce imagined, having the highest rank and passing the empirical tests are insufficient for an explanation to be adopted as the only empirically adequate explanation. It also needs to be the best explanation—whatever “best” means here—among all its empirical equivalents.
Thus, after introducing the context distinction and the underdetermination, and only after that, abductive inference becomes an inference to the best

Barbara Osimani: “Causal Inference in Pharmacology: Towards a Framework for Evidence Amalgamation”

Philosophical discussions on causal inference in medicine are stuck in dyadic camps, each defending one kind of evidence or method rather than another as best support for causal hypotheses. Whereas Evidence Based Medicine advocates invoke the use of Randomised Controlled Trials and systematic reviews of RCTs as gold standard, philosophers of science emphasise the importance of mechanisms and their distinctive informational contribution to causal inference and assessment. Some have suggested the adoption of a pluralistic approach to causal inference, and an inductive rather than hypothetico-deductive inferential paradigm. However, these proposals deliver no clear guidelines about how such plurality of evidence sources should jointly justify hypotheses of causal associations. In this paper, we develop the pluralistic approach along Hill’s (1965) famous criteria for discerning causal associations by employing Bovens’ and Hartmann’s general Bayes net reconstruction of scientific inference to model the assessment of harms in an evidence-amalgamation

Wolfgang Pietsch: "A Causal Account of Analogical Inference"

There is widespread skepticism about whether analogical reasoning constitutes more than a heuristic tool for hypothesis generation. By contrast, I argue in my proposed contribution that a difference-making approach to causation naturally supplies an epistemological basis for analogical inferences and in particular implies a similarity measure that is superior to those discussed in the literature so far.
I rely on John Maynard Keynes’ terminology of positive, negative, unknown, and hypothetical analogy (1921, Ch. XIX; cp. also Bartha 2013, §2.2). Also, I work within a two-dimensional framework for analogy as endorsed by several authors, introducing the important distinction between horizontal relations accounting for similarities in circumstances between source and target, and vertical relations concerning different types of relationships between those circumstances, e.g., causal or deductive (Hesse 1966, Bartha 2010, Norton 2011).
Arguably, the main epistemological challenge of analogical reasoning consists in establishing an adequate measure of similarity between target and source phenomena. In fact, many authors claim that the unreliable nature of analogical inferences directly derives from the contextual and subjective nature of similarity itself. Remarkably, all major accounts of analogy differ with respect to the choice of similarity measure. Sometimes, the number of corresponding properties is counted and weighted against the number of differing properties. As another example, the influential structure-mapping theory of Dedre Gentner suggests a measure in terms of structural similarity (1983).
I argue that the measure of similarity implied by the difference-making account of causation, as sketched in Pietsch (2015, Sec. 3), is superior to these and other approaches at least in the context of causal vertical relations. The crucial advantage is that this difference-making account builds on well-defined notions of causal relevance and irrelevance of circumstances, allowing for a substantial amount of objectivity. Note that both concepts must always be related to a context or background of constant conditions. In its simplest version, the difference-making account works under the assumption of determinism. Then, an analogical inference essentially holds if the negative analogy (i.e., the conjunction of all circumstances therein) is causally irrelevant with respect to a background constituted by the positive analogy.
Various complications can arise. For example, it may not be fully known whether the negative analogy is irrelevant. Then, the analogical inference will be valid only with a certain probability. Furthermore, there may be an unknown analogy. If this unknown analogy is known to be irrelevant with respect to B, then the analogical inference will of course be valid. If the negative analogy is irrelevant, but the unknown analogy is known to be relevant, then the analogical inference will be valid with the probability that the unknown analogy (or at least the relevant circumstances therein) belongs to the positive analogy. To guarantee a certain amount of reliability for such probabilistic inferences, an objective notion of probability is employed that stands in the tradition of writings by Cournot, Mill, and von Kries as well as modern authors like Strevens, Rosenthal, and Abrams among others.


  • Bartha, Paul. 2010. By parallel reasoning: The construction and evaluation of analogical arguments. New York: Oxford University Press.
  • Bartha, Paul. 2013. “Analogy and analogical reasoning.” Stanford Encyclopedia of Philosophy (Fall 2013 Edition). 
  • Gentner, Dedre. 1983. “Structure-Mapping: A theoretical framework for analogy.” Cognitive Science 7: 155–70.
  • Hesse, Mary. 1966. Models and analogies in science. South Bend, IN: Notre Dame University Press.
  • Keynes, John Maynard. 1921. A treatise on probability. London: Macmillan.
  • Pietsch, Wolfgang. 2015. “The causal nature of modeling with big data.” Philosophy & Technology. Online first:

Roland Poellinger: "Confirmation via In Silico Simulation"

In medicine and pharmacology, in silico trials have become an important means for prediction and represent a valuable alternative to standard experimental procedures (such as randomized clinical trials), when these are unethical, costly, or inaccessible for other reasons. Contrary to recent arguments advocating that (even Monte Carlo) computer simulations represent models simpliciter (cf., e.g., [Beisbart & Norton, 2012]), I will critically examine and build on the concept of analog simulation, as introduced in [Dardashti et al., 2015], to locate genuine hypothesis confirmation via in silico simulation within a Bayesian reconstruction of scientific inference. I will conclude with remarks on how to fruitfully utilize syntactic isomorphisms between computational models and target systems for scientific discovery.


Julian Reiss: "In Defence of Statistical Minimalism"

It’s not a secret that statistical inferences are made against a backdrop of assumptions about the data generating process. Nor is it a secret that we seldom have good reason to believe that the assumptions are met in most observational contexts, especially, but not only, in the social sciences. Empirical researchers examining these contexts tend not to pay much attention to the dependence of the validity of inferences on the truth of the background assumptions, however. Worse, or so I shall argue, none of the techniques the literature offers to ameliorate the problem work. The two-fold goal of this paper is to draw attention to this problem and offer an alternative, statistical minimalism, that seeks to make valid (causal) inferences while relying on a minimum of statistical background assumptions.


Glenn Shafer: “Probability Judgement”

The mathematical theory of probability derives, historically and conceptually, from the idea of betting. Yet numerical probability judgements play a dominant role in the assessment of evidence. How do we get from betting to evidence? In this talk, I argue that evidence is linked to betting by the principle that probability judgements are valid to the extent that an opponent who is allowed to use them as betting rates cannot multiply the capital he risks by a large factor. This game-theoretic Cournotian principle underlies the use of experience in betting, the use of conditional probability, and the use of Dempster’s rule of combination for belief

Jan Sprenger: “Applying a Measure of Corroboration in Statistical Inference”

Testing a point null hypothesis is a standard inference technique in science. Although null hypotheses are often of high theoretical importance (Gallistel, 2009, “The importance of proving the null”, Psychological Review), classical statistics does not provide a measure of the extent to which they are corroborated by evidence. In previous research, I have demonstrated the impossibility of constructing such a measure in confirmation-theoretic terms if plausible adequacy constraints are to be satisfied (Sprenger, 2016, “Two impossibility results for Popperian corroboration”, forthcoming in BJPS). This follow-up paper takes a more constructive line and proposes a concrete measure of corroboration, based on partitioning the space of alternative hypotheses. This measure is defended axiomatically and subsequently applied as a measure of evidence in point null hypothesis testing. Furthermore, I argue that it provides a convincing resolution of Lindley's paradox and that it squares well with the role of effect size in theory

David Teira, Brendan Clarke, Maël Lemoine: "Taking the Risks of Testing Personalized Treatments"

For the last five decades, medical treatments have been tested by pharmaceutical regulators with randomized clinical trials (RCTs). Regulatory agencies, such as the American Food and Drug Administration (FDA) or the European Medicines Agency (EMA), require two positive (phase III) trials as proof of the safety and efficacy of a treatment before patients are granted access to it. Phase III trials are large, often involving thousands of patients.

Molecular medicine is changing our very concepts of disease and cure and forces us to rethink the sort of regulatory standard that we expect treatments to meet in order to consider them safe and effective. A molecular diagnostic of the genetic aberrations in each individual tumor opens the door for targeted treatments: drugs that selectively inhibit the products of these altered genes (Schilsky, 2014). There are about a dozen such drugs available (Tursz and Bernards, 2015) and many more should come.

However, we are often speaking of evidence that comes from very few patients, as compared to phase III trials. Oncological treatments have been so far tested, like any other drug, in RCTs in which patients are usually not selected according to their genotypes. In addition, these phase III trials are not just large, but also long: they involve comparing a treatment with a standard alternative, following patients to a predesignated endpoint after the administration (overall survival, after 5 years).

Testing targeted therapies thus poses an epistemic dilemma for pharmaceutical regulators: should they stick to large and long trials, when there are so few patients to test targeted treatments? Or should they decide on the basis of quicker tests? We are going to maintain that small phase II trials provide enough grounds for regulatory agencies to grant advanced access to targeted treatments if we observe the following principles: (i) since a small size can amplify the effect of potential biases, we need to make sure that additional debiasing methods are incorporated to secure the impartiality of the test; (ii) we should restrict the access to the therapies to patients who have the proper biomarkers, under individual informed consent agreements about the possible side effects; and (iii) we later need to conduct larger trials to validate the advanced access.

The current system was designed to provide massive consumer protection at a point when our understanding of the biology of cancer was still relatively poor and statistical tests gave the only solid evidence about treatment effects. Nowadays, with targeted therapies, risks are hedged in a way that allows patients (if well informed) to make decisions for themselves, instead of deferring to pharmaceutical

Momme von Sydow, Niels Braus: "On Biased Contingency Assessment and Inner-Organizational Dilemmas of Personnel Evaluation"

There are “inner-individual dilemmas” (von Sydow, 2015), where people seem to pursue local goals at the expense of their overall utility. To some extent, this resembles social dilemmas (in game theory, for instance, public good games), where several people focusing on their individual payoff may cause the deterioration of a group’s overall outcome. In contrast to social dilemmas, differing interests of the individuals play no role in inner-individual dilemmas. It is therefore commonly assumed for the latter that the agents should simply optimize globally, but even in these cases there is evidence that they do not always do so. We are here concerned with an intermediate case of what we may call an “inner-organizational dilemma”. This is reminiscent of a social dilemma in that it involves several people, but in another respect it resembles an inner-individual dilemma, since often at least one individual who is in charge of heading or coordinating other individuals has the official goal of optimizing globally. In our experiments, for instance, participants take the role of human resource managers, who should be interested in rewarding workers who optimize the overall payoff of the company. If they work in the interest of the company, they should not promote those who optimize only their specific payoffs while ignoring the overall payoff. Linking to the cooperation and altruism debate, we explore whether the managers take into account that an employee who individually contributes less than most others may nonetheless be causally responsible for high overall profit. The results of the experiments suggest that people often tend to focus on directly comparing individuals without considering their overall contribution to a group. This may lead to the tragic result that the employees who get rewarded or selected are those who are not good for the company but only optimize their own individual payoff.
Such phenomena may pose a central problem for optimal incentive structures (personnel evaluation), advancement of employees (personnel promotion) and job offers (personnel selection), and may crucially affect an organization’s success. Additionally, we will briefly relate the concept of Inner-Organizational Dilemma to the debates on causal induction, self-regulation, and temporal

Momme von Sydow, Dennis Hebbelmann, Björn Meder: "How Causal Reasoning Can Distort Evidence"

Probability judgments between causal events are normally assumed to rely on correspondence-based induction. However, we investigate whether induction may psychologically be influenced by coherence-based causal inferences. It has been proposed that if reasoning plays a role in induction, the psychological validity of the assumptions of a Bayes nets approach (the Markov condition) should lead to a distorted interpretation of observed data (von Sydow, Meder, Hagmayer, 2009; von Sydow, Hagmayer, Meder, Waldmann, 2010). For probabilistic causal relations, mismatches between structural implications and the available data may lead to distortions of empirical evidence. We have shown, for instance, that people may use the generative local causal relations A → B and B → C to infer a positive indirect relation between events A and C, despite data showing that these events are actually independent or even negatively correlated. We report experiments using sequential learning tasks and a betting procedure (Hebbelmann & von Sydow, 2014) and experiments based on overview formats (von Sydow, Hagmayer, Meder, 2015), where the violation of transitivity of a chain is caused by mixing different subclasses of events for which different relations hold. Our results demonstrate that at least in causal chains people are influenced by transitive reasoning, even if the data is

Christian Wallmann: “Three Methods for Solving the Problem of Inconsistent Marginals in Data Integration”

Data integration combines data from different studies by building a joint distribution of all variables that are measured in at least one study. A study on acute kidney disease may, for instance, measure clinical variables (age, gender, co-morbidities, etc.), another may measure pathological variables (creatinine testing, proteinuria testing, etc.), and a third may measure variables from imaging procedures. The joint distribution yields correlations between all of these variables, even if they are measured in different datasets (cross-dataset correlations). One key obstacle to integrating data is what I call the problem of inconsistent marginal probabilities. Inconsistent marginal probabilities arise if variables occur in different datasets but their relative frequencies in those datasets disagree. In this case, there exists no probability function that jointly satisfies all of the constraints given by the datasets. Such inconsistent datasets are very common. They may arise from different uncontrolled background variables in different studies, from chance, or simply from differences in sample size. For illustration, consider the following example. We write V_DSi for the set of variables that occur in the i-th dataset. The probability Pi(Xk, Xj, ...) is estimated by the relative frequency of (Xk, Xj, ...) in the i-th dataset. Let V_DS1 = {X1, X2, X3}, V_DS2 = {X2, X3, X4}, V_DS3 = {X1, X2, X4}. We aim to determine a joint probability distribution of all variables, P(X1, X2, X3, X4). Ideally, we would use the joint probabilities P1(X1, X2, X3), P2(X2, X3, X4), P3(X1, X2, X4) to estimate P(X1, X2, X3, X4). However, the joints P1(X1, X2, X3) and P3(X1, X2, X4) each uniquely determine a marginal joint over (X1, X2), namely P1(X1, X2) and P3(X1, X2). Most likely, P1(X1, X2) and P3(X1, X2) will disagree, i.e. they are inconsistent marginals. Consequently, there is no probability function P that satisfies both constraints, P(X1, X2, X3) = P1(X1, X2, X3) and P(X1, X2, X4) = P3(X1, X2, X4).
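The problem of inconsistent marginals can be made concrete with a minimal Python sketch. The two toy datasets, the restriction to binary variables, and the helper names `marginal` and `consistent` are assumptions invented for illustration; they are not taken from the talk.

```python
from collections import Counter
from itertools import product

def marginal(records, idx):
    """Relative frequencies of the (binary) variables at positions idx."""
    counts = Counter(tuple(r[i] for i in idx) for r in records)
    n = len(records)
    return {k: counts.get(k, 0) / n for k in product((0, 1), repeat=len(idx))}

def consistent(p, q, tol=1e-9):
    """True if two estimates of the same marginal agree on every cell."""
    return all(abs(p[k] - q[k]) <= tol for k in p)

# Hypothetical toy data: dataset 1 records are (X1, X2, X3),
# dataset 3 records are (X1, X2, X4).
ds1 = [(0, 0, 1), (0, 1, 0), (1, 1, 1), (1, 1, 0)]
ds3 = [(0, 0, 0), (0, 0, 1), (1, 0, 1), (1, 1, 1), (1, 1, 0)]

p1_x1x2 = marginal(ds1, (0, 1))  # P1(X1, X2)
p3_x1x2 = marginal(ds3, (0, 1))  # P3(X1, X2)

# The two datasets disagree on the shared marginal, so no joint P can
# reproduce both P1(X1, X2, X3) and P3(X1, X2, X4) exactly.
print(consistent(p1_x1x2, p3_x1x2))
```

With these made-up records, P1 assigns 0.25 to (X1 = 0, X2 = 0) while P3 assigns 0.4, so the check fails.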
One approach to solving the problem of inconsistent marginals is to build a weighted average of the marginal joints P1(X1, X2) and P3(X1, X2). The joint distribution P(X1, X2, X3, X4) is then estimated from those new marginals. I call this approach the weighted marginal approach. The disadvantage of this approach is obvious: by moving to marginal joints P(X1, X2), we discard information about dependencies between variables. A second approach, the convex hull approach, considers the convex hull H of the (most likely mutually inconsistent) probability functions P1*, P2*, P3* on (X1, X2, X3, X4) that satisfy P1(X1, X2, X3) = P1*(X1, X2, X3), P2(X2, X3, X4) = P2*(X2, X3, X4) and P3(X1, X2, X4) = P3*(X1, X2, X4). The integrated joint probability function may then be identified, for instance, with the probability function that has maximum entropy within H. A third approach, the missing data approach, considers the whole field of data integration as a special case of missing data. Using, for instance, regression, we are able to estimate the value of X4 given the values of X2, X3 from Dataset1 and that of X4 given those of X1, X2 from Dataset3. By imputing these values, we obtain complete datasets for the joint P(X1, X2, X3, X4). In this talk, we assess the performance of the three approaches by simulating different scenarios with the help of computational
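As a rough illustration of the weighted marginal approach, the following Python sketch pools two disagreeing estimates of the marginal over (X1, X2). The abstract does not say how the weights are chosen; weighting by sample size is an assumption made here, and the numbers and the function name `weighted_marginal` are hypothetical.

```python
def weighted_marginal(p, q, n_p, n_q):
    """Pool two estimates of the same marginal, weighted by sample size
    (the choice of weights is an assumption, not taken from the talk)."""
    w = n_p / (n_p + n_q)
    return {k: w * p[k] + (1 - w) * q[k] for k in p}

# Hypothetical estimates of P(X1, X2) from two datasets of sizes 4 and 5.
p1 = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.5}
p3 = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.2, (1, 1): 0.4}

pooled = weighted_marginal(p1, p3, n_p=4, n_q=5)
print(pooled)
```

The pooled values again sum to one, so they form a single consistent marginal, but they no longer carry any information about how X1 and X2 depend on X3 or X4.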

Jon Williamson: "Establishing Causal Claims in Medicine"

Arguably, in order to establish a causal claim one normally needs to establish both that the putative cause and putative effect are appropriately correlated and that there is some underlying mechanism that can account for this correlation. This paper argues that this thesis explains two key aspects of causal epistemology. First, it explains why statistical trials such as RCTs are not always required to establish causal claims. Second, the epistemological thesis explains how animal models can help establish causal claims in humans. More generally, it provides a framework for understanding how one can extrapolate causal claims from one population to another.