# Some Confirmation Measures

Here’s a problematic line of thought about the main competitors in confirmation theory and Bayesianism in particular, arising from Branden Fitelson’s nice paper here. Consider the following theories of confirmation. First, the law of likelihood preferred by likelihoodists such as Sober:

(LL) Evidence E favors hypothesis H1 over hypothesis H2 if and only if H1 confers greater probability on E than H2 does.

Second, the weak law of likelihood:
(WLL) Evidence E favors hypothesis H1 over hypothesis H2 if Pr(E |H1) > Pr(E |H2) and Pr(E |~H1) ≤ Pr(E |~H2).

Third, an old-fashioned view:
(‡) E favors H1 over H2 if and only if Pr(H1 |E) > Pr(H2 |E).

We can eliminate the third one quickly.

But, (‡) is an inadequate Bayesian theory of favoring, because the underlying notion of confirmation on which it is based ignores probabilistic relevance. Consider any case in which E raises the probability of H1, but E lowers the probability of H2. Intuitively, this is a case in which E indicates that H1 is true, but E indicates that H2 is false. In such a case, it seems obvious that E provides better evidence for the truth of H1 than for the truth of H2. And, as a result, it seems clear that, in such cases, we should say that E favors H1 over H2. Obviously, any contemporary, relevance-based Bayesian theory of favoring will have this consequence. . . . Unfortunately, (‡) does not have this consequence. In fact, according to (‡), E can favor H2 over H1 in such cases, which is absurd. For this reason, nobody defends (‡) anymore.

Here’s a concrete example Branden uses. Where we sample randomly from the natural numbers between 1 and 10 inclusive, let E be the disjunction of 1,2,8,9,10, H1 be the disjunction of 1,2 and H2 be the disjunction of 2,3,4,5,6,7,8,9. Here Pr(H1/E) = 2/5, compared to Pr(H1)=1/5; Pr(H2/E)=3/5, compared to Pr(H2)=4/5.
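
For readers who want to check the arithmetic, here is a minimal Python sketch that enumerates the ten outcomes directly. The helper `pr` and the variable names are illustrative assumptions, not anything from Branden's paper; the only modelling assumption is a uniform draw from 1 to 10.

```python
from fractions import Fraction

# Uniform draw from the natural numbers 1..10 (names below are illustrative).
OUTCOMES = frozenset(range(1, 11))
E = frozenset({1, 2, 8, 9, 10})
H1 = frozenset({1, 2})
H2 = frozenset(range(2, 10))  # 2 through 9


def pr(event, given=OUTCOMES):
    """Pr(event | given) under a uniform distribution on OUTCOMES."""
    return Fraction(len(event & given), len(given))


prior_h1, post_h1 = pr(H1), pr(H1, E)
prior_h2, post_h2 = pr(H2), pr(H2, E)

print(prior_h1, post_h1)  # prior and posterior of H1
print(prior_h2, post_h2)  # prior and posterior of H2
print(post_h2 > post_h1)  # does the posterior-comparison rule favor H2?
```

E raises H1 from 1/5 to 2/5 and lowers H2 from 4/5 to 3/5, yet the posterior of H2 is still the larger of the two, which is exactly the behavior the quoted passage calls absurd.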

So the issues now surround (LL) and (WLL).

For contemporary Bayesians, confirmation is a matter of probabilistic relevance. Thus, degree of confirmation is measured using some relevance measure c(H,E) of the “degree to which E raises the probability of H”. Here are the three most popular Bayesian relevance measures of non-relational confirmation:
Difference: d(H,E) =df Pr(H |E) − Pr(H)
Ratio: r(H,E) =df Pr(H |E)/Pr(H)
Likelihood-Ratio: l(H,E) =df Pr(E |H)/Pr(E |~H)

The general form of the Bayesian account of favoring, using any of these measures, is:
Evidence E favors hypothesis H1 over hypothesis H2, according to measure c, if and only if c(H1,E) > c(H2,E).
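
As a rough illustration of how the three measures and the general favoring schema fit together, the sketch below codes them over a finite, uniform sample space and applies them to the 1-to-10 example from earlier in the post. The function names `pr`, `d`, `r`, `l` and `favors` are my own labels, and the uniform finite space is an assumption made only to keep the example small.

```python
from fractions import Fraction

OUTCOMES = frozenset(range(1, 11))  # uniform draw from 1..10


def pr(event, given=OUTCOMES):
    """Pr(event | given) under a uniform distribution on OUTCOMES."""
    return Fraction(len(event & given), len(given))


def d(h, e):
    """Difference measure: Pr(H|E) - Pr(H)."""
    return pr(h, e) - pr(h)


def r(h, e):
    """Ratio measure: Pr(H|E) / Pr(H)."""
    return pr(h, e) / pr(h)


def l(h, e):
    """Likelihood-ratio measure; raises ZeroDivisionError if Pr(E|~H) = 0."""
    return pr(e, h) / pr(e, OUTCOMES - h)


def favors(c, h1, h2, e):
    """E favors h1 over h2 according to measure c iff c(h1, e) > c(h2, e)."""
    return c(h1, e) > c(h2, e)


E = frozenset({1, 2, 8, 9, 10})
H1 = frozenset({1, 2})
H2 = frozenset(range(2, 10))

for c in (d, r, l):
    print(c.__name__, c(H1, E), c(H2, E), favors(c, H1, H2, E))
```

On this example all three relevance measures agree that E favors H1 over H2, in contrast to the posterior-comparison rule (‡).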

One other point to note before getting to my point(!). Jim Joyce has shown here that all these Bayesian confirmation measures commit their defenders to (WLL), thereby arguing that (WLL) is something like a core Bayesian commitment.

Given all this, consider two of Branden’s examples.

Let E be an ace, H1 is heart ace, and H2 is club or spade ace. It is clear in this example that E favors H2 over H1, but the likelihoods do not sustain this answer, since the probability of E on either H1 or H2 is 1. So (LL) is false. But note as well that Pr(E |~H2) = 4/50 > Pr(E |~H1) = 4/51, so (WLL) does not succumb to the example.

Consider a second example. E = the card is a spade, H1 = the card is the ace of spades, and H2 = the card is black. In this example, Pr(E |H1) = 1 > Pr(E |H2) = 1/2 , but it seems absurd to claim that E favors H1 over H2, since the evidence entails H2 but not H1. Note here that not only (LL) succumbs, but (WLL) as well.

Fitelson shows that the ratio version of Bayesianism is equivalent to (LL), so it is undermined by the first example. That leaves the difference and likelihood-ratio versions. The value of d in the first example is 1/4 – 1/52 for H1 and 1/2 – 1/26 for H2, leaving us to conclude that E confirms H2 more than H1, just as we should see. In the second example, the value of d for H1 is 1/13 – 1/52, and for H2 is 1 – 1/2, showing that E favors H2 over H1.
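
The d values can be checked by brute force over a standard deck. This is only a sketch: the deck encoding and the names `cards`, `black_ace` and so on are assumptions introduced for illustration, and a uniform draw from 52 cards is assumed throughout.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = frozenset(product(RANKS, SUITS))  # standard 52-card deck, uniform draw


def pr(event, given=DECK):
    """Pr(event | given) under a uniform draw from DECK."""
    return Fraction(len(event & given), len(given))


def d(h, e):
    """Difference measure: Pr(H|E) - Pr(H)."""
    return pr(h, e) - pr(h)


def cards(pred):
    """The set of cards (rank, suit) satisfying pred."""
    return frozenset(c for c in DECK if pred(*c))


# First example: E = ace, H1 = heart ace, H2 = club ace or spade ace.
ace = cards(lambda rank, suit: rank == "A")
heart_ace = cards(lambda rank, suit: (rank, suit) == ("A", "hearts"))
black_ace = cards(lambda rank, suit: rank == "A" and suit in ("clubs", "spades"))

# Second example: E = spade, H1 = ace of spades, H2 = black.
spade = cards(lambda rank, suit: suit == "spades")
ace_of_spades = cards(lambda rank, suit: (rank, suit) == ("A", "spades"))
black = cards(lambda rank, suit: suit in ("clubs", "spades"))

print(d(heart_ace, ace), d(black_ace, ace))      # first example
print(d(ace_of_spades, spade), d(black, spade))  # second example
```

In lowest terms the four values are 3/13 and 6/13 for the first example, and 3/52 and 1/2 for the second, so d ranks H2 above H1 in both.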

For the value for l, both examples cause problems, since the denominator value is zero in both cases, leaving the value for l undefined in both examples, and thus leaving it not true that the evidence favors either hypothesis over the other. Perhaps this problem could be solved by abandoning a Kolmogorov treatment of probability in favor of a Rényi-Popper theory. Even then, however, the second example will create trouble.

As a result, the two examples show more than that (LL) is false, undermining likelihoodism. They also show that (WLL) is false, and thus given Joyce’s result that (WLL) is the core commitment of Bayesianism, Bayesianism is in trouble, in a way independent of the usual complaints about Bayesian reliance on priors. How bad a conclusion is this?

#### Some Confirmation Measures — 14 Comments

1. Consider the first example of Branden’s addressed by Jon. Here it is again:
(I’ve fixed it slightly to make it clearer and more accurate.)

Let E be an ace, H1 is heart ace, and H2 is a club ace or a spade ace. It is clear in this example that E favors H2 over H1, but the likelihoods do not sustain this answer, since the probability of E on either H1 or H2 is 1. So (LL) is false. But note as well that Pr(E |~H2) = 2/50 < Pr(E |~H1) = 3/51, so (WLL) does not succumb to the example. (Note: not “4/50” and not “4/51” and I’ve changed the order of the inequality, since 2/50 < 3/51.)
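
The corrected fractions are easy to confirm by counting. The sketch below assumes a uniform draw from a standard deck; `wll_favors` is a hypothetical helper that tests only the sufficient condition stated in (WLL), so a result of False means (WLL) is silent, not that it denies favoring.

```python
from fractions import Fraction
from itertools import product

# 13 ranks x 4 suits; each card is a (rank, suit) pair of single characters.
DECK = frozenset(product("A23456789TJQK", "HDCS"))


def pr(event, given=DECK):
    """Pr(event | given) under a uniform draw from DECK."""
    return Fraction(len(event & given), len(given))


E = frozenset(c for c in DECK if c[0] == "A")  # the card is an ace
H1 = frozenset({("A", "H")})                   # heart ace
H2 = frozenset({("A", "C"), ("A", "S")})       # club ace or spade ace


def wll_favors(h1, h2, e):
    """True when the antecedent of (WLL) holds for h1 over h2."""
    return pr(e, h1) > pr(e, h2) and pr(e, DECK - h1) <= pr(e, DECK - h2)


print(pr(E, H1), pr(E, H2))                # the two likelihoods
print(pr(E, DECK - H2), pr(E, DECK - H1))  # Pr(E|~H2) and Pr(E|~H1)
print(wll_favors(H1, H2, E), wll_favors(H2, H1, E))
```

Since the two likelihoods are equal, the first conjunct of (WLL) fails in both directions, so (WLL) issues no verdict here.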

The defender of (LL) may respond to this by arguing that it is not at all clear that E “favors” H2 over H1 on “purely evidential” grounds. Rather, E may intuitively seem to favor H2 over H1 because you are sneaking in information about the prior probabilities of H2 and H1. To see the point, suppose E comes from a deck consisting of 20 aces of hearts, 1 ace of clubs, 1 ace of spades, and an assortment of 20 other kinds of cards.
Then Pr(E |H2) = Pr(E |H1) = 1 still. But the intuition that E favors H2 more than H1 is broken. So, the defender of (LL) argues, even for a straight deck, E intuitively “favors” H2 over H1 only insofar as we understand “favoring” to involve “higher posterior probability” (which takes account of both likelihoods and prior probabilities). The obviously higher posterior probability in Branden’s example has corrupted your intuitions about how the evidence “on its own” favors one hypothesis over another. When the situation is changed so that the prior probabilities of H2 and H1 are different (e.g. if we don’t know the makeup of the deck, or we know the make-up is different than normal), there is no longer any good intuitive reason to think that E automatically favors H2 over H1. So (LL) has not been counter-exampled after all.
In any case, this is what a likelihoodist like Royall would say. I guess this is more a disagreement with Branden’s claim than a disagreement with Jon.
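
To make the stacked-deck point concrete, here is a small sketch of that 42-card deck. One loud assumption: the 20 "other" cards are taken to be non-aces, which the comment does not say explicitly. The names `STACKED` and `pr` are illustrative only.

```python
from fractions import Fraction

# The stacked deck from the comment above.
# ASSUMPTION: none of the 20 "other" cards is an ace.
STACKED = ["ace of hearts"] * 20 + ["ace of clubs", "ace of spades"] + ["other"] * 20


def pr(event, given=lambda card: True, deck=STACKED):
    """Pr(event | given) for a uniform draw; event and given are predicates on cards."""
    pool = [c for c in deck if given(c)]
    return Fraction(sum(1 for c in pool if event(c)), len(pool))


def E(c):
    return c.startswith("ace")  # the card is an ace


def H1(c):
    return c == "ace of hearts"


def H2(c):
    return c in ("ace of clubs", "ace of spades")


print(pr(E, H1), pr(E, H2))  # likelihoods
print(pr(H1), pr(H2))        # priors
print(pr(H1, E), pr(H2, E))  # posteriors
```

The likelihoods are both 1, just as in the standard deck, but now the posterior of H1 (10/11) dwarfs that of H2 (1/11). Under these assumptions the ordering of the posteriors simply tracks the ordering of the priors.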

2. Jon also discusses a second example of Branden’s. Here is the example:

Consider a second example. E = the card is a spade, H1 = the card is the ace of spades, and H2 = the card is black. In this example, Pr(E |H1) = 1 > Pr(E |H2) = 1/2 , but it seems absurd to claim that E favors H1 over H2, since the evidence entails H2 but not H1. Note here that not only (LL) succumbs, but (WLL) as well.

One more point in defense of (LL) before going to the main issue about (WLL).
The likelihoodist defenders of (LL) need not be very impressed by this example. They are mainly after a criterion for measuring how much E favors a hypothesis over “competing” alternative hypotheses: pairs of hypotheses that cannot both be true. But H1 and H2 in this example are compatible. So this example may not bother them much.

Does (WLL) really succumb here? I don’t see a problem.
Jon says:

For the value for l, both examples cause problems, since the denominator value is zero in both cases, leaving the value for l undefined in both examples, and thus leaving it not true that the evidence favors either hypothesis over the other. Perhaps this problem could be solved by abandoning a Kolmogorov treatment of probability in favor of a Rényi-Popper theory. Even then, however, the second example will create trouble.

How does example 1 cause a problem for measure l? Assuming a normal deck, we have l(H1|E) = 51/3 < 50/2 = l(H2|E).

In the case of example 2 we do have l(H2|E) “undefined” because of a 0 in the denominator. But there is an easy and obvious fix for this. (Notice that l(H|E) is only undefined when P(E |~H) = 0, and this happens just when P(H | E) = 1, provided that P(E)>0 and 1>P(H)>0). Notice that the measure l measures incremental support of H by E on a scale that begins at 0 and increases without bound. So the obvious move is to define l(H|E) = “omega”, which is assumed to be larger than any finite value: it is “maximal evidential support” (and will always correspond to P(H|E) = 1).

Now, with this fix, for example 2 we have
l(H1|E) = P(E|H1)/P(E|~H1) = 1/(12/51) = 51/12 <
“omega” = (1/2)/0 = P(E|H2)/P(E|~H2) = l(H2|E).
So E favors H2 over H1, as desired.
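
Here is one way the "omega" fix might be sketched in code, using floating-point infinity as a stand-in for omega. That choice, the deck encoding and all names are assumptions for illustration. The sketch also assumes 0 < P(H) < 1 and P(E) > 0, as in the comment. For the second example it counts 12 spades among the 51 cards other than the ace of spades.

```python
import math
from fractions import Fraction
from itertools import product

DECK = frozenset(product("A23456789TJQK", "HDCS"))  # (rank, suit) pairs
OMEGA = math.inf  # stand-in for "omega": larger than any finite value


def pr(event, given=DECK):
    """Pr(event | given) under a uniform draw from DECK."""
    return Fraction(len(event & given), len(given))


def l(h, e):
    """Likelihood ratio Pr(E|H)/Pr(E|~H), set to OMEGA when Pr(E|~H) = 0."""
    denom = pr(e, DECK - h)
    return OMEGA if denom == 0 else pr(e, h) / denom


# Example 1: E = ace, H1 = heart ace, H2 = club ace or spade ace.
ACE = frozenset(c for c in DECK if c[0] == "A")
HEART_ACE = frozenset({("A", "H")})
BLACK_ACES = frozenset({("A", "C"), ("A", "S")})

# Example 2: E = spade, H1 = ace of spades, H2 = black.
SPADE = frozenset(c for c in DECK if c[1] == "S")
ACE_OF_SPADES = frozenset({("A", "S")})
BLACK = frozenset(c for c in DECK if c[1] in "CS")

print(l(HEART_ACE, ACE), l(BLACK_ACES, ACE))    # example 1
print(l(ACE_OF_SPADES, SPADE), l(BLACK, SPADE))  # example 2
```

With the amended definition, l ranks H2 above H1 in both examples: 17 against 25 in the first, and 17/4 (that is, 51/12) against omega in the second.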

Does this take care of Jon’s worries, or am I missing something?

2. Jim, very nice comments here! And they teach me that I ought to proofread…

So, first on the proofreading. You’re exactly right about the fractions in the first case, and in the quoted passage where I say “examples” what I meant was “conjuncts”. Hard to see how I made that mistake, but there it is! I meant the two conjuncts of the right side of (WLL).

On to substance, though. Your reply on behalf of likelihoodism is very interesting, and it looks like a pretty good response to the example as stated. But why can’t we just change the example, then, so that E includes the information that the deck is an ordinary one from which an ace is drawn? After all, the fractions you get that cause the problem for (LL) assume a standard deck, and changing E in this way will raise the problem all over again, won’t it?

On the second example, I was hoping someone would suggest something like this! (And I was hoping to find out if some confirmation theorists were lurking here!) But I still have two worries. First, when we measure l using examples where H entails ~E, won’t we want that measure to differ from the measure when we have ~H entailing ~E? It worries me that we’ve now got a measure that identifies 1/0 with 0/1.

But maybe this misreads the proposal. Maybe the proposal is to reject the definition of l given and replace it with another definition, one which accepts the definition of l as given, except when it involves dividing by 0, in which case, it assigns a special value to l. Even if that’s OK, why isn’t the proposal ad hoc?

3. Jon,

You are right that the proposal about the measure l(H|E) is really a proposal to amend the usual definition. The idea is to replace:
l(H|E) = P(E|H)/P(E|~H) with
l(H|E) = P(E|H)/P(E|~H) if P(E|~H) > 0,
= ω if P(E|~H) = 0, where ω is understood to be greater than any real number.
This is, I think, what those who champion l usually have in mind anyway, though they usually don’t spell it out. It isn’t ad hoc because as values of P(E|~H) approach 0 (for fixed values of P(E|H)), the ratios P(E|H)/P(E|~H) blow up towards infinity, and this is just a natural way of accommodating that.

The measure l is not a symmetric measure of the influence of E on H. It ranges from 0 (where E falsifies H) to ω (where E falsifies ~H) with its midpoint at 1 (where E provides the same incremental support for H as it does for ~H). So, to get a symmetric measure, it is common to employ the log of l rather than l itself. This measure has the value 0 when E provides the same support for H as for ~H. It has the value -ω when E falsifies H, and the value +ω when E falsifies ~H. This is just a re-scaling of l to a more symmetric scale.
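
A short sketch of the log rescaling, again with floating-point infinity standing in for ω (an assumption of this sketch, as are the function names). It takes the two likelihoods as plain numbers and assumes they are not both zero, which holds whenever P(E) > 0.

```python
import math

OMEGA = math.inf  # stand-in for the value greater than any real number


def l(p_e_given_h, p_e_given_not_h):
    """Amended likelihood ratio: OMEGA when P(E|~H) = 0."""
    if p_e_given_not_h == 0:
        return OMEGA
    return p_e_given_h / p_e_given_not_h


def log_l(p_e_given_h, p_e_given_not_h):
    """Log of l, extended so that l = 0 maps to -OMEGA and l = OMEGA to +OMEGA."""
    ratio = l(p_e_given_h, p_e_given_not_h)
    if ratio == 0:
        return -OMEGA
    return math.log(ratio)


print(log_l(0.5, 0.5))  # E supports H and ~H equally
print(log_l(0.0, 0.3))  # E falsifies H
print(log_l(0.3, 0.0))  # E falsifies ~H
print(log_l(0.8, 0.2), log_l(0.2, 0.8))  # swapping H and ~H flips the sign
```

The last line illustrates the sense in which the log scale is symmetric: since l(~H|E) is the reciprocal of l(H|E), the log of one is the negative of the log of the other.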

4. Jon,

I take your suggestion about fixing the first example (so that it works against (LL)) to be this:

Let E be ‘the card drawn is an ace’, H1 be ‘the card drawn is the heart ace’, and H2 be ‘the card drawn is the club ace or the spade ace’, and let B say that the card is drawn at random from a standard deck.

Then P(E|H2 & B) = P(E|H1 & B) = 1. So according to (LL) E is exactly as much incremental confirmation for H2 as for H1, given B. This seems counter-intuitive. It seems intuitively that E should confirm H2 more than it confirms H1, given B.

The likelihoodist’s reply is that one’s intuitions get messed up here because it is easy for the intuitions to confuse “incremental confirmation” with the “total support” for H1 and H2, which is based on likelihoods together with prior probabilities. That is

P(E|H2&B) / P(E|H1&B) = 1 but,

P(H2|E&B) / P(H1|E&B) = [P(E|H2&B) / P(E|H1&B)] x [P(H2|B) / P(H1|B)]
= [P(H2|B) / P(H1|B)] = (2/52) / (1/52) = 2/1 = 2.

So it is not E that makes a difference, that is, not E that makes the total degree of confirmation of H2 twice as large as the total degree of confirmation of H1. Rather, it is the prior probabilities that make all of the difference. E itself provides no additional support for H2 over H1. For, before E was even taken into account we already had that

P(H2|B) / P(H1|B) = (2/52) / (1/52) = 2/1 = 2

i.e., we already had that H2 is twice as probable as H1.
So E makes no difference, just as (LL) claims!
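
The decomposition above can be checked numerically. In this sketch B is modelled simply as a uniform draw from a standard deck, and the variable names are mine.

```python
from fractions import Fraction
from itertools import product

# B: the card is drawn at random from a standard deck of (rank, suit) pairs.
DECK = frozenset(product("A23456789TJQK", "HDCS"))


def pr(event, given=DECK):
    """Pr(event | given & B) under a uniform draw from DECK."""
    return Fraction(len(event & given), len(given))


E = frozenset(c for c in DECK if c[0] == "A")  # the card drawn is an ace
H1 = frozenset({("A", "H")})                   # the heart ace
H2 = frozenset({("A", "C"), ("A", "S")})       # the club ace or the spade ace

likelihood_ratio = pr(E, H2) / pr(E, H1)
prior_ratio = pr(H2) / pr(H1)
posterior_ratio = pr(H2, E) / pr(H1, E)

print(likelihood_ratio, prior_ratio, posterior_ratio)
```

The posterior ratio equals the likelihood ratio times the prior ratio, and because the likelihood ratio is 1, the whole factor of 2 comes from the priors.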

What do you think?

5. Jon and Jim,

It’s great to hear such serious dialogue about this paper! Thanks for the terrific comments! I am working on a new draft of this paper now, so this is really useful for me (the paper will eventually appear in final form sometime early next year). Here are a few remarks for the thread.

The first issue Jon raises about what happens to l when E entails H is a good one, and Jim’s suggestion to extend the scale of l to the extended real line is a nice way to fix the problem. I was sloppy about this in the draft. What I prefer to do here is to use instead of l the following measure which is ordinally equivalent to l (and so is equivalent full stop as far as I’m concerned):

l*(H,E) = (Pr(E | H) − Pr(E | ~H)) / (Pr(E | H) + Pr(E | ~H))

This measure is on a [-1,1] scale, but is a strictly increasing function of l (it’s tanh[log(l)/2] to be exact!). No extended real line is needed. This alternative to l was advocated by Kemeny and Oppenheim (in the context of providing a “Carnapian” inductive logic that is sensitive to relevance). I use l* elsewhere, but I decided just to use l here because it is simpler than l*, which makes for easier comparisons in this context. But, I will certainly clarify this in the final version.
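
The relationship between l* and l can be spot-checked numerically. Dividing the numerator and denominator of l* by Pr(E | ~H) gives (l − 1)/(l + 1), which is the hyperbolic tangent of half the logarithm of l. The sketch below takes the two likelihoods as plain floats; the function names and sample values are illustrative assumptions.

```python
import math


def l(p, q):
    """Likelihood ratio; p = Pr(E|H), q = Pr(E|~H), requires q > 0."""
    return p / q


def l_star(p, q):
    """Kemeny-Oppenheim style measure (p - q) / (p + q); defined even when q = 0."""
    return (p - q) / (p + q)


# Spot-check the identity l* = tanh(log(l) / 2) on a few likelihood pairs.
for p, q in [(1.0, 3 / 51), (1.0, 2 / 50), (0.5, 0.25), (0.2, 0.8)]:
    print(l_star(p, q), math.tanh(math.log(l(p, q)) / 2))

# When Pr(E|~H) = 0, l is undefined but l* takes its maximum value.
print(l_star(0.5, 0.0))
```

The last line corresponds to the second card example (E = spade, H2 = black), where l* reaches the top of its scale without any appeal to an extended real line.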

The second point you guys discuss about the Leeds example is very important. Jim’s reaction is exactly what I should have said the likelihoodist would say. And, by the way, I am sympathetic to Jim’s way of putting it (I may have to crib that for the final version, Jim!). The fact that the priors do all the work there is why I think it’s not as compelling an example as the second example I discuss where (both intuitively and theoretically) it’s the entailment that’s doing the work and not the priors.

I should add in closing that the final version of the paper will have a much more forceful version of the dilemma I pose for Likelihoodism. I think I can now argue convincingly that EITHER Likelihoodism just IS a version of Bayesian confirmation theory (and not the RIGHT one!), OR Likelihoodism and Bayesianism are *completely* incommensurable, i.e., no example could ever indicate that one of them is true and the other false (in EITHER direction!). The radical incommensurability in the second horn is what makes this a much stronger and more compelling dilemma for Likelihoodism. I’ll let you read the final version in December to get the full story on this. But, here’s a taste.

Likelihoodists often talk about “what the hypothesis says about E, relative to corpus K”. Bayesians who accept (LL), the Bayesian Likelihoodists, unpack this quantity as Pr(E | H & K), which is a conditional probability and so IS invertible via Bayes’s Theorem (and, priors are crucial here). Non-Bayesian Likelihoodists, on the other hand, unpack this as “the probability for E that is ENTAILED by the conjunction H&K”. This is NOT a conditional probability. And, this is why non-Bayesian likelihoodists have no truck with priors, and do not NEED them in fact. They’re NOT talking about something that is governed by Bayes’s Theorem at all. One way to think about this is that Likelihoodists are really talking about a probability model “H&K” and what *un*conditional probability E has IN THAT MODEL. So, the “inverse” of this “probability of E” does not exist: there is just no inverse to take [here, see Royall’s official definition of (LL); it’s NOT in terms of conditional probability!]. But, in this case, there is no example that could even serve to contrast Likelihoodism (in this strongly non-Bayesian sense) and Bayesianism (Likelihoodist or otherwise). The Likelihoodist model and the Bayesian model are incommensurable. What I argue in the current draft is only that IF Bayesianism and Likelihoodism can be contrasted, then it’s not obvious who wins, since examples that adjudicate are rather subtle and the salient considerations are tricky. But, I now think this only applies to the in-house debate between Bayesian likelihoodists (like Milne and Howson and Urbach, for instance), and Bayesian non-likelihoodists (like me and most Bayesians). I now am convinced that the extramural debate between Bayesians and non-Bayesian Likelihoodists is simply not amenable to rational adjudication (at least, not without a lot of new conceptual work to create a common space in which they can render conflicting judgments). Anyway, I’d better stop for now.
Thanks again for the stimulating discussion!

6. Jim, very nice explanation, in both cases. And I like the real omega symbol! My only remaining worry about your explanation of the first case is the need for a language in which the claim “an ace is drawn from a fair deck” can be decomposed into a conjunction of the sort you give.

Matt McGrath sent me an email about (LL) and (WLL), about how to handle skeptical hypotheses. For simplicity, let E be evidence, and consider two hypotheses: H and E&~H. Matt says, “If H doesn’t by itself entail E, there is a danger of E&~H being better supported by E than H. I’m thinking, of course, of skeptical sorts of examples. Let H be some real world hypothesis which doesn’t entail E, where E states our perceptual evidence. Surely we want to say that E is better evidence for H than for the skeptical hypothesis E&~H.” Thought I’d post his question to see if anyone wants to reply to it…

What a disturbing conclusion, though, Branden! I mean the one about the extramural debate not being amenable to rational adjudication. I can think of some ways in which this is true that wouldn’t trouble me–for example, if it means that there won’t be any particular cases that compel rational assent that one of the two perspectives is mistaken, which seems to be at least close to what you have in mind. Maybe the incommensurability won’t remain when you try to use the two theories in the more general account of what is confirmed by your total body of evidence? Or maybe the adjudication just has to proceed at a higher level, in terms of the theoretical virtues of the two approaches.

7. Jon,

On the McGrath case, I must be missing something. None of the relevance theories are consistent with E confirming E & ~H less strongly than H does. This is because E & ~H entails E, which implies that E and E & ~H are positively correlated under ANY Pr. And, E & ~H entails ~H, which means that H and E & ~H are negatively correlated. So, ALL relevance theories will say that E confirms E & ~H more strongly than H does. Is this intended as a criticism of BOTH (LL) AND (WLL)? The only theory that says otherwise (that I consider) is the one that compares posteriors. But, that’s why it’s a nonstarter. Perhaps I’m missing something here.

And, yes, on the Bayes/non-Bayes adjudication question, I think that a new theoretical framework will be needed to make useful comparisons between the two. As it stands, no single probability model can be used to adjudicate. That’s what I meant by my “no contrastive examples” claim (unless it’s Bayesian versions of both accounts we’re comparing, in which case there will be single probability models in which the two theories give conflicting judgments — like the Monty Hall Problem that I discuss in my paper). I don’t want to claim this can’t be done, but I think people who’ve been talking about the foundations of statistical inference in this context have just been talking past each other for years.

8. Jon,

To get the decomposition, let B say ‘the card is drawn at random from a standard deck’, let E say ‘the card is an ace’, let H1 say ‘the card is the ace of hearts’, and let H2 say ‘the card is the ace of clubs or the card is the ace of spades’. This takes care of what’s needed for the example I think.

Branden, I’m glad you found my “likelihoodist response” to be on target. Feel free to crib it! Also, l* looks like an interesting measure.

Jon, off the top of my head, Matt’s question looks closely related to the problem of irrelevant conjunction (which Branden and I have done some work on). I’ll have to think more about it.

9. OK, I see the McGrath claim now (I misread it before). It doesn’t say there are cases in which H confirms E & ~H better than E does. It says there are cases in which E confirms E & ~H better than E confirms H. This does sound like an irrelevant conjunction problem (Jim’s reading more carefully than me, as usual, here!). And, yes, with all proposals I discuss in the paper, this IS possible.

10. Very nice, guys, now I get to read some more–this time about irrelevant conjunctions! That’s neat; gives me something to do while I recover from my knee surgery…

Oh, Jim, I didn’t mean that I thought there was a problem here. I meant something like the Goodman worry that what’s atomic and what’s complex can be inverted, as in grue/bleen cases. The solution above works because our language treats the evidence I cited as a conjunction, so what I was wondering is if one could come up with a case for a language where the same kind of case would occur with respect to a claim that was atomic for that language. Not that I have such a case; I was just wondering out loud…

11. To give credit where credit is due: I first learned about the problem I mentioned to Jon in a paper by Michael Huemer. I just tracked down the cite for the paper: “The Problem of Defeasible Justification”, Erkenntnis 54 (2001): 375-97. It’s a very nice paper.

12. Jon, Branden, and Matt,

I think I gave you a bum steer. Although there are elements of the irrelevant conjunction problem in the problem posed by Matt (from Michael Huemer), that’s not the best way to look at it. It turns out to be a really interesting problem, and it has a very cool Bayesian solution. Formally it’s a bit complicated to spell out, but only a bit. Once you see it, intuitively it makes really good sense, I think.

Now that I’ve whetted your appetites, I’m going to leave you hanging.
I’ll write it up in the next few days, and then post it as a new thread. I hope that’s OK, Jon.

13. Jim, that’s great; I love to see posts by others, and it’s very good for the continuing development of the blog. It’s an important issue, too, since the problem is strongly connected with explaining how our ordinary evidence justifies us in believing that we are not in a demon world, or a brain in a vat.