John just sent me a really neat paper entitled “Probable Probabilities.” The abstract should entice you to read the full manuscript:

In concrete applications of probability, statistical investigation gives us knowledge of some probabilities, but we generally want to know many others that are not directly revealed by our data. For instance, we may know prob(P|Q) (the probability of P given Q) and prob(P|R), but what we really want is prob(P|Q&R), and we may not have the data required to assess that directly. The probability calculus is of no help here. Given prob(P|Q) and prob(P|R), it is consistent with the probability calculus for prob(P|Q&R) to have any value between 0 and 1. Is there any way to make a reasonable estimate of the value of prob(P|Q&R)?

A related problem occurs when probability practitioners adopt undefended assumptions of statistical independence simply on the basis of not seeing any connection between two propositions. This is common practice, but its justification has eluded probability theorists, and researchers are typically apologetic about making such assumptions. Is there any way to defend the practice?

This paper shows that on a certain conception of probability — nomic probability — there are principles of “probable probabilities” that license inferences of the above sort. These are principles telling us that although certain inferences from probabilities to probabilities are not deductively valid, nevertheless the second-order probability of their yielding correct results is 1. This makes it defeasibly reasonable to make the inferences. Thus I argue that it is defeasibly reasonable to assume statistical independence when we have no information to the contrary. And I show that there is a function Y(r,s,a) such that if

prob(P|Q) = r, prob(P|R) = s, and prob(P|U) = a (where U is our background knowledge) then it is defeasibly reasonable to expect that prob(P|Q&R) = Y(r,s,a). Numerous other defeasible inferences are licensed by similar principles of probable probabilities. This has the potential to greatly enhance the usefulness of probabilities in practical application.

The full manuscript can be accessed at http://oscarhome.soc-sci.arizona.edu/ftp/PAPERS/probable%20probabilities-simple.pdf . As always, discussion encouraged.

This looks like a really interesting paper. I’m just getting into it, but there’s one small correction that John might want to make on p. 2. He writes:

To illustrate, suppose we know that PROB(P) = .7 and PROB(Q) = .6. What can we conclude about PROB(P & Q)? All the probability calculus enables us to infer is that 0 ≤ PROB(P & Q) ≤ .6.

We know a bit more than that. From the information given we can deduce that .3 ≤ PROB(P & Q) ≤ .6, which admittedly isn’t as much as we might like to have but is still more than John claims. The lower bound is firm, because from {P, Q} we can derive (P & Q), and both premises are essential. The theorem on the accumulation of uncertainties therefore applies: the uncertainty (1 minus the probability) of (P & Q) cannot be any greater than the sum of the uncertainties of the premises.
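The two bounds in play here are easy to state as a function; here is a minimal sketch (the name `conjunction_bounds` is mine):

```python
def conjunction_bounds(p, q):
    """Bounds on prob(P & Q) given prob(P) = p and prob(Q) = q.

    Lower bound, by accumulation of uncertainties: the uncertainty of
    (P & Q) is at most (1 - p) + (1 - q), so prob(P & Q) >= p + q - 1,
    and of course never below 0.
    Upper bound: (P & Q) entails each conjunct, so prob(P & Q) can
    exceed neither p nor q.
    """
    return max(0.0, p + q - 1.0), min(p, q)

lo, hi = conjunction_bounds(0.7, 0.6)
print(lo, hi)  # approximately 0.3 and exactly 0.6
```

With PROB(P) = .7 and PROB(Q) = .6 this reproduces the interval [.3, .6] claimed above.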

Hmm … I’m having difficulty understanding this bit:

Suppose a problem is described by logical compounds of a set of simple propositions P1,…,Pn. Then to be able to compute the probabilities of all logical compounds of these simple propositions, what we must generally know is the probabilities of every conjunction of the form PROB((~)P1&…&(~)Pn). The tildes enclosed in parentheses can be either present or absent. These n-fold conjunctions are called Boolean conjunctions. Given all but one of them, the probability calculus imposes no constraints on the probability of the remaining Boolean conjunction.

If I’m reading this right, then John is describing a partition in P1, …, Pn. But in that case, wouldn’t the probability of the “last one” be one minus the sum of the probabilities of the others?
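The partition observation can be checked directly: the 2^n Boolean conjunctions are mutually exclusive and jointly exhaustive, so their probabilities sum to 1, and any one of them is fixed by the rest. A small sketch (the setup is my own illustration):

```python
import itertools
import random

# The 2**n Boolean conjunctions (~)P1 & ... & (~)Pn, encoded as
# tuples of True (tilde absent) / False (tilde present).
n = 3
atoms = list(itertools.product([True, False], repeat=n))

# Any legitimate probability assignment over the atoms must sum to 1.
weights = [random.random() for _ in atoms]
total = sum(weights)
probs = [w / total for w in weights]

# Given all but one Boolean conjunction, the last is forced:
known, last = probs[:-1], probs[-1]
assert abs((1.0 - sum(known)) - last) < 1e-9
```

So the probability calculus does constrain the remaining Boolean conjunction, exactly as the comment suggests.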

Again, this is just a nit; his major point — that we can know quite a lot about the probabilities of individual propositions but still be largely ignorant of the probabilities of compounds built up out of them — is well taken.

John’s point, and Tim McGrew’s correction, generalize. The reason there is no single set of connectives for probability logics is that the probabilistic relationship(s) between two events, A and B, determine(s) the probability of the joint event A ∩ B. There is very little logical structure to provide guidance, and what little there is was hit upon in his correction. To illustrate, consider:

If A and B are independent, then prob(A & B) = prob(A) × prob(B);

If A and B are mutually exclusive, then prob(A & B) = 0;

If A and B are positively correlated (if A entails B), then prob(A & B) = prob(A);

If A and B are negatively correlated (if A entails not-B), then prob(A v B) = min(1, prob(A) + prob(B)).

Some interesting things follow:

(1) A pair of events may be mutually exclusive but not negatively correlated, so there is no interdefinability of & and v in a probabilistic logic.

(2) Probability logic is not a type of multi-valued propositional logic, since the connectives & and v cannot be characterized by the lattice properties of boolean & and v (i.e., & := min[f(A), f(B)]; v := max[f(A), f(B)], where ‘f’ is a generic valuation function; substitute ‘prob’ for ‘f’ here).

(3) Two events A, B can each have positive probability while prob(A & B) can lie anywhere in [0,1]; in words, we can have precise probability assessments for each of A and B but be completely ignorant about the probability of (A and B).
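Point (3) can be made concrete with two explicit measures over the four Boolean conjunctions of A and B (the particular numbers are my own illustration):

```python
# Two probability measures over the atoms (A&B, A&~B, ~A&B, ~A&~B),
# both giving prob(A) = 0.6 and prob(B) = 0.7, yet assigning prob(A & B)
# the two extreme values 0.3 and 0.6 compatible with those marginals.
m1 = {"AB": 0.3, "Ab": 0.3, "aB": 0.4, "ab": 0.0}
m2 = {"AB": 0.6, "Ab": 0.0, "aB": 0.1, "ab": 0.3}

for m in (m1, m2):
    prob_A = m["AB"] + m["Ab"]
    prob_B = m["AB"] + m["aB"]
    assert abs(prob_A - 0.6) < 1e-12 and abs(prob_B - 0.7) < 1e-12

print(m1["AB"], m2["AB"])  # same marginals, very different conjunction
```

Knowing the individual probabilities exactly still leaves the conjunction badly underdetermined.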

From the point of view of a probabilist, there is a temptation to claim that there simply is no logical structure to probability logics and that the notion of a probability logic (progic) is nonsensical: to get anything out of calculations with probabilities, one has to make substantive assumptions about the relationships between the events we wish to reason about. In some cases these assumptions are warranted, in others they are not. This point (perhaps) crystallizes the difference between bayesian statistics and bayesian epistemology: it is not necessarily a failure of statistical analysis to admit that we don’t always have information about joint distributions.

But the point that progics have no interesting structure is too rash, and Tim McGrew’s comment points out why. To generalize, for arbitrary events A, B, represented in a single probability structure M, if prob(A) and prob(B) are defined in M, then:

(i) prob(A & B) lies within the interval [max(0, prob(A) + prob(B) − 1), min(prob(A), prob(B))], and

(ii) prob(A v B) lies within the interval [max(prob(A), prob(B)), min(prob(A) + prob(B), 1)].

These are weak bounds, and they trivialize quickly; but they are not necessarily trivial, and they are derived without any substantive assumption about the relationship between the events A and B other than that they be represented in the algebra M. (These bounds hold for lower/upper probabilities defined on M, too.)
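That (i) and (ii) hold without any assumption about the relationship between A and B can be spot-checked by sampling random joint measures and confirming the bounds on every draw (a sketch; the names are mine):

```python
import random

def check_bounds(trials=10_000):
    """Sample random measures over the four atoms of A, B and verify
    bounds (i) and (ii) on every draw."""
    for _ in range(trials):
        w = [random.random() for _ in range(4)]
        s = sum(w)
        ab, aB, Ab, AB = (x / s for x in w)  # ~A~B, ~AB, A~B, A&B
        pa, pb = AB + Ab, AB + aB
        p_and, p_or = AB, AB + Ab + aB
        eps = 1e-12
        assert max(0, pa + pb - 1) - eps <= p_and <= min(pa, pb) + eps  # (i)
        assert max(pa, pb) - eps <= p_or <= min(pa + pb, 1) + eps       # (ii)
    return True

print(check_bounds())  # True
```

No measure over the four atoms violates either bound, whatever the dependence between A and B.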

On second thought, change (3) to the open interval (0,1) and ‘completely ignorant’ to ‘almost completely ignorant’.

More on (2): Suppose binary operations & and v satisfy the following three laws:

L1 [commutative]: A & B = B & A; A v B = B v A,

L2 [associative]: A & (B & C) = (A & B) & C; A v (B v C) = (A v B) v C,

L3 [absorption]: A & (B v A) = A v (B & A) = A.

Then the ordering mentioned in (2) falls out, since L1–L3 define a lattice in terms of meet and join. Note that this is more general than the truth tables for boolean & and v: it includes intuitionistic logic and most multi-valued logics (that I know of). It does not characterize & and v in either linear or relevant logic, however.

And (pure) probability logics do not satisfy L3. (Exercise: let prob(A) = 0.6 and prob(B) = 0.7. All pairwise identities of L3 fail, using (i) and (ii).)
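The exercise can be worked mechanically by applying bounds (i) and (ii) to intervals; here is a sketch (the interval arithmetic and names are my own rendering):

```python
def and_iv(x, y):
    """Interval for prob(A & B) from intervals x, y, via bound (i)."""
    (xl, xu), (yl, yu) = x, y
    return max(0.0, xl + yl - 1.0), min(xu, yu)

def or_iv(x, y):
    """Interval for prob(A v B) from intervals x, y, via bound (ii)."""
    (xl, xu), (yl, yu) = x, y
    return max(xl, yl), min(xu + yu, 1.0)

# Point values treated as degenerate intervals.
A, B = (0.6, 0.6), (0.7, 0.7)

lhs = and_iv(A, or_iv(B, A))  # A & (B v A): roughly [0.3, 0.6]
rhs = or_iv(A, and_iv(B, A))  # A v (B & A): [0.6, 1.0]
print(lhs, rhs)  # neither collapses back to the point 0.6
```

Both sides of the absorption identity smear out into nondegenerate intervals, so L3 fails even though A and B were assigned point values.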

Moreover, and this is neat, notice how A & (B v A) and A v (A & B) smear the value for A. This is happening with *point* values. This is behavior similar to Seidenfeld and Wasserman’s dilation result, which is supposed to be a problem for interval-valued probability and conditionalization. But we get it here with precise values. Yes, we introduce intervals when we evaluate (A v B), and (A & B), respectively; but the reason we get to these values is not by embracing imprecise probability theory, but by insisting upon pure logical connectives.

We can avoid this by assuming independence, which most probability logics do, because the system I’ve described here, though pure, is not terribly useful. But it does throw some light on the non-triviality of independence assumptions.

(3) is still not right. There is another point about working with sets of measures, convex sets of measures, or simply a single measure, but (3) starts out on the wrong foot to make that work out.