It is generally agreed that quantum physics (quantum mechanics and field theory) is peculiar and unintuitive. Things just don’t happen the way we expect. Why is this? There are several reasons, but here I want to focus on one of the most important: how uncertainty is expressed in quantum physics.
Now probability is a measure of uncertainty that is familiar to us, so let us start with that. Before we can start thinking about probability, we need to establish what it is. Probability theory assumes that there are a set number of possible outcomes, and assigns a number to each of those outcomes which in some way is related to the confidence we have of that outcome being realised.
So we start with a set of states. For example, in a coin toss there are two states, heads and tails. When rolling a six sided dice, there are six states, numbered 1 to 6. Each of these states represents one possible outcome. These are examples of discrete states, but we can also expand probability theory to continuous states (for example, any real number) by borrowing the concept of the infinitesimal from calculus. The first axiom of probability is that we can describe the possible outcomes in terms of such a set of states. The next axioms are the properties that this set must have.
 The basis states are irreducible, i.e. we can’t split them into smaller, more fundamental, regions. For example, when rolling the dice, we could combine the possible outcomes into three sets: {1,2},{3,4} and {5,6}. We can perform computations using these sets, but they are not irreducible because they can be split up into more fundamental units, namely, {1},{2},{3},{4},{5},{6}. These outcomes are, however, irreducible because they can’t be reduced any further.
 The basis states are orthogonal, i.e. there is no overlap between them. For the example of the dice, if we chose one set of results A as the outcomes {1,2,3,4} and the set of results B as the outcomes {4,5,6}, then it would be possible to be in both A and B at the same time (namely if we roll a 4), i.e. the two sets overlap, so they are not orthogonal. In classical probability, orthogonal states mean that you can’t find yourself in two different states after the same "roll".
 The basis states are complete, i.e. every possible outcome is included. The set of results {1},{2},{3},{4},{5} is not complete, because it is also possible to roll a 6.
 The basis is unique, i.e. there is only one natural way in which we can describe the system. For example, the only way we can describe the possible outcomes of a coin toss is to say that there is one state representing heads and another representing tails. They are the only options.
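The first three properties can be checked mechanically for the dice examples given above. What follows is only a minimal sketch: the function names are my own illustrative choices, and uniqueness is left out because it is a claim about how we describe the system rather than something a set operation can test.

```python
from itertools import combinations

OUTCOMES = frozenset(range(1, 7))  # the six faces of the dice


def is_irreducible(basis):
    """Each state holds exactly one outcome, so it cannot be split further."""
    return all(len(state) == 1 for state in basis)


def is_orthogonal(basis):
    """No two states share an outcome."""
    return all(a.isdisjoint(b) for a, b in combinations(basis, 2))


def is_complete(basis, outcomes=OUTCOMES):
    """Every possible outcome lies in at least one state."""
    return frozenset().union(*basis) == outcomes


def describe(basis):
    """Report which of the three properties a candidate basis satisfies."""
    basis = [frozenset(state) for state in basis]
    return {
        "irreducible": is_irreducible(basis),
        "orthogonal": is_orthogonal(basis),
        "complete": is_complete(basis),
    }


pairs = [{1, 2}, {3, 4}, {5, 6}]        # reducible grouping
overlapping = [{1, 2, 3, 4}, {4, 5, 6}]  # sets A and B, which share the 4
missing_six = [{1}, {2}, {3}, {4}, {5}]  # leaves out the 6
singles = [{n} for n in range(1, 7)]     # the natural basis

for name, basis in [("pairs", pairs), ("overlapping", overlapping),
                    ("missing_six", missing_six), ("singles", singles)]:
    print(name, describe(basis))
```

Only the basis of single faces passes all three checks; each of the other groupings fails exactly the property it was introduced to illustrate.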
We should ask how well this idea of a limited number of possible outcomes and states fits in with different metaphysical systems. After all, we want to eventually apply probability theory to make physical predictions, so we are going to need to ask whether the assumptions the physics is built on are consistent with the premises of probability. These assumptions follow from our metaphysical thought.
Start with Aristotle. The idea of a set of basis states fits very nicely with Aristotelian thought. Aristotle's philosophy is also built around the idea that things can exist in several discrete states. These states are called potentia (the two terms are not synonymous, since there is more to potentia than just a listing of states, but the potentia can be defined so that they satisfy the properties we need for these basis states). Each potentia implies the observable properties of the being in that state. If you know the potentia, then you can compute the properties. Motion is movement from one potentia to another.
To give the example of the dice roll. Each possible final resting place of the dice is represented by one potentia. Each intermediate state after throwing it is represented by another potentia. So, in our hand before we roll it, the dice is in one potentia, which then moves through a sequence of different ones as it passes through the air, until it comes to rest in its final state. The motion of the dice through this sequence of potentia can be traced by understanding the natural tendencies of each potentia (i.e. if it is in one state, it has a tendency to move into another state in the next moment of time), and also its interactions with the air and the surface of the table and so on. Obviously in classical physics we would use Newton's laws to determine which potentia will be occupied by the dice next; for the Aristotelian that is just one possible way in which the details of actualisation can be described.
The Aristotelian can then take the various potentia that describe the dice at rest, set them together according to the number shown, and those are the basis states for the final outcome of the dice roll. He then would apply standard probability theory to that basis, and perhaps derive the probabilities based on his knowledge of how the dice was first thrown and how it was held.
Aristotle requires these states to be orthogonal (only one potentia can be actualised at any given time) and complete, so that every possible motion is described by his system. He doesn't need them to be irreducible or unique. For example, he allows for substances to be compound. The potentia of a compound substance are not irreducible, because the substance contains simpler parts, but neither is it just the sum of its parts. The form of water is not just hydrogen and oxygen added together, but it has its own distinct potentia and properties distinct from the two gases. Equally, I don't see why Aristotle would need the potentia to be a unique basis (although obviously Aristotle would not have thought about things in this way, so we can't ask him). However, Aristotelian thought is not inconsistent with uniqueness and irreducibility; it just doesn't require them.
What about more modern philosophies? Empiricism immediately has difficulties, because these states are not directly observed from experiment. They are a theoretical construct, and empiricism denies the possibility of theoretical knowledge. Thus even if we construct such a basis of states as a bookkeeping convenience, we have no way to link them to actual properties of the real world. But without that link, we can have no confidence that any predictions we make would be applicable.
Nominalists are, I think, in even bigger trouble. To say that an object can exist in different states is to say that all these potentia are related together in some way. Together, they form a set. That set is a philosophical universal, since it links different objects of the same type. But universals are precisely what nominalists deny. Once again, the nominalist could say that we just introduce these possible states as a convenience, but the same problem of linking this convenience to the real world applies. Probability theory is used to make predictions, and those predictions are of the form "How many outcomes of a particular type will we find after we perform the experiment a large number of times?" We are grouping together outcomes into types; those types are universals, and that rules out nominalism.
The mechanical philosophy reduces the universe to just matter in motion. Thus there are certain fundamental corpuscles of matter which can be described by location, momentum, and maybe a few other properties. These corpuscles are seen as being indestructible, and compound objects are simply the sum of their parts. The laws of physics in mechanism are deterministic, and if one knew precisely all the locations and locomotions of all the particles in the universe at one moment in time, then one could in principle predict where they are at every other moment in time (however, since we don't know with certainty, there is still room for seeking probabilistic answers). All of nature is described by mechanical principles. Mechanism thus leads naturally to the axioms of completeness, uniqueness, irreducibility, and orthogonality. Where it struggles is with the whole idea of basis states altogether; a mechanist doesn't need to and therefore does not tend to think in terms of these states. The state notation is not inconsistent with the mechanical philosophy, but neither is it directly implied by it.
The idealist, who sees things perhaps as a giant computer simulation, is in the opposite difficulty to the empiricist. He has no trouble drawing up the list of states, but without some physical principle external to his intellect has difficulty saying what it means for one of the states to be occupied (or the potentia to be actualised). The intellectual system which the idealist states is all there is can't be complete; there needs to be something beyond it to make things actual.
So, of these five major philosophical systems, only the Aristotelian and the mechanical seem consistent with the underlying assumptions of probability theory we have discussed so far. They directly imply some of the premises we have come up with so far, and don't contradict the others.
In probability theory, we need to assign numbers to each state which in some way represent our certainty or uncertainty of that particular outcome coming to pass. So how do we assign a number to these states? This number is known as the probability, and it is defined using four axioms.
 The number is real (i.e. doesn’t have any contributions from the square root of minus one in it).
 The number assigned to each state is zero or greater.
 The sum of the numbers assigned to each state is one.
 The probability of achieving an outcome A or of achieving an outcome B, where A and B are orthogonal states, is the probability of A plus the probability of B.
These axioms lead to an intuitive result. Why? Because they are also satisfied by a frequency distribution. Suppose that we roll the dice a large number of times, and record how many times we get each outcome. At the end of the exercise, we divide each number by the total number of rolls. This gives us a frequency distribution. The frequency distribution ultimately arises from counting things, which is something we naturally do.
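To see that a frequency distribution obeys the same rules, here is a small sketch that rolls a simulated dice, builds the frequency distribution by counting, and checks it against the axioms. The seed and the number of rolls are arbitrary choices of mine, and exact fractions are used so that the sum-to-one check is not blurred by rounding.

```python
import random
from collections import Counter
from fractions import Fraction


def frequency_distribution(rolls):
    """Count each face and divide by the total number of rolls."""
    counts = Counter(rolls)
    total = len(rolls)
    return {face: Fraction(counts.get(face, 0), total) for face in range(1, 7)}


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
rolls = [rng.randint(1, 6) for _ in range(6000)]
freq = frequency_distribution(rolls)

# Each frequency is a ratio of two counts, so it is automatically a real number.
all_nonnegative = all(f >= 0 for f in freq.values())
sums_to_one = sum(freq.values()) == 1

# Additivity for orthogonal outcomes: the frequency of "1 or 2" counted
# directly equals the frequency of 1 plus the frequency of 2.
freq_1_or_2 = Fraction(sum(1 for r in rolls if r in (1, 2)), len(rolls))
additive = freq_1_or_2 == freq[1] + freq[2]

print(all_nonnegative, sums_to_one, additive)
```

All three checks hold for any list of rolls, not just this one, because they follow from how counting works rather than from anything about the dice.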
The difference between probability and frequency is that probability is something we compute while frequency is something we measure. Some people, as is well known, confuse the two. They treat a probability distribution as the limit of a frequency distribution after an infinite number of measurements. There are many reasons why this is foolhardy, but to my mind the most important is that by confusing a probability distribution with a frequency distribution, they lose the ability to compute the distribution in advance. Frequentists have, by defining the term probability to be a virtual synonym of frequency, denied themselves the use of their reasoning abilities. We have two tools available to us: reason and experience (alternatively theory and experiment); and frequentists have basically said from the outset that it is impossible to obtain a theoretical knowledge. This is the philosophy of empiricism (or perhaps a particularly hard form of empiricism), and it is refuted by the simple observation that theoretical physicists have correctly calculated things before they were measured by experiment.
Now, if probability is a form of computation, it is a form of reasoning. Every chain of reasoning goes back to premises, axioms and definitions; and ultimately these can’t be proven by reasoning alone. So every probability is conditional upon the assumptions built into the model used for the calculation. Let us say we are tossing a coin. If we make the assumption that the coin is balanced, we will assign a probability of 0.5 to it coming up heads. If it is weighted in a particular way, we will assign a probability of 0.75. Both of these calculations of the probability are perfectly legitimate, even though they give different results. However, once we take a particular coin, it is in practice weighted in a particular way. We can measure how it is weighted, and calculate from this a probability for it coming up heads. We can then test our calculation (and therefore the premises it was built on) by performing the experiment a large number of times, and comparing our prediction against the result. More precisely, we calculate the probability of the measured result occurring under various different premises. Using a theorem, Bayes' theorem, we can then extract the probability for each set of premises for the nature of the coin given the observed set of measurements. Induction thus cannot lead to certainty that one set of premises is correct, but it can allow us to systematically express how certain we should be about them all. This allows us to eliminate numerous possibilities for all practical purposes.
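The Bayesian step can be made concrete with the two premises from the coin example (a balanced coin with probability 0.5 of heads, and a weighted coin with probability 0.75). This is a sketch under assumptions that are mine rather than part of the argument: the two premises are given equal prior weight, and the observed data (70 heads in 100 tosses) are invented purely for illustration.

```python
from math import comb


def likelihood(p_heads, heads, tosses):
    """Probability of seeing `heads` heads in `tosses` tosses, given p_heads."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads) ** (tosses - heads)


def posterior(priors, heads, tosses):
    """Bayes' theorem: P(premise | data) is proportional to
    P(data | premise) * P(premise), normalised over all premises."""
    joint = {p: prior * likelihood(p, heads, tosses) for p, prior in priors.items()}
    evidence = sum(joint.values())
    return {p: j / evidence for p, j in joint.items()}


# Keys are the premises (probability of heads); values are prior weights.
priors = {0.5: 0.5, 0.75: 0.5}  # equal priors: an assumption

# Hypothetical measurement: 70 heads in 100 tosses.
post = posterior(priors, heads=70, tosses=100)
for p_heads, prob in post.items():
    print(f"P(coin has p_heads={p_heads} | data) = {prob:.6f}")
```

With data like these the weighted-coin premise ends up strongly favoured, but neither premise is ever assigned a probability of exactly zero or one, which is the sense in which induction narrows the options without delivering certainty.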
Probability is thus a numerical measurement of uncertainty parametrised in such a way that it follows the same rules as counting states. And since counting is everybody’s first introduction to numbers and how most people naturally think about numbers, probability is thus a very natural and intuitive way of expressing uncertainty. Anything else would just be weird.
So let us consider the standard quantum experiment, the two-slit experiment. We have a source S which emits a beam of particles, towards a barrier with two slits, A and B. The particles then hit a screen at position X. Our understanding of the initial conditions and the physics is denoted as p. It doesn’t matter for this example what that physics is. It could be Newton’s physics, Aristotle’s, Einstein’s, or something entirely different; the only stipulation is that it accurately describes the paths of the particles.
Now if the particle passes through slit A, there is a certain probability that it will hit the screen at position X, which we denote as P(X|Ap). This is expressed as a probability rather than a certainty; maybe we are uncertain about the initial conditions or the mass of the particle or the forces acting on it, so we are not certain about the final result X. Equally the probability that the particle will hit X after passing through slit B is P(X|Bp). The probability that it will pass through slit A is P(A|p) and the probability that it will pass through slit B is P(B|p).
From Bayes’ theorem, we write
$$P(A|Xp)\,P(X|p) = P(X|Ap)\,P(A|p)$$
$$P(B|Xp)\,P(X|p) = P(X|Bp)\,P(B|p)$$
Add these two equations together, and we get,
$$\left[P(A|Xp) + P(B|Xp)\right] P(X|p) = P(X|Ap)\,P(A|p) + P(X|Bp)\,P(B|p)$$
We now use the definition of probability to write that,
$$P(A|Xp) + P(B|Xp) = P(A \cup B|Xp)$$
where P(A∪B|Xp) denotes the probability that it passes through either slit A or slit B if it hits the screen at X given the initial conditions and physics denoted by p. But it must pass through either slit A or slit B, so this probability is just 1.
The end result of this is
$$P(X|p) = P(X|Ap)\,P(A|p) + P(X|Bp)\,P(B|p) \qquad (1)$$
which basically states that the probability of the particle hitting the screen at X is the sum of the probability of it hitting the screen at that location after passing through slit A and the probability of it hitting the screen at that location after passing through slit B.
Probabilities can be used to predict the frequency distribution. So suppose that we are firing 1000 tennis balls out of our source. Our screen consists of a series of buckets, and P(X) represents the probability that the ball goes into a particular bucket. Suppose that our theory predicts that half of them pass through slit A and half of them pass through slit B. We confirm these predictions by performing the experiment and counting how many balls pass through each slit. It comes close enough to a 50-50 split that we believe our theory. Suppose further that the theory predicts that 4 out of every 500 balls that pass through slit A hit the screen at position X and 10 out of every 500 balls that pass through slit B hit the screen at position X. We confirm this by covering up slit A, firing 1000 balls and counting how many hit the screen at position X, and it comes close enough to the predicted number that we accept the theory. We cover up slit B, do the same thing, and again everything looks good.
Now we are ready to perform the actual experiment. Our theory predicts that on average we should expect to get 14 balls hitting X. We leave both slits uncovered, and get ready to do the experiment. Obviously, we don’t have that many samples (1000 isn’t a big enough number for probabilities and frequencies to converge), so we wouldn’t expect to get the predicted result if we just do the experiment once. So we will set up to perform the experiment several thousand times and take the average. That average should be close enough to the expected result of 14 that we confirm the theory.
And if we performed this experiment with tennis balls, we would get a result near enough to 14 that we believe the theory.
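The tennis ball prediction is just equation (1) with the numbers above put in, and the averaging procedure can be mimicked with a short simulation. This is a sketch only: the seed and the number of repetitions are arbitrary choices of mine, and each ball is treated as an independent classical trial, which is exactly the assumption under test.

```python
import random
from fractions import Fraction

# Numbers quoted in the text.
P_A = Fraction(1, 2)             # probability of passing through slit A
P_B = Fraction(1, 2)             # probability of passing through slit B
P_X_GIVEN_A = Fraction(4, 500)   # probability of hitting X after slit A
P_X_GIVEN_B = Fraction(10, 500)  # probability of hitting X after slit B

# Equation (1): sum over the two orthogonal routes to X.
p_x = P_X_GIVEN_A * P_A + P_X_GIVEN_B * P_B
expected_hits = 1000 * p_x
print("predicted hits per 1000 balls:", expected_hits)


def run_once(rng, balls=1000):
    """Fire `balls` classical balls and count how many land at X."""
    hits = 0
    for _ in range(balls):
        through_a = rng.random() < float(P_A)
        p_hit = float(P_X_GIVEN_A if through_a else P_X_GIVEN_B)
        if rng.random() < p_hit:
            hits += 1
    return hits


rng = random.Random(1)  # fixed seed, chosen arbitrarily
runs = 500              # number of repetitions, also arbitrary
average_hits = sum(run_once(rng) for _ in range(runs)) / runs
print("average hits over", runs, "runs:", average_hits)
```

A single run of 1000 balls scatters by a few hits either way, which is why the experiment has to be repeated and averaged before the result is compared with the prediction.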
But now let us replace the tennis balls with electrons. Again, we predict and confirm that half the electrons go through slit A and half through slit B; cover up slit B and 8 out of a thousand electrons go into X, and so on. The setup works out in exactly the same way as with the tennis balls.
So, when we perform the full experiment, we confidently expect to get the result 14.
Except we don’t. We find that the answer is 6.
Now this is weird. With both slits open, 4 fewer electrons hit X than the 10 we attributed to slit B alone. It just doesn’t make sense to our intuition.
Could it be that there is some difference in the physics that describes electrons and the physics that describes tennis balls? Some difference hidden within the condition p that makes all the difference? Certainly the physics describing electrons is very different from the physics describing tennis balls. But that is completely irrelevant. We didn’t need to know what p was in the algebra above; and given that we have individually confirmed each of the numbers on the right hand side of equation (1) (to a good enough accuracy), no matter what p is, as long as P(A|p) = ½ and so on, our prediction for the fraction of electrons hitting the detector at X, P(X|p), cannot be affected. That number just comes from the mathematical laws of probability.
So what we have to conclude is that standard probability theory is wrong, or at least that it doesn’t apply in quantum physics. At least one of the assumptions of probability theory listed above is invalid. And this is why quantum physics is weird.
Quantum physics is still a mathematical representation of reality. We still want to represent our uncertainty for each possible outcome with a number. But that number is not a probability. It violates the axioms of probability. It is therefore not proportional to an expected frequency after a huge number of measurements.
Our intuition concerning uncertainty is based on counting discrete objects. We can’t really imagine anything else. We can’t imagine quantum physics. But we can still understand it intellectually, in the same way as always: we list premises and work through to conclusions.
So which of the axioms of probability listed above are violated in quantum theory? I’ll discuss that next time.
Reader Comments:
I had some further thoughts regarding your exposition of QM as a nonclassical way of representing uncertainty. To me the idea that the wavefunction represents our uncertainty in some way (though I must confess I know not what that means, except to say that we can calculate classical probabilities from it using the Born rule) seems to require that the wavefunction does not, fundamentally, reflect any objective property of the quantum system, but only reflects our knowledge in relation to the system. But this doesn't seem right. How can objective properties of a system (the energy levels of the hydrogen atom, for example) be determined by something which is fundamentally subjective, merely a representation of our uncertainty (the wavefunction obeying the Schrodinger equation)? Moreover, there is a thought experiment I have read that suggests the wavefunction is in fact objective (aha, found it, it is on page 10 of https://arxiv.org/abs/quant-ph/0408113).
So I think it is much more natural to view the wavefunction not as a representation of knowledge, but as a representation of the potentiality of the system to produce a given result if it were subjected to a "measurement," where the quotes there are because it's not at all obvious in the quantum case that such experiments reveal preexisting values. This potentiality is an objective property of the system. But that means that in contexts other than measurement, the states that probability theory directly applies to in the quantum case are not the various possible outcomes, but instead are the various possible wavefunctions that the system might have (along with any other components of the ontic state, e.g. values of hidden variables).
Incidentally, from an Aristotelian perspective the view of the wavefunction as representing potentiality seems to lead pretty naturally to Alexander Pruss' suggested "Travelling Forms" interpretation of QM (at least with the hindsight of having read his paper), where the forms constitute the one actual world out of the many branches of the wavefunction. (In fact, I think Pruss' interpretation can be refined to view the wavefunction as an abstraction from the potentialities of the substances that exist in the actual world, which significantly downgrades the ontological status of the empty branches. The result looks like something in between a Bohmian theory and a clarified version of the Copenhagen interpretation, interestingly.)
Objectivity and Subjectivity
My interpretation (at least as it has evolved over time; I am not sure how well I expressed it in these early posts) is a little more complex than saying that the amplitude is merely a parametrisation of our knowledge of the system. I would say that there are two places where it comes in. I would define the amplitude as basis + state. In each basis, there are numerous possible discrete states of the system. But there are also different possible bases (i.e. different possible ways of representing the system). An exact (i.e. non-superposition) state in one basis is a superposition in a different basis. Every superposition in one basis can be regarded as an exact state in a different basis. In classical physics, every possible state can be represented in terms of a single basis.
1) The real (objective) state of the system is expressed in terms of an amplitude (i.e. state + basis). Thus even if we had perfect knowledge, we would still need to use the state + basis or amplitude representation.
2) On top of this, we have our uncertainty of the states of the system, which is also expressed as an amplitude. In this sense, the amplitude's primary use is to predict the results of experiments.
The amplitude we use in our calculations is a combination of these two factors. Obviously in this post I have been concentrating on (2), and haven't really discussed point (1). How much of the amplitude is due to subjective uncertainty and how much due to the underlying objective state depends in part on which observable we are interested in. For example, when considering the location of the particle, it is primarily a matter of our own uncertainty (2). Spin, on the other hand, contains an element of (1): there are different noncommuting states in objective reality, as well as in our subjective uncertainty. So, once we have measured the particle to be in a particular spin state, it will be in that state, and objectively in a superposition of states along other axes.
I don't think that the thought experiment you mentioned contradicts this understanding. Point (1) allows the system to be in a definite quantum state in reality. We don't, of course, know what state it is in until we either perform a measurement or open the envelope containing the result, so until then we have both subjective uncertainty (2) and an objective quantum state (1).
The subjective uncertainty helps to understand EPR type experiments. My favourite example is when we have a particle decay into two entangled spin 1/2 particles. My explanation is that these are emitted in some objective quantum state, in the sense of (1). We don't know what that is, so before making any measurements, we also have type (2) uncertainty. That's true for both particles. Along any axis, we predict that there is a 50% chance of measuring it spin up and 50% chance of measuring it as spin down. We measure the spin of the first particle in some particular direction. The process of measurement forces the particle to be aligned along that axis, so the original state of the particle is lost. But the measurement still gives us some knowledge of the original wavefunction of our particle, and thus also of its entangled partner. And that changes our predictions for any experiments measuring the spin of that second particle. So in this type of scenario, the amplitude contains data of both types (1) and (2). It avoids having the state of the second particle instantaneously changing when we measure the state of the first particle, which doesn't make much sense to me but seems difficult to avoid unless we allow the amplitude to represent in part our "subjective" uncertainty of the system.
On the other hand, we can also consider the double slit interference experiment. Here the relevant information is only which slit the particle went through (The particle will also have an objective amplitude in the sense (1), but we don't have any knowledge of what that is, so our predictions are dominated by the subjective uncertainty). We don't know which path it took, so here we parametrise our uncertainty using the amplitude (as described by this post).
Those are only two examples, but I believe that the combination of (1) and (2), together with the idea that amplitude rather than probability is the way we ought to be parametrising uncertainty in a quantum system, can allow us to formulate a workable interpretation of quantum physics.
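To illustrate what parametrising with amplitudes rather than probabilities can do, here is a sketch using the numbers from the post (4, 10 and 6 hits per 1000). It assumes the standard Born rule, under which the expected count is proportional to the squared magnitude of a complex amplitude, and it assumes that the amplitudes for the two slits add. The relative phase is not derived from any physics here; it is simply solved for by hand so that the total reproduces the 6.

```python
import cmath
import math

N = 1000                                # particles fired per run
hits_a, hits_b, hits_both = 4, 10, 6    # figures quoted in the post

# Amplitude for "via slit A to X"; its phase is set to zero by convention.
amp_a = complex(math.sqrt(hits_a / N), 0.0)

# Hypothetical relative phase, solved from
# |a + b|^2 = |a|^2 + |b|^2 + 2|a||b|cos(phi).
cos_phi = (hits_both - hits_a - hits_b) / (2 * math.sqrt(hits_a * hits_b))
phi = math.acos(cos_phi)
amp_b = cmath.rect(math.sqrt(hits_b / N), phi)

# Adding probabilities (squared magnitudes) gives the tennis ball answer.
classical = N * (abs(amp_a) ** 2 + abs(amp_b) ** 2)

# Adding amplitudes first and squaring afterwards gives the electron answer.
quantum = N * abs(amp_a + amp_b) ** 2

print(f"sum of squared magnitudes: {classical:.3f}")
print(f"squared magnitude of sum:  {quantum:.3f}")
print(f"relative phase (radians):  {phi:.3f}")
```

The point of the sketch is only that a single extra parameter, the relative phase, is enough for the combined count to fall below either single-slit contribution, something no assignment of non-negative probabilities to the two routes can do.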
I put "subjective" in quotes above because that's not the language I would use. I would rather say that amplitudes are always conditional (analogous to conditional probabilities). We assume a set of initial data (maybe our knowledge of the initial state of the system), and calculate the amplitude that a particular result will occur. That amplitude is contingent on the data we put into the calculation. So in the EPR example, our prediction for the measurements of the entangled partner is "If there are two entangled particles, and if the measurement of the spin on the first one along some axis is spin up, then the amplitude for a spin up measurement of the spin of the second particle along this axis is X". One doesn't have to perform the experiment to make that sort of calculation. The statement we are making is not "The amplitude is X" but "The amplitude is X conditional on this prior data." That statement is objective. So the amplitude is both a statement of uncertainty and objective. (This is analogous to the interpretation of probability as an extension of logic; see https://plato.stanford.edu/entries/probability-interpret/#LogPro)
The amplitude refers to the wavefunction. The Hamiltonian operator itself, which describes the evolution of the wavefunction, is entirely objective. The energy levels of the system are determined by the Hamiltonian. They define the list of possible states, and all observers agree on this. The amplitude comes in when we try to identify which energy state is occupied by the electron at a given moment of time. We can probably figure this out in principle (although I can imagine that it will be difficult) if we have observed the photons emitted by the atom. So that, too, could be objective knowledge. Reading off the momentum from the energy eigenstate is harder, particularly in Schroedinger wave mechanics (which doesn't have Lorentz symmetry linking energy and momentum), but let's suppose that we can do this as well. But if we then tried to measure the location of the electron from the energy/momentum eigenstate, then we run into the problem that it is undetermined. We know it in one basis (the momentum basis), which implies that it is in a superposition in the conjugate location basis. There are two ways I might interpret this:
1) If we allow momentum states to be physical (rather than a bookkeeping exercise), then the electron is in a definite momentum state. When we attempt to measure the location, interaction (via decoherence) with the measuring apparatus forces it into a definite eigenstate of the location operator. This process is indeterminate, and the amplitude allows us to predict the likelihood for each option. [Of the two options, this is my preference.]
2) If we treat momentum states to be a useful fiction (with only location states corresponding to reality, since we only observe objects at distinct locations), then the electron amplitude corresponding to each energy/momentum state parametrises our uncertainty corresponding to its actual location.
So with regards to whether it is more natural to regard the wavefunction as a representation of our knowledge or a representation of an underlying objective state (which is related to the potentiality to give a particular measurement outcome), my answer is that it plays both roles. The amplitude formulation is powerful enough to do that.
against "nonclassical probability"
Hello Dr. Cundy, wondering if you would be willing to respond to a humble layperson's thoughts here...
It seems to me that the conclusion that quantum mechanics fails to obey classical probability theory is premature. My first thought is that the failure of classical probability is not just unintuitive but may be incoherent: similar to how the idea that classical propositional logic must be replaced by "quantum logic," where the usual rules of inference fail, seems to me to be incoherent. One can formulate alternative logical systems, but classical logic is always more fundamental; I don't see how one can get around the laws of noncontradiction and excluded middle as necessary truths. Analogously, classical probability theory seems to be more fundamental than "quantum probability theory;" when we want to compute outcomes, we always translate back into something that is in the language of classical probability theory by taking the squared absolute values of the quantum amplitudes. Moreover, we can reason about quantum mechanics with classical probability. (e.g. there can be situations where there is a certain probability that an electron has this wavefunction and a certain probability that it has a different wavefunction; these are what "mixed states" are, no?)
My second thought is that the argument you have made to infer that quantum mechanics does not obey classical probability theory ignores quantum contextuality. In your equation (1):
P(X|p) = P(X|Ap)P(A|p) + P(X|Bp)P(B|p)
The probabilities that you are adding up to get the theoretical result, 14, do not correspond to those written in (1). The probabilities that you are adding up are instead:
P(X|Ap')P(A|p') + P(X|Bp')P(B|p')
where p' is a different experimental setup than p, namely, one where we have a which-path detector set up to see where the electron goes, or where you have one of the slits blocked to ensure that it goes through the other one.
In the quantum case, the probabilities that 4 particles hit X when they go through slit A and 10 particles hit X when they go through slit B aren't applicable to the experiment where we don't know which path the particles take, and this isn't because classical probability doesn't apply here, but because quantum theory doesn't make a prediction about which path the particle takes when there is no which-path measurement being made. That sounds incredibly weird, but only because orthodox quantum theory is incredibly vague about what is going on at the microscopic level, speaking as if there is a particle travelling when really all that the mathematics talks about is a wavefunction.
You can instead look at a theory which does predict actual trajectories for the particles to see that classical probability theory works perfectly fine for the double-slit experiment: Bohmian mechanics. Bohmian mechanics has unambiguous meanings for P(X|Ap), P(A|p), P(X|Ap'), etc. and maintains a clear distinction between p and p'. And equation (1) is satisfied perfectly well by the values Bohmian mechanics gives to all those probabilities.
Now, I'm not commenting here to argue for something like Bohmian mechanics. Rather, my point is that it seems to me that the mere possibility of Bohmian mechanics clearly shows the contextuality of quantum mechanics, which I suggest is the real source of the discrepancy between expectation and experiment that you highlight here, rather than the failure of classical probability.
Thanks for your time!