The Quantum Thomist

Musings about quantum physics, classical philosophy, and the connection between the two.

Why is quantum physics so weird?
Last modified on Sat Jul 13 18:38:25 2019

It is generally agreed that quantum physics (quantum mechanics and field theory) is peculiar and unintuitive. Things just don’t happen the way we expect. Why is this? There are several reasons, but here I want to focus on one of the most important: how uncertainty is expressed in quantum physics.

Now probability is a measure of uncertainty that is familiar to us, so let us start with that. Before we can start thinking about probability, we need to establish what it is. Probability theory assumes that there are a set number of possible outcomes, and assigns a number to each of those outcomes which in some way is related to the confidence we have of that outcome being realised.

So we start with a set of states. For example, in a coin toss there are two states, heads and tails. When rolling a six sided dice, there are six states, numbered 1 to 6. Each of these states represents one possible outcome. These are examples of discrete states, but we can also expand probability theory to continuous states (for example, any real number) by borrowing the concept of the infinitesimal from calculus. The first axiom of probability is that we can describe the possible outcomes in terms of such a set of states. The next axioms are the properties that this set must have.

  1. The basis states are irreducible, i.e. we can’t split them into smaller, more fundamental, regions. For example, when rolling the dice, we could combine the possible outcomes into three sets: {1,2},{3,4} and {5,6}. We can perform computations using these sets, but they are not irreducible because they can be split up into more fundamental units, namely, {1},{2},{3},{4},{5},{6}. These outcomes are, however, irreducible because they can’t be reduced any further.
  2. The basis states are orthogonal, i.e. there is no overlap between them. For the example of the dice, if we chose one set of results A as the outcomes {1,2,3,4} and the set of results B as the outcomes {4,5,6}, then it would be possible to be in both A and B at the same time (namely if we roll a 4), i.e. the two sets overlap, so they are not orthogonal. In classical probability, orthogonal states mean that you can’t find yourself in two different states after the same "roll".
  3. The basis states are complete, i.e. every possible outcome is included. The set of results {1},{2},{3},{4},{5} is not complete, because it is also possible to roll a 6.
  4. The basis is unique, i.e. there is only one natural way in which we can describe the system. For example, the only way we can describe the possible outcomes of a coin toss is to say that there is one state representing heads and another representing tails. They are the only options.
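
The first three properties can be checked mechanically for the dice examples in the list. The following is a minimal Python sketch; the function names `is_orthogonal`, `is_complete` and `is_irreducible` are labels chosen here for illustration, and uniqueness is left out because it is a statement about the system rather than about any one list of sets.

```python
from itertools import combinations

# All possible outcomes of rolling a six-sided dice.
OUTCOMES = {1, 2, 3, 4, 5, 6}

def is_irreducible(states):
    """True if every state holds exactly one outcome, so it cannot be split."""
    return all(len(s) == 1 for s in states)

def is_orthogonal(states):
    """True if no two states share an outcome (no overlap)."""
    return all(a.isdisjoint(b) for a, b in combinations(states, 2))

def is_complete(states, outcomes=OUTCOMES):
    """True if the states together contain every possible outcome."""
    return set().union(*states) == outcomes

# The example sets used in the list above.
basis = [{1}, {2}, {3}, {4}, {5}, {6}]
pairs = [{1, 2}, {3, 4}, {5, 6}]          # not irreducible
overlapping = [{1, 2, 3, 4}, {4, 5, 6}]   # not orthogonal: both contain 4
missing_six = [{1}, {2}, {3}, {4}, {5}]   # not complete: 6 is absent

for name, states in [("basis", basis), ("pairs", pairs),
                     ("overlapping", overlapping), ("missing_six", missing_six)]:
    print(name, is_irreducible(states), is_orthogonal(states), is_complete(states))
```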

We should ask how well this idea of a limited number of possible outcomes and states fits in with different metaphysical systems. After all, we want to eventually apply probability theory to make physical predictions, so we are going to need to ask whether the assumptions the physics is built on are consistent with the premises of probability. These assumptions follow from our metaphysical thought.

Start with Aristotle. The idea of a set of basis states fits very nicely with Aristotelian thought. Aristotle's philosophy is also built around the idea that things can exist in several discrete states. These states are called potentia (the two terms are not synonymous -- there is more to potentia than just a listing of states -- but the potentia can be defined so that they satisfy the properties we need for these basis states). Each potentia implies the observable properties of the being in that state. If you know the potentia, then you can compute the properties. Motion is movement from one potentia to another.

Take the example of the dice roll. Each possible final resting place of the dice is represented by one potentia. Each intermediate state after throwing it is represented by another potentia. So, in our hand before we roll it, the dice is in one potentia, which then moves through a sequence of different ones as it passes through the air, until it comes to rest in its final state. The motion of the dice through this sequence of potentia can be traced by understanding the natural tendencies of each potentia (i.e. if it is in one state, it has a tendency to move into another state in the next moment of time), and also its interactions with the air and the surface of the table and so on. Obviously in classical physics we would use Newton's laws to determine which potentia will be occupied by the dice next; for the Aristotelian that is just one possible way in which the details of actualisation can be described.

The Aristotelian can then take the various potentia that describe the dice at rest, set them together according to the number shown, and those are the basis states for the final outcome of the dice roll. He then would apply standard probability theory to that basis, and perhaps derive the probabilities based on his knowledge of how the dice was first thrown and how it was held.

Aristotle requires these states to be orthogonal -- only one potentia can be actualised at any given time, and they need to be complete, so every possible motion is described by his system. He doesn't need them to be irreducible or unique. For example, he allows for substances to be compound. The potentia of a compound substance are not irreducible, because the substance contains simpler parts, but neither is it just the sum of its parts. The form of water is not just hydrogen and oxygen added together, but it has its own distinct potentia and properties distinct from the two gases. Equally, I don't see why Aristotle would need the potentia to be a unique basis (although obviously Aristotle would not have thought about things in this way, so we can't ask him). However, Aristotelian thought is not inconsistent with uniqueness and irreducibility; it just doesn't require them.

What about more modern philosophies? Empiricism immediately has difficulties, because these states are not directly observed from experiment. They are a theoretical construct, and empiricism denies the possibility of theoretical knowledge. Thus even if we construct such a basis of states as a book-keeping convenience, we have no way to link them to actual properties of the real world. But without that link, we can have no confidence that any predictions we make would be applicable.

Nominalists are, I think, in even bigger trouble. To say that an object can exist in different states is to say that all these potentia are related together in some way. Together, they form a set. That set is a philosophical universal, since it links different objects of the same type. But universals are precisely what nominalists deny. Once again, the nominalist could say that we just introduce these possible states as a convenience, but the same problem of linking this convenience to the real world applies. Probability theory is used to make predictions, and those predictions are of the form "How many outcomes of a particular type will we find after we perform the experiment a large number of times?" We are grouping together outcomes into types; those types are universals, and that rules out nominalism.

The mechanical philosophy reduces the universe to just matter in motion. Thus there are certain fundamental corpuscles of matter which can be described by location, momentum, and maybe a few other properties. These corpuscles are seen as being indestructible, and compound objects are simply the sum of their parts. The laws of physics in mechanism are deterministic, and if one knew precisely all the locations and locomotions of all the particles in the universe at one moment in time, then one could in principle predict where they are at every other moment in time (however, since we don't know with certainty, there is still room for seeking probabilistic answers). All of nature is described by mechanical principles. Mechanism thus leads naturally to the axioms of completeness, uniqueness, irreducibility, and orthogonality. Where it struggles is with the whole idea of basis states altogether; a mechanist doesn't need to and therefore does not tend to think in terms of these states. The state notation is not inconsistent with the mechanical philosophy, but neither is it directly implied by it.

The idealist, who sees things perhaps as a giant computer simulation, is in the opposite difficulty to the empiricist. He has no trouble drawing up the list of states, but without some physical principle external to his intellect he has difficulty saying what it means for one of the states to be occupied (or the potentia to be actualised). The intellectual system, which the idealist says is all there is, can't be complete; there needs to be something beyond it to make things actual.

So, of these five major philosophical systems, only the Aristotelian and the mechanical seem consistent with the underlying assumptions of probability theory we have discussed so far. They directly imply some of the premises we have come up with, and don't contradict the others.

In probability theory, we need to assign numbers to each state which in some way represent our certainty or uncertainty of that particular outcome coming to pass. So how do we assign a number to these states? This number is known as the probability, and it is defined using four axioms.

  1. The number is real (i.e. doesn’t have any contributions from the square root of minus one in it).
  2. The number assigned to each state is zero or greater.
  3. The sum of the numbers assigned to each state is one.
  4. The probability of achieving an outcome A or of achieving an outcome B, where A and B are orthogonal states, is the probability of A plus the probability of B.
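
As a rough illustration, the four axioms can be written as checks on a table of numbers. This is only a sketch under my own assumptions: the names `satisfies_axioms` and `prob_of_either` are hypothetical, the states are assumed to be orthogonal already, and a fair dice is used as the example distribution.

```python
from fractions import Fraction

def satisfies_axioms(dist):
    """Check axioms 1-3 for a mapping from (orthogonal) states to numbers."""
    values = list(dist.values())
    # Axiom 1: every number is real (no square root of minus one).
    real = all(not isinstance(v, complex) for v in values)
    # Axiom 2: every number is zero or greater.
    non_negative = real and all(v >= 0 for v in values)
    # Axiom 3: the numbers sum to one.
    normalised = real and sum(values) == 1
    return real and non_negative and normalised

def prob_of_either(dist, states):
    """Axiom 4: for orthogonal states, P(A or B) = P(A) + P(B)."""
    return sum(dist[s] for s in states)

# Example distribution: a fair dice, one sixth per face.
fair_dice = {face: Fraction(1, 6) for face in range(1, 7)}

print(satisfies_axioms(fair_dice))        # True
print(prob_of_either(fair_dice, [1, 2]))  # 1/3
```

Exact fractions are used rather than floating point numbers so that the sum in axiom 3 can be compared with one exactly.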

These axioms lead to an intuitive result. Why? Because they are also satisfied by a frequency distribution. Suppose that we roll the dice a large number of times, and record how many times we get each outcome. At the end of the exercise, we divide each number by the total number of rolls. This gives us a frequency distribution. The frequency distribution ultimately arises from counting things, which is something we naturally do.
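
The counting procedure just described can be sketched in a few lines of Python. The simulated fair dice, the fixed seed and the function name `frequency_distribution` are assumptions made for illustration; a real experiment would replace the random number generator with recorded rolls.

```python
import random
from collections import Counter
from fractions import Fraction

def frequency_distribution(n_rolls, seed=0):
    """Roll a simulated fair dice n_rolls times; return count / total per face."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: Fraction(counts[face], n_rolls) for face in range(1, 7)}

freq = frequency_distribution(10_000)

# Because the numbers come from counting, they are real, non-negative,
# and sum to one, just as the axioms require.
print(all(f >= 0 for f in freq.values()), sum(freq.values()) == 1)
```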

The difference between probability and frequency is that probability is something we compute while frequency is something we measure. Some people, as is well known, confuse the two. They treat a probability distribution as the limit of a frequency distribution after an infinite number of measurements. There are many reasons why this is foolhardy, but to my mind the most important is that by confusing a probability distribution with a frequency distribution, they lose the ability to compute the distribution in advance. Frequentists have, by defining the term probability to be a virtual synonym of frequency, denied themselves the use of their reasoning abilities. We have two tools available to us: reason and experience (alternatively theory and experiment); and frequentists have basically said from the outset that it is impossible to obtain a theoretical knowledge. This is the philosophy of empiricism (or perhaps a particularly hard form of empiricism), and it is refuted by the simple observation that theoretical physicists have correctly calculated things before they were measured by experiment.

Now, if probability is a form of computation, it is a form of reasoning. Every chain of reasoning goes back to premises, axioms and definitions; and ultimately these can’t be proven by reasoning alone. So every probability is conditional upon the assumptions built into the model used for the calculation. Let us say we are tossing a coin. If we make the assumption that the coin is balanced, we will assign a probability of 0.5 to it coming up heads. If it is weighted in a particular way, we will assign a probability of 0.75. Both of these calculations of the probability are perfectly legitimate, even though they give different results. However, once we take a particular coin, it is in practice weighted in a particular way. We can measure how it is weighted, and calculate from this a probability for it coming up heads. We can then test our calculation (and therefore the premises it was built on) by performing the experiment a large number of times, and comparing our prediction against the result. More precisely, we calculate the probability of the measured result occurring under various different premises. Using a theorem, Bayes' theorem, we can then extract the probability for each set of premises for the nature of the coin given the observed set of measurements. Induction thus cannot lead to certainty that one set of premises is correct, but it can allow us to systematically express how certain we should be about them all. This allows us to eliminate numerous possibilities for all practical purposes.
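
To make the last step concrete, here is a minimal sketch of the Bayesian update for the coin. It assumes only two candidate premises (heads probability 0.5 or 0.75, the two values used above), equal prior confidence in each, and a hypothetical record of 70 heads in 100 tosses; the priors and the toss record are illustrative values, not measurements.

```python
from fractions import Fraction
from math import comb

def likelihood(p_heads, heads, tosses):
    """Binomial probability of seeing `heads` heads in `tosses` tosses."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads)**(tosses - heads)

def posterior(priors, heads, tosses):
    """Bayes' theorem: P(premise | data) for each candidate value of P(heads)."""
    joint = {p: prior * likelihood(p, heads, tosses) for p, prior in priors.items()}
    total = sum(joint.values())  # P(data), summed over all premises
    return {p: j / total for p, j in joint.items()}

fair, weighted = Fraction(1, 2), Fraction(3, 4)
priors = {fair: Fraction(1, 2), weighted: Fraction(1, 2)}  # assumed equal priors

# Hypothetical data: 70 heads in 100 tosses.
post = posterior(priors, heads=70, tosses=100)
for p, value in post.items():
    print(f"P(heads) = {float(p)}: posterior {float(value):.6f}")
```

With this record the weighted premise comes out far more credible than the balanced one, but neither posterior is exactly zero or one, which is the sense in which induction yields confidence rather than certainty.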

Probability is thus a numerical measurement of uncertainty parametrised in such a way that it follows the same rules as counting states. And since counting is everybody’s first introduction to numbers and how most people naturally think about numbers, probability is thus a very natural and intuitive way of expressing uncertainty. Anything else would just be weird.

So let us consider the standard quantum experiment, the two slit experiment. We have a source S which emits a beam of particles towards a barrier with two slits, A and B. The particles then hit a screen at position X. Our understanding of the initial conditions and the physics is denoted as p. It doesn’t matter for this example what that physics is. It could be Newton’s physics, Aristotle’s, Einstein’s, or something entirely different; the only stipulation is that it accurately describes the paths of the particles.

Now if the particle passes through slit A, there is a certain probability that it will hit the screen at position X, which we denote as P(X|Ap). This is expressed as a probability rather than a certainty; maybe we are uncertain about the initial conditions or the mass of the particle or the forces acting on it, so we are not certain about the final result X. Equally the probability that the particle will hit X after passing through slit B is P(X|Bp). The probability that it will pass through slit A is P(A|p) and the probability that it will pass through slit B is P(B|p).

From Bayes’ theorem, we write

P(X|Ap)P(A|p) = P(A|Xp)P(X|p)

P(X|Bp)P(B|p) = P(B|Xp)P(X|p).

Add these two equations together, and we get,

P(X|Ap)P(A|p) + P(X|Bp)P(B|p) = P(A|Xp)P(X|p) + P(B|Xp)P(X|p).

We now use the definition of probability to write that,

P(A|Xp) + P(B|Xp) = P(A ∪ B|Xp),

where P(A ∪ B|Xp) denotes the probability that it passes through either slit A or slit B if it hits the screen at X given the initial conditions and physics denoted by p. But it must pass through either slit A or slit B, so this probability is just 1.

The end result of this is

P(X|p) = P(X|Ap)P(A|p) + P(X|Bp)P(B|p),    (1)

which basically states that the probability of the particle hitting the screen at X is the sum of the probability of it hitting the screen at that location after passing through slit A and the probability of it hitting the screen at that location after passing through slit B.

Probabilities can be used to predict the frequency distribution. So suppose that we are firing 1000 tennis balls out of our source. Our screen consists of a series of buckets, and P(X|p) represents the probability that the ball goes into a particular bucket. Suppose that our theory predicts that half of them pass through slit A and half of them pass through slit B. We confirm these predictions by performing the experiment and counting how many balls pass through each slit. It comes close enough to a 50-50 split that we believe our theory. Suppose further that the theory predicts that 4 out of every 500 balls that pass through slit A hit the screen at position X, and 10 out of every 500 balls that pass through slit B hit the screen at position X. We confirm this by covering up slit A, firing 1000 balls and counting how many hit the screen at position X, and it comes close enough to the predicted number that we accept the theory. We cover up slit B, do the same thing, and again everything looks good.

Now we are ready to perform the actual experiment. Our theory predicts that on average we should expect to get 14 balls hitting X. We leave both slits uncovered, and get ready to do the experiment. Obviously, we don’t have that many samples (1000 isn’t a big enough number for probabilities and frequencies to converge), so we wouldn’t expect to get the predicted result if we just do the experiment once. So we will set up to perform the experiment several thousand times and take the average. That average should be close enough to the expected result of 14 that we confirm the theory.
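
The figure of 14 follows directly from the sum rule derived above and the numbers in the tennis ball example. A short sketch of the arithmetic (the variable and function names are my own):

```python
from fractions import Fraction

# Numbers taken from the tennis ball example in the text.
p_A = Fraction(1, 2)              # P(A|p): half the balls go through slit A
p_B = Fraction(1, 2)              # P(B|p): half go through slit B
p_X_given_A = Fraction(4, 500)    # P(X|Ap): 4 of every 500 through A hit X
p_X_given_B = Fraction(10, 500)   # P(X|Bp): 10 of every 500 through B hit X

def total_probability(p_x_a, p_a, p_x_b, p_b):
    """P(X|p) = P(X|Ap)P(A|p) + P(X|Bp)P(B|p)."""
    return p_x_a * p_a + p_x_b * p_b

p_X = total_probability(p_X_given_A, p_A, p_X_given_B, p_B)
expected_hits = p_X * 1000        # balls fired per run

print(p_X, expected_hits)         # 7/500 14
```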

And if we performed this experiment with tennis balls, we would get a result near enough to 14 that we believe the theory.

But now let us replace the tennis balls with electrons. Again, we predict and confirm that half the electrons go through slit A and half through slit B; cover up slit B, and 8 out of every thousand electrons that pass through slit A (that is, 4 out of 500) go into X; and so on. The setup works out in exactly the same way as with the tennis balls.

So, when we perform the full experiment, we confidently expect to get the result 14.

Except we don’t. We find that the answer is 6.

Now this is weird. 4 fewer electrons hit X when both slits are open than when slit B alone was open. It just doesn’t make sense to our intuition.

Could it be that there is some difference in the physics that describes electrons and the physics that describes tennis balls? Some difference hidden within the condition p that makes all the difference? Certainly the physics describing electrons is very different from the physics describing tennis balls. But that is completely irrelevant. We didn’t need to know what p was in the algebra above; and given that we have individually confirmed each of the numbers on the right hand side of equation (1) (to a good enough accuracy) no matter what p is, as long as P(A|p) = ½ and so on, our prediction for the fraction of electrons hitting the detector at X, P(X|p) cannot be affected. That number just comes from the mathematical laws of probability.
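
The clash can be made explicit. Both terms on the right hand side of equation (1) are non-negative (the second axiom), so, as a short worked check using the numbers above, the classical prediction can never fall below either single-slit contribution:

```latex
\begin{align*}
P(X|p) &= P(X|Ap)\,P(A|p) + P(X|Bp)\,P(B|p) \\
       &\ge P(X|Bp)\,P(B|p) = \frac{10}{500}\cdot\frac{1}{2} = \frac{10}{1000}.
\end{align*}
```

The measured frequency with both slits open is about 6 in 1000, which is below this bound. No choice of p that is consistent with the single-slit measurements can close that gap.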

So what we have to conclude is that standard probability theory is wrong, or at least that it doesn’t apply in quantum physics. At least one of the assumptions of probability theory listed above is invalid. And this is why quantum physics is weird.

Quantum physics is still a mathematical representation of reality. We still want to represent our uncertainty for each possible outcome with a number. But that number is not a probability. It violates the axioms of probability. It is therefore not proportional to an expected frequency after a huge number of measurements.

Our intuition concerning uncertainty is based on counting discrete objects. We can’t really imagine anything else. We can’t imagine quantum physics. But we can still understand it intellectually, in the same way as always: we list premises and work through to conclusions.

So which of the axioms of probability listed above are violated in quantum theory? I’ll discuss that next time.
