One of the fundamental challenges in the philosophy of science is the problem of induction. (Philosophical) induction is the process of trying to uncover premises from conclusions. Deduction is the opposite, reasoning from premise to conclusion. Deduction is secure; we can, with correct logic, always say that if the premise is true, then so is the conclusion. But the problem is that the opposite is not always the case: contradictory premises can lead to the same conclusion. Thus, even if the conclusion of the scientific argument is correct, one cannot be certain that the induced premises are right. Some other set of ideas could lead to the same predictions.
One of the goals of science is to come to an understanding of how the universe evolves. The question of why is ultimately one of philosophy, although scientists can guide and give hints to the philosophers. In short, we want a theory that can make accurate predictions of how physical objects will behave in certain circumstances. The primary tools available are abstraction and experiment. We first of all abstract physical qualities, and present them in a numerical form. We then make measurements under controlled circumstances, and measure how those numbers change as the circumstances change. Obviously, this means that science is restricted to what can be abstracted numerically; but as Pythagoras, Plato and Bradwardine originally hoped, and we can now confirm, that includes a great deal of what we seek to understand. Not everything, but a large proportion of it.
The early scientists worked by making careful measurements, plotting them on a graph, and finding a mathematical function which fit the data. We can think of things such as Boyle's law of gases, Snell's law of refraction, Kepler's laws of the solar system, Hooke's law of extension under tension, and so on. These mathematical functions are then used to predict how similar materials will behave in other circumstances than in the laboratory. This process is, of course, pure induction. It is a little complicated because the experiments don't have perfect precision, and therefore neither do the formulae used to fit to them, so there is always some uncertainty in the final result and thus any predictions. However, it is known how to deal with this uncertainty; instead of a definite result, we express it as being within certain bounds, and use the mathematics of probability to deal with the consequences. This is important, but it doesn't undermine the overall approach.
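As a rough illustration of this kind of fit, here is a minimal sketch in Python. The readings are invented for the example (they are not real measurements), and `fit_line` is just a name chosen here for an ordinary least-squares straight-line fit of the sort one would use for Hooke's law. It returns the slope together with an estimate of its uncertainty, so the "law" comes with bounds rather than as a single definite number.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Scatter of the data about the fitted line (n - 2 degrees of freedom)
    # gives a standard error on the slope.
    residual_ss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    slope_err = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, slope_err

# Hypothetical readings: load in newtons, extension in millimetres.
loads = [1.0, 2.0, 3.0, 4.0, 5.0]
extensions = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, slope_err = fit_line(loads, extensions)
print(f"extension per newton = {slope:.2f} +/- {slope_err:.2f} mm")
```

Any prediction for a load that was never measured inherits that uncertainty, and, more importantly for what follows, it also inherits the assumption that a straight line was the right function to fit.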
But now we come to the problem. How do we know that we fit to the right mathematical function? All we know is that the function runs through the points we measured; the true curve need not be the simplest one that does so. Interpolation (guessing where the line runs between the measurements) is usually reasonably safe, although not always. Extrapolation (guessing the course of the line beyond the points), on the other hand, is occasionally OK, but far more frequently very dangerous. Induction of this sort, without a prior theory to guide the extrapolation (which, of course, if we have no other principles than induction from experiment to guide us, we don't have), should never be relied on. There is just too high a risk of missing something important by selecting too simple a curve to explain the data. If the data looks boring, it might be that nature is boring, but more likely it is because we have missed something important.
Consider, for example, the plot above. This is a resonance plot, common in particle physics. The experimental set-up is that a pion and a proton are shot at each other with various different centre of mass energies. A detector is set up pointing at the point of collision, but with the line connecting it to the collision at a large angle from the directions of the beam. So if the particles pass by each other without interacting, then nothing will hit the detector. But they might "collide" with each other; and if they do they can scatter at various angles, and some of these angles will be just right to hit the detector. Of course, the theory that describes this scattering is relativistic (Dirac) quantum mechanics rather than a Newtonian picture, so we can expect things to be a little bit weird when we go down to the details, but the underlying idea is similar.
The cross-section, on the y-axis, counts how many particles hit the detector. The x-axis gives the centre of mass energy of the incoming beam. The black dots are the actual experimental measurements that were made in this real life example. But, since I am using this as an example of the perils of induction, suppose that the experimenters were a little less meticulous and only measured at the energies where I have super-imposed the squares over the image.
The shape of the curve is of interest because peaks in the height of the curve occur at just the right energies for a new particle to be created. The energy of the incoming beam corresponds to the mass of the new particle, in this case a Delta baryon. Everywhere else, the pion and proton just bounce off each other, similar to what we would expect in classical physics. Occasionally they are deflected from their original path, and will hit the detector with a certain frequency. That frequency has a small dependence on the energy. But when they have just the right energy (or near enough), something different can happen. The pion and proton combine to form a Delta baryon. Because all the energy goes into the Delta mass (and the total momentum is zero), the new particle will be more or less stationary. Then, a short time later, the Delta decays back into a pion and proton. These can emerge from it in any direction with equal likelihood, and therefore more particles will hit the detector than in the case of the standard collision. Thus we get a peak in the cross-section at the energy which corresponds to the mass of the Delta.
Obviously the experimenters here were very meticulous and made a lot of measurements. But suppose instead that they had only made measurements corresponding to the blue squares, and had no theory to guide them. What would they conclude, if they used the standard methods of induction? The curve up to that point looks very smooth. They would fit it with a quadratic equation or something similar, and come up with a wonderful "law of scattering" which completely misses all of the exciting stuff. They would conclude that there is no evidence for creation of a Delta baryon. And that is the problem with extrapolation. Data that look simple within the range of measurement need not be simple outside it.
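To make the danger concrete, here is a toy sketch in Python. It is not the real pion-proton data: `toy_cross_section` is an invented curve (a gentle background plus a single Breit-Wigner-shaped bump placed at 1.232, with a made-up width and height), and the three "measurements" are taken only at energies well above the bump, in the spirit of the squares described above. A quadratic is then passed exactly through those three points and extrapolated back to the bump.

```python
def toy_cross_section(energy, peak_energy=1.232, width=0.12, peak_height=200.0):
    """Smooth background plus a Breit-Wigner-shaped bump (illustrative numbers only)."""
    background = 20.0 + 5.0 * energy
    half_width_sq = (width / 2.0) ** 2
    bump = peak_height * half_width_sq / ((energy - peak_energy) ** 2 + half_width_sq)
    return background + bump

def quadratic_through(points):
    """Return the unique quadratic passing through three (x, y) points (Lagrange form)."""
    (x0, y0), (x1, y1), (x2, y2) = points

    def curve(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))

    return curve

# "Measure" only at three energies far above the bump.
measured = [(e, toy_cross_section(e)) for e in (1.7, 1.85, 2.0)]
law_of_scattering = quadratic_through(measured)

predicted = law_of_scattering(1.232)
actual = toy_cross_section(1.232)
print(f"extrapolated: {predicted:.1f}, actual: {actual:.1f}")
```

The fitted curve reproduces every point that was measured, yet its value at the bump falls far short of the toy curve's true value there. Nothing in the three measurements warns that this is happening.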
Even interpolation doesn't always help. Suppose that they decide to take a few more measurements at lower energies. Most of the time they will hit the peak structure, which might give them a hint that something strange is happening. But they could be unlucky, and measure at the energies corresponding to the red squares. Again they will come to completely the wrong conclusion.
Obviously in real life the experimenters took enough measurements to map out the real shape of the curve. But, even so, there is always a small distance between the data points; one can't measure at every possible energy. How can they know that they haven't missed exciting variations from the simple curve between those measurements?
And this is the problem of induction, and why experiment alone is never enough. If you see nothing exciting, you can never be sure that you haven't missed out on something crucial between the gaps of your measurements. You can measure more, but it will never be enough to provide the needed surety.
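The same point can be sketched numerically. In the toy Python example below (invented numbers again, with `narrow_bump` and `largest_reading` as names chosen only for this illustration), a bump of height 100 sits on a flat background of 10. However fine the measurement grid is made, a bump that is narrow enough and sits midway between two grid points leaves the readings looking almost perfectly flat.

```python
def narrow_bump(energy, peak_energy, width, height=100.0):
    """Breit-Wigner-shaped bump on a flat background of 10 (illustrative numbers only)."""
    half_width_sq = (width / 2.0) ** 2
    return 10.0 + height * half_width_sq / ((energy - peak_energy) ** 2 + half_width_sq)

def largest_reading(spacing, peak_energy, width, start=1.0, stop=2.0):
    """Sample on an evenly spaced grid and return the largest value seen."""
    steps = int(round((stop - start) / spacing))
    energies = [start + i * spacing for i in range(steps + 1)]
    return max(narrow_bump(e, peak_energy, width) for e in energies)

for spacing in (0.1, 0.01, 0.001):
    # Hide a bump midway between two grid points, a hundred times narrower than the gap.
    hidden_peak = 1.5 + spacing / 2.0
    seen = largest_reading(spacing, hidden_peak, width=spacing / 100.0)
    print(f"spacing {spacing}: largest reading {seen:.3f} (true peak height 110)")
```

Making the grid a hundred times finer does not help if the structure being missed is narrower still, which is the sense in which more measurement alone never quite closes the gap.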
Of course, the situation is improved by the concept of falsification. While you can never be sure that your proposed answer is right, you can at least be certain that a large number of possible answers are wrong. There is a process of elimination. And that is good, as far as it goes. But it is still not enough to make us certain.
Even better is the method of prediction: you make predictions from your model outside the range of your data, or filling in the gaps of your data. If the predictions are wrong, you know your model is wrong. If they are right, then you can be more confident. The important point with prediction is that there is no possible way in which you can use the new data to help shape your model; with old data there is always the possibility to jig the model until it fits the data. With prediction, you at least get your theory out in the open, and will either look a fool or a genius depending on the result of the next experiment. But even this method can't give us certainty. See the red squares above.
So induction is good, but not good enough. But fortunately, physicists today don't need to rely on induction alone, because we have something else to guide us. We have advanced enough to, in part, use deduction.
Before going further, I need to discuss the means by which physicists abstract from the real world to their representation of it. I think the best way of thinking about it is in terms of the mathematical process of mapping. A map is just a connection between two lists of data. For example, we can have a map between fruits and colours; on one hand we have a list of fruit and on the other a list of colours, and the mapping connects each fruit in the list with a colour. Some mappings are many to one, others one to one. A one to one mapping is always reversible. One can also map from a subset of set A to the entirety of set B. In this case, every item in set B can be used to directly represent something in set A, though some aspects of A are not covered by the representation.
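For readers who think more easily in code, the fruit example can be sketched with Python dictionaries. The particular fruits and the helper `invert` are made up for this illustration; the point is only that a one to one mapping can be run backwards, while a many to one mapping cannot.

```python
def invert(mapping):
    """Reverse a mapping, or return None if it is many-to-one and so not reversible."""
    reversed_mapping = {}
    for key, value in mapping.items():
        if value in reversed_mapping:
            return None  # two keys share this value: no unique way back
        reversed_mapping[value] = key
    return reversed_mapping

# Many to one: two fruits map to the same colour.
fruit_to_colour = {"banana": "yellow", "lemon": "yellow", "cherry": "red"}

# One to one: every fruit has its own colour.
distinct_fruit_to_colour = {"banana": "yellow", "lime": "green", "cherry": "red"}

print(invert(fruit_to_colour))           # None
print(invert(distinct_fruit_to_colour))  # {'yellow': 'banana', 'green': 'lime', 'red': 'cherry'}
```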
For example, in physics, we create a mapping between physical locations and moments in time and points in a geometrical space. To be successful, this assumes certain things about physical space/time, but the success of physical theory suggests that the assumption is not unreasonable. We could just work with the geometrical representation, but it is easier to perform a second mapping between the geometrical space and an algebraic space. It is easier to perform algebraic calculations than to work in pre-Cartesian geometry. This second mapping assigns a set of numbers to every point in the geometrical space, and consequently these numbers also represent physical space time. However, there is no unique way of performing this second mapping. We have to create a rule relating each point in space/time with a set of numbers. This rule is called a coordinate system; it is an arbitrary choice we have to make in the construction.
We don't only map points in space time to an algebraic representation; we can identify the states or potentia of physical particles, and create an algebraic representation of those states and the properties of particles inhabiting those states. With physical particles existing in space/time, there is clearly going to be a relationship between the algebraic representation of the particles and the algebraic representation of space time.
Once we have the algebraic representation, we can then crank the handle, do a bit of mathematics, and calculate other qualities of the particles. This calculation depends on certain laws and rules, which we call the laws of physics. These laws connect one point in the representation to another. Maybe we calculate where the particle will be located at some future time, or how it might interact with other particles, or some of its observable properties, and so on. This is still all done in the algebraic representation of reality. But because there is a one to one mapping between the algebraic representation and the physical reality, we can use the same mapping that got us from reality to the premises of our calculation, to return from the calculation's conclusions in the representation back to physical reality. Every point in the abstract algebraic space corresponds to something in the real world. That then enables us to compare against experiment.
The coordinate system is something we introduced as part of the abstraction. It is not part of the real world. Therefore our final results when we go back to the real world cannot depend in any way on our choice of coordinate system. This limits what the laws of physics could be.
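A small sketch may help here. Below, two different labelling rules are applied to the same two points on a plane: an (x, y) grid, and a (distance, angle) rule. The points and function names are invented for the example. The labels differ, but a quantity one could actually measure with a ruler, the separation of the two points, comes out the same whichever set of labels is used.

```python
import math

def cartesian_label(point):
    """First rule: label a point by its (x, y) grid position."""
    return point

def polar_label(point):
    """Second rule: label the same point by (distance from origin, angle)."""
    x, y = point
    return (math.hypot(x, y), math.atan2(y, x))

def separation_from_cartesian(a, b):
    """Pythagoras, using only the (x, y) labels."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def separation_from_polar(a, b):
    """Law of cosines, using only the (distance, angle) labels."""
    (r1, theta1), (r2, theta2) = a, b
    return math.sqrt(r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * math.cos(theta1 - theta2))

# Two hypothetical physical points.
p, q = (1.0, 2.0), (4.0, 6.0)

print(cartesian_label(p), polar_label(p))  # different numbers for the same point
print(separation_from_cartesian(cartesian_label(p), cartesian_label(q)))
print(separation_from_polar(polar_label(p), polar_label(q)))
```

Note that the formula for the separation looks different under the two rules even though the answer agrees; whether the equations themselves must also keep the same form is a separate question.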
Now there are two ways in which physical results could be coordinate independent:
- The fundamental equations of the laws of physics appear different in different coordinate systems. This gives a weak condition on the laws.
- The fundamental equations representing the laws of physics are the same whatever coordinate system we choose. This gives a strong condition on what the laws could be.
If we take the second option, that doesn't mean that everything will be the same in every coordinate system. The locations of particles, the numbers we give to distinguish their different states or potentia, for example, would still differ from one coordinate system to another. What it would mean would be that (for example) whatever equivalent we have for Newton's laws of motion would be the same no matter which coordinate system we choose.
For example, Newton's laws of motion are the same for coordinate systems which are distinguished by a translation. Suppose I choose one coordinate system, and my friend another which only differs from mine in that his numbers are always five greater than mine. If Newton's laws of motion describe physics as I parametrise it, then they also describe physics as my friend observes it. However, there are other possible ways in which the two coordinate systems can differ where Newton's laws are not invariant, for example if my friend associates each point with a number that is the cube of the number I use. Newton's second law cannot then simultaneously be the right law of physics in both my and my friend's coordinate systems. However, there are other possibilities, such as Einstein's theory of general relativity, whose key equations are independent of the choice of coordinate systems, and can cope with weird things such as accelerating reference frames or cubed coordinates.
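Both cases can be checked with a line of calculus each. The symbols are introduced here purely for illustration: x is my coordinate, x' is my friend's, dots are time derivatives, and the particle is taken to be free, so that Newton's law in my coordinates reads m ẍ = 0.

```latex
% Translation: my friend's numbers are five greater than mine.
x' = x + 5 \quad\Longrightarrow\quad \ddot{x}' = \ddot{x}
\quad\Longrightarrow\quad m\ddot{x} = 0 \;\text{ implies }\; m\ddot{x}' = 0 .

% Cubing: my friend's numbers are the cube of mine.
x' = x^{3} \quad\Longrightarrow\quad \dot{x}' = 3x^{2}\dot{x}, \qquad
\ddot{x}' = 3x^{2}\ddot{x} + 6x\dot{x}^{2} .

% For the same free particle (\ddot{x} = 0) this leaves
\ddot{x}' = 6x\dot{x}^{2} = \frac{2\,\dot{x}'^{2}}{3\,x'} \neq 0 .
```

So in the cubed coordinates a particle with no force acting on it still has a non-zero coordinate acceleration, and an equation of the Newtonian form cannot hold unchanged in both systems.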
Of course, Newton's laws are a bit passé amongst physicists. Today, we tend to think of things in terms of principles of least action. The action is a quantity that is a function of the positions and velocities of all the particles in the universe, at every point in space and every moment in time. If you move a particle from one location to another, then the action will take a different value. The route by which particles travel through the universe also affects the value of the action. The principle of least action, used in pre-quantum physics, states that the route that the particles take is the one that minimises the action. In general the value of the action will also depend on which coordinate system we choose.
Now we haven't yet specified what the mathematical form of the action is, but that's OK, because we are still talking about general principles rather than going into specifics. There is one mathematical form of the action which leads to a set of physical laws identical to Newton's laws of motion; take another mathematical form, and minimising the action will give you Einstein's equivalents. Basically, whatever classical (deterministic) equations of physics you can conceive of, there is some action whose minimisation within a given coordinate system will be equivalent to those laws.
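To give a feel for what "minimising the action" means, here is a minimal numerical sketch in Python. It assumes the standard Newtonian form of the action for a single particle thrown straight up in uniform gravity (kinetic energy minus potential energy, summed over time), chopped into small time steps. The function names, the step count and the sine-shaped "wobble" added to the path are all choices made for this illustration.

```python
import math

def action(path, dt, mass=1.0, g=9.8):
    """Discretised action: (kinetic - potential) energy times dt, summed over each step."""
    total = 0.0
    for x_now, x_next in zip(path, path[1:]):
        velocity = (x_next - x_now) / dt
        height = 0.5 * (x_now + x_next)
        total += (0.5 * mass * velocity ** 2 - mass * g * height) * dt
    return total

def trial_path(wobble, steps=200, duration=2.0, g=9.8):
    """Newtonian parabola from height 0 back to height 0, plus a sine-shaped distortion."""
    dt = duration / steps
    path = []
    for i in range(steps + 1):
        t = i * dt
        newtonian = 0.5 * g * t * (duration - t)
        path.append(newtonian + wobble * math.sin(math.pi * t / duration))
    return path, dt

for wobble in (-0.5, -0.1, 0.0, 0.1, 0.5):
    path, dt = trial_path(wobble)
    print(f"wobble {wobble:+.1f}: action = {action(path, dt):.4f}")
```

Every trial path starts and ends at the same place and time. Among them, the undistorted Newtonian parabola (wobble 0) gives the smallest action, and the action grows as the path is bent further away from it in either direction.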
The action also plays a role in quantum physics. Classical physics is deterministic: each particle can only take one route from A to B. In quantum physics, on the other hand, the particle can take any of a large number of paths to get from one place to another. We don't know which path it is going to take, so when considering how likely it is that a particle starting at A will finish at B we need to consider every possible path. However, the routes are not all of equal likelihood. What I call the likelihood (and everyone else the amplitude) for each path depends on a mathematical function, which is related to the action (the weight is the complex exponential of the action; in quantum physics this serves as the definition of the action). So in quantum physics the action you get when you suppose that a particle takes one particular path from A to B is related to the likelihood that the particle will take that path. It is most probable (each path has an equal magnitude of its likelihood but not an equal probability) that the particle will take a path close to the minimum of the action, which is why classical physics is such a good approximation to the real thing. But once we get down to a small enough length scale, where quantum effects dominate, that is no longer the case.
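In symbols, the relationship described above is usually written in something like the following form. This is the standard textbook (Feynman) way of putting it rather than anything specific to this post; S is the action, ħ is the reduced Planck constant, and normalisation factors and the precise meaning of the sum over paths are glossed over.

```latex
% Likelihood (amplitude) for a particle to travel from A to B: a sum over every path.
\mathcal{A}(A \to B) = \sum_{\text{paths}} e^{\,i S[\text{path}]/\hbar},
\qquad \left| e^{\,i S[\text{path}]/\hbar} \right| = 1 ,
\qquad P(A \to B) \propto \left| \mathcal{A}(A \to B) \right|^{2} .
```

Each path contributes a weight of the same magnitude, and only the phase differs from path to path. Near the minimum of the action, neighbouring paths have almost the same phase and so reinforce each other, while elsewhere the phases vary rapidly and largely cancel.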
And now we come, at last, to symmetry.
Take a piece of paper, and draw grid lines on it representing Cartesian coordinates. Place a square tin on the paper, and draw a line around its base. Now rotate the tin around its centre through any angle that isn't a right angle or a multiple of right angles. Draw another line around its base. The two lines will not coincide.
Now repeat the process with a cylindrical tin. This time, as long as you find the exact centre of the tin, you will find that the line you draw around its circumference will not change after the rotation.
That's all fairly obvious and hardly seems revolutionary, but mathematicians have taken this idea, and used the word symmetry to describe it. Something has a symmetry if, after a particular non-trivial transformation such as a rotation, it remains unchanged. The cylinder has a circular symmetry. The square doesn't, but is only symmetric for rotations through right angles.
Instead of rotating the cylinder, we can get the same effect by rotating the paper in the opposite direction. Whether we rotate the paper or the square tin, we will still get the same pattern of lines drawn on the paper at the end of the day. The paper has grid-lines representing our coordinate system. So a rotation of the paper represents a change of the coordinate system. So we can equally say that the line representing the circumference of the cylinder is unchanged under a rotation of the coordinate system. So something has a symmetry if its mathematical form is unchanged after a transformation of the coordinate system.
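The tin experiment is easy to mimic in a few lines of Python. In this sketch (the names and sample points are invented for the illustration), each outline is described by a rule that says whether a point lies on it. A handful of points on each outline are rotated about the centre, and we check whether they still satisfy the original rule.

```python
import math

def rotate(point, degrees):
    """Rotate a point anticlockwise about the origin."""
    angle = math.radians(degrees)
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

def on_square(point):
    """Outline of a square tin of side 2 centred on the origin."""
    x, y = point
    return abs(max(abs(x), abs(y)) - 1.0) < 1e-9

def on_circle(point):
    """Outline of a round tin of radius 1 centred on the origin."""
    x, y = point
    return abs(math.hypot(x, y) - 1.0) < 1e-9

def unchanged_by_rotation(outline_points, on_outline, degrees):
    """True if every rotated sample point still lies on the original outline."""
    return all(on_outline(rotate(p, degrees)) for p in outline_points)

square_points = [(1.0, 1.0), (1.0, 0.3), (-0.5, 1.0), (-1.0, -1.0), (0.2, -1.0)]
circle_points = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
                 for a in range(0, 360, 15)]

for degrees in (30, 90, 180):
    print(degrees,
          unchanged_by_rotation(square_points, on_square, degrees),
          unchanged_by_rotation(circle_points, on_circle, degrees))
```

The circle passes for every angle tried, while the square passes only for the right-angle rotations. Rotating the points one way is equivalent to rotating the grid the other way, so the same check can be read as a statement about a change of coordinate system.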
Now in physics, the coordinate system is something we choose when we construct the mapping from the physical system to the algebraic representation. Which coordinate system we select is entirely arbitrary; we only have to be consistent throughout the calculation.
But that isn't much use. Suppose that I select a coordinate system, X. And my friend Bob selects a different coordinate system, Y. If we are to communicate with each other, we need some means of converting from one coordinate system to another. That is always possible through some mathematical transformation (barring certain assumptions related to the topology and dimensions of the space; if we are both correctly mapping from reality these caveats aren't an issue). This transformation could be a rotation, as in my example with the cylinder, or something more complicated.
But Bob and I might find that some of our equations have the same mathematical form despite our different coordinate systems. Something unchanged under a change in coordinate systems is a symmetry, so in this sense we can say that this part of the laws of physics is symmetric.
Now the most important thing in physics is the action, which as I said is related to the likelihood that a particle will take a particular path from A to B. Now the likelihood is related to the probability, and the probability is a count of how many particles arrive at B. This is something we measure in the real world; it transfers directly from our abstraction to reality. But if the likelihood is something we can (indirectly) measure, then it can't depend on our choice of coordinate system. Which means that the action also can't depend on our choice of coordinate system. Which means that the action must be symmetric under a wide range of coordinate transformations. And there are very few specific mathematical forms for the action which satisfy these criteria. Thus we know that the true laws of physics must be one of those handful of options. And the only experiment I needed to perform to reach this conclusion was the one that told me that the world is quantum rather than classical, and uncertainty parametrised by likelihoods rather than probability.
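Here is a toy sketch of how a symmetry requirement weeds out candidate actions. It uses only rotations of a two-dimensional grid, which is a far smaller set of transformations than the full argument needs, and both candidate "actions" are made up for the illustration: one depends only on the length of each step of the path, the other singles out the x direction.

```python
import math

def rotate(point, degrees):
    """Re-label a point using a grid rotated about the origin."""
    angle = math.radians(degrees)
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

def symmetric_action(path, dt, mass=1.0):
    """Free-particle form: depends only on the length of each step."""
    return sum(0.5 * mass * ((x2 - x1) ** 2 + (y2 - y1) ** 2) / dt
               for (x1, y1), (x2, y2) in zip(path, path[1:]))

def lopsided_action(path, dt, mass=1.0):
    """A made-up candidate that only cares about motion along x."""
    return sum(0.5 * mass * (x2 - x1) ** 2 / dt
               for (x1, _), (x2, _) in zip(path, path[1:]))

# A hypothetical path, described first in my coordinates and then in rotated ones.
path = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (3.0, 3.0)]
rotated_path = [rotate(p, 40) for p in path]
dt = 0.1

for name, candidate in (("symmetric", symmetric_action), ("lopsided", lopsided_action)):
    print(name, candidate(path, dt), candidate(rotated_path, dt))
```

The first candidate gives the same value in both coordinate systems (up to rounding) and so survives; the second gives different values for the same physical path and is ruled out without any further experiment.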
Now obviously I have simplified this argument for this post, and the full story, once we go into the mathematical details, has a few more complications than I have implied. In particular, likelihood is not the same as the probability we measure, and there are transformations of the likelihood which leave probability untouched; which means that the result is a tiny bit, but only a tiny bit, weaker than I implied. Additionally, we don't just have to consider the likelihood we assign to each path, but the measure of the integral we use to sum over those likelihoods. I also haven't discussed the most important types of coordinate transformation (gauge transformations), nor have I mentioned the distinction between local and global transformations, which is important and makes it harder to rule out option (1) of a coordinate independent physics. But these complications can be overcome. The basic principle remains: the action is one of the handful of options which satisfy the symmetries.
And once we know what the action is, we have an almost complete theory of physics. Everything else is just working out the consequences implied by that action, and checking them against experiment.
I have made certain assumptions along the way (such as that the laws of physics are quantum rather than classical, the assumptions behind the argument that the action must be symmetrical as I have described, and that at least part of reality can be represented algebraically in the manner described), and this procedure is not completely limiting. For example, it can't tell us how many types of fundamental particle there are, or how strongly they interact with each other. These things we have to measure. But what this tells us is that if these assumptions are true, then the laws of physics must be one of a very limited number of options.
In summary, our knowledge of physics now, not entirely but to a large extent, arises from arguing from a set of premises which almost seem self evident (or follow directly from qualitative observation) to conclusion. In short, theoretical physics has advanced sufficiently that deduction rather than induction plays the leading role. And this is all because of our understanding of the importance of symmetry.
I have only scratched the surface of the role that symmetry plays in modern physics; constraining the possible actions (and Lagrangians) is just one of its many roles. In both general relativity and quantum field theory, almost everything boils down to some aspect of the theory of symmetry or another. But I want to stop here, because this is the particular aspect of symmetry I will need for subsequent posts.