Graphics Beyond Three Dimensions

We are all accustomed to making two-dimensional scatter plots by hand on graph paper. Modern software can also draw three-dimensional plots, but I have never been very successful at interpreting them. Even rotating a three-dimensional plot on a two-dimensional screen does not help me much. Luckily, that hardly matters, because there are ways to depict data with far more than three dimensions, and far more clearly, on a screen or on paper.

I am thinking of 'draftsman's plots' or, as they are sometimes called, 'scatter plot matrices'. These represent data with more than two dimensions as a series of two-dimensional graphs. Some information is lost when the other dimensions drop out of each panel but, usually, we gain far more by plotting four, five or six variables simultaneously.

Here is a simple example. The data consist of heart disease, pulmonary disease and cancer rates in Western and Southern states. Three two-dimensional scatter plots are arranged in a triangle so that we can see the relationships among all pairs of diseases at once. In the upper left plot, heart disease rates run along the X-axis and cancer rates along the Y-axis. In the lower left plot, heart disease stays on the X-axis while the Y-axis becomes pulmonary disease. Finally, in the lower right plot, cancer rates run along the X-axis and pulmonary disease along the Y-axis. The overall trend matters more here than the detail. For example, we can see that heart disease and cancer rates are strongly correlated and rise together. The correlations between pulmonary disease and the other two are also evident but much weaker.
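
The triangular arrangement follows a simple indexing rule. The Python sketch below assumes the variables are listed in the order heart disease, cancer, pulmonary disease, and returns each panel of the lower triangle with its position and axes; the function name triangle_panels is hypothetical, and drawing is left to whatever plotting package you use (pandas, for instance, offers pandas.plotting.scatter_matrix for the full square version).

```python
def triangle_panels(variables):
    """Return (row, col, x_var, y_var) for each panel of a lower-triangular
    scatter plot matrix. Row 0 is the top row; col 0 is the leftmost column.
    (Hypothetical helper for illustration.)"""
    panels = []
    # With k variables there are k - 1 rows; row r holds r + 1 panels.
    for row in range(len(variables) - 1):
        for col in range(row + 1):
            # Every panel in a column shares its X variable,
            # and every panel in a row shares its Y variable.
            panels.append((row, col, variables[col], variables[row + 1]))
    return panels


for panel in triangle_panels(["heart disease", "cancer", "pulmonary disease"]):
    print(panel)
# (0, 0, 'heart disease', 'cancer')
# (1, 0, 'heart disease', 'pulmonary disease')
# (1, 1, 'cancer', 'pulmonary disease')
```

With k variables this produces k(k - 1)/2 panels, one per pair, which is why the layout stays readable for five or six variables where a single 3-D plot would not.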

In the plots above, the states are drawn with different symbols, and those symbols carry information too. In this case, the Western states appear to have noticeably lower heart disease and cancer rates than the Southern states, while their pulmonary disease rates do not look very different.
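
The symbols let the eye compare groups; a quick numeric check of the same impression is to average each variable within each group. This is a minimal sketch with hypothetical names (group_means, a "region" field in each record); it confirms a difference in level but is not a significance test.

```python
from statistics import mean


def group_means(rows, group_key, variables):
    """Average each named variable within each group.

    rows: list of dicts, one per case (e.g. one per state).
    group_key: field holding the group label (hypothetical, e.g. "region").
    """
    groups = {}
    for row in rows:
        # Collect the cases that share a group label.
        groups.setdefault(row[group_key], []).append(row)
    return {
        group: {var: mean(r[var] for r in members) for var in variables}
        for group, members in groups.items()
    }
```

Comparing the per-group averages for each disease gives a rough numeric counterpart to scanning the two kinds of symbols in each panel.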

Modern statistical software can usually make such plots with many more than three variables and more than two groups of points. From such a plot, we can tell which variables are related and which are not.
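
A scatter plot matrix is read by eye, but the same pairwise question (which variables move together?) can be summarized numerically. A minimal sketch, assuming each variable is a plain list of numbers with no missing values; the names pearson and correlation_table are hypothetical. Pearson's r measures only linear association, so the plot remains the better check for curved or outlier-driven patterns.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def correlation_table(columns):
    """Correlation for every pair of named columns, one entry per panel pair."""
    names = list(columns)
    return {
        (names[i], names[j]): pearson(columns[names[i]], columns[names[j]])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
```

On Python 3.10 and later, statistics.correlation computes the same coefficient; the hand-written version is kept here so the arithmetic is visible.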

Such information is useful, but scatter plot matrices can also point out remarkable individual cases. For example, consider the point in the lower left-hand corner of every plot in the disease data matrix. It always corresponds to the same state. You can tell because the upper left and lower left plots share the same X-axis (heart disease), so the leftmost point in each must be the state with the lowest heart disease rate. Similarly, the lower left and lower right plots share the same Y-axis (pulmonary disease), so the bottom point in each must be the state with the lowest pulmonary disease rate. Since the corner point of the lower left plot is both the leftmost and the lowest there, all three corner points belong to one state. I thought it would be interesting to find out which state was so healthy. The answer: Alaska. Either our friends in the North are the strong, hearty type or they do not live long enough to get conventional diseases.
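
The cross-panel reasoning works because each case contributes exactly one value per variable. A point sitting in the lower left corner of every panel is simply the case that is smallest on every variable, which can be checked directly. This sketch assumes the data are held as named columns alongside a list of case labels (both hypothetical), and breaks ties by taking the first case.

```python
def extreme_case(columns, labels, lowest=True):
    """Return the label of the case that is lowest (or highest) on every
    variable, or None if no single case is extreme on all of them.

    columns: dict mapping variable name -> list of values, one per case.
    labels: list of case names in the same order (hypothetical layout).
    """
    pick = min if lowest else max
    winners = set()
    for values in columns.values():
        # Index of the extreme value for this variable (first one on ties).
        idx = pick(range(len(values)), key=values.__getitem__)
        winners.add(labels[idx])
    return winners.pop() if len(winners) == 1 else None
```

Passing lowest=False asks the opposite question, such as which case sits in the upper right of every panel.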

I found that the unusual point belonged to Alaska by 'scatter plot brushing', a technique far more impressive in practice than in description. I simply pointed the cursor at one of the three squares corresponding to Alaska and clicked on it. That point lit up, as did the two other points representing the same state, and the corresponding case was highlighted in the data set. It was easy. Once again, modern statistical software automates this tedious task. (For those of you in a more macabre frame of mind, I also found the point with the highest cancer rate: Florida. It was hardly surprising that 'God's waiting room' should be there.)
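
From the user's side brushing is a single click; underneath, a tool has to map the click to the nearest point in one panel and then look up that same case in every other panel. A rough sketch of those two steps, with hypothetical names; distance is measured after scaling each axis by its data range, as an approximation of distance on screen, since real tools typically work in pixels.

```python
def _span(values):
    """Data range of one variable, guarded against zero for constant columns."""
    span = max(values) - min(values)
    return span if span else 1.0


def brush(columns, x_var, y_var, click_x, click_y):
    """Find the case nearest a click in the (x_var, y_var) panel.

    Returns the case index and its value on every variable, so each panel
    can highlight the same case (the 'linked' part of brushing).
    """
    xs, ys = columns[x_var], columns[y_var]
    sx, sy = _span(xs), _span(ys)

    def distance(i):
        # Squared distance with each axis scaled to its data range.
        return ((xs[i] - click_x) / sx) ** 2 + ((ys[i] - click_y) / sy) ** 2

    idx = min(range(len(xs)), key=distance)
    return idx, {name: values[idx] for name, values in columns.items()}
```

Scaling matters because the variables can have very different units; without it, clicks would be drawn toward whichever axis has the smaller numeric range.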

Another useful variation of the scatter plot matrix plots one set of variables against a completely different set. Here is an example: male and female literacy rates in African countries plotted against male and female life expectancies in those countries. Unsurprisingly, we see some positive correlations among these variables. Once again, I became curious about the countries at the high and low ends of the spectrum, and their points are emphasized in the plot. The country with the highest male and female life expectancy is Tunisia, while the country with the lowest is, sadly, Somalia.
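
Plotting one set against another gives a rectangular grid rather than a triangle: every variable in the first set is paired with every variable in the second. A small sketch with hypothetical names; which set runs along the columns and which along the rows is an assumption here, and the second helper picks out the cases at the two ends of one variable.

```python
def cross_panels(x_vars, y_vars):
    """Panels (row, col, x_var, y_var) for plotting each variable in x_vars
    against each variable in y_vars. (Hypothetical helper.)"""
    return [(row, col, x, y)
            for row, y in enumerate(y_vars)
            for col, x in enumerate(x_vars)]


def ends_of_spectrum(values, labels):
    """Labels of the cases with the lowest and the highest value."""
    lo = min(range(len(values)), key=values.__getitem__)
    hi = max(range(len(values)), key=values.__getitem__)
    return labels[lo], labels[hi]


literacy = ["male literacy", "female literacy"]
life = ["male life expectancy", "female life expectancy"]
print(len(cross_panels(literacy, life)))  # 4 panels
```

Unlike the triangle, this grid contains no panel of a variable against itself or against another member of its own set, so every panel answers a cross-set question.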

In summary, good software should let you plot many variables against one another, speed the search for patterns in the data and find unusual cases among your data points.

--By Dr. Patrick Fleury, President of Op Numerics, a statistical consultancy located in Oak Park, IL.



© 1996 Scitech International, Inc.  All rights reserved
