Rense Nieuwenhuis » graphics

Graph: Abortion Attitudes in United States

Rense Nieuwenhuis — Thu, 23 Oct 2008 10:00:35 +0000

I have been writing about abortion a lot, recently, so I decided to provide some more context regarding this important subject, by making some graphics. The first graph I created is on trends in American public opinion regarding induced abortion:


To give an impression of how abortion attitudes have developed in the United States, I created a graph which is shown in figure 1. Using survey data from the General Social Survey (GSS), a nationally representative survey program in the United States, I was able to visualise the policy preferences regarding induced abortion for Americans living in nine different regions ((More detailed state-level aggregation is possible in principle, but the data required to do so are not publicly available)). The available data cover the period from the legalisation of induced abortion in the United States to 2005. Respondents were asked under which conditions they think it should be possible for a pregnant woman to have an abortion. The conditions presented were:

The woman’s health is seriously endangered by the pregnancy
The woman’s pregnancy is a result of rape
There is a strong chance of serious defect in the baby
Family has a very low income and cannot afford any more children
The woman is not married and does not want to marry the man
The woman is married and does not want any more children
The woman wants an abortion for any reason

The graph in figure 1 represents for each of these conditions the proportion of respondents (both men and women) that agreed with each condition. Since the same conditions were asked to respondents every wave of the survey, it is possible to visualise trends over a long period of time.

The graph teaches us several things about abortion attitudes in the United States. To start, it shows that, apart from fluctuations, the overall level of acceptance of induced abortion remained relatively stable in each of these nine regions. Interestingly, many of these fluctuations seem to have occurred during the early 1990s.

Furthermore, it is very clear that two ‘groups’ of responses occur. ‘Health’-related abortions (woman’s health in danger, pregnancy as a result of rape, defect in baby) have much higher levels of acceptance than ‘discretionary’ abortions (low income, unmarried, no more children, any reason). This is true for each of the nine regions shown. Not all is the same in these regions, however: large differences between the regions in average levels of acceptance are clear, especially with respect to the discretionary abortions. In the Pacific region, approximately 60% of the respondents think that a woman should be able to have an abortion for discretionary reasons, whereas in the E.S. Central region acceptance was as low as 20% in 2002.

Finally, closer examination shows that amongst the discretionary conditions, the variation between the different conditions has decreased. For instance, in the Mountain region, we see differences in levels of acceptance of almost 20 percentage points amongst the discretionary items (with approximately 40% of the respondents accepting an abortion for any reason, and approximately 60% when the family cannot afford any more children). These differences, however, waned over the years, and in 1995 all four discretionary conditions had very similar levels of acceptance. To a lesser extent, the opposite might have happened regarding the health-related conditions. Whereas the level of acceptance for having an abortion when the mother’s health is in serious danger remained relatively stable in the nine regions, acceptance for having an abortion when the pregnancy is the result of a rape and when there is a serious chance of a defect waned slightly.

Of course, this is only an overview graph, and an overview interpretation of that graph. Nevertheless, I think it provides some interesting insights into the development of American public opinion on induced abortion.

R-Sessions 15: Intermediate Graphics

Rense Nieuwenhuis — Mon, 25 Aug 2008 10:00:42 +0000

Based on the basic graphics that were created in the previous paragraph of this manual, we will elaborate a little to create more advanced graphics. We are going to add two other sets of data: one represented by an additional line, and one by four large green symbols of varying shape. Then, to keep the graph readable, a basic legend is added to the plot. Finally, we let R draw a curved line based on a square-root function.

Adding elements to plots

Just as in the previous paragraph, first some data will be created. The x and y variables are copied from the previous paragraph, so we will be able to recreate the graph we saw there. Then, an additional set of values is created and assigned to the variable z. This is done in the first three rows of the syntax below.

x <- c(1, 3, 4, 7, 9)
y <- c(3, 4, 5, 6, 5)
z <- c(5, 6, 4.5, 5.5, 5)

plot (x,y, type="b", main="Example plot", xlab="Predictor Value",
     ylab="Outcome Value", col="blue", lty=2)

lines(x, z, type="b", col="red")

points(x=c(2,4,6,8), y=c(5.5, 5.5, 5.5, 5.5), col="darkgreen", pch=1:4, cex=2)

Then, using the plot() function, a plot is created for the first line we want, closely resembling the figure from the previous paragraph. Two things are different now: first, we ask for a blue line by specifying col="blue". Second, a dashed line is created because the line type is set to 2 (lty=2).

Now we want to add an additional line. As you might have noticed by now, the plot() function generally clears the graphics device and creates a new graphic from scratch ((This is not always the case, though. Some plotting functions, such as curve(), can add elements to an existing graph when ‘add=TRUE’ is specified, which results in functionality similar to that of the lines() and points() functions shown here)). Remember that there are two types of plotting functions: those that create a full plot (including the preparation of the graphics device), and those that only add elements to an already prepared graphics device. We will use two functions of that second type to add elements to our existing plot.

First, we want another line graph, representing the relationship between the x and the z variables. For this, we use the lines() function. By specifying col="red" we get a red line, added to our existing graph. It is possible to set the line type as well (which we didn’t do here), but lines() cannot set elements such as the labels for the axes or the main title of the graph. This is done by functions that set up the graphics device as well, such as plot().
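The line type that was left at its default for the second line can be set in lines() in the same way as in plot(). The following is a minimal sketch that reuses the x, y and z values from this paragraph; the choice of lty=3 is only an illustration and is not part of the original example.

```r
x <- c(1, 3, 4, 7, 9)
y <- c(3, 4, 5, 6, 5)
z <- c(5, 6, 4.5, 5.5, 5)

# plot() sets up the device, including the title and the axis labels
plot(x, y, type = "b", main = "Example plot", xlab = "Predictor Value",
     ylab = "Outcome Value", col = "blue", lty = 2)

# lines() only adds to the existing plot; lty = 3 draws a dotted line
lines(x, z, type = "b", col = "red", lty = 3)
```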

Next, using points(), we add four green points to our already existing graphic. Instead of storing the coordinates of these four points in variables and specifying these variables in a plot function, we describe the coordinates inside the points() function. This can be done with all (plotting) functions in R that require data, just as all these functions can have data specified by using variables in which the data is stored. By specifying four different values for the ‘pch’ parameter, four different symbols are used to indicate the data-points. The ‘cex=2’ parameter tells R to expand the standard character in size by a factor of 2 (cex stands for character expansion factor).

Legend

legend(x=7.5, y=3.55, legend=c("Line 1", "Line 2", "Dots"), col=c("blue", "red", "darkgreen"), lty=c(2,1,0), pch=c(1,1,19))

A legend is not automatically added to graphics made by R-Project. Fortunately, we can add one manually by using the legend() function. The syntax above is used to create the legend as shown below.

Several parameters are needed to have the right legend added to your graph. In the order as specified above these are:

x=7.5, y=3.55: These parameters specify the coordinates of the legend. Normally, these coordinates refer to the upper-left corner of the legend, but this can be specified differently.
legend=c("Line 1", "Line 2", "Dots"): the ‘legend=’ parameter needs to receive values that will be used as labels in the legend. Here, I chose to use the character strings ‘Line 1’, ‘Line 2’, and ‘Dots’ and concatenated them using c(). It is important to note that the order in which these labels appear in the legend is determined by the order in which they are specified to the legend parameter, not by the order in which they are added to the plot.
col=c("blue", "red", "darkgreen"), lty=c(2,1,0), pch=c(1,1,19): The last three parameters, col, lty, and pch, are treated the same way as in other graphical functions, as described above. The difference is that not one value is given, but three.
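As an illustration of specifying the legend position differently, legend() also accepts a keyword such as "bottomright" in place of the x and y coordinates. The sketch below assumes the same labels and colours as above; the keyword position is my own choice for the example, not the one used for the figure.

```r
x <- c(1, 3, 4, 7, 9)
y <- c(3, 4, 5, 6, 5)
plot(x, y, type = "b", col = "blue", lty = 2)

# A keyword instead of x/y coordinates places the legend in a corner
pos <- legend("bottomright", legend = c("Line 1", "Line 2", "Dots"),
              col = c("blue", "red", "darkgreen"),
              lty = c(2, 1, 0), pch = c(1, 1, 19))

# legend() invisibly returns the position and size of the legend box
pos$rect
```

Using a keyword avoids having to look up suitable coordinates each time the range of the data changes.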

Plotting the curve of a formula

Sometimes you don’t want to graphically represent data, but the model that is based on it. Or, more generally, you want to graphically represent a formula. When the relationship is bivariate, this can easily be done with the curve() function. For instance, let’s say we want to add the formula "y = 1 + x ^ .5" (the result is 1 plus the square root of the value of x). Using the syntax below, this formula is graphically represented and added to the existing graphic. It is drawn for the values on the x-axis from 1 to 7, which shows that it is not necessary to draw the function over the full range of the x-axis.
By specifying lwd=2 (lwd = line width) a thick line is drawn.

curve(1 + x ^.5, from=1, to=7, add=TRUE, lwd=2)
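The curve() function can also be used on its own. As a small sketch, leaving out add=TRUE makes curve() set up a new plot itself; the title given here is only an assumption for the example.

```r
# Without add = TRUE, curve() sets up a new plot by itself
res <- curve(1 + x^0.5, from = 1, to = 7, lwd = 2,
             main = "Curve of 1 + sqrt(x)")

# The evaluated points are returned invisibly as a list with x and y
head(res$x)
head(res$y)
```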

– – — — —– ——–
R-Sessions is a collection of manual chapters for R-Project, which are maintained on Curving Normality. All posts are linked to the chapters from the R-Project manual on this site. The manual is free to use, for it is paid by the advertisements, but please refer to it in your work inspired by it. Feedback and topic requests are highly appreciated.
——– —– — — – –

R-Sessions 13: Overlapping Data Points

Rense Nieuwenhuis — Wed, 20 Aug 2008 10:00:40 +0000

In many cases, multiple points on a scatterplot have exactly the same coordinates. When these are simply plotted, the visual representation of the data may be unsatisfactory. Today’s R-Session is on how to present this type of data in neatly arranged plots in R-Project.

Introduction

For instance, consider the following data and plot:

x <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 5, 3, 4)
y <- c(5, 5, 5, 4, 4, 4, 4, 4, 3, 4, 2, 2, 2)

data.frame (x, y)

> data.frame(x,y)
   x y
1  1 5
2  1 5
3  1 5
4  2 4
5  2 4
6  2 4
7  2 4
8  2 4
9  3 3
10 4 4
11 5 2
12 3 2
13 4 2

plot (x, y, main="Multiple points on coordinate")

On this first plot, we see that only seven points are shown, although there are thirteen data-points available for plotting. The reason for this is that some of the points overlap each other. There are actually three points on the coordinate [x=1, y=5] and five points on the coordinate [x=2, y=4]. The standard plot function of R does not take action to show all your data.
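The amount of overlap can be checked before plotting. The sketch below is one possible way to do so in base R: it counts the distinct coordinates and the number of data-points on each of them. The object names d, n_distinct and counts are chosen for this example only.

```r
x <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 5, 3, 4)
y <- c(5, 5, 5, 4, 4, 4, 4, 4, 3, 4, 2, 2, 2)

d <- data.frame(x, y)

# Number of distinct coordinates, i.e. visibly separate points
n_distinct <- nrow(unique(d))
n_distinct

# Number of data-points per coordinate (empty combinations dropped)
counts <- as.data.frame(table(d))
counts <- counts[counts$Freq > 0, ]
counts
```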

Fortunately, several methods are available for making all data-points in a plot visible. The following will be described and shown in turn:

Jitter {base}
Sunflower {graphics}
Cluster.overplot {plotrix}
Count.overplot {plotrix}
Sizeplot {plotrix}

Jitter {base}

The jitter function adds a slight amount of irregular ‘movement’ to a vector of data. Some functions, such as stripchart, have a jitter argument built in. The general plot function does not, so we have to change the data the plot is based on. Therefore, the jitter function is applied to the x argument of the plot function. As shown, this results in some variation on the x-axis, thereby revealing all the available data-points. This is done by:

plot(jitter(x), y, main="Using Jitter on x-axis")

As we can see, all three data-points on x=1 are clearly visible, but the points on x=2 still clutter together. So, when too many points overlap each other, jittering on just one axis might not be enough. Fortunately, we can jitter more than just one axis:

plot(jitter(x), jitter(y), main="Using Jitter on x- and y-axis")

Now, we see the overlapping points varying slightly over both the x-axis and the y-axis. All of the points are now clearly visible. Nevertheless, if many more data-points were plotted, again cluttering would occur. But, although not all individual points will then be shown, using jitter still allows for a better impression of the density of points in a region.
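How far the points are moved can be controlled as well. The sketch below uses the amount argument of jitter(); the value 0.2 and the call to set.seed() are assumptions made for this example, the latter only to make the random displacement reproducible.

```r
x <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 5, 3, 4)
y <- c(5, 5, 5, 4, 4, 4, 4, 4, 3, 4, 2, 2, 2)

set.seed(1)  # only to make the random displacement reproducible

# amount = 0.2 moves every value by at most 0.2 in either direction
xj <- jitter(x, amount = 0.2)
yj <- jitter(y, amount = 0.2)

plot(xj, yj, main = "Jitter with amount = 0.2")
```

A larger amount separates the points more clearly, at the cost of moving them further away from their true coordinates.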

Sunflower {graphics}

sunflowerplot(x, y, main="Using Sunflowers")

Sunflowers are often seen in the graphics produced by statistical packages. When more than one point is to be drawn on a single coordinate, a corresponding number of ‘leaves’ of the sunflower is drawn instead of the separate points. The advantage of this is increased accuracy, but the drawback is that it works only when relatively few points need to be drawn on one coordinate. Another drawback of the method is that the sunflowers take up quite a lot of space, so overlapping might occur if several points are to be plotted very close to each other.
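The number of leaves follows from the number of data-points on each coordinate. As a sketch, these counts can be inspected with xyTable() from the grDevices package and passed on through the number argument of sunflowerplot(); the name tab is chosen for this example only.

```r
x <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 5, 3, 4)
y <- c(5, 5, 5, 4, 4, 4, 4, 4, 3, 4, 2, 2, 2)

# xyTable() counts the number of data-points per coordinate
tab <- xyTable(x, y)
data.frame(x = tab$x, y = tab$y, number = tab$number)

# The counts can be supplied explicitly to sunflowerplot()
sunflowerplot(tab$x, tab$y, number = tab$number,
              main = "Using Sunflowers")
```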

Cluster.overplot {plotrix}

The next three examples come from functions inside the plotrix package. The first of these functions is cluster.overplot(). This function clusters up to nine overlapping points of data. Therefore, this function is ideal for relatively small data-sets. Due to the tight clustering, the plot is not easily mistaken for showing randomness that is ‘real’ in the data.

The function itself does not plot data, but returns a list with ‘new’ coordinates, which can subsequently be plotted. In the code below, first the plotrix package is loaded. Next, the list with new coordinates is shown. Finally, the cluster.overplot() function is nested in the plot() function, which leads to the following plot:

require(plotrix)
cluster.overplot(x,y)
plot(cluster.overplot(x, y), main="Using cluster.overplot")

Count.overplot {plotrix}

We have seen, that all of the methods that were described above still rely on a visual representation of each overlapping point of data. This still can result in very dense plots that are hard to interpret. The next few methods try to solve this problem.

The function count.overplot tries to give a more accurate representation of overlapping data by not plotting every point on slightly altered coordinates, but by placing a numerical count of the overlapping data-points on the right coordinate. This results in a very accurate plot, which still may be difficult to interpret, though. Using this method will result in a plot that does not give us a feel of the localized density of the data and thereby may be misrepresenting the data. It should only be used when described extensively.

count.overplot(x, y, main="Using count.overplot", xlab="X-Axis", ylab="Y-Axis")

Sizeplot {plotrix}

The next method adjusts the size of the plotted points. Since the relation between number of overlapping points and the increase in size can be adjusted, this method is suitable for large sets of data.

sizeplot(x, y, main="Using sizeplot", xlab="X-Axis", ylab="Y-Axis")
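To make the idea of an adjustable relation between count and size concrete, the sketch below imitates it in base R. It is an approximation written for this example and not the implementation used by plotrix; the square-root scaling is an assumption that can be changed by altering the exponent.

```r
x <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 5, 3, 4)
y <- c(5, 5, 5, 4, 4, 4, 4, 4, 3, 4, 2, 2, 2)

# Count the data-points per coordinate
tab <- xyTable(x, y)

# Assumed scaling: symbol size grows with the square root of the count
size <- tab$number^0.5

plot(tab$x, tab$y, cex = size, pch = 1,
     main = "Point size by count (base R sketch)",
     xlab = "X-Axis", ylab = "Y-Axis")
```

A smaller exponent makes the symbols grow more slowly, which helps when some coordinates hold very many points.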

R-Sessions 12: Basic Graphics

Rense Nieuwenhuis — Mon, 18 Aug 2008 14:39:44 +0000

Introduction

Producing graphics can be a way to get familiar with your data or to present your results convincingly. Fortunately, this can be done both easily and in a very powerful way in R-Project. R-Project comes with some standard graphical functions and a package for Trellis-graphics. Here, we will see some of the basics of the standard graphics functionality of R-Project.

R-Project creates graphics and presents them in a ‘graphics device’. This can be a window on the screen, but just as easily a file in a specified format (such as .bmp or .pdf). There are two types of functions that create graphics in R-Project. One of those sets up such a graphics device by calculating and drawing the axes, plot title, margins and so on. Then the data is plotted into the device. The other type of graphics function cannot create a graphics device and only adds data to a plot. This paragraph shows only the first type of plotting-functions.
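As a small sketch of a graphics device that is a file rather than a window, the code below writes a plot to a PDF file. The file location is hypothetical: tempfile() is used here only so that the example does not overwrite anything, and the plot() function itself is introduced in the next section.

```r
y <- c(3, 4, 5, 6, 5)

# Hypothetical output location; tempfile() avoids overwriting anything
out <- tempfile(fileext = ".pdf")

pdf(out)       # open a file-based graphics device
plot(y)        # the plot is drawn into the file, not on the screen
dev.off()      # close the device so that the file is written completely

file.exists(out)
```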

Basic Plotting

The most basic plot-function in R-Project is called ‘plot()’. It is a function that sets up the graphics-device and is able to create some different types of graphic representations of your data.

For instance, let’s say we want to visualize a set of five values: 3,4,5,6 and 5. In the syntax below, these values are first assigned to the variable ‘y’. Then, we call the plot()-function and tell it to plot the data assigned to y.

y <- c(3,4,5,6,5)
plot (y)
plot (y, type="l", main="Example line-plot", xlab="Predictor Value", ylab="Outcome Value")

Although the syntax of the first plot-command is very simple, R-Project actually performs quite a bit of work for us. For example: a new window opens, the plotting area is set up alongside the margins, minimum and maximum values for the axes are calculated based on the data and subsequently drawn, basic labels are added to the axes, and finally the data is represented. Obviously this plot is not ready for publication, but fortunately all the ‘choices’ R-Project made for us are only the defaults, so we can easily specify exactly what we want.

The next plot already looks a bit better. This is because some extra specifications are added to the second plot-command in the syntax above. By specifying type="l" we tell the plot-function that we want the data-points to be connected using a line. The main=" " specification creates a header for the plot, while the xlab=" " and ylab=" " specify the labels for the x-axis and y-axis respectively.

x <- c(1,3,4,7,9)
plot (x,y, type="b", main="Example plot", xlab="Predictor Value", ylab="Outcome Value")

What if the values that we want to represent are related to predictor values (values on the x-axis) that are not evenly spread, such as in the graphics above? In that case, we have to specify the values on the x-axis to the plot()-function. The syntax above shows how this is done. First, we assign some values to the variable we call x. Then, we replicate the plot()-syntax from above and add the x-variable to it, before the y-variable. Additionally, the type="l" from above is changed into type="b" (b stands for both), which results in plotting both a line and points. The resulting plot is shown below.
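The difference between the two calls can be made explicit. When only y is given, plot() places the values at the positions 1, 2, 3 and so on; the sketch below writes this index out with seq_along() and compares it with the plot that uses the unevenly spread x-values. The plot titles are chosen for this example only.

```r
y <- c(3, 4, 5, 6, 5)
x <- c(1, 3, 4, 7, 9)

# With y only, the index 1, 2, ..., 5 is used on the x-axis
plot(seq_along(y), y, type = "b", main = "Evenly spaced (index)")

# With x supplied, the points are placed at the given predictor values
plot(x, y, type = "b", main = "Unevenly spaced (x given)")
```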

Other types of plots

Statistics does not consist solely of line and point graphics. The syntax below shows how the data stored in the y-variable can be represented using a barplot, pie-chart, histogram and boxplot. These types of graphics are only shown, not described exhaustively. All of these functions have many parameters that can be used to create exactly what you want.

barplot(y, main="Barplot", names.arg=c("a","b","c","d","e"))
pie(y, main="Pie-chart", labels=c("a","b","c","d","e"))
hist(y, main="Histogram")
boxplot(y, main="Boxplot")

The syntax above results almost exactly in the graph shown above. The only difference is, that normally R-Project would create four separate graphs when the syntax above is provided. For an explanation of how to place more graphs on one graphics device, see elsewhere in this manual.

All the graphics functions shown here have the main=" " argument specified. The barplot() function has the additional names.arg argument specified, which here provides five letters ("a" to "e") as labels for the bars. For the pie-chart this is done as well, but with the labels argument.
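As for placing the four graphs on one graphics device, one common way is the mfrow setting of par(). The following is a minimal sketch under the assumption of a grid of two rows and two columns; it is not necessarily how the figure above was produced.

```r
y <- c(3, 4, 5, 6, 5)

# Split the device into 2 rows and 2 columns; keep the old settings
old <- par(mfrow = c(2, 2))

barplot(y, main = "Barplot", names.arg = c("a", "b", "c", "d", "e"))
pie(y, main = "Pie-chart", labels = c("a", "b", "c", "d", "e"))
hist(y, main = "Histogram")
boxplot(y, main = "Boxplot")

par(old)  # restore one plot per device
```

Storing the value returned by par() makes it easy to return to the previous layout afterwards.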
