R-Sessions 02: Why R-Project?

There are many good reasons to start using R. Of course, there are also some reasons not to use it. Some of these reasons are briefly described here. In the end, it is personal preference that leads a researcher to choose one statistical package over another. The arguments below are offered as a basis for your own evaluation.

Why use R?

Powerful & Flexible

Probably the best reason to use R is its power. It is not so much a statistical software package as a statistical programming language. This results not only in the availability of powerful methods of analysis, but also in strong capabilities for managing, manipulating and storing your data. Due to its data structure, R gains tremendous flexibility. Everything can be stored inside an object, from data, via functions, to the output of functions. This makes it easy to compare different sets of data, or the results of different analyses. Because the results of an analysis are stored in objects, parts of these results can be extracted as well and used in new functions or analyses.
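A minimal sketch of this idea, using R's built-in mtcars data set (chosen here only for illustration): a fitted model is an ordinary object, parts of it can be pulled out and reused, and two stored analyses can be compared directly.

```r
# Fit a linear regression and store the complete result in an object
fit <- lm(mpg ~ wt, data = mtcars)

# The object holds far more than the printed output; list its components
names(fit)

# Extract parts of the result for further use
coefs <- coef(fit)               # named vector: (Intercept) and wt
r2 <- summary(fit)$r.squared     # R-squared taken from the summary object

# Reuse extracted residuals as input to another function
res_sd <- sd(residuals(fit))

# Store a second analysis and compare both models directly
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
anova(fit, fit2)
```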
Besides the many functions already available, you can write your own. This flexibility can be used to create functions that are not available in other packages. In general: if you can think of it, you can make it. This makes R a very attractive choice for methodologically advanced studies.
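As a small sketch of writing your own function, the example below defines a coefficient of variation (standard deviation divided by the mean). The name coef_var is hypothetical, chosen for this illustration; once defined, it behaves like any built-in function and can be combined with others.

```r
# Hypothetical user-defined function: the coefficient of variation,
# i.e. the standard deviation relative to the mean
coef_var <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Used exactly like a built-in function
coef_var(c(2, 4, 4, 4, 5, 5, 7, 9))

# ...and combined with other functions, e.g. applied to several columns
sapply(mtcars[, c("mpg", "hp", "wt")], coef_var)
```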

Excels in Graphics

The R-project comes with two packages for creating graphics. Both are very powerful, although the lattice package seems to supersede the basic graphics package. With either package, it is very easy, as well as fast, to create basic graphics. The graphics system is set up in a way that, again, allows for great flexibility. Many parameters can be set using syntax, ranging from colors, line styles, and plotting characters to fundamental things such as the coordinate system. Once a plot is made, many items can be added to it later, such as data points from other data sets, or a regression line over a scatterplot of the data.
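A minimal sketch with the basic graphics package, again using the built-in mtcars data for illustration: parameters are set in the call, and a regression line and extra points are added to the existing plot afterwards. Here a subset of the same data stands in for "another data set".

```r
# Basic scatterplot with several graphical parameters set via syntax
plot(mtcars$wt, mtcars$mpg,
     pch = 19, col = "steelblue",            # plotting character and colour
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     xlim = c(1, 6), ylim = c(10, 35))       # coordinate system

# Add a fitted regression line to the existing plot
abline(lm(mpg ~ wt, data = mtcars), col = "red", lty = 2, lwd = 2)

# Add points from a second data set (a subset stands in for one here)
heavy <- subset(mtcars, wt > 4)
points(heavy$wt, heavy$mpg, pch = 4, col = "darkgreen", cex = 1.5)
```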
The lattice package allows a graphic to be stored in an object, which can later be used to plot the graphic, to alter it, or even to have it analyzed by (statistical) functions.
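A sketch of storing and altering a lattice graphic, assuming the lattice package that ships with R as a recommended package. The plot is kept in an object, modified with update(), and only drawn when printed.

```r
library(lattice)   # recommended package, installed with R

# Build the plot but store it in an object instead of drawing it
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
class(p)   # "trellis"

# Alter the stored graphic: add a least-squares line to every panel
p2 <- update(p, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.lmline(x, y, col = "red")
})

# Nothing is drawn until the object is printed
print(p2)

# The object is a list, so other functions can inspect its contents
str(p2$panel.args[[1]])   # data used in the first panel
```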
A great many graphics devices are available that can output graphics to many file formats, besides the screen, of course. Vector-based devices, such as PDF and PostScript, ensure great quality even when the graphics are scaled. Devices for bitmap graphics are available as well.
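A short sketch of sending a plot to a file instead of the screen. The file names are hypothetical and placed in the session's temporary directory; pdf() produces vector output, png() bitmap output, and dev.off() closes each device so the file is written.

```r
# Hypothetical output paths in the session's temporary directory
pdf_file <- file.path(tempdir(), "scatter.pdf")
png_file <- file.path(tempdir(), "scatter.png")

# Vector output: a PDF stays sharp at any zoom level
pdf(pdf_file, width = 6, height = 5)       # size in inches
plot(mpg ~ wt, data = mtcars)
dev.off()                                  # close the device to write the file

# Bitmap output: a PNG has a fixed resolution in pixels
png(png_file, width = 800, height = 600)   # size in pixels
plot(mpg ~ wt, data = mtcars)
dev.off()
```

Other devices, such as postscript() or jpeg(), follow the same open, plot, close pattern.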

Open Source & Free

R is open-source software, meaning that everybody has access to its source code. Anyone can therefore make their own changes, and it is possible to check how a feature is implemented. This makes it easier to find bugs, which can be fixed immediately in your own version, or for everyone in the next official release. Of course, not everyone has the programming knowledge to do so, but many R users do. Open-source software is generally thought to contain fewer bugs and errors than closed-source software, because so many people can inspect and correct the code.
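For functions written in R itself, checking the implementation can be as simple as typing the function's name. A small sketch, using base functions chosen for illustration:

```r
# Typing a function's name without parentheses prints its R source code
sd

# Many functions are generics; methods() lists the implementations
# that are dispatched on the class of the argument
methods(summary)

# Functions implemented in compiled code print only a call into C,
# here .Primitive("sum"); their source lives in the R source tree
sum
```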
Did I already mention that it is free? In line with the open-source philosophy (although open source does not necessarily mean free of charge!), R is available at no cost. This gives it an advantage over many other statistical packages, which can be very expensive. When used on a large scale, such as at universities, the money saved by using R instead of other packages can be enormous.

Large supporting user base

R is supported by a very large group of active users from many disciplines. At present, the R Core development group consists of seventeen members. These people have write access to the source code of the centrally distributed version of R (everybody, of course, can change the code of their own copy). They are supported by many others who give suggestions or work in collaboration with the R Core team.
Beyond a good, strong core, statistical software needs a great many functions to be useful. Fortunately, many R users make their own functions available to the R community, free to download. As a result, packages are available containing functions for methods that are frequently used in a wide variety of disciplines.
Next to providing a great many functions, the R community maintains several mailing lists. One of these is dedicated to helping each other. Many very experienced users, as well as some members of the R Core development team, participate actively on this list. Most of the time, you'll get some guidance, or even a full solution to your problem, within hours.

Why not use R?

Relatively slow
In general, due to its very open structure, R tends to be slower than other packages. This is because R code, including the functions you write yourself, is interpreted when it is run rather than pre-compiled into machine code. In many other statistical packages, all functions are pre-compiled, but this comes at the cost of flexibility. On the other hand, speed can be gained by making smart use of R's powerful built-in functions. For instance, in many cases explicit loops can be avoided in R by using vectorized functions, which operate on whole vectors at once and are not available in other packages. When this is the case, R will probably win the speed contest. In other cases, it will probably lose.
One way to avoid the speed drawback when programming complex functions is to implement the critical parts in C or Fortran. R can call compiled code written in either language, which runs much faster than interpreted R code. This way, you can place the workhorse functions in a fast language and have them return their output to R, where it can be analyzed further.
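A minimal sketch of avoiding a loop through vectorization: both functions compute the sum of squares of a vector, once with an explicit loop and once with vectorized arithmetic. The function names are hypothetical, chosen for this illustration.

```r
set.seed(1)
x <- runif(1e6)

# Explicit loop: every iteration is evaluated as R code
loop_sum_sq <- function(v) {
  total <- 0
  for (i in seq_along(v)) total <- total + v[i]^2
  total
}

# Vectorized alternative: squaring and summing run in compiled code
vec_sum_sq <- function(v) sum(v^2)

system.time(a <- loop_sum_sq(x))
system.time(b <- vec_sum_sq(x))

all.equal(a, b)   # same result, up to floating-point rounding
```

Timings depend on the machine and R version. Since R 3.4, loops are byte-compiled by default, which narrows the gap, but the vectorized form is typically still considerably faster.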

Chokes on large data-sets

A somewhat larger drawback of R is that it chokes on large data sets. All data are stored in active memory (RAM), and all calculations are performed there as well. This leads to problems when memory is limited. Although modern computers can easily have 4 GB of RAM or more, you can still run into problems when using large data sets and complex functions. Until some form of disk paging is implemented in R, this problem does not seem easy to solve completely.
Some attempts have been made, though, that can be very useful in specific cases. One package, for instance, allows the user to store the data in a MySQL database. The package then extracts parts of the data from the database one at a time, so that these parts can be analyzed successively. Finally, the partial results are combined as if the analysis had been performed on the whole data set at once. This method doesn't work for all functions, though: at present, only a selection of functions that can handle this approach is available.
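A small sketch showing how quickly objects occupy RAM, using base functions only: a vector of one million numbers already needs about 8 MB, and every intermediate result is another full object in memory.

```r
# One million double-precision numbers, 8 bytes each, all held in RAM
x <- numeric(1e6)
print(object.size(x), units = "Mb")

# Intermediate results are new objects in RAM as well:
# y is a second object of the same size
y <- x + 1
print(object.size(y), units = "Mb")

# Free memory by removing objects that are no longer needed;
# gc() runs the garbage collector and reports memory use
rm(x, y)
gc()
```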
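The principle of combining partial results can be illustrated without a database. This hypothetical sketch fits a linear regression chunk by chunk: each chunk contributes to the sums X'X and X'y, and solving the normal equations once at the end gives the same estimates as fitting on all data at once. It does not reproduce the MySQL package mentioned above; the simulated data and the ten in-memory chunks merely stand in for pieces fetched from a database.

```r
# Simulated data; the chunks stand in for pieces fetched from a database
set.seed(42)
n <- 10000
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 3 * dat$x + rnorm(n)
chunk_id <- rep(1:10, each = n / 10)    # ten chunks of 1000 rows

# Only two small summary matrices are kept between chunks
XtX <- matrix(0, 2, 2)
Xty <- matrix(0, 2, 1)
for (k in unique(chunk_id)) {
  part <- dat[chunk_id == k, ]          # only this chunk is needed at a time
  X <- cbind(1, part$x)                 # design matrix: intercept and x
  XtX <- XtX + crossprod(X)             # accumulate X'X
  Xty <- Xty + crossprod(X, part$y)     # accumulate X'y
}

# Combine the partial results by solving the normal equations once
beta_chunked <- solve(XtX, Xty)

# Compare with fitting on the whole data set at once
beta_full <- coef(lm(y ~ x, data = dat))
all.equal(as.vector(beta_chunked), unname(beta_full))
```

This works for least squares because X'X and X'y are sums over rows. Methods without such additive summaries cannot be split this way, which is why only some functions support the approach.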

No point-and-click

I'm not sure whether this is a true drawback of R, but it doesn't come with a point-and-click interface for analyses. All commands need to be given as syntax. This results in a somewhat steeper learning curve compared to some other statistical programs, which can be discouraging for beginning users. But, in my opinion, this pays off in the long term: syntax tends to be a lot faster, and more precise, when working on complex analyses.
Mentioning this as a drawback of R is not entirely fair, since John Fox wrote a package, R Commander, which provides a point-and-click interface. Like all packages, it is freely available, and it can be used as an introduction to the capabilities of R.

- – — — —– ——–
R-Sessions is a collection of manual chapters for R-Project, which are maintained on Curving Normality. All posts are linked to the chapters from the R-Project manual on this site. The manual is free to use, since it is paid for by advertisements, but please refer to it in any work it has inspired. Feedback and topic requests are highly appreciated.
——– —– — — – -
