useR! 2008: Duncan Murdoch and package development


Duncan Murdoch gave an invited lecture on why we need packages in R. He explained that R is normally distributed with only 12 base packages and 14 recommended packages. Nevertheless, the total number of packages available amounts to more than 1500 (and counting). So, “R is basically its packages”.

He first described other ways to distribute work done in R, mainly sharing complete workspaces or script files.

Sure, you can save the workspace, but it is easy to forget how some objects were created, and you often save more than you would like. It is also possible to save code to script files and run them when needed. This does work, but as the number of functions in the script files grows, the workflow quickly becomes cluttered.

So, we need packages. These are small, compact, and complete ways to distribute your work. A lot can be stored in a package, such as the functions we previously kept in separate script files and data sets saved in R's native format. A package also offers the opportunity to add manual files and vignettes (a specific type of manual that also contains R code). Most of all, R packages can contain program code in other programming languages, for instance C, C++, Fortran, and Objective C. The benefit of these lower-level languages is an enormous gain in the speed of functions: Murdoch estimated that when loops are translated from R code to a variant of C, a speed-up of a factor of 100 can be achieved! A package can also contain explicit tests that warn the developer almost automatically when an update to the R base code breaks the package.

The rest of the presentation focused on creating packages in Windows. I will not detail that process here, so be sure to find his presentation on the conference site when it becomes available.
