June 2015 – Bradley J Eck, PhD

I’ve been using R for a while now — since 2008!?! But I’ve only recently started writing simple packages. From another user’s perspective, a package is the easiest way to use code someone else wrote. Happily, getting from scripts that work to a simple package is a short journey.

Compared to a set of scripts, packages are easier for others to use because of documentation and namespace. If you are looking at this page, you have probably spent some time reading documentation for R. I’m sure you’ll agree that documented functions are much easier to use than undocumented ones. Namespace is a somewhat more obscure concept in R programming. Namespaces specify how R looks for things and in this way control the visibility of functions. In packages, namespaces make some functions easily accessible while making other functions less visible. It’s a bit like the difference between public and protected members of a Java class. As shown in the example below, putting a few special comment lines in your existing code will sort out both documentation and namespace.
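
To make the visibility idea concrete, here is a small sketch that inspects the stats package that ships with R. I picked stats only for illustration; it is not part of the example later in this post. Exported functions are the ones users reach with library() or the :: operator, while everything else stays inside the namespace.

```r
# Names the stats package exports (its "public" functions)
exported <- getNamespaceExports("stats")

# Everything defined inside the stats namespace, exported or not
all_objects <- ls(asNamespace("stats"), all.names = TRUE)

# Objects that live in the namespace but are not exported
internal <- setdiff(all_objects, exported)

"median" %in% exported   # TRUE: median() is part of the public interface
length(internal) > 0     # TRUE: stats also keeps internal helpers to itself

# Exported functions can be called with the :: operator
stats::median(c(1, 2, 3))
```

Internal objects can still be reached with the ::: operator, but that is generally meant for debugging rather than everyday use.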

There is a lot of material online already about what R packages are and different perspectives on how to write them. Three references in particular come up frequently:

Creating R Packages: A Tutorial by Friedrich Leisch
R packages by Hadley Wickham
Writing R Extensions on CRAN

In this post, I describe a workflow for developing simple R packages that is heavily influenced by Wickham’s work. There is also an illustrative example. By “simple” package, I mean that there is no compiled code as part of the package: the code you want to share is pure R.

Workflow

1. Create a directory structure

Packages are a collection of files organized into directories. A simple package needs only a few directories.

package-name
    + man
    + R
    + tests
    DESCRIPTION

The top level folder is the name of your package. The man folder will contain files for making the package manual. The R folder holds the source code for your package. A directory of tests contains tests of the code in your package. The DESCRIPTION file gives a short description of the package.
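
You can create these folders by hand, but a few lines of R will do it too. The sketch below is just one way to go about it; it builds the skeleton under a temporary directory (my assumption, so that nothing on your machine gets overwritten) and uses the placeholder name package-name from above.

```r
# Build the skeleton in a temporary folder so nothing is overwritten;
# "package-name" is the placeholder used in the layout above.
pkg <- file.path(tempdir(), "package-name")

for (d in c("man", "R", "tests")) {
  dir.create(file.path(pkg, d), recursive = TRUE, showWarnings = FALSE)
}

# Start with an empty DESCRIPTION file, to be filled in by hand
file.create(file.path(pkg, "DESCRIPTION"))

# Show what was created
list.files(pkg)
```

For a real package, replace the tempdir() location with the folder where you keep your projects.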

2. Develop and test your code

The starting point for most packages is a set of scripts that work. Put these in the R folder. Unit tests are indispensable to make sure that the code actually works the way you think it does. These go in the tests folder. A unit testing framework such as the testthat package helps to organize and run these.

The way this works for me is to open an R session and make package-name/tests the working directory. As I write a function in the R folder, I run tests from the tests folder.

3. Create Documentation & Namespace

Documentation for R packages should be in the .Rd format. The roxygen2 package creates documentation in the .Rd format automatically from the comments in your source files. It also creates a NAMESPACE file that tells R about the visibility of the package functions.

> setwd("package-name")
> library(roxygen2)
> roxygenize(".")

4. Build and check the package

Once the package has code and documentation, it can be built and checked from the shell with R CMD build and R CMD check (on Windows, Rtools provides the supporting command line tools).

$ R CMD build path/to/package-name
$ R CMD check package-name_x.y.z.tar.gz

5. Share

After your package passes R CMD check, you can have good confidence that the package will behave in a reasonable way for your users.

Example

A simple example helps illustrate this process. Let’s say that I want to share functions to compute the geometric and harmonic mean of a vector of numbers. Along with the arithmetic mean, these are the three classical Pythagorean means. But only the arithmetic mean is built into base R. So I’m going to make a package called pythagmeans with these functions.
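
Before wrapping anything in a package, it helps to see the three means computed directly. This is only a quick sketch with a made-up input vector; the variable names are mine and are not part of the package.

```r
x <- c(1, 2, 3)
n <- length(x)

am <- sum(x) / n       # arithmetic mean: 2
gm <- prod(x)^(1 / n)  # geometric mean: cube root of 6, about 1.817121
hm <- n / sum(1 / x)   # harmonic mean: 18/11, about 1.636364

c(arithmetic = am, geometric = gm, harmonic = hm)
```

For positive numbers that are not all equal, the harmonic mean is the smallest of the three and the arithmetic mean is the largest, which is a handy sanity check.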

1. Create directory structure

In addition to the directory structure, you also need a description file.

pythagmeans
    + man
    + R
    + tests
    DESCRIPTION

A minimal DESCRIPTION file looks like this.

Package: pythagmeans
Type: Package
Title: Pythagorean Means
Version: 0.0.1
Date: 2015-06-16
Author: Bradley J. Eck
Maintainer: Bradley Eck <brad@bradeck.net>
Description: Compute the three classical Pythagorean means: arithmetic, geometric, and harmonic.
License: MIT

2. Develop and test

This is a small package so I have only two source files: the package source and a file of tests.

The package source file has five functions. There is one function for each type of mean (arithmetic, geometric, harmonic) and there are two helper functions. The helper functions check the argument and give errors. The comment marker #' denotes a roxygen comment that is parsed for use in building the documentation. The @export tag notes that the function is made available to the package user. Note that the helper functions are not exported.


# File:  pythagmeans/R/PythagoreanMeans.r 

#' Arithmetic Mean
#'
#' Computes the arithmetic mean of a vector of numbers
#'
#' @export 
#' @param x vector of numbers without NAs
arithmetic_mean <- function( x ) { 
  argCheck(x)
  am <- mean(x) # use the built-in function
  return( am )
}

#' Geometric Mean
#'
#' Computes the geometric mean of a vector of numbers
#'
#' @export 
#' @param x vector of numbers without NAs
geometric_mean <- function ( x ) {
  argCheck(x)
  n <- length(x) 
  gm <- prod(x)^(1/n)
  return( gm ) 
}

#' Harmonic Mean
#'
#' Computes the harmonic mean of a vector of numbers
#'
#' @export 
#' @param x vector of numbers without NAs
harmonic_mean <- function( x ) {
  argCheck(x)
  n <- length(x) 
  hm <- n / sum( reciprocal( x ) ) 
  return( hm ) 
}

# helper (not exported): confirm the argument is numeric and has no NAs
argCheck <- function( x ) {
  if( !is.numeric(x) ){ stop("argument must be numeric") }
  if( any( is.na(x) ) ){ stop("NA values not allowed") }
}

# helper (not exported): compute reciprocals, with an error if a zero is present
reciprocal <- function( x ) {
  if( any( x == 0 ) ){ stop("zeros not allowed in x") }
  recip <- 1/x
  return( recip )
}

The file of test code looks like this. To run the tests, call test_dir(".") from an R session whose working directory is the tests directory.


# File: pythagmeans/tests/test_PythagoreanMeans.r

library(testthat)

# assumes /tests is the working directory  
source("../R/PythagoreanMeans.r")

context("verify means")
test_that("arithmetic mean is correct",{
  x <- c(1,2,3)
  expect_equal( arithmetic_mean(x), expected = 2 ) 
})

test_that("geometric mean is correct",{  
  x <- c(1,2,3)
  expect_equal( geometric_mean(x), expected = 1.817121, 
                tolerance = 0.000001,  scale = 1 )
})

test_that("harmonic mean is correct",{ 
  x <- c(1,2,3)
  expect_equal( harmonic_mean(x), expected = 1.636364, 
                tolerance = 0.000001, scale = 1 ) 
})

context("throwing errors") 
test_that( "error on NA entry",{
  x<- c(1,NA,3)
  expect_error( arithmetic_mean(x), "NA values not allowed" )
})

test_that( "error on 0 entry for harmonic_mean",{
  x<- c(1,0,3)
  expect_error( harmonic_mean(x), "zeros not allowed" )
})

3. Generate documentation

With the roxygen2 package, generating documentation becomes automatic. The NAMESPACE file is also generated automatically.


> setwd("path/to/pythagmeans")
> library(roxygen2)
> roxygenize(".")
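
For the source file above, I would expect the generated NAMESPACE file to look roughly like the sketch below. The exact header line and ordering depend on your roxygen2 version, so treat this as an illustration rather than guaranteed output.

```r
# Generated by roxygen2: do not edit by hand

export(arithmetic_mean)
export(geometric_mean)
export(harmonic_mean)
```

The helper functions argCheck and reciprocal do not appear because they carry no @export tag, so they stay internal to the package.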

4. Build & check

Once the tests pass and the documentation is generated you're ready to build and check the package.


$ R CMD build path/to/pythagmeans
$ R CMD check pythagmeans_0.0.1.tar.gz

5. Share, Install, Use

These functions are now easily shared with colleagues as the archive file built by R CMD build.

$ R CMD INSTALL pythagmeans_0.0.1.tar.gz

After installation the exported functions are available by loading the package in an R session.

> library(pythagmeans)