I recently encountered a problem while bundling an old C library into a
new R package. The library itself depends on, and includes, an external
“dictionary” in plain text format used to construct a large lookup
table. The creators of this library of course assume that this
dictionary file will always reside in the same directory as the compiled
object, and so can always be directly linked. The src directory of R
packages is, however, only permitted to contain source code, which text
files definitely are not. This blog entry is about where to put such
files, and how to link them within the source code.
The answer turns out to be very simple, yet was nevertheless one which
occupied a couple of days of my time, hence this documentation for the
sake of posterity. As with many “external” files within R packages, the
recommended location is within the inst directory, or some
sub-directory thereof. Any files within this directory will be copied
“recursively to the installation directory” (from Writing R
Extensions).
Such files nevertheless cannot be opened directly from any src code,
because there is no way for a compiled source object to find them:
relative paths cannot be used, because they are resolved relative to
the working directory of the process which calls the compiled object.
Tests, for example, will call the compiled object from the ./tests
directory, while direct use within the package directory will call from
that directory itself (.). For general usage, the directory from which
the object is called could be anywhere, and external files cannot be
linked. In other words, it is not possible to directly link a compiled
object in an R package with other package-local files, because the only
“local” in R is the current working directory.
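The dependence on the working directory can be seen in a small C sketch. The helper name can_open_from() is hypothetical and not part of the bundled library, and chdir() assumes a POSIX system. The same relative file name opens or fails depending only on where the process happens to be running:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h> /* chdir(): POSIX, an assumption about the platform */

/* Hypothetical helper: returns 1 if `relative_name` can be opened for
 * reading once the working directory has been changed to `dir`,
 * and 0 otherwise. */
int can_open_from(const char *dir, const char *relative_name)
{
    assert(dir != NULL && relative_name != NULL);

    if (chdir(dir) != 0)
        return 0; /* could not even enter the directory */

    /* fopen() resolves a relative name against the current working
     * directory, not against the location of the compiled object. */
    FILE *f = fopen(relative_name, "r");
    if (f == NULL)
        return 0;

    fclose(f);
    return 1;
}
```

Note that chdir() changes the working directory of the whole process, so this is for demonstration only, and not something to do inside a package.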
It is thus necessary to step back “out” from the source into the R
environment to obtain the path to the external file – in my case, to the
dictionary. This information needs somehow to be fed to the source code
whenever and wherever the package is used: precisely the kind of job for
which the .onLoad() function is intended. An additional problem in my
particular case was that the source code relied very extensively on
defining the dictionary file through a simple C macro:
#define MY_DICTIONARY "dictionary.txt"
Literally dozens of functions then call that simple macro to read from
the dictionary. Rewriting all of them to accept a dynamic parameter
defining the location would have been way too much work, and so I
urgently needed a simpler solution. The easiest turned out to be to use
environment variables, which can be read from almost any programming
language. I just needed to set an environment variable holding the path
to the package dictionary in the .onLoad() function as,
Sys.setenv ("DICT_DIR" = system.file (package = "my_package", "subdir", "my_dict.txt"))
Accessing this within the source code was then as simple as defining an equivalent function in C to read that variable:
#include <stdlib.h> /* getenv */

/* Returns the dictionary path set in .onLoad(), or NULL if unset. */
char *getDictPath(void)
{
    return getenv("DICT_DIR");
}
and then replacing the hard-coded macro with a functional equivalent:
#define MY_DICTIONARY getDictPath()
The entire bundled source then remained intact, with the getDictPath()
function returning the appropriate path as defined within R itself via
the system.file() function, and leaving the C code able to simply call
the macro MY_DICTIONARY to access the local copy of that file.
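Putting the pieces together, here is a minimal sketch of how a function in the bundled library might consume the macro. The reader function read_first_entry() is hypothetical and stands in for the library's own dictionary functions. The checks for NULL and for an empty string are my additions: getenv() returns NULL when the variable is unset, and system.file() returns an empty string when the file is not found.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Path set from R in .onLoad(); NULL if the variable is unset. */
char *getDictPath(void)
{
    return getenv("DICT_DIR");
}

#define MY_DICTIONARY getDictPath()

/* Hypothetical reader standing in for the library's own functions:
 * copies the first line of the dictionary (without the newline) into
 * `buf`. Returns 0 on success and -1 on any failure. */
int read_first_entry(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    const char *path = MY_DICTIONARY;
    if (path == NULL || path[0] == '\0')
        return -1; /* variable unset, or system.file() found nothing */

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char *line = fgets(buf, (int) len, f);
    fclose(f);
    if (line == NULL)
        return -1;

    buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
    return 0;
}
```

One caveat with this kind of substitution: the macro now expands to a function call rather than a string literal, so any code that relied on compile-time properties of the literal, such as adjacent string concatenation or sizeof, would no longer compile or behave as before. Code which only passes MY_DICTIONARY to functions like fopen() is unaffected.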
Credit and gratitude to Iñaki Ucar and Martin Morgan for suggestions on the r-package-devel mailing list.
Copyright © 2019--22 mark padgham