class: center, middle, inverse, title-slide

# rOpenSci, peer review, statistical software, and testing
## SatRdays<br>Neuchatel<br>anywhere
### Mark Padgham<br>rOpenSci<br>Münster, Germany
### Saturday 14th March, 2020

---
class: left, middle, inverse

.left-column[
mpadge
ropensci ] .right-column[
bikesRdata
.small[mark@ropensci.org]<br><br>
mpadge.github.io
]

.box-bottom[
slides at <br> [https://github.com/mpadge/satRday-neuchatel-2020](https://github.com/mpadge/satRday-neuchatel-2020)
]

---

[](https://ropensci.org)

---

## rOpenSci packages

```r
url <- "https://ropensci.github.io/roregistry/registry.json"
x <- jsonlite::fromJSON(url)$packages
names (x)
```

```
##  [1] "name"              "description"       "details"
##  [4] "maintainer"        "keywords"          "github"
##  [7] "status"            "onboarding"        "on_cran"
## [10] "on_bioc"           "url"               "ropensci_category"
```

```r
nrow (x)
```

```
## [1] 386
```

---

## rOpenSci package categories

```r
table (x$ropensci_category)
```

```
## category             n
## altmetrics           2
## data-access        125
## data-analysis        4
## data-extraction      6
## data-publication     5
## data-tools          25
## data-visualization   5
## databases            9
## geospatial          22
## http-tools          17
## image-processing     5
## literature          30
## scalereprod         32
## security             8
## taxonomy            10
```

---
class: center

<!-- -->

.large[
[github.com/ropensci](https://github.com/ropensci) <br> [ropensci.org](https://ropensci.org)
]

---
class: center

<!-- -->

.large[
What can you do to engage with rOpenSci?
]

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What can you do to engage with rOpenSci?

### As a package user

- Give feedback on [discuss.ropensci.org](https://discuss.ropensci.org) or via github issues
- Submit a Use Case to [discuss.ropensci.org](https://discuss.ropensci.org)
- Participate in regular [Community Calls](https://ropensci.org/commcalls)
- Ping @ropensci on twitter, along with package authors

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What can you do to engage with rOpenSci?

### As a developer

- Read the [rOpenSci Developer Guide](https://devguide.ropensci.org/contributingguide.html)

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What can you do to engage with rOpenSci?
> 15.1 Why contribute to rOpenSci packages?
>
> In general, as explained by Kara Woo in her talk at the CascadiaR conference, contributing to R packages allows you to make things work the way you want (by adding some functionality to your favorite package), can lead to opportunities and allows you to learn about package development.
>
> ... we strive to make contributing a good experience... we are creating social infrastructure through a welcoming and diverse community

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What can you do to engage with rOpenSci?

### As a code contributor

- Read the [rOpenSci Developer Guide](https://devguide.ropensci.org)
- Use packages, contribute to or open<br>issues, make pull requests

### As a developer

- Offer to help with / help maintain a package
- Develop your own package
- Open a pre-submission enquiry (= issue) on [github.com/ropensci/software-review](https://github.com/ropensci/software-review)

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci Software Categories

```
##              category   n
## 1          altmetrics   2
## 2         data-access 125
## 3       data-analysis   4
## 4     data-extraction   6
## 5    data-publication   5
## 6          data-tools  25
## 7  data-visualization   5
## 8           databases   9
## 9          geospatial  22
## 10         http-tools  17
## 11   image-processing   5
## 12         literature  30
## 13        scalereprod  32
## 14           security   8
## 15           taxonomy  10
```

--

... but [R is a language for statistics](https://github.com/ropensci/software-review/issues/331), right?

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%
class: center, middle

<!-- -->

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

### Why Not?

Because it's really difficult

### Why?
Because it's really important

--

- The [R Validation Hub](https://www.pharmar.org/about/) is doing it, but exclusively<br>for the bio-pharmaceutical industry.
- We will be (co-)developing a generalised methodology

--

- The [R Validation Hub](https://www.pharmar.org/about/) has 44 organisations yet relatively little money
- rOpenSci has a board of 6 members, around 1.5 full-time staff, and some money

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

> We’ll be working with the new board and the broader statistical software community to develop a set of agreed-upon standards for statistical package implementation and testing, then launching a new peer-review process and testing tools.
(rOpenSci blog 15 July 2019)

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

### ~~Why Not? Because~~ it's really difficult

--

.left-column[
- Bayesian & Monte Carlo
- Dimensionality & Feature Reduction
- Machine Learning
- Regression, Splines, & Interpolation
- Statistical Indices and Scores
- Visualisation
- Probability Distributions
]

.right-column[
- Wrapper Packages
- Categorical Variables
- Networks
- Exploratory Data Analysis (EDA)
- Survival Analysis
- Workflow Software
- Summary Statistics
- Spatial Analysis
- Educational Software
]

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

> We’ll be working with the new board and the broader statistical software community to develop a set of agreed-upon standards for statistical package implementation and testing, then launching a new peer-review process and testing tools.
(rOpenSci blog 15 July 2019)

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

"agreed-upon standards for statistical package implementation and testing"

.left-column[
- Bayesian & Monte Carlo
- Dimensionality & Feature Reduction
- Machine Learning
- Regression, Splines, & Interpolation
- Statistical Indices and Scores
- Visualisation
- Probability Distributions
]

.right-column[
- Wrapper Packages
- Categorical Variables
- Networks
- Exploratory Data Analysis (EDA)
- Survival Analysis
- Workflow Software
- Summary Statistics
- Spatial Analysis
- Educational Software
]

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

"agreed-upon standards for ~~statistical package implementation and~~ testing"

.left-column[
- Bayesian & Monte Carlo
- Dimensionality & Feature Reduction
- Machine Learning
- Regression, Splines, & Interpolation
- Statistical Indices and Scores
- Visualisation
- Probability Distributions
]

.right-column[
- Wrapper Packages
- Categorical Variables
- Networks
- Exploratory Data Analysis (EDA)
- Survival Analysis
- Workflow Software
- Summary Statistics
- Spatial Analysis
- Educational Software
]

---
background-image: url(images/ropensci-bg-dark.png)
background-size: contain
background-position: 0% 50%
class: inverse

# TESTING

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What is testing?

### Concrete Testing

Testing of concrete inputs and outputs

### Property-based Testing

Testing functional responses based on the general *properties* of their inputs and outputs

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What is testing?
- Alternative approaches

### Concrete Testing

- R via `testthat` and a few other packages
- python via `pytest` and lots of other packages
- inbuilt in `rust` (the benchmark)

### Property-based Testing

- `python` via [`hypothesis`](https://hypothesis.works/) (the benchmark)
- `rust` via [`quickcheck`](https://github.com/BurntSushi/quickcheck), [`proptest`](https://lib.rs/crates/proptest), and others

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

### Typical `roxygen` function documentation lines

```r
#' @param x The first input
#' @param y The second input
#' @param z The third and last input
#' @return The output value
#' @export
f <- function(x, y, z) {
    # function definition
}
```

--

`roxygen2` is extensible, and the [`roxytest` package](https://github.com/mikldk/roxytest) enables concrete tests to be specified directly in function documentation.

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

### Typical `roxygen` function documentation<br>plus property-based testing lines

```r
#' @param x The first input
#' @param y The second input
#' @param z The third and last input
#' @return The output value
#' @export
#'
#' @given x integer
#' @given y numeric
#' @given z character
#' @expect is.integer(f(x, y, z))
#' @expect length(f(x, y, z)) == 1
f <- function(x, y, z) {
    # function definition
}
```

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?
```r
f <- function(x, y, z) {
    # function definition
}
ntrials <- 100 # number of random trials
lim <- .Machine$integer.max # finite bound: runif() returns NaN for infinite limits
for (i in seq_len(ntrials)) {
    x <- ceiling(runif(1, min = -lim, max = lim))
    y <- runif(1, min = -lim, max = lim)
    tryCatch (
        res <- f(x, y, z = "<string>"),
        error = function(e) e
    )
}
```

Property-based testing tests responses<br>to ranges of input values

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

```r
f <- function(x, y, z) {
    # function definition
}
ntrials <- 100 # number of random trials
lim <- .Machine$integer.max # finite bound: runif() returns NaN for infinite limits
for (i in seq_len(ntrials)) {
    tryCatch ({
        len <- ceiling(runif(1, max = 1e6)) # random length for x
        x <- ceiling(runif(len, min = -lim, max = lim))
        y <- runif(1, min = -lim, max = lim)
        res <- f(x, y, z = "<string>")
    },
    error = function(e) e
    )
}
```

Property-based testing tests responses<br>to structures of input values

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

```r
#' @given x integer
#' @given y numeric
#' @given z character
#' @given length(x) <= 10
#' @given res = f(x, y, z)
#' @expect res is silent
f <- function(x, y, z) {
    # function definition
}
```

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

```r
#' @given x integer
#' @given y numeric
#' @given z character
#' @given length(x) <= 10
#' @given res = f(x, y, z)
#' @expect res is silent
#'
#' @given length(x) == 10
#' @given res = f(x, y, z)
#' @expect res is silent
#' @report res is error
f <- function(x, y, z) {
    # function definition
}
```

- Bug reports use the same grammar as tests

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?
```r
#' @given x integer
#' @given y numeric
#' @given z character
#' @given length(x) <= 10
#' @given res = f(x, y, z)
#' @expect res is silent
#'
#' @given viz = TRUE
#' @report f(x, y, z, viz) is error
#' @request f(x, y, z, viz) produces interactive graphical output
f <- function(x, y, z, viz) {
    # function definition
}
```

- Feature requests use the same grammar as tests

--

- Feature requests are pull requests, and pull requests are direct code contributions

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

- Lots of prior work
- Efforts underway to incorporate/adapt within python
- R is an opportunity in waiting
- [cucumber.io](https://cucumber.io)

--

## Property-based testing can build community

---
background-image: url(images/ropensci-bg-dark.png)
background-size: contain
background-position: 0% 50%
class: left, middle, inverse

## Please Help! Please Contribute!

## [Community Call](https://ropensci.org/commcalls/): Wed 18th March

.left-column[
mpadge
ropensci ] .right-column[
bikesRdata
.small[mark@ropensci.org]<br><br>
mpadge.github.io ] .box-bottom[ slides at <br> [https://github.com/mpadge/satRday-neuchatel-2020](https://github.com/mpadge/satRday-neuchatel-2020) ]
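
---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

A minimal runnable sketch of the `@given` / `@expect` idea in base R. The helper `check_property()`, the `generators` list, the input ranges, and the example functions `f` and `g` are all hypothetical, made up for illustration. None of them belongs to `roxygen2`, `roxytest`, or any other package, and the `@given` / `@expect` tags themselves remain a proposal.

```r
# Hypothetical generators, one per `@given` type (ranges are assumptions)
generators <- list(
    integer = function() sample.int(1000L, size = sample.int(10L, 1L)) - 500L,
    numeric = function() runif(1, min = -1e6, max = 1e6),
    character = function() paste(sample(letters, 5L), collapse = "")
)

# Hypothetical helper: call `fn` on random inputs of the given types, and
# collect every set of inputs for which `expect` is not TRUE (or `fn` errors)
check_property <- function(fn, given, expect, ntrials = 100L) {
    failures <- list()
    for (i in seq_len(ntrials)) {
        args <- lapply(given, function(type) generators[[type]]())
        ok <- tryCatch(
            isTRUE(expect(do.call(fn, args))),
            error = function(e) FALSE
        )
        if (!ok) failures[[length(failures) + 1L]] <- args
    }
    failures
}

# Two example functions: `f` always returns a single integer, `g` does not
f <- function(x, y, z) as.integer(length(x) + nchar(z))
g <- function(x, y, z) x * y

# Equivalent of "@given x integer", "@given y numeric", "@given z character"
given <- c(x = "integer", y = "numeric", z = "character")
# Equivalent of "@expect is.integer(...)" and "@expect length(...) == 1"
expect <- function(res) is.integer(res) && length(res) == 1L

set.seed(1)
length(check_property(f, given, expect)) # 0: the property holds in every trial
length(check_property(g, given, expect)) # 100: integer * numeric is never integer
```

Failing inputs are returned rather than thrown as errors, so in principle they could be pasted directly into a bug report.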