Preparing dendrochronological datasets in R

1. Image scans

I prepared and scanned seven dendrochronological datasets at the PNW Research Station between Nov. 2017 and July 2018. Imagery was scanned at either 1200 or 2400 DPI using a mixture of GraphicsGale and Microsoft Paint software. Of these datasets, four were fully measured and analyzed.

2. Tree-ring measurements

All measured samples were measured using CooRecorder (v. 9.0 & 9.3 beta) software. Within-site samples were crossdated using both CooRecorder its associated CDendro software. Pith placements were estimated where possible during ring measurement. CDendro was then used to compile site-level sets of tree-ring widths (.RWL filetype) in the Tucson format. Distance to pith measurements were saved separately as .TXT filetypes via options in CDendro.

article-preparingdendro-coorecorder.PNG
CooRecorder’s measuring tools with a red pine (Pinus resinosa) core.

If samples were deemed problematic or affected by external impacts, their measurements were removed from the respective ring-width measurement files.*

3. Basal area increment conversion

For all of these studies, we converted the raw ring-width measurements (RWL) to basal area increments (BAI) using R (v. 3.x), the RStudio IDE, and the dplR package for R.

I loaded a site’s RWL and distance-to-pith measurements into R, and used the bai.in() function provided by dplR.†

An example is seen below:

eef.rwl <- read.rwl("./WA Entiat Experimental Forest/data/analysis/treeRWL.rwl",
format = "tucson",
header = TRUE)

eef.dtp <- read.csv("./WA Entiat Experimental Forest/data/analysis/treeDTP.txt", # distance-to-pith file
sep = " ")

eef.bai <- bai.in(eef.rwl, eef.dtp)

Which produces a wide-format file that looks something like this:

##      E892145A E892146A  E892147A  E892147B  E892148A
## 2011 2527.575 573.0218 1115.5698 1800.5209  771.4982
## 2012 2205.700 422.5929  713.5170 1293.9479  739.9371
## 2013 2600.176 592.5396  629.9914  901.0342 1116.0354
## 2014 2072.956 734.2870  298.1104  477.8815  987.0872
## 2015 1588.834 625.2938  230.9686  350.8958  783.2345
## 2016 1791.964 564.9881  206.9131  444.4750  885.3988

Trees were then averaged together using dplR’s treeMean() and read.ids() functions:

  1. ids <- read.ids(eef.bai, stc = c(1, 6, 1))
  2. bai <- treeMean(eef.bai, eef.ids, na.rm = TRUE)

The stc part describes the name structure of individual core time series, e.g., stc(1, 6, 1) translates to a length of 1 for site ID, 6 for tree ID, and 1 for core ID: E892145A is separated by E, 892145, and A. na.rm being TRUE means that missing values are not taken into account when creating a new tree-level mean. By default, this setting is FALSE, and only the shared time series period is averaged together.‡

I eschewed dplR’s preference for the wide-format time series in favor of thin-format time series which make analysis and visualization both much easier and faster. This is best accomplished using the tidyverse library. For example, and with further edits:

eef.bai %
mutate(year = rownames(eef.bai)) %>%
filter(year >= 1992) %>%
gather("siteTreeID", "bai", -"year", na.rm = T) %>%
mutate(siteTreeID = str_replace(siteTreeID, "^([0-9]{6})$", "E\\1"),
bai_ln = log(bai + 1))

To break this down:

  1. bai %
  2. This states that eef.bai will be replaced by the edits we’re about to make. The %>% is the ‘pipe-down’ shortcut to make code more readable and chain multiple functions together.

  3. mutate(year = rownames(eef.bai)) %>%
  4. The mutate() function lets you create new variables within a dataframe. dplR keeps years as row names, which is not helpful for us, so here we’ll pulling the row names out of the file and creating a new column of referenceable data.

  5. filter(year >= 1992) %>%
  6. The filter() function lets us filter out data we ultimately won’t be interested in. For this study, we’ll only be looking at BAI measurements between 1992 and 2016.

  7. gather("siteTreeID", "bai", -"year", na.rm = T) %>%
  8. gather() thins wide-format data. siteTreeID is what I’m naming the new column of gathered wide-format column names, e.g., E892145. bai is the name for the collapsed column of BAI measurements. -“year” ensures our year data is not included in the translation to thin-format, and remains its own column. na.rm = T cuts waste, so we’re not keeping rows without any BAI measurements.

  9. mutate(siteTreeID = str_replace(siteTreeID, "^([0-9]{6})$", "E\\1"), bai_ln = log(bai + 1))
  10. This chunk of text creates a new variable – bai_ln – and edits another one – siteTreeID. bai_ln is a new log-transformed column of BAI measurements, which will likely be the focus of analysis. The str_replace() function being used on the siteTreeID column is a correction to another issue with dplR: The treeMean() function, which assumes only one site exists, removes the site ID from the tree IDs, so E892145 becomes 892145. This adds an ‘E’ to the front of each unique siteTreeID, correcting for that issue.

With that, our final data looks like this…

##    year siteTreeID      bai   bai_ln
## 1  1992    E892145 2704.793 7.903150
## 2  1993    E892145 3317.640 8.107310
## 3  1994    E892145 2976.125 7.998713
## 4  1995    E892145 3139.076 8.052002
## 5  1996    E892145 3032.253 8.017391
## 6  1997    E892145 3670.383 8.208324
## 7  1998    E892145 2905.006 7.974535
## 8  1999    E892145 2734.899 7.914215
## 9  2000    E892145 2871.042 7.962778
## 10 2001    E892145 2974.103 7.998034

4. Further steps

This has been the go-to formula for crating a solid dendrochronolgoical dataset for analysis, and one I used with all four measured datasets. The prior step, specifically its use of different tidyverse functions, is the only one that varies between sites based on the goals of the study and the shape of the data (e.g., site ID structure). This means I can use a single file organization and naming structure across all studies with very little variation, including in the R project design.

Measuring and crossdating in CooRecorder and CDendro takes time — especially if the tree species produce porous rings (in which case, scanning at a higher DPI or even applying image filters helps) — but once that’s done it’s a short road to reconstructing your BAIs and putting together functional climate models.

If I ever have the time, I’d love to try implementing suggested improvements to many of the R functions used throughout this article.


* For one dataset, I ran some exploratory data analysis using latewood as raw ring measurements and basal area increments. Late- and earlywood can be uniquely measured and written to .RWL filetype within CooRecorder and CDendro. I am lost as to the accuracy of using inside-out reconstructed (i.e., bai.in()) BAIs on late-  or earlywood exclusively, however, as the distance-to-pith measurements come from complete ring measurements. I would wager the resulting pattern is accurate — e.g., exogenous constraints on tree growth, such as insect outbreaks or drought, are still identifiable — but the actual measurements might be questionable.

† Distance-to-pith files had to be manually edited prior to being loaded into R to remove extraneous text and spaces CDendro inserts at the top of the file.

‡ One setback with dplR is that it has issues with multi-site datasets, likely because it was built with the assumption that users would detrend and create chronologies for individual sites. For the example above, E892145A, E89 is the actual site ID, and 2145 the tree ID. dplR does not work with this, so I’m having it read the site and tree IDs together. Unfortunately, a structure of 0, 7, and 1 instead of 1, 6, and 1 fails.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s