In 2017 and 2018, I worked in a forestry research laboratory in Washington State. While I worked primarily with R and specialty software like CDendro, the managers I worked with had little-to-no familiarity with my methods. I wrote this report for them as a means to explain my methods in R and create a pipeline that can be reused across every dendrochronological dataset. I’m sharing this here as a means to preserve that pipeline for future analyses.
1. Image scans
I prepared and scanned seven dendrochronological datasets at the PNW Research Station between Nov. 2017 and July 2018. Imagery was scanned at either 1200 or 2400 DPI using a mixture of GraphicsGale and Microsoft Paint software. Of these datasets, four were fully measured and analyzed.
2. Tree-ring measurements
All measured samples were measured using CooRecorder (v. 9.0 & 9.3 beta) software. Within-site samples were crossdated using both CooRecorder its associated CDendro software. Pith placements were estimated where possible during ring measurement. CDendro was then used to compile site-level sets of tree-ring widths (.RWL filetype) in the Tucson format. Distance to pith measurements were saved separately as .TXT filetypes via options in CDendro.

If samples were deemed problematic or affected by external impacts, their measurements were removed from the respective ring-width measurement files.*
3. Basal area increment conversion
For all of these studies, we converted the raw ring-width measurements (RWL) to basal area increments (BAI) using R (v. 3.x), the RStudio IDE, and the dplR
package for R.
I loaded a site’s RWL and distance-to-pith measurements into R, and used the bai.in()
function provided by dplR
.†
An example is seen below:
eef.rwl <- read.rwl("./WA Entiat Experimental Forest/data/analysis/treeRWL.rwl",
format = "tucson",
header = TRUE)
eef.dtp <- read.csv("./WA Entiat Experimental Forest/data/analysis/treeDTP.txt", # distance-to-pith file
sep = " ")
eef.bai <- bai.in(eef.rwl, eef.dtp)
Which produces a wide-format file that looks something like this:
## E892145A E892146A E892147A E892147B E892148A
## 2011 2527.575 573.0218 1115.5698 1800.5209 771.4982
## 2012 2205.700 422.5929 713.5170 1293.9479 739.9371
## 2013 2600.176 592.5396 629.9914 901.0342 1116.0354
## 2014 2072.956 734.2870 298.1104 477.8815 987.0872
## 2015 1588.834 625.2938 230.9686 350.8958 783.2345
## 2016 1791.964 564.9881 206.9131 444.4750 885.3988
Trees were then averaged together using dplR
’s treeMean()
and read.ids()
functions:
ids <- read.ids(eef.bai, stc = c(1, 6, 1))
bai <- treeMean(eef.bai, eef.ids, na.rm = TRUE)
The stc
part describes the name structure of individual core time series, e.g., stc(1, 6, 1)
translates to a length of 1 for site ID, 6 for tree ID, and 1 for core ID: E892145A is separated by E, 892145, and A. na.rm
being TRUE
means that missing values are not taken into account when creating a new tree-level mean. By default, this setting is FALSE
, and only the shared time series period is averaged together.‡
I eschewed dplR
’s preference for the wide-format time series in favor of thin-format time series which make analysis and visualization both much easier and faster. This is best accomplished using the tidyverse
library. For example, and with further edits:
eef.bai %>%
mutate(year = rownames(eef.bai)) %>%
filter(year >= 1992) %>%
gather("siteTreeID", "bai", -"year", na.rm = T) %>%
mutate(siteTreeID = str_replace(siteTreeID, "^([0-9]{6})$", "E\\1"),
bai_ln = log(bai + 1))
To break this down:
-
bai %>%
This states that eef.bai will be replaced by the edits we’re about to make. The %>% is the ‘pipe-down’ shortcut to make code more readable and chain multiple functions together.
-
mutate(year = rownames(eef.bai)) %>%
The mutate() function lets you create new variables within a dataframe. dplR keeps years as row names, which is not helpful for us, so here we’ll pulling the row names out of the file and creating a new column of referenceable data.
-
filter(year >= 1992) %>%
The filter() function lets us filter out data we ultimately won’t be interested in. For this study, we’ll only be looking at BAI measurements between 1992 and 2016.
-
gather("siteTreeID", "bai", -"year", na.rm = T) %>%
gather() thins wide-format data. siteTreeID is what I’m naming the new column of gathered wide-format column names, e.g., E892145. bai is the name for the collapsed column of BAI measurements. -“year” ensures our year data is not included in the translation to thin-format, and remains its own column. na.rm = T cuts waste, so we’re not keeping rows without any BAI measurements.
-
mutate(siteTreeID = str_replace(siteTreeID, "^([0-9]{6})$", "E\\1"), bai_ln = log(bai + 1))
This chunk of text creates a new variable – bai_ln
– and edits another one – siteTreeID
. bai_ln
is a new log-transformed column of BAI measurements, which will likely be the focus of analysis. The str_replace()
function being used on the siteTreeID
column is a correction to another issue with dplR
: The treeMean()
function, which assumes only one site exists, removes the site ID from the tree IDs, so E892145 becomes 892145. This adds an ‘E’ to the front of each unique siteTreeID
, correcting for that issue.
With that, our final data looks like this…
## year siteTreeID bai bai_ln
## 1 1992 E892145 2704.793 7.903150
## 2 1993 E892145 3317.640 8.107310
## 3 1994 E892145 2976.125 7.998713
## 4 1995 E892145 3139.076 8.052002
## 5 1996 E892145 3032.253 8.017391
## 6 1997 E892145 3670.383 8.208324
## 7 1998 E892145 2905.006 7.974535
## 8 1999 E892145 2734.899 7.914215
## 9 2000 E892145 2871.042 7.962778
## 10 2001 E892145 2974.103 7.998034
4. Further steps
This has been the go-to formula for crating a solid dendrochronolgoical dataset for analysis, and one I used with all four measured datasets. The prior step, specifically its use of different tidyverse functions, is the only one that varies between sites based on the goals of the study and the shape of the data (e.g., site ID structure). This means I can use a single file organization and naming structure across all studies with very little variation, including in the R project design.
Measuring and crossdating in CooRecorder and CDendro takes time — especially if the tree species produce porous rings (in which case, scanning at a higher DPI or even applying image filters helps) — but once that’s done it’s a short road to reconstructing your BAIs and putting together functional climate models.
If I ever have the time, I’d love to try implementing suggested improvements to many of the R functions used throughout this article.
- For one dataset, I ran some exploratory data analysis using latewood as raw ring measurements and basal area increments. Late- and earlywood can be uniquely measured and written to .RWL filetype within CooRecorder and CDendro. I am lost as to the accuracy of using inside-out reconstructed (i.e.,
bai.in()
) BAIs on late- or earlywood exclusively, however, as the distance-to-pith measurements come from complete ring measurements. I would wager the resulting pattern is accurate — e.g., exogenous constraints on tree growth, such as insect outbreaks or drought, are still identifiable — but the actual measurements might be questionable.
† Distance-to-pith files had to be manually edited prior to being loaded into R to remove extraneous text and spaces CDendro inserts at the top of the file.
‡ One setback with dplR
is that it has issues with multi-site datasets, likely because it was built with the assumption that users would detrend and create chronologies for individual sites. For the example above, E892145A, E89 is the actual site ID, and 2145 the tree ID. dplR
does not work with this, so I’m having it read the site and tree IDs together. Unfortunately, a structure of 0, 7, and 1 instead of 1, 6, and 1 fails.