-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathREADME.Rmd
More file actions
293 lines (187 loc) · 18.4 KB
/
Copy pathREADME.Rmd
File metadata and controls
293 lines (187 loc) · 18.4 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# modendro: more dendro (tree-ring) functions in R <img src="./man/figures/modendro_hex_logo_skyonly_small.png" width="160" align="right" />
<!-- badges: start -->
<!-- badges: end -->
modendro is a collection of functions to do a variety of tree-ring analyses that are not available elsewhere or are improvements on existing analyses: various utility functions for wrangling tree-ring data, power transformation & detrending, flexible climate-growth correlations to identify the strongest seasonal climate signals, and disturbance detection & removal.
My inclusion of an analysis method in modendro does not imply broad endorsement of the method. My goal is to make available some of these methods that I've found useful in the past but had to write my own code for. Enjoy and I hope some of these are useful for you too!
## Installation
You can install the development version of modendro from [GitHub](https://github.com/) with:
``` r
pak::pak("ctmaher/modendro")
```
I recommend you do this often, as modendro is under active development.
## Example: convert rwl-format data.frames to "long" format - and back
The rwl-format data.frame (rownames are years, colnames are series names) is the standard for the important dplR package, and modendro inherits this. However, this format is inconvenient for many other standard operations in R - a "long" format would be much more useful. Enter modendro's `rwl_longer()` function.
```{r example_1.1}
library(modendro)
data("ps96")
head(ps96)[,1:5] # typical rwl-format
PS.long <- rwl_longer(rwl = ps96,
series.name = "series", # name the series IDs whatever you want
dat.name = "rw.mm", # same for the measurement data
trim = TRUE, # trim off the NAs before and after a series
new.val.internal.na = NULL) # leave NAs internal to the measurement series
# Note that we could replace internal NAs here if there were any (e.g., with 0s)
head(PS.long) # familiar format for R users
```
Once our tree-ring data is in the familiar (to me anyway) long format, we do all kinds of data wrangling, analyses and plotting that we normally do in R. For example, we might want to merge the tree-ring data with site IDs so that we can add site as a random effect in a model or for plotting.
```{r example_1.2}
data("ps.groupIDs")
PS.long.sites <- merge(PS.long, ps.groupIDs, by.x = "series", by.y = "tree")
library(ggplot2)
ggplot(PS.long.sites, aes(year, rw.mm, group = series)) +
geom_line(alpha = 0.5) +
facet_wrap( ~ site, ncol = 1) +
theme(panel.background = element_blank())
```
Another useful application of `rwl_longer()` is when we have several rwl-format files that we want to bind together. This is not intuitive with the rwl-format - the number of rows may not line up and the columns won't either. This is a simple operation in the long format.
We'll borrow some data from the dplR package for this.
We may want to convert this joined data back into a single rwl-format data.frame, e.g., for export or for unifying analyses in modendro or dplR. We can do this with modendro's `longer_rwl()`.
```{r example_1.3}
library(dplR)
data("ca533")
ca533.long <- rwl_longer(ca533,
series.name = "series",
dat.name = "rw.mm")
comb.long <- rbind(PS.long, ca533.long)
# note that longer_rwl() ignores other variables in your long data.frame
# We'll add some columns here to illustrate that
comb.long$foo <- "bar"
comb.long$bar <- "foo"
comb.rwl <- longer_rwl(df = comb.long,
series.name = "series",
dat.name = "rw.mm")
head(comb.rwl)[,1:5]
```
## Example: find ITRDB collections for a specified region and species
A valuable step in checking your cross-dating work is finding pre-existing tree-ring datasets that have been (we assume) properly cross-dated for your study species in your study region. modendro's `ITRDB_search()` does a species- and geographically-defined search of the International Tree-Ring Data Bank and returns basic info on each collection, including links to the resource so that you can download the files from the NCEI website. See the `ITRDB_search()` help file for more info.
```{r example_1.4}
# Find some spruce collections in northern Alaska
AK.spruce <- ITRDB_search(species = c("PCGL","PCMA"), # Picea glauca & Picea mariana
lon.range = c(-165, -140), # the longitude component of the bounding box
lat.range = c(64.5, 70), # the latitude component of the bounding box
limit = 10) # how many records you want to see - choose higher if you want an exhaustive list
# A truncated view of the output
head(AK.spruce[,c("studyName","onlineResourceLink")])
# The full output contains the following information:
colnames(AK.spruce)
```
## Example: read in files from CooRecorder directly into R
`read_pos()` reads files created by the excellent and popular tree-ring measurement software, CooRecorder (https://cybis.se/forfun/dendro/index.htm). This allows the user more flexibility in data format than what CDendro (CooRecorder's sister software) offers. There are other advantages too - like recording of point labels and exports of attributes like distance to pith, outer and inner dates, and comments.
```{r example_1.5,fig.height=6}
library(ggplot2)
# Read in some example .pos files that show normal files and behavior on files with errors.
ex.pos <- read_pos(system.file("extdata", package = "modendro"))
# We get two erroneous point order warnings - one is real the other is a false positive. Below we can see the difference between the two.
# Check the contents of the output list
names(ex.pos)
# Take a look at the ring widths
ex.pos[["Ring widths"]] |> head()
# Take a look at the attributes
ex.pos[["Attributes"]]
# "Not read" gives you a data.frame of files that were not read in and potentially why
ex.pos[["Not read"]]
# you can see the coordinates
ggplot(ex.pos[["Raw coordinates"]], aes(x, y)) +
geom_path() +
geom_point(aes(color = type)) +
facet_wrap(~series, ncol = 1, scales = "free") +
theme(panel.background = element_blank())
# Note that one file truly had erroneous point order - signified by the jagged black line from geom_path (which plots points in the order it receives them). This file you would want to fix in CooRecorder
# take a look at the ring widths - what you came here for
ggplot(ex.pos[["Ring widths"]], aes(year, rw.mm)) +
geom_line() +
facet_wrap(~series, ncol = 1, scales = "free") +
theme(panel.background = element_blank())
# The true erroneous order file has invalid ring widths.
```
## Example: power transformation & detrending ala Cook & Peters (1997)
Cook & Peters describe a method in their 1997 paper in The Holocene for removing age/size trends from tree-ring series in a way that minimizes bias that can result from using more traditional methods. Specifically, C&P's method involves power transforming raw ring widths to homogenize variance, fitting a trend, and then *subtracting* the trend line to get power transformed detrended residual series.
In modendro, the `cp_detrend()` function does this. I should say that you can do this with a string of dplR functions, but `cp_detrend()` contains the entire operation in a single function. The trend fitting and subtraction step is done using dplR's `detrend()` function, allowing all the detrend curve options availble. Additionally, `cp_detrend()` is much more transparent and returns information about the power transformation, including the power used. Another reason for this detailed output is that this is the process used in the modendro function `ci_detect()`, which I'll demonstrate below. The `cp_detrend()` process can be visualized with the `plot_cp_detrend()` function.
```{r example_2.1}
# The ps96 data allows an illustration of a simple utility function in modendro
# that will identify and fill internal NAs in a series with a specified new value.
# The crucial assumption is that that internal NAs represent missing rings due to
# no growth (0 mm).
ps96 <- rwl_replace_internal_NAs(ps96, new.val = 0)
PS.cp <- cp_detrend(ps96, detrend.method = "ModHugershoff")
names(PS.cp) # output is a list
PS.cp[["Transformation metadata"]][["RRR27"]]
PS.cp.plots <- plot_cp_detrend(PS.cp)
PS.cp.plots[["RRR27"]]
```
## Example: flexible growth-climate relationships
`n_mon_corr()` is a modendro function that is for exploratory data analysis of growth-climate relationships using tree-rings and monthly climate data. The concept is relatively simple, in practice it is not so simple!
The basic operation is to take monthly climate data and compute every possible contiguous monthly aggregate window given a user-specified n months (2-12 month aggregates possible) and for a set of lag years. The monthly aggregate windows are continuous across (lagged) annual boundaries. Then, `n_mon_corr()` computes correlations with each of these monthly aggregates and every single tree-ring series in your rwl. As such, this is a "brute-force" type of operation and can be somewhat slow, depending on the parameters.
`n_mon_corr()` works in both hemispheres - if hemisphere = "S" is selected, then a +1 "lag" is added to whatever lags you specify (e.g., -2, -1, 0, +1). This is because the convention of assigning years to tree-rings in the southern hemisphere is to use the year that growth starts in. Since growth will usually continue through the Gregorian new year, the +1 is there to accommodate the full range of possible monthly climate aggregates into the growing season.
I have built in operations like "prewhitening" so that the both the climate aggregates (generated inside the function) and the tree-ring data get treated the same way. In most tree-ring analyses, prewhitening means residuals from an autoregressive model of the ring-width series, the goal of which is to remove autocorrelation (includes trends) and leave only the high-frequency variability that resembles white noise. In practice, I have found that the simple `ar()` is not always adequate to model all of the autocorrelation in tree-ring or climate series. A more complex time-series model does better, like an ARIMA model. `n_mon_corr()` uses `auto.arima()` from the `forecast` package to find a best-fit ARIMA model, and then extracts the residuals. ARIMA models don't model the variability, so all of the high-frequency component is retained, including any changes in variability over time (i.e., volatility). All this is to say that `n_mon_corr()` does a more honest job of removing autocorrelation than traditional tree-ring approaches. We want this before doing cross-correlations between two time series!
The default method for correlations is a Spearman rank correlation - this is to allow for the detection of possibly non-linear associations between tree-rings and climate. For significance testing of the individual correlation analyses, I use a method described by Lun et al. (2022), Journal of Applied Statistics, 50(14) that adjusts the significance assuming that there is autocorrelation in the time-series. For "prewhitened" series, this will probably be a bit conservative, but is useful if you want to run both "prewhitened" and not series to try to understand the influence that trends may have on the relationships - as such it makes sense that both types of series are treated with the same significance test. Hence the default. Kendall rank correlations are also possible and use this adjusted significance test. *Pearson correlations are an option, but there is no adjustment for autocorrelation* in Pearson. I don't recommend it.
If you turn off "prewhitening", be aware of the effects of autocorrelated time-series in cross-correlations. This is a problem for the *correlation coefficients*, not the significance testing per se. See Yule (1926), Journal of the Royal Statistical Society 89(1). The take away is that for cross-correlations between highly autocorrelated time series (i.e., low-frequency amplitudes are much greater than high-frequency amplitudes), correlation coefficients are nearly always very high, even though they may be meaningless.
The `n_mon_corr()` documentation has more details on the various arguments.
```{r example_3.1}
# Bring in some climate data associated with the tree ring data we loaded earlier
data("idPRISM")
head(idPRISM)
PS.corr <- n_mon_corr(rwl = ps96,
clim = idPRISM,
clim.var = "Tavg.C",
common.years = 1895:1991,
agg.fun = "mean",
max.win = 6,
win.align = "left",
max.lag = 2,
hemisphere = "N",
prewhiten = TRUE,
corr.method = "spearman")
names(PS.corr)
```
Note the warnings about series that don't have enough overlap with the climate data.
The output of `n_mon_corr()` includes the individual correlation results, the monthly aggreated climate data (prewhitened & raw), and the prewhitened ring widths. This allows you to inspect each of these elements.
As with other `modendro` functions, there is a separate plotting function. This is designed to deal with multiple (a lot of) ring width series, so keep that in mind.
The first plot shows the percentage of significant correlations for each month and each window length, the second shows the mean correlation coefficients of the same. These are two ways to find the strongest signals across lags and moving window lengths. Lags are indicated by labeled rectangles at the bottom of the plots (-2, -1, 0).
I include the aggregated data used to make the plots as well, for some added transparency.
```{r example_3.2}
PS.corr.plots <- plot_n_mon_corr(PS.corr)
names(PS.corr.plots)
PS.corr.plots[["Percent sig. corr. plot"]]
PS.corr.plots[["Mean corr. coef. plot"]]
PS.corr.plots[["Aggregated data"]] |> head()
```
## Example: detection and removal of disturbances in tree-ring series
There are several existing methods for detecting release or suppression events in tree-ring series - the `TRADER` package for R offers several methods for detecting growth releases and the `dfoliatR` package offers methods for detecting suppression events from insect defoliators. To my knowledge, there has not been an R implementation of the time-series based intervention detection methods developed by Druckenbrod et al (2005, 2013) and later by Rydval et al. (2016, 2018). modendro's `ci_detect()` is an R implementation of this method, called "curve intervention detection". See `?ci_detect` for full citations. This method simultaneously identifies and removes suppression and release events in tree-ring series.
The curve intervention detection method starts with detrending the ring-width series using Cook & Peters (1997) methods, described above. This step is done inside of `ci_detect()` using `cp_detrend()`. You can choose the detrending curve method here as well - although be aware that the choice of method can influence the detection of suppression and release events! An extreme example would be to use a very flexible spline, which will do a thorough job of removing anything that might qualify as a disturbance. Reasonable choices are "ModNegExp" and "ModHugershoff". The default is "Mean", which does no actual detrending.
After initial detrending, autoregressive (AR) residuals of the detrended power-transformed series are obtained from a best-fit AR model. The function then computes multiple moving averages (defaults 9-30 years) of the AR residuals, and any moving averages that exceed a variance threshold are identified as disturbances. In the first iteration, the largest of these identified disturbances is selected and a Hugershoff-type curve is fit to the disturbance and the remainder of the series. Then this curve is subtracted from the series and AR residuals and moving averages are re-calculated, starting a new iteration. Iterations are repeated until no more disturbances are detected or the user-defined maximum iterations is reached.
After detection and removal iterations are complete, the "corrected" series is back transformed and "retrended" to return a "disturbance-free" set of ring-widths in original units. This process allows a kind of counter-factual view of how a tree might have grown without disturbances. `ci_detect()` also returns a "disturbance index", the difference between the original and the corrected ring-widths, for each ring-width series in a rwl-format data.frame. The disturbances detected are returned in a data.frame that has the estimated start year, duration, direction of the disturbance, and the equation of the fitted curve.
One of the implicit features of curve intervention detection is that disturbances can generally be defined by a Hugershoff-type curve - that is they often have a relatively abrupt initiation and a slow recovery to previous conditions.
Let's employ the same tree-ring dataset that we've been using for the previous analyses. Then we'll use `plot_ci_detect()` to show plots of the disturbance detection and removal iterations and the final results plot for a single tree's series.
```{r example_3.3}
PS.cid <- ci_detect(ps96, detrend.method = "ModHugershoff")
PS.cid.plots <- plot_ci_detect(PS.cid)
PS.cid.plots[["Disturbance detection & removal plots"]][["RRR27"]]
PS.cid.plots[["Final disturbance-free series plots"]][["RRR27"]]
```
Druckenbrod et al (2024) recently developed a new method for detecting and removing disturbances (disturbance detrending). This method is implemented in modendro as the `d_detrend()` function. This method has several advantages over the `ci_detect()` method (see function documentation and references for details). `plot_d_detrend` produces plots that illustrate the steps of the process. New to the modendro implementation is the possibility to detrend suppression events - this option is still experimental, so use with caution! It is also possible to use a variety of detrending methods (both for the individual disturbances and the overall age trend).
```{r example_3.4}
# Load Missouri post oak ring widths
data(mo024)
mo024.ddtrd <- d_detrend(data = mo024,
win.len = 15,
pgc.thresh = 50,
d.detrend.method = "AgeDepSpline",
detrend.method = "AgeDepSpline",
nyrs = c(10, 30),
event.type = "release")
mo024.ddtrd.plots <- plot_d_detrend(mo024.ddtrd)
mo024.ddtrd.plots$DEM01C
```