The northern Gulf of Mexico is a taxonomically rich ecosystem; however, the trophic dynamics of the region are not well understood. This project seeks to explore the trophic dynamics of the northern Gulf of Mexico. To do so, an extensive source of stomach content and stable isotope data will be tidied and reformatted in order to be implemented into a newly developed model, EcoDiet.
Data come from meta-analysis of literature of the region and raw values. Stomach content data comes from a meta-analysis by a previous student, Oshima, containing over 12,500 unique observations from 69 studies. Stable isotope data come from values collected by meta-analysis, summarized, and raw values.
load("isotope.for.plotting.RData")
%>% filter(!is.na(deltaC)) %>%
isotope ggplot(aes(x = deltaC, y = deltaN)) +
geom_point() +
xlab("Mean δ13C") + ylab("Mean δ15N") +
theme_classic() +
theme(plot.caption = element_text(hjust = 0.5))
load("isotope.for.plotting.RData")
%>%
isotope filter(!is.na(`mean deltaC`), !is.na(`mean deltaN`)) %>%
ggplot(aes(x = `mean deltaC`, y = `mean deltaN`)) +
geom_point(position = "jitter") +
xlab("Mean δ13C") + ylab("Mean δ15N") +
theme_classic() +
theme(plot.caption = element_text(hjust = 0.5))
load("raw.isotope.for.plotting.RData")
ggplot(raw.iso.name, aes(x = D13C, y = D15N)) +
geom_point(show.legend = F) +
xlab("δ13C") + ylab("δ15N") +
theme_classic() +
theme(plot.caption = element_text(hjust = 0.5))
Rows | Columns | |
---|---|---|
Stomach Contents | 12519 | 45 |
Raw Isotope | 5114 | 24 |
Meta-analysis and Summarized Isotope | 1001 | 18 |
Given the breadth of the data collected, significant tidying and selection of relevant data was required. For taxa from the stable isotope meta-analysis and summarized values, full taxonomy was not provided. The full taxonomy of each observation was required, as part of the process was to obtain stable isotope values for higher taxonomic levels. The full taxonomy was found by running a loop over all lowest available taxa with a WORMS function.
require(worrms)
<- read_csv("./Data/master.taxa.list.calvin.csv")
sia.data <- sia.data$Taxa %>% unique
un.taxa
<- c()
output for (j in 1:length(un.taxa)) {
<- try(wm_records_taxamatch(name = un.taxa[j]), silent = T)
taxonomy if ((taxonomy %>% as.character()) != "Error : (204) No Content - AphiaRecordsByMatchNames\n")
<- rbind(output, c(un.taxa[j], taxonomy[[1]] %>% select(phylum:genus)))
output print(j)
}
<- output
taxonomic.data rm(output)
save(taxonomic.data, file = "./Data/SIA.taxonomy.data.RData")
The two data frames (taxonomy and values) were merged into a single data frame. Before this could be done, the output of the WORMS function was mutated from a list into characters. Relevant columns were also selected from the data set to make it more manageable.
<- read_csv("./Data/master.taxa.list.calvin.csv")
sia.data load("./Data/SIA.taxonomy.data.RData")
<- as_tibble(taxonomic.data) %>%
tax.data.tibble rename("Taxa" = "V1") %>%
mutate(Taxa = as.character(Taxa),
phylum = as.character(phylum),
class = as.character(class),
order = as.character(order),
family = as.character(family),
genus = as.character(genus))
<- sia.data %>% select(Taxa:n)
sia.data.only
<- left_join(sia.data.only, tax.data.tibble, by = "Taxa")
sia.taxa.data
write.csv(sia.taxa.data, file = "./Data/SIA.and.taxonomy.csv")
The WORMS function also gave some nonsense values that needed to be corrected. This was done by merging a “data dictionary,” which was created outside of the working data frame.
<- read.csv("./Data/Replacement taxonomy.csv") %>% as_tibble %>%
dictionary mutate_all(na_if, "")
<- read.csv("./Data/SIA.and.taxonomy.csv") %>%
sia.and.taxonomy
as_tibble
$order[which(sia.and.taxonomy$order == "Ovalentaria incertae sedis")] <- NA
sia.and.taxonomy$order[which(sia.and.taxonomy$order == "Eupercaria incertae sedis")] <- NA
sia.and.taxonomy
<- sia.and.taxonomy %>% rows_update(dictionary, by= "Taxa")
updated.taxa
write.csv(updated.taxa, file = "./Data/clean.SIA.taxonomy.csv")
The stable isotope values from the meta-analysis were reported with different metrics of uncertainty. Because we wanted to take pseudo-samples from our literature values, we converted all error metrics into standard deviations. Both standard error and 95% confidence intervals were converted to standard deviations, and all reported and calculated standard deviations were combined into a single column.
<- read.csv("./Data/clean.SIA.taxonomy.csv")
updated.taxa
<- updated.taxa %>% mutate(n.df = n-1) %>%
step1 ::filter(!is.na(`X95CI.deltaN`))%>%
dplyrmutate(SD.from95CI.deltaC = `X95CI.deltaC` / qt(p = 0.975, df = (n.df)),
SD.from95CI.deltaN = `X95CI.deltaN` / qt(p = 0.975, df = (n.df))) %>%
select(X, SD.from95CI.deltaC, SD.from95CI.deltaN)
<- left_join(updated.taxa, step1, by = "X")
step2
<- step2 %>% mutate(sqrt.n = sqrt(n)) %>%
step3 mutate(
SD.from.SE.deltaC = `SE.deltaC` * `sqrt.n`,
SD.from.SE.deltaN = `SE.deltaN` * `sqrt.n`) %>%
select(
X, Taxa, deltaC, deltaN, mean.deltaC, SD.deltaC, SD.from.SE.deltaC, SD.from95CI.deltaC,:genus
mean.deltaN, SD.deltaN, SD.from.SE.deltaN, SD.from95CI.deltaN, n, phylum
)
<- step3 %>%
only.SD mutate(all.SD.deltaC = coalesce(SD.deltaC, SD.from.SE.deltaC, SD.from95CI.deltaC),
all.SD.deltaN = coalesce(SD.deltaN, SD.from.SE.deltaN, SD.from95CI.deltaN)) %>%
select(-SD.deltaC, -SD.from.SE.deltaC, -SD.from95CI.deltaC, -SD.deltaN,
-SD.from.SE.deltaN, -SD.from95CI.deltaN) %>%
select(X:mean.deltaC, all.SD.deltaC, mean.deltaN, all.SD.deltaN, n:genus)
rm(step1, step2, step3)
write.csv(only.SD, file = "./Data/SIA.converted.SD.csv")
The next step was to take samples based on the mean, standard deviation, and number of individuals reported. This sampling was done over each taxonomic level, and samples were combined into a single data frame.
<- read.csv("./Data/SIA.converted.SD.csv")
only.SD
<- only.SD %>% dplyr::filter(!is.na(all.SD.deltaC)) %>% group_by(Taxa) %>%
boot.samp.CN.literature summarise(bootC = rnorm(n = max(`n`,1,na.rm = T), mean = `mean.deltaC`, sd = `all.SD.deltaC`),
bootN = rnorm(n = max(`n`,1,na.rm = T), mean = `mean.deltaN`, sd = `all.SD.deltaN`))
<- only.SD %>%
un.genus ::filter(!is.na(genus)) %>%
dplyr::filter(!is.na(all.SD.deltaC))
dplyr
<- c()
genus.samp for (j in 1:dim(un.genus)) {
<- rbind(genus.samp,
genus.samp data.frame(
Taxa = un.genus$genus[j],
bootC = rnorm(n = max(un.genus$`n`[j],1,na.rm = T), mean = un.genus$`mean.deltaC`[j], sd = un.genus$`all.SD.deltaC`[j]),
bootN = rnorm(n = max(un.genus$`n`[j],1,na.rm = T), mean = un.genus$`mean.deltaN`[j], sd = un.genus$`all.SD.deltaN`[j])))
}
# other taxonomic levels not shown, but followed an identical process
<- rbind(boot.samp.CN.literature,
boot.SIA.samples
genus.samp,
family.samp,
order.samp,
class.samp,
phylum.samp)
write.csv(boot.SIA.samples, file = "./Data/boot.SIA.samples.csv")
The full taxonomy for prey in the stomach content data were obtained following a similar workflow. This included identifying unique taxa, running the WORMS loop, joining data frames, and correcting nonsense values from WORMS.
require(worrms)
<- read_csv("./Data/Diet Matrix GOM clean taxonomy.csv")
diet.matrix <- diet.matrix %>% select(Species.Name, n:Prey.S, Frequency.of.Occurrence, X..IRI) %>%
diet.matrix mutate(gen.sp = paste(`Prey.G`, `Prey.S`))
$gen.sp <- str_replace(diet.matrix$gen.sp, " NA","")
diet.matrix
# Filter out if all taxonomic categories are NA
<- c()
ind for (j in 1:dim(diet.matrix)[1]) {
if (all(is.na(diet.matrix[j,3:7])))
<- c(ind, j) }
ind <- diet.matrix[-ind, ]
diet.matrix
# Remove unused objects
rm(j, ind)
# populate lowest taxonomic category at lowest possible
for (j in 1:dim(diet.matrix)[1]) {
if (stringr::str_detect(diet.matrix$gen.sp[j],"NA"))
$gen.sp[j] <- diet.matrix[j,3:7][which(!is.na(diet.matrix[j,3:7])) %>% max()] %>% unlist()
diet.matrix
}
<- diet.matrix$gen.sp %>% unique
un.taxa <- c()
diet.taxonomy for (j in 1:length(un.taxa)) {
<- try(wm_records_taxamatch(name = un.taxa[j]), silent = T)
taxonomy if ((taxonomy %>% as.character()) != "Error : (204) No Content - AphiaRecordsByMatchNames\n")
<- rbind(diet.taxonomy, c(un.taxa[j], taxonomy[[1]] %>% select(phylum:genus)))
diet.taxonomy
}
save(diet.taxonomy, file = "./Data/diet.taxonomy.data.RData")
load("./Data/diet.taxonomy.data.RData")
<- read_csv(file = "./Data/diet.dictionary.csv")
diet.dictionary
<- as_tibble(diet.taxonomy) %>%
tax.data rename("Prey.taxa" = "V1") %>%
mutate(Prey.taxa = as.character(Prey.taxa),
phylum = as.character(phylum),
class = as.character(class),
order = as.character(order),
family = as.character(family),
genus = as.character(genus))
$order[which(tax.data$order == "Ovalentaria incertae sedis")] <- NA
tax.data$order[which(tax.data$order == "Eupercaria incertae sedis")] <- NA
tax.data$order[which(tax.data$order == "Carangaria incertae sedis")] <- NA
tax.data
<- tax.data %>% rows_update(diet.dictionary, by= "Prey.taxa")
clean.diet.taxa
<- diet.matrix %>% mutate(Prey.taxa = gen.sp)
diet.matrix
<- left_join(diet.matrix, clean.diet.taxa, by = "Prey.taxa", multiple = "all")
updated.data.matrix
<- updated.data.matrix %>% select(Species.Name, n, Prey.taxa:genus, Frequency.of.Occurrence, X..IRI) %>%
updated.data.matrix rename(Prey.phylum = phylum,
Prey.class = class,
Prey.order = order,
Prey.family = family,
Prey.genus = genus)
# diet data with clean taxonomy, only including FOO and X..IRI from diet data
write.csv(updated.data.matrix, file = "./Data/diet.data.clean.taxonomy.csv")
The EcoDiet model takes values of frequency of occurrence. Oshima’s meta-analysis provided many metrics used in stomach content analysis; however, the only other metric found to have a linear relationship with frequency of occurrence (FOO) was index of relative importance (IRI). A new column was created that contained either the original FOO or calculated FOO. If both were present, then the original FOO was preferentially taken. Next, the weighted mean was taken for each predator/prey interaction.
<- read_csv(file = "./Data/diet.data.clean.taxonomy.csv")
diet.data
$Frequency.of.Occurrence <- diet.data$Frequency.of.Occurrence %>% as.numeric()
diet.data
<- lm(Frequency.of.Occurrence ~ X..IRI,data = diet.data) %>% coef()
FOO.IRI
<- diet.data %>% mutate(predict.FOO = X..IRI*FOO.IRI[[2]] + FOO.IRI[[1]])
diet.data <- diet.data %>% mutate(comb.FOO = coalesce(Frequency.of.Occurrence, `predict.FOO`))
diet.data <- diet.data %>% dplyr::filter(!is.na(`comb.FOO`))
diet.data
$n <- as.numeric(diet.data$n)
diet.data
# replace n = NA, with n = 1
<- diet.data %>% mutate(n = ifelse(is.na(n),1,n))
diet.data
# FOO.data has the weighted means of FOO
<- diet.data %>% group_by(Species.Name, Prey.taxa) %>% summarise(mean.FOO = weighted.mean(comb.FOO, n))
FOO.data
write.csv(FOO.data, file= "./Data/processed.diet.data.csv")
The stable isotope and stomach content data were filtered to contain only species that were in both data sets.
<- read_csv("./Data/processed.diet.data.csv")
diet.data <- read_csv("./Data/boot.SIA.samples.CSV")
boot.SIA
<- boot.SIA %>% dplyr::filter(!is.na(bootN))
boot.SIA <- boot.SIA %>% dplyr::filter(Taxa %in% c(diet.data$Species.Name, diet.data$Prey.taxa))
boot.SIA.reduce save(boot.SIA.reduce, file = "./Data/SIA.model.trial.1.RData")
<- diet.data %>% dplyr::filter(Species.Name %in% boot.SIA$Taxa, Prey.taxa %in% boot.SIA$Taxa)
diet.data.reduce save(diet.data.reduce, file = "./Data/diet.data.model.trial.1.RData")
Stomach content data go into the model as a matrix. An empty square matrix was formed with the dimensions of all taxa, and a loop was run to populate the empty matrix. The loop found the position of each specific taxa/taxa interaction in the data frame containing FOO values and populated the newly created matrix based on which iteration the loop was on. The model also required a row with the number of stomachs analyzed. This was set to 100 for predators and 0 for prey. The stable isotope data was in the correct format, but the columns were renamed.
load("./Data/diet.data.model.trial.1.RData")
<- diet.data.reduce$Species.Name %>% unique()
predators <- prey.name <- c(diet.data.reduce$Species.Name,diet.data.reduce$Prey.taxa) %>% unique()
pred.name <- matrix(0, nrow = length(pred.name), ncol = length(pred.name))
SCA.mat for (j in 1:length(pred.name)) {
for (k in 1:length(prey.name)) {
<- which(diet.data.reduce$Species.Name == pred.name[j] & diet.data.reduce$Prey.taxa == prey.name[k])
ind if (length(ind) > 0) {
<- diet.data.reduce$mean.FOO[ind] }
SCA.mat[k,j] rm(ind)
}
}
<- as_tibble(SCA.mat)
SCA.mat colnames(SCA.mat) <- prey.name
<- colnames(SCA.mat)
cols <- SCA.mat %>% mutate(across(all_of(cols), round))
SCA.mat <- as.data.frame(SCA.mat)
SCA.mat
rownames(SCA.mat) <- prey.name
<- rbind(SCA.mat,rep(0, times = dim(SCA.mat)[2]))
SCA.mat rownames(SCA.mat)[dim(SCA.mat)[1]] <- "full"
'full', names(SCA.mat) %in% predators] <- 100
SCA.mat[
<- SCA.mat
model.SCA
save(model.SCA, file = "./Data/model.ready.SCA.RData")
load("./Data/SIA.model.trial.1.RData")
load("./Data/model.ready.SCA.RData")
<- as_tibble(boot.SIA.reduce)
boot.SIA.reduce <- boot.SIA.reduce$Taxa %>% unique
sia.taxa
<- as_tibble(model.SCA)
stomach.throwaway <- colnames(stomach.throwaway)
sca.taxa
<- sia.taxa[c(which(!(sia.taxa %in% sca.taxa)))]
not.in
<- boot.SIA.reduce %>% dplyr::filter(!Taxa %in% not.in)
boot.SIA.reduce
<- boot.SIA.reduce %>% select(Taxa, bootC, bootN) %>%
boot.SIA.reduce rename(group = Taxa, d13C = bootC, d15N = bootN)
<- as.data.frame(boot.SIA.reduce)
model.SIA
save(model.SIA, file = "./Data/model.ready.SIA.RData")
Before being implemented into the model, the trophic connections present in the stomach content data were visualized. This was done using a network model.
require(igraph)
load("./Data/model.ready.SCA.RData")
<- names(model.SCA)
taxa <- model.SCA[rownames(model.SCA) %in% taxa,]
sca.matrix <- as.matrix(sca.matrix)
sca.matrix
<- graph_from_adjacency_matrix(sca.matrix, mode = "undirected")
network
V(network)$color <- "paleturquoise"
E(network)$color <- "darkgrey"
plot(network, vertex.label=NA, vertex.size = 3.5, layout= layout_nicely, asp = 0.75)
The cleaned data were implemented into the model, which took 6 days to run. The output contains the MCMC runs and various background model operations and is 2.2 GB. The portion of the model output relevant to our work was found and transformed.
require(EcoDiet)
require(rjags)
load("./Data/model.ready.SCA.RData")
load("./Data/model.ready.SIA.RData")
<- preprocess_data(stomach_data = model.SCA,
data biotracer_data = model.SIA,
trophic_discrimination_factor = c(0.8, 3.4),
literature_configuration = F)
plot_data(biotracer_data = model.SIA,
stomach_data = model.SCA)
plot_prior(data, literature_configuration = F)
<- "ecodiet.NGOM.dat.test.txt"
filename write_model(file.name = filename, literature_configuration = F, print.model = F)
<- run_model(filename, data, run_param = "long", parallelize = T)
mcmc_output
save(mcmc_output, file = "./Data/mcmc_output_GOMdata_long.rda")
require(EcoDiet)
load("./Data/model.ready.SCA.RData")
load("./Data/mcmc_output_GOMdata_long.rda")
<- mcmc_output$samples[[1]] %>% as.data.frame() %>% names()
names.mcmc <- data.frame(parameter = c(),
mcmc.samp. index1 = c(),
index2 = c(),
value = c())
for (j in 1:c(length(names.mcmc)-1)) {
<- strsplit(names.mcmc[j],",") %>% unlist()
x
if (stringr::str_detect(x[1], 'eta')) {
<- "eta"
p.name
}
if (stringr::str_detect(x[1], 'PI')) {
<- "PI"
p.name
}
1] <- stringr::str_replace(x[1],p.name,"")
x[1] <- as.numeric(stringr::str_remove(x[1],"[[:punct:]]"))
x[2] <- as.numeric(stringr::str_remove(x[2],"[[:punct:]]") )
x[
<- rbind(
mcmc.samp.
mcmc.samp., data.frame(parameter = p.name,
prey = x[1],
pred = x[2],
value = (as.numeric(mcmc_output$samples[[1]][,j]))))
}
$pred.name <- sort(colnames(model.SCA))[as.numeric(mcmc.samp.$pred)]
mcmc.samp.$prey.name <- sort(colnames(model.SCA))[as.numeric(mcmc.samp.$prey)]
mcmc.samp.
save(mcmc_samp., file = "./Data/mcmc.samp.RData")
The model is an R package and comes with built-in functions, including plotting functions. The built-in plotting functions do not require the above transformations but are difficult to interpret with our data. I created a function to plot the same metrics in a manner more suitable for our purposes, being more readable when multiple values are close together.
require(EcoDiet)
load("./Data/model.ready.SCA.RData")
load("./Data/model.ready.SIA.RData")
<- preprocess_data(stomach_data = model.SCA,
data biotracer_data = model.SIA,
trophic_discrimination_factor = c(0.8, 3.4),
literature_configuration = F)
load("./Data/mcmc_output_GOMdata_long.rda")
plot_results(mcmc_output, data, pred = "Archosargus probatocephalus")
load("./Data/mcmc.samp.RData")
<- function(predator){
eco.plot
<- function(x) {
graph <- quantile(x, probs = c(0.1, 0.1, 0.5, 0.9, 0.9))
r names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
r
}
<- mcmc.samp. %>% dplyr::filter(pred.name %in% predator)
plot.this
<- plot.this %>%
trophic.link.plot ::filter(parameter == "eta") %>%
dplyrmutate(prey.name = as_factor(prey.name)) %>%
mutate(prey.name = fct_reorder(prey.name, value)) %>%
ggplot(aes(x = value, y = prey.name)) +
stat_summary(fun.data = graph, geom = "boxplot", width = 0.3, fill = "thistle1") +
xlab(paste("Trophic Link probabilities of", predator)) +
ylab("Prey Item") +
theme_classic()
<- c(ggplot_build(trophic.link.plot)$layout$panel_params[[1]]$y$get_labels()) %>%
arrange.prey
as.factor
<- plot.this %>% dplyr::filter(parameter == "PI") %>%
diet.plot ggplot(aes(x = value, y = factor(prey.name, level = arrange.prey))) +
stat_summary(fun.data = graph, geom = "boxplot", width = 0.3, fill = "paleturquoise") +
xlab(paste("Diet Proportions of", predator)) +
ylab("Prey Item") +
theme_classic()
print(diet.plot)
print(trophic.link.plot)
ggarrange(trophic.link.plot, diet.plot, nrow = 2, ncol = 1)
# ggsave(paste0("./Figures/trophic.link.plot.", predator, ".tiff"))
}
eco.plot("Archosargus probatocephalus")