Investigation of Trophic Dynamics in the northern Gulf of Mexico using EcoDiet

Calvin Chee

2023-04-26

Introduction
Data Description
Deliverables
Computational Methods

Introduction

The northern Gulf of Mexico is a taxonomically rich ecosystem; however, the trophic dynamics of the region are not well understood. This project seeks to explore the trophic dynamics of the northern Gulf of Mexico. To do so, an extensive source of stomach content and stable isotope data will be tidied and reformatted in order to be implemented into a newly developed model, EcoDiet.

Data Description

Data come from meta-analysis of literature of the region and raw values. Stomach content data comes from a meta-analysis by a previous student, Oshima, containing over 12,500 unique observations from 69 studies. Stable isotope data come from values collected by meta-analysis, summarized, and raw values.

load("isotope.for.plotting.RData")
isotope %>% filter(!is.na(deltaC)) %>% 
  ggplot(aes(x = deltaC, y = deltaN)) +
  geom_point() +
  xlab("Mean δ13C") + ylab("Mean δ15N") +
  theme_classic() +
  theme(plot.caption = element_text(hjust = 0.5))

Figure 1. 123 observations of summarized stable isotope values.

load("isotope.for.plotting.RData")
isotope %>%
  filter(!is.na(`mean deltaC`), !is.na(`mean deltaN`)) %>%
  ggplot(aes(x = `mean deltaC`, y = `mean deltaN`)) +
  geom_point(position = "jitter") +
  xlab("Mean δ13C") + ylab("Mean δ15N") +
  theme_classic() +
  theme(plot.caption = element_text(hjust = 0.5))

Figure 2. 630 stable isotope observations of 445 taxa from meta-analysis.

load("raw.isotope.for.plotting.RData")
ggplot(raw.iso.name, aes(x = D13C, y = D15N)) + 
  geom_point(show.legend = F) +
  xlab("δ13C") + ylab("δ15N") +
  theme_classic() +
  theme(plot.caption = element_text(hjust = 0.5))

Figure 3. Raw stable isotope values for 177 taxa. n = 4433

**Table 1.** Dimensions of unprocessed data files.
	Rows	Columns
Stomach Contents	12519	45
Raw Isotope	5114	24
Meta-analysis and Summarized Isotope	1001	18

Deliverables

Reconcile stable isotope and stomach content data
Integrate stable isotope and stomach content data into the EcoDiet model
Run the EcoDiet model
Use model outputs of trophic link probabilities and diet proportions to investigate predator-prey interactions, especially of species of management interest

Computational Methods

Figure 4. Workflow for computational methods section.

Tidying stable isotope data

Given the breadth of the data collected, significant tidying and selection of relevant data was required. For taxa from the stable isotope meta-analysis and summarized values, full taxonomy was not provided. The full taxonomy of each observation was required, as part of the process was to obtain stable isotope values for higher taxonomic levels. The full taxonomy was found by running a loop over all lowest available taxa with a WORMS function.

require(worrms)
sia.data <- read_csv("./Data/master.taxa.list.calvin.csv")
un.taxa <- sia.data$Taxa %>% unique


output <- c()
for (j in 1:length(un.taxa)) {
  taxonomy <- try(wm_records_taxamatch(name = un.taxa[j]), silent = T)
  if ((taxonomy %>% as.character()) != "Error : (204) No Content - AphiaRecordsByMatchNames\n") 
  output <- rbind(output, c(un.taxa[j], taxonomy[[1]] %>% select(phylum:genus)))
  print(j)
}

taxonomic.data <- output
rm(output)

save(taxonomic.data, file = "./Data/SIA.taxonomy.data.RData")

The two data frames (taxonomy and values) were merged into a single data frame. Before this could be done, the output of the WORMS function was mutated from a list into characters. Relevant columns were also selected from the data set to make it more manageable.

sia.data <- read_csv("./Data/master.taxa.list.calvin.csv")
load("./Data/SIA.taxonomy.data.RData")


tax.data.tibble <- as_tibble(taxonomic.data) %>%
  rename("Taxa" = "V1") %>%
  mutate(Taxa = as.character(Taxa),
         phylum = as.character(phylum),
         class = as.character(class),
         order = as.character(order),
         family = as.character(family),
         genus = as.character(genus))

sia.data.only <- sia.data %>% select(Taxa:n)

sia.taxa.data <- left_join(sia.data.only, tax.data.tibble, by = "Taxa")

write.csv(sia.taxa.data, file = "./Data/SIA.and.taxonomy.csv")

The WORMS function also gave some nonsense values that needed to be corrected. This was done by merging a “data dictionary,” which was created outside of the working data frame.

dictionary <- read.csv("./Data/Replacement taxonomy.csv") %>% as_tibble %>%
  mutate_all(na_if, "")

sia.and.taxonomy <- read.csv("./Data/SIA.and.taxonomy.csv") %>% 
  as_tibble


sia.and.taxonomy$order[which(sia.and.taxonomy$order == "Ovalentaria incertae sedis")] <- NA
sia.and.taxonomy$order[which(sia.and.taxonomy$order == "Eupercaria incertae sedis")] <- NA

updated.taxa <- sia.and.taxonomy %>% rows_update(dictionary, by= "Taxa")

write.csv(updated.taxa, file = "./Data/clean.SIA.taxonomy.csv")

The stable isotope values from the meta-analysis were reported with different metrics of uncertainty. Because we wanted to take pseudo-samples from our literature values, we converted all error metrics into standard deviations. Both standard error and 95% confidence intervals were converted to standard deviations, and all reported and calculated standard deviations were combined into a single column.

updated.taxa <- read.csv("./Data/clean.SIA.taxonomy.csv")


step1 <- updated.taxa %>% mutate(n.df = n-1) %>% 
  dplyr::filter(!is.na(`X95CI.deltaN`))%>% 
  mutate(SD.from95CI.deltaC = `X95CI.deltaC` / qt(p = 0.975, df = (n.df)),
         SD.from95CI.deltaN = `X95CI.deltaN` / qt(p = 0.975, df = (n.df))) %>% 
  select(X, SD.from95CI.deltaC, SD.from95CI.deltaN)

step2 <- left_join(updated.taxa, step1, by = "X")

step3 <- step2 %>% mutate(sqrt.n = sqrt(n)) %>% 
  mutate(
    SD.from.SE.deltaC = `SE.deltaC` * `sqrt.n`,
    SD.from.SE.deltaN = `SE.deltaN` * `sqrt.n`) %>% 
  select(
    X, Taxa, deltaC, deltaN, mean.deltaC, SD.deltaC, SD.from.SE.deltaC, SD.from95CI.deltaC,
    mean.deltaN, SD.deltaN, SD.from.SE.deltaN, SD.from95CI.deltaN, n, phylum:genus
  )


only.SD <- step3 %>% 
  mutate(all.SD.deltaC = coalesce(SD.deltaC, SD.from.SE.deltaC, SD.from95CI.deltaC),
         all.SD.deltaN = coalesce(SD.deltaN, SD.from.SE.deltaN, SD.from95CI.deltaN)) %>% 
  select(-SD.deltaC, -SD.from.SE.deltaC, -SD.from95CI.deltaC, -SD.deltaN,
         -SD.from.SE.deltaN, -SD.from95CI.deltaN) %>% 
  select(X:mean.deltaC, all.SD.deltaC, mean.deltaN, all.SD.deltaN, n:genus)

rm(step1, step2, step3)

write.csv(only.SD, file = "./Data/SIA.converted.SD.csv")

The next step was to take samples based on the mean, standard deviation, and number of individuals reported. This sampling was done over each taxonomic level, and samples were combined into a single data frame.

only.SD <- read.csv("./Data/SIA.converted.SD.csv")

boot.samp.CN.literature <- only.SD %>% dplyr::filter(!is.na(all.SD.deltaC)) %>% group_by(Taxa) %>% 
  summarise(bootC = rnorm(n = max(`n`,1,na.rm = T), mean = `mean.deltaC`, sd = `all.SD.deltaC`),
            bootN = rnorm(n = max(`n`,1,na.rm = T), mean = `mean.deltaN`, sd = `all.SD.deltaN`))

un.genus <- only.SD %>% 
  dplyr::filter(!is.na(genus)) %>% 
  dplyr::filter(!is.na(all.SD.deltaC)) 

genus.samp <- c()
for (j in 1:dim(un.genus)) {
  genus.samp <- rbind(genus.samp,
  data.frame(
    Taxa = un.genus$genus[j],
    bootC = rnorm(n = max(un.genus$`n`[j],1,na.rm = T), mean = un.genus$`mean.deltaC`[j], sd = un.genus$`all.SD.deltaC`[j]),
    bootN = rnorm(n = max(un.genus$`n`[j],1,na.rm = T), mean = un.genus$`mean.deltaN`[j], sd = un.genus$`all.SD.deltaN`[j])))
}

# other taxonomic levels not shown, but followed an identical process

boot.SIA.samples <- rbind(boot.samp.CN.literature,
                          genus.samp, 
                          family.samp,
                          order.samp, 
                          class.samp,
                          phylum.samp)

write.csv(boot.SIA.samples, file = "./Data/boot.SIA.samples.csv")

Tidying stomach content data

The full taxonomy for prey in the stomach content data were obtained following a similar workflow. This included identifying unique taxa, running the WORMS loop, joining data frames, and correcting nonsense values from WORMS.

require(worrms)

diet.matrix <- read_csv("./Data/Diet Matrix GOM clean taxonomy.csv")
diet.matrix <- diet.matrix %>% select(Species.Name, n:Prey.S, Frequency.of.Occurrence, X..IRI) %>% 
  mutate(gen.sp = paste(`Prey.G`, `Prey.S`))

diet.matrix$gen.sp <- str_replace(diet.matrix$gen.sp, " NA","")

# Filter out if all taxonomic categories are NA
ind <- c()
for (j in 1:dim(diet.matrix)[1]) {
  if (all(is.na(diet.matrix[j,3:7])))
    ind <- c(ind, j) }
diet.matrix <- diet.matrix[-ind, ]

# Remove unused objects
rm(j, ind)

# populate lowest taxonomic category at lowest possible
for (j in 1:dim(diet.matrix)[1]) {
  if (stringr::str_detect(diet.matrix$gen.sp[j],"NA"))
    diet.matrix$gen.sp[j] <- diet.matrix[j,3:7][which(!is.na(diet.matrix[j,3:7])) %>% max()] %>% unlist()
}


un.taxa <- diet.matrix$gen.sp %>% unique
diet.taxonomy <- c()
for (j in 1:length(un.taxa)) {
  taxonomy <- try(wm_records_taxamatch(name = un.taxa[j]), silent = T)
  if ((taxonomy %>% as.character()) != "Error : (204) No Content - AphiaRecordsByMatchNames\n")
    diet.taxonomy <- rbind(diet.taxonomy, c(un.taxa[j], taxonomy[[1]] %>% select(phylum:genus)))
}

save(diet.taxonomy, file = "./Data/diet.taxonomy.data.RData")

load("./Data/diet.taxonomy.data.RData")
diet.dictionary <- read_csv(file = "./Data/diet.dictionary.csv")

tax.data <- as_tibble(diet.taxonomy) %>%
  rename("Prey.taxa" = "V1") %>%
  mutate(Prey.taxa = as.character(Prey.taxa),
         phylum = as.character(phylum),
         class = as.character(class),
         order = as.character(order),
         family = as.character(family),
         genus = as.character(genus))

tax.data$order[which(tax.data$order == "Ovalentaria incertae sedis")] <- NA
tax.data$order[which(tax.data$order == "Eupercaria incertae sedis")] <- NA
tax.data$order[which(tax.data$order == "Carangaria incertae sedis")] <- NA

clean.diet.taxa <- tax.data %>% rows_update(diet.dictionary, by= "Prey.taxa")

diet.matrix <- diet.matrix %>% mutate(Prey.taxa = gen.sp)

updated.data.matrix <- left_join(diet.matrix, clean.diet.taxa, by = "Prey.taxa", multiple = "all")

updated.data.matrix <- updated.data.matrix %>% select(Species.Name, n, Prey.taxa:genus, Frequency.of.Occurrence, X..IRI) %>% 
  rename(Prey.phylum = phylum,
         Prey.class = class,
         Prey.order = order,
         Prey.family = family,
         Prey.genus = genus)

# diet data with clean taxonomy, only including FOO and X..IRI from diet data
write.csv(updated.data.matrix, file = "./Data/diet.data.clean.taxonomy.csv")

The EcoDiet model takes values of frequency of occurrence. Oshima’s meta-analysis provided many metrics used in stomach content analysis; however, the only other metric found to have a linear relationship with frequency of occurrence (FOO) was index of relative importance (IRI). A new column was created that contained either the original FOO or calculated FOO. If both were present, then the original FOO was preferentially taken. Next, the weighted mean was taken for each predator/prey interaction.

diet.data <- read_csv(file = "./Data/diet.data.clean.taxonomy.csv")


diet.data$Frequency.of.Occurrence <- diet.data$Frequency.of.Occurrence  %>% as.numeric()


FOO.IRI <- lm(Frequency.of.Occurrence ~ X..IRI,data = diet.data) %>% coef()

diet.data <-  diet.data %>% mutate(predict.FOO = X..IRI*FOO.IRI[[2]] + FOO.IRI[[1]])
diet.data <-  diet.data %>% mutate(comb.FOO = coalesce(Frequency.of.Occurrence, `predict.FOO`))
diet.data <-  diet.data %>% dplyr::filter(!is.na(`comb.FOO`))

diet.data$n <- as.numeric(diet.data$n)

# replace n = NA, with n = 1
diet.data <- diet.data %>% mutate(n = ifelse(is.na(n),1,n))

# FOO.data has the weighted means of FOO
FOO.data <- diet.data %>% group_by(Species.Name, Prey.taxa) %>% summarise(mean.FOO = weighted.mean(comb.FOO, n))

write.csv(FOO.data, file= "./Data/processed.diet.data.csv")

Reconciling taxa and reformatting

The stable isotope and stomach content data were filtered to contain only species that were in both data sets.

diet.data <- read_csv("./Data/processed.diet.data.csv")
boot.SIA <- read_csv("./Data/boot.SIA.samples.CSV")

boot.SIA <- boot.SIA %>% dplyr::filter(!is.na(bootN))
boot.SIA.reduce <- boot.SIA %>% dplyr::filter(Taxa %in% c(diet.data$Species.Name, diet.data$Prey.taxa))
save(boot.SIA.reduce, file = "./Data/SIA.model.trial.1.RData")

diet.data.reduce <- diet.data %>% dplyr::filter(Species.Name %in% boot.SIA$Taxa, Prey.taxa %in% boot.SIA$Taxa)
save(diet.data.reduce, file = "./Data/diet.data.model.trial.1.RData")

Stomach content data go into the model as a matrix. An empty square matrix was formed with the dimensions of all taxa, and a loop was run to populate the empty matrix. The loop found the position of each specific taxa/taxa interaction in the data frame containing FOO values and populated the newly created matrix based on which iteration the loop was on. The model also required a row with the number of stomachs analyzed. This was set to 100 for predators and 0 for prey. The stable isotope data was in the correct format, but the columns were renamed.

load("./Data/diet.data.model.trial.1.RData")

predators <- diet.data.reduce$Species.Name %>% unique()
pred.name <- prey.name <- c(diet.data.reduce$Species.Name,diet.data.reduce$Prey.taxa) %>% unique()
SCA.mat <- matrix(0, nrow = length(pred.name), ncol = length(pred.name))
for (j in 1:length(pred.name)) {
  for (k in 1:length(prey.name)) {
    
    ind <- which(diet.data.reduce$Species.Name == pred.name[j] & diet.data.reduce$Prey.taxa == prey.name[k])
    if (length(ind) > 0) {
      SCA.mat[k,j] <- diet.data.reduce$mean.FOO[ind]   }
    rm(ind)
  }
}

SCA.mat <- as_tibble(SCA.mat)
colnames(SCA.mat) <- prey.name
cols <- colnames(SCA.mat)
SCA.mat <- SCA.mat %>%  mutate(across(all_of(cols), round))
SCA.mat <- as.data.frame(SCA.mat)


rownames(SCA.mat) <- prey.name


SCA.mat <- rbind(SCA.mat,rep(0, times = dim(SCA.mat)[2]))
rownames(SCA.mat)[dim(SCA.mat)[1]] <-  "full"

SCA.mat['full', names(SCA.mat)  %in% predators] <- 100

model.SCA <- SCA.mat

save(model.SCA, file = "./Data/model.ready.SCA.RData")

load("./Data/SIA.model.trial.1.RData")
load("./Data/model.ready.SCA.RData")

boot.SIA.reduce <- as_tibble(boot.SIA.reduce)
sia.taxa <- boot.SIA.reduce$Taxa %>% unique

stomach.throwaway <- as_tibble(model.SCA)
sca.taxa <- colnames(stomach.throwaway)

not.in <- sia.taxa[c(which(!(sia.taxa %in% sca.taxa)))]

boot.SIA.reduce <- boot.SIA.reduce %>% dplyr::filter(!Taxa %in% not.in)

boot.SIA.reduce <- boot.SIA.reduce %>% select(Taxa, bootC, bootN) %>%
  rename(group = Taxa, d13C = bootC, d15N = bootN)
model.SIA <- as.data.frame(boot.SIA.reduce)

save(model.SIA, file = "./Data/model.ready.SIA.RData")

Before being implemented into the model, the trophic connections present in the stomach content data were visualized. This was done using a network model.

require(igraph)

load("./Data/model.ready.SCA.RData")

taxa <- names(model.SCA)
sca.matrix <- model.SCA[rownames(model.SCA) %in% taxa,]
sca.matrix <-  as.matrix(sca.matrix)

network <- graph_from_adjacency_matrix(sca.matrix, mode = "undirected")

V(network)$color <- "paleturquoise"
E(network)$color <- "darkgrey"

plot(network, vertex.label=NA, vertex.size = 3.5, layout= layout_nicely, asp = 0.75)

Figure 5. Network topology of n = 217 taxa from stomach content data.

Running the model and evaluating output

The cleaned data were implemented into the model, which took 6 days to run. The output contains the MCMC runs and various background model operations and is 2.2 GB. The portion of the model output relevant to our work was found and transformed.

require(EcoDiet)
require(rjags)

load("./Data/model.ready.SCA.RData")
load("./Data/model.ready.SIA.RData")


data <- preprocess_data(stomach_data = model.SCA,
                        biotracer_data = model.SIA,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = F)

plot_data(biotracer_data = model.SIA,
          stomach_data = model.SCA)

plot_prior(data, literature_configuration = F)


filename <- "ecodiet.NGOM.dat.test.txt"
write_model(file.name = filename, literature_configuration = F, print.model = F)
mcmc_output <- run_model(filename, data, run_param = "long", parallelize = T)

save(mcmc_output, file = "./Data/mcmc_output_GOMdata_long.rda")

require(EcoDiet)

load("./Data/model.ready.SCA.RData")
load("./Data/mcmc_output_GOMdata_long.rda")

names.mcmc <- mcmc_output$samples[[1]] %>% as.data.frame() %>% names()
mcmc.samp. <- data.frame(parameter = c(),
                         index1 = c(),
                         index2 = c(),
                         value = c())

for (j in 1:c(length(names.mcmc)-1)) {
  
  x <- strsplit(names.mcmc[j],",") %>% unlist()
  
  
  if (stringr::str_detect(x[1], 'eta')) {
    p.name <- "eta"
  }
  
  if (stringr::str_detect(x[1], 'PI')) {
    p.name <- "PI"
  }
  
  x[1] <- stringr::str_replace(x[1],p.name,"")
  x[1] <- as.numeric(stringr::str_remove(x[1],"[[:punct:]]"))
  x[2] <- as.numeric(stringr::str_remove(x[2],"[[:punct:]]") )
  
  mcmc.samp. <- rbind(
    mcmc.samp., 
    data.frame(parameter = p.name, 
             prey = x[1], 
             pred = x[2], 
             value = (as.numeric(mcmc_output$samples[[1]][,j]))))
}

mcmc.samp.$pred.name <- sort(colnames(model.SCA))[as.numeric(mcmc.samp.$pred)]
mcmc.samp.$prey.name <- sort(colnames(model.SCA))[as.numeric(mcmc.samp.$prey)]

save(mcmc_samp., file = "./Data/mcmc.samp.RData")

The model is an R package and comes with built-in functions, including plotting functions. The built-in plotting functions do not require the above transformations but are difficult to interpret with our data. I created a function to plot the same metrics in a manner more suitable for our purposes, being more readable when multiple values are close together.

require(EcoDiet)

load("./Data/model.ready.SCA.RData")
load("./Data/model.ready.SIA.RData")

data <- preprocess_data(stomach_data = model.SCA,
                        biotracer_data = model.SIA,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = F)

load("./Data/mcmc_output_GOMdata_long.rda")


plot_results(mcmc_output, data, pred = "Archosargus probatocephalus")

Figure 6. EcoDiet plotting function for Diet Proportion.

Figure 7. EcoDiet plotting function for Trophic Link Probability.

load("./Data/mcmc.samp.RData")

eco.plot <- function(predator){
  
  graph <- function(x) {
    r <- quantile(x, probs = c(0.1, 0.1, 0.5, 0.9, 0.9))
    names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
    r
  }
  
  plot.this <- mcmc.samp. %>% dplyr::filter(pred.name %in% predator)

  
  trophic.link.plot <- plot.this %>% 
    dplyr::filter(parameter == "eta") %>% 
    mutate(prey.name = as_factor(prey.name)) %>% 
    mutate(prey.name = fct_reorder(prey.name, value)) %>% 
    ggplot(aes(x = value, y = prey.name)) +
    stat_summary(fun.data = graph, geom = "boxplot", width = 0.3, fill = "thistle1") +
    xlab(paste("Trophic Link probabilities of", predator)) +
    ylab("Prey Item") +
    theme_classic()
  
  
  arrange.prey <- c(ggplot_build(trophic.link.plot)$layout$panel_params[[1]]$y$get_labels()) %>% 
    as.factor
  
  
  diet.plot <- plot.this %>% dplyr::filter(parameter == "PI") %>% 
    ggplot(aes(x = value, y = factor(prey.name, level = arrange.prey))) +
    stat_summary(fun.data = graph, geom = "boxplot", width = 0.3, fill = "paleturquoise") +
    xlab(paste("Diet Proportions of", predator)) +
    ylab("Prey Item") +
    theme_classic()
  
  
  
  print(diet.plot)
  print(trophic.link.plot)
  ggarrange(trophic.link.plot, diet.plot, nrow = 2, ncol = 1)
  # ggsave(paste0("./Figures/trophic.link.plot.", predator, ".tiff"))
}

eco.plot("Archosargus probatocephalus")

Figure 8. Plots from created function.