1 Syllabus

1.1 Special Topics: Data analysis using R

Date: 06 January 2023

Instructor: Dr. Robert Leaf

Office: GCRL Oceanography 119

Office Hours: Make a time to see me in-person or online.

Email:

Phone: 2-4296

Course Meeting Day and Time: T, Th, 4:00 to 5:15 PM

Course Meeting Location: GCRL’s Caylor Computer Lab

1.2 Course Description and Objectives

This course examines the fundamental concepts and techniques for programming in the R statistical programming language. I am convinced that data analysis, data manipulation, data visualization, and reproducible research necessitate command of quantitative tools. Although there are many specialized and general programming languages, the R programming language offers exceptional utility for analysis and is used widely in academia, industry, and by federal and state scientific groups. The demand for skilled data analysis practitioners is growing rapidly, and this course prepares you to tackle real-world data analysis challenges.

The primary components of the course:

  1. Introduce the basics of R programming

The course will introduce standard programming concepts, in particular code modularization, writing and using functions, and code reusability. We will also cover software engineering concepts such as project builds and code testing. Participants will establish a working knowledge of R, RStudio, and relevant packages.

  2. Review aspects of project organization

A typical data analysis project involves many components, each including several data files and different scripts with code. Keeping these files organized can be challenging and requires a suite of organizational tools.

  3. Perform operations on vectors and understand how to use advanced functions

Learn how to wrangle, analyze, and visualize data using base R operations and specialized packages (e.g., tidyverse and ggplot2).

  4. Promote a reproducible research workflow

Finally, we will examine how to write R Markdown documents for data presentation, which let you combine text and code in a single document.
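As a taste of the first and third components, here is a minimal sketch in base R of a small reusable function, a vectorized operation, logical indexing, and a lightweight test. The data values and the names `lengths_mm` and `mm_to_cm` are hypothetical and chosen only for illustration; they are not taken from the course materials.

```r
# Hypothetical example data: fish lengths in millimeters (made-up values).
lengths_mm <- c(212, 305, 188, NA, 264)

# A small reusable function (hypothetical name): millimeters to centimeters.
mm_to_cm <- function(x) {
  stopifnot(is.numeric(x))  # fail early on non-numeric input
  x / 10
}

# Arithmetic in R is vectorized, so the function works on the whole vector.
lengths_cm <- mm_to_cm(lengths_mm)

# Logical indexing: keep the non-missing lengths greater than 25 cm.
large_fish <- lengths_cm[!is.na(lengths_cm) & lengths_cm > 25]

# Summaries need an explicit choice about missing values.
mean_length_cm <- mean(lengths_cm, na.rm = TRUE)

# A lightweight code test: the function returns a known answer.
stopifnot(isTRUE(all.equal(mm_to_cm(100), 10)))
```

Wrapping the conversion in a function keeps the logic in one place, so a correction made there applies everywhere the function is used.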

1.3 At the conclusion of this course:

Students will be able to recognize problems that can be solved using statistical programming and reproducible research approaches. The skills of sharing, automation, and organization make research more reproducible. By practicing and reinforcing the use of quantitative tools, participants will be better able to gain insights that would otherwise remain hidden.

1.4 Course Materials

R for Data Science by H. Wickham and G. Grolemund (https://r4ds.had.co.nz/). This is R4DS in the syllabus.

bookdown: Authoring Books and Technical Documents with R Markdown by Y. Xie (https://bookdown.org/yihui/bookdown/). This is BD in the syllabus.

Tufte, E. R. (2001). The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press. This is Tufte in the syllabus.

1.5 Course Scheduling

| Class Number | Day | Topics and Assignments | Reading |
| --- | --- | --- | --- |
| 1 | Thursday, January 19, 2023 | Syllabus, RStudio and R, R Packages, Useful Shortcuts | Leaf 01, 02, 03, R4DS 08 |
| 2 | Tuesday, January 24, 2023 | Leaf Lab at Bays and Bayous Jan. 24 and 25. | |
| 3 | Thursday, January 26, 2023 | Data input and output | Leaf 04 |
| 4 | Tuesday, January 31, 2023 | Data Classes | Leaf 05 |
| 5 | Thursday, February 2, 2023 | Leaf at Southern Division of AFS Meeting | |
| 6 | Tuesday, February 7, 2023 | Working in R | Leaf 06 |
| 7 | Thursday, February 9, 2023 | Indexing and Logical Operators | Leaf 08 |
| 8 | Tuesday, February 14, 2023 | Loops | Leaf 09 |
| 9 | Thursday, February 16, 2023 | Leaf Lab at MS Chapter of the AFS Annual Meeting | |
| 10 | Tuesday, February 21, 2023 | Mardi Gras Holiday | |
| 11 | Thursday, February 23, 2023 | Loops (more loops) | Leaf 09 |
| 12 | Tuesday, February 28, 2023 | R for Data Science, Script Anatomy and Organization, Assignment 01 Due | R4DS 01, 02 |
| 13 | Thursday, March 2, 2023 | Data Transformation | R4DS 05 |
| 14 | Tuesday, March 7, 2023 | Data Transformation | R4DS 05 |
| 15 | Thursday, March 9, 2023 | Pipes | R4DS 18 |
| 16 | Tuesday, March 14, 2023 | USM Spring Break | |
| 17 | Thursday, March 16, 2023 | USM Spring Break | |
| 18 | Tuesday, March 21, 2023 | Data Wrangling and Tibbles | R4DS 09, 10 |
| 19 | Thursday, March 23, 2023 | Tidy Data | R4DS 12 |
| 20 | Tuesday, March 28, 2023 | Relational Data | R4DS 13 |
| 21 | Thursday, March 30, 2023 | Factors | R4DS 15 |
| 22 | Tuesday, April 4, 2023 | Dates and Times | R4DS 16 |
| 23 | Thursday, April 6, 2023 | Pipes, Assignment 02 Due | R4DS 19 |
| 24 | Tuesday, April 11, 2023 | Graphical Display | Tufte |
| 25 | Thursday, April 13, 2023 | ggplot2 I | R4DS 03 |
| 26 | Tuesday, April 18, 2023 | ggplot2 II | R4DS 03 |
| 27 | Thursday, April 20, 2023 | ggplot2 III | R4DS 28 |
| 28 | Tuesday, April 25, 2023 | Rmarkdown I, Assignment 03 Due | R4DS 27, 29, 30 |
| 29 | Thursday, April 27, 2023 | Rmarkdown II | R4DS 27, 29, 30 |
| 30 | Tuesday, May 2, 2023 | Preliminary Project Presentations I | |
| 31 | Thursday, May 4, 2023 | Bookdown | BD 01 to BD 02 |
| 32 | May 8 to May 11, 2023 | Final Project Presentation - Date and Time TBD, Assignment 04 Due | |

1.6 Course Workload Statement

Students are expected to invest considerable time outside of class in learning the material for this course. The expectation of the University of Southern Mississippi is that students should spend approximately 2 to 3 hours outside of class each week for every hour in class working on reading, assignments, studying, and other work for the course. Time management is thus critical for student success. All students should assess their personal circumstances and talk with their advisors about the appropriate number of credit hours to take each term. Resources for academic support can be found at https://www.usm.edu/success.

1.7 Course Evaluation

| Percentage | Letter Grade |
| --- | --- |
| 93-100 | A |
| 90-92 | A- |
| 86-89 | B+ |
| 83-85 | B |
| 80-82 | B- |
| 76-79 | C+ |
| 73-75 | C |
| 70-72 | C- |
| 66-69 | D+ |
| 63-65 | D |
| 60-62 | D- |
| < 60 | F |

1.8 Assignment Policy and Procedures

All assigned work (Assignments and Project) is due at the beginning of class on its assigned due date. Submit your code to me by email as .R files. I will check that the code runs properly, grade the assignment, and provide feedback within five business days. Late work will not be given full credit.

To receive full credit, all code must run on my machine and return all required components of the assignment. You may resubmit any assignment as many times as necessary to ensure that you receive credit.
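One way to check that a script runs from start to finish before submitting it is sketched below. This is an illustration only, not the instructor's grading procedure: the temporary script and the names `script_path`, `check_env`, and `ran_ok` are hypothetical stand-ins for your own assignment file.

```r
# Write a tiny stand-in script to a temporary file (hypothetical content);
# in practice, script_path would point to your own assignment .R file.
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)", "result <- mean(x)"), script_path)

# Run the script in its own environment so the objects it creates
# do not overwrite anything in the current workspace.
check_env <- new.env(parent = globalenv())

# tryCatch() turns an error anywhere in the script into a FALSE result
# with a message, instead of stopping the session.
ran_ok <- tryCatch({
  source(script_path, local = check_env)
  TRUE
}, error = function(e) {
  message("Script failed: ", conditionMessage(e))
  FALSE
})

ran_ok
```

Because `check_env` still inherits from the global environment, objects already in your workspace remain visible to the script. Running the file in a fresh R session (for example with `Rscript`) is the stricter check.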

1.9 Grading scale

| Evaluation type | Number | Points per item | Total points |
| --- | --- | --- | --- |
| Assignments | 4 | 10 | 40 |
| Preliminary Project Presentations | 1 | 20 | 20 |
| Final Project | 1 | 10 | 10 |

1.10 Content of this online material

The material presented on this site is derived from several online and published sources. These sources are not explicitly cited in the text. Instead, the material is intended to be read alongside the following books (on reserve in the library).

  • Crawley, M. J. (2013). The R Book. New York: Wiley. ISBN: 9781118448908, 1118448901, 9781118448946, 1118448944, 9781118448960, 1118448960

  • Teetor, P. (2011). R Cookbook. Beijing: O’Reilly. ISBN: 9780596809157, 0596809158

  • Tufte, E. R. (2001). The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press. ISBN: 0-9613921-4-2

  • Wickham, H. (2014). Advanced R. Routledge. ISBN-13: 978-1466586963

  • Wickham, H. and Grolemund, G. (2017). R for Data Science. O’Reilly Media. ISBN-13: 978-1491910399

  • Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (Use R). Springer. ISBN-13: 978-3319242750