CUNY Borough of Manhattan Community College Data Science Worksheet
1. This week, we’ll work with a dataset of plankton sampled in the Plum Island Estuary by the PIE Long Term Ecological Research site. The dataset is an Excel file containing both metadata and data. There’s a lot of information in it, and we’ll come back to this dataset a few times through the semester.
1A. Load up the plankton data using the readxl library, and generate a scatterplot of the relationship between Chlorophytes and TotalChlA. Is there more chlorophyll when there are more Chlorophytes?
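As a hint, here is a minimal sketch of the load-and-plot pattern. The plankton file's name and sheet layout are not given in this worksheet, so the sketch reads the `mtcars` sheet of `datasets.xlsx`, an example workbook that ships with readxl. The file, the sheet, and the columns `wt` and `mpg` are stand-ins for illustration, not the plankton data.

```r
library(readxl)
library(ggplot2)

# Path to an example workbook bundled with readxl
# (a stand-in for the plankton Excel file)
path <- readxl_example("datasets.xlsx")

# A workbook can hold several sheets (e.g., metadata and data),
# so list the sheet names before reading
sheets <- excel_sheets(path)
print(sheets)

# Read a single sheet into a tibble
cars <- read_excel(path, sheet = "mtcars")

# Scatterplot of one numeric column against another
p <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point()
p
```

For the homework, swap in the path to the plankton file, the sheet that holds the data, and the two columns named in the question.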
1B. Many processes can modify this relationship. They all tend to covary with distance from the mouth of the estuary, where it empties into the ocean and is highly saline. Does distance from the estuary mouth (Distance) affect the relationship between Chlorophytes and total chlorophyll? Can you see any pattern in how distance alters this relationship by coloring the points by Distance? Use something other than the default color scale.
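One way to color points by a continuous variable with a non-default scale is sketched below. It uses R's built-in `mtcars` data as a stand-in (with `disp` playing the role of the continuous variable), and the viridis "plasma" palette is just one possible choice, not a required one.

```r
library(ggplot2)

# Map a continuous column to point color, then replace the
# default blue gradient with a viridis palette
p_col <- ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) +
  geom_point(size = 2) +
  scale_color_viridis_c(option = "plasma")
p_col
```

Other continuous scales, such as `scale_color_gradient()` or `scale_color_distiller()`, can be dropped in at the same place.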
1C. As distance is continuous, any patterns might still be hard to see. What if we made a discrete variable out of distance using cut_interval and used facet_wrap to see its influence? What patterns do you see?
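The binning-then-faceting idea can be sketched as follows, again on the built-in `mtcars` data as a stand-in. The column `disp`, the new column name `disp_bin`, and the choice of three bins are all illustrative assumptions.

```r
library(ggplot2)

cars <- mtcars

# cut_interval() splits a continuous vector into n bins of equal range
# and returns a factor
cars$disp_bin <- cut_interval(cars$disp, n = 3)

# One panel per bin
p_facet <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ disp_bin)
p_facet
```

If equal-range bins leave some panels nearly empty, `cut_number()` makes bins with roughly equal numbers of observations instead.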
1D. As the estuary was sampled at times of year when temperature varied, and distance from the mouth might have a different effect under cold vs. warm temperatures, let’s look at whether temperature and distance act in concert using facets. What do you see if you create a discrete variable from temperature using cut_interval and then make a facet_grid plot looking at the effects of both temperature and distance from mouth?
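A two-way facet layout might look like the sketch below. It bins two continuous columns of the built-in `mtcars` data (`disp` and `hp`, stand-ins chosen only for illustration) and puts one on the rows and the other on the columns.

```r
library(ggplot2)

cars <- mtcars

# Bin two continuous columns into discrete factors
cars$disp_bin <- cut_interval(cars$disp, n = 3)
cars$hp_bin   <- cut_interval(cars$hp, n = 2)

# Rows show one binned variable, columns show the other
p_grid <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(rows = vars(hp_bin), cols = vars(disp_bin))
p_grid
```

Some row and column combinations may contain no points; an empty panel is itself information about how the two variables covary.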
1E. Last, are your answers from A-D made clearer or not by changing the scale of the x and y axes with log10 or any other transformation of the x or y axes? Why does a transformation help, or why not?
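Axis transformations in ggplot2 can be applied through the scale functions. The sketch below log-transforms both axes of a scatterplot of the built-in `mtcars` data, which is used here purely as a stand-in.

```r
library(ggplot2)

# Log10-transform both axes; tick labels stay in the original units
p_log <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
p_log
```

Keep in mind that zero or negative values have no finite log10, so ggplot2 drops them with a warning. `scale_x_sqrt()` and `scale_y_sqrt()` are alternatives when zeros matter.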
2. Let’s make this plot look good! Choose one of the plots that you worked on in part 1.
2A. Give it a title with ggtitle(). Change the x and y axis names with xlab() and ylab().
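A sketch of adding a title and axis names is shown below. `xlab()` and `ylab()` are one common way to rename axes in ggplot2 (`labs()` is another); the data are the built-in `mtcars` and the label text is only an example.

```r
library(ggplot2)

# Title plus human-readable axis names with units
p_lab <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Fuel economy falls with car weight") +
  xlab("Weight (1000 lbs)") +
  ylab("Miles per gallon")
p_lab
```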
2B. Now, let’s theme it using the ggthemes package. Look through the theme options it gives you. Choose one, and implement it (e.g., add theme_bw(base_size=12) to your plot). Why did you choose this theme? What about it aids in your visualization?
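Applying a ggthemes theme works the same way as adding any other layer. The sketch below assumes the ggthemes package is installed and picks `theme_few()` only as an example; the data are the built-in `mtcars`.

```r
library(ggplot2)
library(ggthemes)

# Add a complete theme from ggthemes as the last layer
p_theme <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_few(base_size = 12)
p_theme
```

Other complete themes in ggthemes, such as `theme_tufte()` or `theme_economist()`, can be substituted on the last line.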
2C. Extra credit: look at the theme help file. Customize your plot even more using theme() and justify your choices.
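Individual theme elements can be overridden with `theme()` after a complete theme has been applied. The particular settings below are arbitrary examples of what the help file offers, shown on the built-in `mtcars` data.

```r
library(ggplot2)

p_custom <- ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) +
  geom_point() +
  ggtitle("Fuel economy falls with car weight") +
  theme_bw(base_size = 12) +
  theme(
    legend.position  = "bottom",                   # move the legend under the plot
    panel.grid.minor = element_blank(),            # remove minor grid lines
    axis.title       = element_text(face = "bold"),# bold axis names
    plot.title       = element_text(hjust = 0.5)   # center the title
  )
p_custom
```

Because later layers win, the `theme()` call needs to come after the complete theme, otherwise the complete theme resets the customizations.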
3. What is your favorite data visualization? Grab a JPG of it and put it into this RMarkdown document (you’ll need to look at how to get images into RMarkdown documents, and you’ll need to submit the image to us along with the homework so we can compile the document). Bonus point if you archive (think zip files) the RMD and JPEGs and submit them together!
Now tell us why this is your favorite example of a data visualization.
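One way to pull an image into an RMarkdown document is `knitr::include_graphics()` inside an R chunk. In the sketch below, the file name `favorite_viz.jpg` is hypothetical; use the name of your own image, saved next to the .Rmd file.

```r
library(knitr)

# Hypothetical file name; the image should sit in the same folder as the .Rmd
img_path <- "favorite_viz.jpg"

# Only try to embed the image if it is actually there,
# so the document still compiles when the file is missing
if (file.exists(img_path)) {
  include_graphics(img_path)
} else {
  message("Image not found: ", img_path)
}
```

Outside of a code chunk, the plain markdown form `![My favorite visualization](favorite_viz.jpg)` does the same job.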
4. It’s time to start thinking about your final project. Either use your own data or find something in the datasets I’ve assembled for you. Find one dataset that you think might be interesting. Briefly describe it and make one plot from the data you can download.