Grammar of graphics

Lecture 2

2024-05-20

Warm up

Announcements

  • Don’t be intimidated! Anyone can code!

  • If you have not yet accepted the invite to join the course GitHub Organization, please do so asap!

  • Office hours + locations are now linked at https://sta199-summer24.github.io/. Please let me know if you aren’t able to attend any of them.

Announcements

  • Lab 1 after this class.
  • Catch up with the prepare materials
  • AEs should be submitted by midnight on Sunday. To “submit”, commit and push at least once to your ae repo for each application exercise this week.

From the survey

  • More than half of the class mentioned being nervous about coding – you’re not alone! 💙

  • A good number of students mentioned being nervous about being new to stats + data science – you’re also not alone! 💜

  • About exams: No coding on the in class (time-limited) exam + there will be practice exams

  • About teams: Peer evaluations + milestones throughout semester

  • Grading criteria: Scale of 0-5 for each exercise on lab + more details in lab instructions

  • Hardware: Code running on university containers

Toolkit: Version control and collaboration

Git and GitHub

Git logo

  • Git is a version control system – like “Track Changes” features from Microsoft Word, on steroids
  • It’s not the only version control system, but it’s a very popular one

GitHub logo

  • GitHub is the home for your Git-based projects on the internet – like DropBox but much, much better

  • We will use GitHub as a platform for web hosting and collaboration (and as our course management system!)

Versioning - done badly

Versioning - done better

Versioning - done even better

with human readable messages

How will we use Git and GitHub?

How will we use Git and GitHub?

How will we use Git and GitHub?

How will we use Git and GitHub?

Git and GitHub tips

  • There are millions of git commands – ok, that’s an exaggeration, but there are a lot of them – and very few people know them all. 99% of the time you will use git to add, commit, push, and pull.
  • We will be doing Git things and interfacing with GitHub through RStudio, but if you google for help you might come across methods for doing these things in the command line – skip that and move on to the next resource unless you feel comfortable trying it out.
  • There is a great resource for working with git and R: happygitwithr.com. Some of the content in there is beyond the scope of this course, but it’s a good place to look for help.

Tour: Git + GitHub

Just one option for now:

Sit back and enjoy the show! Remember you can access our github classroom through the course website.

Data visualization

Examining data visualization

Discuss the following for the visualization.

04:00

Source: Twitter

ae-02-bechdel-dataviz

  • Go to the course GitHub org and find your ae repo (repo name will be suffixed with your GitHub name).
  • Clone the repo in your container, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline (end of this week)

Recap of AE

  • Construct plots with ggplot().
  • Layers of ggplots are separated by +s.
  • The formula is (almost) always as follows:
ggplot(DATA, aes(x = X-VAR, y = Y-VAR, ...)) +
  geom_XXX()
  • Lots of geom_XXX() options: ggplot2.tidyverse.org/reference/!
  • Aesthetic attributes of a geometries (color, size, transparency, etc.) can be mapped to variables in the data or set by the user, e.g. color = binary vs. color = "pink".
  • Use facet_wrap() when faceting (creating small multiples) by one variable and facet_grid() when faceting by two variables.