Introduction to ggplot2

Author

Yu Cheng Hsu

Published

August 6, 2025

In this chapter, we will introduce ggplot2, a powerful plotting library in R for creating elegant and complex visualizations. We will guide you through the basics of ggplot2, including its grammar of graphics approach, and demonstrate how to create various types of plots.

What is ggplot2?

ggplot2 is an R package developed by Hadley Wickham (Wickham 2010) and is part of the tidyverse collection of R packages. Wickham received the COPSS Presidents’ Award for his contributions to the tidyverse. ggplot2 is based on the grammar of graphics (Wilkinson 2011). Like the other tidyverse packages, it follows a consistent framework, and it allows users to build plots layer by layer. This approach makes it highly flexible and intuitive once you understand its core concepts.

Basic Concepts of the Grammar of Graphics

Discussion

Based on the data we collected in the first lecture:

  • What kind of message can be expressed through visualization?
  • What kind of graph will you use?
  • Why will you choose this graph to express such an idea?

Layers of a graph from Wickham (2016b)

The concept of the grammar of graphics was proposed by Wilkinson (2011) (I cite a 2011 handbook chapter, but the idea was first described in his book The Grammar of Graphics, whose first edition appeared in 1999 and second edition in 2005). It describes the seven basic elements of a statistical graph:

  1. Data:
    • The information to visualize.
  2. Mapping:
    • How data variables connect to aesthetic attributes.
    • Displayed as x, y, color, shape, etc.
  3. Layer:
    • Combines geometric elements (geoms: points, lines, polygons) and statistical transformations (stats: e.g., binning for histograms, fitting models).
    • Represents what is visually displayed in the plot.
  4. Scales:
    • Map data values to aesthetic values (e.g., color, size).
    • Generate legends and axes for reading original data values.
  5. Coordinate System (Coord):
    • Defines how data is mapped to the plot plane.
    • Provides axes and gridlines (e.g., Cartesian, polar, map projections).
  6. Facet:
    • Breaks data into subsets for small multiple plots (also called conditioning or trellising).
  7. Theme:
    • Adjusts visual elements like fonts and colors.
    • Default settings in ggplot2 are carefully chosen, but customization may require references like Tufte (1990, 1997, 2001).

Although this approach can identify the individual elements of a statistical graph, it has several limitations:

  1. It does not tell you which graph to use.
  2. It does not translate directly into a programming language, so Wickham (2010) later implicitly modified these layers.
  3. It does not describe interactive graphs.

Choosing the visualization

Deciding which plot to use is sometimes ambiguous for the user. The following questions and decision flow chart can help you sort out which kind of graph you need (at least within the scope of this course):

  1. What is the purpose of displaying the graph?
  2. What type of data are you going to present?

flowchart LR
  A{Data type} -->|Continuous| B{Purpose}
  A{Data type}  -->|Discrete| C{Purpose}

  B{Purpose} -->|Exploration| D((Histogram/Boxplot))
  B{Purpose} -->|Association| E((Scatter plot))
  B{Purpose} -->|Association+time| T((Line plot))

  C{Purpose}-->|Exploration| F((Bar chart))
  C{Purpose} -->|Association| G((Tree map))
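
The same decision flow can be sketched as a small lookup function. The function name choose_plot() and its argument values are hypothetical and are not part of ggplot2; the mapping itself is copied from the flow chart above.

```r
# Hypothetical helper that mirrors the flow chart; not a ggplot2 function.
choose_plot <- function(data_type, purpose) {
  chart <- list(
    continuous = c(
      exploration      = "histogram or box plot",
      association      = "scatter plot",
      association_time = "line plot"
    ),
    discrete = c(
      exploration = "bar chart",
      association = "tree map"
    )
  )
  # Fail early on values that do not appear in the flow chart
  if (!data_type %in% names(chart)) {
    stop("data_type must be one of: ", toString(names(chart)))
  }
  choices <- chart[[data_type]]
  if (!purpose %in% names(choices)) {
    stop("purpose must be one of: ", toString(names(choices)))
  }
  unname(choices[purpose])
}

choose_plot("continuous", "association")
```

Writing the chart as a lookup table keeps the two questions (data type, then purpose) in the same order as the flow chart.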

Getting Started

To use ggplot2, you first need to install and load the package in R:
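
A minimal sketch of this step, assuming an internet connection and a configured CRAN mirror for the one-time installation:

```r
# Install ggplot2 only if it is not already available, then load it.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
```

Installation is needed once per machine, while library(ggplot2) has to be run in every new R session.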

Basic Plot Example

Data and mapping layer

Let’s create a simple scatter plot using the mtcars dataset, which is built into R:
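
A minimal sketch of such a scatter plot might look as follows. The name mtcars2 and the recoding of am as a labelled factor are assumptions, chosen so that the call matches the one discussed next.

```r
library(ggplot2)

# Assumption: mtcars2 is a copy of mtcars with `am` stored as a factor
# (in mtcars, am = 0 means automatic and am = 1 means manual).
mtcars2 <- transform(
  mtcars,
  am = factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
)

# Data and mapping go into ggplot(); geom_point() adds the scatter layer.
p <- ggplot(data = mtcars2, mapping = aes(x = mpg, y = cyl, colour = am)) +
  geom_point()
p
```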

From the code, you can see that the data and the mapping are specified in the first line, in the call to ggplot():

\[ \text{ggplot}(\text{data=}\underbrace{\text{mtcars2}}_{\text{data}}, \text{mapping=}\underbrace{\text{aes(x = mpg, y = cyl, colour=am)}}_{\text{mapping}}) \]

Occasionally (actually, very frequently), you will see people omit the argument names on the left-hand side (LHS) of the equals sign for data, mapping, x, and y. This works because they are the first arguments of ggplot() and aes(), so R matches them by position.
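
As an illustration, assuming the built-in mtcars data, the two calls below describe the same plot:

```r
library(ggplot2)

# Fully named arguments
p_named <- ggplot(data = mtcars, mapping = aes(x = mpg, y = cyl)) +
  geom_point()

# Same plot: data, mapping, x and y are matched by position.
# Other aesthetics (e.g. colour) must still be named.
p_short <- ggplot(mtcars, aes(mpg, cyl)) +
  geom_point()
```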

Besides color, there are other aesthetics you can map:

  • Size
  • Line
    • linetype
    • lineend
    • linejoin
  • Dot
    • Shape
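
A short sketch of mapping some of these aesthetics, assuming the built-in mtcars data and the economics_long data that ships with ggplot2 (the choice of variables is arbitrary):

```r
library(ggplot2)

# size and shape are mapped inside aes(), just like colour;
# shape needs a discrete variable, hence factor(gear)
p_aes <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp, shape = factor(gear))) +
  geom_point()

# linetype is mapped the same way for line layers
p_linetype <- ggplot(economics_long,
                     aes(x = date, y = value01, linetype = variable)) +
  geom_line()
```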

Layers

The functions that add layers are named in the format geom_XXXX:

Plot type       Function name
Histogram       geom_histogram()
Box plot        geom_boxplot()
Bar chart       geom_bar()
Scatter plot    geom_point()
Line chart      geom_line()

One variable
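
As a sketch, assuming the built-in mtcars data, a single variable is typically shown with a histogram (continuous) or a bar chart (discrete). The bin width below is an arbitrary choice.

```r
library(ggplot2)

# Continuous variable: histogram of miles per gallon
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Discrete variable: bar chart counting cars per number of cylinders
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()
```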

Two variables
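
As a sketch, assuming the built-in mtcars data and the economics data that ships with ggplot2, two variables can be shown as follows:

```r
library(ggplot2)

# Two continuous variables: scatter plot
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# One discrete and one continuous variable: box plot per group
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# A continuous variable over time: line plot
p_time <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()
```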

Scales

Scales are functions that map data values to what is drawn on the graph (positions, colors, sizes, and so on). ggplot2 usually picks a suitable scale automatically from the type of layer and the data, so in practice you mostly call scale functions to modify how an axis or legend is displayed. These functions are named in the format scale_(AES)_(datatype).

Aesthetic   Discrete                  Continuous
x           scale_x_discrete()        scale_x_continuous()
y           scale_y_discrete()        scale_y_continuous()
color       scale_color_discrete()    scale_color_continuous()

The arguments and their corresponding components are listed in the table and figure below.

Argument name   Axis         Legend
name            Label        Title
breaks          Ticks        Key
labels          Tick label   Key label

Common components of a figure, figure from Wickham (2016b)
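
A hedged sketch of these three arguments in use, assuming the built-in mtcars data; the label texts are made up for illustration:

```r
library(ggplot2)

p_scale <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(am))) +
  geom_point() +
  # Axis: name -> axis label, breaks -> tick positions, labels -> tick labels
  scale_x_continuous(
    name = "Weight (1000 lbs)",
    breaks = c(2, 3, 4, 5),
    labels = c("2", "3", "4", "5")
  ) +
  # Legend: name -> legend title, labels -> key labels
  scale_colour_discrete(
    name = "Transmission",
    labels = c("automatic", "manual")
  )
```

The same argument names work for axes and legends; only the component they control differs, as listed in the table.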

Coordinates and facet

Coordinates refer to the coordinate system of the graph. They can help you adjust your plot. With most of the data we will encounter, you will sometimes need to adjust the range of the x and y axes. This can be done through the arguments xlim=c(LOWER_BOUND,UPPER_BOUND) for the x-axis and ylim=c(LOWER_BOUND,UPPER_BOUND) for the y-axis in coord_cartesian().
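
For example, assuming the built-in mtcars data and arbitrary bounds, the visible range can be restricted like this:

```r
library(ggplot2)

# Zoom in on 2 <= wt <= 4 and 15 <= mpg <= 30.
# coord_cartesian() only changes the visible window; no data points are dropped.
p_zoom <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  coord_cartesian(xlim = c(2, 4), ylim = c(15, 30))
```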

Facets break the data into several subplots, one per subgroup. The layout is specified with a formula:

  1. One factor: ~ FACTOR_A generates one subplot per level of FACTOR_A (you can additionally specify nrow=x or ncol=x to control how many rows or columns the subplots spread over).
  2. Two factors: FACTOR_A ~ FACTOR_B spreads FACTOR_A over rows and FACTOR_B over columns, as in facet_grid().
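
A small sketch of both scenarios, assuming the built-in mtcars data with cyl and am as the grouping factors:

```r
library(ggplot2)

p_base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# One factor: one panel per level of cyl, laid out in a single row
p_wrap <- p_base + facet_wrap(~ cyl, nrow = 1)

# Two factors: in facet_grid(), the left-hand side (am) varies over rows
# and the right-hand side (cyl) varies over columns
p_grid <- p_base + facet_grid(am ~ cyl)
```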

From time to time you might want the subplots to share (or not share) the same scale (axis). You can specify this through the argument scales=XXX in facet_wrap(). The options for scales are listed below.

                free x axis   fixed x axis
free y axis     "free"        "free_y"
fixed y axis    "free_x"      "fixed"
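
For instance, assuming the built-in mtcars data, the following sketch lets each panel choose its own y-axis range while the x-axis stays shared:

```r
library(ggplot2)

# scales = "free_y": free y axis, fixed x axis
p_free <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl, scales = "free_y")
```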

Theme

There are several built-in options for the theme of your plot. There are also third-party packages that provide additional themes, such as ggthemes.
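
A minimal sketch using only themes and theme elements built into ggplot2; the specific settings are arbitrary:

```r
library(ggplot2)

p_theme <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(am))) +
  geom_point() +
  # Switch to a complete built-in theme first ...
  theme_minimal() +
  # ... then adjust individual elements on top of it
  theme(
    legend.position = "bottom",
    axis.title = element_text(size = 12)
  )
```

The order matters here: a complete theme such as theme_minimal() replaces earlier theme() settings, so individual tweaks go after it.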

Reading R documentation

It is by no means possible to introduce all the functions and attributes in this library, even though ggplot2 is a relatively stable library. The key to surviving in the coding world is to understand the mechanism and concepts behind the code. For the rest, you can check the official documentation from the library authors.

Wrap-up

As a wrap-up, your code is usually in the following format:

\[ \small \begin{aligned} \text{ggplot()}+ &\\ \underbrace{\text{geom\_XXXX(data=DATA,mapping=aes(x,y,color,...))}}_{\text{Plotting data}} + &\\ \underbrace{\text{scale\_AES\_TYPE(name="TITLE",breaks="TICK LOC",labels="TICK LAB")}}_{\text{Handling axes, legends, etc.}} + & \\ \underbrace{ \text{coord\_cartesian(xlim=c(MIN,MAX), ylim=c(MIN,MAX))}}_{\text{Adjusting the coordinate system}} + &\\ \underbrace{\text{ggtitle("CHART TITLE")}}_{\text{Plot title}} \end{aligned} \]
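
A concrete instance of this template, assuming the built-in mtcars data; the title, label, and axis limits are made up for illustration:

```r
library(ggplot2)

p_full <- ggplot() +
  # Plotting data
  geom_point(data = mtcars,
             mapping = aes(x = wt, y = mpg, colour = factor(am))) +
  # Handling the x axis: label, tick positions, tick labels
  scale_x_continuous(name = "Weight (1000 lbs)",
                     breaks = 2:5,
                     labels = c("2", "3", "4", "5")) +
  # Adjusting the coordinate system
  coord_cartesian(xlim = c(1, 6), ylim = c(10, 35)) +
  # Plot title
  ggtitle("Fuel economy by weight")
p_full
```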

About making graph

Having seen the code, it is worth reflecting on:

  • How does constructing a plot in code differ from how you would draw a graph by hand?
  • How do the seven elements of the grammar of graphics differ from the ggplot2 syntax?
  • If you could make ggplot2 easier to use, how would you design it?

Final remarks and acknowledgement

The materials and contents are mostly adapted from Wickham (2016a). You can get the latest edition from the book website, which also covers more advanced topics. For details on how to use the code and each function, you can find the documentation of the ggplot2 library through ??ggplot2.

Bibliography

Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28.
Wickham, Hadley. 2016a. “Getting Started with ggplot2.” In ggplot2: Elegant Graphics for Data Analysis, 11–31. Springer.
Wickham, Hadley. 2016b. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wilkinson, Leland. 2011. “The Grammar of Graphics.” In Handbook of Computational Statistics: Concepts and Methods, 375–414. Springer.