Chapter 3 Aesthetics
3.1 Introduction
In this chapter, we will focus on the aesthetics i.e. color, shape, size, alpha,
line type, line width etc. We can map these to variables or specify values for
them. If we want to map the above to variables, we have to specify them within
the aes() function. We will look at both methods in the following sections.
Explore aesthetics such as
- color
- shape
- size
- fill
- alpha
- width
3.2 Libraries, Code & Data
We will use the following libraries in this chapter:
All the data sets used in this chapter can be found here and code can be downloaded from here.
3.2.1 Data
ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv')
ecom## # A tibble: 1,000 x 11
##       id referrer device bouncers n_visit n_pages duration country      purchase
##    <dbl> <chr>    <chr>  <lgl>      <dbl>   <dbl>    <dbl> <chr>        <lgl>   
##  1     1 google   laptop TRUE          10       1      693 Czech Repub~ FALSE   
##  2     2 yahoo    tablet TRUE           9       1      459 Yemen        FALSE   
##  3     3 direct   laptop TRUE           0       1      996 Brazil       FALSE   
##  4     4 bing     tablet FALSE          3      18      468 China        TRUE    
##  5     5 yahoo    mobile TRUE           9       1      955 Poland       FALSE   
##  6     6 yahoo    laptop FALSE          5       5      135 South Africa FALSE   
##  7     7 yahoo    mobile TRUE          10       1       75 Bangladesh   FALSE   
##  8     8 direct   mobile TRUE          10       1      908 Indonesia    FALSE   
##  9     9 bing     mobile FALSE          3      19      209 Netherlands  FALSE   
## 10    10 google   mobile TRUE           6       1      208 Czech Repub~ FALSE   
## # ... with 990 more rows, and 2 more variables: order_items <dbl>,
## #   order_value <dbl>3.2.2 Data Dictionary
- id: row id
- referrer: referrer website/search engine
- os: operating system
- browser: browser
- device: device used to visit the website
- n_pages: number of pages visited
- duration: time spent on the website (in seconds)
- repeat: frequency of visits
- country: country of origin
- purchase: whether visitor purchased
- order_value: order value of visitor (in dollars)
3.3 Color
In ggplot2, when we mention color or colour, it usually refers to the color
of the geoms. The fill argument is used to specify the color of the shapes in
certain cases. In this first section, we will see how we can specify the color
for the different geoms we learnt in the previous chapter.
3.3.1 Point
For points, the color argument specifies the color of the point for certain
shapes and border for others. The fill argument is used to specify the
background for some shapes and will not work with other shapes. Let us look at
an example:
ggplot(mtcars, aes(x = disp, y = mpg, color = factor(cyl))) +
  geom_point()
We can map the variable to color in the geom_point() function as well since
it inherits the data from the ggplot() function.
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(aes(color = factor(cyl)))
If you do not want to map a variable to color, you can specify it separately
using the color argument but in this case it should be outside the aes()
function.
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(color = 'blue')
Now we will change the shape of the points to understand the difference between
color and fill arguments. It can be again mapped to variables or values.
Let us map shape to variables.
ggplot(mtcars, aes(x = disp, y = mpg, shape = factor(cyl))) +
  geom_point()
Let us map shape to cyl in the geom_point() function. Remember, when you
are mapping an aesthetic to a variable, it must be inside aes().
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(aes(shape = factor(cyl)))
Instead of mapping shape to a variable, let us specify a value for shape. In
this case, shape is not wrapped inside aes() as we are not mapping it to
a variable.
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(shape = 5)
Let us specify a color for the point using color argument.
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(shape = 5, color = 'blue')
Background color cannot be added for all shapes. In the below example, we try
to modify the background color using the fill argument but it does not work.
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(shape = 5, fill = 'blue')
Since the shape number is now greater than 21, fill argument will add background color
in the below case.
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(shape = 22, fill = 'blue')
In shapes greater than number 21, color argument will modify the border of the shape.
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(shape = 22, color = 'blue')
Let us map size of points to a variable. It is advised to map size only to continuous variables and not categorical variables.
ggplot(mtcars, aes(x = disp, y = mpg, size = disp)) +
  geom_point()
If you map size to categorical variables, ggplot2 will throw a warning.
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(size = 4)
To modify the opacity of the color, use the alpha argument.
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(aes(alpha = factor(cyl)), color = 'blue')## Warning: Using alpha for a discrete variable is not advised.
3.4 Line Chart
So far we have focussed on geom_point() to learn how to map aesthetics to
variables. To explore line type and line width, we will use geom_line(). In
the previous chapter, we used geom_line() to build line charts. Now we will
modify the appearance of the line. In the section below, we will specify values
for color, line type and width. In the next section, we will map the same to
variables in the data. We will use a new data set. You can download it from
here. It
contains GDP (Gross Domestic Product) growth data for the BRICS (Brazil,
Russia, India, China, South Africa) for the years 2000 to 2005.
3.4.1 Data
gdp <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/gdp.csv')## Warning: Missing column names filled in: 'X1' [1]A line chart can be created using geom_line(). In the below example, we
examine the GDP trend of India and modify the color of the line to 'blue'.
ggplot(gdp, aes(year, india)) +
  geom_line(color = 'blue')
To modify the line type, use the linetype argument. It can take values
between 1 and 5.
ggplot(gdp, aes(year, india)) +
  geom_line(linetype = 2)
The line type can also be mentioned in the following way:
ggplot(gdp, aes(year, india)) +
  geom_line(linetype = 'dashed')
The width of the line can be modified using the size argument.
ggplot(gdp, aes(year, india)) +
  geom_line(size = 2)
Now let us map the aesthetics to the variables. The data used in the above
example cannot be used as we need a variable with country names. We will use
gather() function from the tidyr package to reshape the data.
gdp2 <- 
  gdp %>% 
  select(year, growth, india, china) %>% 
  gather(key = country, value = gdp, -year)
gdp2## # A tibble: 18 x 3
##    year       country   gdp
##    <date>     <chr>   <dbl>
##  1 2000-01-01 growth      6
##  2 2001-01-01 growth      9
##  3 2002-01-01 growth      8
##  4 2003-01-01 growth      9
##  5 2004-01-01 growth      9
##  6 2005-01-01 growth      8
##  7 2000-01-01 india       5
##  8 2001-01-01 india       9
##  9 2002-01-01 india       8
## 10 2003-01-01 india       8
## 11 2004-01-01 india       5
## 12 2005-01-01 india       7
## 13 2000-01-01 china       8
## 14 2001-01-01 china       5
## 15 2002-01-01 china       6
## 16 2003-01-01 china       8
## 17 2004-01-01 china       9
## 18 2005-01-01 china       8To map the aesthetics to a variable, we must use the group argument. In the
below example, we map the aesthetics to country. But we cannot distinguish
between the lines as their color, width and line type are the same. We have
easily plotted the GDP trend of all countries using the group argument. Now,
let us ensure that we can distinguish and identidy them using different
aesthetics.
ggplot(gdp2, aes(year, gdp, group = country)) +
  geom_line()
Let us begin by ensuring that the lines have different color using the
color argument within aes() and assigning it the variable country.
ggplot(gdp2, aes(year, gdp, group = country)) +
  geom_line(aes(color = country))
Instead of color, now we modify the line type using the linetype argument.
ggplot(gdp2, aes(year, gdp, group = country)) +
  geom_line(aes(linetype = country))
In the below instance, we assign different width to the lines using the size
argument.
ggplot(gdp2, aes(year, gdp, group = country)) +
  geom_line(aes(size = country))## Warning: Using size for a discrete variable is not advised.
Before we wrap up, let us quickly see how we can map aesthetics to variables for different plots.
3.5 Bar Plots
Here we create a stacked bar plot by mapping fill to purchase.
ggplot(ecom, aes(device, fill = purchase)) +
  geom_bar()
3.6 Histograms
Instead of a bar chart, we create a histogram and again map fill to
purchase.
ggplot(ecom) +
  geom_histogram(aes(duration, fill = purchase), bins = 10)
3.7 Box Plots
We repeat the same exercise below, but replace the bar plot with a box plot.
ggplot(ecom) +
  geom_boxplot(aes(device, duration, fill = purchase))
In all the above cases, you can observe that when we are mapping aesthetics
such as color, fill, shape, size or linetype to variables, they are all wrapped
inside aes().