Chapter 10 Histograms
In this chapter, we will learn to
- build a histogram
- specify bins
- modify the fill color, transparency, and border color
- modify the bin width, line type, and line size
- map aesthetics to variables
A histogram is a plot that can be used to examine the shape and spread of continuous data. It looks very similar to a bar graph and can be used to detect outliers and skewness in data. The histogram graphically shows the following:
- center (location) of the data
- spread (dispersion) of the data
- presence of multiple modes
To construct a histogram, the data is split into intervals called bins. The intervals may or may not be of equal size. For each bin, the number of data points that fall into it is counted (the frequency). The Y axis of the histogram represents the frequency and the X axis represents the variable.
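To make the binning step concrete, here is a minimal sketch in base R that counts the frequencies by hand. The durations and the bin edges are made-up values chosen for illustration; they are not taken from the chapter's data.

```r
# Hypothetical sample of visit durations in seconds (illustrative values only)
duration <- c(12, 48, 95, 130, 135, 208, 209, 459, 468, 693)

# Bin edges: five equal-width intervals covering 0 to 750 seconds
breaks <- seq(0, 750, by = 150)

# cut() assigns each value to an interval; right = FALSE makes intervals [a, b)
bins <- cut(duration, breaks = breaks, right = FALSE)

# table() counts the values in each interval, including empty intervals
frequency <- table(bins)
print(frequency)
```

The heights of the bars in a histogram are exactly these counts, with one bar drawn per interval.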
## # A tibble: 1,000 x 11
##       id referrer device bouncers n_visit n_pages duration country purchase
##    <dbl> <chr>    <chr>  <lgl>      <dbl>   <dbl>    <dbl> <chr>   <lgl>
##  1     1 google   laptop TRUE          10       1      693 Czech ~ FALSE
##  2     2 yahoo    tablet TRUE           9       1      459 Yemen   FALSE
##  3     3 direct   laptop TRUE           0       1      996 Brazil  FALSE
##  4     4 bing     tablet FALSE          3      18      468 China   TRUE
##  5     5 yahoo    mobile TRUE           9       1      955 Poland  FALSE
##  6     6 yahoo    laptop FALSE          5       5      135 South ~ FALSE
##  7     7 yahoo    mobile TRUE          10       1       75 Bangla~ FALSE
##  8     8 direct   mobile TRUE          10       1      908 Indone~ FALSE
##  9     9 bing     mobile FALSE          3      19      209 Nether~ FALSE
## 10    10 google   mobile TRUE           6       1      208 Czech ~ FALSE
## # ... with 990 more rows, and 2 more variables: order_items <dbl>,
## #   order_value <dbl>
10.2.1 Data Dictionary
- id: row id
- referrer: referrer website/search engine
- os: operating system
- browser: browser
- device: device used to visit the website
- n_pages: number of pages visited
- duration: time spent on the website (in seconds)
- repeat: frequency of visits
- country: country of origin
- purchase: whether visitor purchased
- order_value: order value of visitor (in dollars)
To create a histogram, we will use
geom_histogram() and specify the variable to plot inside
aes(). In the example below, we create a histogram of a single continuous variable.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
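A minimal sketch of a call that produces this kind of message is shown below. The data frame is a simulated stand-in for the chapter's data, and the choice of the duration column is an assumption made for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the chapter's data: 1,000 simulated visit durations
set.seed(10)
ecom <- data.frame(duration = round(runif(1000, min = 10, max = 1000)))

# Map the variable to x inside aes(); geom_histogram() bins it and draws the bars
p <- ggplot(ecom, aes(x = duration)) +
  geom_histogram()
print(p)
```

Since neither bins nor binwidth is supplied, ggplot2 falls back to 30 bins and prints the message above as a reminder to pick a value that suits the data.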
Now that we know how to create a histogram, let us learn to modify its
appearance. We will begin with the background color. Use the
fill argument to modify the background color of the histogram. In the case
below, we change the color of the histogram to ‘blue’.
As we have learnt before, the transparency of the background color can be
modified using the
alpha argument. It can take any value between 0 and 1.
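As a rough sketch, the background color and its transparency can be set together. The simulated data frame and the duration column are assumptions standing in for the chapter's data.

```r
library(ggplot2)

# Hypothetical stand-in for the chapter's data
set.seed(10)
ecom <- data.frame(duration = round(runif(1000, min = 10, max = 1000)))

# fill sets the background color of the bars; alpha sets its transparency
# (0 is fully transparent, 1 is fully opaque)
p <- ggplot(ecom, aes(x = duration)) +
  geom_histogram(bins = 30, fill = "blue", alpha = 0.3)
print(p)
```

Both values sit outside aes() because they are fixed settings rather than mappings to variables in the data.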
The color of the histogram border can be modified using the color argument.
The color can be specified either using its name or the associated hex code.
10.5 Putting it all together…
Let us modify the bins, the background and border color of the histogram in the below example.
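One possible version of such a combined call is sketched here. The data is simulated, and the number of bins and the colors are arbitrary choices for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the chapter's data
set.seed(10)
ecom <- data.frame(duration = round(runif(1000, min = 10, max = 1000)))

# bins sets the number of bins, fill the background color,
# and color the border color (here given as a hex code)
p <- ggplot(ecom, aes(x = duration)) +
  geom_histogram(bins = 7, fill = "blue", color = "#FFFFFF")
print(p)
```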
10.6 Bin Width
Another way to control the number of bins in a histogram is by using the
binwidth argument. In this case, we specify the width of the bins instead
of the number of bins. As you can see, in the example below, we do not use the
bins argument when using the
binwidth argument. If both are supplied, binwidth overrides bins, so specify
only one of them.
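A short sketch of the binwidth approach follows. The data is simulated and the width of 50 seconds is an arbitrary choice for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the chapter's data
set.seed(10)
ecom <- data.frame(duration = round(runif(1000, min = 10, max = 1000)))

# binwidth is expressed in the units of the variable: each bin spans 50 seconds
p <- ggplot(ecom, aes(x = duration)) +
  geom_histogram(binwidth = 50, fill = "blue", color = "white")
print(p)
```

With a fixed width, the number of bins follows from the range of the data instead of being chosen directly.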
10.7 Line Type
The line type of the histogram border can be modified using the linetype
argument. It can take any integer value between 0 and 6.
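The sketch below draws the bin borders with a dashed line. In ggplot2, the integer codes are 0 (blank), 1 (solid), 2 (dashed), 3 (dotted), 4 (dotdash), 5 (longdash) and 6 (twodash). The data is simulated for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the chapter's data
set.seed(10)
ecom <- data.frame(duration = round(runif(1000, min = 10, max = 1000)))

# linetype = 2 draws dashed borders; a border color is needed for them to show
p <- ggplot(ecom, aes(x = duration)) +
  geom_histogram(bins = 10, fill = "white", color = "blue", linetype = 2)
print(p)
```

The line type can also be given by name, for example linetype = "dashed".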
10.8 Line Size
Use the size argument to modify the width of the border of the histogram bins.
It can take any value greater than 0.
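A sketch of a thicker border is given below. Note that ggplot2 3.4.0 introduced linewidth as the argument for line widths and deprecated size for that purpose, so the example picks the argument based on the installed version. The data and the width of 1.5 are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the chapter's data
set.seed(10)
ecom <- data.frame(duration = round(runif(1000, min = 10, max = 1000)))

# Choose the border-width argument that matches the installed ggplot2 version
if (packageVersion("ggplot2") >= "3.4.0") {
  bars <- geom_histogram(bins = 10, fill = "white", color = "blue", linewidth = 1.5)
} else {
  bars <- geom_histogram(bins = 10, fill = "white", color = "blue", size = 1.5)
}

p <- ggplot(ecom, aes(x = duration)) + bars
print(p)
```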
10.9 Map Variables
You can map the aesthetics to variables as well. In the below example, we map
fill to the device variable. You can try mapping color, linetype and size to
variables as well.
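As a sketch of such a mapping, fill moves inside aes() and is tied to the device variable. The data frame is simulated, with device values modelled on those shown in the data preview.

```r
library(ggplot2)

# Hypothetical stand-in for the chapter's data: durations plus a device label
set.seed(10)
ecom <- data.frame(
  duration = round(runif(300, min = 10, max = 1000)),
  device   = sample(c("laptop", "mobile", "tablet"), 300, replace = TRUE)
)

# fill inside aes() assigns one color per device and adds a legend;
# by default the bars for each device are stacked within every bin
p <- ggplot(ecom, aes(x = duration, fill = device)) +
  geom_histogram(bins = 10)
print(p)
```

Unlike a fixed setting such as fill = "blue", a mapped aesthetic takes its colors from a scale, so the legend is generated automatically.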