Chapter 6 Scatter Plots
In this chapter, we will:
- build scatter plots
- modify the appearance of the points
- fit a regression line
6.2 Basic Plot
As we did in the previous chapter, let us begin by creating a scatter plot using
geom_point() to examine the relationship between displacement and miles per
gallon using the mtcars data.
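A minimal sketch of such a plot is given below. It assumes the ggplot2 package is installed and that displacement and miles per gallon are the disp and mpg columns of mtcars; the object name p is only illustrative.

```r
library(ggplot2)

# mtcars ships with base R: disp is displacement, mpg is miles per gallon
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

# Typing the object name draws the plot
p
```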
If you want to avoid overplotting, use the position argument and supply it the
value 'jitter'. It adds a small amount of random noise to the position of each
point and makes the plot easier to read.
Another way to avoid overplotting is to use geom_jitter().
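The two approaches to jittering can be sketched as follows. The object names p1 and p2 are illustrative, and because the noise is random the points will land in slightly different places on each run.

```r
library(ggplot2)

# Option 1: request jittering through the position argument of geom_point()
p1 <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(position = 'jitter')

# Option 2: geom_jitter() is a shortcut for the same idea
p2 <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_jitter()

p1
p2
```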
Now let us modify the appearance of the points. There are two ways:
- specify values
- map them to variables using aes()
6.4.1 Specify Values
To modify the color of the points, you can use the color argument and supply
it a valid color name. For example, supplying 'blue' changes the color of all
the points to blue. Keep in mind that the color argument should be outside
aes().
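As a short sketch, the color can be set to 'blue' like this. Note that color is passed directly to geom_point() and not wrapped in aes(), since it is a fixed value and not a variable.

```r
library(ggplot2)

# color is a fixed value here, so it goes outside aes()
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(color = 'blue')

p
```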
The transparency of the points can be modified using the alpha argument. It
takes values between 0 (fully transparent) and 1 (fully opaque).
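For instance, the sketch below makes the points half transparent. The value 0.5 is an arbitrary choice for illustration.

```r
library(ggplot2)

# alpha = 0.5 is an illustrative value; smaller values are more transparent
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(color = 'blue', alpha = 0.5)

p
```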
The shape of the points can be modified using the shape argument. It
takes integer values between 0 and 25.
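A small sketch of changing the shape follows. The value 17, which draws filled triangles, is chosen only for illustration.

```r
library(ggplot2)

# shape = 17 draws filled triangles; any other valid shape code works too
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(shape = 17)

p
```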
The size of the points can be modified using the size argument. It can take
any value greater than 0.
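The sketch below enlarges the points. The value 3 is an arbitrary choice for illustration.

```r
library(ggplot2)

# size = 3 is an illustrative value; larger values give larger points
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(size = 3)

p
```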
6.4.2 Map Variables
So far, we have specified values for color, shape, size, etc. Now, let us map
them to variables using aes().
You can modify the color of the points by mapping color to a variable using
aes(). It allows you to examine the relationship between two continuous
variables at different levels of a categorical variable.
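As a sketch, color can be mapped to the number of cylinders. Using cyl as the categorical variable is an assumption made for illustration; it is stored as a number in mtcars, so it is wrapped in factor() first.

```r
library(ggplot2)

# color goes inside aes() because it is mapped to a variable;
# factor() turns the numeric cyl column into a categorical variable
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

p
```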
The color can be mapped to a continuous variable as well. In this case, you can examine the relationship between two continuous variables across the range of values of a third variable.
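The sketch below maps color to a continuous variable. The choice of hp (horsepower) as the third variable is an assumption made for illustration; any numeric column of mtcars would work. ggplot2 then draws a color gradient in place of distinct colors.

```r
library(ggplot2)

# hp is continuous, so the legend becomes a color gradient
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(aes(color = hp))

p
```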
Shape can be mapped only to categorical variables. For example, cyl is stored
as a number in mtcars, so use factor() to convert it to categorical data before
mapping shape to it.
ggplot2 will throw an error if you map shape to a continuous variable.
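Both cases can be sketched as follows. The first plot maps shape to factor(cyl). The second part maps shape to the raw numeric cyl and wraps the build step in tryCatch() so the error message can be inspected without stopping the script; the names p, bad, and msg are illustrative.

```r
library(ggplot2)

# Works: cyl is converted to a categorical variable first
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(aes(shape = factor(cyl)))
p

# Fails: cyl is continuous here, so building the plot raises an error
bad <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(aes(shape = cyl))
msg <- tryCatch(
  {
    ggplot_build(bad)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```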
Size should always be mapped to a continuous variable, such as one of the
numeric columns in mtcars.
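A sketch of mapping size to a continuous variable is shown below. The use of hp (horsepower) is an assumption made for illustration, not necessarily the variable used in the original example.

```r
library(ggplot2)

# Points for cars with more horsepower are drawn larger
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(aes(size = hp))

p
```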
If you map size to categorical data, ggplot2 will throw a warning:
## Warning: Using size for a discrete variable is not advised.
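A warning of this kind can be reproduced with a sketch like the one below, where size is mapped to factor(cyl). The choice of cyl is an assumption made for illustration. The plot is still drawn; the warning only signals that size is a poor fit for unordered categories.

```r
library(ggplot2)

# size is mapped to a categorical variable, which ggplot2 advises against
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(aes(size = factor(cyl)))

# The warning is raised when the plot is drawn
p
```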
6.5 Regression Line
geom_smooth() allows us to fit a regression line to the plot. If you do not
specify a method, it picks a smoother based on the size of the data (loess for
fewer than 1,000 observations). To fit the line using the least squares
technique, supply the value 'lm' to the method argument.
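A least squares line can be added with a sketch like this one. It layers geom_smooth() on top of the scatter plot of disp against mpg used earlier in the chapter.

```r
library(ggplot2)

# method = 'lm' fits a straight line by least squares
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  geom_smooth(method = 'lm')

p
```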
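The se argument controls the confidence interval around the regression line.
The interval is displayed if se is set to TRUE, which is the default, and
hidden if it is set to FALSE.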
Confidence Interval
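The effect of the se argument can be sketched by drawing the same least squares line with and without the confidence band. The object names p_band and p_line are illustrative.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

# se = TRUE draws a shaded confidence band around the line
p_band <- base + geom_smooth(method = 'lm', se = TRUE)

# se = FALSE draws the line only
p_line <- base + geom_smooth(method = 'lm', se = FALSE)

p_band
p_line
```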
Loess Method
The loess method fits a smooth local regression curve instead of a straight least squares line. To use it, supply the value 'loess' to the method argument.
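A loess fit can be sketched in the same way as the least squares fit, changing only the value supplied to the method argument.

```r
library(ggplot2)

# method = 'loess' fits a smooth local regression curve
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  geom_smooth(method = 'loess')

p
```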
6.5.1 Intercept & Slope
If you know the intercept and the slope of the line, you can use geom_abline()
to add it to the plot. Let us regress mpg on disp and then use the result to
add the line.
##
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
##
## Coefficients:
## (Intercept)         disp
##    29.59985     -0.04122
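The steps above can be sketched as follows: fit the model with lm(), pull the intercept and slope out with coef(), and pass them to geom_abline(). The object names fit and b and the red line color are illustrative choices.

```r
library(ggplot2)

# Regress mpg on disp; printing fit gives the output shown above
fit <- lm(mpg ~ disp, data = mtcars)
fit

# Extract the intercept and slope from the fitted model
b <- coef(fit)

# Draw the fitted line from its intercept and slope
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  geom_abline(intercept = b[["(Intercept)"]], slope = b[["disp"]],
              color = 'red')

p
```

Taking the coefficients from the model object avoids retyping rounded values by hand.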