In this case we are using the built-in dataset on air quality measurements in New York from May through September 1973.

# Find the largest wind speed, then use it in 'seq()' to create the sequence of positions for the tick marks:
seq(from = 0, to = max(airquality$Wind, na.rm = TRUE), by = 2)

The ‘padj’ and ‘hadj’ (perpendicular adjust and horizontal adjust) arguments nudge the tick-mark label text so that it lines up more neatly. Play around with those values to see what you get. Finally, setting the ‘las’ argument to 2 turns the y-axis tick-mark labels horizontal so that they are easier to read and all fit neatly on the axis.

Next, add in the axis name labels using the ‘mtext’ function. It’s always important to add units to these labels, which I did. The ‘line’ argument is how far from the edge of the plot you want the label to appear. ‘cex’ affects the size of the text, and finally, ‘font’ is used to make the text bold. You can set ‘font’ to 1, 2, 3, or 4 for normal, bold, italic, or bold italic, respectively. Play around with all those parameters to see how they change the figure.

# draw the points without the default axes and axis titles:
plot(Ozone ~ Wind, data = airquality, xaxt = "n", yaxt = "n", ylab = "", xlab = "", pch = 16, cex = 1.5)
# add the wind speed axis:
axis(side = 1, at = seq(0, max(airquality$Wind, na.rm = TRUE), 2), padj = -0.8)
# add the Ozone axis:
axis(side = 2, at = seq(0, max(airquality$Ozone, na.rm = TRUE), 20), hadj = 0.8, las = 2)
# add in the labels for each axis:
mtext(side = 1, "Wind Speed (mph)", line = 2.8, cex = 1.5, font = 2)
mtext(side = 2, "Ozone Concentration (ppb)", line = 2.8, cex = 1.4, font = 2)

Almost done! I like to add a bit of white space around the edges of the points so that they don’t experience any “edge effects” and so that you can figuratively “stand back” when looking at all of the data. There’s also no reason wind speed shouldn’t start at zero, since we’re close to that anyway.
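As a minimal sketch of the white-space idea (padding the axis limits so the points do not sit on the plot edges, and starting both axes at zero), one option is to pass explicit ‘xlim’ and ‘ylim’ values to ‘plot’. The helper name ‘padded_limits’ and the 10% padding are hypothetical choices for illustration, not values from the original post.

```r
# Hypothetical helper (not from the original post): limits that start at
# zero and extend a fraction 'pad' beyond the largest observed value.
padded_limits <- function(x, pad = 0.1) {
  upper <- max(x, na.rm = TRUE)
  c(0, upper * (1 + pad))
}

data(airquality)  # built-in dataset that ships with R

x_lim <- padded_limits(airquality$Wind)
y_lim <- padded_limits(airquality$Ozone)

# Same plot as before, now with explicit, padded axis limits.
plot(Ozone ~ Wind, data = airquality,
     xaxt = "n", yaxt = "n", xlab = "", ylab = "",
     pch = 16, cex = 1.5,
     xlim = x_lim, ylim = y_lim)

# Tick marks run from zero up to the padded upper limits.
axis(side = 1, at = seq(0, x_lim[2], by = 2), padj = -0.8)
axis(side = 2, at = seq(0, y_lim[2], by = 20), hadj = 0.8, las = 2)

mtext(side = 1, "Wind Speed (mph)", line = 2.8, cex = 1.5, font = 2)
mtext(side = 2, "Ozone Concentration (ppb)", line = 2.8, cex = 1.4, font = 2)
```

Try a few values of ‘pad’ to see how much breathing room looks right for your data.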
Here’s a typical example of the type of plot I’ve seen one too many times:

There are several issues here, but without elaborating, here are the same data after a few visual tweaks:

In other words, while the data may be accurate, the actual visual design of scatterplots is often overlooked. Unlike a statistical test, the goal of a data visualization is subjective: to help a viewer understand a particular relationship or story. For that reason, it is important that we take a subjective, and dare I say aesthetic, approach to ensuring that scatterplots (and all other plot types, really) are visually appealing and easy to understand at a quick glance.

I have a hunch that the main reason plots such as the first one above are so common is simply that people don’t know how to easily customize plots in R. Unfortunately, even ggplot2, which is commended for the ease with which one can make good-quality visualizations, is not so pretty right out of the box. Here is a simple tutorial on how to re-create the nice version of the plot above using the ‘base’ R package. The key is just to include a few additional parameters and functions. In the future I may update this post with how to do this using ggplot2.

First, let’s load the data.
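Here is a minimal sketch of loading the data and drawing an untweaked plot. The ‘airquality’ dataset ships with R, so nothing needs to be downloaded. The default ‘plot’ call at the end is my assumption about what an uncustomized starting point looks like, not necessarily the exact figure from the original post.

```r
# 'airquality' is part of R's built-in 'datasets' package.
data(airquality)

# Look at the columns and their types.
str(airquality)

# Ozone contains missing values, which is why later calls use na.rm = TRUE.
sum(is.na(airquality$Ozone))

# Assumed starting point: the default scatterplot with no customization.
plot(Ozone ~ Wind, data = airquality)
```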
Scatterplots are one of the most common types of data visualizations you will encounter as a biologist. They present the relationship between two continuous variables. We might take them for granted because of their simplicity, but we shouldn’t assume that everyone reads and comprehends these figures as intuitively as it seems. They are a powerful tool, but one that I believe merits a bit more attention. Check out this really cool article from the New Yorker, ‘When graphs are a matter of life and death’, for more history on the subject.

All through my grad school years and beyond, I’ve repeatedly come across scatterplots that almost defeat the purpose of helping us easily understand the relationship between two variables.
R is known to be a really powerful programming language when it comes to graphics and visualizations (in addition to statistics and data science, of course!). To keep it short, graphics in R can be done in three ways.