R Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com. R Markdown works in RStudio, and the packages `rmarkdown` and `knitr` must be installed. In this handout we will discuss the basics of R Markdown. If you are able to duplicate this handout, you have covered the basics!
Starting a new R Markdown document is very easy. Simply follow the path File -> New File -> R Markdown, or use the drop-down menu in the top-left corner of RStudio. A dialog box will appear where you can enter the document title and author name, and select HTML, PDF, or Word as the output format. Choose HTML for this exercise. The default document comes with some basic code. You may update this code and start creating your own document. When you click the Knit button, a document will be generated that includes both the content and the output of any R code chunks embedded in the document.
Let’s first discuss how to add web links to a document. If you simply type the http address, it will appear as a link in the final document. For example, http://rmarkdown.rstudio.com. More often, we want to type a linked phrase, which can be done by typing the phrase in square brackets, followed by the http address in regular parentheses. For example, typing `[R Markdown](http://rmarkdown.rstudio.com)` produces the linked phrase R Markdown.
The R Markdown website has all the resources you need; just follow the Get Started tab to browse them. The site contains an excellent Cheat Sheet that you may consult anytime you use R Markdown. Another very useful introductory resource is the Markdown Basics.
As you may see in the Cheat Sheet (part 3, syntax), you may create an unordered list by simply putting an asterisk (`*`) at the beginning of each new line. Let us create an unordered list which includes the Basic Components of a Professional Document.
In what follows, we will address each component briefly.
A very important keystroke in R Markdown is the backtick: `` ` ``. While knitting, R Markdown does not process the text enclosed between two backticks; instead, it includes the text as typed, in a `typewriter font`. In general typesetting practice, this is known as verbatim mode. In this document we use verbatim mode to show a piece of syntax without producing its usual effect. If you cannot find the backtick key on your keyboard, simply copy it from here and paste.
As you may see in the Cheat Sheet (part 3, syntax) or the Markdown Basics, headers must start with `#` signs. More `#` signs produce smaller headers, indicating a subsection. Depending on your taste, a section or subsection may start with a number.
You may use `**boldface**` for boldface and `*italic*` for italic text. If you want to change the color of some text in HTML output, use `<span style="color:red">your text here</span>`.
In ordered lists, each item starts on a new line with a number followed by a period. In unordered lists, each item starts on a new line with `*`. See the Cheat Sheet for details and sub-items. Here is an example:
The leading typesetting tool in the scientific community is LaTeX (read as lay-tech), which is known for its beautiful mathematical expressions. LaTeX-style expressions can easily be included in an R Markdown document. If you want to produce an inline expression, just write the LaTeX expression inside `$` signs, which switches to math mode. For example, `$y=x^2$` will produce \(y=x^2\). See the section More on LaTeX Equations below for more details.
Simple tables may be created by separating columns with the `|` sign and adding `-|-` below the first row to indicate that this is a table. See the Cheat Sheet for a simple example and produce the following.
Column 1 | Column 2 |
---|---|
Cell a | Cell b |
Cell c | Cell d |
R data frames may be displayed as tables in R Markdown. Details are in the section Tables in R Markdown below.
The biggest advantage of using R Markdown is that R code may be embedded in the source document, and the output of this code may be included in the final document. The R code must be written inside so-called code chunks. You can quickly insert a code chunk into your file with one of these three methods:
* Keyboard shortcut Ctrl + Alt + I (OS X: Cmd + Option + I)
* The Add Chunk command in the editor toolbar (the green insert button in RStudio)
* Typing the chunk delimiters ```` ```{r} ```` and ```` ``` ````.
For example, let us examine the `mtcars` data set available in R. In this simple code, we will display the first six rows of data using `head()` and get a simple numerical summary using `summary()`.
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
As you see, the final document displays the original code and the R output generated from this code. Sometimes we want the output in the final document but not the code; other times we want the code but not the output; and sometimes we want both. To accommodate all these needs, you may use chunk options, which are set in the chunk header and control what appears in the output. Below are some basic options. More details on code chunks are available on the R Markdown website (http://rmarkdown.rstudio.com).
* `include = FALSE` prevents code and results from appearing in the finished file. R Markdown still runs the code in the chunk, and the results can be used by other chunks.
* `echo = FALSE` prevents code, but not the results, from appearing in the finished file. This is a useful way to embed figures.
* `message = FALSE` prevents messages that are generated by code from appearing in the finished file.

As practice, here is a short piece of code that calculates the mean of the `mpg` column in the `mtcars` data. I will set the `echo` parameter to `FALSE` by typing ```` ```{r,echo=FALSE} ```` in the chunk header, so you will see only the output, not the code.
## [1] 20.09062
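Because the chunk above hides its code, here is a minimal sketch of what such a chunk might contain. The variable name `mpg_mean` is an assumption for illustration; the handout does not show the original code. Running it should print the same value as the output above.

```r
# mtcars ships with base R (datasets package), so nothing needs to be loaded.
# mpg_mean is a hypothetical name; the handout's hidden code is not shown.
mpg_mean <- mean(mtcars$mpg)
print(mpg_mean)
```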
Hiding the code is especially useful for adding powerful R graphics to your documents. Here is a simple example producing the histogram of the `mpg` column in the `mtcars` data. The code is hidden.
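The hidden code is not part of this handout, so the following is only a sketch of one way to draw such a histogram with base R. The title, axis label, and color are assumptions chosen for illustration.

```r
# Assumed reconstruction: the handout hides its plotting code, so the
# title, axis label, and color below are illustrative choices only.
h <- hist(mtcars$mpg,
          main = "Histogram of mpg",
          xlab = "Miles per gallon",
          col  = "steelblue")
```

Placing this inside a chunk with `echo=FALSE` in the header would show the plot in the knitted document without showing the code.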
We covered how to create simple tables above, but for larger tables this is not practical. Moreover, the way we displayed the `mtcars` data above is not very professional. Since we have the data available as an R data frame, we may use the `kable()` function of the `knitr` package to display the data nicely. Again, we will display only the first six rows to save space. As you see, this way of presenting data is much more professional.
library(knitr)
kable(mtcars[1:6,])
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
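As a small variation on the call above, `kable()` also accepts optional `digits` and `caption` arguments. The sketch below assumes the `knitr` package is installed; the caption text and the variable names `first_rows` and `tbl` are illustrative and not from the handout.

```r
library(knitr)

# head() returns the first six rows, the same rows as mtcars[1:6, ]
first_rows <- head(mtcars)

# digits rounds the numeric columns; caption adds a table title.
# The caption text is an illustrative assumption.
tbl <- kable(first_rows, digits = 1, caption = "First six rows of mtcars")
tbl
```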
Sometimes we may want to include images in our document. In R Markdown, this can be done with the `include_graphics()` function of the `knitr` package. You have to specify the path to your image file, which should have a `.pdf`, `.jpg`, or `.png` extension. Here is one example.
include_graphics("jamaica.jpg")
Beautiful mathematical typesetting is a very valuable skill. As mentioned above, R Markdown may call LaTeX functions to produce high-quality expressions. LaTeX is not easy to learn, and teaching it is not the purpose of this course. However, using it inside R Markdown is easy. Here is a nice web page summarizing some LaTeX functions. Let us practice some basics now.
* `$x^2$` will produce \(x^2\), `$H_0$` will produce \(H_0\)
* `$\alpha$` will produce \(\alpha\), `$\beta_0$` will produce \(\beta_0\)
* `$\sqrt{2}$` will produce \(\sqrt{2}\), `$\bar{x}$` will produce \(\bar{x}\), `$A\cup B$` will produce \(A\cup B\), `$\log{x}$` will produce \(\log{x}\)
* `$\frac{1}{2}$` will produce \(\frac{1}{2}\)

If you want to write a mathematical expression on its own line (a displayed equation), write the LaTeX expression inside `$$` signs. For example, `$$y=\frac{x^2}{2}.$$` will produce \[y=\frac{x^2}{2}.\] Let us write the simple linear regression model: \[
Y=\beta_0+\beta_1 X + \epsilon,
\] where \(Y\) is the dependent variable, \(X\) is the independent variable, \(\beta_0\) and \(\beta_1\) are the regression coefficients, and \(\epsilon\) is the random error term with \(\epsilon\sim N(0,\sigma)\). Let us write the probability density function (p.d.f.) of a normal distribution with mean \(\mu\) and standard deviation \(\sigma\):
\[
f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}},
\] where \(-\infty<x<\infty\), \(-\infty<\mu<\infty\), and \(\sigma>0\). Just for fun, let us plot the p.d.f. of a standard normal distribution. You may control the size of your plot using the `fig.height` and `fig.width` parameters by typing ```` ```{r,fig.height=3,fig.width=5} ```` in the code chunk header.
x <- seq(-4, 4, by = 0.1)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l", lwd = 2, col = "steelblue",
     main = "Standard Normal Distribution", xlab = "", ylab = "")
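As a quick check of the p.d.f. formula given above, the following sketch writes the formula out term by term and compares it with R's built-in `dnorm()`. The function name `normal_pdf` and the variable `max_diff` are hypothetical helpers introduced here for illustration.

```r
# Hypothetical helper (not from the handout): the normal p.d.f. written
# out exactly as in the formula, with mean mu and standard deviation sigma.
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-4, 4, by = 0.1)

# Largest absolute difference between the hand-written formula and dnorm()
max_diff <- max(abs(normal_pdf(x) - dnorm(x, mean = 0, sd = 1)))

# all.equal() compares the two vectors up to a small numerical tolerance
all.equal(normal_pdf(x), dnorm(x, mean = 0, sd = 1))
```

The two results should agree up to floating-point rounding, which confirms that `dnorm()` with `mean = 0` and `sd = 1` evaluates the formula for the standard normal distribution.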