Chapter 1 Getting Started With R
1.1 What is R?
R is a freely available “computational language and environment for data analysis and graphics.” R is indispensable for anyone that uses and interprets data. As medical, public health, and research epidemiologists, we use R in the following ways:
 Fullfunction calculator
 Extensible statistical package
 Highquality graphics tool
 Multiuse programming language
We use R to explore, analyze, and understand epidemiological data. We analyze data straight out of tables provided in reports or articles as well as analyze usual data sets. The data might be a large, individuallevel data set imported from another source (e.g., cancer registry); an imported matrix of grouplevel data (e.g, population estimates or projections); or some data extracted from a journal article we are reviewing. The ability to quantitatively express, graphically explore, and describe epidemiologic data and processes enables one to work and strengthen one’s epidemiologic intuition.
In fact, we only use a very small fraction of the R package. For those who develop an interest or have a need, R also has many of the statistical modeling tools used by epidemiologists and statisticians, including logistic and Poisson regression, and Cox proportional hazard models. However, for many of these routine statistical models, almost any package will suffice (SAS, Stata, SPSS, etc.). The real advantage of R is the ability to easily manipulate, explore, and graphically display data. Repetitive analytic tasks can be automated or streamlined with the creation of simple functions (programs that execute specific tasks). The initial learning curve is steep, but in the long run one is able to conduct analyses that would otherwise require a tremendous amounts of programming and time.
Some may find R challenging to learn if they are not familiar with statistical programming. R was created by statistical programmers and is more often used by analysts comfortable with matrix algebra and programming. However, even for those unfamiliar with matrix algebra, there are many analyses one can accomplish in R without using any advanced mathematics, which would be difficult in other programs. The ability to easily manipulate data in R will allow one to conduct good descriptive epidemiology, life table methods, graphical displays, and exploration of epidemiologic concepts. R allows one to work with data in any way they come.
1.2 What is RStudio?
To get started quickly we need tools that makes the process of writing and compiling R code quick and (mostly) pain free. Fortunately for us there is RStudio—it is a free, open source, and powerful integrated development environment (IDE) for R that runs on most operating systems (Windows, Mac, or Linux). After installing R, install RStudio—it has all the tools we need to learn and apply R.
1.3 Who should learn R?
Anyone that uses a calculator or spreadsheet, or analyzes numerical data at least weekly should seriously consider learning and using R. This includes epidemiologists, statisticians, physician researchers, engineers, health economists, health systems analysts, business analysts, and faculty and students of mathematics and science courses, to name just a few. We jokingly tell our staff analysts that once they learn R they will never use a spreadsheet program again (well almost never!).
1.4 Why should I learn R?
To implement numerical methods we need a computational tool. On one end of the spectrum are calculators and spreadsheets for simple calculations, and on the other end of the spectrum are specialized computer programs for such things as statistical and mathematical modeling. However, many numerical problems are not easily handled by these approaches. Calculators, and even spreadsheets, are too inefficient and cumbersome for numerical calculations whose scope and scale change frequently. Statistical packages are usually tailored for the statistical analysis of data sets and often lack an intuitive, extensible, open source programming language for tackling new problems efficiently.^{1} R can do the simplest and the most complex analysis efficiently and effectively.
When we learn and use R regularly we will save significant amounts of time and money. It’s powerful and it’s free! It’s a complete environment for data analysis and graphics. Its straightforward programming language facilitates the development of functions to extend and improve the efficiency of our analyses.
1.5 Where can I get R?
R is available for many computer platforms, including Mac OS, Linux, Microsoft Windows, and others. R comes as source code or a binary file. Source code needs to be compiled into an executable program for our computer. Those not familiar with compiling source code (and that’s most of us) just install the binary program. We assume most readers will be using R in the Mac OS or MS Windows environment. Listed here are useful R links:
 R Project home page at http://www.rproject.org
 R download page at http://cran.rproject.org
 Numerous free tutorials are at http://cran.rproject.org/otherdocs.html
 R Wikibook at http://en.wikibooks.org/wiki/R_Programming
 R Journal at http://journal.rproject.org
To install R for Windows or Mac OS, do the the following:
 Go to http://www.rproject.org;
 From the left menu list, click on the “CRAN” (Comprehensive R Archive Network) link;
 Select a nearby geographic site (e.g., http://cran.cnr.berkeley.edu);
 Select appropriate operating system;
 Select on “base” link;
 For Windows, save
RX.X.Xwin.exe
to the computer; and for Mac OS, save theRX.X.Xpkg
installer package.  Run the installation program and accept the default installation options. That’s it!
1.6 How do I use R?
1.6.1 Using R on our computer
Use R by typing commands at the R command line prompt (>
) and pressing Enter on our keyboard. This is how to use R interactively. Alternatively, from the R command line, we can also execute a list of R commands that we have saved in a text file (more on this later). Here is an example of using R as a calculator:
(4 + 6)^3  2*500/4
## [1] 750
Use the c
function to collect data entered at the console. Name each collection of data, and then perform a numercal operation. In this example we conduct an analysis that is analogous to working in a spreadsheet.
quantity < c(34, 56, 22) # quantity data
price < c(19.95, 14.95, 10.99) # price data
subtotal < quantity*price # subtotal cost
cbind(quantity, price, subtotal) # column bind into spreadsheetlike display
## quantity price subtotal
## [1,] 34 19.95 678.30
## [2,] 56 14.95 837.20
## [3,] 22 10.99 241.78
1.6.2 Does R have epidemiology programs?
The default installation of R does not have packages that specifically implement epidemiologic applications; however, many of the statistical tools that epidemiologists use are readily available, including statistical models such as unconditional logistic regression, conditional logistic regression, Poisson regression, Cox proportional hazards regression, and much more. Table 1.1 lists selected R packages with biostatistical or mathematical modeling methods applied to epidemiologic problems. The focus of this book is learning how to use R without relying on a specific packages. Learning the R basics covered in this book will help one take full advantage of these and other R packages, some of which address advanced topics such as network modeling of epidemics.
Packages  Description 

dagbag  Learning directed acyclic graphs (DAGs) through bootstrap aggregating 
dagitty  Graphical Analysis of Structural Causal Models 
dagR  R functions for directed acyclic graphs 
amei  Adaptive Management of Epidemiological Interventions 
Epi  A Package for Statistical Analysis in Epidemiology 
epibasix  Elementary Epidemiological Functions for Epidemiology and Biostatistics 
EpiBayes  Implements Hierarchical Bayesian Models for Epidemiological Applications 
EpiContactTrace  Epidemiological Tool for Contact Tracing 
epiDisplay  Epidemiological Data Display Package 
EpiDynamics  Dynamic Models in Epidemiology 
EpiEstim  EpiEstim: estimate time varying reproduction numbers from epidemic curves 
epifit  Flexible Modelling Functions for Epidemiological Data Analysis 
EpiModel  Mathematical Modeling of Infectious Disease 
epinet  Epidemic/NetworkRelated Tools 
epiR  Tools for the Analysis of Epidemiological Data 
episensr  Basic Sensitivity Analysis of Epidemiological Results 
episheet  Rothman’s Episheet 
epitools  Epidemiology Tools 
EpiWeek  Conversion Between Epidemiological Weeks and Calendar Dates 
1.6.3 How should I use these notes?
The best way to learn R is to use it! Use it as your calculator! Use it as your spreadsheet! Finally read these notes sitting at a computer and use R interactively (this works best sitting in a cafe that brews great coffee and plays good music). Although we initially encourage you to use R interactively by typing expressions at the console, as a general rule, it is much better to type your code as a R script. Save your code with a convenient file name such as job01.R
.^{2}
RStudio comes with a text editor for creating and editing R scripts. Our focus will be learning how to use RStudio to edit and run R scripts.
The code in your text editor can be run in the following ways:  Highlight and run selected expressions in the RStudio;  Copy and paste the code directly into R console;  Run the file in batch mode from the R console using the source
function (e.g., source("job01.R")
).
Operator  Description  Try these examples 

+ 
addition  5+4 
 
subtraction  54 
* 
multiplication  5*4 
/ 
division  5/4 
^ 
exponentiation  5^4 
 
unary minus (change current sign)  5 
abs 
absolute value  abs(23) 
exp 
exponentiation (\(e\) to a power)  exp(8) 
log 
logarithm (default is natural log)  log(exp(8)) 
sqrt 
square root  sqrt(64) 
%/% 
integer divide  10%/%3 
%% 
modulus  10%%3 
%*% 
matrix multiplication  xx <matrix(1:4, 2, 2) 
xx%*%c(1, 1) 

c(1, 1)%*%xx 
1.7 Just do it!
1.7.1 Using R as your calculator
Open R and start using it as our calculator. The most common math operators are displayed in Table 1.2. From now on make R our default calculator! Study the examples and spend a few minutes experimenting with R as a calculator. Use parentheses as needed to group operations. Use the keyboard Uparrow to recall what we previously entered at the command line prompt.
1.7.2 Useful R concepts
1.7.2.1 Types of evaluable expressions
Every expression that is entered at the R console is evaluated by R and returns a value. A literal is the simplist expression that can be evaluated (number, character string, or logical value). Mathematical operations involve numeric literals. For example, R evaluates the expression 4*4
and returns the value 16
. The exception to this is when an evaluable expression is assigned an object name: x < 4*4
. To display the assigned expression, wrap the expression in parentheses: (x < 4*4)
, or type the object name. Finally, evaluable expressions must be separated by either newline breaks or a semicolon. Table 1.3 summarizes evaluable R expressions.
4 * 4
## [1] 16
x < 4 * 4 # assign expression to object named 'x'
x # display 'x' object
## [1] 16
(x < 4 * 4) # evaluates expression and displays 'x'
## [1] 16
x < 4*4; x # expressions can be separated by semicolons
## [1] 16
Expression type  Try these examples 

literal  "hello, I'm John Snow" #character 
3.5 #numeric 

TRUE; FALSE #logical 

math operation  6*7 
assignment  x < 4*4 
data object  x 
function  log(x) 
1.7.2.2 Using the assignment operator
Most calculators have a memory function: the ability to assign a number or numerical result to a key for recalling that number or result at a later time. The same is true in R but it is much more flexible. Any evaluable expression can be assigned a name and recalled at a later time. We refer to these variables as data objects. We use the assignment operator (<
) to name an evaluable expression and save it as a data object.
xx < "hello, what's your name"; xx
## [1] "hello, what's your name"
Multiple assignments work and are read from right to left:
aa < bb < 99
aa; bb
## [1] 99
## [1] 99
Finally, the equal sign (=
) can be used for assignments, although we prefer and the <
symbol.
ages < c(34, 45, 67) # equivalent
ages = c(34, 45, 67) # equivalent
The reason I prefer <
for assigning object names in the workspace is because later we use =
for assigning values to function arguments. For example,
x < 1:10 # assigning object name (x)
sample(x = 1:10, size = 5) # assigning value to argument x
The first x
is an object name assignment in the workspace which persist during the R session. The second x
is a function argument assignment which is only recognized locally in the function and only for the duration of the function execution. For clarity, it is better to keep these types of assignments separate in our mind by using different assignment symbols.
Study these previous examples and spend a few minutes using the assignment operator to create and call data objects. Try to use descriptive names if possible. For example, suppose we have data on age categories; we might name the data agecat
, age.cat
, or age_cat
.^{3}
1.7.3 Useful R functions
When we start R you have opened a workspace. The first time we use R, the workspace is empty. Every time we create a data object, it is in the workspace. If a data object with the same name already exists, the old data object will be overwritten without warning, so be careful. To list the objects in your workspace use the ls
or objects
functions:
ls() # lists objects in the workspace
## [1] "aa" "ages" "bb" "epi.packs" "price" "quantity"
## [7] "subtotal" "x" "xx"
objects() # equivalent
## [1] "aa" "ages" "bb" "epi.packs" "price" "quantity"
## [7] "subtotal" "x" "xx"
Data objects can be saved between sessions. We will be prompted with “Save workspace image?” You can also use save.image()
at the console prompt. The workspace image is saved in a file called .RData
.^{4} Use getwd()
to display the file path to the .RData
file. Table 1.4 has more useful R functions.
Function  Description  Try these examples 

q  Quit R  q() 
ls  List objects  ls() 
objects  objects() #equivalent 

rm  Remove object(s)  yy < 1:5; ls() 
remove  rm(yy); ls() 

#removes all: caution! 

rm(list = ls()); ls() 

help  Open help instructions;  help() 
or get help on specific topic  help(plot) 

?plot #equivalent 

help.search  Search help system given character string  help.search("print") 
help.start  Start help browser  help.start() 
apropos  Displays all objects in the search list matching topic  apropos(plot) 
getwd  Get working directory  getwd() 
setwd  Set working directory  setwd("c:\mywork\rproject") 
args  Display arguments of function  args(sample) 
example  Runs example of a function  example(plot) 
data  Information on available R data sets;  data() #displays data sets 
Load data set  data(Titanic) #load data set 

save.image  Saves current workspace to .RData  save.image() 
1.7.3.1 What are packages?
R has many available functions. When we open R, several packages are attached by default. Each package has its own suite of functions. To display the list of attached packages use the search
function. To display the file paths to the packages use the searchpaths
function.
search() # Linux
## [1] ".GlobalEnv" "package:foreign" "package:survival"
## [4] "ESSR" "package:stats" "package:graphics"
## [7] "package:grDevices" "package:utils" "package:datasets"
## [10] "package:methods" "Autoloads" "package:base"
searchpaths() # Linux
## [1] ".GlobalEnv"
## [2] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library/foreign"
## [3] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library/survival"
## [4] "ESSR"
## [5] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library/stats"
## [6] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library/graphics"
## [7] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library/grDevices"
## [8] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library/utils"
## [9] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library/datasets"
## [10] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library/methods"
## [11] "Autoloads"
## [12] "/Library/Frameworks/R.framework/Resources/library/base"
To learn more about a specific package enter library(help=
“Package_Name”)
. Alternatively, we can get more detailed information by entering help.start()
which opens the HTML help page. On this page click on the Packages link to see the available packages. If we need to load a package enter library(
Package_Name)
. For example, when we cover survival analysis we will need to load the survival
package.
1.7.3.2 What are function arguments?
We will be using many R functions for data analysis, so we need to know some function basics. Suppose we are interested in taking a random sample of days from the month of June, which has 30 days. We want to use the sample
function but we forgot the syntax. Let’s explore:
sample
## function (x, size, replace = FALSE, prob = NULL)
## {
## if (length(x) == 1L && is.numeric(x) && is.finite(x) && x >=
## 1) {
## if (missing(size))
## size < x
## sample.int(x, size, replace, prob)
## }
## else {
## if (missing(size))
## size < length(x)
## x[sample.int(length(x), size, replace, prob)]
## }
## }
## <bytecode: 0x7feea69b0c30>
## <environment: namespace:base>
Whoa! What happened? Whenever we type the function name without any parentheses it usually returns the whole function code. This is useful when we start programming and we need to alter an existing function, borrow code for our own functions, or study the code for learning how to program. If we are already familiar with the sample
function we may only need to see the syntax of the function arguments. For this we use the args
function:
args(sample)
## function (x, size, replace = FALSE, prob = NULL)
## NULL
The terms x
, size
, replace
, and prob
are the function arguments. First, notice that replace
and prob
have default values; that is, we do not need to specify these arguments unless we want to override the default values. Second, notice the order of the arguments. If you enter the argument values in the same order as the argument list we do not need to specify the argument.
dates < 1:30
sample(dates, 16) # sample "size = 16"
## [1] 7 20 9 30 22 14 27 15 5 21 24 18 26 3 11 4
Third, if we enter the arguments out of order then we will get either an error message or an undesired result. Arguments entered out of their default order need to be specified.
sample(16, dates) # undesired results; wanted "size = 16"
## [1] 2
sample(size = 16, x = dates) # gives desired result
## [1] 10 26 14 16 12 5 24 4 15 28 3 20 9 29 21 1
Fourth, when we specify an argument we only need to type a sufficient number of letters so that R can uniquely identify it from the other arguments.
sample(s = 16, x = dates, r = TRUE) # sampling w/ replacement
## [1] 15 20 11 22 23 22 27 4 18 21 21 2 17 18 13 16
Fifth, argument values can be any valid R expression (including functions) that evaluates to an appropriate value. In the following example we see two sample functions that provide random values to the sample
function arguments.<
sample(s = sample(1:36, 1), x = sample(1:10, 5), r=T)
## [1] 6 7 7 7 7 8 1 1 6 8 6 9 9 8 1 8 8 6 1 9 8 9 8 9 9 1 6 9 1
Finally, if we need more guidance on how to use the sample
function enter ?sample
or help(sample)
.
1.7.4 How do I get help?
RStudio has extensive help capabilities. From the RStudio main menu select Help
\(\rightarrow\) R Help
to get you started. The Frequently Asked Questions (FAQ) and R manuals are available from this menu. From the R console, try the following options to learn about the help capabilities:
?help # opens help page for 'help' function
help.start() # launches HTML help page
help.search("help") # searches help system for "help"
apropos("help") # displays 'help' objects in search list
To learn about about available data sets use the data
function:
data() # display avail. data sets
try(data(package = "survival")) # list 'survival' data sets
help(pbc, package = "survival") # display pbc data help page
1.7.5 Is there anything else that I need?
Not really. RStudio has everything you will need to use R productively. Some analysts will select to use R with a text editor, rather than RStudio. Like RStudio, a good text editor makes programming and data processing easier and more efficient. If you are considering a text editor, the functionality we look for in a text editor are the following:
 Toggle between wrapped and unwrapped text
 Block cutting and pasting (also called column editing)
 Easy macro programming
 Search and replace using regular expressions
 Ability to import large datasets for editing
When we are programming we want our text to wrap so we can read all of your code. When we import a data set that is wider than the screen, we do not want the data set to wrap: we want it to appear in its tabular format. Column editing allows us to cut and paste columns of text at will. A macro is just a way for the text editor to learn a set of keystrokes (including search and replace) that can be executed as needed. Searching using regular expressions means searching for text based on relative attributes. For example, suppose you want to find all words that begin with “b”, end with “g”, have any number of letters in between but not “r” and “f”. Regular expression searching makes this a trivial task. These are powerful features that once we use regularly, we will wonder how we ever got along without them.
If we do not want to install a text editing program then we can use the default text editor that comes with our computer operating system (gedit in Ubuntu Linux, TextEdit in Mac OS, Notepad in Windows). However, it is much better to install a text editor that works with R. My favorite text editor is the free and open source GNU Emacs. GNU Emacs can be extended with the “Emacs Speaks Statistics” (ESS) package. For more information on Emacs and ESS preinstalled for Windows and Mac OS, visit http://ess.rproject.org.
1.7.6 What’s ahead?
To the novice user, R may seem complicated and difficult to learn. In fact, for its immense power and versatility, R is easier to learn and deploy compared to other statistical software (e.g. SAS, Stata, SPSS). This is because R was built from the ground up to be an efficient and intuitive programming language and environment. If you understand the logic and structure of R, then learning proceeds quickly. Just like a spoken language, once you know its rules of grammar, syntax, and pronunciation, and can write legible sentences, you can figure out how to communicate almost anything. Before we get into the “trees” (next chapter), we want to describe the “forest”: the logic and structure of working with R objects and epidemiologic data.
1.7.6.1 Working with R objects
For our purposes, there are only five types of data objects in R^{5} and five types of actions we take on these objects (Table 1.5). That’s it! No more, no less. You will learn to create, name, index (subset), replace components of, and operate on these data objects using a systematic, comprehensive approach. As you learn about each new data object type, it will reinforce and extend what you learned previously.
Action  Vector  Matrix  Array  List  Data frame 

Creating  Table 2.5  Table 2.12  Table 2.20  Table 3.1  Table 3.7 
Naming  Table 2.6  Table 2.13  Table 2.21  Table 3.2  Table 3.8 
Indexing  Table 2.7  Table 2.14  Table 2.22  Table 3.3  Table 3.9 
Replacing  Table 2.8  Table 2.15  Table 2.23  Table 3.4  Table 3.10 
Operating on  Table 2.9  Table 2.16  Table 2.24  Table 3.5  Table 3.11 
Table 2.10 
A vector
is a collection of elements (often numbers):
x < c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12); x
## [1] 1 2 3 4 5 6 7 8 9 10 11 12
A matrix
is a 2dimensional representaton of a vector:
y < matrix(x, nrow = 2); y
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 3 5 7 9 11
## [2,] 2 4 6 8 10 12
An array
is a 3 or more dimensional represention of a vector:
z < array(x, dim = c(2, 3, 2)); z
## , , 1
##
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
A list
is a collection of “bins”, each containing any kind of R object:
mylist < list(x, y, z); mylist
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10 11 12
##
## [[2]]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 3 5 7 9 11
## [2,] 2 4 6 8 10 12
##
## [[3]]
## , , 1
##
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
A data frame
is a list in tabular form where each “bin” contains a data vector of the same length. A data frame is the usual tabular data set familiar to epidemiologists. Each row is an record and each column (“bin”) is a field.
kids < c("Tomasito", "Lusito", "Angelita")
gender < c("boy", "boy", "girl")
age < c(8, 7, 4)
mydf < data.frame(kids, gender, age)
mydf
## kids gender age
## 1 Tomasito boy 8
## 2 Lusito boy 7
## 3 Angelita girl 4
In the next chapter we explore these R data objects in greater detail.
1.8 Exercises
If you have not done so already,
 install R on your computer (https://cran.rstudio.com/),
 install RStudio on your computer (https://www.rstudio.com/), and
 register for a RPubs account (http://www.rpubs.com/), and open RStudio.
 Open a new Rmarkdown template file (.Rmd extension).
 Learn Rmarkdown and use it to answer the exercises in this chapter.
 Submit the exercises as a HTML link, Word document, or PDF document.
1.8.1
What is the R workspace file on your operating systems? What is the file path to your R workspace file? What is the name of this workspace file?
1.8.2
By default, which R packages come already loaded? What are the file paths to the default R packages?
1.8.3
List all the object in the current workspace. If there are none, create some data objects. Using one expression, remove all the objects in the current workspace.
1.8.4
One inch equals 2.54 centimeters. Correct the following R code and create a conversion table.
inches < 1:12
centimeters < inches/2.54
cbind(inches, centimeters)
1.8.5
To convert between temperatures in degrees Celsius (\(C\)) and Farenheit (\(F\)), we use the following conversion formulas:
\[ C = (F  32) \frac{5}{9} \]
\[ F = \frac{9}{5} C + 32 \]
At standard temperature and pressure, the freezing and boiling points of water are 0 and 100 degrees Celsius, respectively. What are the freezing and boiling points of water in degrees Fahrenheit?
1.8.6
For the Celsius temperatures 0, 5, 10, 15, 20, 25, …, 80, 85, 90, 95, 100, construct a conversion table that displays the corresponding Fahrenheit temperatures. Hint: to create the sequence of Celsius temperatures use seq(0, 100, 5)
.
1.8.7
BMI is a reliable indicator of total body fat, which is related to the risk of disease and death. The score is valid for both men and women but it does have some limits. BMI does have some limitations: it may overestimate body fat in athletes and others who have a muscular build, it may underestimate body fat in older persons and others who have lost muscle mass.
BMI  Classification 

\(<18.5\)  Underweight 
\([18.5, 25)\)  Normal weight 
\([25, 30)\)  Overweight 
\(\ge 30\)  Obesity 
Body Mass Index (BMI) is calculated from your weight in kilograms and height in meters:
\[ BMI = \frac{kg}{m^2} \]
\[ 1\,\mbox{kg} \approx 2.2\,\mbox{lb} \]
\[ 1\,\mbox{m} \approx 3.3\,\mbox{ft} \]
Calculate your BMI (don’t report it to us).
Using Table 1.2, explain in words, and use R to illustrate, the difference between modulus and integer divide.
1.8.8
In mathematics, a logarithm (to base \(b\)) of a number \(x\) is written \(\log_b(x)\) and equals the exponent \(y\) that satisfies \(x = b^y\). In other words,
\[ y = \log_b(x) \] is equivalent to
\[ x = b^y \]
In R, the log
function is to the base \(e\). Implement the following R code and study the graph:
curve(log(x), 0, 6)
abline(v = c(1, exp(1)), h = c(0, 1), lty = 2)
What kind of generalizations can you make about the natural logarithm and its base—the number \(e\)?
1.8.9
Risk (\(R\)) is a probability bounded between 0 and 1. Odds is the following transformation of \(R\):
\[ Odds = \frac{R}{1R} \]
Use the following code to plot the odds:
curve(x/(1x), 0, 1)
Now, use the following code to plot the \(\log\)(odds):
curve(log(x/(1x)), 0, 1)
What kind of generalizations can you make about the \(\log\)(odds) as a transformation of risk?
Exposure route  Risk per 10,000 exposures 

Blood transfusion (BT)  9,000 
Needlesharing injectiondrug use (IDU)  67 
Receptive anal intercourse (RAI)  50 
Percutaneous needle stick (PNS)  30 
Receptive penilevaginal intercourse (RPVI)  10 
Insertive anal intercourse (IAI)  6.5 
Insertive penilevaginal intercourse (IPVI)  5 
Receptive oral intercourse on penis (ROI)  1 
Insertive oral intercourse with penis (IOI)  0.5 
Use the data in Table 1.7. Assume one is HIVnegative. If the probability of infection per act is \(p\), then the probability of not getting infected per act is \((1p)\). The probability of not getting infected after 2 consecutive acts is \((1p)^2\), and after 3 consecutive acts is \((1p)^3\). Therefore, the probability of not getting infected infected after \(n\) consecutive acts is \((1p)^n\), and the probability of getting infected after \(n\) consecutive acts is \(1(1p)^n\). For each nonblood transfusion transmission probability (per act risk) in Table 1.7, calculate the cumulative risk of being infected after one year (365 days) if one carries out the same act once daily for one year with an HIVinfected partner. Do these cumulative risks make intuitive sense? Why or why not?
1.8.10
The source
function in R is used to “source” (read in) ASCII text files. Take a group of R commands that worked from a previous problem above and paste them into an ASCII text file and save it with the name job01.R
. Then from R command line, source the file. Here is how it looked on my Linux computer running R:
> source("~/Documents/courses/ph251d/jobs/job01.R")
Describe what happened. Now, set option to .
> source("~/Documents/courses/ph251d/jobs/job01.R", echo = TRUE)
Describe what happened. To improve your understanding read the help file on the source
function.
1.8.11
Now run the source
again (without and with echo = TRUE
) but each time create a log file using the sink
function. Create two log files: job01.log1a
and job01.log1b
.
> sink("~/Documents/courses/ph251d/jobs/job01.log1a")
> source("~/Documents/courses/ph251d/jobs/job01.R")
> sink() #closes connection
>
> sink("~/Documents/courses/ph251d/jobs/job01.log1b")
> source("~/Documents/courses/ph251d/jobs/job01.R", echo = TRUE)
> sink() #closes connection
Examine the log files and describe what happened.
1.8.12
Create a new job file () with the following code:
n < 365
per.act.risk < c(0.5, 1, 5, 6.5, 10, 30, 50, 67)/10000
risks < 1(1per.act.risk)^n
show(risks)
Source this file at the R command line and describes what happens.
Recommendations for mostly free and open source software (FOSS) at http://phds.io.↩
The .R extension, although not necessary, is useful when searching for R command files. Additionally, this file extension is recognized by RStudio and many text editors.↩
To improve readability, a period (
.
) or underscore (_
) symbol can be used in your object name↩In some operating systems files names that begin with a period (.) are hidden files and are not displayed by default. You may need to change the viewing option to see the file.↩
The sixth type of R object is a function. Functions can create, manipulate, operate on, and store data; however, we will use functions primarily to execute a series of R “commands” and not as primary data objects.↩