6 Background

6.1 What is \(\mathcal{R}\)?



Authors of research articles in scientific journals now appear to overwhelmingly employ \(\mathcal{R}\) for executing and displaying published statistical results. Joseph M. Hilbe, Journal of Statistical Software, Sept 2010


This manual is equal parts review and hands-on computer lab. For the latter, we will be working with the statistical package \(\mathcal{R}\). Depending on your research supervisor or lab environment, you may eventually prefer to adopt a different package, such as Stata, SAS, JMP, or SPSS. Regardless of your ultimate choice, you cannot avoid the steep learning curve associated with each package. To make the most of the investment in time and energy, we would urge you to select a system that you can continue to work with beyond your training.

  • \(\mathcal{R}\) is an open-source (free) software environment for statistical computing and graphics. It is both a statistical package and an interactive programming language with a heavy emphasis on statistics and data visualization.
  • \(\mathcal{R}\) is the open-source version of what used to be called \(\mathcal{S}\), developed by John Chambers, Richard Becker, and Allan Wilks at Bell Labs in the 1980s, where it won many professional awards. In 1991, Ross Ihaka and Robert Gentleman in Auckland re-implemented the entire language, which they called \(\mathcal{R}\) (a pun, based on their names!). At the time, I was paying $5000/year for a site license at McGill for \(\mathcal{S}\)Plus, a commercial version of \(\mathcal{S}\).
  • In 1995, Ihaka and Gentleman made \(\mathcal{R}\) open source under the GNU General Public License, similar to Linux. It is currently maintained by the ‘R Core Team’, which consists of 19 international developers including John Chambers.



\(\mathcal{R}\) is:

  • Free (open-source) and well-documented, with 400 textbooks describing either core \(\mathcal{R}\) or specialized library packages
  • Multiplatform, with pre-compiled binaries for MacOSX, Linux, Windows
  • Interactive
  • Comprehensive, with more than 50,000 functions and thousands of add-on packages. It is also the development environment of choice for new algorithms in the statistical literature.
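To give a feel for the interactive style, here is a minimal sketch of a first session. The measurements are made up for illustration, and every function used is part of base \(\mathcal{R}\).

```r
# Made-up measurements, purely for illustration
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)

# Each line is evaluated as soon as it is entered, and the result is printed
mean(x)      # arithmetic mean
sd(x)        # sample standard deviation
summary(x)   # minimum, quartiles, mean, maximum

# Results can be stored under a name and reused in later commands
centred <- x - mean(x)
round(centred, 2)
```

Typing the name of any object, such as x, and pressing [Return] prints its current value.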

We will be concentrating on the command line version of \(\mathcal{R}\), where you type commands into the command window. There are also open-source (free), point-and-click, graphical user interfaces (GUIs) for \(\mathcal{R}\):

RCommander was created by John Fox at McMaster University and is used to teach introductory statistics at Cambridge’s Sanger Institute.

RKWard is a newer point-and-click interface to the R environment, also available for MacOSX, Windows, and Linux.

While most of us are familiar with point-and-click GUIs, they will probably never replace text commands because:

  • They rarely provide full access to all the commands and options.
  • They are repetitive and inefficient, particularly when re-applying an analysis to a revised dataset. While this may seem counter-intuitive, many of your predecessors tell us that the biggest mistake they made during their scholarly projects was to assume that a point-and-click interface would be simpler. It was not, because they inevitably had to re-do their analyses multiple times as observations were added, removed, or revised.
  • Without a documentary record of the options selected as you wade through multiple layers of menus and dialogues, it may be difficult to reproduce an analysis, which is both frustrating and unscientific.
  • Unless you truly understand what’s going on under the hood - e.g. data types, calendar arithmetic, analysis options - the default settings will eventually lead you astray.
  • It’s not that hard to learn a few basic commands, as we will do in our lab sessions. Feel free to copy and paste our examples for your own work. There are also numerous on-line resources, listed under Resources below.
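The argument about re-doing analyses can be illustrated with a minimal sketch. The data, the column names, and the function name summarise_scores are all hypothetical; the point is only that a scripted analysis is re-run on revised data with a single line.

```r
# Hypothetical example data; the column names are assumptions for illustration
original <- data.frame(
  group = c("control", "control", "treated", "treated"),
  score = c(10, 12, 15, 17)
)

# The whole analysis lives in one function, so every option is written down
summarise_scores <- function(dat) {
  aggregate(score ~ group, data = dat, FUN = mean)
}

first_pass <- summarise_scores(original)

# An observation is added later: the re-analysis is one line, not a trail of menus
revised <- rbind(original, data.frame(group = "treated", score = 19))
second_pass <- summarise_scores(revised)

print(first_pass)
print(second_pass)
```

Because the script itself is the documentary record, anyone with the same data file can reproduce the result exactly.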

6.2 Resources

  • The R Book 2013, by Michael Crawley: Available at NJM as an eBook, this is an encyclopedic text with more than 1000 pages and 29 focused chapters. You don’t need to read it from cover to cover, but the examples are ready to cut and paste or revise for your own projects.
  • Quick-R: A comprehensive on-line teaching resource by Rob Kabacoff. Light on theory, but rich with cut and paste examples.
  • Introductory Statistics with R by Peter Dalgaard, 2008: An NJM eBook from one of the creators.
  • On-line User Forums: RSeek or University of Pennsylvania. If your question isn’t in the archive, someone will usually answer within minutes! As a courtesy, please check the built-in help and search the archive before asking others.

If you want to cite \(\mathcal{R}\), just type the following command and hit [Return]

citation()

R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Ditto to examine the installed package list:

library()
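Both commands have close relatives that may be handy in a script. The sketch below uses only functions from base \(\mathcal{R}\) and its bundled utils package; the choice of the stats package as an example is ours.

```r
# Citation for R itself, as a plain-text reference and as a BibTeX entry
ref <- citation()
print(ref, style = "text")
toBibtex(ref)

# Citation for a specific package; "stats" ships with every R installation
citation("stats")

# Installed package names as a character vector, instead of the library() listing
pkgs <- rownames(installed.packages())
head(pkgs)

# Check whether one particular package is installed
"stats" %in% pkgs
```

The exact year and version shown in the citation depend on the release of \(\mathcal{R}\) you have installed.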

6.3 Software Installation

Prior to our first meeting, you should install \(\mathcal{R}\), which is available from the Comprehensive R Archive Network (CRAN).

From the CRAN site,

  • click Download for Windows, MacOSX, or Linux operating systems

  • For Windows, choose R base. For MacOSX, choose R-3.4.1.pkg (the latest), which will run on anything more recent than OS 10.11 (El Capitan). If you’re running an older operating system, choose the appropriate installer.

  • Double-click the downloaded installer program and choose default answers for all questions.

You should also install RStudio, an enhanced command editor for Mac, Windows, and Linux. As long as you’ve installed \(\mathcal{R}\) first, RStudio will automatically find the \(\mathcal{R}\) installation.

From the RStudio website

  • Choose RStudio and select the open-source Desktop version, which is free. As you can see, there are also professional and server versions, which are commercial products. Don’t download them by mistake, as they’re quite expensive. Unless you have $29k to spare, in which case I’ve always wanted a pro server with RSConnect!

  • For Windows, double-click the downloaded installer program (.exe file). Choose default answers for all questions.

  • For MacOSX, double-click to open the disk image (.dmg) file and drag the RStudio application to your Applications folder. Make sure you put an alias to the application on your Dock for later access.

With an active internet connection, you can install additional library packages via the RStudio Tools:Install Packages menu, the RStudio Packages tab, or the install.packages() command. For this course, you will need some additional packages. Simply copy and paste this line into the RStudio command window and hit [Return].

install.packages(c("car","Hmisc","pROC","effects","gmodels","psych", "readxl", "EnvStats"))
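If you are unsure whether the installation worked, a check along the following lines may help. It is a sketch rather than a required step: the variable names are ours, and the install.packages() call is left commented out so that nothing is downloaded unless you choose to run it.

```r
# The course packages named in the install.packages() line above
needed <- c("car", "Hmisc", "pROC", "effects", "gmodels", "psych",
            "readxl", "EnvStats")

# Compare against what is already installed on this machine
missing_pkgs <- setdiff(needed, rownames(installed.packages()))

if (length(missing_pkgs) == 0) {
  message("All course packages are installed.")
} else {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install only the missing ones (needs an internet connection):
  # install.packages(missing_pkgs)
}
```

Once a package is installed, library(car) (or the equivalent for another package) loads it into the current session.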

The course website includes a link to the ancillary materials needed for our workshop sessions. You should take a minute to download the code and data sets before each session. Your practice data sets are in the form of comma-separated values (.csv) spreadsheets. The example code is in the form of .r text files. For convenience, you may want to assign the .r file type to automatically open RStudio when double-clicked. Just follow the usual routine for Windows or MacOSX. At the same time, you may want to ensure that the .csv spreadsheets open with your favorite spreadsheet application, e.g. Numbers, Excel, or LibreOffice.
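As a rough preview of how a .csv file is read into \(\mathcal{R}\), the sketch below writes a tiny stand-in file to a temporary folder and reads it back. The file name practice_example.csv and its columns are hypothetical and are not part of the course materials.

```r
# Stand-in for a downloaded course file: write a tiny CSV to a temporary folder
csv_path <- file.path(tempdir(), "practice_example.csv")
writeLines(c("id,age,sex",
             "1,34,female",
             "2,51,male",
             "3,47,female"), csv_path)

# Read the file into a data frame; text columns are kept as plain text
practice <- read.csv(csv_path, stringsAsFactors = FALSE)

str(practice)        # column names and data types
head(practice)       # first few rows
mean(practice$age)   # a first summary statistic
```

For a real data set, replace csv_path with the location of the downloaded file on your own computer.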

Please contact me if you have questions at . We will also be available prior to our first lab to troubleshoot installation problems.

A more detailed installation guide for both \(\mathcal{R}\) and RStudio can be found in A (very) short introduction to R by Torfs and Brauer.