Data import and export

Install required packages

foreign, RODBC

wants <- c("foreign", "RODBC")
has   <- wants %in% rownames(installed.packages())
if(any(!has)) install.packages(wants[!has])

Use R for data entry

Input data with the keyboard

myVar <- c(4, 19, 22)

Read data from the console with scan(). Lines are terminated by hitting the Return key, scan() quits when Return is hit on a blank line.

vec     <- scan()
charVec <- scan(what="character")
# not shown

R's own graphical data editor

myDf <- data.frame(IV=factor(rep(c("A", "B"), 5)), DV=rnorm(10))
myDf <- edit(myDf)
# not shown

Create an empty data frame

newDf <- edit(data.frame())
# not shown

Save R data to file

Create a logfile for commands and output

sink("d:/daniel/logfile.txt", split=TRUE)
# not shown

Save and read R objects in text format

dump("myDf", file="dumpMyDf.txt")
# not shown

Save data frame to a text file

myDf <- data.frame(IV=factor(rep(c("A", "B"), 5)), DV=rnorm(10))
write.table(myDf, file="data.txt", row.names=FALSE)
# not shown

Read in data in text format

Read text data from a local file

If reading from a file, check with getwd() that you are in the correct directory - or specify full file path.

To read a raw text file, use readLines(). The result is a character vector with one element per line.

# not shown

If the result should be a data frame, use read.table().

read.table(file="data.txt", header=TRUE)
read.table(file="data.txt", header=FALSE)
read.table(file="data.txt", sep="\t")
read.table(file="data.txt", stringsAsFactors=FALSE)
# not shown

To read comma-separated-value files, use read.csv(), for fixed-with-files read.fwf().

Read text data from other sources

The described R functions can also read data from standard-input (if R is used in batch mode via Rscript.exe), from the clipboard, or directly from an online source.

read.table(file=stdin(), header=TRUE)
read.table(file="clipboard", header=TRUE))
read.table(file="", header=TRUE)
# not shown

Read and write data in R binary format

myDf <- data.frame(IV=factor(rep(c("A", "B"), 5)), DV=rnorm(10))
save(myDf, file="data.RData")
# not shown

Exchange data with other statistics software and spreadsheets

Exchange data with SPSS, SAS and Stata

One option is to use text files (tab-separated or comma-separated) as described above to exchange data with other statistics software packages.

Another option to exchange data with SPSS, SAS and Stata (among others) is the foreign package. Example for SPSS:

read.spss(file="data.sav", use.value.labels=TRUE,,
# not shown
write.foreign(df=myDf, datafile="d:/daniel/dataGoesHere.dat",
              codefile="d:/daniel/syntaxGoesHere.sps", package="SPSS")
# not shown

To read these files with SPSS, you may have to modify the created .sps syntax file: First write down the full path to the data file in the first line because SPSS' current working directory is probably not where that file is located. You may also have to make SPSS recognize the . as a decimal point if it's a german SPSS installation.

SET LOCALE='English'.

To set SPSS back to using a , as a decimal point:

SET LOCALE='German'.

Exchanging data with SAS and Stata works the same way: Package foreign provides functions read.xport() for reading files in SAS XPORT format, read.dta() and write.dta() read and write Stata files, respectively.

Use SPSS essentials for R

If you have SPSS available, install the "Essentials for R" add-in (instructions pdf). This allows you to run R within SPSS. The add-in includes an R package with functions that transfer the active SPSS data frame to R (and back) - including labeled factor levels, dates and German umlauts. Once installed, you can use it like this in the SPSS syntax window:

# from here on, you can use R syntax
myDf <- spssdata.GetDataFromSPSS(missingValueToNA=TRUE,
save(myDf, file="d:/path/to/your/myDf.Rdata")

Exchange data with Excel

One option is to use text files (tab-separated or comma-separated) as described above to exchange data with spreadsheet applications.

To read and write Excel files directly, use package XLConnect.

Read and write data from a database

Excel files can also be treated as a database with the RODBC package. One can then use standard SQL commands like query and fetch to select data.

xlsCon <- odbcConnectExcel2007("data.xls", readOnly=FALSE)
(myDfXls <- sqlFetch(xlsCon, "sheet1"))
sqlQuery(xlsCon, "select IV, DV from [sheet1$] order by IV")
sqlQuery(xlsCon, "select * from [sheet1$] where IV = 'A' AND DV < 10")
myDfXls$newDV <- rnorm(nrow(myDfXls))
sqlSave(xlsCon, myDfXls, tablename="newSheet")
# not shown

There are R packages that provide an interface to all common database types, e.g. to MySQL, Oracle or SQLite.

Useful documents

  • R Data Import/Export
  • Muenchen, R. A. (2011). R for SAS and SPSS Users (2nd ed.). New York, NY: Springer. URL
  • Muenchen, R. A. & Hilbe, J. M. (2010). R for Stata Users. New York, NY: Springer. URL

Useful packages

Scrape HTML pages directly with XML (e.g., readHTMLTable()).

Detach (automatically) loaded packages (if possible)


Get the article source from GitHub

R markdown - markdown - R code - all posts