haven
, readxl
, writexl
, RSQLite
wants <- c("haven", "readxl", "writexl", "RSQLite")
has <- wants %in% rownames(installed.packages())
if(any(!has)) install.packages(wants[!has])
If reading from a file, check with getwd()
that you are in the correct directory - or specify full file path. If the result should be a data frame, use read.table()
.
## write text file first
myDf <- data.frame(IV=factor(rep(c("A", "B"), 5)),
DV=rnorm(10))
write.table(myDf, file="data.txt", row.names=FALSE)
## read this text file
myDf_read <- read.table("data.txt", header=TRUE)
str(myDf_read)
## not shown
Some relevant options for read.table()
read.table(file="data.txt", header=TRUE)
read.table(file="data.txt", header=FALSE)
read.table(file="data.txt", sep="\t")
read.table(file="data.txt", na.strings="9999")
read.table(file="data.txt", stringsAsFactors=TRUE)
# not shown
Some relevant options for write.table()
write.table(file="data.txt", row.names=FALSE, col.names=TRUE)
write.table(file="data.txt", sep="\t")
# not shown
To read comma-separated-value files, use read.csv()
, for fixed-with-files read.fwf()
. To read a raw (unstructured) text file, use readLines()
. The result is a character vector with one element per line.
The described R functions can also read data from a character string.
X Y Z
1 123 A B C 34.8
2 99 D E F 3543.2
Other sources can be standard-input (if R is used in batch mode via Rscript
), the clipboard, or directly from an online address.
read.table(file=stdin(), header=TRUE)
## Windows: file="clipboard", MacOS: file=pipe("pbpaste")
read.table(file="clipboard", header=TRUE, sep="\t"))
read.table(file="http://dwoll.de/r/data.txt", header=TRUE)
# not shown
Save multiple objects together with their respective symbol (=name). Object names do not need to be assigned when reading data.
myDf <- data.frame(IV=factor(rep(c("A", "B"), 5)), DV=rnorm(10))
save(myDf, file="data.RData")
load("data.RData")
# not shown
Save objects separately without their symbol (=name) -> preferred. Object name needs to be assigned when reading data.
myDf <- data.frame(IV=factor(rep(c("A", "B"), 5)), DV=rnorm(10))
saveRDS(myDf, file="data.rda")
myDf_read <- readRDS("data.rda")
# not shown
Read data from the console with scan()
. Lines are terminated by hitting the Return
key, scan()
quits when Return
is hit on a blank line.
One option is to use text files (tab-separated or comma-separated) as described above to exchange data with other statistics software packages.
Another option to exchange data with SPSS, SAS and Stata (among others) is the haven
package. Example for SPSS:
library(haven)
write_sav(myDf, path="myDf.sav")
myDf_sav <- read_sav("myDf.sav")
head(myDf_sav, n=4)
str(myDf_sav)
## not shown
SPSS has the concept of “variable labels” and “value labels” which do not directly map to R concepts. Variables read from SPSS data files therefore have special labels and are of class haven_labelled
. Categorical variables can be converted to regular R factors using as_factor()
.
When writing data to an SPSS file, labels can be set manually, and the class can be changed to haven_labelled
.
myDf <- data.frame(IV=factor(rep(c("A", "B"), 5)),
DV=rnorm(10),
intVar=sample(1:3, 10, replace=TRUE))
attr(myDf$IV, "label") <- "Grouping factor"
attr(myDf$DV, "label") <- "Measurement values"
class(myDf$intVar) <- "haven_labelled"
attr(myDf$intVar, "labels") <- c("Val1"=1, "Val2"=2, "Val3"=3)
library(haven)
write_sav(myDf, "myDf.sav")
If you have SPSS available, install the “Integration plug-in for R” add-in. This allows you to run R within SPSS. The plug-in includes an R package with functions that transfer the active SPSS data frame to R (and back) - including labeled factor levels, dates and German umlauts. Once installed, you can use it like this in the SPSS syntax window:
BEGIN PROGRAM R.
# from here on, you can use R syntax
myDf <- spssdata.GetDataFromSPSS(missingValueToNA=TRUE,
factorMode="labels",
rDate="POSIXct")
save(myDf, file="d:/path/to/your/myDf.Rdata")
END PROGRAM.
One option is to use text files (tab-separated or comma-separated) as described above to exchange data with spreadsheet applications.
library(writexl)
write_xlsx(myDf, path="myDf.xlsx")
library(readxl)
myDf_read <- read_xlsx("myDf.xlsx")
When date columns are not correctly formatted in Excel, they sometimes get imported as integer numbers. For dates, this is the number of days since 1899-12-30. For date-times, it’s the number of seconds since 1899-12-30. To convert these numbers to dates or date-times, package openxlsx
has functions convertToDate()
and convertToDateTime()
.
There are R packages that provide an interface to all common database types. Using databases is described in more detail in:
SQL commands are described here and here.
RSQLite
Simulate data
IQ <- rnorm(2*10, mean=100, sd=15)
rating <- sample(LETTERS[1:3], 2*50, replace=TRUE)
sex <- factor(rep(c("f", "m"), times=50))
myDf <- data.frame(sex, IQ, rating, stringsAsFactors=FALSE)
Save data frame in SQLite database. This is usually a file. In this example, the file is created in memory only. Use dbConnect(<driver object>, dbname="file_name.db")
to create a file on disk.
library("RSQLite")
drv <- dbDriver("SQLite")
con <- dbConnect(drv, dbname=":memory:")
dbWriteTable(con, name="MyDataFrame", value=myDf, row.names=FALSE)
Find out which tables are present, and which fields are in a specific table.
[1] "MyDataFrame"
[1] "sex" "IQ" "rating"
Read complete table, then send SQL-query.
sex IQ rating
1 f 77.32935 B
2 m 86.09951 A
3 f 97.07726 C
4 m 115.73843 C
sex mIQ sIQ
1 f 98.73237 4936.618
2 m 98.63117 4931.559
Query database and read results in smaller partial chunks. Useful for large queries.
res <- dbSendQuery(con, "SELECT IQ, rating FROM MyDataFrame WHERE rating = 'A'")
while(!dbHasCompleted(res)) {
partial <- dbFetch(res, n=4)
print(partial)
}
IQ rating
1 86.09951 A
2 97.30923 A
3 109.71035 A
4 115.46680 A
IQ rating
1 99.28235 A
2 89.13792 A
3 86.09951 A
4 115.73843 A
IQ rating
1 109.71035 A
2 114.90525 A
3 115.46680 A
4 99.28235 A
IQ rating
1 93.28827 A
2 97.30923 A
3 78.64821 A
4 109.71035 A
IQ rating
1 115.46680 A
2 100.00818 A
3 101.47441 A
4 97.07726 A
IQ rating
1 113.28772 A
2 115.46680 A
3 93.28827 A
4 88.56834 A
IQ rating
1 116.61545 A
2 77.32935 A
3 97.07726 A
4 115.73843 A
IQ rating
1 83.76086 A
2 78.64821 A
3 109.71035 A
4 114.90525 A
IQ rating
1 101.4744 A
Clean query, remove the created table, and close the database connection.
data.table
provides function fread()
for high performance reading of large plain text data files.try(detach(package:haven))
try(detach(package:readxl))
try(detach(package:writexl))
try(detach(package:RSQLite))
try(detach(package:DBI))
R markdown - markdown - R code - all posts