c("foreign", "RODBC", "RSQLite")
wants <- wants %in% rownames(installed.packages())
has <-if(any(!has)) install.packages(wants[!has])
The downloaded source packages are in
'/tmp/RtmpNLA8Wb/downloaded_packages'
c(4, 19, 22) myVar <-
Read data from the console with scan()
. Lines are terminated by hitting the Return
key, scan()
quits when Return
is hit on a blank line.
scan()
vec <- scan(what="character")
charVec <-# not shown
data.frame(IV=factor(rep(c("A", "B"), 5)), DV=rnorm(10))
myDf <- edit(myDf)
myDf <-fix(myDf)
# not shown
Create an empty data frame
edit(data.frame())
newDf <-# not shown
sink("d:/daniel/logfile.txt", split=TRUE)
# not shown
dump("myDf", file="dumpMyDf.txt")
source("dumpMyDf.txt")
# not shown
data.frame(IV=factor(rep(c("A", "B"), 5)), DV=rnorm(10))
myDf <-write.table(myDf, file="data.txt", row.names=FALSE)
# not shown
If reading from a file, check with getwd()
that you are in the correct directory - or specify full file path.
To read a raw text file, use readLines()
. The result is a character vector with one element per line.
readLines(file="data.txt")
# not shown
If the result should be a data frame, use read.table()
.
read.table(file="data.txt", header=TRUE)
read.table(file="data.txt", header=FALSE)
read.table(file="data.txt", sep="\t")
read.table(file="data.txt", stringsAsFactors=FALSE)
# not shown
To read comma-separated-value files, use read.csv()
, for fixed-with-files read.fwf()
.
The described R functions can also read data from standard-input (if R is used in batch mode via Rscript.exe
), from the clipboard, or directly from an online source.
read.table(file=stdin(), header=TRUE)
read.table(file="clipboard", header=TRUE))
read.table(file="http://www.uni-kiel.de/psychologie/dwoll/data.txt", header=TRUE)
# not shown
data.frame(IV=factor(rep(c("A", "B"), 5)), DV=rnorm(10))
myDf <-save(myDf, file="data.RData")
load("data.RData")
# not shown
One option is to use text files (tab-separated or comma-separated) as described above to exchange data with other statistics software packages.
Another option to exchange data with SPSS, SAS and Stata (among others) is the foreign
package. Example for SPSS:
library(foreign)
read.spss(file="data.sav", use.value.labels=TRUE, to.data.frame=FALSE,
trim.factor.names=FALSE)
# not shown
write.foreign(df=myDf, datafile="d:/daniel/dataGoesHere.dat",
codefile="d:/daniel/syntaxGoesHere.sps", package="SPSS")
# not shown
To read these files with SPSS, you may have to modify the created .sps
syntax file: First write down the full path to the data file in the first line because SPSS’ current working directory is probably not where that file is located. You may also have to make SPSS recognize the .
as a decimal point if it’s a german SPSS installation.
SET LOCALE='English'.
To set SPSS back to using a ,
as a decimal point:
SET LOCALE='German'.
Exchanging data with SAS and Stata works the same way: Package foreign
provides functions read.xport()
for reading files in SAS XPORT format, read.dta()
and write.dta()
read and write Stata files, respectively.
If you have SPSS available, install the “Essentials for R” add-in (instructions pdf). This allows you to run R within SPSS. The add-in includes an R package with functions that transfer the active SPSS data frame to R (and back) - including labeled factor levels, dates and German umlauts. Once installed, you can use it like this in the SPSS syntax window:
BEGIN PROGRAM R.
# from here on, you can use R syntax
myDf <- spssdata.GetDataFromSPSS(missingValueToNA=TRUE,
factorMode="labels",
rDate="POSIXct")
save(myDf, file="d:/path/to/your/myDf.Rdata")
END PROGRAM.
One option is to use text files (tab-separated or comma-separated) as described above to exchange data with spreadsheet applications.
To read and write Excel files directly, use package XLConnect
.
There are R packages that provide an interface to all common database types. Using databases is described in more detail in:
RODBC
Excel files can also be treated as a database with the RODBC
package. First, you have to register a data source name (DSN) for the file under your operating system. One can then use standard SQL commands like SELECT
to select data.
# data.xls is the registered DSN
library(RODBC)
odbcConnectExcel2007("data.xls", readOnly=FALSE)
xlsCon <-odbcGetInfo(xlsCon)
sqlTables(xlsCon)
sqlFetch(xlsCon, "sheet1"))
(myDfXls <-sqlQuery(xlsCon, "SELECT IV, DV FROM [sheet1$] ORDER BY IV")
sqlQuery(xlsCon, "SELECT * FROM [sheet1$] where (IV = 'A') AND (DV < 10)")
$newDV <- rnorm(nrow(myDfXls))
myDfXlssqlSave(xlsCon, myDfXls, tablename="newSheet")
odbcClose(xlsCon)
# not shown
RSQLite
Simulate data first.
rnorm(2*10, mean=100, sd=15)
IQ <- sample(LETTERS[1:3], 2*50, replace=TRUE)
rating <- factor(rep(c("f", "m"), times=50))
sex <- data.frame(sex, IQ, rating, stringsAsFactors=FALSE) myDf <-
Save data frame in SQLite database. This is usually a file. In this example, the file is created in memory only. Use dbConnect(<driver object>, dbname="file_name.db")
to create a file on disk.
library("RSQLite")
dbDriver("SQLite")
drv <- dbConnect(drv, dbname=":memory:")
con <-dbWriteTable(con, name="MyDataFrame", value=myDf, row.names=FALSE)
[1] TRUE
Find out which tables are present, and which fields are in a specific table.
dbListTables(con)
[1] "MyDataFrame"
dbListFields(con, "MyDataFrame")
[1] "sex" "IQ" "rating"
Read complete table, then send SQL-query.
dbReadTable(con, "MyDataFrame")
out <-head(out, n=4)
sex IQ rating
1 f 118.44900 A
2 m 84.39989 A
3 f 116.40531 A
4 m 119.94145 B
dbGetQuery(con, "SELECT sex, AVG(IQ) AS mIQ, SUM(IQ) AS sIQ FROM MyDataFrame GROUP BY sex")
sex mIQ sIQ
1 f 98.49065 4924.532
2 m 99.20998 4960.499
Query database and read results in smaller partial chunks. Useful for large queries.
dbSendQuery(con, "SELECT IQ, rating FROM MyDataFrame WHERE rating = 'A'")
res <-
while(!dbHasCompleted(res)) {
dbFetch(res, n=4)
partial <-print(partial)
}
IQ rating
1 118.44900 A
2 84.39989 A
3 116.40531 A
4 78.45338 A
IQ rating
1 99.16116 A
2 90.33680 A
3 84.55555 A
4 98.49379 A
IQ rating
1 85.98571 A
2 84.39989 A
3 119.94145 A
4 99.16116 A
IQ rating
1 98.56461 A
2 85.98571 A
3 118.44900 A
4 116.40531 A
IQ rating
1 105.2867 A
2 134.9761 A
3 121.1204 A
4 106.3329 A
IQ rating
1 78.45338 A
2 90.33680 A
3 98.56461 A
4 84.39989 A
IQ rating
1 116.40531 A
2 119.94145 A
3 86.59827 A
4 73.50803 A
IQ rating
1 106.33290 A
2 78.45338 A
3 98.49379 A
Clean query, remove the created table, and close the database connection.
dbClearResult(res)
[1] TRUE
dbRemoveTable(con, "MyDataFrame")
[1] TRUE
dbDisconnect(con)
[1] TRUE
The CRAN Web Technologies Task View presents packages to directly scrape data from online sources. data.table
provides function fread()
for high performance reading of large plain text data files.
try(detach(package:foreign))
try(detach(package:RODBC))
try(detach(package:RSQLite))
try(detach(package:DBI))
R markdown - markdown - R code - all posts