Data import and export

Install required packages

haven, readxl, writexl, RSQLite

Read and write data in text format

Read and write text data from a local file

If reading from a file, check with getwd() that you are in the correct directory - or specify full file path. If the result should be a data frame, use read.table().

Some relevant options for read.table()

Some relevant options for write.table()

To read comma-separated-value files, use read.csv(), for fixed-with-files read.fwf(). To read a raw (unstructured) text file, use readLines(). The result is a character vector with one element per line.

Read text data from other sources

The described R functions can also read data from a character string.

    X     Y      Z
1 123 A B C   34.8
2  99 D E F 3543.2

Other sources can be standard-input (if R is used in batch mode via Rscript), the clipboard, or directly from an online address.

Read and write data in R binary format

Save multiple objects together with their respective symbol (=name). Object names do not need to be assigned when reading data.

Save objects separately without their symbol (=name) -> preferred. Object name needs to be assigned when reading data.

Use R for data entry

Read data from the console with scan(). Lines are terminated by hitting the Return key, scan() quits when Return is hit on a blank line.

Save R data to file

Create a logfile for commands and output

Save and read R objects in text format

Exchange data with other statistics software and spreadsheets

Exchange data with SPSS, SAS and Stata

One option is to use text files (tab-separated or comma-separated) as described above to exchange data with other statistics software packages.

Another option to exchange data with SPSS, SAS and Stata (among others) is the haven package. Example for SPSS:

SPSS has the concept of “variable labels” and “value labels” which do not directly map to R concepts. Variables read from SPSS data files therefore have special labels and are of class haven_labelled. Categorical variables can be converted to regular R factors using as_factor().

When writing data to an SPSS file, labels can be set manually, and the class can be changed to haven_labelled.

Use the SPSS Integration Plug-in for R

If you have SPSS available, install the “Integration plug-in for R” add-in. This allows you to run R within SPSS. The plug-in includes an R package with functions that transfer the active SPSS data frame to R (and back) - including labeled factor levels, dates and German umlauts. Once installed, you can use it like this in the SPSS syntax window:

# from here on, you can use R syntax
myDf <- spssdata.GetDataFromSPSS(missingValueToNA=TRUE,
save(myDf, file="d:/path/to/your/myDf.Rdata")

Exchange data with Excel

One option is to use text files (tab-separated or comma-separated) as described above to exchange data with spreadsheet applications.

When date columns are not correctly formatted in Excel, they sometimes get imported as integer numbers. For dates, this is the number of days since 1899-12-30. For date-times, it’s the number of seconds since 1899-12-30. To convert these numbers to dates or date-times, package openxlsx has functions convertToDate() and convertToDateTime().

Read and write data from a database

There are R packages that provide an interface to all common database types. Using databases is described in more detail in:

  • Adler, J. (2012). R in a Nutshell (2nd ed.). Sebastopol, CA: O’Reilly.
  • Spector, P. (2008). Data Manipulation with R. New York, NY: Springer.

SQL commands are described here and here.

Using the DBI interface with RSQLite

Simulate data

Save data frame in SQLite database. This is usually a file. In this example, the file is created in memory only. Use dbConnect(<driver object>, dbname="file_name.db") to create a file on disk.

Find out which tables are present, and which fields are in a specific table.

[1] "MyDataFrame"
[1] "sex"    "IQ"     "rating"

Read complete table, then send SQL-query.

  sex        IQ rating
1   f  77.32935      B
2   m  86.09951      A
3   f  97.07726      C
4   m 115.73843      C
  sex      mIQ      sIQ
1   f 98.73237 4936.618
2   m 98.63117 4931.559

Query database and read results in smaller partial chunks. Useful for large queries.

         IQ rating
1  86.09951      A
2  97.30923      A
3 109.71035      A
4 115.46680      A
         IQ rating
1  99.28235      A
2  89.13792      A
3  86.09951      A
4 115.73843      A
         IQ rating
1 109.71035      A
2 114.90525      A
3 115.46680      A
4  99.28235      A
         IQ rating
1  93.28827      A
2  97.30923      A
3  78.64821      A
4 109.71035      A
         IQ rating
1 115.46680      A
2 100.00818      A
3 101.47441      A
4  97.07726      A
         IQ rating
1 113.28772      A
2 115.46680      A
3  93.28827      A
4  88.56834      A
         IQ rating
1 116.61545      A
2  77.32935      A
3  97.07726      A
4 115.73843      A
         IQ rating
1  83.76086      A
2  78.64821      A
3 109.71035      A
4 114.90525      A
        IQ rating
1 101.4744      A

Clean query, remove the created table, and close the database connection.

Useful resources

Detach (automatically) loaded packages (if possible)

Get the article source from GitHub

R markdown - markdown - R code - all posts