Importing and exporting data is critical to learning how to use R and analyze data. Most R scripts start by importing a dataset and many script end by exporting some data or images.
Let’s learn together how to import and export all of the common file types you will encounter when using R.
Summary Table
File Type | Package Name(s) | Import Function | Import Documentation | Export Function | Export Documentation |
.xlsb | readxlsb | read_xlsb() | docs | N/A | N/A |
.xls/.xlsx | readxl, writexl | read_xls(), read_excel() | docs | write_xlsx() | docs |
.csv | readr, AlphaPart | read_csv(), read_excel() | docs | write.csv() | docs |
.xml | XML, methods | xmlParseDoc() | docs | saveXML() | docs |
.json | rjson | fromJSON() | docs | toJSON(), write() | docs |
.txt | utils | read.table() | docs | write.table() | docs |
Importing data in R as a Dataframe versus a Tibble
Whenever you can, you should use the Tidyverse and tibble object as opposed to the classic R data frame object.
Tidyverse is a collection of well-maintained packages that were created with data science and analytics in mind. The Tidyverse represents some of the best that R has to offer.
A tibble object is essentially a data frame object without any of the annoying or burdensome features that users often are forced to code around in R. After using tibbles for a while, you will never want to use classic data frame objects again.
Luckily for us, there are Tidyverse functions that will read in virtually any file type as a tibble.
Hooray!
Why import data as tibbles instead of dataframe objects
There are a number of reasons that tibbles are superior to data frame objects. In no particular order:
- Tibbles automatically only display the head of the data without needing to call head()
- Tibbles don’t assume factors as default but rather strings
- Tibbles also automatically displays each columns data type
- Tibbles clearly delineates the use of “[” and “[[“
- Tibbles are much more strict about subsetting rules forcing you to write more precise code
- Tibbles have columns as lists rather than vectors
- Columns and values can start with a number or a space
So don’t be afraid of the tibble and make it a habit of importing data into R as a tibble as opposed to a classic data frame. You will have fewer headaches and be a happier analyst/programmer!
Importing and Exporting a CSV file in R
Importing
## Read in libraries
library(readr)
# Import CSV
df= read_csv("example.csv")
df
Exporting
## Read in libraries
# There are no needed libraries
# Export CSV
write.csv(df, "output_example.csv")
Tips
Note that when it comes down to it, a CSV file is just a delimited text (.txt) file. Actually, the read_csv() function here is defined as a special case of the read_txt() function.
Importing and Exporting an XLS or XLSX file in R
Importing
## Read in libraries
library(readxl)
# Import XLS or XLSX
df= read_xls("example.xls",sheet = "example")
df
Exporting
## Read in libraries
library(writexl)
# Export XLSX or XLS
write_xlsx(df, "output_example.xls")
Tips
You can use the same functions and package for both .xls and .xlsx files. Don’t get caught up on the extra “x” on the end of the file extension. These are essentially the same file type.
Importing and Exporting an XLSB file in R
Importing
## Read in libraries
library(readxlsb)
# Import XLSB
df= read_xlsb("example.xlsb",sheet = "example")
df
Exporting
Excel Binary format is a proprietary encoding/compression format used by Microsoft that is not shared. For this reason, it is best to save and export your data as a .csv or .xlsx file type.
Using XLSB files
This is just a fancy compressed form of an excel file so don’t be afraid of the extra “b” at the end of the file name.
Importing and Exporting a JSON file in R
Importing
## Read in libraries
library("rjson")
# Import JSON
df <- fromJSON(file = "example.json")
df
Exporting
## Read in libraries
library("rjson")
# Export JSON
# Note that df needs to be a json object. Use toJSON() to convert to a json object.
df <- toJSON(df)
write(df, "example_export.json")
When to use JSON files
JSON files are an alternative to XML and is used when sending data across the web (ie from server to server).
There are not many situations where you will want to write your data to a JSON file, but it is likely that you might need to read in JSON data at some point in your data analyst career.
Importing and Exporting an XML file in R
Importing
## Read in libraries
library("XML")
library("methods")
# Import XML
df <- xmlParseDoc(file = "example.xml")
df
Exporting
## Read in libraries
library("XML")
library("methods")
# Export XML
df <- saveXML(df, file = "example_export.xml")
df
When to use XML files
You might come across XML files that you will need to input data from especially if you are getting data from an API or web service. Data is transmitted across the web using either XML or JSON.
I would never write data to an XML file unless you for some reason want to test a web API endpoint, but even then I can’t imagine using R to export data for this.
Importing and Exporting a TXT file in R
Importing
## Read in libraries
# There are no needed libraries
# Import TXT
df <- read.table("example.txt", header=T, sep='\t')
df
Exporting
## Read in libraries
# There are no needed libraries
# Export TXT
write.table(df, 'example_export.txt', quote=FALSE, sep='\t', row.names=FALSE)
When to use TXT files
Data stored in raw text files are difficult to work with.
Unless they are delimited by a character or spacing in a predictable way, the data will be unusable. So expect some trouble anytime you receive data in a .txt format.
My suggestion is to never store data in text (.txt) files unless for some reason you absolutely have to as importing data from .txt files is probably the most difficult data format to work with. I cannot imagine a reason you would need to make things difficult for yourself. So just don’t do it.
Conclusion
Importing and exporting data in R is one of the first things you will learn and for good reason. You can’t do anything with data that you can’t import.
In this article we learned how to import and export various types of file formats into the R environment. Usually, the methods we used in this article read in data as tibble data types, which are preferable to classic data frames.
Thanks for reading and look out for more content from RTL Coding. Hope this article was useful and if you would like to support this blog and this content feel free to donate via Paypal to help support more helpful content.
Sources
- MySQL SQL Dump Options and Examples Documentation
- Amazon Web Service (AWS) SQL Dump Docs
- Microsoft Azure SQL Dump Docs