Pre-formatted files

library(quanteda)
library(readtext)

First, we show you how to import pre-formatted files that come in a “spreadsheet format”. path_data stores the location on your computer of the sample files that ship with the readtext package.

path_data <- system.file("extdata", package = "readtext")
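To see which sample files are available under path_data, you can list them. The sketch below relies only on base R functions. It also checks for the empty string that system.file() returns when a package is not installed.

```r
# Locate the sample data shipped with readtext
path_data <- system.file("extdata", package = "readtext")

# system.file() returns "" if the package (or folder) cannot be found
if (!nzchar(path_data)) stop("readtext is not installed")

# List the sample files, with paths relative to path_data
sample_files <- list.files(path_data, recursive = TRUE)
head(sample_files)
```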

If your text data is stored in a pre-formatted file where one column contains the text and additional columns store document-level variables (e.g. year, author, or language), you can import it with read.csv().

dat_inaug <- read.csv(paste0(path_data, "/csv/inaugCorpus.csv"))
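You can see what read.csv() returns without using the bundled files. The minimal sketch below parses a small, made-up CSV string passed through the text argument of read.csv(). The column names text, year and author are placeholders chosen for illustration. The result is a data frame with one row per document: one column holds the text and the others hold document-level variables.

```r
# Made-up CSV: one text column plus two document-level variables
csv_string <- 'text,year,author
"First example document.",2001,"Author A"
"Second example document, with a comma.",2005,"Author B"'

# read.csv() can parse data passed directly via its text argument;
# stringsAsFactors = FALSE keeps the text as character
# (this is already the default since R 4.0.0)
dat <- read.csv(text = csv_string, stringsAsFactors = FALSE)
str(dat)
```

The quoted field shows that commas inside a text survive the import, as long as the field is enclosed in double quotes.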

Alternatively, you can use the readtext package to import comma-separated (CSV) or tab-separated (TSV) values. readtext reads files containing text, along with any associated document-level variables.

dat_dail <- readtext(paste0(path_data, "/tsv/dailsample.tsv"), text_field = "speech")
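The role of text_field can be seen on a small scale. The sketch below writes a tiny, made-up tab-separated file to a temporary location and reads it back with readtext(). The column names speaker and speech, and their contents, are hypothetical. The column named in text_field becomes the text column of the result. The remaining columns are assumed to be kept as document-level variables.

```r
library(readtext)

# Write a small, made-up TSV file to a temporary location
tsv_file <- tempfile(fileext = ".tsv")
df <- data.frame(
  speaker = c("Speaker A", "Speaker B"),
  speech  = c("First made-up speech.", "Second made-up speech."),
  stringsAsFactors = FALSE
)
write.table(df, tsv_file, sep = "\t", row.names = FALSE, quote = FALSE)

# text_field names the column that holds the documents;
# the other columns (here: speaker) become document-level variables
dat <- readtext(tsv_file, text_field = "speech")
names(dat)
dat$text
```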

The most common problem when loading data into R is a misspecified file or directory location. If a path is relative, check your current working directory with getwd() and, if necessary, set it to the root directory of your project with setwd(). On Windows, you also have to replace every \ in a path with / (or escape each backslash as \\).
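A few base R habits make path problems easier to spot. The sketch below shows three of them: it builds the path to the sample CSV file with file.path(), stops early with a clear message if the file is missing, and converts a Windows-style path to forward slashes. The Windows path is hypothetical.

```r
# Relative paths are resolved from the current working directory
getwd()

# file.path() joins path components with "/", which R accepts on all platforms
path_data <- system.file("extdata", package = "readtext")
csv_file <- file.path(path_data, "csv", "inaugCorpus.csv")

# Stop early with an informative message if the file is not where we expect it
if (!file.exists(csv_file)) stop("File not found: ", csv_file)

# Backslashes must be escaped inside R strings ("\\");
# replacing them with forward slashes gives an equivalent path
win_path <- "C:\\Users\\me\\data"  # hypothetical Windows path
gsub("\\", "/", win_path, fixed = TRUE)
# "C:/Users/me/data"
```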

If you have more than a few R files in a project, you should create an RStudio Project to manage files and settings more easily. You can create one from the menu (File > New Project).