quanteda runs solely on base R, but RStudio makes it easier to write your code and inspect your objects. You will need to have base R installed, and we also recommend installing the latest version of RStudio.
First, you need to have quanteda installed. You can do this from inside RStudio via Tools > Install Packages, or by executing the following command.
install.packages("quanteda")
Since the release of quanteda version 3.0, the textmodel_*, textstat_*, and textplot_* functions are available in separate packages. We will use several of these functions in the chapters below and strongly recommend installing these packages.
install.packages("quanteda.textmodels")
install.packages("quanteda.textstats")
install.packages("quanteda.textplots")
If you are feeling adventurous, you can install the latest build of quanteda from its GitHub code page.
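One way to do this is sketched below, assuming the development version lives in the `quanteda/quanteda` repository on GitHub and that the remotes package is used for the installation (devtools::install_github() works the same way). The helper name `install_quanteda_dev` is hypothetical and only wraps the calls so that nothing is installed until you call it. Building from source typically also requires a working C++ toolchain (for example Rtools on Windows).

```r
# Hypothetical helper: installs the development version of quanteda from GitHub.
# Assumes the repository is "quanteda/quanteda" and that remotes is acceptable
# as the installer.
install_quanteda_dev <- function() {
  # Install remotes first if it is not available yet
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("quanteda/quanteda")
}

# Uncomment to run the installation:
# install_quanteda_dev()
```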
If you use the quanteda package in your research, please cite:
Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. 2018. “quanteda: An R package for the quantitative analysis of textual data.” Journal of Open Source Software 3(30), 774. https://doi.org/10.21105/joss.00774.
We will use the readtext package to read in different types of text data in these tutorials. Again, you can install it using the RStudio menu (Tools > Install Packages), or by executing the following command.
install.packages("readtext")
We will also use extra datasets in the tutorials that are available in quanteda.corpora. This package is not on CRAN, but it can be installed with the install_github() function from the devtools package.
install.packages("devtools") # get devtools to install quanteda.corpora
devtools::install_github("quanteda/quanteda.corpora")
If you already have quanteda and other packages installed, run Tools > Check for Package Updates to install the latest versions. We recommend updating all packages using update.packages() to avoid errors caused by outdated dependencies.
The tutorials do not cover syntactical analysis, but you should install spacyr for part-of-speech tagging, entity recognition, and dependency parsing. It provides an interface to the spaCy library and works well with quanteda. Note that you need to have Python installed to use the spacyr package. See the package description for more information.
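For orientation, a minimal sketch of how spacyr is typically used is shown below. It assumes that spacyr and a spaCy installation with the English model `en_core_web_sm` are already set up; the function name `parse_with_spacy` is hypothetical, and the setup steps in the comments may differ between spacyr versions, so check the package documentation before relying on them.

```r
# One-time setup (assumption: run these yourself; they download software):
# install.packages("spacyr")
# spacyr::spacy_install()

# Hypothetical helper: tags a character vector with spaCy and returns
# a data.frame with one row per token.
parse_with_spacy <- function(txt) {
  # Start the spaCy backend with the small English model (assumed installed)
  spacyr::spacy_initialize(model = "en_core_web_sm")
  # Shut the backend down again when the function exits
  on.exit(spacyr::spacy_finalize(), add = TRUE)
  # Request part-of-speech tags, named entities, and dependency relations
  spacyr::spacy_parse(txt, pos = TRUE, entity = TRUE, dependency = TRUE)
}

# Example call (requires a working spaCy installation):
# parse_with_spacy("quanteda was created by Kenneth Benoit.")
```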
To sum up, you need to load the following packages to run all examples:
require(quanteda)
require(quanteda.textmodels)
require(quanteda.textstats)
require(quanteda.textplots)
require(readtext)
require(devtools)
require(quanteda.corpora)
require(newsmap)
require(seededlda)
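Before loading the packages, it can help to check which of them are still missing. The sketch below only uses base R; the variable names are illustrative. Note that newsmap and seededlda are not covered by the installation commands earlier in this section; both are on CRAN and can be installed with install.packages().

```r
# Packages needed to run all examples in the tutorials
pkgs <- c("quanteda", "quanteda.textmodels", "quanteda.textstats",
          "quanteda.textplots", "readtext", "devtools",
          "quanteda.corpora", "newsmap", "seededlda")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are installed.")
}
```

Remember that quanteda.corpora, if it is reported as missing, has to be installed from GitHub rather than from CRAN.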
Using quanteda_options(), you can get or set global options affecting functions across quanteda. One very useful feature is changing the number of threads used by parallelised functions. By default, quanteda uses two threads, but depending on the number of CPU cores (and the available RAM) of your machine, you can use more than two threads.
For example, quanteda_options("threads" = 10) will use ten threads, which can substantially reduce the time needed to execute the parallelised functions.
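Rather than hard-coding a number, you could derive the thread count from your machine. The following is a minimal sketch, assuming that leaving one core free for the rest of the system is a sensible choice; `n_cores` and `n_threads` are illustrative names, and the option is only set if quanteda is installed.

```r
# Number of cores reported by the operating system (can be NA on some systems)
n_cores <- parallel::detectCores()

# Fall back to two threads if the count is unknown; otherwise leave one core free
n_threads <- if (is.na(n_cores)) 2L else max(1L, n_cores - 1L)

if (requireNamespace("quanteda", quietly = TRUE)) {
  quanteda::quanteda_options(threads = n_threads)
  # Query the option to confirm the value now in use
  print(quanteda::quanteda_options("threads"))
}
```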