quanteda runs solely on base R, but RStudio makes it easy to write your code and inspect your objects. You will need to have base R installed, and we also recommend installing the latest version of RStudio.
First, you need to have quanteda installed. You can do this from inside RStudio via Tools > Install Packages, or by executing a command in the R console.
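The console route is a single call to base R's `install.packages()`, which fetches the released version from CRAN:

```r
# Install the released version of quanteda from CRAN
install.packages("quanteda")
```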
If you are feeling adventurous, you can install the latest build of quanteda from its GitHub code page.
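A minimal sketch of the GitHub route, assuming you use the remotes package as the installer (devtools offers the same `install_github()` function). Because quanteda contains compiled code, building from source typically also requires a working compiler toolchain, such as Rtools on Windows or the Xcode command line tools on macOS.

```r
# Assumption: remotes is used as the installer; devtools works the same way
install.packages("remotes")

# Build and install the development version from the quanteda GitHub repository
remotes::install_github("quanteda/quanteda")
```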
If you use the quanteda package in your research, please cite:
Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. 2018. “quanteda: An R package for the quantitative analysis of textual data.” Journal of Open Source Software 3(30), 774. https://doi.org/10.21105/joss.00774.
We will use the readtext package to read in different types of text data in these tutorials. Again, you can install it using the RStudio menu (Tools > Install Packages), or by executing a command in the R console.
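As with quanteda, the console command is a plain CRAN install:

```r
# Install readtext from CRAN
install.packages("readtext")
```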
We will also use extra datasets in tutorials that are available in quanteda.corpora.
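quanteda.corpora is distributed through GitHub rather than CRAN, so it cannot be installed from the RStudio menu. A minimal sketch, again assuming the remotes package as the installer:

```r
# Assumption: remotes is used as the installer; devtools works the same way
install.packages("remotes")

# Install the extra tutorial datasets from GitHub
remotes::install_github("quanteda/quanteda.corpora")
```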
If you already have quanteda and other packages installed, run Tools > Check for Package Updates to install the latest versions. We recommend updating all packages using
update.packages() to avoid errors caused by outdated dependencies.
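For example, the following call updates every installed CRAN package in one go. The `ask = FALSE` argument is an optional convenience that skips the per-package confirmation prompt; leave it out if you prefer to approve each update.

```r
# Update all installed packages without prompting for each one
update.packages(ask = FALSE)
```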
The tutorials do not cover syntactic analysis, but you should install spacyr if you need part-of-speech tagging, entity recognition, or dependency parsing. It provides an interface to the spaCy library and works well with quanteda. Note that you need to have Python installed to use the spacyr package. See the package description for more information.
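A rough sketch of a spacyr setup, assuming a default installation. `spacy_install()` sets up spaCy and a language model in a dedicated Python environment; its exact behaviour and arguments differ between spacyr versions, so treat this as illustrative and check the package documentation. The example sentence is made up.

```r
# Install the R package from CRAN
install.packages("spacyr")

library(spacyr)

# One-off setup: installs spaCy and a default English model
# (behaviour varies by spacyr version; see ?spacy_install)
spacy_install()

# Start the spaCy backend for this R session
spacy_initialize()

# Hypothetical example text, parsed into one row per token
txt <- c(doc1 = "Kenneth Benoit teaches text analysis in London.")
parsed <- spacy_parse(txt, pos = TRUE, entity = TRUE, dependency = TRUE)
head(parsed)

# Shut down the background Python process when finished
spacy_finalize()
```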
To sum up, you need to load the following packages to run all examples:
require(quanteda)
require(readtext)
require(quanteda.corpora)
require(newsmap)
With quanteda_options(), you can get or set global options affecting functions across quanteda. One very useful feature is changing the number of threads used by parallelised functions. By default, quanteda uses two threads, but depending on the number of CPU cores and the RAM of your machine, you can use more than two.
quanteda_options("threads" = 10) uses ten threads, which can massively reduce the time needed to execute the parallelised functions.
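Rather than hard-coding a thread count, you can derive it from the machine. The sketch below is one possible approach, not a quanteda recommendation: it queries the current setting, then uses all but one of the cores reported by `parallel::detectCores()`. The variable names are illustrative.

```r
library(quanteda)

# Query the current number of threads
old_threads <- quanteda_options("threads")
print(old_threads)

# detectCores() can return NA on some platforms, so fall back to 1
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1

# Leave one core free for the rest of the system
n_threads <- max(1, n_cores - 1)
quanteda_options(threads = n_threads)

# Restore all quanteda options to their defaults when done
quanteda_options(reset = TRUE)
```

Leaving one core free keeps the machine responsive during long-running jobs. Asking for more threads than the machine has cores is unlikely to help.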