This website contains a step-by-step introduction to quantitative text analysis using quanteda. The chapters cover a brief introduction to the statistical programming language R, importing text data, basic quanteda operations, constructing a corpus, tokens objects, and a document-feature matrix, and conducting advanced operations. The final chapter deals with text scaling (e.g., Wordscores, Wordfish, correspondence analysis), document classification using Naive Bayes, and topic models.
The six chapters consist of over 30 sections. If you click on the name of a chapter on the left-hand side of this page, its sections will pop up. You can also use the “Search” field in the top-left corner to look up occurrences of specific terms or R functions covered in the tutorials.
This website was created for workshops held by the quanteda team and for users looking for a comprehensible step-by-step introduction to text analysis using R. To get you started, we have also created several additional resources, such as vignettes, replications, a cheatsheet, and a comparison to other text analysis packages (in terms of functions and performance).
You can not only read the R commands but also execute them yourself if you download the source code of this website from the GitHub repository. Unzip the files on your machine and click quanteda_tutorials.Rproj to open RStudio. Executable R commands are in the .Rmarkdown files under the
Contributions in the form of feedback, comments, code, and bug reports are most welcome. If you have questions about how to use quanteda, please post them to the quanteda channel on Stack Overflow. If you find a bug, please report it in the quanteda issues. We prefer these platforms to email for communicating with our users because the records help other users who run into similar problems.
Examples in this tutorial are written for quanteda version 1.3.4. Please check if you have the same version installed by a command
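A minimal sketch of such a version check is given below. It relies only on base R functions (`packageVersion()`, `package_version()`, `requireNamespace()`); the variable names `required_version` and `installed_version` are illustrative and not part of the tutorial's own code.

```r
# Version the tutorial examples were written for (as stated above).
required_version <- package_version("1.3.4")

# requireNamespace() returns FALSE rather than raising an error
# when the package is not installed.
if (requireNamespace("quanteda", quietly = TRUE)) {
  installed_version <- packageVersion("quanteda")
  if (installed_version == required_version) {
    message("quanteda ", format(installed_version),
            " matches the tutorial version.")
  } else {
    message("quanteda ", format(installed_version),
            " is installed; the examples assume ",
            format(required_version), ".")
  }
} else {
  message("quanteda is not installed.")
}
```

If the versions differ, most examples may still run, but function names and arguments can change between releases, so results are not guaranteed to match.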