Tokens
Learn how to construct and modify a tokens object
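The sketch below is a minimal end-to-end illustration of constructing a tokens object with the quanteda R package and then modifying it. It assumes quanteda version 3 or later is installed. The two example texts are hypothetical, chosen only for illustration. The functions used are quanteda's `tokens()`, `tokens_tolower()`, `tokens_remove()`, `tokens_select()`, `tokens_ngrams()` and `kwic()`, together with `stopwords()`. Treat it as a starting point rather than the tutorial's own code.

```r
# Minimal sketch, assuming quanteda (v3 or later) is installed.
library(quanteda)

# Two short example texts (hypothetical, for illustration only)
txt <- c(doc1 = "The quick brown fox jumps over the lazy dog.",
         doc2 = "A tokens object stores each document as a sequence of tokens!")

# Construct a tokens object; punctuation is dropped at construction time
toks <- tokens(txt, remove_punct = TRUE)
print(toks)

# Modify it: lowercase every token
toks <- tokens_tolower(toks)

# Remove English stopwords ("the", "a", "over", "each", ...)
toks <- tokens_remove(toks, pattern = stopwords("en"))

# Keep only tokens matching a glob pattern
toks_tok <- tokens_select(toks, pattern = "token*")

# Generate bigrams, joined by the default "_" concatenator
toks_bi <- tokens_ngrams(toks, n = 2)

# Keyword-in-context view of "fox", run on the unmodified tokens
kw <- kwic(tokens(txt), pattern = "fox", window = 2)
print(kw)
```

The order of operations matters here. Stopwords are removed before the n-grams are generated, so `toks_bi` contains `"jumps_lazy"` even though "jumps" and "lazy" were not adjacent in the original sentence. If adjacency in the source text is important, generate the n-grams before removing stopwords.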