You do not need to have advanced knowledge of the R programming language to perform text analysis with **quanteda** because the package has a wide range of functions. However, you still have to understand a number of basic R commands.

R has three types of objects: *vector*, *data frame* and *matrix*. Since many of the **quanteda** objects behave similarly to these objects, you need to understand how to interact with them.

As a language for statistical analysis, R"s most basic objects are vectors. Vectors contain a set of values. In the examples below, `vec_num`

is a *numeric vector*, while `vec_char`

is a *chracter vector*. We use `c()`

to combine elements of a vector and `<-`

to assign a vector to a variable.

```
vec_num <- c(1, 5, 6, 3)
print(vec_num)
```

```
## [1] 1 5 6 3
```

```
vec_char <- c("apple", "banana", "mandarin", "melon")
print(vec_char)
```

```
## [1] "apple" "banana" "mandarin" "melon"
```

Once a vector is created, you can extract elements of vectors with the `[]`

operator and index numbers of desired elements.

```
print(vec_num[1])
```

```
## [1] 1
```

```
print(vec_num[1:2])
```

```
## [1] 1 5
```

```
print(vec_char[c(1, 3)])
```

```
## [1] "apple" "mandarin"
```

You can apply arithmetical operations such as addition, subtraction, multiplication or division on numeric vectors. If only a single value is given for multiplication, for example, each element of the vector will be multiplied by the same value.

```
vec_num2 <- vec_num * 2
print(vec_num2)
```

```
## [1] 2 10 12 6
```

You can also compare elements of a vector by relational operators such as `==`

, `>=`

, `>`

, `<=`

, `<`

. The result of these operations will be a *logical vector* that contains either `TRUE`

or `FALSE`

.

```
vec_logi_gt5 <- vec_num >= 5
print(vec_logi_gt5)
```

```
## [1] FALSE TRUE TRUE FALSE
```

You cannot apply arithmetical operations on character vectors, but can apply the equality operator.

```
vec_logi_apple <- vec_char == "apple"
print(vec_logi_apple)
```

```
## [1] TRUE FALSE FALSE FALSE
```

You can also concatenate elements of character vectors using `paste()`

. Since the two vectors in the example have the same length, elements at the same positions of the vectors are concatenated.

```
vec_char2 <- paste(c("red", "yellow", "orange", "green"), vec_char)
print(vec_char2)
```

```
## [1] "red apple" "yellow banana" "orange mandarin" "green melon"
```

Finally, you can set names to elements of a numeric vector using `names()`

.

```
names(vec_num) <- vec_char
print(vec_num)
```

```
## apple banana mandarin melon
## 1 5 6 3
```

A data frame combines multiple vectors to construct a dataset. You can combine vectors into a data frame only if they have the same lengths. However, they can be different types. `nrow()`

and `ncol()`

show the number of rows (observations) and variables in a data frame.

```
dat_fruit <- data.frame(name = vec_char, count = vec_num)
print(dat_fruit)
```

```
## name count
## apple apple 1
## banana banana 5
## mandarin mandarin 6
## melon melon 3
```

```
print(nrow(dat_fruit))
```

```
## [1] 4
```

```
print(ncol(dat_fruit))
```

```
## [1] 2
```

You can use `subset()`

to select records in the data frame.

```
dat_fruit_sub <- subset(dat_fruit, count >= 5)
print(dat_fruit_sub)
```

```
## name count
## banana banana 5
## mandarin mandarin 6
```

```
print(nrow(dat_fruit_sub))
```

```
## [1] 2
```

```
print(ncol(dat_fruit_sub))
```

```
## [1] 2
```

We use `print()`

to show values and structures of objects in the examples, but you do not need to use the `print()`

command in the console, because it is triggered automatically when objects are returned to the global environment.

Similar to a data frame, a matrix contains multi-dimensional data. In contrast to a data frame, its values must all be the same type.

```
mat <- matrix(c(1, 3, 6, 8, 3, 5, 2, 7), nrow = 2)
print(mat)
```

```
## [,1] [,2] [,3] [,4]
## [1,] 1 6 3 2
## [2,] 3 8 5 7
```

You can use `colnames()`

or `rownames()`

to set/retrieve names to rows or columns of a matrix.

```
colnames(mat) <- vec_char
print(mat)
```

```
## apple banana mandarin melon
## [1,] 1 6 3 2
## [2,] 3 8 5 7
```

```
rownames(mat) <- c("bag1", "bag2")
print(mat)
```

```
## apple banana mandarin melon
## bag1 1 6 3 2
## bag2 3 8 5 7
```

You can obtain the size of a matrix by `dim()`

that returns a two-element numeric vector.

```
print(dim(mat))
```

```
## [1] 2 4
```

If a matrix has column and row names, you can extract rows or columns by their names.

```
print(mat["bag1", ])
```

```
## apple banana mandarin melon
## 1 6 3 2
```

```
print(mat[, "banana"])
```

```
## bag1 bag2
## 6 8
```

Finally, you can obtain marginals of matrix by `colSums()`

or `rowSums()`

.

```
print(rowSums(mat))
```

```
## bag1 bag2
## 12 23
```

```
print(colSums(mat))
```

```
## apple banana mandarin melon
## 4 14 8 9
```

If you want to know the details of R commands, prepend `?`

to the command and execute. For example, `?subset()`

will show you how to use the subset function with different types of objects.