You do not need advanced knowledge of the R programming language to perform text analysis with **quanteda**, because the package has a wide range of functions. However, you still have to understand a number of basic R commands.

This section covers three basic types of R objects: *vector*, *data frame* and *matrix*. Since many of the **quanteda** objects behave similarly to these objects, it is essential for you to understand how to interact with them.

R is a language for statistical analysis, and its most basic objects are vectors. Vectors contain a set of values. In the examples below, `num_vec` is a *numeric vector*, while `char_vec` is a *character vector*. We use `c()` to combine elements into a vector and `<-` to assign a vector to a variable.

```
num_vec <- c(1, 5, 6, 3)
print(num_vec)
```

```
## [1] 1 5 6 3
```

```
char_vec <- c('apple', 'banana', 'mandarin', 'melon')
print(char_vec)
```

```
## [1] "apple" "banana" "mandarin" "melon"
```

Once a vector is created, you can extract its elements with the `[]` operator and the index numbers of the desired elements.

```
print(num_vec[1])
```

```
## [1] 1
```

```
print(num_vec[1:2])
```

```
## [1] 1 5
```

```
print(char_vec[c(1, 3)])
```

```
## [1] "apple" "mandarin"
```
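Two other indexing patterns are often useful alongside the ones above. The following is a minimal sketch using only base R; it recreates the example vectors so that it runs on its own.

```r
# Recreate the example vectors so this sketch is self-contained
num_vec <- c(1, 5, 6, 3)
char_vec <- c('apple', 'banana', 'mandarin', 'melon')

# A negative index drops the element at that position
print(num_vec[-1])                  # 5 6 3

# length() returns the number of elements, so this selects the last one
print(char_vec[length(char_vec)])   # "melon"
```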

You can apply arithmetical operations such as addition, subtraction, multiplication or division on numeric vectors. If only a single value is given for multiplication, for example, each element of the vector will be multiplied by the same value.

```
num_vec2 <- num_vec * 2
print(num_vec2)
```

```
## [1] 2 10 12 6
```
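When both operands are vectors of the same length, the operation is applied to elements at the same positions. A small sketch of this, where `num_sum` is a name introduced here only for illustration:

```r
# Recreate the example vectors so this sketch is self-contained
num_vec <- c(1, 5, 6, 3)
num_vec2 <- num_vec * 2

# Elements at the same positions are added together
num_sum <- num_vec + num_vec2
print(num_sum)   # 3 15 18 9
```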

You can also compare elements of a vector with relational operators such as `==`, `>=`, `>`, `<=` and `<`. The result of these operations is a *logical vector* that contains either `TRUE` or `FALSE`.

```
logi_gt5_vec <- num_vec >= 5
print(logi_gt5_vec)
```

```
## [1] FALSE TRUE TRUE FALSE
```
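A logical vector can itself be used inside `[]` to keep only the elements for which the condition holds, and `sum()` counts its `TRUE` values. A brief sketch with the same example values:

```r
# Recreate the example vector so this sketch is self-contained
num_vec <- c(1, 5, 6, 3)
logi_gt5_vec <- num_vec >= 5

# Keep only the elements where the logical vector is TRUE
print(num_vec[logi_gt5_vec])   # 5 6

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of matches
print(sum(logi_gt5_vec))       # 2
```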

You cannot apply arithmetical operations to character vectors, but you can apply the equality operator.

```
logi_apple_vec <- char_vec == 'apple'
print(logi_apple_vec)
```

```
## [1] TRUE FALSE FALSE FALSE
```
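To test against several values at once, base R also offers the `%in%` operator. The sketch below assumes the same example vector; `logi_in_vec` is a name introduced here only for illustration.

```r
# Recreate the example vector so this sketch is self-contained
char_vec <- c('apple', 'banana', 'mandarin', 'melon')

# TRUE where an element matches any of the values on the right-hand side
logi_in_vec <- char_vec %in% c('apple', 'melon')
print(logi_in_vec)             # TRUE FALSE FALSE TRUE

# The logical vector can then be used to select the matching elements
print(char_vec[logi_in_vec])   # "apple" "melon"
```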

You can also concatenate elements of character vectors using `paste()`. Since the two vectors in the example have the same length, elements at the same positions of the vectors are concatenated.

```
char_vec2 <- paste(c('red', 'yellow', 'orange', 'green'), char_vec)
print(char_vec2)
```

```
## [1] "red apple" "yellow banana" "orange mandarin" "green melon"
```
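`paste()` also accepts the `sep` and `collapse` arguments, which control the separator between pieces and whether the result is joined into a single string. A short sketch; the `'juice'` suffix is made up for illustration.

```r
# Recreate the example vector so this sketch is self-contained
char_vec <- c('apple', 'banana', 'mandarin', 'melon')

# sep sets the separator; the single value 'juice' is reused for every element
print(paste(char_vec, 'juice', sep = '_'))
# "apple_juice" "banana_juice" "mandarin_juice" "melon_juice"

# collapse joins all elements into one string
print(paste(char_vec, collapse = ', '))
# "apple, banana, mandarin, melon"
```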

Finally, you can assign names to the elements of a numeric vector using `names()`.

```
names(num_vec) <- char_vec
print(num_vec)
```

```
## apple banana mandarin melon
## 1 5 6 3
```
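Once a vector has names, they can be used in place of index numbers inside `[]`. A minimal sketch, with the example vector recreated so it runs on its own:

```r
# Recreate the named example vector so this sketch is self-contained
num_vec <- c(1, 5, 6, 3)
names(num_vec) <- c('apple', 'banana', 'mandarin', 'melon')

# Extract a single element by its name (the name is kept in the result)
print(num_vec['banana'])

# Extract several elements by passing a character vector of names
print(num_vec[c('apple', 'melon')])

# Called without assignment, names() retrieves the names
print(names(num_vec))   # "apple" "banana" "mandarin" "melon"
```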

A data frame combines multiple vectors to construct a dataset. You can combine vectors into a data frame only if they have the same length, although they can be of different types. `nrow()` and `ncol()` show the number of rows (observations) and columns (variables) in a data frame.

```
fruit_df <- data.frame(name = char_vec, count = num_vec)
print(fruit_df)
```

```
## name count
## apple apple 1
## banana banana 5
## mandarin mandarin 6
## melon melon 3
```

```
print(nrow(fruit_df))
```

```
## [1] 4
```

```
print(ncol(fruit_df))
```

```
## [1] 2
```
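Each variable in a data frame is itself a vector, which can be extracted with `$` or `[[]]`. The sketch below rebuilds a comparable data frame from plain vectors; `stringsAsFactors = FALSE` is set explicitly as an assumption so that `name` stays a character vector in older R versions.

```r
# Build a comparable data frame so this sketch is self-contained
fruit_df <- data.frame(name = c('apple', 'banana', 'mandarin', 'melon'),
                       count = c(1, 5, 6, 3),
                       stringsAsFactors = FALSE)

# $ extracts a single variable as a vector
print(fruit_df$count)        # 1 5 6 3

# [[ ]] does the same with the variable name given as a string
print(fruit_df[['name']])    # "apple" "banana" "mandarin" "melon"

# The extracted vector can be passed to other functions
print(sum(fruit_df$count))   # 15
```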

You can use `subset()` to select records in the data frame.

```
fruit_df2 <- subset(fruit_df, count >= 5)
print(fruit_df2)
```

```
## name count
## banana banana 5
## mandarin mandarin 6
```

```
print(nrow(fruit_df2))
```

```
## [1] 2
```

```
print(ncol(fruit_df2))
```

```
## [1] 2
```
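The same selection can be written with the `[]` operator, and `subset()` can also restrict the variables through its `select` argument. A sketch under the assumption of a comparable data frame; `fruit_df3` and `fruit_df4` are names introduced here only for illustration.

```r
# Build a comparable data frame so this sketch is self-contained
fruit_df <- data.frame(name = c('apple', 'banana', 'mandarin', 'melon'),
                       count = c(1, 5, 6, 3),
                       stringsAsFactors = FALSE)

# Row selection with a logical vector; the empty column slot keeps all variables
fruit_df3 <- fruit_df[fruit_df$count >= 5, ]
print(fruit_df3)

# select keeps only the listed variables; the result is still a data frame
fruit_df4 <- subset(fruit_df, count >= 5, select = name)
print(ncol(fruit_df4))   # 1
```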

We use `print()` to show the values and structures of objects in the examples, but you do not need to use the `print()` command in the console, because R prints the result automatically when an expression is evaluated there without being assigned to a variable.

Similar to a data frame, a matrix contains two-dimensional data. In contrast to a data frame, its values must all be of the same type.

```
mat <- matrix(c(1, 3, 6, 8, 3, 5, 2, 7), nrow = 2)
print(mat)
```

```
## [,1] [,2] [,3] [,4]
## [1,] 1 6 3 2
## [2,] 3 8 5 7
```
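As the output above shows, `matrix()` fills values column by column by default. Setting `byrow = TRUE` fills them row by row instead. In this sketch, `mat_byrow` is a name introduced only for illustration.

```r
# Same values as above, but filled row by row
mat_byrow <- matrix(c(1, 3, 6, 8, 3, 5, 2, 7), nrow = 2, byrow = TRUE)
print(mat_byrow)
# The first row is 1 3 6 8 and the second row is 3 5 2 7
```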

You can use `colnames()` or `rownames()` to set or retrieve the names of the columns or rows of a matrix.

```
colnames(mat) <- char_vec
print(mat)
```

```
## apple banana mandarin melon
## [1,] 1 6 3 2
## [2,] 3 8 5 7
```

```
rownames(mat) <- c('bag1', 'bag2')
print(mat)
```

```
## apple banana mandarin melon
## bag1 1 6 3 2
## bag2 3 8 5 7
```

You can obtain the size of a matrix with `dim()`, which returns a two-element numeric vector (the number of rows followed by the number of columns).

```
print(dim(mat))
```

```
## [1] 2 4
```
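`nrow()` and `ncol()`, shown earlier for data frames, also work on matrices and return each dimension separately. A brief sketch with the same values:

```r
# Recreate the example matrix so this sketch is self-contained
mat <- matrix(c(1, 3, 6, 8, 3, 5, 2, 7), nrow = 2)

print(nrow(mat))     # 2
print(ncol(mat))     # 4

# length() counts all cells of the matrix
print(length(mat))   # 8
```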

If a matrix has column and row names, you can extract rows or columns by their names.

```
print(mat['bag1', ])
```

```
## apple banana mandarin melon
## 1 6 3 2
```

```
print(mat[, 'banana'])
```

```
## bag1 bag2
## 6 8
```
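Row and column names can be combined to extract a single cell, and index numbers work in the same way. A minimal sketch, with the example matrix recreated so that it runs on its own:

```r
# Recreate the named example matrix so this sketch is self-contained
mat <- matrix(c(1, 3, 6, 8, 3, 5, 2, 7), nrow = 2)
colnames(mat) <- c('apple', 'banana', 'mandarin', 'melon')
rownames(mat) <- c('bag1', 'bag2')

# A single cell, selected by names or by index numbers
print(mat['bag2', 'melon'])   # 7
print(mat[2, 4])              # 7

# Several columns at once; the result is a 2 x 2 matrix
print(mat[, c('apple', 'melon')])
```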

Finally, you can obtain the marginal totals of a matrix with `colSums()` or `rowSums()`.

```
print(rowSums(mat))
```

```
## bag1 bag2
## 12 23
```

```
print(colSums(mat))
```

```
## apple banana mandarin melon
## 4 14 8 9
```
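Marginal totals are commonly used to turn counts into proportions. The sketch below divides each row by its total; `prop_mat` is a name introduced here only for illustration.

```r
# Recreate the named example matrix so this sketch is self-contained
mat <- matrix(c(1, 3, 6, 8, 3, 5, 2, 7), nrow = 2)
colnames(mat) <- c('apple', 'banana', 'mandarin', 'melon')
rownames(mat) <- c('bag1', 'bag2')

# rowSums(mat) has one value per row, so each cell is divided by its row total
prop_mat <- mat / rowSums(mat)
print(round(prop_mat, 2))

# Each row of proportions now sums to 1
print(rowSums(prop_mat))
```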

If you want to know the details of R commands, prepend `?` to the command and execute. For example, `?subset()` will show you how to use the `subset()` function with different types of objects.