# Indexing Rows in Data.Table

During the day (and night), I’m a military scientist at Ft. Detrick. I have to tell you…I use R to do everything on a daily basis. Whether it’s a hard core research project, administrative tasks, such as a manpower analysis, financial modeling, or communicating my work to senior leaders with reproducible research documents in **RMarkdown**, I use R. R adds efficiency, effectiveness and innovation to the way I manage and perform daily tasks. Imagine performing your daily work without the use of Excel. That’s right, you likely need it to be effective. But, R is Excel on Human Growth Hormone, plus a few other super Steroids. The problem is you’ve learned to think about data through an Excel lens (kudos Bill Gates). Anyways, I digress…

One of the most common tasks I’ve run into is having to index a dataset. The package **data.table** provides powerful tools to address many problems like this. When you have a chance, check out my blog on the data.table package here. Like, Share, and Subscribe!

# Solution

- Use special read-only symbols that can be used in the j position of a data.table: .SD, .N, .I, .GRP, .BY.
- Use .I to index the data.table

# Let’s Try it Out!

## Create a Data.Table to Work With

- First let’s make a data.table with letters in the ‘x’ column.

- I’m using the
**set.seed**function to ensure it is reproducible if you try this out later.

- Also, we will use the
**head**function to see the first few rows of data.

- BTW, most of the time, I use ‘dt’ to name objects that are a data.table, ‘df’ for data.frame, ‘ls’ for list, etc…

```
set.seed(111)
l <- sample(letters[21:26], 20, TRUE)
dt <- data.table(x = l)
head(dt)
```

x 1: z 2: w 3: x 4: w 5: u 6: w

- By the way, you can set ‘letters’ to ‘LETTERS’ to return capitalized versions.

- Also, change 20:26 to the specific index of each letter in the alphabet to return only the letter you want. For example, 1:5 returns a, b, c, d, and e.

## How to Index the Data.Table?

- You may want to index the whole data.table, especially if you plan to split it up and merge it back together at later time .
- While you can simple just add dt$index <- 1:length(nrow(dt)) to add an indexed column, data.table provides special read-only symbols that can be used in the j position: .SD, .N, .I, .GRP, .BY.
- recall that a data.table has similar SQL terminology: dt[i,j,by]
- i is the row index/filter etc…WHERE
- j is to SELECT
- by is to GROUP BY
- Take dt, subset the rows using ‘i’, then calculate ‘j’ grouped by ‘by

- Here we will use .I to index in various ways.

## How Many Times Does ‘x’ Occur?

- You may want to determine how many times a given variable occurs in your dataset
- WHERE x == ‘x’
- SELECT .I (the indexed number)
- GROUP BY n/a

`dt[x == "x", .I]`

[1] 1 2 3

## At Which Positions Does ‘x’ Occur?

- You may want to determine where (i.e. similar to the function
**which**) a given variable occurs in your dataset- WHERE x == ‘x’ (inside the j position)
- SELECT .I (the indexed number)
- GROUP BY n/a

`dt[, .I[x == "x"]]`

[1] 3 9 15

## At Which Positions Does Each Unique Letter First Occur?

- You may want to determine the row of the first occurrence for each unique value.
- WHERE n/a
- SELECT .I[1] (the indexed first occurrence)
- GROUP BY x (the column of interest)

`dt[, .I[1], by = x]`

x V1 1: z 1 2: w 2 3: x 3 4: u 5 5: y 7 6: v 10

## At Which Positions Does Each Unique Letter Occur a Second Time?

- You may want to determine the row of the second occurrence for each unique value.
- WHERE n/a
- SELECT .I[2] (the indexed second occurrence)
- GROUP BY x (the column of interest)

`dt[, .I[2], by = x]`

x V1 1: z 16 2: w 4 3: x 9 4: u 11 5: y 12 6: v 14

Data.table is a power package to manipulate data and has the backbone to handle big data. Once you transition to thinking about data in terms of dt[i,j,by] you can apply these tools to a lot of your daily tasks.

Until next time…