## Optimization tips

Performance tip: know the limitations of the Mata compiler; it is not as sophisticated as a C++ compiler.

### Example 1: don’t evaluate `rows(x)` in loops

```
for (i=1; i<=rows(x); i++) {
// ...
}
```

Defining `n = rows(x)` before the loop and then replacing `rows(x)` by `n` in the loop condition reduces the execution time of the loop by 25%.
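A minimal sketch of the faster version. The vector `x` and the accumulator `total` are hypothetical, added only so the snippet runs on its own:

```
mata
x = runiform(1000, 1)      // hypothetical input vector
n = rows(x)                // evaluated once, before the loop
total = 0
for (i=1; i<=n; i++) {     // the condition now compares against a scalar
    total = total + x[i]
}
total
end
```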

### Example 2: move invariant conditions out of loops

If you have

```
y = 1
// ...
for (i=1; i<=n; i++) {
// Lots of code
if (y == 1) {
// Some code when y==1
}
else {
// Some code when y==2
}
}
```

Then it would be faster if you avoid evaluating the condition for every iteration of the loop, even if that makes the codebase larger:

```
y = 1
// ...
if (y == 1) {
for (i=1; i<=n; i++) {
// Lots of code
// Some code when y==1
}
}
else {
for (i=1; i<=n; i++) {
// Lots of code
// Some code when y==2
}
}
```

### Example 3: In a conditional within a loop, evaluate more likely cases first

If you have `if else` conditions in a loop, put the cases that happen more often first. For example, this loop tests the least likely case first:

```
for (i=1; i<=n; i++) {
if (unlikely_event) {
// ...
}
else if (more_likely) {
y = y + 100
}
else { // very likely
y = y + 10
}
}
```

If you move the unlikely events to the bottom, you will have fewer evaluations on average per iteration.

(This do-file shows a 25% time savings)
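A sketch of the reordered loop, with the most likely branch tested first. The draws in `u` and the 90%/9%/1% split are hypothetical, chosen only to make the branch frequencies concrete:

```
mata
n = 1000
u = runiform(n, 1)             // hypothetical draws that decide the branch
y = 0
for (i=1; i<=n; i++) {
    if (u[i] < 0.90) {         // very likely: tested first
        y = y + 10
    }
    else if (u[i] < 0.99) {    // more likely
        y = y + 100
    }
    else {                     // unlikely: reached only after both tests fail
        y = y + 1000
    }
}
y
end
```

With this ordering, about 90% of the iterations evaluate a single condition.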

### Example 4: Whenever possible, use `x[i]` instead of `x[i,j]` and `x[i,.]`

If a matrix is often a vector, dealing with these cases separately will likely be much faster. Moreover, even `x[i, .]` is slower than `x[i]` even if your matrix is a vector.

(This do-file shows a 33% time savings!)

### Example 5: On very hot loops, decrement instead of increment

Suppose we want to compute the sum of a vector and forgot that the `sum()` function exists. The standard approach is to do

```
ans = 0
n = rows(x)
for (i=1; i<=n; i++) {
ans = ans + x[i]
}
```

However, this equivalent code below takes only 80% of the time:

```
ans = 0
n = rows(x)
i = n + 1
while (--i) {
ans = ans + x[i]
}
```

See this do-file for a quick benchmark, and this post for an explanation.

But don’t forget that `sum(x)` takes only **2%** of the time of the for-loop.

### Example 6: Don’t use `select(data, rowmask)` to select certain columns

Instead, use normal indexing with `selectindex()`, which is up to 3x faster:

```
// Create data
x = J(5000000, 20, 1)
mask = 1,1,1,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0
z1 = select(x, mask) // 3x slower
z2 = x[., selectindex(mask)] // Faster alternative
assert(all(z1 :== z2)) // Both give equal output
```

### Example 7: use masks to get the list and count of unique IDs in a vector

This is a very neat trick that I first learned from Andrew Maurer and Aljar Meesters.

Suppose you have a vector of IDs, and these IDs are positive integers that don’t have huge values. In Stata, you would use `contract` or `levelsof` to get the list of unique IDs and `distinct` to get the number of unique IDs. This Mata trick is instead *much* faster:

```
// Create data (1mm obs and up to 100k ids)
ids = ceil(runiform(1e6, 1)*1e5)
// Implement -distinct- for integers
mask = J(max(ids), 1, 0)
mask[ids] = J(rows(ids), 1, 1)
sum(mask) // number of distinct ids; to view the IDs just do "selectindex(mask)"
```

Given a vector of size N with integers between 1 and K, this trick has two steps:

- Create a Kx1 zero vector “mask”.
- Use the input vector to select items of the mask and assign them the value of 1 (see `help m1_permutation`). We can then use either the `sum()` function to count the 1s, or the `selectindex()` function to retrieve their indices (i.e. the unique values).
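To make the two steps concrete, here is a tiny worked example with a hypothetical five-element `ids` vector that is small enough to check by hand:

```
mata
ids = (3, 1, 3, 5, 1)'            // hypothetical IDs; the distinct values are 1, 3, 5
mask = J(max(ids), 1, 0)          // step 1: zero vector with max(ids) = 5 rows
mask[ids] = J(rows(ids), 1, 1)    // step 2: set the positions named by ids to 1
mask'                             // 1 0 1 0 1
sum(mask)                         // 3 distinct ids
selectindex(mask)'                // 1 3 5
end
```

Repeated IDs simply write a 1 to the same position more than once, so duplicates collapse without any sorting.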

As long as the IDs take values below one million, this will be much faster than all other implementations that I know of (distinct, gdistinct, etc.). For more information on how this works, you can also look at a similar tool in numpy. Moreover, the trick is so common with large datasets that it is implemented in ftools:

```
include ftools.mata, adopath // ssc install ftools
mata: mask = create_mask(max(ids), 0, ids, 1) // create a mask of size max(ids), filled 1/0 based on ids.
mata: sum(mask)
```

Finally, you can do more advanced tricks, such as Aljar’s `vec_inlist()` in the link above (which can be used to do fast merges across large datasets), as well as the code below, which can be used as a faster version of `panelsetup()` (and `uniqrows()` when the vector is already sorted):

```
// Input: sorted list of IDs
x=sort((5,4,1,1,4,4,6)',1)
n = rows(x)
start_pos = end_pos = J(x[n], 1, 0) // one row per possible ID; x is sorted, so x[n] == max(x)
start_pos[x[n::1]] = n::1
end_pos[x] = 1::n
start_pos, end_pos
levels = selectindex(start_pos)
counts = end_pos[levels] - start_pos[levels] :+ 1
// Same results
levels, counts
uniqrows(x, 1)
// Same results
start_pos[levels], end_pos[levels]
panelsetup(x, 1)
```

As a benchmark, when used with a vector of 5mm IDs between 1 and 100,000, this approach took 0.2s, while `uniqrows()` took 5.8s and `panelsetup()` took 0.9s.

## Expanding data (similar to Stata’s `expand`)

It’s easy to expand a vector by indexing it with repeated indices (for a matrix, use `data[index, .]`):

```
expanded_data = data[index]
```

Example:

```
mata
data = (10, 20, 30)'
index = (1, 1, 3, 1, 2, 3)'
data[index]
end
```

Complex example:

```
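sysuse auto, clear
collapse (sum) price, by(foreign)
mata: data = st_data(., "price") // one row per level of foreign, sorted by foreign
sysuse auto, clear
mata: F = factor("foreign")
// st_store() needs an existing variable, so create sumprice first
mata: st_store(., st_addvar("double", "sumprice"), data[F.levels])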
```

Note that we need a few things for the complex example:

- A `data` vector or matrix that is already sorted by the key variable (e.g. foreign).
- `data` must not have gaps (so if the master dataset has foreign = {1,2,3,4,5}, and using only has foreign = {2,4}, then we *must* add an empty row at foreign={1,3}, but not foreign=5).
- We need to encode the master dataset so the key variable takes values {1,2,3,…}. We can use `ftools` for that.
- However, the factor object must have all the factors that `data` uses. If not, then if using has foreign={1,2,3} and master has {2,3}, `ftools` would erroneously recode master into {1,2}.

## Type names

I include type aliases at the beginning of every Mata file, e.g.:

```
findfile "ftools_type_aliases.mata"
include "`r(fn)'"
```

## Extracting samples

The `select*` functions are useful for extracting samples:

- `selectindex(mask)` returns the indexes/observations for which mask!=0.
- `select(data, mask)` returns the corresponding rows of data instead (so `select(data, mask)==data[selectindex(mask), .]`).
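A small check of that equivalence. The 3x2 `data` matrix and the column-vector `mask` are hypothetical values chosen for illustration:

```
mata
data = (10, 11 \ 20, 21 \ 30, 31)   // hypothetical 3x2 matrix
mask = (1, 0, 1)'                   // column vector, so select() picks rows
selectindex(mask)                   // row numbers where mask != 0: 1 and 3
select(data, mask)                  // rows 1 and 3 of data
assert(select(data, mask) == data[selectindex(mask), .])
end
```

Because `mask` is a column vector, `select()` operates on rows. A row-vector mask would select columns instead.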

Note: For Stata 12 or older, you can create a local that replaces selectindex:

```
loc selectindex "selectindex(mask)"
if (c(stata_version) < 13) {
loc selectindex "select(1::rows(mask), mask)"
}
```

## More sample extraction

You can also combine indexes with masks (i.e. subscripting with selects):

```
// Create normal random vector and truncate it at zero
mata
data = rnormal(10, 1, 0, 1)
mask = (data :> 0) // Create 0|1 mask
idx = selectindex(mask) // Select rows where mask is active
data[idx] = J(rows(idx), 1, 0) // Set those rows to zero
data
end
```