## Optimization tips

Performance tip: know the limitations of the Mata compiler; it is not as sophisticate as a C++ compiler.

### Example 1: don’t evaluate `rows(x)` in loops

``````    for (i=1; i<=rows(x); i++) {
// ...
}
``````

Defining `n = rows(n)` and then replacing `rows(x)` by `n` inside the loop reduces the execution time of the loop by 25%.

(do-file)

### Example 2: unroll loops

If you have

``````    y = 1
// ...
for (i=1; i<=n; i++) {
// Lots of code
if (y == 1) {
// Some code when y==1
}
else {
// Some code when y==2
}
}
``````

Then it would be faster if you avoid evaluating the condition for every iteration of the loop, even if that makes the codebase larger:

``````    y = 1
// ...
if (y == 1)
for (i=1; i<=n; i++) {
// Lots of code
// Some code when y==1
}
}
else {
for (i=1; i<=n; i++) {
// Lots of code
// Some code when y==2
}
}
``````

### Example 3: In a conditional within a loop, evaluate more likely cases first

If you have `if else` conditions in a loop, put the cases that happen more often first. EG:

``````  for (i=1; i<=n; i++) {
if (unlikely_event) {
// ...
}
else if (more_likely) {
y = y + 100
}
else { // very likely
y = y + 10
}
}
``````

If you move the unlikely events to the bottom, you will have less evaluations on average per loop.

(This do-file shows a 25% time savings)

### Example 4: Whenever possible, use `x[i]` instead of `x[i,j]` and `x[i,.]`

If a matrix is often a vector, dealing with these cases separately will likely be much faster. Moreover, even `x[i, .]` is slower than `x[i]` even if your matrix is a vector.

(This do-file shows a 33% time savings!)

### Example 5: On very hot loops, decrement instead of increment

Suppose we want to compute the sum of a vector and forgot that the `sum()` function exists. The standard approach is to do

``````  ans = 0
n = rows(x)
for (i=1; i<=n; i++) {
ans = ans + x[i]
}
``````

However, this equivalent code below takes only 80% of the time:

``````  ans = 0
n = rows(x)
i = n + 1
while (--i) {
ans = ans + x[i]
}
``````

See this do-file for a quick benchmark, and this post for an explanation.

But don’t forget that `sum(x)` takes only 2% of the time of the for-loop.

### Example 6: Dont’ use `select(data, rowmask)` to select certain columns

Instead, use normal indexing with selectindex, which is up to 3x faster:

``````  // Create data
x = J(5000000, 20, 1)
z1 = select(x, mask) // 3x slower
z2 = x[., selectindex(mask)] // Faster alternative
assert(all(z1 :== z2)) // Both give equal output
``````

### Example 7: use masks to get the list and count of unique IDs in a vector

This is a very neat trick that I first learned from Andrew Maurer and Aljar Meesters

Suppose you have a vector of IDs, and these IDs are positive integers that don’t have huge values. In Stata, you would use `contract` or `levelsof` to get the list of unique IDs and `distinct` to get the number of unique IDs. This Mata trick is instead much faster:

``````// Create data (1mm obs and up to 100k ids)
ids = ceil(runiform(1e6, 1)*1e5)

// Implement -distinct- for integers
sum(mask) // number of distinct ids; to view the IDs just do "selectindex(mask)"
``````

Given a vector of size N with integers between 0 and K, this trick has two steps:

1. Create an Nx1 zero vector “mask”
2. Use the input vector to select items of the mask and assign them the value of 1 (see `help m1_permutation`). We can then use either the `sum()` function to count the 1s, or the `selectindex()` function to retrieve their indices (i.e. the unique values).

As long as the IDs take values below one million, this will be much faster than all other implementations that I know of (distinct, gdistinct, etc.). Also, for more information on how this work, you can also look at a similar tool in numpy. Moreover, the trick is so common with large datasets that is implemented in ftools:

``````import ftools.mata, adopath // ssc install ftools
mata: mask = create_mask(max(ids), 0, ids, 1) // create a mask of size max(ids), filled 1/0 based on ids.
``````

Finally, you can do more advanced tricks, such as Aljar’s `vec_inlist()` in the link above (which can be used to do fast merges across large datasets), as well as the code below, which can be used as a faster version of `panelsetup()` (and `uniqrows()` when the vector is already sorted):

``````// Input: sorted list of IDs
x=sort((5,4,1,1,4,4,6)',1)
n = rows(x)

start_pos = end_pos = J(n, 1, 0)
start_pos[x[n::1]] = n::1
end_pos[x] = 1::n

start_pos, end_pos
levels = selectindex(start_pos)
counts = end_pos[levels] - start_pos[levels] :+ 1

// Same results
levels, counts
uniqrows(x, 1)

// Same results
start_pos[levels], end_pos[levels]
panelsetup(x, 1)
``````

As a benchmark, when used with a vector of 5mm IDs between 1 and 100,000, this approach took 0.2s, while uniqrows() took 5.8s and panelsetup took 0.9s.

## Expanding data (similar to Stata’s `expand`)

It’s easy to expand a matrix:

``````expanded_data = data[index]
``````

Example:

``````mata
data = (10, 20, 30)'
index = (1, 1, 3, 1, 2, 3)'
data[index]
end
``````

Complex example:

``````sysuse auto
collapse (sum) price, by(foreign)
mata: data = st_data("price")
sysuse auto, clear
mata: F = factor("foreign")
mata: st_store(., "sumprice", data[F.levels])
``````

Note that we need a few things for the complex example:

• A `data` vector or matrix that is already sorted by the key variable (e.g. foreign)
• `data` must not have gaps (so if the master dataset has foreign = {1,2,3,4,5}, and using only has foreign = {2,4}, then we must add an empty row at foreign={1,3}, but not foreign=5)
• We need to encode the master dataset so the key variable takes values {1,2,3,…}. We can use `ftools` for that.
• However, the factor object must have all the factors that `data` uses. If not, then if using has foreign={1,2,3} and master has {2,3}, `ftools` would erroneously recode master into {1,2}.

## Type names

I include type aliases at the beginning of every Mata file. EG:

``````findfile "ftools_type_aliases.mata"
include "`r(fn)'"
``````

## Extracting samples

The `select*` commands are useful for extracting samples:

• `selectindex(mask)` returns the indexes/observations for which mask!=0
• `select(data, mask)` returns the corresponding rows of data instead (so `select(data, mask)==data[selectindex(mask), .]`)

Note: For Stata 12 or older, you can create a local that replaces selectindex:

``````loc selectindex "selectindex(mask)"
if (c(stata_version) < 13) {
}
``````

## More sample extraction

You can also combine indexes with masks (i.e. subscripting with selects):

``````// Create normal random vector and truncate it at zero
mata
data = rnormal(10, 1, 0, 1)