Optimization tips

Performance tip: know the limitations of the Mata compiler; it is not as sophisticated as a C++ compiler.

Example 1: don’t evaluate rows(x) in loops

for (i=1; i<=rows(x); i++) {
	// ...
}

Defining n = rows(x) and then replacing rows(x) with n inside the loop reduces the execution time of the loop by 25%.
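Following that advice, the hoisted version of the same placeholder loop looks like this:

```
n = rows(x) // evaluate rows() once, outside the loop
for (i=1; i<=n; i++) {
	// ...
}
```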

(do-file)

Example 2: unroll loops

If you have

y = 1
// ...
for (i=1; i<=n; i++) {
	// Lots of code
	if (y == 1) {
		// Some code when y==1
	}
	else {
		// Some code when y==2
	}
}

Then it is faster to avoid evaluating the condition on every iteration of the loop, even if that makes the codebase larger:

y = 1
// ...
if (y == 1) {
	for (i=1; i<=n; i++) {
		// Lots of code
		// Some code when y==1
	}
}
else {
	for (i=1; i<=n; i++) {
		// Lots of code
		// Some code when y==2
	}
}

Example 3: In a conditional within a loop, evaluate more likely cases first

If you have if/else conditions inside a loop, put the cases that occur more often first. For example, consider this poorly ordered loop:

for (i=1; i<=n; i++) {
	if (unlikely_event) {
		// ...
	}
	else if (more_likely) {
		y = y + 100
	}
	else { // very likely
		y = y + 10
	}
}

If you move the unlikely cases to the bottom, you will have fewer condition evaluations per iteration on average.

(This do-file shows a 25% time savings)

Example 4: Whenever possible, use x[i] instead of x[i,j] and x[i,.]

If a matrix is often a vector, handling the vector case separately will likely be much faster. Moreover, x[i, .] is slower than x[i] even when the matrix is in fact a vector.
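As a sketch of what this looks like in practice (the loop bodies are just placeholders; only the subscripting pattern matters):

```
ans = 0
if (cols(x) == 1) {
	// Fast path: x is a column vector, so use one-index subscripts
	for (i=1; i<=rows(x); i++) {
		ans = ans + x[i] // instead of x[i, 1] or x[i, .]
	}
}
else {
	// General path: x is a full matrix
	for (i=1; i<=rows(x); i++) {
		ans = ans + sum(x[i, .])
	}
}
```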

(This do-file shows a 33% time savings!)

Example 5: On very hot loops, decrement instead of increment

Suppose we want to compute the sum of a vector and forgot that the sum() function exists. The standard approach is to do

ans = 0
n = rows(x)
for (i=1; i<=n; i++) {
	ans = ans + x[i]
}

However, this equivalent code below takes only 80% of the time:

ans = 0
n = rows(x)
i = n + 1
while (--i) {
	ans = ans + x[i]
}

See this do-file for a quick benchmark, and this post for an explanation.

But don’t forget that sum(x) takes only 2% of the time of the for-loop.

Example 6: Don’t use select(data, rowmask) to select certain columns

Instead, use normal indexing with selectindex, which is up to 3x faster:

// Create data
x = J(5000000, 20, 1)
mask = (J(1, 10, 1), J(1, 10, 0)) // e.g. keep the first 10 columns
z1 = select(x, mask) // 3x slower
z2 = x[., selectindex(mask)] // Faster alternative
assert(all(z1 :== z2)) // Both give equal output

Example 7: use masks to get the list and count of unique IDs in a vector

This is a very neat trick that I first learned from Andrew Maurer and Aljar Meesters.

Suppose you have a vector of IDs, and these IDs are positive integers without huge values. In Stata, you would use contract or levelsof to get the list of unique IDs, and distinct to get the number of unique IDs. This Mata trick is much faster:

// Create data (1mm obs and up to 100k ids)
ids = ceil(runiform(1e6, 1) * 1e5)

// Implement -distinct- for integers
mask = J(1e5, 1, 0)
mask[ids] = J(rows(ids), 1, 1) // flag every ID that appears
sum(mask) // number of distinct ids; to view the IDs just do "selectindex(mask)"

Given a vector of size N with integers between 0 and K, this trick has two steps:

1. Create a Kx1 zero vector “mask”
2. Use the input vector to select items of the mask and assign them the value of 1 (see help m1_permutation). We can then use either the sum() function to count the 1s, or the selectindex() function to retrieve their indices (i.e. the unique values).
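A minimal walk-through of these two steps on a toy vector (values chosen purely for illustration):

```
ids  = (2, 5, 2, 3)'            // N = 4 observations, IDs between 1 and K = 5
mask = J(5, 1, 0)               // Step 1: K x 1 zero vector
mask[ids] = J(rows(ids), 1, 1)  // Step 2: flag each observed ID
sum(mask)                       // 3 distinct IDs
selectindex(mask)               // (2, 3, 5)': the unique IDs
```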

As long as the IDs take values below one million, this will be much faster than all other implementations that I know of (distinct, gdistinct, etc.). For more information on how this works, you can also look at a similar tool in numpy. Moreover, the trick is so common with large datasets that it is implemented in ftools:

include ftools.mata, adopath // ssc install ftools
mata: mask = create_mask(max(ids), 0, ids, 1) // create a mask of size max(ids), filled 1/0 based on ids.

Finally, you can do more advanced tricks, such as Aljar’s vec_inlist() in the link above (which can be used to do fast merges across large datasets), as well as the code below, which can be used as a faster version of panelsetup() (and uniqrows() when the vector is already sorted):

// Input: sorted list of IDs
x = sort((5, 4, 1, 1, 4, 4, 6)', 1)
n = rows(x)

start_pos = end_pos = J(n, 1, 0)
start_pos[x[n::1]] = n::1
end_pos[x] = 1::n

start_pos, end_pos
levels = selectindex(start_pos)
counts = end_pos[levels] - start_pos[levels] :+ 1

// Same results
levels, counts
uniqrows(x, 1)

// Same results
start_pos[levels], end_pos[levels]
panelsetup(x, 1)

As a benchmark, when used with a vector of 5mm IDs between 1 and 100,000, this approach took 0.2s, while uniqrows() took 5.8s and panelsetup took 0.9s.

Expanding data (similar to Stata’s expand)

It’s easy to expand a matrix:

expanded_data = data[index]

Example:

mata
data = (10, 20, 30)'
index = (1, 1, 3, 1, 2, 3)'
data[index]
end

Complex example:

sysuse auto
collapse (sum) price, by(foreign)
mata: data = st_data(., "price")
sysuse auto, clear
gen double sumprice = . // destination variable must exist before st_store()
mata: F = factor("foreign")
mata: st_store(., "sumprice", data[F.levels])

Note that we need a few things for the complex example:

• A data vector or matrix that is already sorted by the key variable (e.g. foreign)
• data must not have gaps: if the master dataset has foreign = {1,2,3,4,5} and using only has foreign = {2,4}, then we must add empty rows for foreign = {1,3,5}, so that every master level maps to a row of data
• We need to encode the master dataset so the key variable takes values {1,2,3,…}. We can use ftools for that.
• However, the factor object must include all the levels that data uses. Otherwise, if using has foreign={1,2,3} and master has {2,3}, ftools would erroneously recode master into {1,2}.
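The gap-filling requirement above can be sketched as follows (apart from F.levels and F.num_levels, which are ftools Factor members, every name here is illustrative):

```
// Place the -using- values into a vector with one row per master level,
// leaving zeros at the master levels that are absent from -using-
full = J(F.num_levels, 1, 0)
full[using_level_indexes] = using_values // positions of using's levels in master's encoding
expanded = full[F.levels]                // now safe: no gaps, no out-of-range indexes
```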

Type names

I include type aliases at the beginning of every Mata file, e.g.:

findfile "ftools_type_aliases.mata"
include "`r(fn)'"
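For reference, these aliases are just local macros that expand to Mata type declarations. A few illustrative entries in the spirit of that file (the actual file defines more):

```
loc RS real scalar
loc RV real vector
loc RM real matrix
loc SS string scalar
```

With these, a declaration like `RS' total(`RV' x) expands to real scalar total(real vector x).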

Extracting samples

The select* functions are useful for extracting samples:

Note: For Stata 12 or older, you can create a local macro that replaces selectindex():

if (c(stata_version) < 13) {
	// selectindex() only exists in Stata 13+; select() over an index
	// vector is an equivalent drop-in (assumes a colvector named "mask")
	local selectindex select(1::rows(mask), mask)
}
else {
	local selectindex selectindex(mask)
}

More sample extraction

You can also combine indexes with masks (i.e. subscripting with selects):

// Create normal random vector and truncate it at zero
mata
data = rnormal(10, 1, 0, 1)