Optimization tips
Performance tip: know the limitations of the Mata compiler; it is not as sophisticate as a C++ compiler.
Example 1: don’t evaluate rows(x)
in loops
for (i=1; i<=rows(x); i++) {
// ...
}
Defining n = rows(n)
and then replacing rows(x)
by n
inside the loop reduces the execution time of the loop by 25%.
Example 2: unroll loops
If you have
y = 1
// ...
for (i=1; i<=n; i++) {
// Lots of code
if (y == 1) {
// Some code when y==1
}
else {
// Some code when y==2
}
}
Then it would be faster if you avoid evaluating the condition for every iteration of the loop, even if that makes the codebase larger:
y = 1
// ...
if (y == 1)
for (i=1; i<=n; i++) {
// Lots of code
// Some code when y==1
}
}
else {
for (i=1; i<=n; i++) {
// Lots of code
// Some code when y==2
}
}
Example 3: In a conditional within a loop, evaluate more likely cases first
If you have if else
conditions in a loop, put the cases that happen more often first. EG:
for (i=1; i<=n; i++) {
if (unlikely_event) {
// ...
}
else if (more_likely) {
y = y + 100
}
else { // very likely
y = y + 10
}
}
If you move the unlikely events to the bottom, you will have less evaluations on average per loop.
(This do-file shows a 25% time savings)
Example 4: Whenever possible, use x[i]
instead of x[i,j]
and x[i,.]
If a matrix is often a vector, dealing with these cases separately will likely be much faster. Moreover, even x[i, .]
is slower than x[i]
even if your matrix is a vector.
(This do-file shows a 33% time savings!)
Example 5: On very hot loops, decrement instead of increment
Suppose we want to compute the sum of a vector and forgot that the sum()
function exists. The standard approach is to do
ans = 0
n = rows(x)
for (i=1; i<=n; i++) {
ans = ans + x[i]
}
However, this equivalent code below takes only 80% of the time:
ans = 0
n = rows(x)
i = n + 1
while (--i) {
ans = ans + x[i]
}
See this do-file for a quick benchmark, and this post for an explanation.
But don’t forget that sum(x)
takes only 2% of the time of the for-loop.
Example 6: Dont’ use select(data, rowmask)
to select certain columns
Instead, use normal indexing with selectindex, which is up to 3x faster:
// Create data
x = J(5000000, 20, 1)
mask = 1,1,1,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0
z1 = select(x, mask) // 3x slower
z2 = x[., selectindex(mask)] // Faster alternative
assert(all(z1 :== z2)) // Both give equal output
Example 7: use masks to get the list and count of unique IDs in a vector
This is a very neat trick that I first learned from Andrew Maurer and Aljar Meesters
Suppose you have a vector of IDs, and these IDs are positive integers that don’t have huge values. In Stata, you would use contract
or levelsof
to get the list of unique IDs and distinct
to get the number of unique IDs. This Mata trick is instead much faster:
// Create data (1mm obs and up to 100k ids)
ids = ceil(runiform(1e6, 1)*1e5)
// Implement -distinct- for integers
mask = J(max(ids), 1, 0)
mask[ids] = J(rows(ids), 1, 1)
sum(mask) // number of distinct ids; to view the IDs just do "selectindex(mask)"
Given a vector of size N with integers between 0 and K, this trick has two steps:
- Create an Nx1 zero vector “mask”
- Use the input vector to select items of the mask and assign them the value of 1 (see
help m1_permutation
). We can then use either thesum()
function to count the 1s, or theselectindex()
function to retrieve their indices (i.e. the unique values).
As long as the IDs take values below one million, this will be much faster than all other implementations that I know of (distinct, gdistinct, etc.). Also, for more information on how this work, you can also look at a similar tool in numpy. Moreover, the trick is so common with large datasets that is implemented in ftools:
import ftools.mata, adopath // ssc install ftools
mata: mask = create_mask(max(ids), 0, ids, 1) // create a mask of size max(ids), filled 1/0 based on ids.
mata: sum(mask)
Finally, you can do more advanced tricks, such as Aljar’s vec_inlist()
in the link above (which can be used to do fast merges across large datasets), as well as the code below, which can be used as a faster version of panelsetup()
(and uniqrows()
when the vector is already sorted):
// Input: sorted list of IDs
x=sort((5,4,1,1,4,4,6)',1)
n = rows(x)
start_pos = end_pos = J(n, 1, 0)
start_pos[x[n::1]] = n::1
end_pos[x] = 1::n
start_pos, end_pos
levels = selectindex(start_pos)
counts = end_pos[levels] - start_pos[levels] :+ 1
// Same results
levels, counts
uniqrows(x, 1)
// Same results
start_pos[levels], end_pos[levels]
panelsetup(x, 1)
As a benchmark, when used with a vector of 5mm IDs between 1 and 100,000, this approach took 0.2s, while uniqrows() took 5.8s and panelsetup took 0.9s.
Expanding data (similar to Stata’s expand
)
It’s easy to expand a matrix:
expanded_data = data[index]
Example:
mata
data = (10, 20, 30)'
index = (1, 1, 3, 1, 2, 3)'
data[index]
end
Complex example:
sysuse auto
collapse (sum) price, by(foreign)
mata: data = st_data("price")
sysuse auto, clear
mata: F = factor("foreign")
mata: st_store(., "sumprice", data[F.levels])
Note that we need a few things for the complex example:
- A
data
vector or matrix that is already sorted by the key variable (e.g. foreign) data
must not have gaps (so if the master dataset has foreign = {1,2,3,4,5}, and using only has foreign = {2,4}, then we must add an empty row at foreign={1,3}, but not foreign=5)- We need to encode the master dataset so the key variable takes values {1,2,3,…}. We can use
ftools
for that. - However, the factor object must have all the factors that
data
uses. If not, then if using has foreign={1,2,3} and master has {2,3},ftools
would erroneously recode master into {1,2}.
Type names
I include type aliases at the beginning of every Mata file. EG:
findfile "ftools_type_aliases.mata"
include "`r(fn)'"
Extracting samples
The select*
commands are useful for extracting samples:
selectindex(mask)
returns the indexes/observations for which mask!=0select(data, mask)
returns the corresponding rows of data instead (soselect(data, mask)==data[selectindex(mask), .]
)
Note: For Stata 12 or older, you can create a local that replaces selectindex:
loc selectindex "selectindex(mask)"
if (c(stata_version) < 13) {
loc selectindex "select(1::rows(mask), mask)"
}
More sample extraction
You can also combine indexes with masks (i.e. subscripting with selects):
// Create normal random vector and truncate it at zero
mata
data = rnormal(10, 1, 0, 1)
mask = (data :> 0) // Create 0|1 mask
idx = selectindex(mask) // Select rows where mask is active
data[idx] = J(rows(idx), 1, 0) // Set those rows to zero
data
end