Miscellaneous Mata Tips

It’s easy to expand a matrix:

expanded_data = data[index]

Example:

mata
    data = (10, 20, 30)'
    index = (1, 1, 3, 1, 2, 3)'
    data[index]
end

Complex example:

sysuse auto
collapse (sum) price, by(foreign)
mata: data = st_data("price")
sysuse auto, clear
mata: F = factor("foreign")
mata: st_store(., "sumprice", data[F.levels])

Note that we need a few things for the complex example:

  • A data vector or matrix that is already sorted by the key variable (e.g. foreign)
  • data must not have gaps (so if the master dataset has foreign = {1,2,3,4,5}, and using only has foreign = {2,4}, then we must add an empty row at foreign={1,3}, but not foreign=5)
  • We need to encode the master dataset so the key variable takes values {1,2,3,…}. We can use ftools for that.
  • However, the factor object must have all the factors that data uses. If not, then if using has foreign={1,2,3} and master has {2,3}, ftools would erroneously recode master into {1,2}.

I include type aliases at the beginning of every Mata file. EG:

findfile "ftools_type_aliases.mata"
include "`r(fn)'"

The select* commands are useful for extracting samples:

  • selectindex(mask) returns the indexes/observations for which mask!=0
  • select(data, mask) returns the corresponding rows of data instead (so select(data, mask)==data[selectindex(mask), .])

Note: For Stata 12 or older, you can create a local that replaces selectindex:

loc selectindex "selectindex(mask)"
if (c(stata_version) < 13) {
  loc selectindex "select(1::rows(mask), mask)"
}

You can also combine indexes with masks (i.e. subscripting with selects):

// Create normal random vector and truncate it at zero
mata
  data = rnormal(10, 1, 0, 1)
  mask = (data :> 0)  // Create 0|1 mask
  idx = selectindex(mask)  // Select rows where mask is active
  data[idx] = J(rows(idx), 1, 0)  // Set those rows to zero
  data
end

Performance tip: know the limitations of the Mata compiler; it is not as sophisticate as a C++ compiler.

Example 1: don’t evaluate rows(x) in loops

    for (i=1; i<=rows(x); i++) {
      // ...
    }

Defining n = rows(n) and then replacing rows(x) by n inside the loop reduces the execution time of the loop by 25%.

(do-file)

Example 2: unroll loops

If you have

    y = 1
    // ...
    for (i=1; i<=n; i++) {
      // Lots of code
      if (y == 1) {
        // Some code when y==1
      }
      else {
        // Some code when y==2
      }
    }

Then it would be faster if you avoid evaluating the condition for every iteration of the loop, even if that makes the codebase larger:

    y = 1
    // ...
    if (y == 1)
      for (i=1; i<=n; i++) {
        // Lots of code
        // Some code when y==1
      }
    }
    else {
      for (i=1; i<=n; i++) {
          // Lots of code
          // Some code when y==2
      }
    }

Example 3: In a conditional within a loop, evaluate more likely cases first

If you have if else conditions in a loop, put the cases that happen more often first. EG:

  for (i=1; i<=n; i++) {
    if (unlikely_event) {
      // ...
    }
    else if (more_likely) {
      y = y + 100
    }
    else { // very likely
      y = y + 10
    }
  }

If you move the unlikely events to the bottom, you will have less evaluations on average per loop.

(This do-file shows a 25% time savings)

Example 4: Whenever possible, use x[i] instead of x[i,j] and x[i,.]

If a matrix is often a vector, dealing with these cases separately will likely be much faster. Moreover, even x[i, .] is slower than x[i] even if your matrix is a vector.

(This do-file shows a 33% time savings!)

Example 5: On very hot loops, decrement instead of increment

Suppose we want to compute the sum of a vector and forgot that the sum() function exists. The standard approach is to do

  ans = 0
  n = rows(x)
  for (i=1; i<=n; i++) {
    ans = ans + x[i]
  }

However, this equivalent code below takes only 80% of the time:

  ans = 0
  n = rows(x)
  i = n + 1
  while (--i) {
    ans = ans + x[i]
  }

See this do-file for a quick benchmark, and this post for an explanation.

But don’t forget that sum(x) takes only *2% of the time of the for loop.