[D] generate — | Create or change contents of variable |
Create new variable
generate
[type] [ newvar[:
lblname] =
exp [if] [in] [,
before(varname)
| after(varname)
]
Replace contents of existing variable
replace
oldvar =
exp [if] [in] [,
nopromote
]
Specify default storage type assigned to new variables
set
type
{float
|double
} [,
permanently
]
where type is one of
byte
|int
|long
|float
|double
|str
|str1
|str2
|...|str{ccl maxstrvarlen}
See Description below for an explanation of str
. For the other types, see [D] data types.
by
is allowed with generate
and replace
; see [D] by.
Data > Create or change data > Create new variable
Data > Create or change data > Change contents of variable
generate
creates a new variable. The values of the variable are specified by =
exp.
If no type is specified, the new variable type is determined by the type of result returned by =
exp. A float
variable (or a double
, according to set type
) is created if the result is numeric, and a string variable is created if the result is a string. In the latter case, if the string variable contains values greater than 2,045 characters or contains values with a binary 0 (\0
), a strL
variable is created. Otherwise, a str
# variable is created, where # is the smallest string that will hold the result.
If a type is specified, the result returned by =
exp must be string or numeric according to whether type is string or numeric. If str
is specified, a strL
or a str
# variable is created using the same rules as above.
See [D] egen for extensions to generate
.
replace
changes the contents of an existing variable. Because replace
alters data, the command cannot be abbreviated.
set type
specifies the default storage type assigned to new variables (such as those created by generate
) when the storage type is not explicitly specified.
before(varname)
or after(varname)
may be used with generate
to place the newly generated variable in a specific position within the dataset. These options are primarily used by the Data Editor and are of limited use in other contexts. A more popular alternative for most users is order.
nopromote
prevents replace
from promoting the variable type to accommodate the change. For instance, consider a variable stored as an integer type (byte
, int
, or long
), and assume that you replace
some values with nonintegers. By default, replace
changes the variable type to a floating point (float
or double
) and thus correctly stores the changed values. Similarly, replace
promotes byte
and int
variables to longer integers (int
and long
) if the replacement value is an integer but is too large in absolute value for the current storage type. replace
promotes strings to longer strings. nopromote
prevents replace
from doing this; instead, the replacement values are truncated to fit the current storage type.
permanently
specifies that, in addition to making the change right now, the new limit be remembered and become the default setting when you invoke Stata.
Setup
webuse genxmpl3
Create new variable age2
containing the values of age
squared
generate age2 = age^2
Setup
webuse genxmpl3, clear
Create variable age2
with a storage type of int
and containing the values of age
squared
generate int age2 = age^2
Setup
webuse genxmpl1, clear
Replace the values in age2
with those of age^2
replace age2 = age^2
Setup
webuse genxmpl2, clear
List the name
variable
list name
Create variable lastname
containing the second word of name
generate lastname = word(name,2)
Setup
webuse genxmpl3, clear
Create variable age2
with a storage type of int
and containing the values of age
squared for all observations for which age
is more than 30
generate int age2 = age^2 if age > 30
Setup
webuse genxmpl4, clear
Replace the value of odd
in the third observation
replace odd = 5 in 3
Setup
webuse stan2, clear
Create duplicate of every observation for which transplant
is true (!=0)
expand 2 if transplant
Sort observations into ascending order of id
sort id
Create variable posttran
, with storage type of byte
, equal to 1 for the second observation of each id
and equal to 0 otherwise
by id: generate byte posttran = (_n==2)
Create variable t1
equal to stime
for the last observation of id
by id: generate t1 = stime if _n==_N