Implementation details: The vexpr expression compiler

Release .zip Release .tar.gz Other versions

vexpr, the Tcl level compiler and execution engine

Using the commands in the numarray ensemble, it is possible to perform computations on numerical arrays in prefix form:

set a {1 2 3}
set c [numarray neg [numarray * 2 [numarray + $a {4 5 6}]]]

But most mathematics is actually better expressed in infix form; the above sequence of commands is equal to

vexpr {
	a={1 2 3}
	c=-2*(a+{4 5 6})

Of course, both forms can be freely mixed, there is no perceived distinction at the script level between variables accessed and modified within a vexpr or outside. Therefore vexpr can be used in inline form similar to expr

set c [vexpr {-2*(a+{4 5 6})}]

or as a self-contained command with a longer sequence of statements. The language is quite complete and supports not only simple assignments, but loops and conditions as well.

The expression handed over to vexpr is compiled into the prefix form using a compiler based on the small, but clever example compiler from the wiki by Donal Fellows. The resulting Tcl code is not evaluated directly, but put into a proc under the namespace numarray. For every variable that is accessed in the expression, the compiler emits an upvar instruction so as to access the variable in the callers scope.

Thus, the above vexpr is equivalent to

proc numarray::compiledexpressionXX {} {
	upvar 1 a a
	upvar 1 c c
	set a {1 2 3}
	set c [numarray::neg [numarray::* 2 [numarray::+ [set a] {4 5 6}]]]

where XX denotes a sequence number. vexpr maintains a cache of compiled expressions in a dict and compiles every expression only once. Repeated invocation of the same expression merely invokes the associated proc via uplevel.

Loops, branches and function calls are translated to their Tcl equivalent. The execution under the numarray namespace ensures that the commands contained there appear as built-in functions, but any Tcl proc can be called as a function. This ensures compatibility on the command level, without the need for special commands or syntax to extend the math language. As a performance improvement,vproc allows the creation of a Tcl proc entirely defined by a single vexprprogram. The only difference is that the compiler doesn’t emit upvar instructions for the local variables, and that the vexpr call itself is circumvented.

Comparison to expr

Since vexpr is a functional superset of expr, most expressions which are executed by expr should run within vexpr and produce the same results. Apart from the obvious differences that vexpr handles complex numbers and multi-component values, there are a few syntactic differences, which should briefly by explained in the following paragraphs.

The case with $

expr requires variable references to be prefixed with $, while vexpr expects that the variable names are bare words. This choice is mostly due to the belief of the VecTcl author, that the $ in expr is largely a historic relict from the times when expr was used with unbraced expressions and the Tcl interpreter had to substitute the string value for a variable. Today, passing an unbraced expression or multiple arguments to expr is considered bad style, and the $ is still required but technically not needed anymore in most cases. Since the $ is used to indicate value substitution, it would also be irritating in case of assignments. $a=$b as an assignment looks odd to the eye of the Tcl programmer (in contrast to, e.g., Perl or PHP). The $ could optionally be supported for the right hand side of expressions. This is currently not implemented, but will be useful especially to access variables with odd names. Currently, neither variables with spaces in it can be referenced in VecTcl nor array elements, since the array syntax collides with the syntax for calling a function. These cases could be disambiguated by allowing quoted syntax such as ${a(2)} to refer to an array element. The exact substitution rules inside such a braced expression is yet to be defined, such that both odd names containing spaces and metacharacters, as well as array elements with variable indices can be referred to. Further it needs to be defined, how variables can be accessed using this syntax on the left hand side of an assignment.

Command substitution using []

Inside expr, command substitution using the square brackes is used to call out Tcl procs. In VecTcl, no special syntax is needed to call a Tcl proc, because VecTcl makes no distinction between a math function and a Tcl proc. The special expr syntax is likewise believed to be a historic relict, coming from the way that command substitution is handled within Tcl. Inside the brackets usual Tcl substitution rules apply. That leads to the consequence, that computed arguments to such a function itself require another nested expr call, leading to verbose and hard to read code. For instance, to extract twice the element with index from a list using lindex

set te [expr {2*[lindex $l [expr {2*$n+1}]]}]

is needed, whereas using VecTcl, the same code can be compressed to

vexpr {te=2*lindex(l,2*n+1)}

Such code is not uncommon for index calculations. To remedy this issue, an internal treatment of simple offsets has been incorporated into lindex and similar commands. The issue can be solved in a more general way by allowing for any Tcl command to be part of a math expression.


expr supports arbitrary string values for operators which have a useful definition for general strings. For example, strings are allowed as the result of the ternary operator ?: or input to the comparison operators. Further, additional operators are defined to test if a string is contained within a given list. Inside literal double-quoted strings, again the full range of Tcl substitutions (including nested command substitution) is supported. VecTcl, on the other hand, does not support strings at the current stage. NumArrays can not be defined on lists of general strings as elemental values. The reason is that elemental values must be free of whitespace, otherwise the dimensionality of the values can’t be unambiguously derived from the string representation. This is a requirement of EIAS, and unless some additional restriction is posed onto the elemental values such as obligatory quoting, arrays of strings are impossible. Passing literal strings or variables containing general strings to functions, on the other hand, poses no such problems. The current implementation supports the latter string containing variables, as long as they are not accessed using NumArray functions or operators. The former embedded literal strings are currently not supported, but will make a good companion to enable calling commands with non-numerical arguments, most notably ensemble subcommands. It is open to discussion, how much of Tcls substitution syntax should be enabled within these strings, whether just literal strings should be allowed, or variable and command substitution as in expr.

Under the assumption that strings are implemented, VecTcl would provide a complete alternative syntax to the Tcl interpreter. For instance, this program shows how a dictionary could be unfolded into local variable names from within vexpr

vproc dict_inflate {d} {
	for key=dict("keys", d) {
		upvar("1", key, "var")
		var = dict("get", d, key)

Of course it is not recommended to program in this style, but such an exercise demonstrates, that the drafted language inside vexpr is quite complete and scalable. The programming style can vary smoothly from a style similar to expr, where only a single expression is passed and the result further processed by Tcl code, over multiple assignments and expressions with loops into a full-blown alternative scripting language, which only shares the runtime execution engine with Tcl. This scalability accounts for the fact that some code, notably heavily numerical code, is best expressed within an expression-oriented language as provided by VecTcl, whereas other pieces of code, most notably string processing, are best performed in a substitution-oriented language such as Tcl.

Read more about