Implementation details: NumArray Tcl_ObjType

Release .zip Release .tar.gz Other versions

Implementation details

VecTcl is implemented as a two-layered system. At the bottom lies a new Tcl_ObjType, NumArray, implemented in C together with a number of commands to manipulate it. The commands live in the namespace numarray and comprise elementary operations on numerical data like array shaping and slicing, elementwise binary operators, logical comparisons, matrix product operations, linear equation solving, elementary transcendental functions. On top of that, an expression compiler written in Tcl transforms a sequence of mathematical expressions from infix notation into nested function calls, which are then executed by Tcl.

The NumArray Tcl_ObjType

NumArray is a polymorphic datatype which represents an N-rank tensor of integer, floating point or complex floating point type. The internal representation consists of two data structures, pointed to by the twoPtrValue-fields in Tcl_Obj. The first structure NumArraySharedBuffer stores the data of the NumArray in a contiguous memory buffer in native machine representation, refcounted and shared between a NumArray and derived slices. The second structure NumArrayInfo contains the describing metadata

typedef struct  {
	NumArrayType type;
	int nDim;
	int bufsize;
	int offset;
	int canonical;
	int *dims;
	int *pitches;
} NumArrayInfo;

The important fields in this structure are the number of dimensions nDim, number of elements in each dimension dims, the offset of the first element into the buffer offset, the byte increments for advancing along each dimension pitches, and the data type of the stored data type.

This representation, a contiguous memory buffer together with byte increments, is the most suitable to do fast computations and to interface with existing libraries like BLAS and LAPACK. In the implemented form, it allows for a wide range of array manipulations without touching the data itself; for example, to select a subset of an array along one axis, merely the corresponding value in dims needs to be adjusted and the offset be recomputed, if the slice does not start from the front. Similarly, it is possible to select every second or third element in a dimension by increasing the pitch value, and negative pitches can be used to reverse an array. Transposition can likewise be achieved by swapping the metadata of two axes. In accordance with the value semantics of Tcl, VecTcl implements copy-on-write on these slices and creates shared buffers for all of the above transformations.

The other face of NumArray is the string representation. NumArray was designed to reconcile the list representation and the need of different dimensionality and numerical data types with EIAS. Contrary to a common misbelief, EIAS does not exclude the existence of data types; it merely requires that an unambiguous deserialization exists, such that a round trip from serialization/deserialization leads to a value that behaves the same as the original, possibly having the same internal representation. As an example, consider the standard expr. expr’s operators treat a string that parses as an integer as an integer value, else interpret it as a double, and if that fails, treat it as a non-numeric string error. In this way, every string can be unambiguously interpreted as being an integer, a floating point value or a general string, even though every integer parses correctly as a floating point value, too. NumArray enhances these data types with N-rank tensors and complex values. A NumArray can formally be described with reference to Tcl lists as follows:

A NumArray is one of the following, which are tried in this order:

  1. An empty list
  2. A list of values, which can all be parsed as integers
  3. A list of values, which can all be parsed as doubles
  4. A list of values, which can all be parsed as complex values
  5. A list of NumArrays which are not the empty list, all having the same number of elements and same data type

Using this grammar, the degree (i.e. number of dimensions), data type and number of elements of a NumArray can be unambiguously derived. Examples for the interpretation are given below

set x {1 2 3} ;# an integer vector of length 3
set y {{1.0 3.0} {3.0 5.0}} ;# a 2x2 floating-point matrix
set y {{1.0 3.0 5.0}} ;# a 1x3 floating-point matrix, i.e., a row vector
set z {0+1i 2+3.5i 3.0+0i} ;# a complex vector of length 3
set u {{1 2 3}} ;# a 1x3 integer matrix (a row vector)
set v {{{1 2} {3 4}} {{5 6} {7 8}}} ;# a 2x2x2 integer tensor

set e {1.0 2 3} ;# a floating point vector of length 3
set e1 {1.0 2 3a} ;# error: 3a can't be parsed as a number
set e2 {{1 2} 3 4} ;# error: Dimensions don't match

One peculiarity, which comes from the Tcl list representation, needs further consideration: a value with no spaces in it can alternatively interpreted as a list with a single value. And due to EIAS, a string containing a space is indistinguishable from a list consisting of two elements. This has two consequences, first single-element values (or scalars) may not contain spaces in their string representation. Otherwise, the following two NumArrays could’nt be disambiguated:

set c1 {3.0+4.0i} ;# a complex number 
 # with real part 3.0 and imaginary part 4.0
set c2 {3.0 +4.0i} ;# a complex vector of length 2,
 # equal to {3.0+0.0i 0.0+4.0i}

Second, trailing singleton dimensions must be insignificant, i.e. a Nx1 matrix is identical to a vector of length N, a scalar is identical to a vector of length 1 and a 1x1 matrix. Fortunately, this coincides with the usual linear algebra interpretation of these entities. For two-dimensional objects, a human-understable representation can be printed using

puts [join $thing \n]

Thus, a vector is interpreted as a column vector, while a row vector must be represented as a 1xN matrix.

Following this encoding, element retrieval from NumArrays can be done using lindex as well as with the indexing operator of vexpr:

set A {{1 2} {3 4} {5 6}} 
set a_11 [lindex $A 2 0] ;# 5
vexpr { a_11=A[2,0] } ;# 5

For incomplete indices, a slice is returned

set A {{1 2} {3 4} {5 6}} 
set a_1 [lindex $A 2] ;# {5 6}
vexpr { a_1=A[2] } ;# should also be {5 6}
 # doesn't work currently, BUG

Of course, lindex causes shimmering if the Tcl_Obj has an internal NumArray representation, which is particularly inefficient if the list patch is not applied (see the section on performance).

Read more about