The Art of
ASSEMBLY LANGUAGE PROGRAMMING

Chapter Eight (Part 4)

Table of Content

Chapter Eight (Part 6)

CHAPTER EIGHT:
MASM: DIRECTIVES & PSEUDO-OPCODES (Part 5)
8.12.4 - Type Operators
8.12.5 - Operator Precedence

8.12.4 Type Operators

The "xxxx ptr" coercion operator is an example of a type operator. MASM expressions possess two major attributes: a value and a type. The arithmetic, logical, and relational operators change an expression's value. The type operators change its type. The previous section demonstrated how the ptr operator could change an expression's type. There are several additional type operators as well.

Type Operators
Operator Syntax Description
PTR byte ptr expr

word ptr expr

dword ptr expr

qword ptr expr

tbyte ptr expr

near ptr expr

far ptr expr
Coerce expr to point at a byte.

Coerce expr to point at a word.

Coerce expr to point at a dword.

Coerce expr to point at a qword.

Coerce expr to point at a tbyte.

Coerce expr to a near value.

Coerce expr to a far value.
short short expr expr must be within ±128 bytes of the current jmp instruction (typically a JMP instruction). This operator forces the JMP instruction to be two bytes long (if possible).
this this type Returns an expression of the specified type whose value is the current location counter.
seg seg label Returns the segment address portion of label.
offset offset label Returns the offset address portion of label.
.type type label Returns a byte that indicates whether this symbol is a variable, statement label, or structure name. Superceded by opattr.
opattr opattr label Returns a 16 bit value that gives information about label.
length length variable Returns the number of array elements for a single dimension array. If a multi-dimension array, this operator returns the number of elements for the first dimension.
lengthof lengthof variable Returns the number of items in array variable.
type type symbol Returns a expression whose type is the same as symbol and whose value is the size, in bytes, for the specified symbol.
size size variable Returns the number of bytes allocated for single dimension array variable. Useless for multi-dimension arrays. Superceded by sizeof.
sizeof sizeof variable Returns the size, in bytes, of array variable.
low low expr Returns the L.O. byte of expr.
lowword lowword expr Returns the L.O. word of expr.
high high expr Returns the H.O. byte of expr.
highword highword expr Returns the H.O. word of expr.

The short operator works exclusively with the jmp instruction. Remember, there are two jmp direct near instructions, one that has a range of 128 bytes around the jmp, one that has a range of 32,768 bytes around the current instruction. MASM will automatically generate a short jump if the target address is up to 128 bytes before the current instruction. This operator is mainly present for compatibility with old MASM (pre-6.0) code.

The this operator forms an expression with the specified type whose value is the current location counter. The instruction mov bx, this word, for example, will load the bx register with the value 8B1Eh, the opcode for mov bx, memory. The address this word is the address of the opcode for this very instruction! You mostly use the this operator with the equ directive to give a symbol some type other than constant. For example, consider the following statement:

HERE            equ     this near

This statement assigns the current location counter value to HERE and sets the type of HERE to near. This, of course, could have been done much easier by simply placing the label HERE: on the line by itself. However, the this operator with the equ directive does have some useful applications, consider the following:

WArray          equ     this word
BArray          byte    200 dup (?)

In this example the symbol BArray is of type byte. Therefore, instructions accessing BArray must contain byte operands throughout. MASM would flag a mov ax, BArray+8 instruction as an error. However, using the symbol WArray lets you access the same exact memory locations (since WArray has the value of the location counter immediately before encountering the byte pseudo-opcode) so mov ax,WArray+8 accesses location BArray+8. Note that the following two instructions are identical:

                mov     ax, word ptr BArray+8
                mov     ax, WArray+8

The seg operator does two things. First, it extracts the segment portion of the specified address, second, it converts the type of the specified expression from address to constant. An instruction of the form mov ax, seg symbol always loads the accumulator with the constant corresponding to the segment portion of the address of symbol. If the symbol is the name of a segment, MASM will automatically substitute the paragraph address of the segment for the name. However, it is perfectly legal to use the seg operator as well. The following two statements are identical if dseg is the name of a segment:

                mov     ax, dseg
                mov     ax, seg dseg

Offset works like seg, except it returns the offset portion of the specified expression rather than the segment portion. If VAR1 is a word variable, mov ax, VAR1 will always load the two bytes at the address specified by VAR1 into the ax register. The mov ax, offset VAR1 instruction, on the other hand, loads the offset (address) of VAR1 into the ax register. Note that you can use the lea instruction or the mov instruction with the offset operator to load the address of a scalar variable into a 16 bit register. The following two instructions both load bx with the address of variable J:

                mov     bx, offset J
                lea     bx, J

The lea instruction is more flexible since you can specify any memory addressing mode, the offset operator only allows a single symbol (i.e., displacement only addressing). Most programmers use the mov form for scalar variables and the lea instructor for other addressing modes. This is because the mov instruction was faster on earlier processors.

One very common use for the seg and offset operators is to initialize a segment and pointer register with the segmented address of some object. For example, to load es:di with the address of SomeVar, you could use the following code:

                mov     di, seg SomeVar
                mov     es, di
                mov     di, offset SomeVar

Since you cannot load a constant directly into a segment register, the code above copies the segment portion of the address into di and then copies di into es before copying the offset into di. This code uses the di register to copy the segment portion of the address into es so that it will affect as few other registers as possible.

Opattr returns a 16 bit value providing specific information about the expression that follows it. The .type operator is an older version of opattr that returns the L.O. eight bits of this value. Each bit in the value of these operators has the following meaning:

OPATTR/.TYPE Return Value
Bit(s) Meaning
0 References a label in the code segment if set.
1 References a memory variable or relocatable data object if set.
2 Is an immediate (absolute/constant) value if set.
3 Uses direct memory addressing if set.
4 Is a register name, if set.
5 References no undefined symbols and there is no error, if set.
6 Is an SS: relative reference, if set.
7 References an external name.
8-10 000 - no language type

001 - C/C++ language type

010 - SYSCALL language type

011 - STDCALL language type

100 - Pascal language type

101 - FORTRAN language type

110 - BASIC language type

The language bits are for programmers writing code that interfaces with high level languages like C++ or Pascal. Such programs use the simplified segment directives and MASM's HLL features.

You would normally use these values with MASM's conditional assembly directives and macros. This allows you to generate different instruction sequences depending on the type of a macro parameter or the current assembly configuration. For more details, see "Conditional Assembly" and "Macros".

The size, sizeof, length, and lengthof operators compute the sizes of variables (including arrays) and return that size and their value. You shouldn't normally use size and length. The sizeof and lengthof operators have superceded these operators. Size and length do not always return reasonable values for arbitrary operands. MASM 6.x includes them to remain compatible with older versions of the assembler. However, you will see an example later in this chapter where you can use these operators.

The sizeof variable operator returns the number of bytes directly allocated to the specified variable. The following examples illustrate the point:

a1              byte    ?                       ;SIZEOF(a1) = 1
a2              word    ?                       ;SIZEOF(a2) = 2
a4              dword   ?                       ;SIZEOF(a4) = 4
a8              real8   ?                       ;SIZEOF(a8) = 8
ary0            byte    10 dup (0)              ;SIZEOF(ary0) = 10
ary1            word    10 dup (10 dup (0))     ;SIZEOF(ary1) = 200

You can also use the sizeof operator to compute the size, in bytes, of a structure or other data type. This is very useful for computing an index into an array using the formula from Chapter Four:

        Element_Address := base_address + index*Element_Size

You may obtain the element size of an array or structure using the sizeof operator. So if you have an array of structures, you can compute an index into the array as follows:

        .286                            ;Allow 80286 instructions.
s       struct
        <some number of fields>
s       ends

         .
         .
         .

array   s       16 dup ({})             ;An array of 16 "s" elements

         .
         .
         .

        imul    bx, I, sizeof s         ;Compute BX := I * elementsize
        mov     al, array[bx].fieldname

You can also apply the sizeof operator to other data types to obtain their size in bytes. For example, sizeof byte returns 1, sizeof word returns two, and sizeof dword returns 4. Of course, applying this operator to MASM's built-in data types is questionable since the size of those objects is fixed. However, if you create your own data types using typedef, it makes perfect sense to compute the size of the object using the sizeof operator:

integer         typedef word
Array           integer 16 dup (?)
                 .
                 .
                 .
                imul    bx, bx, sizeof integer
                 .
                 .
                 .

In the code above, sizeof integer would return two, just like sizeof word. However, if you change the typedef statement so that integer is a dword rather than a word, the sizeof integer operand would automatically change its value to four to reflect the new size of an integer.

The lengthof operator returns the total number of elements in an array. For the Array variable above, lengthof Array would return 16. If you have a two dimensional array, lengthof returns the total number of elements in that array.

When you use the lengthof and sizeof operators with arrays, you must keep in mind that it is possible for you to declare arrays in ways that MASM can misinterpret. For example, the following statements all declare arrays containing eight words:

A1              word    8 dup (?)

A2              word    1, 2, 3, 4, 5, 6, 7, 8

; Note: the "\" is a "line continuation" symbol. It tells MASM to append
;       the next line to the end of the current line.

A3              word    1, 2, 3, 4, \
                        5, 6, 7, 8

A4              word    1, 2, 3, 4
                word    5, 6, 7, 8

Applying the sizeof and lengthof operators to A1, A2, and A3 produces sixteen (sizeof) and eight (lengthof). However, sizeof(A4) produces eight and lengthof(A4) produces four. This happens because MASM thinks that the arrays begin and end with a single data declaration. Although the A4 declaration sets aside eight consecutive words, just like the other three declarations above, MASM thinks that the two word directives declare two separate arrays rather than a single array. So if you want to initialize the elements of a large array or a multidimensional array and you also want to be able to apply the lengthof and sizeof operators to that array, you should use A3's form of declaration rather than A4's.

The type operator returns a constant that is the number of bytes of the specified operand. For example, type(word) returns the value two. This revelation, by itself, isn't particularly interesting since the size and sizeof operators also return this value. However, when you use the type operator with the comparison operators (eq, ne, le, lt, gt, and ge), the comparison produces a true result only if the types of the operands are the same. Consider the following definitions:

Integer         typedef word
J               word    ?
K               sword   ?
L               integer ?
M               word    ?

                byte    type (J) eq word        ;value = 0FFh
                byte    type (J) eq sword       ;value = 0
                byte    type (J) eq type (L)    ;value = 0FFh
                byte    type (J) eq type (M)    ;value = 0FFh
                byte    type (L) eq integer     ;value = 0FFh
                byte    type (K) eq dword       ;value = 0

Since the code above typedef'd Integer to word, MASM treats integers and words as the same type. Note that with the exception of the last example above, the value on either side of the eq operator is two. Therefore, when using the comparison operations with the type operator, MASM compares more than just the value. Therefore, type and sizeof are not synonymous. E.g.,

                byte    type (J) eq type (K)            ;value = 0
                byte    (sizeof J) equ (sizeof K)       ;value = 0FFh

The type operator is especially useful when using MASM's conditional assembly directives. See "Conditional Assembly" for more details.

The examples above also demonstrate another interesting MASM feature. If you use a type name within an expression, MASM treats it as though you'd entered "type(name)" where name is a symbol of the given type. In particular, specifying a type name returns the size, in bytes, of an object of that type. Consider the following examples:

Integer         typedef word
s               struct
d               dword   ?
w               word    ?
b               byte    ?
s               ends

                byte    word            ;value = 2
                byte    sword           ;value = 2
                byte    byte            ;value = 1
                byte    dword           ;value = 4
                byte    s               ;value = 7
                byte    word eq word    ;value = 0FFh
                byte    word eq sword   ;value = 0
                byte    b eq dword      ;value = 0
                byte    s eq byte       ;value = 0
                byte    word eq Integer ;value = 0FFh

The high and low operators, like offset and seg, change the type of expression from whatever it was to a constant. These operators also affect the value of the expression - they decompose it into a high order byte and a low order byte. The high operator extracts bits eight through fifteen of the expression, the low operator extracts and returns bits zero through seven. Highword and lowword extract the H.O. and L.O. 16 bits of an expression:

You can extract bits 16-23 and 24-31 using expressions of the form low( highword( expr )) and high( highword( expr )), respectively.

8.12.5 Operator Precedence

Although you will rarely need to use a complex address expression employing more than two operands and a single operator, the need does arise on occasion. MASM supports a simple operator precedence convention based on the following rules:

Parentheses should only surround expressions. Some operators, like sizeof and lengthof, require type names, not expressions. They do not allow you to put parentheses around the name. Therefore, "(sizeof X)" is legal, but "sizeof(X)" is not. Keep this in mind when using parentheses to override operator precedence in an expression. If MASM generates an error, you may need to rearrange the parentheses in your expression.

As is true for expressions in a high level language, it is a good idea to always use parentheses to explicitly state the precedence in all complex address expressions (complex meaning that the expression has more than one operator). This generally makes the expression more readable and helps avoid precedence related bugs.

Chapter Eight (Part 4)

Table of Content

Chapter Eight (Part 6)

Chapter Eight: MASM: Directives & Pseudo-Opcodes (Part 5)
26 SEP 1996