Chapter Twelve Mixed Language Programming

12.1 Chapter Overview

Most assembly language code doesn't appear in a stand-alone assembly language program. Instead, most assembly code is actually part of a library package that programs written in a high level language wind up calling. Although HLA makes it really easy to write standalone assembly applications, at one point or another you'll probably want to call an HLA procedure from some code written in another language or you may want to call code written in another language from HLA. This chapter discusses the mechanisms for doing this in three languages: low-level assembly (i.e., MASM or Gas), C/C++, and Delphi/Kylix. The mechanisms for other languages are usually similar to one of these three, so the material in this chapter will still apply even if you're using some other high level language.

12.2 Mixing HLA and MASM/Gas Code in the Same Program

It may seem kind of weird to mix MASM or Gas and HLA code in the same program. After all, they're both assembly languages and almost anything you can do with MASM or Gas can be done in HLA. So why bother trying to mix the two in the same program? Well, there are three reasons:

You've already got a lot of code written in MASM or Gas and you don't want to convert it to HLA's syntax.
There are a few things MASM and Gas do that HLA cannot, and you happen to need to do one of those things.
Someone else has written some MASM or Gas code and they want to be able to call code you've written using HLA.

In this section, we'll discuss two ways to merge MASM/Gas and HLA code in the same program: via in-line assembly code and through linking object files.

12.2.1 In-Line (MASM/Gas) Assembly Code in Your HLA Programs

As you're probably aware, the HLA compiler doesn't actually produce machine code directly from your HLA source files. Instead, it first compiles the code to a MASM or Gas-compatible assembly language source file and then it calls MASM or Gas to assemble this code to object code. If you're interested in seeing the MASM or Gas output HLA produces, just edit the filename.ASM file that HLA creates after compiling your filename.HLA source file. The output assembly file isn't amazingly readable, but it is fairly easy to correlate the assembly output with the HLA source file.

HLA provides two mechanisms that let you inject raw MASM or Gas code directly into the output file it produces: the #ASM..#ENDASM sequence and the #EMIT statement. The #ASM..#ENDASM sequence copies all text between these two clauses directly to the assembly output file, e.g.,
#asm

	mov eax, 0       ;MASM/Gas syntax for MOV( 0, EAX );
	add eax, ebx     ; "     "     "  ADD( ebx, eax );

#endasm
 
The #ASM..#ENDASM sequence is how you inject in-line (MASM or Gas) assembly code into your HLA programs. For the most part there is very little need to use this feature, but in a few instances it is valuable. Note, when using Gas, that HLA specifies the ".intel_syntax" directive, so you should use Intel syntax when supplying Gas code between #asm and #endasm.

For example, if you're writing structured exception handling code under Windows, you'll need to access the double word at address FS:[0] (offset zero in the segment pointed at by the 80x86's FS segment register). Unfortunately, HLA does not support segmentation and the use of segment registers. However, you can drop into MASM for a statement or two in order to access this value:
	#asm
		mov ebx, fs:[0]     ; Loads the SEH chain pointer into EBX
	#endasm

At the end of this instruction sequence, EBX will contain the pointer to the most recent exception registration record, the head of the structured exception handling chain that Windows maintains for the current thread.

HLA blindly copies all text between the #ASM and #ENDASM clauses directly to the assembly output file. HLA does not check the syntax of this code or otherwise verify its correctness. If you introduce an error within this section of your program, the assembler will report the error when HLA assembles your code by calling MASM or Gas.

The #EMIT statement also writes text directly to the assembly output file. However, this statement does not simply copy the text from your source file to the output file; instead, this statement copies the value of a string (constant) expression to the output file. The syntax for this statement is as follows:
		#emit( string_expression );
 
This statement evaluates the expression and verifies that it's a string expression. Then it copies the string data to the output file. Like the #ASM..#ENDASM sequence, the #EMIT statement does not check the syntax of the MASM or Gas statement it writes to the assembly file. If there is a syntax error, MASM or Gas will catch it later on when HLA assembles the output file.

When HLA compiles your programs into assembly language, it does not use the same symbols in the assembly language output file that you use in the HLA source files. There are several technical reasons for this, but the bottom line is this: you cannot easily reference your HLA identifiers in your in-line assembly code. The only exceptions to this rule are external identifiers. HLA external identifiers use the same name in the assembly file as in the HLA source file. Therefore, you can refer to external objects within your in-line assembly sequences or in the strings you output via #EMIT.

One advantage of the #EMIT statement is that it lets you construct MASM or Gas statements under (compile-time) program control. You can write an HLA compile-time program that generates a sequence of strings and emits them to the assembly file via the #EMIT statement. The compile-time program has access to the HLA symbol table; this means that you can extract the identifiers that HLA emits to the assembly file and use these directly, even if they aren't external objects.
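As a rough sketch of what such a compile-time program might look like, the following example uses HLA's #WHILE compile-time loop to emit the same instruction several times. The program name emitLoopDemo and the choice of "inc eax" are purely illustrative assumptions; any instruction text that MASM or Gas (in Intel syntax) accepts could appear in the string.

```hla
program emitLoopDemo;
#include( "stdlib.hhf" )

begin emitLoopDemo;

    mov( 0, eax );

    // Compile-time loop: HLA executes this while compiling the
    // program and emits one "inc eax" line per iteration.

    ?i := 0;
    #while( i < 4 )

        #emit( "inc eax" );
        ?i := i + 1;

    #endwhile

    stdout.put( "eax = ", (type uns32 eax), nl );

end emitLoopDemo;
```

The loop runs while HLA compiles the program, not while the program executes, so the assembly output file should contain four consecutive "inc eax" lines following the code for the MOV instruction.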

The @StaticName compile-time function returns the name that HLA uses to refer to most static objects in your program. The following program demonstrates a simple use of this compile-time function to obtain the assembly name of an HLA procedure:
 
program emitDemo;
#include( "stdlib.hhf" )

    procedure myProc;
    begin myProc;

        stdout.put( "Inside MyProc" nl );

    end myProc;

begin emitDemo;

    ?stmt:string := "call " + @StaticName( myProc );
    #emit( stmt );

end emitDemo;
 
Program 12.1	 Using the @StaticName Function
 
This example creates a string value (stmt) that contains something like "call ?741_myProc" and emits this assembly instruction directly to the assembly output file ("?741_myProc" is typical of the type of name mangling that HLA does to static names it writes to the output file). If you compile and run this program, it should display "Inside MyProc" and then quit. If you look at the assembly file that HLA emits, you will see that it has given the myProc procedure the same name it appends to the CALL instruction¹.

The @StaticName function is only valid for static symbols. This includes STATIC, READONLY, and STORAGE variables, procedures, and iterators. It does not include VAR objects, constants, macros, class iterators, or methods.
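To illustrate the static-variable case, here is a minimal sketch that applies @StaticName to a STATIC variable rather than to a procedure. The names staticNameDemo and counter are hypothetical, and the "dword ptr" operand form assumes the Intel-style syntax that MASM (and Gas under ".intel_syntax") expects. The mangled name that HLA substitutes will vary from one compilation to the next.

```hla
program staticNameDemo;
#include( "stdlib.hhf" )

static
    counter: dword := 10;       // hypothetical variable for this sketch

begin staticNameDemo;

    // Build an instruction that refers to counter by the name
    // HLA gives it in the assembly output file.

    ?stmt:string := "inc dword ptr " + @StaticName( counter );
    #print( "Emitting `", stmt, "`" )
    #emit( stmt );

    stdout.put( "counter = ", counter, nl );

end staticNameDemo;
```

The #PRINT statement displays the generated instruction during compilation, which is a convenient way to check the text before the assembler sees it.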

You can access VAR variables by using the [EBP+offset] addressing mode, specifying the offset of the desired local variable. You can use the @offset compile-time function to obtain the offset of a VAR object or a parameter. The following program demonstrates how to do this:
 
program offsetDemo;
#include( "stdlib.hhf" )

var
    i:int32;

begin offsetDemo;

    mov( -255, i );
    ?stmt := "mov eax, [ebp+(" + string( @offset( i )) + ")]";
    #print( "Emitting `", stmt, "`" )
    #emit( stmt );
    stdout.put( "eax = ", (type int32 eax), nl );

end offsetDemo;
 
Program 12.2	 Using the @Offset Compile-Time Function
 
This example emits the statement "mov eax, [ebp+(-8)]" to the assembly language source file. It turns out that -8 is the offset of the i variable in the offsetDemo program's activation record.

Of course, the examples of #EMIT up to this point have been somewhat ridiculous since you can achieve the same results by using HLA statements. One very useful purpose for the #emit statement, however, is to create some instructions that HLA does not support. For example, as of this writing HLA does not support the LES instruction because you can't really use it under most 32-bit operating systems. However, if you found a need for this instruction, you could easily write a macro to emit this instruction and appropriate operands to the assembly source file. Using the #EMIT statement gives you the ability to reference HLA objects, something you cannot do with the #ASM..#ENDASM sequence.

12.2.2 Linking MASM/Gas-Assembled Modules with HLA Modules

Although you can do some interesting things with HLA's in-line assembly statements, you'll probably never use them. Further, future versions of HLA may not even support these statements, so you should avoid them as much as possible even if you see a need for them. Of course, HLA does most of the stuff you'd want to do with the #ASM/#ENDASM and #EMIT statements anyway, so there is very little reason to use them at all. If you're going to combine MASM/Gas (or other assembler) code and HLA code together in a program, most of the time this will occur because you've got a module or library routine written in some other assembly language and you would like to take advantage of that code in your HLA programs. Rather than convert the other assembler's code to HLA, the easy solution is to simply assemble that other code to an object file and link it with your HLA programs.

Once you've compiled or assembled a source file to an object file, the routines in that module are callable from almost any machine code that can handle the routines' calling sequences. If you have an object file that contains a SQRT function, for example, it doesn't matter whether you compiled that function with HLA, MASM, TASM, NASM, Gas, or even a high level language; if it's object code and it exports the proper symbols, you can call it from your HLA program.

Compiling a module in MASM or Gas and linking that with your HLA program is little different than linking other HLA modules with your main HLA program. In the assembly source file you will have to export some symbols (using the PUBLIC directive in MASM or the .GLOBAL directive in Gas) and in your HLA program you've got to tell HLA that those symbols appear in a separate module (using the EXTERNAL option).

Since the two modules are written in assembly language, there is very little language imposed structure on the calling sequence and parameter passing mechanisms. If you're calling a function written in MASM or Gas from your HLA program, then all you've got to do is to make sure that your HLA program passes parameters in the same locations where the MASM/Gas function is expecting them.

About the only issue you've got to deal with is the case of identifiers in the two programs. By default, MASM is case insensitive, whereas Gas treats symbols as case sensitive. HLA, on the other hand, enforces case neutrality (which, essentially, means that it is case sensitive). If you're using MASM, there is a MASM command line option ("/Cp") that tells MASM to preserve the case of all identifiers, public symbols included. It's a really good idea to use this option when assembling modules you're going to link with HLA so that MASM doesn't mess with the case of your identifiers during assembly.

Of course, since Gas (and MASM, when told to preserve case) processes symbols in a case sensitive manner, it's possible to create two separate identifiers that are the same except for alphabetic case. HLA enforces case neutrality so it won't let you (directly) create two different identifiers that differ only in case. In general, this is such a bad programming practice that one would hope you never encounter it (and God forbid you actually do this yourself). However, if you inherit some MASM or Gas code written by a C hacker, it's quite possible the code uses this technique. The way around this problem is to use two separate identifiers in your HLA program and use the extended form of the EXTERNAL directive to provide the external names. For example, suppose that in MASM you have the following declarations:
			public  AVariable
			public  avariable
				.
				.
				.
			.data
AVariable			dword    ?
avariable			byte     ?
 
If you assemble this code with the "/Cp" (preserve the case of all identifiers) or "/Cx" (preserve case in public and external symbols) command line option, MASM will emit these two external symbols for use by other modules. Of course, were you to attempt to define variables by these two names in an HLA program, HLA would complain about a duplicate symbol definition. However, you can connect two different HLA variables to these two identifiers using code like the following:
static
	AVariable: dword; external( "AVariable" );
	AnotherVar: byte; external( "avariable" );
 
HLA does not check the strings you supply as parameters to the EXTERNAL clause. Therefore, you can supply two names that are the same except for case and HLA will not complain. Note that when HLA calls MASM to assemble its output file, HLA specifies the "/Cp" option that tells MASM to preserve case in public and global symbols. Of course, you would use this same technique in Gas if the Gas programmer has exported two symbols that are identical except for case.

The following program demonstrates how to call a MASM subroutine from an HLA main program:
 
// To compile this module and the attendant MASM file, use the following
// command line:
//
//      ml -c masmupper.masm
//      hla masmdemo1.hla masmupper.obj
//
//  Sorry about no make file for this code, but these two files are in
//  the HLA Vol4/Ch12 subdirectory that has its own makefile for building
//  all the source files in the directory and I wanted to avoid confusion.

program MasmDemo1;
#include( "stdlib.hhf" )

    // The following external declaration defines a function that
    // is written in MASM to convert the character in AL from
    // lower case to upper case.

    procedure masmUpperCase( c:char in al ); external( "masmUpperCase" );

static
    s: string := "Hello World!";

begin MasmDemo1;

    stdout.put( "String converted to uppercase: `" );
    mov( s, edi );
    while( mov( [edi], al ) <> #0 ) do

        masmUpperCase( al );
        stdout.putc( al );
        inc( edi );

    endwhile;
    stdout.put( "`" nl );

end MasmDemo1;
 
Program 12.3	 Main HLA Program to Link with a MASM Program
 
 
; MASM source file to accompany the MasmDemo1.HLA source
; file.  This code compiles to an object module that
; gets linked with an HLA main program.  The function
; below converts the character in AL to upper case if it
; is a lower case character.

        .586
        .model  flat, pascal

        .code
        public  masmUpperCase
masmUpperCase   proc    near32
        .if al >= 'a' && al <= 'z'
        and al, 5fh
        .endif
        ret
masmUpperCase   endp
        end
 
Program 12.4	 Calling a MASM Procedure from an HLA Program: MASM Module
 
It is also possible to call an HLA procedure from a MASM or Gas program (this should be obvious since HLA compiles its source code to an assembly source file and that assembly source file can call HLA procedures such as those found in the HLA Standard Library). There are a few restrictions when calling HLA code from some other language. First of all, you can't easily use HLA's exception handling facilities in the modules you call from other languages (including MASM or Gas). The HLA main program initializes the exception handling system; this initialization is probably not done by your non-HLA assembly programs. Further, the HLA main program exports a couple of important symbols needed by the exception handling subsystem; again, it's unlikely your non-HLA main assembly program provides these public symbols. In the volume on Advanced Procedures this text will discuss how to deal with HLA's Exception Handling subsystem. However, that topic is a little too advanced for this chapter. Until you get to the point you can write code in MASM or Gas to properly set up the HLA exception handling system, you should not execute any code that uses the TRY..ENDTRY, RAISE, or any other exception handling statements.

Warning: a large percentage of the HLA Standard Library routines include exception handling statements or call other routines that use exception handling statements. Unless you've set up the HLA exception handling subsystem properly, you should not call any HLA Standard Library routines from non-HLA programs.

Other than the issue of exception handling, calling HLA procedures from standard assembly code is really easy. All you've got to do is put an EXTERNAL prototype in the HLA code to make the symbol you wish to access public and then include an EXTERN (or EXTERNDEF) statement in the MASM/Gas source file to provide the linkage. Then just compile the two source files and link them together.

About the only issue you need concern yourself with when calling HLA procedures from assembly is the parameter passing mechanism. Of course, if you pass all your parameters in registers (the best place), then communication between the two languages is trivial. Just load the registers with the appropriate parameters in your MASM/Gas code and call the HLA procedure. Inside the HLA procedure, the parameter values will be sitting in the appropriate registers (sort of the converse of what happened in Program 12.4).
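The following sketch shows what the HLA side of such an arrangement might look like; it is essentially Program 12.4 rewritten as an HLA unit. The names hlaUpperUnit and hlaUpperCase are hypothetical, the @NODISPLAY option is an assumption made here so that the procedure's entry code doesn't depend on an HLA caller's stack frame, and the MASM caller appears only as a comment. In keeping with the earlier warning, the procedure uses neither exception handling nor the HLA Standard Library.

```hla
// Hypothetical MASM caller (sketch only):
//
//         extern  hlaUpperCase:near32
//              .
//              .
//         mov     al, 'h'
//         call    hlaUpperCase      ; AL now holds the converted character

unit hlaUpperUnit;

    // The EXTERNAL prototype makes the symbol public under
    // exactly the name given in the string.

    procedure hlaUpperCase; external( "hlaUpperCase" );

    procedure hlaUpperCase; @nodisplay;
    begin hlaUpperCase;

        // The character arrives in AL and is returned in AL.

        if( al >= 'a' && al <= 'z' ) then

            and( $5f, al );

        endif;

    end hlaUpperCase;

end hlaUpperUnit;
```

Because both sides agree that the character travels in AL, neither module needs to know anything else about how the other was written.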

If you decide to pass parameters on the stack, note that HLA normally uses the PASCAL language calling model. Therefore, you push parameters on the stack in the order they appear in a parameter list (from left to right) and it is the called procedure's responsibility to remove the parameters from the stack. Note that you can specify the PASCAL calling convention for use with MASM's INVOKE statement using the ".model" directive, e.g.,
        .586
        .model  flat, pascal
			.
			.
			.
 
Of course, if you manually push the parameters on the stack yourself, then the specific language model doesn't really matter. Gas users, of course, don't have the INVOKE statement, so they have to manually push the parameters themselves anyway.
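As a hedged illustration of the manual approach, the sketch below pairs an HLA procedure that takes two stack parameters with a hand-coded MASM calling sequence, shown as a comment. The names hlaSumUnit and hlaSum are hypothetical, and the @NODISPLAY option is an assumption made so the entry code doesn't rely on an HLA caller's stack frame.

```hla
// Hypothetical MASM caller that pushes the parameters by hand:
//
//         extern  hlaSum:near32
//              .
//              .
//         push    5             ; a: leftmost parameter is pushed first
//         push    7             ; b
//         call    hlaSum        ; sum comes back in EAX
//                               ; no "add esp, 8": hlaSum removes its parameters

unit hlaSumUnit;

    procedure hlaSum( a:dword; b:dword ); external( "hlaSum" );

    procedure hlaSum( a:dword; b:dword ); @nodisplay;
    begin hlaSum;

        mov( a, eax );      // first value the caller pushed
        add( b, eax );      // second value the caller pushed

    end hlaSum;             // standard exit code pops both parameters

end hlaSumUnit;
```

The order of the two PUSH instructions and the absence of any stack cleanup after the CALL are the two places where the PASCAL convention shows up in the caller.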

This section is not going to attempt to go into gory details about MASM or Gas syntax. There is an appendix in this text that contrasts the HLA language with MASM (and Gas when using the ".intel_syntax" directive); you should be able to get a rough idea of MASM/Gas syntax from that appendix if you're completely unfamiliar with these assemblers. Another alternative is to read a copy of the DOS/16-bit edition of this text that uses the MASM assembler. That text describes MASM syntax in much greater detail, albeit from a 16-bit perspective. Finally, this section isn't going to go into any further detail because, quite frankly, the need to call MASM or Gas code from HLA (or vice versa) just isn't that great. After all, most of the stuff you can do with MASM and Gas can be done directly in HLA so there really is little need to spend much more time on this subject. Better to move on to more important questions, like how do you call HLA routines from C or Pascal...

¹HLA may assign a different name than "?741_myProc" when you compile the program. The exact symbol HLA chooses varies from version to version of the assembler (it depends on the number of symbols defined prior to the definition of myProc). In this example, there were 741 static symbols defined in the HLA Standard Library before the definition of myProc.