Character Set Routines
----------------------

The character set routines let you deal with groups of characters as a set
rather than a string.  A set is an unordered collection of objects where
membership (presence or absence) is the only important quality.  The stdlib
set routines were designed to let you quickly check if an ASCII character is
in a set, to quickly add characters to a set or remove characters from a set.
These operations are the ones most commonly used on character sets.  The
other operations (like union, intersection, difference, etc.) are useful, but
are not as popular as the former routines.  Therefore, the data structure
has been optimized for the membership and add/delete operations at the slight
expense of the others.

Character sets are implemented via bit vectors.  A "1" bit means that an item
is present in the set and a "0" bit means that the item is absent from the
set.  The most common implementation of a character set is to use thirty-two
consecutive bytes, eight bits per byte, giving 256 bits (one bit for each
character in the character set).  While this makes certain operations (like
assignment, union, intersection, etc.) fast and convenient, other operations
(membership, add/remove items) run much slower, because each one must first
locate the proper byte and then isolate the proper bit within it.  Since
these are the more important operations, a different data structure is used
to represent sets.  A faster approach is to simply use a byte value for each
item in the set.  This offers a major advantage over the thirty-two byte
scheme:  operations like membership are very fast (since all you have to do
is index into an array and test the resulting value).  It has two drawbacks:
first, operations like set assignment, union, difference, etc., require 256
operations rather than thirty-two; second, it takes eight times as much
memory.
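To make the trade-off concrete, here is a small C sketch of the two representations just described: a 32-byte bit vector and a 256-byte array with one byte per character.  It is only an illustration, not code from the library (which is written in 80x86 assembly); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Conventional layout: 256 bits packed eight per byte into 32 bytes. */
typedef struct { unsigned char bits[32]; } BitSet;

/* Faster layout: one byte per character; nonzero means "present". */
typedef struct { unsigned char item[256]; } ByteSet;

static bool bitset_member(const BitSet *s, unsigned char c) {
    /* Find the byte (c / 8), then isolate the bit within it (c % 8). */
    return (s->bits[c >> 3] & (1u << (c & 7))) != 0;
}

static void bitset_add(BitSet *s, unsigned char c) {
    s->bits[c >> 3] |= (unsigned char)(1u << (c & 7));
}

static bool byteset_member(const ByteSet *s, unsigned char c) {
    return s->item[c] != 0;          /* a single indexed load */
}

static void byteset_add(ByteSet *s, unsigned char c) {
    s->item[c] = 1;
}

/* Union: 32 byte operations for the bit vector, 256 for the byte array. */
static void bitset_union(BitSet *dst, const BitSet *src) {
    for (int k = 0; k < 32; k++)
        dst->bits[k] |= src->bits[k];
}

static void byteset_union(ByteSet *dst, const ByteSet *src) {
    for (int k = 0; k < 256; k++)
        dst->item[k] |= src->item[k];
}
```

The membership test on the byte array needs no shift or mask, while the union loop runs eight times as long and the structure is eight times as large.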

The first drawback, speed, matters little.  You will rarely use the
operations so affected, so the fact that they run a little slower is of
little consequence.  Wasting 224 bytes per set (256 - 32), however, is a
problem, especially if you have a lot of character sets.

The approach used here is to allocate 272 bytes.  The first eight bytes
contain the bit masks 1, 2, 4, 8, 16, 32, 64, and 128.  Each mask tells you
which bit in the following 264 bytes is associated with its set.  This packs
eight sets into 272 bytes (34 bytes per character set), providing almost the
speed of the 256-byte set with only two bytes of overhead per set compared
to the 32-byte bit vector.  The stdlib.a file contains a macro, set, that
lets you define a group of character sets.  The macro is used as follows:

	set set1, set2, set3, ... , set8

You must supply between one and eight labels in the operand field.  These are
the names of the sets you want to create.  The set macro automatically
attaches these labels to the appropriate mask bytes in the set.  The actual
bit patterns for each set begin eight bytes after its label.  Therefore, the
byte corresponding to chr(0) is staggered by one byte for each set (which
explains the extra eight bytes needed beyond the 256 required for the set
data).  When using the set manipulation routines, you should always pass the
address of the mask byte (i.e., the seg/offset of one of the labels above)
to the particular set manipulation routine you are using.  Passing the
address of the structure created with the macro above will reference only
the first set in the group.

Note that you can use the set operations for fast pattern-matching
applications.  The set membership operation, for example, is much faster
than the strspan routine found in the string package.  Proper use of
character sets can produce a program that runs much faster than one using
the equivalent string operations.


Note: there is a special include file in the INCLUDE directory, STDSETS.A,
which contains the bit definitions for eight commonly-used character sets:
Alpha (upper and lower case alphabetics), lower (lower case alphabetics),
upper (upper case alphabetics), digits ("0".."9"), xdigits (hexadecimal
digits: "0".."9", "a".."f", and "A".."F"), alphanum (upper/lower case alpha
and digits), whitespace (spaces, tabs, carriage returns, and linefeeds),
and delimiters (whitespace plus ",", ";", "<", ">", and "|").

If you want to use these standard character sets in your program, you must
include the STDSETS.A file in an appropriate (data) segment.  Note that
including STDLIB.A or CHARSETS.A will not give the standard sets.  You must
explicitly place an include STDSETS.A in your program to have access to
these sets.


Routine:  Createsets
--------------------

Category:             Character Set Routine

Registers on Entry:   no parameters passed

Registers on return:  ES:DI - pointer to eight sets

Flags affected:       Carry = 0 if no error. Carry = 1 if insufficient
		      memory to allocate storage for sets.

Example of Usage:
		      Createsets
		      jc      NoMemory
		      mov     word ptr SetPtr,   di
		      mov     word ptr SetPtr+2, es

Description:  Createsets allocates 272 bytes on the heap.   This is sufficient
	      room for eight character sets.  It then initializes the first
	      eight bytes of this storage with the proper mask values for
	      each set.  Location es:0[di] gets set to 1, location es:1[di]
	      gets 2, location es:2[di] gets 4, etc.  The Createsets routine
	      also initializes all of the sets to the empty set by clearing
	      all the bits to zero.

Include:              stdlib.a or charsets.a
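As an illustration only, Createsets can be modeled in C like this.  The function name createsets, the use of malloc, and returning NULL where the library would return with the carry flag set are all assumptions of this sketch; the real routine is assembly and allocates from the stdlib heap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: returns a pointer to the first set's mask byte,
   or NULL where the library would return with carry set. */
static unsigned char *createsets(void) {
    unsigned char *g = malloc(272);
    if (g == NULL)
        return NULL;                       /* insufficient memory */
    memset(g + 8, 0, 264);                 /* every set starts out empty */
    for (int i = 0; i < 8; i++)
        g[i] = (unsigned char)(1u << i);   /* es:0[di] = 1, es:1[di] = 2, ... */
    return g;
}
```

Adding 0 through 7 to the returned pointer selects one of the eight sets, just as the assembly examples add to DI.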


Routine:  EmptySet
------------------

Category:             Character Set Routine

Registers on Entry:   ES:DI - pointer to first byte of desired set

Registers on return:  None

Flags affected:	      None

Example of Usage:
		      les     di,  SetPtr
		      add     di,  3          ; Point at 4th set in group.
		      Emptyset


Description:  Emptyset clears all the bits in a character set to zero
	      (thereby setting it to the empty set).  Upon entry, es:di must
	      point at the mask byte of the character set you want to clear.
	      The address returned by Createsets points at the mask byte of
	      the first set only.  The first eight bytes of a character set
	      structure are the mask bytes of eight different sets, so add
	      0..7 to that address to select a set.  ES:DI must point at one
	      of these bytes upon entry into Emptyset.

Include:              stdlib.a or charsets.a
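A hypothetical C model of Emptyset shows why it cannot simply zero 256 bytes: the other seven sets in the group keep their bits in the same bytes.  As in the layout described at the top of this section, the sketch assumes set points at the mask byte and chr(c) lives 8 + c bytes later; the function name is illustrative.

```c
#include <assert.h>
#include <string.h>

/* `set` points at the set's mask byte; chr(c) lives 8 + c bytes later. */
static void emptyset(unsigned char *set) {
    unsigned char keep = (unsigned char)~set[0];
    /* Clear only this set's bit: the other seven sets in the group store
       their bits in the same bytes and must be left alone. */
    for (int c = 0; c < 256; c++)
        set[8 + c] &= keep;
}
```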


Routine:  Rangeset
------------------

Category:             Character Set Routine

Registers on entry:   ES:DI (contains the address of the first byte of the set)
		      AL    (contains the lower bound of the items)
		      AH    (contains the upper bound of the items)

Registers on return:  None

Flags affected:       None

Example of Usage:
		      les di, SetPtr
		      add di, 4		;Point at fifth set in group.
		      mov al, 'A'
		      mov ah, 'Z'
		      rangeset


Description:  This routine adds a range of characters to a set, with
	      ES:DI pointing at the set's mask byte, AL holding the lower
	      bound of the range, and AH holding the upper bound (AH must be
	      greater than AL; otherwise, an error will occur).

Include:              stdlib.a or charsets.a
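The following C sketch models Rangeset under two assumptions not stated above: the range is inclusive at both ends (suggested by the 'A'..'Z' example), and the sketch also accepts lo == hi.  The names rangeset and member are hypothetical, and set points at the mask byte with chr(c) at 8 + c bytes past it.

```c
#include <assert.h>
#include <stdbool.h>

static bool member(const unsigned char *set, unsigned char c) {
    return (set[8 + c] & set[0]) != 0;
}

/* Hypothetical model: adds chr(lo) through chr(hi), inclusive. */
static void rangeset(unsigned char *set, unsigned char lo, unsigned char hi) {
    assert(lo <= hi);                  /* AL must not exceed AH */
    for (int c = lo; c <= hi; c++)     /* int counter: no wrap when hi == 255 */
        set[8 + c] |= set[0];
}
```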


Routine:  Addstr (l)
--------------------

Category:             Character Set Routine

Registers on Entry:   ES:DI- pointer to first byte of desired set
		      DX:SI- pointer to string to add to set (Addstr only)
		      CS:RET-pointer to string to add to set (Addstrl only)

Registers on Return:  None

Flags Affected:       None

Example of Usage:
		      les     di, SetPtr
		      add     di, 1           ;Point at 2nd set in group.
		      mov     dx, seg CharStr ;Pointer to string
		      lea     si, CharStr     ; chars to add to set.
		      addstr                  ;Union in these characters.
;
		      les     di, SetPtr      ;Point at first set in group.
		      addstrl
		      db      "AaBbCcDdEeFf0123456789",0
;


Description:  Addstr lets you add a group of characters to a set by
	      specifying a string containing the characters you want in
	      the set.  To Addstr you pass a pointer to a zero-terminated
	      string in dx:si.  Addstr will add (union) each character
	      from this string into the set.

	      Addstrl works the same way except you pass the string as
	      a literal string constant in the code stream rather than
	      via DX:SI.

Include:              stdlib.a or charsets.a
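Here is a hypothetical C model of Addstr (the code-stream form, Addstrl, has no direct C equivalent).  It assumes the terminating zero is not itself added to the set, and, as elsewhere, that set points at the mask byte with chr(c) at 8 + c bytes past it.

```c
#include <assert.h>
#include <stdbool.h>

static bool member(const unsigned char *set, unsigned char c) {
    return (set[8 + c] & set[0]) != 0;
}

/* Hypothetical model: unions every character of the zero-terminated
   string `s` into the set; the terminating zero is not added. */
static void addstr(unsigned char *set, const char *s) {
    for (; *s != '\0'; s++)
        set[8 + (unsigned char)*s] |= set[0];
}
```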


Routine:  Rmvstr (l)
--------------------


Category:             Character Set Routine


Registers on entry:   ES:DI contains the address of first byte of a set
		      DX:SI contains the address of string to be removed
			     from a set (Rmvstr only)
		      CS:RET pointer to string to remove from set (Rmvstrl only)


Registers on return:  None


Flags affected:       None


Example of Usage:
		      les     di, SetPtr
		      mov     dx, seg CharStr
		      lea     si, CharStr
		      rmvstr
;
		      les     di, SetPtr
		      rmvstrl
		      db      "ABCDEFG",0


Description:  Rmvstr removes each character of a zero-terminated string
	      from a set.  ES:DI points at the set's mask byte, and DX:SI
	      points at the string.

	      For Rmvstrl, the string of characters to remove from the
	      set follows the call in the code stream.

Include:              stdlib.a or charsets.a
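A hypothetical C model of Rmvstr, under the same layout assumption (set points at the mask byte, chr(c) at 8 + c bytes past it), clears only this set's bit for each character in the string.

```c
#include <assert.h>
#include <stdbool.h>

static bool member(const unsigned char *set, unsigned char c) {
    return (set[8 + c] & set[0]) != 0;
}

/* Hypothetical model: removes every character of the zero-terminated
   string `s` from the set, leaving other sets' bits untouched. */
static void rmvstr(unsigned char *set, const char *s) {
    unsigned char keep = (unsigned char)~set[0];
    for (; *s != '\0'; s++)
        set[8 + (unsigned char)*s] &= keep;
}
```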


Routine:  AddChar
-----------------

Category:             Character Set Routine

Registers on Entry:   ES:DI- pointer to first byte of desired set
		      AL- character to add to the set

Registers on Return:  None

Flags affected:       None

Example of Usage:
		      les     di, SetPtr
		      add     di, 1           ;Point at 2nd set in group.
		      mov     al, Ch2Add      ;Character to add to set.
		      addchar


Description:  AddChar lets you add a single character (passed in AL)
	      to a set.

Include:              stdlib.a or charsets.a
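In a hypothetical C model, AddChar is a single OR of the set's mask into the byte for the character (again assuming set points at the mask byte and chr(c) is 8 + c bytes past it).  The test shows two sets in the same group sharing one data byte.

```c
#include <assert.h>
#include <stdbool.h>

static bool member(const unsigned char *set, unsigned char c) {
    return (set[8 + c] & set[0]) != 0;
}

/* OR in the mask: the other seven bits of the byte belong to other sets. */
static void addchar(unsigned char *set, unsigned char c) {
    set[8 + c] |= set[0];
}
```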


Routine:  Rmvchar
-----------------

Category:             Character Set Routine

Registers on entry:   ES:DI (contains the address of first byte of a set)
		      AL    (contains the character to be removed)

Registers on return:  None

Flags affected:	      None

Example of Usage:
		      les di, SetPtr
		      add di, 7		;Point at eighth set in group.
		      mov al, Ch2Rmv
		      Rmvchar

Description:  This routine removes the character in AL from a set.
	      ES:DI points to the set's mask byte.  The corresponding
	      bit in the set is cleared to zero.

Include:              stdlib.a or charsets.a
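Rmvchar is the mirror image of AddChar.  In this hypothetical C model (set points at the mask byte, chr(c) at 8 + c bytes past it), the complement of the mask clears this set's bit and preserves the bits of the other seven sets.

```c
#include <assert.h>
#include <stdbool.h>

static bool member(const unsigned char *set, unsigned char c) {
    return (set[8 + c] & set[0]) != 0;
}

/* AND with the inverted mask clears only this set's bit. */
static void rmvchar(unsigned char *set, unsigned char c) {
    set[8 + c] &= (unsigned char)~set[0];
}
```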


Routine:  Member
----------------

Category:             Character Set Routine

Registers on entry:   ES:DI (contains the address of first byte of a set)
		      AL    (contains the character to be compared)

Registers on return:  None

Flags affected:       Zero flag (Zero = 0 if the character is in the set
				 Zero = 1 if the character is not in the set)

Example of Usage:
		      les di, SetPtr
		      add di, 1
		      mov al, 'H'
		      member
		      jne IsInSet


Description:  Member is used to find out if the character in AL is in a set
	      with ES:DI pointing to its mask byte. If the character is in
	      the set, the zero flag is set to 0. If not, the zero flag is
	      set to one.

Include:              stdlib.a or charsets.a
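The following hypothetical C model returns the value the zero flag would hold after Member, which may help when choosing between je and jne.  It assumes the usual layout (set points at the mask byte, chr(c) at 8 + c bytes past it) and is not the library's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns the modeled zero flag: false (ZF = 0) when c is in the set,
   true (ZF = 1) when it is not. */
static bool member_zf(const unsigned char *set, unsigned char c) {
    /* AND the data byte with the mask byte; zero means "absent". */
    return (set[8 + c] & set[0]) == 0;
}
```

A jne after Member therefore branches when the character is present, as in the example above.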


Routine:  CopySet
-----------------

Category:            Character Set Routine

Registers on entry:  ES:DI- pointer to first byte of destination set.
		     DX:SI- pointer to first byte of source set.

Registers on Return: None

Flags affected:      None

Example of Usage:
		     les     di, SetPtr
		     add     di, 7           ;Point at 8th set in group.
		     mov     dx, seg SetPtr2 ;Point at first set in group.
		     lea     si, SetPtr2
		     copyset


Description:  CopySet copies the items from one set to another.  This is a
	      straight assignment, not a union operation.  After the
	      operation, the destination set is identical to the source set,
	      both in terms of the elements present in the set and those
	      absent from the set.


Include:             stdlib.a or charsets.a
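A hypothetical C model of CopySet makes the "straight assignment" explicit: every character's bit in the destination is either set or cleared.  The source and destination may sit in different groups with different masks.  As before, each set pointer is assumed to point at its mask byte with chr(c) 8 + c bytes past it.

```c
#include <assert.h>
#include <stdbool.h>

static bool member(const unsigned char *set, unsigned char c) {
    return (set[8 + c] & set[0]) != 0;
}

/* Straight assignment: every bit of dst is overwritten from src. */
static void copyset(unsigned char *dst, const unsigned char *src) {
    for (int c = 0; c < 256; c++) {
        if (member(src, (unsigned char)c))
            dst[8 + c] |= dst[0];
        else
            dst[8 + c] &= (unsigned char)~dst[0];
    }
}
```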


Routine:  SetUnion
------------------

Category:            Character Set Routine

Registers on entry:  ES:DI - pointer to first byte of destination set.
		     DX:SI - pointer to first byte of source set.

Registers on return: None

Flags affected:      None

Example of Usage:
		     les   di, SetPtr
		     add   di, 7              ;point at 8th set in group.
		     mov   dx, seg SetPtr2    ;point at 1st set in group.
		     lea   si, SetPtr2
		     setunion


Description:  The SetUnion routine computes the union of two sets.
	      That is, it adds all of the items present in a source set
	      to a destination set.  This operation preserves items
	      present in the destination set before the SetUnion
	      operation.

Include:             stdlib.a or charsets.a
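In a hypothetical C model (each set pointer at its mask byte, chr(c) at 8 + c bytes past it), SetUnion only ever sets bits in the destination, which is why existing destination items survive.

```c
#include <assert.h>
#include <stdbool.h>

static bool member(const unsigned char *set, unsigned char c) {
    return (set[8 + c] & set[0]) != 0;
}

/* dst := dst + src; items already in dst are never cleared. */
static void setunion(unsigned char *dst, const unsigned char *src) {
    for (int c = 0; c < 256; c++)
        if (member(src, (unsigned char)c))
            dst[8 + c] |= dst[0];
}
```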


Routine:  SetIntersect
----------------------

Category:            Character Set Routine

Registers on entry:  ES:DI - pointer to first byte of destination set.
		     DX:SI - pointer to first byte of source set.

Registers on return: None

Flags affected:      None

Example of Usage:
		     les   di, SetPtr
		     add   di, 7              ;point at 8th set in group.
		     mov   dx, seg SetPtr2    ;point at 1st set in group.
		     lea   si, SetPtr2
		     setintersect

Description:  SetIntersect computes the intersection of two sets, leaving
	      the result in the destination set.  The new set consists
	      only of those items which previously appeared in
	      both the source and destination sets.

Include:             stdlib.a or charsets.a
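A hypothetical C model of SetIntersect, using the same layout assumption, clears each destination item that the source lacks.

```c
#include <assert.h>
#include <stdbool.h>

static bool member(const unsigned char *set, unsigned char c) {
    return (set[8 + c] & set[0]) != 0;
}

/* dst := dst * src; drop every dst item that is absent from src. */
static void setintersect(unsigned char *dst, const unsigned char *src) {
    unsigned char keep = (unsigned char)~dst[0];
    for (int c = 0; c < 256; c++)
        if (!member(src, (unsigned char)c))
            dst[8 + c] &= keep;
}
```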


Routine:  SetDifference
-----------------------

Category:            Character Set Routine

Registers on entry:  ES:DI - pointer to the first byte of destination set.
		     DX:SI - pointer to the first byte of the source set.

Registers on return: None

Flags affected:      None

Example of Usage:
		     les   di, SetPtr
		     add   di, 7               ;point at 8th set in group.
		     mov   dx, seg SetPtr2     ;point at 1st set in group.
		     lea   si, SetPtr2
		     setdifference


Description:  SetDifference computes the result of (ES:DI) := (ES:DI) -
	      (DX:SI).  The destination set is left with its original
	      items minus those items which are also in the source set.

Include:             stdlib.a or charsets.a
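SetDifference is the complement of SetIntersect's test.  In this hypothetical C model (layout as described at the top of the section), a destination item is cleared when the source does contain it.

```c
#include <assert.h>
#include <stdbool.h>

static bool member(const unsigned char *set, unsigned char c) {
    return (set[8 + c] & set[0]) != 0;
}

/* dst := dst - src; remove every dst item that src also contains. */
static void setdifference(unsigned char *dst, const unsigned char *src) {
    unsigned char keep = (unsigned char)~dst[0];
    for (int c = 0; c < 256; c++)
        if (member(src, (unsigned char)c))
            dst[8 + c] &= keep;
}
```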


Routine:  Nextitem
------------------

Category:             Character Set Routine

Registers on entry:   ES:DI (contains the address of first byte of the set)

Registers on return:  AL (contains the first item in the set)

Flags affected:       None

Example of Usage:
		      les di, SetPtr
		      add di, 7		;Point at eighth set in group.
		      nextitem


Description:  Nextitem searches for the first character (item) in the
	      set pointed at by ES:DI (the set's mask byte) and returns it
	      in AL without removing it from the set.  If the set is empty,
	      AL will contain zero (so a zero result is ambiguous if chr(0)
	      may be a member of the set).

Include:              stdlib.a or charsets.a
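The following hypothetical C model of Nextitem assumes that "first" means the lowest character code in the set.  The description above does not state the search order, so treat that as an assumption.  As elsewhere, set points at the mask byte and chr(c) is 8 + c bytes past it.

```c
#include <assert.h>

/* Hypothetical model: returns the lowest-numbered item without removing
   it, or 0 if the set is empty. */
static unsigned char nextitem(const unsigned char *set) {
    for (int c = 0; c < 256; c++)
        if (set[8 + c] & set[0])
            return (unsigned char)c;
    return 0;                         /* empty set (or chr(0) is first) */
}
```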


Routine:  Rmvitem
-----------------

Category:             Character Set Routine

Registers on entry:   ES:DI (contains the address of first byte of the set)

Registers on return:  AL (contains the item removed from the set)

Flags affected:       None

Example of Usage:
		      les di, SetPtr
		      add di, 7
		      rmvitem

Description:  Rmvitem locates the first item in the set pointed at by
	      ES:DI (the set's mask byte), removes it from the set, and
	      returns it in AL.  If the set is empty, AL will return zero.

Include:              stdlib.a or charsets.a
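Repeated calls to Rmvitem drain a set one item at a time.  The hypothetical C model below shows this.  Like the Nextitem sketch, it assumes "first" means the lowest character code, and that set points at the mask byte with chr(c) at 8 + c bytes past it.

```c
#include <assert.h>

/* Hypothetical model: removes and returns the lowest-numbered item,
   or returns 0 if the set is empty. */
static unsigned char rmvitem(unsigned char *set) {
    for (int c = 0; c < 256; c++)
        if (set[8 + c] & set[0]) {
            set[8 + c] &= (unsigned char)~set[0];   /* remove it */
            return (unsigned char)c;
        }
    return 0;                                       /* empty set */
}
```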


