Assembler
Instruction Format
- An assembly language instruction has one or more fields associated with it.
- The first is the operation field, or opcode field; the other fields are known as operand fields.
- There are six general instruction formats.
- One byte instruction:
- One byte long; the operands are implied data or registers.
- The least significant 3 bits are used to specify the register operand.
- Register to register:
- Two bytes long.
- The 1st byte specifies the operation code and the width of the operand, given by the W bit.
- The 2nd byte specifies the register operands in the REG and R/M fields.
- Bit layout of the 1st byte: D7-D1 = OPCODE, D0 = W.
- Register to/from memory with no displacement:
- Two bytes long.
- Similar to the register to register format, except for the MOD field.
- Bit layout of the 1st byte: D7-D1 = OPCODE, D0 = W.
- The MOD field specifies the addressing mode.
- Register to/from memory with displacement:
- This format adds one or two displacement bytes to the 2-byte register to/from memory (no displacement) format.
- Immediate operand to register:
- The 1st byte, together with the 3 bits of the 2nd byte that hold the REG field in the register to register format, is used for the opcode.
- It also contains one or two bytes of immediate data.
- Immediate operand to memory with 16-bit displacement:
- Requires 5 or 6 bytes for coding.
- The first 2 bytes contain the OPCODE, MOD and R/M fields.
- The remaining 3 or 4 bytes contain 2 bytes of displacement and 1 or 2 bytes of data.
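To make the field layout concrete, here is a small sketch that uses Python only as a modelling language. It packs and unpacks the one-byte format and adds up the byte counts of the formats listed above. It assumes the standard 8086 register numbering (AX=000 ... DI=111) and uses the real PUSH/POP register opcodes 50H/58H as examples. The helper names and the format keys are hypothetical.

```python
# 3-bit register codes used in the low 3 bits of one-byte instructions (16-bit registers).
REG16 = {"AX": 0b000, "CX": 0b001, "DX": 0b010, "BX": 0b011,
         "SP": 0b100, "BP": 0b101, "SI": 0b110, "DI": 0b111}
CODE_TO_REG = {code: name for name, code in REG16.items()}

# Fixed bytes of each general format, before any displacement or data bytes (hypothetical keys).
BASE_BYTES = {"one_byte": 1, "reg_reg": 2, "reg_mem": 2, "imm_reg": 2, "imm_mem": 2}


def encode_one_byte(base_opcode: int, reg: str) -> bytes:
    """One-byte format: high 5 bits = opcode, low 3 bits = register operand."""
    return bytes([(base_opcode & 0b11111000) | REG16[reg]])


def decode_one_byte(byte: int):
    """Split a one-byte instruction into (opcode bits, register name)."""
    return byte & 0b11111000, CODE_TO_REG[byte & 0b111]


def format_length(fmt: str, disp_bytes: int = 0, data_bytes: int = 0) -> int:
    """Total length: fixed bytes plus 0-2 displacement bytes plus 0-2 data bytes."""
    return BASE_BYTES[fmt] + disp_bytes + data_bytes


print(encode_one_byte(0x50, "CX").hex())   # PUSH CX
print(decode_one_byte(0x5A))               # POP DX = 58H | 010
print(format_length("imm_mem", 2, 1), format_length("imm_mem", 2, 2))
```

The last line reproduces the "5 or 6 bytes" of the immediate-to-memory format: 2 fixed bytes, 2 displacement bytes, and 1 or 2 data bytes.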
The opcode byte also contains single-bit indicators:
W-bit:
- It indicates whether the instruction operates on 8-bit or 16-bit data.
- W=0: the operand is 8 bits.
- W=1: the operand is 16 bits.
D-bit:
- Valid for double-operand instructions.
- One operand must be a register specified by the REG field.
- The operand specified by REG is the source operand if D=0; otherwise it is the destination operand.
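The W and D bits can be seen working together in the register to register form of MOV. The Python sketch below builds the two bytes: the 6-bit MOV opcode 100010, then D and W, then MOD=11 with REG and R/M. It covers only register operands, and the function name is hypothetical.

```python
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}
MOV_OPCODE = 0b100010  # 6-bit opcode for MOV register/memory <-> register


def encode_mov_reg_reg(dst: str, src: str, d: int = 1) -> bytes:
    """Encode MOV dst, src between two registers in the register to register format."""
    if dst in REG16 and src in REG16:
        w, table = 1, REG16          # W=1: 16-bit operands
    elif dst in REG8 and src in REG8:
        w, table = 0, REG8           # W=0: 8-bit operands
    else:
        raise ValueError("both operands must be 8-bit or both 16-bit registers")
    # D=1: the REG field names the destination; D=0: it names the source.
    reg, rm = (dst, src) if d == 1 else (src, dst)
    byte1 = (MOV_OPCODE << 2) | (d << 1) | w
    byte2 = (0b11 << 6) | (table[reg] << 3) | table[rm]   # MOD=11: R/M is a register
    return bytes([byte1, byte2])


print(encode_mov_reg_reg("AX", "BX").hex())        # 8bc3 (D=1 form)
print(encode_mov_reg_reg("AX", "BX", d=0).hex())   # 89d8 (D=0 form, same operation)
print(encode_mov_reg_reg("AL", "BL").hex())        # 8ac3 (W=0 form)
```

Both encodings of MOV AX, BX are valid. The D bit only decides which of the two register fields is treated as the destination.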
S-bit:
- Called the sign extension bit.
- The S-bit is used along with the W-bit to show the type of operation:
- S=0, W=0: 8-bit operation with an 8-bit immediate operand.
- S=0, W=1: 16-bit operation with a 16-bit immediate operand.
- S=1, W=1: 16-bit operation with sign-extended (8-bit) immediate data.
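The S=1, W=1 combination lets an assembler store only one data byte whenever a 16-bit immediate equals the sign extension of its low byte. The following Python sketch shows that check. The helper names are hypothetical, and whether a given assembler always picks the shorter form is an assumption.

```python
def sign_extend_8_to_16(imm8: int) -> int:
    """Copy bit 7 of an 8-bit value into bits 8-15, as the CPU does when S=1, W=1."""
    imm8 &= 0xFF
    return (imm8 | 0xFF00) if imm8 & 0x80 else imm8


def choose_immediate(value16: int):
    """Return (S, W, data bytes) for a 16-bit immediate, preferring the shorter form."""
    value16 &= 0xFFFF
    low = value16 & 0xFF
    if sign_extend_8_to_16(low) == value16:
        return 1, 1, bytes([low])                  # 1 data byte, sign-extended at run time
    return 0, 1, value16.to_bytes(2, "little")     # 2 data bytes, low byte first


print(choose_immediate(-2))       # FFFEH fits in one sign-extended byte
print(choose_immediate(0x007F))   # largest positive value that fits
print(choose_immediate(0x0080))   # needs the full 16-bit form
```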
V-bit:
- Used in shift and rotate instructions.
- V=0 if the shift count is 1; V=1 if CL contains the shift count.
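On the 8086 the shift/rotate group uses first bytes D0H to D3H, laid out as 110100 V W. The sketch below models only how V selects the count for a left shift of a 16-bit value. Flags are not modelled, and the helper names are hypothetical.

```python
def shift_group_opcode(v: int, w: int) -> int:
    """First byte of the 8086 shift/rotate group: 1 1 0 1 0 0 V W."""
    return 0b11010000 | (v << 1) | w


def shift_count(v: int, cl: int) -> int:
    """V=0 means 'shift by 1'; V=1 means 'shift by the count in CL'."""
    return cl if v else 1


def shl16(value: int, v: int, cl: int = 0) -> int:
    """Shift a 16-bit value left by the count that the V-bit selects."""
    return (value << shift_count(v, cl)) & 0xFFFF


print(hex(shift_group_opcode(0, 1)), hex(shl16(0x0003, v=0)))         # SHL AX, 1
print(hex(shift_group_opcode(1, 1)), hex(shl16(0x0003, v=1, cl=4)))   # SHL AX, CL with CL=4
```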
Z-bit:
- Used by the REP prefix to control the loop.
- The string instruction with the REP prefix is repeated (while CX is not zero) as long as the zero flag matches the Z-bit: Z=1 for REP/REPE/REPZ, Z=0 for REPNE/REPNZ.
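To show how the Z-bit ends the repetition, here is a Python model of a prefixed CMPSB. It assumes the direction flag is clear and models only CX, ZF and the number of bytes compared. The function name is hypothetical.

```python
def rep_cmpsb(src: bytes, dst: bytes, cx: int, z: int):
    """Model of REPE CMPSB (z=1) or REPNE CMPSB (z=0); returns (CX, ZF, bytes compared)."""
    i, zf = 0, None
    while cx != 0:                         # the prefix first checks CX
        zf = 1 if src[i] == dst[i] else 0  # CMPSB sets ZF from the comparison
        i += 1                             # SI and DI advance (direction flag clear)
        cx -= 1                            # CX is decremented after each repetition
        if zf != z:                        # stop once ZF no longer matches the Z-bit
            break
    return cx, zf, i


print(rep_cmpsb(b"ABCD", b"ABXD", 4, z=1))   # REPE: stops at the first mismatch
print(rep_cmpsb(b"ABCD", b"XYCZ", 4, z=0))   # REPNE: stops at the first match
```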
ASSEMBLER DIRECTIVES AND OPERATORS
The assembler does not find logical errors or other programming errors. To translate a program correctly, however, an assembler needs some hints from the programmer. These hints are given to the assembler using predefined alphabetical strings called assembler directives, which help the assembler correctly understand the assembly language program and prepare the codes.
Another type of hint, which helps the assembler assign a particular constant to a label or initialize particular memory locations or labels with constants, is an operator.
- DB: Define Byte: The DB directive is used to reserve one or more bytes of memory locations in the available memory.
- DW: Define Word: The DW directive serves the same purpose as the DB directive, but it makes the assembler reserve memory words (16-bit) instead of bytes.
- DQ: Define Quadword: This directive is used to direct the assembler to reserve 4 words (8 bytes) of memory for the specified variable and may initialize it with the specified values.
- DT: Define Ten Bytes: The DT directive directs the assembler to define the specified variable requiring 10 bytes for its storage and initialize the 10 bytes with the specified values.
- ASSUME: Assume Logical Segment Name: The ASSUME directive is used to inform the assembler of the names of the logical segments to be assumed for the different segments used in the program.
- END: END Of Program: The END directive marks the end of an assembly language program.
- ENDP: END Of Procedure: The ENDP directive is used to indicate the end of a procedure.
STAR PROC
:
STAR ENDP ; indicates the end of procedure STAR
- ENDS: END Of Segment: This directive marks the end of a logical segment.
DATA SEGMENT
:
DATA ENDS ; indicates the end of segment DATA
- EVEN: Align On Even Memory Address: The EVEN directive updates the location counter to the next even address, if the current location counter contents are not even, and assigns the following routine or variable or constant to that address. If the content of the location counter is already even, the following item is assigned that same address.
- EQU: Equate: The directive EQU is used to assign a label with a value or symbol. It reduces the repetition of numerical values or constants in the program code.
COUNT EQU 0500H ; COUNT now stands for the value 0500H
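- EXTERN: External and PUBLIC: Public: The directive EXTERN informs the assembler that the names, procedures and labels declared after this directive have already been defined in some other assembly language module. In the module where they are defined, the PUBLIC directive declares them so that other modules can use them.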
- GROUP: Group the Related Segments: This directive is used to form logical groups of segments with similar purpose or type.
PROGRAM GROUP CODE, DATA, STACK
This statement directs the loader/linker to prepare an EXE file such that the CODE, DATA and STACK segments lie within a single 64 KB memory segment named PROGRAM.
- LABEL: The LABEL directive is used to assign a name to the current content of the location counter. A LABEL directive can be used to make a FAR jump. The LABEL directive can also be used to refer to the data segment along with a data type, byte or word.
DATA SEGMENT
DATAS DB 50H DUP (?)
DATA_LAST LABEL BYTE
DATA ENDS
After reserving 50H locations for DATAS, the next location will be assigned the label DATA_LAST, and its type will be BYTE.
- LENGTH: Byte Length Of A Label: This directive is used to refer to the length of a data array or a string. Not available in MASM.
- LOCAL: The label, variables, constants or procedures declared LOCAL in a module are to be used only by that particular module.
- NAME: Logical Name Of A Module: The NAME directive is used to assign a name to an assembly language program module.
- OFFSET: Offset Of A Label: When the assembler comes across the OFFSET operator along with a label, it first computes the 16-bit displacement of the particular label, and replaces the string ‘OFFSET LABEL’ by the computed displacement.
- ORG: Origin: The ORG directive directs the assembler to start the memory allotment for the particular segment, block or code from the declared address in the ORG statement.
- PROC: Procedure: The PROC directive marks the start of a named procedure. The types NEAR and FAR specify the type of the procedure.
- PTR: Pointer: The PTR (pointer) operator is used to declare the type of a label, variable or memory operand. PTR is preceded by either BYTE (8-bit quantity) or WORD (16-bit quantity).
- SEGMENT: Logical Segment: The SEGMENT directive marks the starting of a logical segment. The started segment is also assigned a name, i.e. label, by this statement.
- SHORT: The SHORT operator indicates to the assembler that only one byte is required to code the displacement for a jump, i.e. the target lies within -128 to +127 bytes of the next instruction. This method of specifying the jump address saves memory.
- TYPE: The TYPE operator directs the assembler to determine the data type of the specified label and replaces ‘TYPE label’ with a number for that type (for example, 1 for BYTE and 2 for WORD).
- GLOBAL: The labels, variables, constants or procedures declared GLOBAL may be used by other modules of the program.
- ‘+’ and ‘-’ Operators: These operators represent arithmetic addition and subtraction respectively, and are typically used to add or subtract displacements (8 or 16 bit) to base or index registers or stack or base pointers.
- FAR PTR: This directive indicates to the assembler that the label following FAR PTR is not available within the same segment and that its address is 32 bits, i.e. a 2-byte offset followed by a 2-byte segment address.
- NEAR PTR: This directive indicates that the label following NEAR PTR is in the same segment and needs only a 16-bit (2-byte) offset to address it. A label is treated as NEAR PTR by default if it is not preceded by NEAR PTR or FAR PTR.
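Several of the directives above (DB, DW, DQ, DT, EVEN, ORG) simply move the assembler's location counter, and OFFSET reports where a label ended up. The following Python sketch models only that bookkeeping. The tuple format and the names FLAG, TABLE and BIG are hypothetical, and a real assembler keeps a separate counter for each segment.

```python
SIZES = {"DB": 1, "DW": 2, "DQ": 8, "DT": 10}   # bytes per element for each directive


def assign_offsets(statements):
    """Return (symbol offsets, final location counter) for a list of data statements.

    Each statement is (name, directive, n): for DB/DW/DQ/DT, n is the number of
    elements reserved; for ORG, n is the new location counter; EVEN ignores n."""
    lc, symbols = 0, {}
    for name, directive, n in statements:
        if directive == "ORG":
            lc = n                      # restart allocation at the given address
        elif directive == "EVEN":
            lc += lc % 2                # move to the next even address only if odd
        else:
            if name is not None:
                symbols[name] = lc      # the value OFFSET name would give
            lc += SIZES[directive] * n
    return symbols, lc


program = [
    (None, "ORG", 0x100),
    ("FLAG", "DB", 1),      # 1 byte at 100H
    (None, "EVEN", 0),      # 101H is odd, so the counter moves to 102H
    ("TABLE", "DW", 4),     # 4 words = 8 bytes at 102H
    ("BIG", "DT", 1),       # 10 bytes at 10AH
]
symbols, end = assign_offsets(program)
print({k: hex(v) for k, v in symbols.items()}, hex(end))
```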
The Assembly Process
The way an assembler is designed depends heavily on the internal organization of the processor for which it is used. Architectural features such as memory word size, number formats, internal character codes, index registers, and general purpose registers affect the way assemblers are written and the way an assembler handles instructions and directives.
In an assembly language program (ALP), coding is done in a symbolic language whose operation names are called mnemonics. A mnemonic specifies the operation to be done. The assembler translates this symbolic language into the machine code that the processor understands; this is the main task of an assembler. It is an important program development tool and should provide appropriate error messages to guide the programmer.
Features of Assembler
When code is written in assembly language, it is easier if labels are used for memory locations. How efficiently an assembler manages and uses labels reflects its quality.
Assemblers that support macros are called macro assemblers. A macro is a name given to a sequence of instruction lines. Once a macro is defined, its name may be used in place of those lines. Thus a macro looks like a mnemonic, and it can be used as often as needed without incurring the CALL/RET overhead that a procedure or function causes. The use of macros helps to make programs structured.
Instructions and Directives
An ALP contains directives along with instructions. Instructions get translated into machine codes, but directives do not. The directives act as ‘help’ for the assembler in deciding certain other aspects of the assembly process. Thus the assembler takes our source code (containing instructions and directives) and converts it into object code, which contains the machine code. The object file is loaded into memory and executed after it is linked with other necessary files.
The Forward Reference Problem
A typical line in a program could be:
BEGIN: MOV AX, COST ; move content of location COST to AX
In this example, BEGIN is a label, which corresponds to the address at which this line of code is stored. MOV is a mnemonic. COST is a label that acts as one operand in this two-operand instruction. A label, when defined, is called a symbol, and every symbol has a corresponding memory address. In this instruction the other operand is AX. The text coming after ‘;’ is a comment. All current assemblers allow the use of labels, and the assembler must finally translate these labels into memory addresses. An important issue the assembler faces is the ‘forward reference problem’. See the code below:
BEGIN: MOV AX, BX
JMP AGAIN
...
AGAIN: ...
As the assembler reads the second line, an undefined label AGAIN is encountered. The assembler has no knowledge of it as yet, but the value of the symbol AGAIN is needed before it can be defined. This is also called the future symbol problem or the problem of unresolved references. This problem is resolved in various ways depending on the type of the assembler.
Two Pass Assembler
There are many types of assemblers: one pass, two pass and multi pass assemblers. Two pass assemblers are commonly used. A pass is one reading of the code from beginning to end, and each later pass can use the information collected by the earlier ones. A two pass assembler makes two passes over the source file (the second pass may be over an intermediate file generated in the first pass). In the first pass it looks for label definitions and inserts them in the symbol table after assigning them addresses. Memory is allocated sequentially, and a location counter is incremented at each step. At the end of this pass, the symbol table should contain definitions of all the labels used in the program.
In pass 2, the actual translation of assembly code to machine code is done, and errors are also reported at this stage. The assembly process produces a ‘re-locatable’ object file, which can be loaded anywhere in memory when the program is to be executed. A ‘loader’ is a program which does this.
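The two passes can be sketched in a few dozen lines of Python. This toy is not any particular assembler. It knows only NOP, MOV AX, BX and a 3-byte near JMP (E9 plus a 16-bit displacement), re-reads the source list instead of an intermediate file, and omits relocation and error messages. An undefined label simply raises KeyError. All names are hypothetical.

```python
# Toy instruction set (hypothetical subset): fixed encodings plus a 3-byte near JMP.
FIXED = {"NOP": b"\x90", "MOV AX, BX": b"\x8b\xc3"}
JMP_SIZE = 3   # E9 followed by a 16-bit displacement


def split_label(line):
    """Return (label, statement) with the comment removed and spacing normalised."""
    line = line.split(";", 1)[0]
    label, stmt = line.split(":", 1) if ":" in line else ("", line)
    return label.strip(), " ".join(stmt.split())


def size_of(stmt):
    return JMP_SIZE if stmt.startswith("JMP ") else len(FIXED[stmt])


def pass1(lines):
    """Pass 1: assign each label the current location counter value."""
    symtab, lc = {}, 0
    for line in lines:
        label, stmt = split_label(line)
        if label:
            symtab[label] = lc
        if stmt:
            lc += size_of(stmt)
    return symtab


def pass2(lines, symtab):
    """Pass 2: emit machine code, looking labels up in the finished symbol table."""
    code, lc = bytearray(), 0
    for line in lines:
        _, stmt = split_label(line)
        if not stmt:
            continue
        if stmt.startswith("JMP "):
            target = symtab[stmt.split()[1]]              # forward reference now resolved
            disp = (target - (lc + JMP_SIZE)) & 0xFFFF    # relative to the next instruction
            code += b"\xe9" + disp.to_bytes(2, "little")
        else:
            code += FIXED[stmt]
        lc += size_of(stmt)
    return bytes(code)


program = ["BEGIN: MOV AX, BX", "JMP AGAIN", "NOP", "AGAIN: NOP"]
symtab = pass1(program)
print(symtab, pass2(program, symtab).hex())
```

Pass 1 finds AGAIN at offset 6, so pass 2 can encode the earlier JMP with displacement 1 (from offset 5 to offset 6).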
STACKS
There are many situations in which a program needs to temporarily store information and then retrieve it in reverse order. On the 8086, the CX register and the LOOP instruction can conveniently provide the counting, testing and branching needed in a loop, but because the LOOP instruction is designed to use only the CX register, a problem arises when loops are nested. However, if, as shown in FIG 1, there were an efficient means of storing loop counters in order and then restoring them by retrieving the last stored counter first, at least part of the problem would be alleviated.
FIG 1: Loop counters in memory. The Loop 1, Loop 2, ..., Loop N counters are stored in that order and retrieved in the reverse order, Loop N counter first.
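The idea in FIG 1 can be sketched in Python, with a list standing in for the stack of saved counters. Both loops count in the single variable cx, as both would use CX on the 8086. The function name is hypothetical, and the sketch assumes both counts are at least 1, since a real LOOP with CX=0 behaves differently.

```python
def nested_loop_count(outer: int, inner: int) -> int:
    """Two nested loops that both count in CX; the outer count is saved on a stack."""
    stack, body_runs = [], 0
    cx = outer
    while cx != 0:              # outer loop, counted in CX
        stack.append(cx)        # like PUSH CX: save the outer counter
        cx = inner
        while cx != 0:          # inner loop reuses CX
            body_runs += 1
            cx -= 1             # decrement CX, repeat while not zero
        cx = stack.pop()        # like POP CX: restore the outer counter
        cx -= 1
    return body_runs


print(nested_loop_count(3, 4))   # 12
```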
Stack facilities normally involve the use of indirect addressing through a special register, the stack pointer, that is automatically decremented as items are put on the stack and incremented as they are retrieved. Putting something on the stack is called a push and taking it off is called a pop. The address of the last element pushed onto the stack is called the top of the stack (TOS).
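A minimal Python model of this behaviour on the 8086 follows. SP is decremented by 2 before a word is stored, and incremented by 2 after it is read. Words are stored low byte first at the physical address 16 × SS + SP. The class name and the SS/SP values are hypothetical, and flags and segment wrap-around details beyond masking are not modelled.

```python
class Stack8086:
    """Word stack in a 1 MB byte array, addressed by SS:SP (hypothetical model)."""

    def __init__(self, ss: int, sp: int):
        self.mem = bytearray(1 << 20)        # 1 MB physical address space
        self.ss, self.sp = ss, sp

    def _phys(self, offset: int) -> int:
        return (self.ss * 16 + (offset & 0xFFFF)) & 0xFFFFF   # 20-bit physical address

    def push(self, word: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF                     # SP is decremented first,
        self.mem[self._phys(self.sp)] = word & 0xFF          # then the low byte
        self.mem[self._phys(self.sp + 1)] = (word >> 8) & 0xFF   # and the high byte are stored

    def pop(self) -> int:
        word = self.mem[self._phys(self.sp)] | (self.mem[self._phys(self.sp + 1)] << 8)
        self.sp = (self.sp + 2) & 0xFFFF     # SP is incremented after the read
        return word


s = Stack8086(ss=0x2000, sp=0x003C)
s.push(0x1234)            # e.g. PUSH CX with CX = 1234H
s.push(0xABCD)            # e.g. PUSH DX with DX = ABCDH
print(hex(s.sp), hex(s.pop()), hex(s.pop()), hex(s.sp))
```

The last value pushed is the first one popped, which is exactly the ordering FIG 1 needs for nested loop counters.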
The 8086 instructions for directly pushing and popping the stack are given in FIG 2. Only words can be pushed or popped, and the data cannot be immediate, but the PUSH and POP instructions can use all the other addressing modes. Only the POPF instruction, which pops the top of the stack into the PSW (flags register), affects the flags.
FIG 2:
Name | Mnemonic and Format
Push onto the stack | PUSH SRC
Pop from the stack | POP DST
Push the flags | PUSHF
Pop the flags | POPF
On the 8086, the physical stack address is obtained from both (SP) and (SS), or (BP) and (SS), with SP being the implied stack pointer register for all push and pop operations and SS being the stack segment register. The address 16 × (SS) is the lowest address in (i.e., the limit of) the stack area and may be referred to as the base of the stack. The original contents of SP are considered to be the largest offset the stack should attain. Therefore the stack is considered to occupy the memory locations from 16 × (SS) to 16 × (SS) + original (SP). The location of the stack must, of course, be initially set by software, which may be a code sequence in either the operating system or the user's program. This is done by loading the SS and SP registers as shown in FIG 3.
FIG 3: Initializing SS and SP
STACK_SEG SEGMENT
DW 30 DUP (?)
TOS LABEL WORD
STACK_SEG ENDS
CODE_SEG SEGMENT
ASSUME CS:CODE_SEG, SS:STACK_SEG
START: MOV AX, STACK_SEG
MOV SS, AX ; INITIALIZES SS
MOV SP, OFFSET TOS ; INITIALIZES SP
:
:
CODE_SEG ENDS
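The stack range from 16 × (SS) up to 16 × (SS) plus the original SP is easy to check numerically. The Python sketch below applies it to the stack segment of FIG 3. It assumes STACK_SEG begins at offset 0, so that DW 30 DUP (?) places TOS 60 bytes in. It also assumes a hypothetical SS value of 2000H, since the real value is chosen by the loader.

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit 8086 physical address: 16 x segment + offset."""
    return (segment * 16 + offset) & 0xFFFFF


def stack_region(ss: int, original_sp: int):
    """Lowest and highest physical addresses of the stack area."""
    return physical_address(ss, 0), physical_address(ss, original_sp)


# DW 30 DUP (?) reserves 30 words, so TOS sits 60 bytes into STACK_SEG
# (assuming STACK_SEG starts at offset 0), and MOV SP, OFFSET TOS loads SP = 60.
tos = 30 * 2
ss = 0x2000   # hypothetical paragraph address chosen by the loader
low, high = stack_region(ss, tos)
print(hex(tos), hex(low), hex(high))
```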
The BP register is primarily for allowing random access to the stack. For example,
MOV AX, [BP][SI]
will load AX from a location in the stack segment, the offset of the location being the sum of (BP) and (SI). On the other hand, for the instruction
MOV AX, [BX][SI]
the source operand is from the data segment.
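The default-segment rule in the two examples above can be written as a small Python function. BP as the base register selects SS, while BX (or no base register) selects DS. Segment override prefixes are not modelled, and the register and segment contents are hypothetical.

```python
def operand_address(base, index, regs, segs, disp=0):
    """Physical address of [base][index]+disp with the 8086's default segment.

    BP as the base register selects SS; BX (or no base) selects DS.
    Segment override prefixes are not modelled."""
    seg = "SS" if base == "BP" else "DS"
    offset = (regs.get(base, 0) + regs.get(index, 0) + disp) & 0xFFFF
    return seg, (segs[seg] * 16 + offset) & 0xFFFFF


regs = {"BP": 0x0010, "BX": 0x0010, "SI": 0x0004}   # hypothetical register contents
segs = {"SS": 0x2000, "DS": 0x3000}                 # hypothetical segment contents
seg, addr = operand_address("BP", "SI", regs, segs)   # MOV AX, [BP][SI]
print(seg, hex(addr))
seg, addr = operand_address("BX", "SI", regs, segs)   # MOV AX, [BX][SI]
print(seg, hex(addr))
```

With identical offsets, the two instructions read different physical locations only because of the default segment.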
Stack facilities are more efficient than ordinary memory in two ways: the PUSH and POP instructions are shorter because one of the operands is indirectly addressed through the SP register, and (SP) is automatically incremented or decremented to point to the new address. For example, the registers CX and DX could be stored in memory beginning at SAVE by the following instructions, which occupy a total of 8 bytes:
MOV SAVE, CX
MOV SAVE+2, DX
If the offset of SAVE is in SI, they could be saved by the sequence
MOV [SI], CX
INC SI
INC SI
MOV [SI], DX
which occupies only 6 bytes but contains 4 instructions. By saving CX and DX on the stack, only the instructions
PUSH CX
PUSH DX
are needed, and these occupy just 2 bytes.
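The byte counts in this comparison can be checked against the actual 8086 encodings. The Python sketch below lists each sequence's machine code: MOV r/m16, r16 is 89H followed by a MOD-REG-R/M byte, INC SI is 46H, and PUSH CX/PUSH DX are 51H/52H. It then totals the bytes. The offset chosen for SAVE is hypothetical.

```python
SAVE = 0x0100   # hypothetical offset of SAVE in the data segment


def le16(value: int) -> bytes:
    return value.to_bytes(2, "little")   # 8086 stores words low byte first


direct = [bytes([0x89, 0x0E]) + le16(SAVE),       # MOV SAVE, CX   (MOD=00, R/M=110: direct)
          bytes([0x89, 0x16]) + le16(SAVE + 2)]   # MOV SAVE+2, DX
indexed = [bytes([0x89, 0x0C]),                   # MOV [SI], CX   (R/M=100: [SI])
           bytes([0x46]),                         # INC SI
           bytes([0x46]),                         # INC SI
           bytes([0x89, 0x14])]                   # MOV [SI], DX
stacked = [bytes([0x51]),                         # PUSH CX
           bytes([0x52])]                         # PUSH DX

for name, seq in [("direct", direct), ("indexed", indexed), ("stack", stacked)]:
    print(name, len(seq), "instructions,", sum(len(i) for i in seq), "bytes")
```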
MACROS
- In a macro definition, a label (the macro name) is assigned to a repeatedly appearing string of instructions.
- The process of assigning a label or macro name to the string is called defining a macro.
- A macro within a macro is called a nested macro.
- Wherever the macro name appears, the complete code of the instruction string is inserted.
- Hence the EXE file becomes lengthy.
- A macro does not use the stack.
- A macro requires less time for execution, as it does not contain CALL and RET instructions.
- A macro can be defined anywhere in a program using the directives MACRO and ENDM.
- MACRO: marks the start of the definition and is preceded by the macro name, which is then used in the actual program.
- ENDM: marks the end of the sequence of instructions or statements assigned to the macro name.
- A macro may also be used to represent statements and directives.
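Macro expansion, including a macro within a macro, amounts to textual substitution. This is why the program grows with every use while no CALL or RET is executed. The Python sketch below models parameterless expansion only. The macro names SAVE_REGS and ENTRY are hypothetical, and a real macro assembler also handles parameters, LOCAL labels and the MACRO/ENDM syntax itself.

```python
def expand(lines, macros, depth=0):
    """Replace every line that names a macro with the macro's body (no parameters)."""
    if depth > 16:
        raise RecursionError("macro nesting too deep (recursive macro?)")
    out = []
    for line in lines:
        name = line.strip()
        if name in macros:
            out.extend(expand(macros[name], macros, depth + 1))   # nested macros expand too
        else:
            out.append(line)
    return out


macros = {
    "SAVE_REGS": ["PUSH AX", "PUSH BX", "PUSH CX"],
    "ENTRY": ["SAVE_REGS", "MOV AX, BX"],          # a macro within a macro
}
program = ["ENTRY", "INC AX", "SAVE_REGS"]
expanded = expand(program, macros)
print(len(program), "source lines ->", len(expanded), "lines after expansion")
print(expanded)
```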