Tuesday, November 20, 2012

Assembler instruction format

  • An assembly language instruction format has one or more fields associated with it.
  • The first field is the operation field or opcode field; the other fields are known as operand fields.
  • There are six general instruction formats.
  1. One byte instruction:
  • One byte long, with implied data or register operands.
  • The least significant 3 bits are used to specify the register operand.
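  2. Register to register:
  • Two bytes long.
  • The 1st byte specifies the operation code and the width of the operand, the width being given by the W bit.
  • The 2nd byte specifies the register operands through the REG and R/M fields.
First byte layout: bits D7 to D1 hold the OPCODE, bit D0 holds W.
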
  3. Register to/from memory with no displacement:
  • Two bytes long.
  • Similar to the register to register format except for the MOD field.
  • The MOD field gives the addressing mode.
First byte layout: bits D7 to D1 hold the OPCODE, bit D0 holds W.
  4. Register to/from memory with displacement:
  • This contains one or two additional bytes for the displacement, along with the 2-byte format of register to/from memory with no displacement.
  5. Immediate operand to register:
  • The 1st byte, as well as the 3 bits of the 2nd byte that serve as the REG field in the register to register format, are used for the opcode.
  • It also contains one or two bytes of immediate data.
  6. Immediate operand to memory with 16-bit displacement:
  • Requires 5 or 6 bytes for coding.
  • The first 2 bytes contain the OPCODE, MOD and R/M fields.
  • The remaining 4 bytes contain 2 bytes of displacement and 2 bytes of data.
The opcode may contain the following single-bit indicators.
W-bit:
  • It indicates whether the instruction operates on 8-bit or 16-bit data.
  • W=0: the operand is 8 bits wide.
  • W=1: the operand is 16 bits wide.
D-bit:
  • Valid only for two-operand instructions.
  • One operand must be a register specified by the REG field.
  • The operand specified by REG is the source operand if D=0; otherwise it is the destination operand.
S-bit:
  • Called the sign extension bit.
  • The S-bit is used along with the W-bit to show the type of the operation:
  • S=0, W=0: 8-bit operation with an 8-bit immediate operand.
  • S=0, W=1: 16-bit operation with a 16-bit immediate operand.
  • S=1, W=1: 16-bit operation with sign-extended 8-bit immediate data.
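
The S/W combinations above can be illustrated with a small Python model. This is only a sketch: the function names `sign_extend_8_to_16` and `immediate_kind` are made up for illustration, and the code models the arithmetic, not an actual assembler.

```python
def sign_extend_8_to_16(imm8: int) -> int:
    """Sign-extend an 8-bit immediate to 16 bits (the S=1, W=1 case)."""
    imm8 &= 0xFF
    # If bit 7 (the sign bit) is set, fill the upper byte with ones.
    return (imm8 | 0xFF00) if (imm8 & 0x80) else imm8


def immediate_kind(s: int, w: int) -> str:
    """Describe the operation selected by the S and W bits."""
    table = {
        (0, 0): "8-bit operation, 8-bit immediate",
        (0, 1): "16-bit operation, 16-bit immediate",
        (1, 1): "16-bit operation, sign-extended 8-bit immediate",
    }
    return table.get((s, w), "combination not described in the text")


print(hex(sign_extend_8_to_16(0xFE)))  # 0xFE is -2 as a signed byte
print(hex(sign_extend_8_to_16(0x7F)))  # positive values are unchanged
print(immediate_kind(1, 1))
```

With S=1 and W=1 the instruction stores only one byte of immediate data, which is why this form saves a byte for small constants.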
V-bit:
  • Used in shift and rotate instructions.
  • V=0 if the shift count is 1; V=1 if CL contains the shift count.
Z-bit:
  • Used by REP instruction to control the loop.
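  • The instruction with the REP prefix is repeated while the zero flag matches the Z-bit (and CX is not zero). For example, with Z=1 the repetition stops as soon as the zero flag becomes 0.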
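
As a worked example of how these fields combine, the sketch below packs a register to register MOV into its two bytes. It assumes the usual 8086 layout, in which the first byte holds a 6-bit opcode followed by the D and W bits and the second byte holds MOD, REG and R/M; the names `REG16`, `MOV_OPCODE` and `encode_mov_reg_reg` are hypothetical helpers, not part of any assembler.

```python
# Standard 8086 codes for the 16-bit registers (used in REG and R/M when W=1).
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}

MOV_OPCODE = 0b100010  # 6-bit opcode of MOV register/memory to/from register


def encode_mov_reg_reg(dst: str, src: str, d: int = 0) -> bytes:
    """Encode MOV dst, src for two 16-bit registers (hypothetical helper)."""
    w = 1          # 16-bit operands
    mod = 0b11     # MOD = 11 means the R/M field names a register
    if d == 0:     # D=0: REG holds the source operand
        reg, rm = REG16[src], REG16[dst]
    else:          # D=1: REG holds the destination operand
        reg, rm = REG16[dst], REG16[src]
    byte1 = (MOV_OPCODE << 2) | (d << 1) | w
    byte2 = (mod << 6) | (reg << 3) | rm
    return bytes([byte1, byte2])


print(encode_mov_reg_reg("AX", "BX", d=0).hex())  # REG = BX (source)
print(encode_mov_reg_reg("AX", "BX", d=1).hex())  # REG = AX (destination)
```

Both byte pairs describe the same operation; the D bit only changes which operand is carried in the REG field.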

ASSEMBLER DIRECTIVES AND OPERATORS

While translating a program, the assembler assigns addresses to labels, substitutes values for constants and variables, and forms the machine code for the mnemonics and data. It may find syntax errors along the way, but logical errors or other programming errors are not found by the assembler. To complete all these tasks, an assembler needs some hints from the programmer. These hints are given to the assembler using predefined alphabetical strings called assembler directives, which help the assembler to correctly understand the assembly language program and prepare the codes.
Another type of hint, which helps the assembler to assign a particular constant to a label or to initialize particular memory locations or labels with constants, is an operator.

  • DB: Define Byte: The DB directive is used to reserve one or more bytes of memory in the available memory.
  • DW: Define Word: The DW directive serves the same purpose as the DB directive, but it makes the assembler reserve the given number of memory words (16 bits each) instead of bytes.
  • DQ: Define Quadword: This directive directs the assembler to reserve 4 words (8 bytes) of memory for the specified variable and may initialize it with the specified values.
  • DT: Define Ten Bytes: The DT directive directs the assembler to reserve 10 bytes of storage for the specified variable and may initialize the 10 bytes with the specified values.
  • ASSUME: Assume Logical Segment Name: The ASSUME directive informs the assembler of the names of the logical segments to be assumed for the different segments used in the program.
  • END: END Of Program: The END directive marks the end of an assembly language program.
  • ENDP: END Of Procedure: The ENDP directive is used to indicate the end of a procedure.
STAR PROC
:
:
STAR ENDP ; indicates the end of procedure STAR
  • ENDS: END Of Segment: This directive marks the end of a logical segment.
DATA SEGMENT
:
:
DATA ENDS ; indicates the end of segment DATA
  • EVEN: Align On Even Memory Address: The EVEN directive updates the location counter to the next even address, if the current location counter contents are not even, and assigns the following routine or variable or constant to that address. If the content of the location counter is already even, then the procedure will be assigned with the same address.

  • EQU: Equate: The directive EQU is used to assign a label with a value or symbol. The use of this directive is just to reduce the recurrence of the numerical values or constants in the program code.
LABEL EQU 0500H

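  • EXTERN: External and PUBLIC: Public: The directive EXTERN informs the assembler that the names, procedures and labels declared after this directive have already been defined in some other assembly language module. In the module where they are actually defined, the same names must be declared with the PUBLIC directive so that other modules can refer to them.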

  • GROUP: Group the Related Segments: This directive is used to form logical groups of segments with similar purpose or type.
PROGRAM GROUP CODE, DATA, STACK ; directs the loader/linker to prepare an EXE file such that the CODE, DATA and STACK segments lie within one 64 Kbyte memory segment named PROGRAM

  • LABEL: The LABEL directive is used to assign a name to the current content of the location counter. A LABEL directive can be used to make a FAR jump. The label directive can be used to refer to the data segment along with the data type, byte or word.

DATA SEGMENT
DATAS DB 50H DUP (?)
DATA_LAST LABEL BYTE
DATA ENDS

After reserving 50H locations for DATAS, the next location is assigned the label DATA_LAST, and its type is BYTE. For a label in a code segment, the type would be NEAR or FAR instead.
  • LENGTH: Byte Length Of A Label: This directive is used to refer to the length of a data array or a string. Not available in MASM.
  • LOCAL: The label, variables, constants or procedures declared LOCAL in a module are to be used only by that particular module.
  • NAME: Logical Name Of A Module: The NAME directive is used to assign a name to an assembly language program module.
  • OFFSET: Offset Of A Label: When the assembler comes across the OFFSET operator along with a label, it first computes the 16-bit displacement of the particular label, and replaces the string ‘OFFSET LABEL’ by the computed displacement.
  • ORG: Origin: The ORG directive directs the assembler to start the memory allotment for the particular segment, block or code from the declared address in the ORG statement.
  • PROC: Procedure: The PROC directive marks the start of a named procedure in the statement. The types FAR and NEAR specify the type of the procedure.
  • PTR: Pointer: The PTR operator is used to declare the type of a label, variable or memory operand. The operator PTR is preceded by either BYTE (8-bit quantity) or WORD (16-bit quantity).
  • SEGMENT: Logical Segment: The SEGMENT directive marks the start of a logical segment. The segment is also assigned a name, i.e. a label, by this statement.
  • SHORT: The SHORT operator indicates to the assembler that only one byte is required to code the displacement for a jump. This method of specifying the jump address saves memory.
  • TYPE: The TYPE operator directs the assembler to decide the data type of the specified label and replaces the TYPE label by the decided data type.

  • GLOBAL: The labels, variables, constants or procedures declared GLOBAL may be used by other modules of the program.
  • + and - Operators: These operators represent arithmetic addition and subtraction respectively, and are typically used to add or subtract displacements (8 or 16 bit) to or from base or index registers or stack or base pointers.

  • FAR PTR: This directive indicates to the assembler that the label following FAR PTR is not available within the same segment and that the address of the label is 32 bits long, i.e. a 2-byte offset followed by a 2-byte segment address.

  • NEAR PTR: This directive indicates that the label following NEAR PTR is in the same segment and needs only 16 bits, i.e. a 2-byte offset, to address it. A NEAR PTR label is considered the default if a label is not preceded by NEAR PTR or FAR PTR.
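
The storage directives and the location counter can be pictured with a short Python sketch. It is a simplified model under stated assumptions: each statement is a hypothetical `(name, directive, argument)` tuple, the argument of DB/DW/DQ/DT is an element count, and `layout` is an illustrative helper rather than real assembler behaviour.

```python
# Bytes reserved per element by each data directive described above.
SIZES = {"DB": 1, "DW": 2, "DQ": 8, "DT": 10}


def layout(statements, start=0):
    """Return ({name: offset}, final location counter) for a list of statements."""
    lc = start
    offsets = {}
    for name, directive, arg in statements:
        if directive == "ORG":
            lc = arg                    # continue allocation from the given address
        elif directive == "EVEN":
            lc += lc % 2                # advance by one only if the counter is odd
        else:
            if name:
                offsets[name] = lc      # the name takes the current counter value
            lc += SIZES[directive] * arg
    return offsets, lc


program = [
    (None, "ORG", 0x100),
    ("FLAG", "DB", 1),
    (None, "EVEN", None),
    ("TABLE", "DW", 4),
    ("BIG", "DQ", 1),
    ("BCD", "DT", 1),
]
offsets, end = layout(program)
for name, off in offsets.items():
    print(f"{name}: {off:04X}H")
print(f"next free offset: {end:04X}H")
```

In this model FLAG occupies the single byte at 0100H, so EVEN skips 0101H and TABLE starts on the even address 0102H.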

The Assembly Process
The way an assembler is designed depends heavily on the internal organization of the processor for which it is used. Architectural features such as memory word size, number formats, internal character codes, index registers, and general purpose registers affect the way assemblers are written and the way an assembler handles instructions and directives.
In an assembly language program (ALP), coding is done in a symbolic language (called mnemonics). The mnemonics specify the operation to be done. The assembler translates this symbolic language into machine code which the processor understands. This is the main task of an assembler. It is an important program development tool and should provide appropriate error messages to guide the programmer.
Features of Assembler
When code is written in assembly language, it is easier to use labels for memory locations. How efficiently labels are managed and used reflects on the quality of an assembler.
Assemblers that support macros are called macro assemblers. A macro is a name given to a sequence of instruction lines. Once a macro is defined, its name may be used in place of this set of lines. A macro thus looks like a mnemonic, and it can be used as often as needed without incurring any of the overheads that a procedure or function may cause. The use of macros helps to make programs structured.
Instructions and Directives
An ALP contains directives along with instructions. Instructions get translated into machine codes, but directives do not. The directives act as ‘help’ for the assembler in deciding certain other aspects of the assembly process. Thus the assembler takes our source code (containing instructions and directives) and converts it into object code which contains the machine code. The object file, after being linked with other necessary files, is what is loaded into memory and executed.
The Forward Reference Problem
A typical line in a program could be:
BEGIN: MOV AX, COST ; move content of location COST to AX
In this example, BEGIN is a label, which corresponds to the address at which this line of code is stored. MOV is a mnemonic. COST is a label that acts as one operand in this two operand instruction. A label when defined is called a symbol. There will be a memory address corresponding to all symbols. In this instruction the other operand is AX. The text coming after ’;’ is a comment. All current assemblers allow the use of labels and finally the assembler must translate these labels into memory addresses. An important issue the assembler faces is the ‘forward reference problem’. See the code below:
BEGIN: MOV AX, BX
JMP AGAIN
-----------------
----------------
-
-
-
AGAIN: ---------------
As the assembler reads the second line, an undefined label AGAIN is encountered. The assembler has no knowledge about it as yet, but the value of the symbol AGAIN is needed before it can be defined. This is also called the future symbol problem or the problem of unresolved references. This problem is resolved in various ways depending on the type of the assembler.
Two Pass Assembler
There are many types of assemblers – one pass, two pass and multi pass assemblers. Each pass optimizes the assembly process. Two pass assemblers are commonly used. A pass implies a reading of the code from beginning to end. A two pass assembler does two passes over the source file (the second pass will be over a file generated in the first pass). In the first pass it looks for label definitions and inserts them in the symbol table after assigning them addresses. Memory is allocated sequentially, and a location counter is incremented at each step. At the end of this pass the symbol table should contain definitions of all the labels used in the program.
In pass 2, the actual translation of assembly code to machine code is done. Errors are also reported after this. The assembly process produces a ‘re-locatable’ object file. This object file can be loaded anywhere in the memory, when the program is to be executed. A ‘loader’ is a program which does this.
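
The two passes can be sketched in a few lines of Python. This is a toy model, not a real assembler: the instruction sizes in `SIZE` are assumed values chosen for illustration, only JMP operands are treated as labels, and `split_line`, `pass1` and `pass2` are hypothetical names.

```python
# Assumed instruction lengths in bytes (illustrative values only).
SIZE = {"MOV": 2, "JMP": 3, "NOP": 1, "HLT": 1}


def split_line(line):
    """Split 'LABEL: MNEMONIC operands' into its three parts."""
    label = None
    if ":" in line:
        label, line = line.split(":", 1)
        label = label.strip()
    parts = line.split(None, 1)
    mnemonic = parts[0]
    operands = parts[1].strip() if len(parts) > 1 else ""
    return label, mnemonic, operands


def pass1(lines):
    """Pass 1: assign an address to every label using a location counter."""
    symtab, lc = {}, 0
    for line in lines:
        label, mnemonic, _ = split_line(line)
        if label:
            symtab[label] = lc
        lc += SIZE[mnemonic]
    return symtab


def pass2(lines, symtab):
    """Pass 2: translate, replacing label operands with their addresses."""
    out, lc = [], 0
    for line in lines:
        _, mnemonic, operands = split_line(line)
        if mnemonic == "JMP":
            if operands not in symtab:
                raise ValueError(f"undefined label: {operands}")
            operands = f"{symtab[operands]:04X}H"
        out.append((lc, mnemonic, operands))
        lc += SIZE[mnemonic]
    return out


source = ["BEGIN: MOV AX, BX", "JMP AGAIN", "NOP", "NOP", "AGAIN: HLT"]
symbols = pass1(source)
for address, mnemonic, operands in pass2(source, symbols):
    print(f"{address:04X}H  {mnemonic} {operands}")
```

Because the symbol table is complete before pass 2 starts, the forward reference to AGAIN in the second line is resolved without any special handling.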


STACKS
There are many situations in which a program needs to temporarily store information and then retrieve it in reverse order. On the 8086, the CX register and the LOOP instruction can be conveniently used to provide the counting, testing, and branching needed in a loop, but because the loop instructions are designed to use only the CX register, a problem arises when loops are nested. However, if, as shown in FIG 1, there were an efficient means of storing loop counters in order and then restoring them by retrieving the last stored counter first, at least part of the problem would be alleviated.
FIG 1: Loop counters in memory. The counters are stored in order from loop 1 (the outer loop) up to loop N and are retrieved from memory in the reverse order, last stored first.

Stack facilities normally involve the use of indirect addressing through a special register, the stack pointer, that is automatically decremented as items are put on the stack and incremented as they are retrieved. Putting something on the stack is called a push and taking it off is called a pop. The address of the last element pushed onto the stack is called the top of the stack (TOS).
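
The push and pop behaviour can be modelled in Python as follows. This is a minimal sketch: the class `Stack8086` is a hypothetical model of one stack segment, it stores 16-bit words low byte first as the 8086 does, and the starting SP value and loop counters are arbitrary example numbers.

```python
class Stack8086:
    """Toy model of a stack segment with a 16-bit stack pointer."""

    def __init__(self, sp, size=0x10000):
        self.mem = bytearray(size)   # the stack segment
        self.sp = sp                 # offset of the top of the stack

    def push(self, word):
        self.sp = (self.sp - 2) & 0xFFFF          # SP is decremented first
        self.mem[self.sp] = word & 0xFF           # low byte at the lower address
        self.mem[self.sp + 1] = (word >> 8) & 0xFF

    def pop(self):
        word = self.mem[self.sp] | (self.mem[self.sp + 1] << 8)
        self.sp = (self.sp + 2) & 0xFFFF          # SP is incremented afterwards
        return word


stack = Stack8086(sp=0x003C)
for counter in (5, 10, 20):          # counters of loop 1, loop 2, loop 3
    stack.push(counter)
print(f"SP after three pushes: {stack.sp:04X}H")
print([stack.pop() for _ in range(3)])   # retrieved last stored first
print(f"SP after three pops:   {stack.sp:04X}H")
```

The counters come back in the reverse of the order in which they were saved, which is the behaviour nested loops need.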

The 8086 instructions for directly pushing and popping the stack are given in FIG 2. Only words can be pushed or popped and the data cannot be immediate, but the PUSH and POP instructions can use all the other addressing modes. Only the POPF instruction, which pops the top of the stack into the PSW, affects the flags.
FIG 2:
Name                   Mnemonic and Format
Push onto the stack    PUSH SRC
Pop from the stack     POP DST
Push the flags         PUSHF
Pop the flags          POPF

STACK_SEG SEGMENT
DW 30 DUP (?)
TOS LABEL WORD
STACK_SEG ENDS
CODE_SEG SEGMENT
ASSUME CS:CODE_SEG, SS:STACK_SEG
START: MOV AX, STACK_SEG
MOV SS, AX ;INITIALIZES SS
MOV SP, OFFSET TOS ;INITIALIZES SP
:
:
CODE_SEG ENDS
On the 8086, the physical stack address is obtained from both (SP) and (SS) or (BP) and (SS), with SP being the implied stack pointer register for all push and pop operations and SS being the stack segment register. The (SS), multiplied by 16, give the lowest address in (i.e., the limit of) the stack area and may be referred to as the base of the stack. The original contents of the SP are considered to be the largest offset the stack should attain. Therefore the stack is considered to occupy the memory locations from 16 times (SS) to 16 times (SS) plus the original (SP). The location of the stack must, of course, be initially set by software, which may be a code sequence in either the operating system or the user's program. This is done by loading the (SS) and (SP) registers as shown in FIG 3.
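
The address arithmetic in this paragraph can be checked with a short Python sketch. The register values are arbitrary examples, and `physical_address` is an illustrative helper name.

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit physical address: 16 times the segment register plus the offset."""
    return ((segment << 4) + offset) & 0xFFFFF


ss = 0x2000            # example stack segment value
original_sp = 0x003C   # example initial SP (30 words, as in a DW 30 DUP (?) area)

lowest = physical_address(ss, 0)             # 16 * (SS)
highest = physical_address(ss, original_sp)  # 16 * (SS) + original (SP)
print(f"stack occupies {lowest:05X}H to {highest:05X}H")
```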
FIG 3: The stack initialization sequence listed above (segments STACK_SEG and CODE_SEG).

The BP register is primarily for allowing random access to the stack. For example,
MOV AX, [BP][SI]
will load AX from a location in the stack segment, the offset of the location being the sum of (BP) and (SI). On the other hand, for the instruction
MOV AX, [BX][SI]
the source operand is from the data segment.
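
The rule illustrated by these two instructions can be written as a tiny Python model. It is a sketch under simple assumptions: no segment override prefix is used, and `default_segment` and `offset_of` are hypothetical helper names with arbitrary example register values.

```python
def default_segment(registers):
    """Operands addressed through BP default to the stack segment, others to DS."""
    return "SS" if "BP" in registers else "DS"


def offset_of(registers, values):
    """16-bit offset formed by adding the contents of the named registers."""
    return sum(values[r] for r in registers) & 0xFFFF


values = {"BP": 0x0100, "BX": 0x0200, "SI": 0x0010}   # example register contents
for regs in (("BP", "SI"), ("BX", "SI")):
    print(regs, default_segment(regs), f"{offset_of(regs, values):04X}H")
```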
Stack facilities are more efficient than ordinary memory in two ways. The PUSH and POP instructions are shorter because one of the operands is indirectly addressed through the SP register, and the (SP) are automatically incremented or decremented to point to a new address. For example, the registers CX and DX could be stored in memory beginning at SAVE by the following instructions, which occupy a total of 8 bytes:
MOV SAVE, CX
MOV SAVE + 2, DX
If the offset of SAVE is in SI, they could be saved by the sequence
MOV [SI], CX
INC SI
INC SI
MOV [SI], DX
which occupies only 6 bytes, but contains 4 instructions. By saving CX and DX on the stack, only the instructions
PUSH CX
PUSH DX
which require only 2 bytes of memory and can be executed very quickly, are needed.

MACROS

  • A macro is a label (a name) assigned to a repeatedly appearing string of instructions.
  • The process of assigning a label or macro name to the string is called defining a macro.
  • A macro within a macro is called a nested macro.
  • When the program is assembled, the complete string of instructions is inserted at each place where the macro name appears.
  • Hence the EXE file becomes lengthy.
  • A macro does not use the stack.
  • A macro requires less time for execution as it does not contain CALL and RET instructions.
  • A macro can be defined anywhere in a program using the directives MACRO and ENDM.
  • MACRO: marks the start of the definition and is preceded by the macro name, which is the name used in the actual program.
  • ENDM: marks the end of the sequence of instructions or statements assigned to the macro name.
  • A macro may also be used to represent statements and directives.
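
Macro expansion as described in the points above can be sketched in Python. This is a deliberately simplified model: macros take no parameters, nested macros are not handled, and `expand_macros` and the macro name SAVE are made up for illustration.

```python
def expand_macros(lines):
    """Replace each use of a macro name by the body defined between MACRO and ENDM."""
    macros, out = {}, []
    current = None                      # name of the macro being defined, if any
    for line in lines:
        words = line.split()
        if current is not None:
            if words == ["ENDM"]:
                current = None          # end of the definition
            else:
                macros[current].append(line.strip())
        elif len(words) == 2 and words[1] == "MACRO":
            current = words[0]          # the name precedes the MACRO directive
            macros[current] = []
        elif len(words) == 1 and words[0] in macros:
            out.extend(macros[words[0]])    # insert the whole body in place
        else:
            out.append(line.strip())
    return out


source = [
    "SAVE MACRO",
    "PUSH CX",
    "PUSH DX",
    "ENDM",
    "MOV CX, 5",
    "SAVE",
    "MOV DX, 1",
    "SAVE",
]
for line in expand_macros(source):
    print(line)
```

The expanded program contains the two PUSH instructions twice, once per use of the macro name, which is why heavy use of macros lengthens the EXE file while avoiding CALL and RET.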
