ARM ASSEMBLER PROGRAMMER'S POCKET REFERENCE (v.0.25)
A BITS PUBLICATION
WRITTEN AND ENTERED INTO THE PUBLIC DOMAIN BY DAVE WALKER JANUARY 1994
History
~~~~~~~
November 1992: Version 0.1, placed in PD by Dave Walker
February 1993: Version 0.1A, incorporating corrections by Graham Willmott
(Dave Walker then goes and does assorted other things, including doing an MSc,
falling in love, losing an A540 and getting a job)
December 1993: Version 0.2, incorporating corrections and extensions
by Martin Ebourne and extra bits at suggestion
of Kevin Bracey; compiled from two issues of
EurekA into one document for John Veness.
January 1994: Version 0.25, correcting the obvious typo in 0.2 and adding
rather more on the (very nontrivial) Floating Point
system.
An ARM 6xx Pocket Reference (which should also work for pretty much anything
based on the uncustomised ARM 7xx) is planned at some point, but don't hold
your breath.
Email corrections to bits-admin@bristol, and they will somehow get through to
me.
Dedicated, as is everything else I do, to DVG.
Dave Walker, Cambridge 1994.
Memory Assignment, Parameter Passing and Assembly Directives
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memory is reserved with DIM <name>% <size>; set P% to the start of the code as
normal.
P% acts as the (usual) load address, and then as the execution address.
O% may be used for offset assembly; however, RMs (relocatable modules) are NOT
paged, but are handed to the Heap Manager. O% is still useful, though, so that
the object code can be assembled elsewhere while the BASIC source remains in
RAM at &8000. Range checking has now been implemented, so program assembly
aborts if reserved memory is exceeded. This is done by setting L% to the end
of the reserved block, like this:
DIM code% 256
P%=code%
L%=P%+256
[....
Multiple-pass assembly, and indeed all assembler options, are implemented
using the OPT directive:
OPT <expression>
OPT is a pseudo-opcode which directs the assembler to perform specific
functions, and when used, is always the first operator after the [.
When OPT is not explicitly specified, the default value is 3.
<expression> is an integer, or a variable containing an integer, whose
bits are flagged to produce the following effects:
Bit 0   clear = No listing produced
        set   = Listing produced
Bit 1   clear = No errors reported
        set   = Errors reported
Bit 2   clear = No offset assembly
        set   = Offset assembly performed
Bit 3   clear = No range check performed
        set   = Range check performed (via L%)
The "no errors" bit is necessary for the first pass of a multi-pass assembly
(ie where forward-referenced labels have been used).
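As a sketch of how the four OPT bits combine, the hypothetical helper below simply adds up the flags; a two-pass assembly then uses errors off for the first pass and errors on for the second. In BBC BASIC itself this is usually written as a FOR pass%=0 TO 3 STEP 3 loop around [OPT pass% ... ].

```python
def opt_value(listing=False, errors=False, offset=False, range_check=False):
    """Combine the OPT flag bits (bits 0-3 in the table above) into one integer.

    The function name is illustrative; BASIC only ever sees the resulting number.
    """
    return (int(listing) << 0) | (int(errors) << 1) | (int(offset) << 2) | (int(range_check) << 3)


def two_pass_opts(listing_on_final=True):
    """OPT values for a typical two-pass assembly.

    Pass 1 suppresses errors so that forward-referenced labels don't abort it;
    pass 2 reports errors (and optionally lists) once all labels are known.
    """
    first = opt_value(listing=False, errors=False)
    second = opt_value(listing=listing_on_final, errors=True)
    return [first, second]


print(two_pass_opts())  # [0, 3]
```

Adding range_check=True gives the OPT 8-15 values needed for the L% check described earlier.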
Parameters may be passed to an assembler program either by assigning integers
to the variables A% to H%, which are then copied into R0-R7, or by using
CALL <address>{,<parameter list>}
The parameter set may be decoded by examining R9 and R10 on entry to the
routine; R9 contains a pointer to the parameter descriptor block, and R10
contains the number of parameters passed.
The memory block indicated by R9 has a 2-word entry in it for each parameter,
the entries being set up in reverse order (last parameter passed is described
by first 2 words of block). The first word of an entry is a pointer to the
address where the variable itself is stored (or, in the case of a string, to
where a pointer to it is stored), and the second word indicates the type of the
variable.
Type number   First word points to     Example of possible BASIC var passed
-----------   --------------------     ------------------------------------
0             Single-byte number       ?var
4             4-byte integer           !var, var%, var%(n)
5             5-byte real              var, var(n)
128           String info block        var$, var$(n)
129           Terminated char string   $var
256+4         Integer array block      var%()
256+5         Real array block         var()
256+128       String array block       var$()
The locations pointed to by the first word are not guaranteed to be
word-aligned.
In the case of types 0, 4, 5 and 129, the first word points directly to the
variable; otherwise, it points to a further information block. For the case of
strings, the first word points to a word-aligned string information block,
which has the format
Bytes 0-3: Pointer to the characters comprising the string
Byte 4: Current number of characters in the string.
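To make the reverse ordering concrete, here is a Python sketch that walks the block at R9 given a snapshot of memory. It assumes the memory is available as a little-endian bytes-like object indexed from address 0, and the function names are hypothetical; it only decodes the (address, type) pairs and the 5-byte string information block described above.

```python
import struct


def decode_call_params(memory, r9, r10):
    """Return (address, type) pairs in the order the parameters were passed.

    The block at r9 holds r10 two-word entries with the LAST parameter first,
    so the list is reversed before returning.
    """
    entries = []
    for i in range(r10):
        addr, ptype = struct.unpack_from("<II", memory, r9 + 8 * i)
        entries.append((addr, ptype))
    entries.reverse()
    return entries


def read_sib(memory, sib_addr):
    """Decode a string information block: 4-byte pointer, then 1-byte length."""
    (str_ptr,) = struct.unpack_from("<I", memory, sib_addr)
    length = memory[sib_addr + 4]
    return bytes(memory[str_ptr:str_ptr + length])
```

For CALL code%,n%,s$ the first entry in the block would describe s$ (type 128) and the second n% (type 4); decode_call_params puts them back into the order written in the CALL.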
The value of R0 produced by a routine may be read back into a BASIC variable
using the USR function:
<var>=USR(<address>)
Registers and Processor Modes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
User Mode
---------
16 registers, R0-R15, visible. Conventionally:
R13 used as primary stack pointer
R14 used as link register
R0-R12 are general and available. R15 used as program counter (PC) and is also
location of status flags:
Bits 2-25 are used as the PC; as a result of pipelining, the "current" memory
location is 2 instructions (8 bytes) past the instruction being executed, so a
correction must be subtracted from the saved address when writing interrupt
routines.
Bits 26-31 are the main status flags:
Bit   Letter   Description
---   ------   -----------
26    F        Fast Interrupt disable
27    I        Interrupt disable
28    V        Arithmetic overflow
29    C        Carry
30    Z        Zero
31    N        Negative
Bits 0 and 1 set processor status mode:
Bit 1 (S1)   Bit 0 (S0)   Mode
----------   ----------   ----
0            0            User (default)
0            1            Fast interrupt (FIQ)
1            0            Interrupt (IRQ)
1            1            Supervisor (SVC)
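A Python sketch (names hypothetical) of how the fields above split out of a 26-bit-mode R15 value. The same layout applies to the R14 saved by BL on these processors, since BL copies the whole of R15, flags and mode included.

```python
MODES = {0: "USR", 1: "FIQ", 2: "IRQ", 3: "SVC"}
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}


def decode_r15(r15):
    """Split a 26-bit-mode R15 value into (pc, flags, mode)."""
    pc = r15 & 0x03FFFFFC                    # bits 2-25: word-aligned address
    flags = {name: (r15 >> bit) & 1 for name, bit in FLAG_BITS.items()}
    mode = MODES[r15 & 3]                    # bits 0-1: S1, S0
    return pc, flags, mode


print(decode_r15(0x60008003))
```

Here &60008003 decodes as PC=&8000 with Z and C set, in SVC mode.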
Interrupts may NOT be disabled by directly writing masked bits to R15 in User
Mode, but SWI "OS_IntOn" and SWI "OS_IntOff" may be used to enable and disable
IRQ (bit 27).
FIQ Mode
--------
R8-R14 are replaced by private R8_FIQ - R14_FIQ
IRQ Mode
--------
R13 and R14 replaced by private R13_IRQ and R14_IRQ
SVC Mode
--------
R13 and R14 replaced by private R13_SVC and R14_SVC. Direct writing to
writable support hardware is permitted. All hardware devices are memory mapped.
Interrupts and SVC Mode
-----------------------
SWI calls WILL corrupt R14_SVC. When executing an SWI from IRQ or FIQ mode,
the code supplied by Acorn to get round this is (note that it corrupts R8 and
R9):
MOV R9,PC            ; Preserve current processor mode
ORR R8,R9,#3         ; Move to R8, selecting SVC mode
TEQP R8,#0           ; Enter SVC mode
MOV R0,R0            ; No-op to sync internal registers
STMFD R13!,{R14}     ; Preserve R14_SVC on SVC stack
SWI <SWI name>       ; The SWI call itself goes here
LDMFD R13!,{R14}     ; Restore R14_SVC from SVC stack
TEQP R9,#0           ; Back to original processor mode
MOV R0,R0            ; No-op to sync internal registers
In IRQ and FIQ modes, it is necessary to set bits 26 and 27 of R15, and then
reset them before re-entering User mode.
Note the change in the code segment above from the original MOVNV R0,R0 to
MOV R0,R0; Acorn have officially stated that the NV condition code is not to be
used in future software.
Addressing Modes in ARM Assembler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As R15 has a 26-bit wide address field, it is
obviously not possible to fit an absolute address description into a 32 bit
instruction which also has to contain the opcode, conditional(s) and operand
registers. Hence, when specifying absolute addresses in ARM code, it is
necessary to use indirection.
The basic instructions to load and store single words in registers are LDR and
STR. LDR will be used in the examples illustrating addressing modes.
Pre-Indexed Addressing
----------------------
Syntax: LDR <dest>,[<base>{,<offset>}]{!}
<base> is a simple register.
<offset> is an optional quantity such that the memory word loaded
into <dest> is the contents of address (base+offset).
<offset> may be a simple register, a shifted register or an immediate
constant in the range -4095 to +4095.
Write-back is also implemented by appending !, so the instruction
LDR R0,[R3,R8,LSL#2]! has the effect: load R0 with the contents of (R3+4*R8)
and set R3=R3+4*R8. Without the !, R3 is left unchanged.
Post-Indexed Addressing
-----------------------
Syntax: LDR <dest>,[<base>],<offset>
The contents of <base> are taken as the address to load from,
and then <offset> is added to <base>. Write-back is implicit.
Hence:
LDR R8,[R2],R5,LSL#4 ; Load R8 from address R2: R2=R2+16*R5.
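The difference between the two modes is only in which address is used and when the base is updated. This Python sketch models just the address calculation (no memory is read); registers are held in a plain dict, and the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def pre_indexed(regs, base, offset, writeback=False):
    """Address used by LDR Rd,[Rbase,<offset>]{!}.

    The address is base+offset; with '!' the base register is updated as well.
    """
    addr = (regs[base] + offset) & MASK32
    if writeback:
        regs[base] = addr
    return addr


def post_indexed(regs, base, offset):
    """Address used by LDR Rd,[Rbase],<offset>.

    The load uses the unmodified base; the offset is then always written back.
    """
    addr = regs[base]
    regs[base] = (regs[base] + offset) & MASK32
    return addr


# LDR R0,[R3,R8,LSL#2]!  with R3=&1000, R8=5
regs = {3: 0x1000, 8: 5}
print(hex(pre_indexed(regs, 3, regs[8] << 2, writeback=True)), hex(regs[3]))  # 0x1014 0x1014
# LDR R8,[R2],R5,LSL#4  with R2=&2000, R5=3
regs = {2: 0x2000, 5: 3}
print(hex(post_indexed(regs, 2, regs[5] << 4)), hex(regs[2]))  # 0x2000 0x2030
```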
Assembler Mnemonics
~~~~~~~~~~~~~~~~~~~
Register Loading
----------------
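MOV <reg>,<op1>   Loads register with the value of op1 as it stands; ie if op1
                  is a label, the absolute address of that label at assembly
                  time. An immediate op1 must be an 8-bit value rotated right
                  by an even number of places. Affects N,Z (and C, if op1 is
                  a shifted register) only if the S suffix is used.
MVN <reg>,<op1>   Loads NOT(op1) into reg. As 2's complement is -n=NOT(n)+1,
                  use MVN <reg>,#(n-1) to load reg with -n.
ADR <reg>,<expr>  Loads register with an address given as an offset from P%
                  (assembled as an ADD or SUB on R15), giving relocatable
                  code. For expr=label, the offset calculations are
                  user-transparent.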
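The 8-bit-rotated immediate rule is why the MVN trick is needed. The Python sketch below (hypothetical helper names) checks whether a constant fits a single MOV, and otherwise whether MVN with the inverted value would do; it only chooses between the two, and returns None when neither works.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, n):
    """Rotate a 32-bit value right by n places."""
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK32


def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by 0, 2, ..., 30 places."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Undo a right-rotate by rot (rotate right by 32-rot) and see if 8 bits remain.
        if ror32(value, 32 - rot) <= 0xFF:
            return True
    return False


def load_constant(n):
    """Pick MOV or MVN for a constant, as the text suggests for loading -n."""
    n &= MASK32
    if is_arm_immediate(n):
        return ("MOV", n)
    inverted = ~n & MASK32
    if is_arm_immediate(inverted):
        return ("MVN", inverted)
    return None  # needs more than one instruction
```

For example, load_constant(-256) gives ("MVN", 255), matching MVN <reg>,#(n-1) with n=256.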
Shift Instructions
------------------
LSL #n or Rx  Shifts the operated register left by n places, or by the no.
              of places indicated by Rx. Zeroes are inserted into bit 0; the
              last bit shifted out of bit 31 goes into the carry, and any
              other overflow is lost.
ASL #n or Rx  Use is exactly the same as LSL.
LSR #n or Rx  Zeroes inserted at bit 31; data shifted right and through
              bit 0. The last bit shifted out of bit 0 goes into the carry;
              any other data lost at the least significant end.
ASR #n or Rx  As LSR, but copies of bit 31 (the sign) are shifted in at the
              top instead of zeroes, so the sign is preserved. The last bit
              shifted out of bit 0 goes into the carry.
ROR #n or Rx  Barrel shift right n places; bits leaving bit 0 re-enter at
              bit 31. The last bit rotated out (which ends up in bit 31) is
              copied into the carry flag.
RRX           Rotate right by one place, using the carry as a "bit 32." Bit
              0 goes to the carry, the carry goes to bit 31 etc.
              Essentially, a 33 bit rotate.
(In each case the carry flag is only actually updated when the S suffix is
used.)
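A Python model of the shifts, each returning (result, carry_out), may help pin down where the carry comes from. It covers immediate shifts of 1 to 31 places only; the special encodings (LSR #32, ASR #32, ROR #0 meaning RRX) are deliberately left out of this sketch.

```python
MASK32 = 0xFFFFFFFF

# Each function returns (result, carry_out) for a shift of 1 <= n <= 31 places.


def lsl(x, n):
    return (x << n) & MASK32, (x >> (32 - n)) & 1      # carry = last bit out of bit 31


def lsr(x, n):
    return x >> n, (x >> (n - 1)) & 1                  # carry = last bit out of bit 0


def asr(x, n):
    signed = x - (1 << 32) if x & 0x80000000 else x    # treat as 2's complement
    return (signed >> n) & MASK32, (x >> (n - 1)) & 1  # sign bits shifted in at top


def ror(x, n):
    result = ((x >> n) | (x << (32 - n))) & MASK32
    return result, result >> 31                        # carry = bit that wrapped to bit 31


def rrx(x, carry_in):
    return (carry_in << 31) | (x >> 1), x & 1          # 33-bit rotate through carry
```

Note that ror(2, 2) sets the carry even though the original bit 0 was clear: the carry is the last bit rotated out, not the initial bit 0.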
Data Processing Instructions
----------------------------
ADD <dest>,<op1>,<op2>        Add operand 1 to operand 2, store in destination.
                              N,Z,C,V are updated if S suffix used.
ADC <dest>,<op1>,<op2>        As ADD, but also adds in the initial status of
                              the carry flag. N,Z,C,V affected if S suffix
                              used.
SUB <dest>,<op1>,<op2>        Subtract operand 2 from operand 1, store in
                              destination. Valid if op1 and op2 are both
                              unsigned or both 2's complement. Affects N,Z,C,V
                              if S suffix used.
SBC <dest>,<op1>,<op2>        Subtract with carry; ie dest=op1-op2-NOT(carry).
                              Affects N,Z,C,V if S suffix used.
RSB <dest>,<op1>,<op2>        Reverse subtract; dest=op2-op1. As only op2 may
                              be an immediate or a shifted register, this lets
                              the shifted/immediate value be the one that is
                              subtracted from. Affects N,Z,C,V if S suffix
                              used.
RSC <dest>,<op1>,<op2>        dest=op2-op1-NOT(carry). Affects N,Z,C,V if S
                              suffix used.
CMP <op1>,<op2>               Reflect notional result of op1-op2 in N,Z,C,V
                              flags. S suffix is implicitly assumed.
CMN <op1>,<op2>               Reflect notional result of op1-(-op2), ie
                              op1+op2. Note -op2, not NOT(op2)! Sets Z if
                              op1=-op2. Affects N,Z,C,V. S suffix implicitly
                              assumed.
AND <dest>,<op1>,<op2>        dest=op1 AND op2. Affects N,Z if S suffix used.
ORR <dest>,<op1>,<op2>        dest=op1 OR op2. Affects N,Z if S suffix used.
EOR <dest>,<op1>,<op2>        dest=op1 EOR op2. Affects N,Z if S suffix used.
BIC <dest>,<op1>,<op2>        dest=op1 AND (NOT (op2)). If we take op1 and
                              treat op2 as a mask, a set bit in op2 will cause
                              the corresponding bit in op1 to clear. Perform
                              for all 32 bits, then store the modified op1 in
                              dest. Affects N,Z if S suffix used.
TST <op1>,<op2>               Bitwise notional AND of op1 and op2. Either op
                              can be the bit mask; Z is set if none of the
                              bits set in the mask are set in the other
                              operand (ie the AND result is zero). Affects
                              N,Z. S suffix is implicitly assumed.
TEQ <op1>,<op2>               Notional bitwise EOR, used to test equivalence;
                              Z is set if op1=op2. Affects N,Z. S suffix
                              implicitly assumed.
MUL <dest>,<op1>,<op2>        dest=op1*op2. Restrictions: op1 and op2 must be
                              simple registers, dest must not be the same
                              register as op1, and none of the registers may
                              be R15. N and Z reflect the result, V is
                              unchanged and C becomes ill-defined (if S suffix
                              used).
MLA <dest>,<op1>,<op2>,<op3>  dest=(op1*op2)+op3. Useful for running totals.
                              Permissible for dest to be the same as op3.
                              Flags and register restrictions as with MUL.
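The flag rules above can be checked with a small Python model (function names hypothetical). It assumes 32-bit unsigned inputs and shows the point that trips up 6502 programmers: ARM subtracts by adding NOT(op2), so after SUBS or CMP the C flag is SET when there is NO borrow, which is also why SBC uses NOT(carry).

```python
MASK32 = 0xFFFFFFFF


def nz(result):
    return {"N": result >> 31, "Z": int(result == 0)}


def add_flags(a, b, carry_in=0):
    """Result and N,Z,C,V of ADDS/ADCS; CMN is the same with the result discarded."""
    a &= MASK32
    b &= MASK32
    full = a + b + carry_in
    result = full & MASK32
    flags = nz(result)
    flags["C"] = int(full > MASK32)                  # unsigned carry out of bit 31
    flags["V"] = ((a ^ result) & (b ^ result)) >> 31  # both operands' sign differs from result's
    return result, flags


def sub_flags(a, b, carry_in=1):
    """SUBS/CMP (carry_in=1) or SBCS (carry_in=C): a-b-NOT(carry), done as a+NOT(b)+carry."""
    return add_flags(a, ~b & MASK32, carry_in)


def logic_flags(result):
    """TST/TEQ and ANDS/EORS etc.: N and Z come from the (notional) result."""
    return nz(result & MASK32)
```

For example, CMP 3,5 leaves N set and C clear (a borrow occurred), while CMP 5,5 leaves Z and C both set.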
Block Data Transfer
-------------------
LDM<type> <base>{!},<register list>  ]
                                     ] See "Stack Implementation" section
STM<type> <base>{!},<register list>  ]
Miscellany
----------
B <addr>      Direct branch to routine at addr; wasteful on pipeline.
BL <addr>     Copies pipeline-corrected PC to R14 and branches to routine at
              addr; return may be effected by MOV PC,R14 at end of
              subroutine (MOVS PC,R14 also restores the flags saved in R14).
SWI <number>  Executes an SWI routine (by number or name). See available
              list of routines.
Conditional Execution Suffixes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All ARM assembler operations may be executed conditionally; the conditions are
defined by a two-letter extension appended to the mnemonic to be executed
subject to the conditions. In the table below, "flag status" refers to the
conditions required for a command extended by the suffix to be executed.
Suffix  Literal Meaning         Flag Status                 Comments
------  ---------------         -----------                 --------
EQ      Equal                   Z set                       eg from CMP with equal args
NE      Not equal               Z clear
VS      Overflow set            V set
VC      Overflow clear          V clear
AL      Always                                              Default, implicit
NV      Never                                               Only workable in a large
                                                            chunk of Boolean; use now
                                                            prohibited by Acorn
HI      Higher (unsigned)       C set AND Z clear           arg1>arg2  ] Use after a
LS      Less than or same as    C clear OR Z set            arg1<=arg2 ] CMP or CMN
        (unsigned)
PL      Plus                    N clear                     Zero is positive
MI      Minus                   N set                       Bit 31 of prev. result is 1
CS      Carry set               C set                       Prev. instruction carried
                                                            (unsigned overflow). Also
                                                            HS (higher or same) allowed
CC      Carry clear             C clear                     Also LO (lower) allowed
GE      Greater than or equal   (N set AND V set) OR        ie N=V
        (signed)                (N clear AND V clear)
LT      Less than (signed)      (N set AND V clear) OR      ie N<>V
                                (N clear AND V set)
GT      Greater than (signed)   ((N set AND V set) OR
                                (N clear AND V clear))
                                AND Z clear
LE      Less than or equal      ((N set AND V clear) OR
        (signed)                (N clear AND V set))
                                OR Z set
For signed suffixes, signing convention is 2's complement (ie NOT and add 1)
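As a cross-check on the table, this Python sketch (names hypothetical) evaluates each suffix from the N, Z, C and V flags given as 0 or 1; HS and LO are handled as aliases for CS and CC.

```python
ALIASES = {"HS": "CS", "LO": "CC"}

CONDITIONS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "CS": lambda n, z, c, v: c,
    "CC": lambda n, z, c, v: not c,
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,
    "LS": lambda n, z, c, v: (not c) or z,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: (not z) and n == v,
    "LE": lambda n, z, c, v: z or n != v,
    "AL": lambda n, z, c, v: True,
    "NV": lambda n, z, c, v: False,
}


def condition_passes(cond, n, z, c, v):
    """True if an instruction with suffix `cond` would execute."""
    cond = ALIASES.get(cond.upper(), cond.upper())
    return bool(CONDITIONS[cond](n, z, c, v))
```

After CMP 3,5 the flags are N=1, Z=0, C=0, V=0, so LT and LO pass while GE and HI do not.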
Other Mnemonic Extenders
------------------------
B   Byte flag     LDR/STR transfer a single 8-bit byte rather than a
                  word; word alignment not necessary
S   Status flag   Reflect result of operation in status flags.
                  Beware; this is implicit in 6502, but must be
                  specified in ARM unless otherwise stated (above)
P   PSR write     Used with CMP, CMN, TST and TEQ (usually TEQ); the
                  result of the operation is written _directly_ to the
                  status bits of R15 instead of just setting the flags.
                  In User mode only N,Z,C,V (bits 28-31) can change; in
                  other modes bits 26-27 and the mode bits 0-1 can too.
Stack Implementation
--------------------
Stacks are most easily implemented using STM and LDM. Both
of these instructions take the syntax
<op><type> <base>{!},<register list>
where <op> is either STM (store multiple)
    or LDM (load multiple)
<type> is one of IA, IB, DA or DB, built from
    I   Increment address for each register
    D   Decrement address for each register
    A   Modify address after transferring each register
    B   Modify address before transferring each register
  or one of the stack-oriented forms
    FA  Operate on full ascending stack
    FD  Operate on full descending stack
    EA  Operate on empty ascending stack
    ED  Operate on empty descending stack
<base> is the register holding the stack pointer; usually R13
{!} specifies write-back of the modified address to <base>
    (to prevent data being accidentally overwritten)
<register list> is a comma-separated list of registers (ranges such as R0-R3
    are allowed), enclosed in braces {}, to be pushed to or pulled from the
    stack. Registers are ALWAYS placed in ascending memory addresses
    according to their register number, lowest register at the lowest
    address.
The FA-ED extenders save having to reverse the usual extenders when pushing
and pulling; to pull from a stack which had been pushed to using STMIA, it was
necessary to use LDMDB. Now we can just use STMEA and LDMEA, for example.
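A Python sketch of a full descending push and pull (the R13 convention used by STMFD R13!,{...} in the SWI code earlier) shows the register-number ordering at work. Memory is a dict of word addresses, the function names are hypothetical, and the awkward case of the base register appearing in its own list is not modelled.

```python
def stmfd(memory, regs, sp_reg, reg_list):
    """STMFD sp!,{reg_list}: push onto a full descending stack with write-back.

    Registers are sorted, so the lowest-numbered register ends up at the lowest
    address (the new top of stack), whatever order the list was written in.
    """
    ordered = sorted(reg_list)
    new_sp = regs[sp_reg] - 4 * len(ordered)
    for i, r in enumerate(ordered):
        memory[new_sp + 4 * i] = regs[r]
    regs[sp_reg] = new_sp


def ldmfd(memory, regs, sp_reg, reg_list):
    """LDMFD sp!,{reg_list}: the matching pull (the same operation as LDMIA)."""
    ordered = sorted(reg_list)
    sp = regs[sp_reg]
    for i, r in enumerate(ordered):
        regs[r] = memory[sp + 4 * i]
    regs[sp_reg] = sp + 4 * len(ordered)
```

Pushing {R14,R0,R1} from SP=&1000 leaves R0 at &FF4, R1 at &FF8 and R14 at &FFC, with SP=&FF4.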
Floating Point Model
~~~~~~~~~~~~~~~~~~~~
The following instructions are _NOT_ included in the
standard ARM BASIC assembler; however, for those of you who may have
cross-assemblers or the Acorn Object Assembler, or are prepared to hand-code
the instructions in hex, here goes.
The FP model complies with the IEEE specification, and provides a further 8
registers, F0-F7. Numbers may be stored in the following 4 formats:
IEEE Single Precision (S)
32 bits: 1 sign bit
23 bit mantissa
8 bit exponent
IEEE Double Precision (D)
64 bits (double word): 1 sign bit
52 bit mantissa
11 bit exponent
[Note: Older FPE and FPPC had a different format to this current system]
Double Extended Prec. (E)
96 bits (triple word): 1 sign bit
64 bit mantissa
15 bit exponent
16 bits unused
Packed decimal BCD (P)
96 bits (triple word): 1 sign digit
19 digit mantissa
4 digit exponent
Expanded packed BCD (EP)
128 bits (quad word): 1 sign digit
24 digit mantissa
7 digit exponent
[Note: This is not available in older FPE & FPPC]
Note: P and EP are mutually exclusive - the one available is determined by
the EP bit in the System Control byte.
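For the two IEEE formats, the fields can be pulled apart with Python's struct module, as sketched below (function names hypothetical). The sketch shows only the bit fields; it makes no claim about the order in which the FP system stores the words of a double in memory. In S and D the leading 1 of a normalised number is implicit, which is why only 23 and 52 fraction bits are stored.

```python
import struct


def single_fields(x):
    """Split an IEEE single (S) into sign, biased 8-bit exponent, 23-bit fraction."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def double_fields(x):
    """Split an IEEE double (D) into sign, biased 11-bit exponent, 52-bit fraction."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


print(single_fields(1.5))  # (0, 127, 4194304)
```

The exponent biases are 127 for S and 1023 for D, so 1.0 has a stored exponent of 127 or 1023 respectively.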
The FP status register has separate flags for Overflow, Underflow, Division by
Zero, Inexact result and Invalid operation; a subset of the flags indicating
these is copied into the ARM status register (R15) by the co-processor
interface.
In brief, the instructions are:
LDF<prec> <Fdest>,<address>
STF<prec> <Fsrc>,<address>
where:
<prec> is one of S,D,E,P (single, double, extended, packed decimal)
<Fdest>/<Fsrc> is F0-F7
<address> is either [Rn]{,#offset} or [Rn,#offset]{!}
and is defined as an offset from the ARM base register specified; the
offset is a multiple of 4 in the range -1020 to +1020. The offset is added to
the base register when write-back {!} is specified with pre-indexed
addressing, and is always added with post-indexed.
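FLT   Convert integer (in an ARM register) to FP
FIX   Convert FP to integer (in an ARM register)
WFS   Write FP status register (from an ARM register)
RFS   Read FP status register (into an ARM register)
WFC   Write FP control register ] Processor SVC mode only
RFC   Read FP control register  ]
This set uses the general form, eg
FLT<prec>{rounding mode} <Fdest>,<Rsrc>
FIX{rounding mode} <Rdest>,<Fsrc>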
where the rounding mode is chosen from:
Round to nearest (no mnemonic)
Round to + infinity (P)
Round to - infinity (M)
Round to Zero (Z)
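The four rounding modes can be approximated with Python's standard functions, as in the sketch below of a FIX-style conversion (the table and function names are hypothetical, and no 32-bit overflow checking is attempted). Python's round() already rounds ties to the even neighbour, as IEEE round-to-nearest does.

```python
import math

# Rounding-mode suffixes from the list above, mapped onto Python functions.
ROUNDING = {
    "": round,         # round to nearest (ties to even)
    "P": math.ceil,    # towards +infinity
    "M": math.floor,   # towards -infinity
    "Z": math.trunc,   # towards zero
}


def fix(value, mode=""):
    """Approximate FIX{mode}: convert a float to an integer with the given mode."""
    return int(ROUNDING[mode](value))
```

So fix(2.5) gives 2 while fix(3.5) gives 4, and fix(-2.1, "M") gives -3 where fix(-2.1, "Z") gives -2.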
Unary Operations
----------------
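Command format:
<mnemonic><prec>{rounding mode} <Fdest>,(<Fop> or #value)
where #value (here and in the binary operations below) may only be one of the
constants 0, 1, 2, 3, 4, 5, 10 or 0.5.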
Mnemonic  Effect               Calculation        Opcode
--------  ------               -----------        ------
MVF       Move                 Fdest=Fop          00001
MNF       Move negated         Fdest=-Fop         00011
ABS       Absolute value       Fdest=ABS(Fop)     00101
RND       Round to integer     Fdest=INT(Fop)     00111
SQT       Square root          Fdest=SQR(Fop)     01001
LOG       Log to base 10       Fdest=LOG(Fop)     01011
LGN       Natural log          Fdest=LN(Fop)      01101
EXP       Exponential (e^Fop)  Fdest=EXP(Fop)     01111
SIN       Obvious              Fdest=SIN(Fop)     10001
COS       Equally obvious      Fdest=COS(Fop)     10011
TAN       Ditto                Fdest=TAN(Fop)     10101
ASN       ]                    Fdest=ASN(Fop)     10111
ACS       ] (Obvious)^-1       Fdest=ACS(Fop)     11001
ATN       ]                    Fdest=ATN(Fop)     11011
Binary Operations
-----------------
Command format:
<mnemonic><prec>{rounding mode} <Fdest>,<Fop1>,(<Fop2> or #value)
Mnemonic  Effect               Calculation           Opcode
--------  ------               -----------           ------
ADF       Add                  Fdest=Fop1+Fop2       00000
MUF       Multiply             Fdest=Fop1*Fop2       00010
SUF       Subtract             Fdest=Fop1-Fop2       00100
RSF       Reverse subtract     Fdest=Fop2-Fop1       00110
DVF       Divide               Fdest=Fop1/Fop2       01000
RDF       Reverse divide       Fdest=Fop2/Fop1       01010
POW       Raise to power       Fdest=Fop1^Fop2       01100
RPW       Reverse raise...     Fdest=Fop2^Fop1       01110
RMF       Remainder            Fdest=Fop1 MOD Fop2   10000
FML       Fast multiply        Fdest=Fop1*Fop2       10010
FDV       Fast divide          Fdest=Fop1/Fop2       10100
FRD       Fast reverse divide  Fdest=Fop2/Fop1       10110
POL       Polar angle          Fdest=angle between   11000
                               Fop1 and Fop2
Note that the "fast" instructions only produce single precision accuracy,
regardless of the precision specified in the mnemonic.
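The Opcode columns in the two tables follow a simple pattern: the position in the table forms the top four bits, and the low bit is 1 for unary (monadic) operations and 0 for binary (dyadic) ones. This Python sketch regenerates the column to make the pattern visible; it reproduces only the tables' 5-bit figures, not a full FP instruction encoding.

```python
MONADIC = ["MVF", "MNF", "ABS", "RND", "SQT", "LOG", "LGN", "EXP",
           "SIN", "COS", "TAN", "ASN", "ACS", "ATN"]
DYADIC = ["ADF", "MUF", "SUF", "RSF", "DVF", "RDF", "POW", "RPW",
          "RMF", "FML", "FDV", "FRD", "POL"]


def opcode_bits(mnemonic):
    """Return the 5-bit 'Opcode' column entry for a mnemonic, as a string."""
    if mnemonic in MONADIC:
        return format((MONADIC.index(mnemonic) << 1) | 1, "05b")
    return format(DYADIC.index(mnemonic) << 1, "05b")
```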
System Memory Map
~~~~~~~~~~~~~~~~
Please note that this is the provisional version of the memory map details;
the master table (at the top) is for an A540/R200 series machine (the addresses
are the same on the entire Archimedes range, but there are no Acorn-endorsed
RAM expansions to take you over the 4Mb), whereas the logically-mapped RAM area
was assigned from much hacking on an A310. The addresses in this latter table
are, of course, affected directly by machine configuration, and the data thus
given is merely representative of the setup of that machine at that time.
Read Write Addr
---- ----- ----
-------------------------------- 3FFFFFF
ROM (high) | Logical to Physical
| address translator -------------------
-------------------------------- 3800000 / 4Mb daughter card
ROM (low) | DMA Address / 3rd slot MEMC (z)
| generators / -------------------
|-------------------- 3600000 / 4Mb daughter card
| Video Controller / 2nd slot MEMC (y)
-------------------------------- 3400000 / -------------------
Input/Output Controllers / 4Mb daughter card
-------------------------------- 3000000 / 1st slot MEMC (x)
Physically mapped RAM -------------------
-------------------------------- 2000000 \ 4Mb on motherboard
Logically mapped RAM \ Master MEMC (w)
-------------------------------- 0000000 \----- -------------------
0 ---------------------- \
ARM Reset \ \
4 ---------------------- \ \
Undefined instruction \ \
8 ---------------------- \ \
Software Interrupt (SWI) \ \
C ---------------------- \ 0 ---------------------------------
Abort (pre-fetch) Bootstrap and hardware exception
10 ---------------------- vectors
Abort (data) / 1C ---------------------------------
14 ---------------------- / System and BASIC workspace
Address exception / 8000 ---------------------------------
18 ---------------------- / Application RAM
IRQ_Vec / (RISC OS Desktop Tasks map to here)
1C --------------------- / A7FFF ---------------------------------
FIQ_Vec / Unassigned address space
--------------------- 1000000 ---------------------------------
RAM filing system
1800000 ---------------------------------
RAM-based Relocatable Modules
(incl. those downloaded from podules)
Font definitions grow downwards
1825FFF ---------------------------------
Unassigned address space
1C00000 ---------------------------------
System Heap
1C03FFF ---------------------------------
Unassigned address space
1F00000 ---------------------------------
Cursor data and Desktop scratchpad
1F07FFF ---------------------------------
Unassigned address space
1FEC000 ---------------------------------
Screen RAM
(grows downwards)
1FFFFFF ---------------------------------
IOC
---
The I/O controller is mapped into the address space from &3000000 to
&3400000, and _ALL_ I/O devices are also memory mapped within this range. The
chip supports 4 access cycle types: sync, fast, medium and slow. It therefore
follows that many devices appear at more than one address, one for each cycle
type used. A (necessarily incomplete) list of some device entry points
follows; generally, these entry points are only directly accessible in SVC
mode.
Cycle Type   Base Address   Device                           Device IC
----------   ------------   ------                           ---------
Fast         &3310000       Floppy disc controller           WD1772
Sync         &33A0000       Econet controller                6854
Sync         &33B0000       Serial controller                6551
Fast         &3350000       Printer data                     LS374
Fast         &3360000       Podule IRQ register
Fast         &3360004       Podule IRQ mask register
Slow         &3270000       Extended external podule space
Slow         &3240000       Podule 0
Med          &32C0000       Podule 0
Fast         &3340000       Podule 0
Sync         &33C0000       Podule 0
Slow         &3244000       Podule 1
Med          &32C4000       Podule 1
Fast         &3344000       Podule 1
Sync         &33C4000       Podule 1
Slow         &3248000       Podule 2
Med          &32C8000       Podule 2
Fast         &3348000       Podule 2
Sync         &33C8000       Podule 2
Slow         &324C000       Podule 3
Med          &32CC000       Podule 3
Fast         &334C000       Podule 3
Sync         &33CC000       Podule 3
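The podule rows follow a regular pattern: a base address per cycle type, plus &4000 per slot. The Python sketch below (names hypothetical) simply encodes that pattern from the table; it is not an Acorn API, and as the following paragraph warns, real code should prefer asking the OS.

```python
CYCLE_BASES = {"Slow": 0x3240000, "Med": 0x32C0000, "Fast": 0x3340000, "Sync": 0x33C0000}
PODULE_STRIDE = 0x4000


def podule_base(slot, cycle):
    """Base address of podule `slot` (0-3) for an IOC cycle type, per the table."""
    if not 0 <= slot <= 3:
        raise ValueError("slot must be 0-3")
    return CYCLE_BASES[cycle] + slot * PODULE_STRIDE


print(hex(podule_base(3, "Sync")))  # 0x33cc000
```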
Although these are the current lookup addresses, Acorn make no guarantees that
they will stay here in future incarnations of the OS. Writing to and reading
from these addresses is currently the fastest way to interact with a podule,
but for future compatibility, it's safer to use the SWIs detailed later this
issue. Alternatively, build your own lookup table using SWI
"Podule_HardwareAddress", or (from what I've heard about it) take a look at
!StrongHlp.
NB
--
When writing to a device via IOC, the data must be on the top 16 bits of the
data word. Data to be read appears on the bottom 16 bits.
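The data positioning can be sketched as two one-line helpers (names hypothetical), assuming 16-bit device data. In assembler the write-side adjustment would typically be a single MOV Rn,Rn,LSL #16 before the STR.

```python
def ioc_write_word(value16):
    """Word to store via IOC: the 16-bit value must sit in bits 16-31."""
    return (value16 & 0xFFFF) << 16


def ioc_read_value(word):
    """16-bit value from a word loaded via IOC: it arrives in bits 0-15."""
    return word & 0xFFFF
```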
poppy@poppyfields.net