From: athomas (Alasdair Thomas)
Subject: Re: 32bit immediate load in ARM code
Date: 17 May 91 10:37:14 GMT
Sender: athomas@armltd.uucp
Organization: A.R.M. Ltd, Swaffham Bulbeck, Cambs, UK
IMPORTANT RULES FOR ARM CODE WRITERS
====================================
Date: 17/5/91
Issue: 2.5
Every effort has been made to ensure that the information in this document
is true and correct at the date of issue. Products described in this
document, however, are subject to continuous development and improvements
and Advanced RISC Machines Ltd (and other contributors) reserve the right to
change their specifications at any time. Advanced RISC Machines Ltd cannot
accept liability for any loss or damage arising from the use of any
information or particulars in this document.
================
= Introduction =
================
The ARM processor family uses Reduced Instruction Set (RISC) techniques to
maximise performance; as such, the instruction set allows some instructions
and code sequences to be constructed that will give rise to unexpected (and
potentially erroneous) results. These cases must be avoided by all machine
code writers and generators if correct program operation across the whole
range of ARM processors is to be obtained.
To remain upwards compatible with future versions of the ARM processor
family, NEVER use any of the undefined instruction formats: neither those
shown in the manual as "Undefined", which the processor traps, nor those
which are not shown in the manual and which don't trap (for example, a
Multiply instruction where bit 5 or 6 of the instruction is set). In
addition, the "NV" (never executed) instruction class should not be used; the
instruction "MOV R0,R0" is recommended as a general purpose NOP.
This document lists the instruction code sequences to be avoided. It is
*STRONGLY* recommended that you take the time to familiarise yourself with
these cases because some will only fail under particular circumstances which
may not arise during testing.
============================================
= Instructions and code sequences to avoid =
============================================
The instructions and code sequences are split into a number of categories.
Each category starts with a recommendation or warning, and indicates which
of the two main ARM variants (ARM2, ARM3) it applies to. The text then goes
on to explain the conditions in more detail and to supply examples where
appropriate.
Unless a program is being targeted SPECIFICALLY for a single version of the
ARM processor family, all of these recommendations should be adhered to.
1) TSTP/TEQP/CMPP/CMNP: Changing mode
-------------------------------------
####################################################################
# When the processor's mode is changed by altering the mode bits #
# in the PSR using a data processing operation, care must be taken #
# not to access a banked register (R8-R14) in the following #
# instruction. Accesses to the unbanked registers (R0-R7,R15) are #
# safe. #
####################################################################
# Applicability: ARM2 #
####################################################################
The following instructions are affected, but note that mode changes can
only be made when the processor is in a non-user mode:-
   TSTP Rn,<Op2>
   TEQP Rn,<Op2>
   CMPP Rn,<Op2>
   CMNP Rn,<Op2>
These are the only operations that change all the bits in the PSR
(including the mode bits) without also writing the PC. An instruction that
writes the PC forces a pipeline refill, during which the register bank
select logic has time to settle; these four instructions give no such delay.
e.g. Assume processor starts in Supervisor mode in each case:-

a) TEQP PC,#0
   MOV  R0,R0            SAFE: NOP added between mode change and access
   ADD  R0,R1,R13_usr          to a banked register (R13_usr).

b) TEQP PC,#0
   ADD  R0,R1,R2         SAFE: No access made to a banked register

c) TEQP PC,#0
   ADD  R0,R1,R13_usr    *FAILS*: Data NOT read from Register R13_usr!

The safest default is always to add a NOP (e.g. MOV R0,R0) after a mode
changing instruction; this will guarantee correct operation regardless of
the code sequence that follows it.
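
The rule above can be expressed as a simple check over an instruction
sequence. The following is a minimal sketch in Python and is not part of any
ARM tool: the (mnemonic, registers) tuple representation and the names
BANKED, MODE_CHANGERS and hazard_after_mode_change are assumptions made for
illustration. The check is conservative, since it flags every TSTP/TEQP/
CMPP/CMNP whether or not the mode bits actually change, and it does not
recognise conditional forms of the mnemonics.

```python
# Hypothetical lint check: flags a banked-register access in the instruction
# immediately after a TSTP/TEQP/CMPP/CMNP (the ARM2 hazard described above).

BANKED = {8, 9, 10, 11, 12, 13, 14}               # R8-R14, as in the rule
MODE_CHANGERS = {"TSTP", "TEQP", "CMPP", "CMNP"}  # unconditional forms only


def hazard_after_mode_change(program):
    """Return the indices of instructions that touch a banked register
    directly after a PSR-writing test/compare instruction.

    `program` is a list of (mnemonic, register_numbers) tuples. This
    representation is an assumption of the sketch, not an ARM format.
    """
    hazards = []
    for i in range(1, len(program)):
        prev_mnemonic, _ = program[i - 1]
        _, regs = program[i]
        if prev_mnemonic in MODE_CHANGERS and BANKED & set(regs):
            hazards.append(i)
    return hazards
```

Applied to the three cases above, (a) and (b) give an empty list and (c)
reports the ADD at index 1.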
2) LDM/STM: Forcing transfer of the user bank (Part 1)
------------------------------------------------------
###################################################################
# Don't use write back when forcing user bank transfer in LDM/STM #
###################################################################
# Applicability: ARM2,ARM3 #
###################################################################
For STM instructions the S bit is redundant as the PSR is always stored
with the PC whenever R15 is in the transfer list. In user mode programs the
S bit is ignored, but in other modes it has a second interpretation. S=1 is
used to force transfers to take values from the user register bank instead
of the current register bank. This is useful for saving the user state on
process switches.
Similarly, in LDM instructions the S bit is redundant if R15 is not in the
transfer list. In user mode programs, the S bit is ignored, but in non-user
mode programs where R15 is not in the transfer list, S=1 is used to force
loaded values to go to the user registers instead of the current register
bank.
In both cases, when the processor is in a non-user mode and transfer to/from
the user bank is forced by setting the S bit, write back of the base will
also be to the user bank, even though the base is fetched from the current
bank. Therefore don't use write back when forcing user bank transfer in
LDM/STM.
e.g. In all cases, the processor is assumed to be in a non-user mode and
<reglist> is assumed not to include R15:-

   STMxx Rn!,<reglist>     SAFE: Storing non-user registers with write
                                 back to the non-user base register

   LDMxx Rn!,<reglist>     SAFE: Loading non-user registers with write
                                 back to the non-user base register

   STMxx Rn,<reglist>^     SAFE: Storing user registers, but no base
                                 write-back

   STMxx Rn!,<reglist>^    *FAILS*: Base fetched from non-user register,
                                    but written back into user register

   LDMxx Rn!,<reglist>^    *FAILS*: Base fetched from non-user register,
                                    but written back into user register
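
A code generator or disassembler could test for this combination directly on
the instruction word. The sketch below is illustrative only; the function
names are hypothetical. It relies on the standard LDM/STM encoding (bits
27:25 = 100, S = bit 22, W = bit 21, L = bit 20, R15 = bit 15 of the register
list) and assumes the instruction will execute in a non-user mode, since the
S bit is ignored in user mode.

```python
# Sketch: detect LDM/STM words that force a user bank transfer (S=1)
# and also request base write back (W=1).

def is_block_transfer(word):
    """True if bits 27:25 are 100, i.e. the word is an LDM or STM."""
    return ((word >> 25) & 0b111) == 0b100


def user_bank_writeback_hazard(word):
    """True if the word is an LDM/STM that forces a user bank transfer
    and has the write back bit set."""
    if not is_block_transfer(word):
        return False
    s_bit = (word >> 22) & 1
    w_bit = (word >> 21) & 1
    is_load = (word >> 20) & 1
    r15_in_list = (word >> 15) & 1
    # STM with S=1 always forces the user bank; LDM with S=1 forces it
    # only when R15 is not in the transfer list.
    forces_user_bank = bool(s_bit) and (not is_load or not r15_in_list)
    return forces_user_bank and bool(w_bit)
```

For example, &E8ED000F (STMIA R13!,{R0-R3}^) is reported as a hazard, while
&E8CD000F (STMIA R13,{R0-R3}^) is not.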
3) LDM: Forcing transfer of the user bank (Part 2)
--------------------------------------------------
######################################################################
# When loading user bank registers with an LDM in a non-user mode, #
# care must be taken not to access a banked register (R8-R14) in the #
# following instruction. Accesses to the unbanked registers #
# (R0-R7,R15) are safe. #
######################################################################
# Applicability: ARM2,ARM3 #
######################################################################
Because the register bank switches from user mode to non-user mode during
the first cycle of the instruction following an "LDM Rn,<reglist>^", an
attempt to access a banked register in that cycle may cause the wrong
register to be accessed.
e.g. In all cases, the processor is assumed to be in a non-user mode and
<reglist> is assumed not to include R15:-

   LDM   Rn,<reglist>^
   ADD   R0,R1,R2           SAFE: Access to unbanked registers after LDM^

   LDM   Rn,<reglist>^
   MOV   R0,R0              SAFE: NOP inserted before banked register used
   ADD   R0,R1,R13_svc            following an LDM^

   LDM   Rn,<reglist>^
   ADD   R0,R1,R13_svc      *FAILS*: Accessing a banked register immediately
                                     after an LDM^ returns the wrong data!

   ADR   R14_svc, saveblock
   LDMIA R14_svc, {R0 - R14_usr}^
   LDR   R14_svc, [R14_svc,#15*4]   *FAILS*: Banked base register (R14_svc)
   MOVS  PC, R14_svc                         used immediately after the LDM^

   ADR   R14_svc, saveblock
   LDMIA R14_svc, {R0 - R14_usr}^
   MOV   R0,R0                      SAFE: NOP inserted before banked
   LDR   R14_svc, [R14_svc,#15*4]         register (R14_svc) used
   MOVS  PC, R14_svc

NOTE:
The ARM2 and ARM3 processors *usually* give the expected result, but cannot
be guaranteed to do so under all circumstances. Therefore this code sequence
should be avoided in future.
4) SWI/Undefined Instruction trap interaction
---------------------------------------------
######################################################################
# Care must be taken when writing an undefined instruction handler #
# to allow for an unexpected call from a SWI instruction. #
# The erroneous SWI call should be intercepted and redirected to the #
# software interrupt handler #
######################################################################
# Applicability: ARM2 #
######################################################################
The implementation of the CDP instruction on ARM2 causes a Software
Interrupt (SWI) to take the Undefined Instruction trap if the SWI is the
next instruction after the CDP.
e.g. (SIN is a floating point data operation, which is encoded as a CDP)
   SIN F0,F1
   SWI &11      *FAILS*: ARM2 will take the undefined instruction trap
                         instead of the software interrupt trap.
All Undefined Instruction handler code should check the failed instruction
to see if it is a SWI, and if so pass it over to the software interrupt
handler.
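
The check itself is a test on bits 27:24 of the trapped instruction word,
which are all ones for a SWI under any condition code. A minimal sketch of the
decision follows; the function names and the returned handler labels are
hypothetical, and how the handler obtains the instruction word is left out.

```python
# Sketch: decide where an Undefined Instruction handler should send control.

def is_swi(word):
    """True if bits 27:24 of the instruction word are 1111 (SWI)."""
    return ((word >> 24) & 0xF) == 0xF


def route_undefined_trap(word):
    """Return a label naming the handler that should deal with `word`.
    The labels are placeholders for real handler entry points."""
    if is_swi(word):
        return "software_interrupt_handler"
    return "undefined_instruction_handler"
```

With the example above, the word for SWI &11 (&EF000011) is routed to the
software interrupt handler, whereas a CDP word is not.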
5) Undefined instruction/Prefetch abort trap interaction
--------------------------------------------------------
######################################################################
# Care must be taken when writing the Prefetch abort trap handler to #
# allow for an unexpected call due to an undefined instruction #
######################################################################
# Applicability: ARM2,ARM3 #
######################################################################
When an undefined instruction is fetched from the last word of a page,
where the next page is absent from memory, the undefined instruction will
cause the undefined instruction trap to be taken, and the following
(aborted) instructions will cause a prefetch abort trap. One might expect
the undefined instruction trap to be taken first, then the return to the
succeeding code will cause the abort trap. In fact the prefetch abort has a
higher priority than the undefined instruction trap, so the prefetch abort
handler is entered _before_ the undefined instruction trap, indicating a
fault at the address of the undefined instruction (which is in a page which
is actually present). A normal return from the prefetch abort handler (after
loading the absent page) will cause the undefined instruction to execute and
take the trap correctly. However the indicated page is already present, so
the prefetch abort handler may simply return control, causing an infinite
loop to be entered.
Therefore, the prefetch abort handler should check whether the indicated
fault is in a page which is actually present. If so, the condition described
above must have arisen, and control should be passed to the undefined
instruction handler. This will restore the expected sequential nature of the
execution
sequence; a normal return from the undefined instruction handler will cause
the next instruction to be fetched (which will abort), the prefetch abort
handler will be reentered (with an address pointing to the absent page), and
execution can proceed normally.
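
The decision the prefetch abort handler has to make can be sketched as
follows. This is an outline of the logic only: page_is_present is a
hypothetical callback standing in for whatever page table lookup the system
provides, and the returned labels are placeholders for real handler code.

```python
# Sketch: prefetch abort handler decision, avoiding the infinite loop
# described above.

def route_prefetch_abort(fault_address, page_is_present):
    """Return a label naming the action to take for a prefetch abort.

    `page_is_present` is an assumed callable taking an address and
    returning True if the page containing it is in memory.
    """
    if page_is_present(fault_address):
        # The page is already there, so the abort was really raised on
        # behalf of an undefined instruction in the last word of the page.
        return "undefined_instruction_handler"
    return "load_page_and_return"
```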
========================
= Other points to note =
========================
This section highlights some obscure cases of ARM operation which should be
borne in mind when writing code.
1) Use of R15
-------------
*************************************************************************
* WARNING: When the PC is used as a destination, operand, base or shift *
* register, different results will be obtained depending on *
* the instruction and the exact usage of R15 *
*************************************************************************
* Applicability: ARM2,ARM3 *
*************************************************************************
Full details of the value derived from or written into R15+PSR for each
instruction class are given in the datasheet. Care must be taken when using
R15 because small changes in the instruction can yield significantly
different results.
e.g. Consider data operations of the type:-

      <op>{cond}{S} Rd,Rn,Rm
   or <op>{cond}{S} Rd,Rn,Rm,<shiftname> Rs

a) When R15 is used in the Rm position, it will give the value of the PC
   together with the PSR flags.
b) When R15 is used in the Rn or Rs positions, it will give the value of
   the PC without the PSR flags (PSR bits replaced by zeros).

   MOV R0,#0
   ORR R1,R0,R15  ; R1:=PC+PSR (bits 31:26,1:0 reflect PSR flags)
   ORR R2,R15,R0  ; R2:=PC     (bits 31:26,1:0 set to zero)

NOTE:
The relevant instruction description in the ARM datasheets should be
consulted for full details of the behaviour of R15.
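
The difference between cases (a) and (b) is a mask over the PSR bits. The
sketch below models only that masking; it does not model how far ahead of the
current instruction the PC value is, for which the datasheet should be
consulted. The constant and function names are chosen for illustration.

```python
# Sketch: value seen when R15 is read in different operand positions.

PSR_MASK = 0xFC000003  # bits 31:26 and 1:0 hold the PSR
PC_MASK = 0x03FFFFFC   # bits 25:2 hold the word-aligned PC


def r15_as_rm(r15):
    """R15 in the Rm position: PC together with the PSR bits."""
    return r15 & 0xFFFFFFFF


def r15_as_rn_or_rs(r15):
    """R15 in the Rn or Rs position: PC with the PSR bits read as zero."""
    return r15 & PC_MASK
```

For an R15 of &6000800B, the Rm position reads &6000800B and the Rn position
reads &00008008.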
2) STM: Inclusion of the base in the register list
--------------------------------------------------
***********************************************************************
* WARNING: In the case of a STM with writeback that includes the base *
* register in the register list, the value of the base *
* register stored depends upon its position in the register *
* list *
***********************************************************************
* Applicability: ARM2,ARM3 *
***********************************************************************
During a STM, the first register is written out at the start of the second
cycle of the instruction. When writeback is specified, the base is written
back at the end of the second cycle. A STM which includes storing the base
with the base as the first register to be stored will therefore store the
unchanged value, whereas with the base second or later in the transfer
order, it will store the modified value.
e.g.
   MOV   R5,#&1000
   STMIA R5!,{R5-R6}   ; Stores value of R5=&1000

   MOV   R5,#&1000
   STMIA R5!,{R4-R5}   ; Stores value of R5=&1008
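
The two results above follow from the rule that the base is stored unchanged
only when it is the first register transferred. A small model of this for
STMIA with writeback is given below; it is a sketch of the behaviour
described in this section, with a hypothetical function name, and does not
cover the other addressing modes.

```python
# Sketch: which value of the base register an "STMIA Rb!,{...}" stores
# when Rb is itself in the register list (ARM2/ARM3 behaviour as described).

def stmia_stored_base(base_reg, base_value, reglist):
    """Return the value stored for `base_reg`.

    Registers are transferred in ascending register-number order, so the
    base is stored first only if it is the lowest register in the list.
    """
    regs = sorted(set(reglist))
    if base_reg not in regs:
        raise ValueError("base register is not in the register list")
    if regs[0] == base_reg:
        return base_value                # stored before write back
    return base_value + 4 * len(regs)    # stored after write back
```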
3) MUL/MLA: Register restrictions
---------------------------------
****************************************************
* Given MUL Rd,Rm,Rs *
* or MLA Rd,Rm,Rs,Rn *
* *
* Then Rd & Rm must be different registers *
* Rd must not be R15 *
****************************************************
* Applicability: ARM2,ARM3 *
****************************************************
Due to the way that Booth's algorithm has been implemented, certain
combinations of operand registers should be avoided. (The assembler will
issue a warning if these restrictions are overlooked.)
The destination register (Rd) should not be the same as the Rm operand
register, as Rd is used to hold intermediate values and Rm is used
repeatedly during the multiply. A MUL will give a zero result if Rm=Rd, and
a MLA will give a meaningless result.
The destination register (Rd) should also not be R15. R15 is protected from
modification by these instructions, so the instruction will have no effect,
except that it will put meaningless values in the PSR flags if the S bit is
set.
All other register combinations will give correct results, and Rd, Rn and
Rs may use the same register when required.
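
A code generator can apply the same test as the assembler before emitting a
multiply. The following sketch simply restates the two restrictions from the
box above; the function name and message strings are illustrative.

```python
# Sketch: check MUL/MLA operand registers against the restrictions above.

def mul_operand_problems(rd, rm):
    """Return a list of restriction violations for MUL/MLA Rd,Rm,...
    An empty list means the register choice is acceptable."""
    problems = []
    if rd == rm:
        problems.append("Rd and Rm must be different registers")
    if rd == 15:
        problems.append("Rd must not be R15")
    return problems
```

For instance, MUL R1,R1,R2 is rejected, while MUL R1,R2,R1 is accepted
because only Rd and Rm are required to differ.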
4) LDM/STM: Address Exceptions
------------------------------
************************************************************************
* WARNING: Illegal addresses formed during a LDM or STM operation will *
* not cause an address exception *
************************************************************************
* Applicability: ARM2,ARM3 *
************************************************************************
Only the address of the first transfer of a LDM or STM is checked for an
address exception; if subsequent addresses over- or under-flow into illegal
address space they will be truncated to 26 bits but will not cause an
address exception trap.
e.g. Assume processor is in a non-user mode & MEMC being accessed:-
{these examples are very contrived}
MOV R0,#&04000000 ; R0=&04000000
STMIA R0,{R1-R2} ; Address exception reported (base address illegal)
MOV R0,#&04000000
SUB R0,R0,#4 ; R0=&03FFFFFC
STMIA R0,{R1-R2} ; No address exception reported (base address legal)
; code will overwrite data at address &00000000
NOTE:
The exact behaviour of the system depends upon the memory manager to which
the processor is attached; in some cases, the wraparound may be detected and
the processor aborted.
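
The wraparound can be illustrated by modelling the addresses an STMIA
generates. This is a sketch of the processor behaviour described above only;
it ignores any checking the memory manager may add, and the names are chosen
for illustration.

```python
# Sketch: addresses produced by "STMIA Rb,{...}" in a 26-bit address space.

ADDRESS_MASK = 0x03FFFFFF  # 26-bit address space


def stmia_addresses(base, count):
    """Return (address_exception, addresses) for `count` word transfers.

    Only the first address is checked; later addresses are truncated to
    26 bits without raising an address exception.
    """
    if base > ADDRESS_MASK:
        return True, []
    return False, [(base + 4 * i) & ADDRESS_MASK for i in range(count)]
```

With a base of &03FFFFFC and two registers, no exception is reported and the
second address is &00000000, matching the second example above.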
5) LDC/STC: Address Exceptions
------------------------------
************************************************************************
* WARNING: Illegal addresses formed during a LDC or STC operation will *
* not cause an address exception (affects LDF/STF) *
************************************************************************
* Applicability: ARM2,ARM3 *
************************************************************************
The coprocessor data transfer operations act like STM and LDM with the
processor generating the addresses and the coprocessor supplying/reading the
data. As with LDM/STM, only the address of the first transfer of an LDC or
STC is checked for an address exception; if subsequent addresses over- or
under-flow into illegal address space they will be truncated to 26 bits but
will not cause an address exception trap.
Note that the floating point LDF/STF instructions are forms of LDC & STC!
e.g. Assume processor is in a non-user mode & MEMC being accessed:-
{these examples are very contrived}
MOV R0,#&04000000 ; R0=&04000000
STC CP1,CR0,[R0] ; Address exception reported (base address illegal)
MOV R0,#&04000000
SUB R0,R0,#4 ; R0=&03FFFFFC
STFD F0,[R0] ; No address exception reported (base address legal)
; code will overwrite data at address &00000000
NOTE:
The exact behaviour of the system depends upon the memory manager to which
the processor is attached; in some cases, the wraparound may be detected and
the processor aborted.
6) LDC: Data transfers to a coprocessor fetch more data than expected
---------------------------------------------------------------------
***************************************************************************
* Data to be transferred to a coprocessor with the LDC instruction should *
* never be placed in the last word of an addressable chunk of memory, nor *
* in the word of memory immediately preceding a read-sensitive memory *
* location *
***************************************************************************
* Applicability: ARM3 *
***************************************************************************
Due to the pipelining introduced into the ARM3 coprocessor interface, an
LDC operation will cause one extra word of data to be fetched from the
internal cache or external memory by ARM3 and then discarded; if the extra
data is fetched from an area of external memory marked as cacheable, a whole
line of data will be fetched and placed in the cache.
A particular case in point is that an LDC whose data ends at the last word
of a memory page will load and then discard the first word (and hence the
first cache line) of the next page. A minor effect of this is that it may
occasionally cause an unnecessary page swap in a virtual memory system. The
major effect of it is that (whether in a virtual memory system or not), the
data for an LDC should never be placed in the last word of an addressable
chunk of memory: the LDC will attempt to read the immediately following
non-existent location and thus produce a memory fault.
e.g. Assume processor is in a non-user mode, FPU hardware attached and MEMC
being accessed:-
{this example is very contrived}
MOV R13,#&03000000 ; R13=Address of I/O space
STFD F0,[R13,#-8]! ; Store F.P. register 0 at top of physical memory
; (two words of data transferred)
LDFD F1,[R13],#8 ; Load F.P. register 1 from top of physical memory
; but THREE words of data are transferred, and the
; third access will read from I/O space which may be
; read sensitive! *** BEWARE ***
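
A data layout can be checked against this rule by listing the addresses an
LDC will actually read, including the extra word. The sketch below is
illustrative: the function names are hypothetical, and the "unsafe" region
stands for any range that is either non-existent or read sensitive.

```python
# Sketch: addresses read by an ARM3 LDC, including the one extra word.

def ldc_addresses_read(base, words):
    """Addresses read for an LDC of `words` words starting at `base`:
    the requested words plus one further word, which is discarded."""
    return [base + 4 * i for i in range(words + 1)]


def ldc_is_safe(base, words, unsafe_start, unsafe_end):
    """True if no address read falls in [unsafe_start, unsafe_end)."""
    return all(not (unsafe_start <= addr < unsafe_end)
               for addr in ldc_addresses_read(base, words))
```

In the example above, the two-word load from &02FFFFF8 also reads
&03000000, so it is not safe with respect to the I/O space starting there.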