Origins of the SWP instruction
From: john@acorn.co.uk (John Bowler)
Subject: Re: Multiprocessing Archimedes??
Date: 16 Aug 91 11:10:50 GMT
torq@GNU.AI.MIT.EDU (Andrew Mell) writes:
>I notice that the Arm3 has a new instruction over the Arm2 which is
>SWP. It swaps a byte or a word between register and external memory.
>(uninterruptible between the read and write)
^^^^^^^^^^^^^^^
Indeed, but not necessarily not interleavable with other memory operations
(sorry about the double negative :-). In particular, to fully support the
SWP on a system with multiple memory bus masters the memory control logic
which decides which bus master has access to the memory next would have to
force an interlock between the memory read and memory write of the SWP
instruction. Now, the ARM3 has a LOCK pin for this, but to support
multi-processors you need to connect it to something :-).
>All very interesting you might say, but it intrigues me as this sort
>of instruction is usually only used in multiprocessor systems as a
>software semaphore.
>
>Why did Acorn add this instruction to the Arm3?
Because a long time ago, when we were very young (;-) we tried to write a
multi-threaded OS (ARX) and we ``found'' (sic, thought) that it was
spending a lot of time going into supervisor mode and disabling interrupts
so that it could implement mutexes (for user mode code - including the OS,
which ran in user mode too). In theory SWP allows user code to implement
mutexes efficiently.
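As a rough illustration of the idea (not Acorn's ARX code), a mutex needs nothing more than an atomic swap. The Python sketch below models SWP with a hypothetical `SwapCell` class; the internal `threading.Lock` only stands in for the atomicity the hardware provides, and the names `try_claim` and `release` are invented for this example.

```python
import threading

FREE, HELD = 0, 1


class SwapCell:
    """One word of memory with an atomic swap, standing in for SWP."""

    def __init__(self, value=FREE):
        self._value = value
        # Assumption: this lock models the indivisible read+write of SWP.
        self._guard = threading.Lock()

    def swp(self, new_value):
        """Store new_value and return the previous contents in one step."""
        with self._guard:
            old = self._value
            self._value = new_value
            return old


def try_claim(cell):
    """Write HELD unconditionally; the mutex is ours only if it was FREE."""
    return cell.swp(HELD) == FREE


def release(cell):
    """Hand the mutex back by swapping FREE in."""
    cell.swp(FREE)
```

A failed claim is harmless: it overwrites HELD with HELD, so no supervisor call or interrupt masking is needed on either path.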
As far as I am concerned the MP aspects of SWP are bonuses (clearly these
were considered at the same time - or the LOCK pin wouldn't be there).
Notice that SWP always bypasses the cache; again this is MP support. However,
there is an omission here in that it is impossible to do a (reliable) read
from external memory (you might get the cache contents instead!)
John Bowler (jbowler@acorn.co.uk)
From: john@acorn.co.uk (John Bowler)
Subject: Re: Multiprocessing Archimedes??
Date: 19 Aug 91 16:25:33 GMT
julian@bridge.welly.gen.nz writes:
>john@acorn.co.uk (John Bowler) writes:
>
>> Notice that SWP always bypasses the cache; again this is MP support, however
>> there is an omission here in that it is impossible to do a (reliable) read
>> from external memory (you might get the cache contents instead!)
>
>If you're using it to implement semaphores, this is not a problem, as you'd
>never need to access the semaphore with any instruction other than SWP.
Yes; there is no problem with the semaphore, but the semaphore must be
protecting some state which is shared. When a processor has claimed that
semaphore it probably needs to read the state and to obtain consistent
results when it reads it. If the data is in cacheable memory the only way
it can do that is to use sequences of the form:-
SWP rx, rx, [raddr] ; read a value out
STR rx, [raddr] ; and put it back... :-(
The alternative is to allocate shared data in uncacheable memory. This
requires some OS intervention (a user program cannot simply allocate
shareable data structures out of its own heap unless the whole heap is
uncacheable) and uncacheable data obviously has a performance hit.
>BTW. You wouldn't happen to know the instruction format for SWP, by any
> chance? If a software emulator can be written for it for ARM2 machines
> (like the FPE - or even add it to the FPE) then we can all start using
> it.
RISC iX 1.2 emulates the SWP instruction on machines which do not support
it. RISC OS doesn't. The assembler syntax is:-
SWP{cond}{B} Rd, Rm, [Rn]
the semantics (except for the cache behaviour and so on) are:-
MOV temp, Rm
LDR{cond}{B} Rd, [Rn]
STR{cond}{B} temp, [Rn]
(ie the SWP Rx, Rx, [Raddr] example above *does* store the *old* Rx value
in [Raddr]... :-).
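The three-step semantics above can be modelled in a few lines of Python. This is only a sketch: the register file is a plain list, memory is a little-endian `bytearray`, the address is assumed to be suitably aligned, and the condition field, aborts and cache behaviour are ignored. The function name `swp` and its argument order are choices made for this example.

```python
def swp(regs, memory, rd, rm, rn, byte=False):
    """Model SWP{B} Rd, Rm, [Rn] on a list of registers and a bytearray."""
    size = 1 if byte else 4
    address = regs[rn]
    # Step 1: copy the source register to a temporary (truncated for SWPB).
    temp = regs[rm] & ((1 << (8 * size)) - 1)
    # Step 2: load the old memory contents into the destination register.
    regs[rd] = int.from_bytes(memory[address:address + size], "little")
    # Step 3: store the temporary, i.e. the OLD value of Rm, to memory.
    memory[address:address + size] = temp.to_bytes(size, "little")


regs = [0] * 16
memory = bytearray(16)
memory[8:12] = (0x11223344).to_bytes(4, "little")
regs[0], regs[1] = 0xAABBCCDD, 8
swp(regs, memory, 0, 0, 1)  # SWP R0, R0, [R1]: a pure swap
```

Because the source is copied before the destination is loaded, using the same register for both still stores the old register value.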
The instruction format is:-
bit 31                                                    bit 0
c.o.n.d.0.0.0.1 0.B.0.0.n.n.n.n d.d.d.d.0.0.0.0 1.0.0.1.m.m.m.m
c.o.n.d - the condition
B       - 0 = swap word
          1 = swap byte
n.n.n.n - Rn
d.d.d.d - Rd
m.m.m.m - Rm
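For anyone assembling the instruction by hand, the bit layout above translates directly into shifts and ORs. The helper below is a sketch with an invented name (`encode_swp`); the default condition of 0xE is the standard ARM "always" code.

```python
def encode_swp(rd, rm, rn, byte=False, cond=0xE):
    """Build the 32-bit word for SWP{cond}{B} Rd, Rm, [Rn]."""
    for reg in (rd, rm, rn):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers must be 0..15")
    if not 0 <= cond <= 15:
        raise ValueError("condition must be 0..15")
    return (
        (cond << 28)         # c.o.n.d
        | (0b0001 << 24)     # fixed 0.0.0.1
        | (int(byte) << 22)  # B: 0 = word, 1 = byte
        | (rn << 16)         # n.n.n.n
        | (rd << 12)         # d.d.d.d
        | (0b1001 << 4)      # fixed 1.0.0.1
        | rm                 # m.m.m.m
    )
```

For example, `encode_swp(0, 1, 2)` gives &E1020091 for `SWP R0, R1, [R2]`, and adding `byte=True` sets bit 22 to give &E1420091.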
Data aborts (from the memory manager) leave Rd/Rm as they were before.
SWP bypasses the ARM3 cache, although the write operation still updates
the cache (if the address is cached). I don't know whether the read
will cause the rest of that part of the cache to be updated (I assume
not, and the programmer should not care :-)
John Bowler (jbowler@acorn.co.uk)
From: dseal@armltd.co.uk (David Seal)
Subject: Re: ARM3 instructions.
Date: 4 Sep 92 15:01:12 GMT
In article <4422@gos.ukc.ac.uk> amsh1@ukc.ac.uk (Brian May#2) writes:
> I don't have an Archie myself but have used them quite a lot in the past.
>I was recently mucking about with a friend's A5000, trying to find the new
>instructions that turned the cache on and off. I found them, they were
>co-processor instructions with the processor itself as (I think) number 0.
Coprocessor 15, in fact.
> Anyway, as I was disassembling away I found a new instruction (well, I had
>never come across it before). It was 'SWP' and I imagine it swaps registers
>with registers, maybe with memory as well? I can't remember. If it does
>reg<->mem as well, and is uninterruptable, perhaps it is for use as a
>semaphore in multi-processor systems?
The SWP instruction was new to the ARM2as macrocell. I believe ARM3 was the
first full chip which contained it. More recent macrocells and chips like
ARM6, ARM60, ARM600 and ARM610 also contain it.
It only swaps a register with a memory location (either a byte or a word),
and not two registers. It can however read the new contents of the memory
location from one register, and write the old contents of the memory
location to another register - i.e. it doesn't have to do a pure swap. This
may be the source of your idea that it can swap two registers. It is indeed
uninterruptible, and yes, it is intended for semaphores.
> Of course I won't be the first person to notice this so I wondered, could
>someone post some info on this, and also on the co-processor instructions
>relevant to the CPU itself?
The SWP instruction:
Bits 31..28: Usual condition field
Bits 27..23: 00010
Bit 22: 0 for a word swap, 1 for a byte swap
Bits 21..20: 00
Bits 19..16: Base register (addresses the memory location involved)
Bits 15..12: Destination register (where the old memory contents go)
Bits 11..4: 00001001
Bits 3..0: Source register (where the new memory contents come from)
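A disassembler or a software emulator has to go the other way and recognise the instruction from its fixed bits. The sketch below derives a mask and pattern from the field list above; `SWP_MASK`, `SWP_BITS` and `decode_swp` are names made up for this example.

```python
# Fixed fields: bits 27..23 = 00010, bits 21..20 = 00, bits 11..4 = 00001001.
SWP_MASK = 0x0FB00FF0
SWP_BITS = 0x01000090


def decode_swp(word):
    """Return the SWP fields of a 32-bit word, or None if it is not a SWP."""
    if word & SWP_MASK != SWP_BITS:
        return None
    return {
        "cond": (word >> 28) & 0xF,
        "byte": bool(word & (1 << 22)),
        "base": (word >> 16) & 0xF,
        "dest": (word >> 12) & 0xF,
        "source": word & 0xF,
    }
```

The condition field and bit 22 are left out of the mask, so every conditional and byte variant matches the same pattern.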
Byte swaps use the bottom byte of the source and destination registers,
and clear the top three bytes of the destination register. There are
various rules about how R15 works in each register position, similar to
those for LDR and STR instructions. The destination and source registers
are allowed to be the same for a pure swap. I don't know offhand what
would happen if the base register were equal to one or both of the others,
but I don't think I'd recommend doing it!
Assembler syntax is (using <> around optional sections):
SWP<cond><B> Rdest,Rsrc,[Rbase]
The ARM3 cache control registers are all coprocessor 15 registers, accessed
by MRC and MCR instructions in non-user modes. (They will produce invalid
operation traps in user mode.)
Coprocessor 15 register 0 is read only and identifies the chip - e.g.:
Bits 31..24: &41 - designer code for ARM Ltd.
Bits 23..16: &56 - manufacturer code for VLSI Technology Inc.
Bits 15..8: &03 - identifies chip as an ARM3.
Bits 7..0: &00 - revision of chip.
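Once the register value has been read (that needs an MRC in a non-user mode, which Python cannot do), splitting it into its fields is simple. A small sketch, with `decode_id` and `is_arm3` as hypothetical helper names; checking only the designer and part fields is an assumption made here so that ARM3s from other manufacturers are not rejected.

```python
def decode_id(value):
    """Split the coprocessor 15 register 0 value into its four byte fields."""
    return {
        "designer": (value >> 24) & 0xFF,
        "manufacturer": (value >> 16) & 0xFF,
        "part": (value >> 8) & 0xFF,
        "revision": value & 0xFF,
    }


def is_arm3(value):
    """True if the designer is ARM Ltd. (&41) and the part code is &03."""
    fields = decode_id(value)
    return fields["designer"] == 0x41 and fields["part"] == 0x03
```

The example fields above pack into the word &41560300.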
Coprocessor 15 register 1 is simply a write-sensitive location - writing any
value to it flushes the cache.
Coprocessor 15 register 2: a miscellaneous control register.
Bit 0 turns the cache on (if 1) or off (if 0).
Bit 1 determines whether user mode and non-user modes use the same address
mapping. Bit 1 is 1 if they do, 0 if they have separate address
mappings. It should be 1 for use with MEMC.
Bit 2 is 0 for normal operation, 1 for a special "monitor mode" in which
the processor is always run at memory speed and all addresses and data
are put on the external pins, even if the memory request was satisfied
by the cache. This allows external hardware like a logic analyser to
trace the program properly.
Other bits are reserved for future expansion. Code which is trying to set
the whole control register (e.g. at system initialisation time) should
write these bits as zeros to ensure compatibility with any such future
expansions. Code which is just trying to change one or two bits (e.g.
turn the cache on or off) should read this register, modify the bits
concerned and write it back: this ensures that it won't have unexpected
side effects in the future like turning as-yet-undefined features off.
This register is reset to all zeros when the ARM3 is reset.
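The read-modify-write advice can be illustrated on plain integers. The sketch below only computes values: actually reading and writing the register needs MRC and MCR in a non-user mode. The constant and function names are invented for the example.

```python
CACHE_ON = 1 << 0        # bit 0: cache enabled
SHARED_MAPPING = 1 << 1  # bit 1: user and non-user modes share a mapping
MONITOR_MODE = 1 << 2    # bit 2: monitor mode


def set_cache_enabled(control, enabled):
    """Change only bit 0, leaving every other (possibly future) bit alone."""
    if enabled:
        return control | CACHE_ON
    return control & ~CACHE_ON & 0xFFFFFFFF


def initial_control(cache_on=False):
    """Whole-register value for initialisation: reserved bits written as 0."""
    return SHARED_MAPPING | (CACHE_ON if cache_on else 0)
```

`set_cache_enabled` is the "change one bit" case, and `initial_control` is the "set the whole register" case for a MEMC system, where bit 1 should be 1.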
Coprocessor 15 register 3: controls whether areas of memory are cacheable,
in 2 megabyte chunks. All accesses to an uncacheable area of memory go
to the real memory and not to the cache - this is a suitable setting
e.g. for areas containing memory-mapped IO, or for doubly mapped areas
of memory.
Bit 0 is 1 if virtual addresses &0000000-&01FFFFF are cacheable, 0 if they
are not.
Bit 1 is 1 if virtual addresses &0200000-&03FFFFF are cacheable, 0 if they
are not.
:
:
Bit 31 is 1 if virtual addresses &3E00000-&3FFFFFF are cacheable, 0 if
they are not.
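Since each bit covers 2 megabytes (2^21 bytes) of the 64 megabyte address space, the bit number for an address is just the address shifted right by 21. A short sketch, using the invented helper names `chunk_bit` and `range_mask`:

```python
CHUNK_SHIFT = 21          # 2 MB chunks
TOP_ADDRESS = 0x3FFFFFF   # last address covered by bit 31


def chunk_bit(address):
    """Bit number in registers 3-5 that covers the given virtual address."""
    if not 0 <= address <= TOP_ADDRESS:
        raise ValueError("address outside the 64 MB address space")
    return address >> CHUNK_SHIFT


def range_mask(first, last):
    """Mask with one bit set for every chunk touched by first..last."""
    mask = 0
    for bit in range(chunk_bit(first), chunk_bit(last) + 1):
        mask |= 1 << bit
    return mask
```

For instance, `range_mask(0x1000000, 0x13FFFFF)` yields &300, i.e. bits 8 and 9.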
Coprocessor 15 register 4: controls whether areas of memory are updateable,
in 2 megabyte chunks. All write accesses to a non-updateable area of
memory go to the real memory only, not to the cache - this is a suitable
setting for areas of memory that contain ROMs, for instance, since you
don't want the cached values to be altered by an attempt to write to the
ROM. (Or, as in MEMC, by an attempt to write to write-only locations
that share an address with the read-only ROMs.)
Bit 0 is 1 if virtual addresses &0000000-&01FFFFF are updateable, 0 if
they are not.
Bit 1 is 1 if virtual addresses &0200000-&03FFFFF are updateable, 0 if
they are not.
:
:
Bit 31 is 1 if virtual addresses &3E00000-&3FFFFFF are updateable, 0 if
they are not.
Coprocessor 15 register 5: controls whether areas of memory are disruptive,
in 2 megabyte chunks. Any write access to a disruptive area of memory
will cause the cache to be flushed. This is a suitable setting for areas
of memory which if written, could cause cache contents to become invalid
in some way. E.g. on MEMC, writing to the physically addressed memory at
addresses &2000000-&2FFFFFF will also usually change a virtually
addressed location's contents: if this location is in cache, a
subsequent attempt to read it would read the old value. To avoid this
problem, the physically addressed memory should be marked as disruptive
in a MEMC system. Similarly, any remapping of memory on a MEMC or other
memory controller should act disruptively, since the cache contents are
liable to have become invalid.
Bit 0 is 1 if virtual addresses &0000000-&01FFFFF are disruptive, 0 if
they are not.
Bit 1 is 1 if virtual addresses &0200000-&03FFFFF are disruptive, 0 if
they are not.
:
:
Bit 31 is 1 if virtual addresses &3E00000-&3FFFFFF are disruptive, 0 if
they are not.
Coprocessor 15 registers 3-5 are in an undefined state after power-up: they
must be programmed correctly before the cache is turned on.
Note that you should check that the identity code in coprocessor 15 register 0
identifies the chip as an ARM3 before assuming that the other registers can
be used as stated above, unless you are absolutely certain your code can
only ever be run on an ARM3. Otherwise you are likely to run into problems
with other chips - e.g. an ARM600 uses the same coprocessor 15 registers to
control its cache and MMU, but in a completely different way. Just about the
only thing they do have in common is that coprocessor 15 register 0 contains
an identification code as described above.
David Seal
dseal@armltd.co.uk
All opinions are mine only...
From: mhardy@acorn.co.uk (Michael Hardy)
Subject: Re: Risc-OS Documentation
Date: 15 Aug 91 09:45:14 GMT
Organization: Acorn Computers Ltd, Cambridge, England
ARM3 SUPPORT
============
Introduction and Overview
=========================
The ARM3Support module provides commands to control the use of the ARM3
processor's cache, where one is fitted to a machine. The module will
immediately kill itself if you try to run it on a machine that only has an
ARM2 processor fitted.
Summary of facilities
---------------------
Two * Commands are provided: one to configure whether or not the cache is
enabled at a power-on or reset, and the other to independently turn the
cache on or off.
There is also a SWI to turn the cache on or off. A further SWI forces the
cache to be flushed. Finally, there is also a set of SWIs that control how
various areas of memory interact with the cache.
The default setup is such that all RISC OS programs should run unchanged
with the ARM3's cache enabled. Consequently, you are unlikely to need to
use the SWIs (beyond, possibly, turning the cache on or off).
Notes
-----
A few poorly-written programs may not work correctly with ARM3 processors,
because they make assumptions about processor timing or clock rates.
Finding out more
----------------
For more details of the ARM3 processor, see the Acorn RISC Machine family
Data Manual. VLSI Technology Inc. (1990) Prentice-Hall, Englewood Cliffs,
NJ, USA: ISBN 0-13-781618-9.
SWI Calls
=========
Cache_Control (SWI &280)
========================
Turns the cache on or off
On entry
--------
R0 = EOR mask
R1 = AND mask
On exit
-------
R0 = old state (0 => cacheing was disabled, 1 => cacheing was enabled)
Interrupts
----------
Interrupts are disabled
Fast interrupts are enabled
Processor mode
--------------
Processor is in SVC mode
Re-entrancy
-----------
Not defined
Use
---
This call turns the cache on or off. Bit 0 of the ARM3's control register 2
is altered by being masked with R1 and then exclusive ORd with R0: ie new
value = ((old value AND R1) XOR R0). Bit 1 of the control register is also
set, forcing the memory controller to use the same translation table for
both User and Supervisor Modes (as indeed the MEMC chip should). Other bits
of the control register are set to zero.
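One possible reading of the description above, as a Python model rather than the module's real code: only bit 0 goes through the mask arithmetic, bit 1 is forced on, and everything else is cleared. The function name `cache_control` and the tuple it returns are inventions for this sketch.

```python
def cache_control(old_control, eor_mask, and_mask):
    """Model Cache_Control: return (old state for R0, new control value)."""
    old_state = old_control & 1
    # Bit 0 only: new value = ((old value AND R1) XOR R0).
    cache_bit = ((old_control & and_mask) ^ eor_mask) & 1
    # Bit 1 is always set; all other bits are written as zero.
    new_control = cache_bit | 0b10
    return old_state, new_control
```

Under this reading, R0=1, R1=0 turns the cache on, R0=0, R1=0 turns it off, R0=1, R1=&FFFFFFFF toggles it, and R0=0, R1=&FFFFFFFF reads the state while leaving bit 0 unchanged.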
Related SWIs
------------
None
Related vectors
---------------
None
Cache_Cacheable (SWI &281)
==========================
Controls which areas of memory may be cached
On entry
--------
R0 = EOR mask
R1 = AND mask
On exit
-------
R0 = old value (bit n set => 2MBytes starting at n*2MBytes are cacheable)
Interrupts
----------
Interrupts are disabled
Fast interrupts are enabled
Processor mode
--------------
Processor is in SVC mode
Re-entrancy
-----------
Not defined
Use
---
This call controls which areas of memory may be cached (ie are cacheable).
The ARM3's control register 3 is altered by being masked with R1 and then
exclusive ORd with R0: ie new value = ((old value AND R1) XOR R0). If bit n
of the control register is set, the 2MBytes starting at n*2MBytes are
cacheable.
The default value stored is &FC007FFF, so ROM, the RAM disc and logical
non-screen RAM are cacheable, but I/O space, physical memory and logical
screen memory are not.
(You may find a value of &FC007CFF - which disables cacheing the RAM disc -
gives better performance.)
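The EOR/AND mask convention is easy to get backwards, so here is the arithmetic as a small Python sketch. `apply_masks`, `clear_bits` and `set_bits` are hypothetical helpers; the last two return the (R0, R1) pair to pass to the SWI.

```python
WORD = 0xFFFFFFFF


def apply_masks(old, eor_mask, and_mask):
    """new value = ((old value AND R1) XOR R0), truncated to 32 bits."""
    return ((old & and_mask) ^ eor_mask) & WORD


def clear_bits(bits):
    """(R0, R1) pair that clears the given bits and keeps the rest."""
    return 0, ~bits & WORD


def set_bits(bits):
    """(R0, R1) pair that sets the given bits and keeps the rest."""
    return bits, ~bits & WORD


RAM_DISC = 0x300  # bits 8 and 9
eor_mask, and_mask = clear_bits(RAM_DISC)
new_value = apply_masks(0xFC007FFF, eor_mask, and_mask)
```

Clearing bits 8 and 9 of the default &FC007FFF gives the suggested &FC007CFF; the corresponding call passes R0=0 and R1=&FFFFFCFF.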
Related SWIs
------------
Cache_Updateable (SWI &282), Cache_Disruptive (SWI &283)
Related vectors
---------------
None
Cache_Updateable (SWI &282)
===========================
Controls which areas of memory will be automatically updated in the cache
On entry
--------
R0 = EOR mask
R1 = AND mask
On exit
-------
R0 = old value (bit n set => 2MBytes starting at n*2MBytes are updateable)
Interrupts
----------
Interrupts are disabled
Fast interrupts are enabled
Processor mode
--------------
Processor is in SVC mode
Re-entrancy
-----------
Not defined
Use
---
This call controls which areas of memory will be automatically updated in
the cache when the processor writes to that area (ie are updateable). The
ARM3's control register 4 is altered by being masked with R1 and then
exclusive ORd with R0: ie new value = ((old value AND R1) XOR R0). If bit n
of the control register is set, the 2MBytes starting at n*2MBytes are
updateable.
The default value stored is &00007FFF, so logical non-screen RAM is
updateable, but ROM/CAM/DAG, I/O space, physical memory and logical screen
memory are not.
Related SWIs
------------
Cache_Cacheable (SWI &281), Cache_Disruptive (SWI &283)
Related vectors
---------------
None
Cache_Disruptive (SWI &283)
===========================
Controls which areas of memory cause automatic flushing of the cache on a
write
On entry
--------
R0 = EOR mask
R1 = AND mask
On exit
-------
R0 = old value (bit n set => 2MBytes starting at n*2MBytes are disruptive)
Interrupts
----------
Interrupts are disabled
Fast interrupts are enabled
Processor mode
--------------
Processor is in SVC mode
Re-entrancy
-----------
Not defined
Use
---
This call controls which areas of memory cause automatic flushing of the
cache when the processor writes to that area (ie are disruptive). The
ARM3's control register 5 is altered by being masked with R1 and then
exclusive ORd with R0: ie new value = ((old value AND R1) XOR R0). If bit n
of the control register is set, the 2MBytes starting at n*2MBytes are
disruptive.
The default value stored is &F0000000, so the CAM map is disruptive, but
ROM/DAG, I/O space, physical memory and logical memory are not. This causes
automatic flushing whenever MEMC's page mapping is altered, which allows
programs written for the ARM2 (including RISC OS itself) to run unaltered,
but at the expense of unnecessary flushing on page swaps.
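To see which addresses a default value actually covers, the mask can be expanded back into address ranges. The sketch below uses a made-up helper, `chunk_ranges`, and relies only on the rule that bit n covers the 2MBytes starting at n*2MBytes.

```python
CHUNK_SHIFT = 21  # each bit covers 2 MB


def chunk_ranges(mask):
    """List (first, last) address pairs for each run of set bits in mask."""
    ranges = []
    bit = 0
    while bit < 32:
        if (mask >> bit) & 1:
            start = bit
            # Extend the run while consecutive bits are set.
            while bit < 32 and (mask >> bit) & 1:
                bit += 1
            ranges.append((start << CHUNK_SHIFT, (bit << CHUNK_SHIFT) - 1))
        else:
            bit += 1
    return ranges
```

With the disruptive default, `chunk_ranges(0xF0000000)` returns the single range &3800000-&3FFFFFF.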
Related SWIs
------------
Cache_Cacheable (SWI &281), Cache_Updateable (SWI &282)
Related vectors
---------------
None
Cache_Flush (SWI &284)
======================
Flushes the cache
On entry
--------
-
On exit
-------
-
Interrupts
----------
Interrupts are disabled
Fast interrupts are enabled
Processor mode
--------------
Processor is in SVC mode
Re-entrancy
-----------
Not defined
Use
---
This call flushes the cache by writing to the ARM3's control register 1.
Related SWIs
------------
None
Related vectors
---------------
None
* Commands
==========
*Cache
======
Turns the cache on or off, or gives the cache's current state
Syntax
------
*Cache [On|Off]
Parameters
----------
On or Off
Use
---
*Cache turns the cache on or off. With no parameter, it gives the cache's
current state.
Example
-------
*Cache Off
Related commands
----------------
*Configure Cache
Related SWIs
------------
Cache_Control (SWI &280)
Related vectors
---------------
None
*Configure Cache
================
Sets the configured cache state to be on or off
Syntax
------
*Configure Cache On|Off
Parameters
----------
On or Off
Use
---
*Configure Cache sets the configured cache state to be on or off.
Example
-------
*Configure Cache On
Related commands
----------------
*Cache
Related SWIs
------------
Cache_Control (SWI &280)
Related vectors
---------------
None
******************************************************************************
I hope this helps.
- Michael J Hardy Email: mhardy@acorn.co.uk
Acorn Computers Ltd Telephone: +44 223 214411
Cambridge TechnoPark Fax: +44 223 214382
645 Newmarket Road Telex: 81152 ACNNMR G
Cambridge CB5 8PB
England Disclaimer: All opinions are my own, not Acorn's
From: osmith@acorn.co.uk (Owen Smith)
Subject: Re: Risc-OS Documentation
Date: 13 Aug 91 15:06:19 GMT
The ARM3 SWIs really aren't all that interesting, and I've just totally
failed to find a documentation file for them. However, as a taster, here
is a bit of BASIC (courtesy of Brian Brunswick) which marks the RAM disk
area as not cacheable. This in fact makes it go faster.
SYS "Cache_Cacheable", 0, &fffffcff
SYS "Cache_Updateable", 0, &fffffcff
The reason it goes faster is that because such large amounts of data are
being slurped around, the memory copy loop tends to get flushed out of
the cache, particularly since it is a long piece of loop unrolled code
(for speed on an ARM2). So you end up with a cache full of data, very little
of which is ever accessed again before it gets flushed out of the cache by
some more data. The loop does an LDM and STM 10 registers at a time in
RamFS, so in theory there are two words that get cached (the ARM3 reads 4
words at a time), but this saving is swallowed up by the cache
synchronisation delays.
You have to be careful though. Brian has his own re-sizing ram disk
which uses the system sprite area. Marking the system sprite area as not
cacheable makes it go slower. We (Brian and I) think this is because he
uses the C function memcpy(), in which the LDM and STM is 4 registers
at a time. Since this is a multiple of four, it hits the ARM bug where
it loads 5 words and then throws the fifth one away, which results in
loading 8 words on an ARM3 (it always reads 4 word chunks even with the
cache off). So with the cache off, you load 8 then throw 4 away, load the
next 8 (including the 4 you just threw away) and throw 4 away etc. So
you are effectively reading all the data twice. With the cache on this
goes down to once. Yes the code will probably get flushed out, but it
is a tight loop (not unrolled) so it is not very likely and the cost of
reloading the code is less than the saving on the data loads.
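The word counts quoted above can be reproduced with a deliberately crude model. It assumes, beyond what the post states, that every LDM starts on a 4-word boundary and always asks for one word more than it transfers, and that memory is only ever read in whole 4-word chunks. The function name `words_fetched` is invented for this sketch.

```python
LINE_WORDS = 4  # the ARM3 always reads memory in 4-word chunks


def words_fetched(registers, extra_words=1):
    """Words read from memory for one LDM of the given register count.

    Assumption: the transfer starts on a 4-word boundary and the
    processor requests `extra_words` more than it actually uses.
    """
    requested = registers + extra_words
    chunks = (requested + LINE_WORDS - 1) // LINE_WORDS  # round up
    return chunks * LINE_WORDS


four_at_a_time = words_fetched(4)   # the memcpy() case
ten_at_a_time = words_fetched(10)   # the RamFS loop
```

In this model a 4-register LDM reads 8 words to use 4 (each word is read twice with the cache off), while a 10-register LDM reads 12 words, leaving the two spare words mentioned for the RamFS loop.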
The moral of this is to be careful with the ARM3 SWIs: don't just assume
that a change ought to make things go faster. Do timings, in lots of
different screen modes.
Owen.