Inline assembler on PowerPC

 
 
David R Brooks
      06-13-2005, 01:50 PM
Consider the following (compiler=GCC3.4.3, host=I686,
target=powerpc-eabi):

typedef void(*pVoid)(void);

static inline bool1 kSetVector(uint1 level, pVoid func, int type) {
    int r;
    const int code = 0;
    __asm__ __volatile__ (
        " li 0, %1 \n" /* code */
        " mr 3, %2 \n" /* level */
        " mr 4, %3 \n" /* func */
        " mr 5, %4 \n" /* type */
        " sc \n"       /* System Call: may corrupt regs: result in r3 */
        " mr %0, 3 \n" /* Return result */
        : "=r" (r)
        : "rI" (code), "0" (level), "r" (func), "r" (type)
        : "r0", "cc", "memory"
    );
    return r;
}
....
(void)kSetVector(31, SerialIoInterrupt, 3);

This compiles, & runs fine (producing the code below). However I
would like to improve the efficiency, by eliminating the "mr"
instructions to move arguments to & from registers. The "sc" needs the
data in precisely the registers shown, so GCC needs to be coaxed into
using those registers itself.

Generated code (comments added):

54:h/services.h **** static inline bool1 kSetVector(uint1 level,
pVoid func, int type) {
203 .loc 2 54 0
204 019c 3940001F li 10,31 /* level */
205 01a0 3D200000 lis 9,SerialIoInterrupt@ha /* func */
206 01a4 39290000 la 9,SerialIoInterrupt@l(9)
207 01a8 39600003 li 11,3 /* type */
208 .LBB3:
55:h/services.h **** int r;
56:h/services.h **** const int code = 0;
57:h/services.h **** __asm__ __volatile__ (
209 .loc 2 57 0
210 01ac 38000000 li 0, 0
211 01b0 7D435378 mr 3, 10 /* The "mr's" I want to remove */
212 01b4 7D244B78 mr 4, 9
213 01b8 7D655B78 mr 5, 11
214 01bc 44000002 sc
215 01c0 7C6A1B78 mr 10, 3 /* result */

In the X86 builds of GCC, there are "register loading codes", as "c",
"a" & "D" in the following example (from: "Using Inline Assembly With
gcc" by Clark L. Coleman).

asm ("cld\n\t" "rep\n\t" "stosl"
: /* no output registers */
: "c" (count), "a" (fill_value), "D" (dest)
: "%ecx", "%edi" );

Is there a similar device for the PowerPC, whereby I can tell GCC to
create the values in specific registers, so eliminating the need for
those "mr" instructions?
TIA,

 
l'indien
      06-13-2005, 09:46 PM

Imho, the easiest way is to do it ... in C:
static inline bool1 kSetVector (uint1 level, pVoid func, int type)
{
    register uint1 _level __asm__ ("r3");
    register pVoid _func __asm__ ("r4");
    register int _type __asm__ ("r5");

    _level = level;
    _func = func;
    _type = type;
    __asm__ __volatile__ (
        "li 0, %1 \n"
        "sc \n"
        : "=r" (_level)
        : "rI" (code)
        : "r0", "cc", "memory");

    return _level;
}

Then gcc can optimise the register allocation itself and emit mr or lwz
only where necessary.
The second thing to consider is that this code is easier to read than
burying the register moves inside the inline assembly.
The only drawback is that you have to use the same local variable for the
first argument and the returned value.

[...]

 
David Brown
      06-14-2005, 06:59 AM

Of course, you will still get pretty much the same "mr" instructions in
the stand-alone version of the function (if one is generated); only the
in-lined copies can have them eliminated.

And I presume you are only doing this optimisation for interest and
understanding, not because you are setting vectors so often that a
3-cycle delay here will be a serious issue?

David
 
l'indien
      06-14-2005, 07:57 AM
On Tue, 14 Jun 2005 08:59:01 +0200, David Brown wrote:

> Of course, you will still get pretty much the same "mr" instructions in
> the stand-alone version of the function (if it is generated) - it is
> only in in-lined versions that they could be eliminated.


You won't have any mr in the stand-alone version: the arguments are
passed in registers from r3 onwards, so level is already in r3, func in
r4 and type in r5.
As the return value also goes in r3, there won't be any mr at all.
When I compile this function as a standalone one, I get:
00000000 <kSetVector>:
0: 38 00 00 00 li r0,0
4: 44 00 00 02 sc
8: 4e 80 00 20 blr

Which is optimal.

> And I presume you are only doing this optimisation for interest and
> understanding, not because you are setting vectors so often that 3
> cycles delay here will be a serious issue?


We always want optimal code, don't we ? ;-)

 
David R Brooks
      06-14-2005, 10:12 AM
Many thanks. That works with one addition: you still have to mention
all the arguments to the "sc" (_level, _func, _type) on the inputs
line, else GCC will optimise them away.
I got it down to:

static inline bool1 kSetVector (uint1 level, pVoid func, int type)
{
    register uint1 _code __asm__ ("r0") = 0;
    register uint1 _level __asm__ ("r3") = level;
    register pVoid _func __asm__ ("r4") = func;
    register int _type __asm__ ("r5") = type;

    __asm__ __volatile__ (
        "sc \n"
        : "=r" (_level)
        : "rI" (_code), "0" (_level), "r" (_func), "r" (_type)
        : "cc", "memory" );

    return _level;
}


 
l'indien
      06-14-2005, 10:33 AM
On Tue, 14 Jun 2005 18:12:31 +0800, David R Brooks wrote:

> Many thanks. That works with one addition: you still have to mention
> all the arguments to the "sc" (_level, _func, _type) on the inputs
> line, else GCC will optimise them away.


You're absolutely right. I have to admit I wrote it down without testing...

> I got it down to:
>
> static inline bool1 kSetVector (uint1 level, pVoid func, int type)
> {
> register uint1 _code __asm__ ("r0") = 0;
> register uint1 _level __asm__ ("r3") = level;
> register pVoid _func __asm__ ("r4") = func;
> register int _type __asm__ ("r5") = type;
>
> __asm__ __volatile__ (
> "sc \n"
> : "=r" (_level)
> : "rI" (_code), "0" (_level), "r" (_func), "r" (_type)
> : "cc", "memory" );
>
> return _level;
> }


I just have two questions/remarks:
- why don't you directly initialise _code = code? This would make the
code even easier to read and won't produce any extra output code.
- I would use the "+r" constraint for _level, to follow the gcc asm
constraint specifications. But I'm not a specialist on this point, I
must admit...
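
For illustration, a minimal sketch of what the "+r" variant could look like, not tested on real hardware. It assumes the system call leaves r0, r4 and r5 intact, and it uses plain int / unsigned int in place of the header's own bool1 / uint1 types. The names kSetVectorRW, last_sc and hostSelfCheck are made up for this sketch, and the #else branch is only a host-side stand-in so the file also builds on a non-PowerPC machine. Plain "r" is used for the inputs because the template never references them, so an immediate alternative would serve no purpose.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*pVoid)(void);

#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))

/* PowerPC build: pin each value to the register the "sc" handler expects. */
static inline int kSetVectorRW(unsigned int level, pVoid func, int type)
{
    register unsigned int code_  __asm__("r0") = 0;     /* function code */
    register unsigned int level_ __asm__("r3") = level; /* argument in, result out */
    register pVoid        func_  __asm__("r4") = func;
    register int          type_  __asm__("r5") = type;

    __asm__ __volatile__(
        "sc"
        : "+r" (level_)                         /* r3 is both read and written */
        : "r" (code_), "r" (func_), "r" (type_) /* listed so GCC keeps them live */
        : "cc", "memory");

    return (int)level_;
}

#else

/* Host stand-in (assumption): records what would have been passed to "sc". */
struct sc_record { unsigned int code, level; pVoid func; int type; };
static struct sc_record last_sc;

static inline int kSetVectorRW(unsigned int level, pVoid func, int type)
{
    last_sc.code  = 0;
    last_sc.level = level;
    last_sc.func  = func;
    last_sc.type  = type;
    return 0; /* pretend the call succeeded */
}

/* Host-only check that the stand-in saw the expected arguments. */
static inline void hostSelfCheck(void)
{
    (void)kSetVectorRW(31, NULL, 3);
    assert(last_sc.level == 31 && last_sc.type == 3);
}

#endif
```

If the handler may overwrite r0, r4 or r5, those would presumably need to become "+r" operands as well, since GCC does not allow a register used by an input operand to appear in the clobber list.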



 
David R Brooks
      06-14-2005, 10:39 PM
Answering your questions:
1. _code is explicitly a constant: being the function code. There are
several similar definitions in the header file, having different names
& corresponding function codes. The number of arguments varies too.
2. "+r", although legal in pure asm, is not accepted by GCC.
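
On point 1, one way such a family of wrappers might be organised is sketched below. This is only a guess at the shape of the header, not its actual contents: kword, kcall1, kcall3, last_call, kMaskLevel and function code 1 are all hypothetical, and only kSetVector with code 0 comes from this thread. The "+r" operand assumes a GCC that accepts it, and the #else branch is a host-side stand-in so the sketch builds on a non-PowerPC machine.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*pVoid)(void);
typedef uintptr_t kword; /* assumption: one register-width value per argument */

#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))

/* One primitive per argument count, so unused registers are never loaded. */
static inline kword kcall1(kword code, kword a1)
{
    register kword r0_ __asm__("r0") = code;
    register kword r3_ __asm__("r3") = a1;
    __asm__ __volatile__("sc" : "+r" (r3_) : "r" (r0_) : "cc", "memory");
    return r3_;
}

static inline kword kcall3(kword code, kword a1, kword a2, kword a3)
{
    register kword r0_ __asm__("r0") = code;
    register kword r3_ __asm__("r3") = a1;
    register kword r4_ __asm__("r4") = a2;
    register kword r5_ __asm__("r5") = a3;
    __asm__ __volatile__("sc"
        : "+r" (r3_)
        : "r" (r0_), "r" (r4_), "r" (r5_)
        : "cc", "memory");
    return r3_;
}

#else

/* Host stand-in (assumption): remembers code and arguments of the last call. */
static kword last_call[4];

static inline kword kcall1(kword code, kword a1)
{
    last_call[0] = code; last_call[1] = a1;
    last_call[2] = 0;    last_call[3] = 0;
    return 0;
}

static inline kword kcall3(kword code, kword a1, kword a2, kword a3)
{
    last_call[0] = code; last_call[1] = a1;
    last_call[2] = a2;   last_call[3] = a3;
    return 0;
}

#endif

/* Each service is then a thin wrapper that supplies its own function code. */
static inline int kSetVector(unsigned int level, pVoid func, int type)
{
    return (int)kcall3(0, level, (kword)func, (kword)type);
}

/* Hypothetical one-argument service with a made-up function code. */
static inline int kMaskLevel(unsigned int level)
{
    return (int)kcall1(1, level);
}
```

Because everything is static inline and the function code is a compile-time constant at each call site, the intent is that GCC still loads it with a single li into r0, as in the hand-written version.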


 
R Adsett
      06-15-2005, 02:49 AM
In article <(E-Mail Removed)>,
(E-Mail Removed) says...
> On Tue, 14 Jun 2005 08:59:01 +0200, David Brown wrote:
>
> > l'indien wrote:
> > And I presume you are only doing this optimisation for interest and
> > understanding, not because you are setting vectors so often that 3
> > cycles delay here will be a serious issue?

>
> We always want optimal code, don't we ? ;-)


Actually no. Readable (human readable) and correct first. Optimal is,
at best, a distant third.

Robert
 
l'indien
      06-15-2005, 07:27 AM
On Wed, 15 Jun 2005 06:39:40 +0800, David R Brooks wrote:

> Answering your questions:
> 1. _code is explicitly a constant: being the function code. There are
> several similar definitions in the header file, having different names
> & corresponding function codes. The number of arguments varies too.


OK, sorry, I misread your code...

> 2. "+r", although legal in pure asm, is not accepted by GCC.


I did the test, gcc does accept it.
"+r" is documented in gcc documentation (I'm using gcc 2.95.3 as a PowerPC
cross compiler).

[...]

 
Anton Erasmus
      06-16-2005, 10:16 AM
On Tue, 14 Jun 2005 22:49:54 -0400, R Adsett
<(E-Mail Removed)> wrote:

>In article <(E-Mail Removed)>,
>(E-Mail Removed) says...
>> On Tue, 14 Jun 2005 08:59:01 +0200, David Brown wrote:
>>
>> > l'indien wrote:
>> > And I presume you are only doing this optimisation for interest and
>> > understanding, not because you are setting vectors so often that 3
>> > cycles delay here will be a serious issue?

>>
>> We always want optimal code, don't we ? ;-)

>
>Actually no. Readable (human readable) and correct first. Optimal is,
>at best, a distant third.


Optimal implies correct code. One cannot describe anything as an
optimal solution if it does not do what it is supposed to do.
Things that are obscure at first become very "Human Readable" if they
are the optimum solution to a problem.
Readable code for even a complete newbie programmer is total black
magic to the average lay person.

Regards
Anton Erasmus


 