Motherboard Forums


Reply
Thread Tools Display Modes

Parallel Battle/Core Executors for Corewars/Red code on the gpu

 
 





















Skybuck Flying
Guest
Posts: n/a

 
      09-02-2009, 01:31 AM


Hello,

Below I will describe my preliminary/sketch idea(s)/concept(s) for parallel
battle/core executors for corewars/red code on the gpu.

I will focus on sequential execution per battle... however the battles
themselfes will be executed in parallel so that multiple battles can take
place at the same time.

Though this would require detecting when a battle is done and can be
replaced by a new battle... if more battles are to be processed... otherwise
the "done" battle would not be executed anymore and it would simply wait
until all other battles are done... so that for example an evolution
algorithm can start after all battles are done.

The idea is to stuff all the cores into texture(s)... as well as all other
necessary data... like instruction pointers, process queue, possibly even
p-space.

For now I will focus on cores and instruction pointers/instruction execution
only... because if that can be done then the rest should be easily doable as
well.

It seems like primarily two things need to happen:

1. Instructions need to be executed.

2. Cores need to be updated.

I will now explain these two steps further.

Step 1:

For the instructions reads and writes might need to occur. Reading is not a
problem on the gpu. Writing is a problem and this cannot be done. Only
writing to the current cell can be done and this is not sufficient that's
why step 2 is needed.

However to solve this problem for step 1 the following might be possible:

"Artificial registers will be created".

These registers will hold all information about the execution of an
instruction.

So for example:

1. What data/memory cells where read by the instruction ? (might not be
necessary)
2. What data/memory cells where written by the instruction ?
3. What was the read data's content ?
4. What was the written data's content ?

There will be a worst case scenerio... meaning that the most complex
instruction will only affect X cells... say 5 or so....

So there only need to be enough registers to contain all possible data for a
"worst case instruction".

Now the GPU can happily execute an instruction and record all the necessary
information.

This could be done in a vertex program/shader as step 1. Maybe vertex
shaders could be used to trigger depths for step2/textures so that only the
necessary cells become active.

Ok I will now explain step 2.

Step 2:

In step 2 the cores need to be updated and any other variables/arrays like
pspace, process queues and what not... again I will limit myself to cores
for now... since the rest could more or less use the same technique.

I am not sure what implementation possibilities there exist but in the worst
case:

All core cells would be examined by a fragment/pixel shader... it moves
across the cells and at each cell it examines if this cell was affected by
the instruction that was executed... and if so... how it needs to alter this
cell to comply with the instruction.

This is the update.

Should be pretty easy to do... just a few comparisions would be needed with
some recorded information/fields about the instruction.

For example to give you an idea if you dont have one :

We are at cell 5 so ask the question: Did the instruction write to cell 5 ?
(us).

If so then ask next question: What is the content that must be written to
cell 5 ? Answer is again the record instruction information fields.

I am pretty much convinced that this should be easily doable.

The question is can somehow all cells be deactived except those that the
instruction affected in the vertex program ?

So to cut the question down to something simple that graphics programmers
can understand:

1. Is it possible for a vertex shader to disable certain
processing/pixels/fragments in the fragment shader ? (Can vertex shaders
dismiss some of the work)

The answer is probably yes.

2. The second question would than be how is this possible implementation
wise ?

(I am not sure but I read something about

2.1 Maybe sciccors.
2.2 Maybe occlusion queries.
2.3 Maybe depths.

Maybe you graphics guys can further explain or have some idea's ?

Anyway the vertex shaders can also access textures on more recent graphics
cards. So the vertex shaders could also already do some processing of the
instruction. Especially if necessary I would not expect any problems here...
So let's of possibilities me thinks.

So let's see if I covered everything concerning instruction execution and
core updates.

The last thing that needs to be done is prepare for the next cycle/round. So
all that needed to be done is specifiy the next instruction pointer if
any... and possibility a spawn instruction pointer... and that's also done
at step 1. This information could simply be written back to the
instruction/vertexes themselfes... unless they need to be preserved for step
2... in that case some extra output registers needed... but all in all
should be doable.

Some further explinations about step 1:

Even at step 1 vertex shaders can probably only modify the verteces
themselfes...

So input vertex = output vertex.

So to be able to do a sort of 1 to many... there need to be many input
verteces=output verteces.

Each vertex would be a certain type/data.

The vertex could be the instruction pointer or
The vertex could be the instruction field A or
The vertex could be the instruction field B or
The vertex could be the instruction modifier
The vertex could be the next instruction pointer
The vertex could be the spawn instruction pointer.

So for every possibility/output there is an input.

Depending on the location of the vertex it would be a certain type....

So the vertex at the start of it's processing only has to figure out what
kind of vertex it is and what it should be looking for/what kind of
processing it should do for the instruction.

So for example:

If vertex location is number 4 then vertex is the instruction modifier.

So it's like an array of integers/floats, where each element performs a
specific role.

I hope I made myself perfectly clear and I think I did !

Maybe a little bit of parallel processing for the instruction could happen.

If a vertex needs information from multiple locations to do it's thing than
that should not be a problem since the vertex can do multiple reads/gathers.
Alternatively larger data types could be used to do more in the same shader
like floats 4 or so... but that probably not really necessary... an array of
single floats might be enough... the choice would be arbitrare except maybe
for performance reasons... or maybe it doesn't matter performance wise... I
don't know yet. But any implementation at this point would do ! So later
this is something that could be experimented with to see if one or the other
style gives more performance

Ok, I am pretty much done now, I think I gave you guys a pretty good idea
how a core executor and therefore multiple core executors could be
implemented on a gpu !

One last explanation about that last concept:

Multiple cores/battles/information could be stored into one big texture.

Battle information 1 would be at offset 0 to 32000.
Battle information 2 would be at offset 32001 to 64000.
Battle information 3 would be at offset 64001 to 96000.
And so forth...

This way all verteces and pixel shaders can use arithmetic to understand to
which battle they belong by simply determine the battle base address first
for example:

Battle Number = Offset div 32000; <- indicates to what battle they belong

Battle Vertex offset = Offset mod 32000; <- indicates where the vertex is
within the "battle information array" so it can understand what type it is.

Same goes for core cells.

Cores could start at for example Battle Base + 8000;

if (Offset >= Battle Base + Core Start) and (Offset <= Battle Base + Core
End) then
begin
// pixel is a core cell.
end;

Ok that should be enough for now !

Bye,
Skybuck.


 
Reply With Quote
 
Terry Newton
Guest
Posts: n/a

 
      09-02-2009, 04:45 PM
Hi,

On Sep 1, 7:31*pm, "Skybuck Flying" <BloodySh...@hotmail.com> wrote:
> 1. Instructions need to be executed.
>
> 2. Cores need to be updated.


Yes the essential tasks for parallel computing.
First some observations...

The ultimate strength of an evolving scheme is determined by
the environment being simulated, NOT by processing power.
Parallel processing, supercomputers, etc, just provides the
answer faster. Instead of taking days to "max out", it might
only take minutes. But better output requires a better algorithm.

Execution.... a good MARS that can run on a GPU is essential.
Obviously :-) the evolving control code is much simpler so
that probably isn't a problem.

Updating... the less the cores have to talk to each other,
the faster it goes. I think the best scheme would be to take
a large soup and assign each core a subset of soup locations,
so that only the edges of the areas have to be communicated.

Gotta go...
Terry
 
Reply With Quote
 
Skybuck Flying
Guest
Posts: n/a

 
      09-02-2009, 05:09 PM
You should start working on a modified/new evolver where your evolver can
supply multiple or all battles in one go... so that they can be fought in
parallel.

Ok ?

Battle/warrior information could be in one file or multiple files... or
whatever you want...

Could be simply text or xml...

Could also be binary... can your qbasic read/write to binary files... I was
wondering that...

Also are you able to execute your evolver on Windows XP ?

I would like to see your evolver become faster by using such a parallel
battle/core executor.

At the moment I am having a little bit of problems with opengl... it's
support for Delphi seems to suck... I could switch to another language like
C but then I would probably have to use the Horribly Slow Visual Studio


At least thanks to C everything should then be working... even for opengl
with extensions ? but me not sure.

But then again I would really like to have something working in Delphi since
I know it well...

Also if xp is not a possibility for you then maybe, just maybe free pascal
could be an option to produce something for linux or so...

But then again... maybe your evolver only works in some dosbox ? I do
believe I ran it in windows 95 so it should ran at least on xp... Then
there is also wine but that would suck for speed me thinks

Also C would have added benefit of just using PMARS source code for redcode
compiler/assembler... though I could probably rip that out stuff it in DLL
and use it in Delphi...

I could do the same with the opengl stuff.

Maybe first try opengl in C to see if it's working with extensions...

Then develop concept code in Delphi...

Port it to C... implement it with opengl stuff... and then run it... etc...

If it's working make a nice DLL... use that again in Delphi... to do cool
stuff.

Or simply integrate everything into pmars

Actually I think this could be a nice way to go... because I would probably
have to develop a prototype first.

Might as well do that in Delphi with it's reliable language...

Then later port it to C...

Only problem is I cannot experiment with OpenGL and learn from it and that
sux...

I was hoping to do that to get more of a feel for what's possible... <- I
need that...

So this still needs to be resolved.... and I rather experiment with robust
Delphi on my side then ****ing C crap

But then again that might not be possible so then I have to start screwing
with C which bores me to death.Also very annoying.. all the stupid
abbrevations like Cd instead of Code... and Blzrd instead of Blizard... and
Cntx instead of Contex... like those few letters matter LOL.

DirectX shitty complex... but I am gonna give that a try first... at least
that should compile/work in Delphi but now I need to go find some good
up-to-date headers again

Or alternatively a good opengl32.dll with extensions... don't know if that
exists...

So many possibly paths to follow... so that's where I am at at the moment


Bye,
Skybuck.


 
Reply With Quote
 
Skybuck Flying
Guest
Posts: n/a

 
      09-02-2009, 06:40 PM
Ok it seemed to be an initialization issue with the opengl
api/headers/pascal units.

I now believe:

0. Basic opengl functions must be loaded.

1. First a rendering context must be made.

2. Then the special extensions can be loaded.

I didn't do 1 because I am a newb and I didn't think it was necessary...
since the dlg opengl unit says: InitOpenGL...

So I thought that was enough or so... and that rendering context might come
later or so... which it does... but it has to come before loading the
extensions !

<GJE>

Now that that's cleared up I should be able to make some further progress
tomorrow with Delphi and OpenGL and experimentation ! =DDDD

Bye,
Skybuck =D


 
Reply With Quote
 
Terry Newton
Guest
Posts: n/a

 
      09-02-2009, 08:46 PM
On Sep 2, 11:09 am, "Skybuck Flying" <BloodySh...@hotmail.com> wrote:
> You should start working on a modified/new evolver where your evolver can
> supply multiple or all battles in one go... so that they can be fought in
> parallel.
>
> Ok ?


The algorithm I've been using for the last 10 years or so is
fully parallel-ready and can be mapped to as many cores is available.
Up to a point anyway, if too fine-grained then communications
overhead exceeds the benefit of additional cores. But something
in the 10-100 core range should be doable and provide great
speed increase but not necessarily better results, in fact any
parallel scheme should probably be modelled using conventional
techniques first before wasting time with all the parallel stuff.
Any real advances in evolving technology will also work on
a regular system, just take much longer.

> Could also be binary... can your qbasic read/write to binary files... I was
> wondering that...


Yes QBasic writes binary... but so does just about every BASIC.
Lately I've been using better solutions than QBasic that provide
far better cross-compatibility... FreeBasic is very nice and can
link to modules written in other languages (C asm etc). On the
other end of the scale, Blassic is very primitive (line numbers:-)
but runs on just about anything. Both of these solutions are
(or can be) compatible with QBasic, but permit the code to
be run natively under Dos, Windows, Linux and probably Mac
too (Mac's are based on BSD which has much the same
compiler tools, so things like pmars Blassic etc might work).

But this doesn't matter for multi-core GPU processing... for
that have to use tools that are supported by the environment.
Oh it's a huge endevour! but an interesting one - there are
companies making boards with multi GPU cores - maybe
even Cell processors - for making personal supercomputers
for uses way beyond Corewar. There are special parallel-ready
languages like OpenMP for such things.

> Also are you able to execute your evolver on Windows XP ?


Of course. All of them (I dabble in many evolvers, each designed
to test different ideas - and for REBS you'll need Cygwin but
RedMixer mostly replaces REBS). Vista might complain about the
more dossy stuff (one reason why I'm moving to more universal
solutions) but XP is not a problem. I use Linux but most of my
"customers" use Windows so I lean towards solutions that
let essentially the same code run on either OS.

> So many possibly paths to follow... so that's where I am at at the moment


Figure out how to get code running in a GPU with communications
to a control program running normally, that'd be a good start.

Terry
 
Reply With Quote
 
Terry Newton
Guest
Posts: n/a

 
      09-03-2009, 07:53 AM
To clarify a couple of things...

On Sep 2, 2:46*pm, Terry Newton <nb...@bellsouth.net> wrote:
> The algorithm I've been using for the last 10 years or so is
> fully parallel-ready and can be mapped to as many cores is available.


Essentially that algorithm is:

Pick two warriors that are "near by" in the soup
Battle them with MARS, determine winner (if tie pick one)
Replace the loser with a mutated copy of the winner
Repeat until the evolver is stopped.

Of course there's a huge amount of variability within this
simple recipe - soup selection determines topology (shape)
of the soup which determines the rate of spread, rounds
determines accuracy (and forgiveness, that's important)
and mutations vary widely including changing code,
inserting/deleting/swapping lines, crossover, etc.
Together these things determine what kind of code
tends to be generated and how strong it can get.
Other factors can be mixed in like a hill, valhalla,
benchmark-based guidance, hints, etc. There are other
ways to evolve but I usually use this overall method
because it's simple and easy to program, and sometimes
works very well (especially for nano).

Anyway, the basic process is easily cast in parallel.
One way with multi core shared memory is like:

Spawn thread that runs:
Pick two warriors that aren't "busy"
Set busy flags for these warriors (*)
Battle them in MARS
Replace loser with mutated winner
Unset busy flags
Repeat

(*) must be done carefully to avoid race condition

This method is nice because the OS will automatically
pick the core for each spawned thread to run on, so
no sectioning is needed on the part of the programmer.

Another way to do it that would work well with a grid
topology but takes more programming is to divide the
grid into a bunch of sub grids with private memory,
surrounded by "shadow" locations from the other
sections copied to and from as accessed and updated.
Then each subgrid runs the simple algorithm as a
single thread (or multiple threads). Because memory
bandwith is increased by using private memory, this
method can be much faster assuming a suitable
machine is available to run it.

> ... There are special parallel-ready
> languages like OpenMP for such things.


Actually, OpenMP is a library that's linked to normal
languages like C or Fortran to make it easier to write
parallel programs. I'm not familiar with it, but I hear
about it a lot.

Terry
 
Reply With Quote
 
Skybuck Flying
Guest
Posts: n/a

 
      09-03-2009, 10:09 AM
Ok,

Since I already have a multi-core executor I shall make an executable for
you... so you can feed multiple battles to the executable and then the
executable will return all results after the last battle is done.

It will be interesting to see if you can put your skills where your writing
is LOL

I don't know exact details yet... but I will give it a try

Bye,
Skybuck.


 
Reply With Quote
 
Skybuck Flying
Guest
Posts: n/a

 
      09-03-2009, 10:49 AM
I already have algorithms in place for battling multiple warriors be it 1 v
1 or multi-fights.

Busy flag is not needed.

Warriors are simply copied to battles so all battles can happen in parallel.

Once all battles are over the warrior with the best score is selected as
winner for the cell that was fought for.

It's been a while since I wrote the battlefield evolver of mine and the code
is very good but a bit fuzy... because of all the advanced oo and
inheritance stuff and such.

I think it's most easy if your evolver is plugged into my battlefield
"framework".

The my battlefield "executor" can take care of scheduling the battles and
executing them and such.

All you need to do is provide the warriors and evolve the warriors.

This would be really cool.

Only thing I need to add to my battlefield evolver is some kind of
plug-in/option/external system so that your evolver can be plugged into
it...

That would be funny.

It's been a while since I wrote that code so I am examining how this could
be possible

You would be locked into my evolver framework... this means you can't make
your own gui and all that stuff...

But advantages are:

1. More advanced/graphical gui
2. More speed... (multi-core and possibly gpu in future)
3. Possibly more speed (less harddisk activity hopefully)
4. Binary storage of evolution of warriors (costs less space and can be
played back in analyzer)
5. Different evolvers could be plugged into it in the future.

Bye,
Skybuck =D


 
Reply With Quote
 
Skybuck Flying
Guest
Posts: n/a

 
      09-03-2009, 10:56 AM
I think a DLL/plug-in-system would be a good choice to start with....

Then the DLL could use "harddisk" loading for now...

If you ever manage to write your evolver into a DLL then it will be
lightning fast !

And best thing is: "very very very low harddisk activity !"

So I go work on dll-plug-in...

Might even be interesting for myself...

Seperates battlefield execution from evolving !

Negative side is DLL limited to procedural/routine and not object
orientation unless it was a delphi package or maybe a .NET something... but
I am not even sure if Delphi Win32 can handle .NET so probably not... could
port code to .NET then object orientation across different languages could
be possible... but .NET easily reverse engineerable LOL.

So for now I will try to limit myself to a simple DLL plugin system... maybe
I don't use it myself... so then it would just be for you and maybe anybody
else wanting to have a play with it...

Ofcourse I will test it so you have enough features available to do decent
evolving just like I would do directly.

Bye,
Skybuck.


 
Reply With Quote
 
Skybuck Flying
Guest
Posts: n/a

 
      09-03-2009, 02:02 PM
Ok,

Plug-in is in the working.

(Actually it probably won't be a plug in... see below: it will probably just
be an executor )
(So I wasted some time writing a plug-in but that's not bad.. I learned from
it and now understand better what to do... and how to interface ! probably !
)

I am now considering two possibilities:

1. Single cell evolver. (Input: Loser, Winner, Coordinates) (Output: Evolved
Warrior, AutoColor(yes/no))

2. Battlefield/multiple cell evolver.(Input: previous battlefield, current
battlefield)

The difference is the single cell evolver will be called many many many
times, so then a simple evolver can be written which simply evolves a step
at a time... so like one warrior so the user doesn't have to walk the
battlefield... that is done for him...

Or if the user wants to walk the battlefield himself... then option 2 would
be better.

For example I can remember that you wrote in the passed that your evolver
allows the winner to select a mate to breed with... option 2 might be better
for that... then the battlefield can be examined.

There is also the possible issue of "state management".

Does your evolver need to maintain state between evolving generations ?

The answer to that question could be a possible/surprising: "yes".

For mine too probably...

The "problem" is with the random number generators... they could be put on
time... but still that's not so cool/good (?)

Random number generators need to maintain "state" -> the seed.

Then there could also be the problem of maintaining a "generation count"...
my battlefield evolver already does that... and this information could be
provided.

One last possibility could be that my evolver maintains state for your
evolver... but then things could get a little bit complex for you and
everybody really... it's possible though... anything is possible

There is still the issue of communicating the qbasic program... I simply
don't know what would be best

Maybe it's time we start working towards each other. Well you already told
your qbasic can provide input and output in binary files... so that's a big
plus I hope !

Maybe I need to change the plug in idea/architecture...

I think the easiest would be as follows:

1. Your evolver generates warriors on the battlefield... like new warriors
or evolved warriors... that way it can do whatever it wants.

2. Your evolver writes all the warriors in binary format to the battlefield
file.
(at the input warriors section)

3. Your evolver calls my battlefield executor.

4. My battlefield executor reads the battlefield file/warriors.
(at input warriors section)

5. My battlefield executor schedules battles for the input warriors.

6. My battlefield executor determines the battlefield winners and writes
them to the battlefield file.
(to output warriors section)

7. My battlefield executor returns to dos...

8. Your qbasic program continues running and reads the output warriors to
determine who won...
(at output warriors section)

(warriors will have an x and y... the output_warrior.x and y indicate where
it came from... from the inputs)

This is pretty much how my evolver works internally... so this is a proven
method.

This will work very well...

All that needs to be done by your qbasic file is simply overwrite the
battlefield file...

Or alternatively... expand to it...

Since my battlefield executor can load and load and load until it encounters
the last battlefield round and it will continue from there...

This way replay is possible too.

However I would not recommend constantly re-reading the entire battlefield
file...

So that's probably why it's better to simply overwrite it...

Actually this isn't even a problem...

Since my battlefield evolver can simply append all battlefields to a
seperate file...

So all that's really need is two things:

1. My battlefield executor writes to a special "replay" file.

2. My battlefield executor has a special loading possibility... where it
loads a different file/battlefield... which will be your generated
battlefield file !

It's that simple ! WOW

Only thing that I probably need to do is set generation to 1 or so.. for
automatic exiting or so...

Or simply break out of everything... or I could further modify it... but I
won't do that because it would be nice if my executable can remain just the
way it is so I can continue using it to evolve for myself...

So by just triggering some options/booleans/variables it would go into
"console mode" or so... and "single battlefield mode".

Where it simply performs one battlefield fight and returns/exits.

It could even be a simply gui app that terminates itself... but then your pc
would start flicking with gui...

But I know delphi can switch/trigger between console and gui so maybe that a
possible...

Or I would need to make a slight special version for console mode...

And by simply using branches I can get rid of the painter thread and such...

I think it's doable...

Only thing left me to do is explain the battlefield format/warrior
format/instruction format to you... and then hopefully you can manage to
write some code for it ?!?

Sounds like a good plan to me... whatta ya say ?

Bye,
Skybuck.


 
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Failure to Eject John McWilliams Apple 41 12-24-2008 08:35 PM
Security update 2008-2 Michelle Steiner Apple 0 03-19-2008 12:19 AM
sun poker free $200 bonus code mansion poker free $200 bonus codespoker free $200 bonus codes poker free $200 bonus listings poker reload free$200 bonus poker sites free $200 bonus poker free $200 bonus offers free $200bonus poker strategy full tilt susi40004@googlemail.com Apple 0 03-13-2008 06:06 AM
Skybuck's Universal Code 4,Delphi Demonstration Program Skybuck Flying Asus 8 05-28-2007 11:30 PM
Code curosity G. Michael Paine Apple 14 04-16-2006 12:37 PM


All times are GMT. The time now is 07:09 AM.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43