Motherboard Forums



Converting a floating point texture to an RGBA texture so it's ready to be flipped to the screen ?! ;)

Skybuck Flying
Guest
Posts: n/a

 
      10-02-2009, 10:04 AM


Hmm,

Maybe I am in luck... I already have the pmars source code in C... I didn't write
it myself... but it's similar to mine.

If I can get the source code compiling in Visual Studio I might be able to
do some profiling with the AMD analyzer/profiler to hopefully see what the
bottleneck might be.

Bye,
Skybuck.


 
 
Skybuck Flying
Guest
Posts: n/a

 
      10-02-2009, 10:23 AM
I think I already tried that in the past... so screw the pmars code... if there
was an AMD analyzer for C++ Builder then it would have been useful.

Converting my code to C/C++ shouldn't be too hard, except maybe for the
threading code... but that could be interesting as well. If that fails I can
always switch to non-multi-threaded code...

For now I am going to do a conversion to C/C++ because it's badly needed: I
have no idea what the real bottleneck is... and that makes it hard to come
up with a good bottleneck-fighting strategy.

Bye,
Skybuck.


 
 
Skybuck Flying
Guest
Posts: n/a

 
      10-02-2009, 10:39 AM
Having to use Visual Studio and converting my code to C/C++ is depressing.

However I could use other editors like C++ Builder to ease the pain
somewhat...

Then finally I would have to use Visual Studio.

To get over this depression I am now going to play some "CoH ToS"

For inspiration and happiness LOL.

^ downtime coming ^ =D

Bye,
Skybuck


 
 
Skybuck Flying
Guest
Posts: n/a

 
      10-02-2009, 02:24 PM
Hmm ok

1. Played a bit of CoH some time ago...
2. Tried a Delphi to C++ conversion tool (a 1.5 trial, it said)... but it was no
good.
3. And investigated the possibility of writing my own Delphi to C/C++
converter...
It might be possible but it would require a whole lot of time and a whole
lot of testing.
Mostly to figure out how the parser/lexer being used works.

It would require too much time for now methinks... but could be an interesting
project for the future...

As far as I know there is no decent Delphi to C++ tool ? Thus such a good
tool could be popular ?!? And sell well ? Any potential buyers out there ?


Anyway the tool could be useful for myself as well... to quickly convert
Delphi code to C/C++ code... to then benchmark/profile it with AMD Code
Analyst and any other future tools.

However maybe it's possible to make Delphi interface with AMD Code
Analyst... haven't explored that... but that would probably be even more
difficult if not impossible ?!

Bye,
Skybuck.


 
 
Skybuck Flying
Guest
Posts: n/a

 
      10-02-2009, 03:25 PM
Some other ideas to consider:

1. Speculative execution of all core cells. This would probably lead to many
conflicts, however output to different cells could be stored separately per
input/output core so at least all results would be ok. <- many unnecessary
executions at first and maybe later too

2. Speculative execution of all processes in the list <- a different way of
parallelism, could produce more useful executions but still very limited

The two ideas above are more "fun" ideas, they are not very serious...
but could be easy to implement.

Time for a totally different idea:

3. CPU does preprocessing of all-to-be-executed instructions per
core/simulator.

The CPU could have access to 2 GB of RAM (the virtual memory limit); 4 GB of RAM
would need to be enabled for kernel memory.

Total number of simulators for 1 v 1 warrior fights would be:

2 GB / 84,000 bytes = 2,147,483,648 / 84,000 = 25,565 simulators.
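
The division above can be double-checked with a few lines of Python. This is only a sketch: the 84,000 bytes per simulator is the figure from the post, and the names `ADDRESS_SPACE_BYTES`, `BYTES_PER_SIMULATOR` and `simulators_that_fit` are made up here for illustration.

```python
# Back-of-envelope check of the simulator count from the post.
ADDRESS_SPACE_BYTES = 2 * 1024**3   # 2 GB of virtual memory
BYTES_PER_SIMULATOR = 84_000        # per-simulator footprint quoted in the post


def simulators_that_fit(total_bytes: int, bytes_each: int) -> int:
    """Return how many whole simulators fit in total_bytes."""
    return total_bytes // bytes_each


if __name__ == "__main__":
    print(simulators_that_fit(ADDRESS_SPACE_BYTES, BYTES_PER_SIMULATOR))
```

Integer division is used on purpose, since a partially fitting simulator is of no use.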

Possibilities for memory locations per instruction are roughly:
1. A=A+B,
2. A=A+1,
3. A=A-1,
4. B=B+1,
5. B=B-1,
6. A=A/B,
7. A=B/A,
8. B=B/A,
9. B=A/B,
10. A=A*B,
11. B=B*A,
12. A=A mod A
13. A=B mod B
14. A=A mod B
15. A=B mod A
16. B=A mod A
17. B=B mod B
18. B=A mod B
19. B=B mod A

Maybe even all of these+1...

I am not sure how many possibilities there are...

Maybe 100 ? Maybe more ?

For now let's assume 100 or so.

This could mean 100 memory locations have to be read to be sure that all
locations are present for complete instruction execution and memory input
data and memory output data...

Actually the possibilities aren't that great... the pre-processor should be
able to know exactly which instruction type will be executed so the number
of possibilities will be very small... and can be pre-computed. However this
would almost be the same as actually executing it...

So another idea could be to do the pre-processing on the GPU as well... so I
guess this comes down to simply:

1. Processing the instructions on the GPU for as far as possible
2. Falling back to the CPU to get any necessary code or locations and supplying
them again to the GPU... or maybe another GPU pass can actually do all that.
3. Go back to the GPU and execute the remaining part of the instructions.

(Had this idea while letting this post "idle" for a while on my pc LOL )

Yeah so to keep this story short:

1. Process instructions on the GPU for as far as possible, then try to do
anything else in secondary/tertiary passes/multiple passes and so forth.

Yeah this is pretty much how I designed the original core gpu algorithm...
which also included loading/using multiple textures in the gpu up to 512 MB
!

So I was hoping to do just one texture map or so... but now it turns out
that would not give enough performance.

So to make a long story short: I must go back to the original core GPU design
and implement it massively !

However easier said than done... because more passes probably means more API
delay... and then the target might not be reached as well.

Target of course being insane speed !

Let's do some calculations...

Number of steps estimated for core gpu executor design: 21

21 passes * 0.152 milliseconds = 3.192 milliseconds required for all
steps...

1000 / 3.192 = 313 cycles per second... let's divide this by 2 just in
case... roughly 155 cycles per second.


25,565 simulators * 155 cycles = 3,962,575 cycles per second
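
The estimate above can be reproduced as a small Python sketch. The inputs (21 passes, 0.152 ms per pass, 25,565 simulators, and the rounded-down 155 cycles per second) are the figures from the post; the function and constant names are hypothetical.

```python
# Rough throughput estimate for the multi-pass design, using the post's numbers.
PASSES_PER_CYCLE = 21        # estimated steps in the core GPU executor design
MS_PER_PASS = 0.152          # measured time per pass, in milliseconds
SIMULATORS = 25_565          # simulators that fit in 2 GB
SAFE_CYCLES_PER_SECOND = 155 # the post's "divide by 2 just in case" figure


def cycles_per_second(passes: int, ms_per_pass: float) -> float:
    """Cycles per second for one simulator if every cycle needs `passes` passes."""
    return 1000.0 / (passes * ms_per_pass)


def total_cycles_per_second(simulators: int, per_simulator_rate: int) -> int:
    """Aggregate cycles per second when all simulators advance in lockstep."""
    return simulators * per_simulator_rate


if __name__ == "__main__":
    print(round(cycles_per_second(PASSES_PER_CYCLE, MS_PER_PASS)))
    print(total_cycles_per_second(SIMULATORS, SAFE_CYCLES_PER_SECOND))
```

Halving 313 exactly would give about 156.6, so the 155 used in the post is a slightly more pessimistic rounding; either way the total lands close to 4 million.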

Again 4 million cycles ?!?!? wtf ?!

Kinda funny how I keep hitting this 4 million limit !

Bye,
Skybuck =D


 
 
Skybuck Flying
Guest
Posts: n/a

 
      10-03-2009, 09:48 AM
Ok,

I just did some testing of the draw routine...

The speed in a tight loop without any data changes is about 20,000 frames
per second...

I am not sure if OpenGL actually renders each one or whether it detects that
nothing changed...

For now I will assume it renders each frame.

This means the actual speed in the scenario described could be 3 times
higher...

About 12,000,000 cycles per second.

However the scenario described is probably totally unrealistic since the CPU
could never supply 2 GB per frame...

That would be like 40,000 GB (about 40 TB) per second haha !
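
As a sanity check on that bandwidth figure, here is a tiny Python sketch. It assumes the worst case from the post (the full 2 GB re-uploaded every frame at 20,000 frames per second) and uses decimal units (1 TB = 1000 GB); the names are made up for illustration.

```python
# Worst-case upload bandwidth if the CPU had to resend everything each frame.
GB_PER_FRAME = 2             # full simulator memory, as assumed in the post
FRAMES_PER_SECOND = 20_000   # measured draw rate in a tight loop


def upload_rate_gb_per_s(gb_per_frame: int, fps: int) -> int:
    """Gigabytes per second needed to upload gb_per_frame on every frame."""
    return gb_per_frame * fps


if __name__ == "__main__":
    rate_gb = upload_rate_gb_per_s(GB_PER_FRAME, FRAMES_PER_SECOND)
    print(rate_gb, "GB/s, or", rate_gb / 1000, "TB/s")
```

That works out to 40,000 GB per second, roughly 40 TB per second, which is far beyond any bus, so uploading only the changed parts is the only realistic option.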

However I have some new ideas which might work by feedback to the GPU.

But I am getting a bit tired of all these different models/scenarios...

Maybe I will describe one later on, or maybe not and keep it secret

Bye,
Skybuck.


 
 
Skybuck Flying
Guest
Posts: n/a

 
      10-03-2009, 09:53 AM
Euhm actually not 40 TB... because the CPU could upload only those things which
would be necessary and that's definitely not everything... only a small
portion...

So many different ways of implementing it... makes me dizzy and nervous !

Bye,
Skybuck.


 
 
Skybuck Flying
Guest
Posts: n/a

 
      10-03-2009, 01:03 PM
So more interesting techniques to investigate:

0. Feedback buffers (already mentioned in previous post ) (only for
texture0?)
1. Pixel rectangles
2. Bitmaps
3. Stencil buffer can be used to exclude certain fragments
(if multiple cores in the texture...then stencil buffer would need at least
a few bits to indicate which core instruction pointer is to be enabled and
which instruction pointer/location so another 14 to 16 bits so many bits
needed for stencil buffer... not sure what the maximum is...)

4. The z buffer also has some bits... 24 bits... not sure if that could
somehow be used.

5. Logical operations... could be used to quickly replace certain values in
the framebuffer.
(Only for integers ???) Could be used to do a copy ?

6. Buffer update color masks... <- could be interesting to split planes or
to get a certain bitplane.

For other projects
7. Bits can be written into the stencil by using a mask... not sure if it
means color bits, depth bits or any...
8. Accumulation buffer, can for example add up bits it seems... could be
handy for counting bits in parallel.
9. Pixel store parameters could be used to swap bytes, or switch bit order.

10. CopyPixels could be used to copy from the read framebuffer to the draw buffer ?
To then display it ? It could also be used as an easy scroller or so.

11. BlitFrameBuffer can be used to visualize the special buffers like
stencil and depth buffers and so on.
(Supports stretching) (If the buffers specified are the same, then results for
overlapping regions in the same buffer are undefined).
Pixel formats for both buffers must be the same.

12. Not sure but: render buffers might be able to have 16 stencil bits ?!
(Probably still too little for what I might use them for...) Maybe stencil
and depth can be combined to form one large special buffer. It mentions
depth_stencil or so.

13. Texture objects can be bound to a framebuffer via FrameBufferTexture1D
(probably the equivalent of the _EXT version or so )

14. Attaching the texture buffer to the framebuffer and using that same texture
buffer for texture access could produce undefined results... it could lead
to a feedback loop which in itself is kinda interesting... could be used to
try to do sequential execution... or it could be used for random noise
generation (?).

15. Generally interesting: polynomials... used to generate vertices and
such... I assume along the polynomial/curve ? (See map command)

16. Specifying a hint: POINT SMOOTH HINT could be enough and might provide
some performance benefits.

17. I am not sure what a histogram is... but it might be interesting for
"belongs to group" visualizations.
Page 420 has a word about histograms... apparently it's counting the
occurrence of certain color values
(min and max pixel values can also be determined)

18. State tables could be interesting to learn what "state" OpenGL is in
?! when in doubt I presume
(a tremendous amount of state/information can be examined.)

19. Multi texture could be interesting even highly interesting ?!? It
mentions the possibility of "pipelining" and using the output of one texture
as input for the next texture ?!? It mentions this is controlled through
texture environments ?!? not sure what that is all about...

20. This is explained further: Texture Combine Environment Mode <- Could be
highly interesting !!! Page 427 says it's possible to arrange these textures
in all kinds of ways ! Very very interesting !
I hope that each pixel shader gets to act on them ?!?!? Or maybe it's just
an OpenGL API thingy ? Not sure...

21. Funny thingy: "point parameter" controls characteristics of points (?!?)


22. Pixel buffer objects might give more performance for pixel drawing and
reading... not sure how useful this would be...
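
For item 17, what a histogram computes can be shown in a few lines of plain Python. This is CPU code for illustration only, not an OpenGL call; `color_histogram` is a hypothetical helper and the pixel values are made up.

```python
from collections import Counter


def color_histogram(pixels):
    """Count how often each color value occurs and report the min and max value."""
    counts = Counter(pixels)
    return counts, min(pixels), max(pixels)


if __name__ == "__main__":
    # One 8-bit channel of a tiny made-up image.
    sample = [0, 255, 255, 17, 0, 255]
    counts, lowest, highest = color_histogram(sample)
    print(dict(counts), lowest, highest)
```

For a "belongs to group" visualization, the group id could be stored as the color value, so the count per value would be the group size.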

And that, ladies and gentlemen, concludes my analysis of the current
OpenGL 3.0 spec...

Most interesting concept/feature I came across is the concept of "pipelining
textures".

I am not sure if it's possible but that would be a very interesting concept:

texture->shader->texture->shader->texture->shader->texture->shader->texture->shader

Only problem would be shaders can't write to certain locations... but that
can be solved by using an "output address" then the next shader can use that
and simply "read" and pretend that it came from itself or so...
But then again... it doesn't know where to read so this wouldn't work haha !
It would only work for vertex shaders which can displace themselves so then
the pipeline would look like:

texture->vertex shader->pixel shader->texture->vertex shader->pixel
shader->texture.
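
The texture->shader->texture chain can be mimicked on the CPU to see how it behaves. This is only a toy model in Python, under the assumption that each "shader" may read any location of its input but can write only to its own output location; `run_pipeline`, `increment` and `shift_left` are made-up names, and no real OpenGL is involved.

```python
def run_pipeline(texture, shaders):
    """Feed a 'texture' (a list of values) through each 'shader' in turn.

    Each shader is a function (source, index) -> value. It may read any
    source location, but its result is written only to `index`.
    """
    source = list(texture)
    for shader in shaders:
        # One pass: every output location is computed from the previous texture.
        destination = [shader(source, i) for i in range(len(source))]
        source = destination  # the output texture becomes the next input
    return source


def increment(source, i):
    """Pass 1: add one to the value at this location."""
    return source[i] + 1


def shift_left(source, i):
    """Pass 2: read the right-hand neighbour, wrapping around at the end."""
    return source[(i + 1) % len(source)]


if __name__ == "__main__":
    print(run_pipeline([10, 20, 30], [increment, shift_left]))
```

Note that `shift_left` moves data without ever writing to another location: each output simply reads from where the data currently sits. That only works when the read address can be computed from the output position, which is exactly the limitation described above.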

I think multi-textures are probably limited to pixel shaders only ? Or maybe
not even that... so don't know about all this.

Attention: Framebuffers have no accumulation buffer... so much for that !
Though for other projects the default framebuffer could probably be used if this
was necessary ?!? (Maybe a hidden frame or so )


Here is an idea to use the depth buffer:

Different cores could be at different depths... maybe by setting some depth
value a specific core could be selected... this could be used to reset cores
or to update them with new battles. This way the GPU could run multiple
cores/simulators asynchronously... and it doesn't have to wait until all are
done... could be pretty damn handy ! Depth and stencil values
can be combined, which could also be used as a place to store more information.
However a simple copy to a certain location of the framebuffer could work just as
well, so maybe this is unnecessary complexity... depends on what is faster
I guess

^ These are all opengl api calls...

It's like "saying" to the cpu do a := a xor b; except now it's told to the
gpu and the gpu does a := a xor b...

But a in this case is not just a field... it could be a whole buffer... like
one million pixels !
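
To make the "whole buffer" point concrete, here is a small Python sketch of a := a xor b applied element by element. On the GPU this would be one API call acting on every pixel at once (OpenGL's logical operations include an XOR mode); the CPU loop below only models the result, and `xor_buffers` is a hypothetical helper.

```python
def xor_buffers(a, b):
    """Return a new buffer where every element is a[i] xor b[i]."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    return [x ^ y for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [0b1100, 0xFF]
    b = [0b1010, 0x0F]
    print(xor_buffers(a, b))
```

The same call shape works for two values or a million; only the buffer length changes.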

Bye,
Skybuck.


 
 
Skybuck Flying
Guest
Posts: n/a

 
      10-04-2009, 10:48 AM
I thiiiiiiiink I am going to attempt a Delphi to C/C++ converter tool.

The idea of having such a tool which would work very well seems very
attractive to me !

Bye,
Skybuck =D


 
 
Skybuck Flying
Guest
Posts: n/a

 
      10-04-2009, 11:48 AM
With the parser I have it's gonna be a piece of cake and then I am gonna be
filthy rich ! LOL

Bye,
Skybuck =D


 