# Turn vertex points on/off, so fragment points turn on/off

Discussion in 'Nvidia' started by Skybuck Flying, Sep 4, 2009.

1. ### Skybuck Flying (Guest)

Hello,

I am new to GPGPU programming, so I wonder whether the following is possible
and which technique could be used to achieve it
(I saw someone else mention GL_POINTS, so I will go with that):

The idea is as follows:

There is an array of vertices, for example 1000 vertices.
Each vertex translates to a "pixel" / fragment.
So there should be 1000 pixels as well.

The vertex shader will be called first and each vertex must decide if it's
to be on or off based on some condition.

Each vertex that is on should trigger a fragment/pixel shader.

Therefore only the pixel/fragment associated with the vertex should execute
the pixel/fragment shader to make it as efficient/fast as possible.

So that's the idea, and I have a couple of questions about it:

1. Is this possible somehow ?

2. What techniques exist ?

3. Which technique would be fastest ?
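
One approach often suggested for questions 1 and 2 is to have the vertex shader move every "off" vertex outside the clip volume, so the point is clipped before rasterization and no fragment shader runs for it. The following is a minimal CPU simulation of that idea in Python, not GPU code. The names `vertex_stage`, `is_clipped`, and `count_fragments`, and the example on/off condition, are hypothetical and only illustrate the principle. Whether this is the fastest technique on a given card is not established here.

```python
# CPU simulation of "switching points off" in the vertex stage.
# Clip-space rule: a point survives only if -w <= x, y, z <= w.
# Anything outside that volume is discarded before rasterization.

def vertex_stage(x, y, enabled):
    """Return a clip-space position (x, y, z, w) for one point."""
    if enabled:
        return (x, y, 0.0, 1.0)   # inside the clip volume
    return (x, y, 2.0, 1.0)       # z > w, so the point is clipped

def is_clipped(position):
    x, y, z, w = position
    return not all(-w <= c <= w for c in (x, y, z))

def count_fragments(num_points, condition):
    """Count how many points would reach the fragment stage."""
    fragments = 0
    for i in range(num_points):
        x = -1.0 + (2 * i + 1) / num_points   # spread points across (-1, 1)
        position = vertex_stage(x, 0.0, condition(i))
        if not is_clipped(position):
            fragments += 1   # stands in for one fragment shader invocation
    return fragments

# Example: switch on every second vertex out of 1000.
print(count_fragments(1000, lambda i: i % 2 == 0))
```

In a real Cg vertex program the same decision would be made per vertex, and only the surviving points would cost fragment work.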

I would prefer to use OpenGL + Cg on DirectX 9 class cards like the NVIDIA GeForce 7900 GTX.

DirectX solutions would be interesting too.

(I posted this question on the gpgpu forum as well (except the above two
lines), so I am shotgunning multiple resources for some possible answers.)

Thanks for any help ! God bless you !

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009

2. ### Skybuck Flying (Guest)

Ok,

It seems easy enough, however there is one little peculiar thing; the last
horizontal line does not get any points ? Weird... anybody have an
explanation for that ?

Here is the code:

(By changing the Z value from 0 to -2 or 2 the points will disappear... but
not when setting it to -1 or 1... why is that ? that's kinda strange too )

    procedure CreateVertexPoints( ParaWidth : integer; ParaHeight : integer );
    var
      vX : integer;
      vY : integer;
      // vZ : integer;
    begin
      glBegin(GL_POINTS);

      for vY := 0 to ParaHeight-1 do
      begin
        for vX := 0 to ParaWidth-1 do
        begin
          glVertex3f( vX, vY, 0 );
        end;
      end;

      glEnd;
    end;

    procedure TForm1.mOpenGLGLInit(Sender: TObject);
    begin
      glMatrixMode(GL_PROJECTION);
      gluOrtho2D(0.0, mOpenGL.Width, mOpenGL.Height, 0.0);
      glMatrixMode(GL_MODELVIEW);
      glViewport(0, 0, mOpenGL.Width, mOpenGL.Height );
    end;

    procedure TForm1.mOpenGLGLPaint(Sender: TObject);
    begin
      glClear(GL_COLOR_BUFFER_BIT or GL_DEPTH_BUFFER_BIT);
      CreateVertexPoints( mOpenGL.Width, mOpenGL.Height );
    end;
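
A likely explanation for the Z behaviour: `gluOrtho2D` sets up the same projection as `glOrtho` with near = -1 and far = 1, so only Z values from -1 to 1 end up inside the clip volume. The small Python sketch below applies the standard orthographic Z mapping, assuming the modelview matrix is still the identity as in the code above. The helper names `ortho_ndc_z` and `visible` are made up for illustration.

```python
def ortho_ndc_z(z_eye, near=-1.0, far=1.0):
    """Z in normalized device coordinates for an orthographic projection."""
    return -2.0 / (far - near) * z_eye - (far + near) / (far - near)

def visible(z_eye, near=-1.0, far=1.0):
    """A point is kept only if its NDC Z lies within [-1, 1]."""
    return -1.0 <= ortho_ndc_z(z_eye, near, far) <= 1.0

for z in (-2, -1, 0, 1, 2):
    print(z, visible(z))
```

Z values of -1 and 1 sit exactly on the near and far planes, which would be consistent with the points still showing up there while -2 and 2 are clipped.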

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009

3. ### Skybuck Flying (Guest)

Ok,

I read a posting somewhere how pixels are supposed to fall into the middle
or so...

So far there seem to be two solutions:

1. glVertex3f( 0.5 + vX, 0.5 + vY, 0 );

2. gluOrtho2D(0.0-0.5, mOpenGL.Width-0.5, mOpenGL.Height-0.5, 0.0 - 0.5);

Last solution would probably be better for two reasons:

1. First of all it's only done once.

2. Second of all I don't need to fuss around in all glVertex calls and such
which would be nice !
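
To see why the half-pixel shift might matter, here is a rough Python model of the mapping from Y coordinate to window row. It assumes a size-1 point lands in the pixel found by flooring its window coordinate, and that points falling outside the viewport are lost. Real drivers may treat exact boundary cases differently, so this is a sketch rather than a statement about any particular implementation. The function names are hypothetical.

```python
import math

def window_y(y, bottom, top, height):
    """Map a Y coordinate through an orthographic projection to window space."""
    return (y - bottom) / (top - bottom) * height

def covered_rows(height, bottom, top, offset=0.0):
    """Window rows hit by points drawn at y = 0 .. height-1 (plus offset)."""
    rows = set()
    for vy in range(height):
        row = math.floor(window_y(vy + offset, bottom, top, height))
        if 0 <= row < height:      # rows outside the viewport are lost
            rows.add(row)
    return rows

H = 4
print(covered_rows(H, bottom=H, top=0))                  # original setup
print(covered_rows(H, bottom=H, top=0, offset=0.5))      # solution 1
print(covered_rows(H, bottom=H - 0.5, top=-0.5))         # solution 2
```

In this model the original setup puts y = 0 exactly on the upper edge of the viewport, so one row of points falls outside and window row 0 (the bottom line on screen) stays empty. Both fixes move every point to a pixel centre and fill all rows.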

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009
4. ### Skybuck Flying (Guest)

Though I am not sure how these two solutions would affect gpgpu
programming...

Could this disturb the values by -0.5 ?

Hmm...

Well, for now I can leave it in... this is also an advantage of solution 2...

If it needs to be removed, it only needs to be removed in one place!

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009
5. ### Skybuck Flying (Guest)

Whatever I do it seems the speed is always 60 FPS, the rate of my LCD
monitor... kinda strange...

I am not sure if this is a locked speed of TOpenGL, Delphi, Windows or
something else... like maybe swapping buffers always being that slow... I
don't know...

Hmm.

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009
6. ### Skybuck Flying (Guest)

I love the internet, I love google, days like these I love

Searched and again found the answer in 3 seconds lol.

It's because of the double buffering... apparently it waits for a vertical
retrace.

Turn that off and it runs at 29000 fps LOL. (Not drawing anything except
black background )

Now I measure performance of the "slow" immediate vertex drawing mode

Here goes:

Ok, for 500x400 vertices it achieves 160.72 fps.

The screen blinks a lot and the last lines are missing... but at least this
gives a bit of an indication how slow it is (?)

I also tried adding glFinish before the measurements, but that doesn't matter
much in this case

Yeah going from 29000 fps to 160 fps is bad
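
For a rough sense of scale, the frame rate can be converted into vertices per second. The sketch below only does the arithmetic on the numbers quoted above; the helper names are hypothetical, and the result says nothing about where the time actually goes (driver overhead, `glVertex3f` call cost, or the GPU itself).

```python
def vertices_per_second(width, height, fps):
    """Total vertices submitted per second in immediate mode."""
    return width * height * fps

def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps

rate = vertices_per_second(500, 400, 160.72)
print(f"{rate / 1e6:.1f} million vertices per second")
print(f"{frame_time_ms(160.72):.2f} ms per frame")
```

So although 160 fps looks slow next to 29000 fps, each of those frames pushes 200,000 vertices through individual calls.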

Bye,
Skybuck LOL.

Skybuck Flying, Sep 4, 2009
7. ### Skybuck Flying (Guest)

Ok,

The idea is now to create a 4096x4096 vertex array in 3D Studio Max 9 or
so...

So that this file can be loaded into FX Composer 2.5 so that I can directly
program the shader in FX Composer 2.5 and see how it works out !

Hihi.

3D Studio 4.0 for DOS used to have some kind of generation/duplication
functionality to quickly duplicate objects, probably even vertices. Now I will
go figure this out for 3D Studio Max 9

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009
8. ### Skybuck Flying (Guest)

Hmm kinda interesting I am running into all kinds of problems/limitations
with 3D Studio:

1. First of all the plane is limited to 1000 by 1000 segments.

2. Second of all, FX Composer 2.5 cannot import MAX files ?

3. Third of all exporting to 3DS is problematic: 3D Studio Max 9 shows
warnings: Object(Plane01) has too many faces(more than 64K) to export.

Seems like this is a 3DS 4.0 file format limitation ? Ouch !

Well I don't need the faces... I just need the vertices... maybe I can
delete the triangles/faces... that would leave 1000x1000 vertices or so...
one million vertices... maybe it will complain again...

Hmmm...

I am also wondering which tool would be better for developing shaders:

1. 3D Studio Max ?

or

2. FX Composer 2.5 ?

I know FX Composer 2.5 has a shader debugger... but does 3D Studio Max have

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009
9. ### Skybuck Flying (Guest)

I am trying to find an export format that both tools support...

I tried the wave front (obj) format... but it's some kind of fucked up
format that generates like 200+++ MB...

I had to terminate 3D Studio Max... just in case...

I kinda knew that would happen... maybe it's some kind of lengthy text file
format... don't know, don't care

Well hmm shitty... I must find a good, efficient format

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009
10. ### Skybuck Flying (Guest)

I tried the FBX plugin... or whatever... but this also takes a pretty long
time... I don't wanna wait for it... life is too short

Had to terminate 3D Studio Max 9 again... it used something like 1 GB of ram
during export.

I shall try one more time by trying to delete the faces or so... maybe that
helps but I doubt it

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009
11. ### Skybuck Flying (Guest)

I managed to delete the faces while preserving the vertices by turning off
"delete isolated vertices".

However exporting to 3DS format now gives the other warning I feared:

"Object [Plane01] has too many vertices (more than 64K) to export."

I tried opening the partial file or so in FX Composer (the file with the
face) but composer didn't show anything... first time I actually test
importing... maybe I should first do a small test to see if that would work
and how it would look but ok.

So far it seems 3DS format is seriously limited to 16 bit which means
everything having a limit of 64K

So best I can do is SQRT(64000) which is: 252x252 = 63504

However normally it's something like 256*256 = 65536

However C loops will fail under that condition so I fear 3DS Max 9 will spin
for ever when trying to save or load it...

Would be interesting to see... so I will attempt creating exactly 65536 or
65535 vertices to see what happens
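
The square-root estimate above can be written as a tiny Python helper. The limit is passed in as a parameter because the exact cap of the 3DS exporter is only guessed at in this thread; 64000 and 65536 are the two figures mentioned so far. The function name is hypothetical.

```python
import math

def largest_square_grid(max_vertices):
    """Largest N x N vertex grid that stays within max_vertices.

    Returns (vertices per side, segments per side, total vertices).
    A plane with S segments per side has S + 1 vertices per side.
    """
    side = math.isqrt(max_vertices)
    return side, side - 1, side * side

print(largest_square_grid(64000))   # the "64K" reading used in the post
print(largest_square_grid(65536))   # a full 16-bit index range
```

Note the off-by-one between segments and vertices: 252 vertices per side means a plane of 251 by 251 segments.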

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009
12. ### Skybuck Flying (Guest)

Ok,

I did some testing that was kinda interesting

A plane of 255 by 255 segments creates exactly 65536 vertices.

Exporting to 3DS fails.

I deleted one vertex.

Exporting to 3DS fails.

Apparently the programmers/file format designers were paranoid enough to
limit it to 64000

So now I create a plane of (252-1) by (252-1) segments and that should be
the maximum square that will fit into 64000 vertices... it would create
almost double that amount of faces, so those need to be deleted and then I
can finally export something into FX Composer 2.5...

Would be funny if FX Composer is stubborn enough to delete the unused
vertices... I hope not... but I fear the worst !

Bye,
Skybuck =D

Skybuck Flying, Sep 4, 2009
13. ### Skybuck Flying (Guest)

With the faces I don't see anything in FX Composer 2.5...

This kinda sux.

Well new calculation:

Maximum number of faces is 64000

Each square needs two faces so new figure is 32000

SQRT(32000) = (178-1) by (178-1)

This should create the maximum allowable plane with faces for 3DS...
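
The face budget can be checked the same way. The sketch below assumes two triangles per square segment and takes the 64000 face limit from this post as given; both function names are hypothetical.

```python
import math

def plane_counts(segments):
    """Vertex and face counts for a square plane with `segments` per side."""
    vertices = (segments + 1) ** 2
    faces = 2 * segments ** 2        # two triangles per square
    return vertices, faces

def max_segments_for_faces(max_faces):
    """Largest segment count per side whose faces fit within max_faces."""
    return math.isqrt(max_faces // 2)

print(max_segments_for_faces(64000))
print(plane_counts(177))   # the (178-1) choice from the post
print(plane_counts(178))
```

Under these assumptions the (178-1) segment plane from the post fits comfortably, and a 178 segment plane would still be just under the assumed limit.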

Gonna try it now... if that fails then I am gonna try a fricking teapot
lol... to see if it's gonna work at all

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009
14. ### Skybuck Flying (Guest)

FX Composer 2.5 seems to totally lock up when trying to load just 64K faces
and 32K vertices ?!?!?

Ouch ! Sigh...

Well just to test out my theories I guess I will have to seriously constrain
myself and limit myself to

10x10 faces... I hope it can handle that ?!? Gjez ! =D

LOL.

Bye,
Skybuck =D LOL.

Skybuck Flying, Sep 4, 2009
15. ### Skybuck Flying (Guest)

Ok finally... 9 by 9 faces it can handle that ! LOL.

Giving me exactly a 10 by 10 grid of vertices...

Now I try to write a vertex shader only, to see how it can affect vertices...

According to docs it can fully change vertex locations... so that will be
most interesting to witness !

Bye,
Skybuck.

Skybuck Flying, Sep 4, 2009
16. ### Skybuck Flying (Guest)

It appears graphics have come a long way since the days of dos LOL.

requires a position more or less...

FX Composer is nice enough to provide a formula/code for camera/world
projection or something like that... I never understood matrices myself, but
it gets the job done.

Here is my first little vertex shader:

    /*
    % Second line of description for my shader.
    keywords: material classic
    date: YYMMDD
    */

    float4x4 WorldViewProj : WorldViewProjection;

    float4 mainVS(float3 pos : POSITION) : POSITION{
        float3 test;

        test = pos;

        // makes the plane all spikey
        // test.y = test.y + cos(test.x/10)*100 + cos(test.z/10)*80;

        // makes the plane wavey like a bowl
        test.y = test.y + cos(test.x/100)*60 + cos(test.z/100)*50;

        return mul(WorldViewProj, float4(test.xyz, 1.0));
    }

    float4 mainPS() : COLOR {
        return float4(1.0, 1.0, 1.0, 1.0);
    }

    technique technique0 {
        pass p0 {
            CullFaceEnable = false;
            VertexProgram = compile vp40 mainVS();
            FragmentProgram = compile fp40 mainPS();
        }
    }
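
For checking the shader's output outside FX Composer, the displacement can be reproduced on the CPU. This Python sketch mirrors only the "wavey" formula from `mainVS` and leaves out the `WorldViewProj` multiplication. The grid helper and its `spacing` value are assumptions, since the thread does not say how far apart the 10 by 10 vertices are.

```python
import math

def displace(pos):
    """CPU reference for the Y displacement applied in mainVS."""
    x, y, z = pos
    y = y + math.cos(x / 100) * 60 + math.cos(z / 100) * 50
    return (x, y, z)

def displaced_grid(n=10, spacing=10.0):
    """Displace an n x n grid of vertices lying in the XZ plane.

    `spacing` is a made-up value for illustration only.
    """
    return [displace((ix * spacing, 0.0, iz * spacing))
            for iz in range(n) for ix in range(n)]

print(displace((0.0, 0.0, 0.0)))
print(len(displaced_grid()))
```

Comparing a few of these values against what the GPU renders is a cheap sanity check before scaling up to larger grids.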

This is exactly what I need: this kind of vertex shader capability...

It will probably do just fine for the intended purposes !

I don't know if it will be fast... but probably... the only other alternative
I know so far is modifying the vertices outside of the shader... like in C or
Delphi code... this would have the benefit of only having to modify a few
vertices... but the question is: can an object be updated easily/partially
like that ?!? And how long would it take... kinda interesting question...

But for now I am betting on somehow keeping the vertices all inside the
graphics card and trying to manipulate them repeatedly.

It's too bad that FX Composer can't handle so many vertices, otherwise I could
have used its performance indicator... but anyway I try that now to see
what kind of performance it gives for this simple shader !

Well I am also running 3D Studio, so maybe these numbers are not so
representative, but ok here goes... later I will test them some more... but
first I want more vertices for kicks ! I need them too hihi.

For Geforce 8400 GS it says something like:

regs:
normal 5
fp16 5
fp32 5

what does this mean, how many registers it uses ?

Cycles:
5,5,5

MPIX/sec
1799,
1799,
1799.

I guess that's for the simple pixel shader.

Ah yes now I see I have to select "show vertex shader"

Regs: 5
Cycles: 23
MVert/s : 23

Regs 5 probably means 5 registers used ?
Cycles 23 probably means how many cycles it takes to execute the shader ?
MVert/s is probably how many vertices it can shade in millions...

So that would be 23 million vertices... that's not bad for 2 cosines, 2
multiplies and 3 additions and some assignments

23 million fricking vertices hmmmm.... suppose a redcode warrior would
execute that many cosines and such that would mean 287.5 warriors would run
per second ! HMMM juicy ! =D LOL.

Sounds too good to be true... there is probably some overhead here and there
for opengl calls and such...

But things are starting to look nice...

Actually it is too good to be true... since these are just 10 by 10
vertices... ultimately... it would be pretty full: 4096 x 4096 vertices is
what I want... but I can't test that with FX Composer...

I did do some tests with some code yesterday that looked fast though... so
could be true !

This is way too much fun ! LOL.

Bye,
Skybuck =D

Skybuck Flying, Sep 4, 2009
17. ### Nicolas Bonneel (Guest)

[snip]

Hem... I really don't want to hurt you but... just as a side note... do
you realize that with your average writing rate on these newsgroups
(maybe a hundred lines every 10 minutes), nobody is reading what you're
writing ?
Probably you could consider posting once a day with a summary of your
deeper thoughts instead of posting every single question that you get at
every instant, and that you're solving by yourself in the next minutes ?

that you're not starting a new topic each time), but more importantly,
People here have jobs and won't spend their night reading your tens of
messages! However, reading and answering a short message a day is feasible.

Also, you can consider putting a follow-up to a single newsgroup instead
of cross-posting, but at this point, this is a detail...

Thank you very much for your understanding! But I'm sure you can

Nicolas Bonneel, Sep 4, 2009
18. ### Skybuck Flying (Guest)

Read what interests you, skip the rest

Bye,
Skybuck =D

Skybuck Flying, Sep 4, 2009
19. ### Charles E Hardwidge (Guest)

You're trolling like the point and shoot guy does in the photography
newsgroups. It's a good way to be ignored.

Follow-ups trimmed.

Charles E Hardwidge, Sep 4, 2009
20. ### Nicolas Bonneel (Guest)

Skybuck Flying wrote:
Oh, ok, you're not here to ask questions (although it appears so),
you're writing here either to show off or to hold a blog. There are web
tools for that.
No: as soon as you see 10 messages which are each posted at 15-minute
intervals (and I'm being nice) and contain >100 lines each, everyone just
SKIPS them. It means the message is not read *at all*, and I'm not even
aware if it would have interested me if I had taken the time to read it,
even partly (or to open it). In brief, we just see the length of the
message, not its content.

Nicolas Bonneel, Sep 4, 2009