"Skybuck Flying" wrote in message
news:b3cc5$4df0a2aa$5419acc3$ b.home.nl...
Hello,
Question is:
Can an x86/x64 CPU/memory system be changed into a barrel processor?
I shall provide an idea here and then you guys figure out if it would be
possible or not.
What I would want as a programmer is something like the following:
1. Request memory contents/addresses with an instruction which does not
block, for example:
EnqueueReadRequest address1
Then it should be possible to "machine gun" these requests like so:
EnqueueReadRequest address1
EnqueueReadRequest address2
EnqueueReadRequest address3
EnqueueReadRequest address4
EnqueueReadRequest address5
2. Block on response queue and get memory contents
DequeueReadResponse register1
do something with register1, perhaps enqueue another read request
DequeueReadResponse register2
DequeueReadResponse register3
If the queues act in order... then this would be sufficient.
Otherwise extra information would be necessary to know which is what.
So if the queues were out of order, the dequeue would need to report
which address the contents were for:
DeQueueReadResponse content_register1, address_register2
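The difference between the in-order and out-of-order designs can be sketched in software. This is only a toy model in Python, not real hardware or an existing instruction set: the class ReadQueues, its method names and the service(order=...) parameter are all made up for illustration. It shows why returning the address with each response is enough to match answers to requests when completion order differs from request order.

```python
from collections import deque


class ReadQueues:
    """Toy model of the proposed read request/response queues (hypothetical)."""

    def __init__(self, memory):
        self.memory = memory          # dict: address -> contents
        self.requests = deque()       # models the Read Request Queue
        self.responses = deque()      # models the Read Response Queue

    def enqueue_read_request(self, address):
        # Models "EnqueueReadRequest address"; never waits in this toy model.
        self.requests.append(address)

    def service(self, order=None):
        # Models the memory system answering all pending requests.
        # 'order' is an assumed knob: a list of indices giving the
        # completion order, used here to simulate out-of-order replies.
        pending = list(self.requests)
        self.requests.clear()
        if order is not None:
            pending = [pending[i] for i in order]
        for address in pending:
            self.responses.append((self.memory[address], address))

    def dequeue_read_response(self):
        # Models "DeQueueReadResponse content_register1, address_register2".
        return self.responses.popleft()


memory = {0x10: 111, 0x20: 222, 0x30: 333}
queues = ReadQueues(memory)
for address in (0x10, 0x20, 0x30):
    queues.enqueue_read_request(address)

# Responses come back in a different order than they were requested.
queues.service(order=[2, 0, 1])

arrival_order = []
results = {}
while queues.responses:
    content, address = queues.dequeue_read_response()
    arrival_order.append(address)
    results[address] = content   # the address tells us which request this is
```

With an in-order design (order=None) the address could be dropped from the response, since the n-th response always belongs to the n-th request.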
The same would be done for writing as well:
EnqueueWriteRequest address1, content_register
EnqueueWriteRequest address2, content_register
EnqueueWriteRequest address3, content_register
There could then also be a response queue which notifies the thread when
certain memory addresses were written.
DequeueWriteResponse register1 (in order design)
or
DequeueWriteResponse content_register1, address_register2 (out of order
design)
There could also be some special instructions which would return queue
status without blocking...
Like queue empty count, queue full count, queue max count and perhaps a
queue up count which could be used to change queue status in case something
happened to the queue.
For example, each queue has a maximum number of entries available.
The enqueueing/dequeuing instructions mentioned above would block until they
succeed (meaning their request is placed on the queue or a response is removed
from the queue).
The counting instructions would not block.
This way the cpu would have 4 queues at least:
1. Read Request Queue
2. Read Response Queue
3. Write Request Queue
4. Write Response Queue
Each queue would have a certain maximum size.
Each queue has counters to indicate how many free entries there are and
how many taken entries there are.
These counters are also queryable via instructions that do not block the
thread. The counters would be protected via hardware mutexes or similar
because of concurrent enqueueing and dequeuing, but as long as nothing is
happening to the queue these counters should be able to return properly.
Small correction: "full" should have been "fill":
GetReadRequestQueueEmptyCount register
GetReadRequestQueueFillCount register
GetReadResponseQueueEmptyCount register
GetReadResponseQueueFillCount register
GetWriteRequestQueueEmptyCount register
GetWriteRequestQueueFillCount register
GetWriteResponseQueueEmptyCount register
GetWriteResponseQueueFillCount register
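The relation between the counters can be shown with a small software model. This is a sketch under assumptions, written in Python: CountedQueue and its methods are hypothetical names, and a threading.Lock stands in for the "hardware mutexes" mentioned above. The point is only that the empty count and the fill count always add up to the maximum size of the queue.

```python
import threading
from collections import deque


class CountedQueue:
    """Toy bounded queue with fill/empty counters (hypothetical model)."""

    def __init__(self, max_count):
        self.max_count = max_count
        self._entries = deque()
        self._lock = threading.Lock()   # stands in for a hardware mutex

    def enqueue(self, entry):
        # Simplification: a full queue raises instead of blocking.
        with self._lock:
            if len(self._entries) == self.max_count:
                raise OverflowError("queue full")
            self._entries.append(entry)

    def fill_count(self):
        # Models e.g. "GetReadRequestQueueFillCount register".
        with self._lock:
            return len(self._entries)

    def empty_count(self):
        # Models e.g. "GetReadRequestQueueEmptyCount register".
        with self._lock:
            return self.max_count - len(self._entries)


read_request_queue = CountedQueue(max_count=8)
for address in (0x100, 0x104, 0x108):
    read_request_queue.enqueue(address)

fill = read_request_queue.fill_count()
empty = read_request_queue.empty_count()
print(fill, empty)   # prints: 3 5
```

A thread could read the empty count before a burst of requests to know how many it can post without blocking, although in a shared design another thread may change the count in between.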
All instructions should be shareable between threads, so that for example
one thread might be posting read requests while another thread retrieves
the read responses.
Otherwise the first thread might block because the read request queue is
full, with nobody draining the response queue.
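The two-thread arrangement can be tried out in software with ordinary blocking queues. The sketch below is an analogy only, using Python's standard queue.Queue and threading modules: the function names, the queue size of 4, the MEMORY contents and the None end marker are all assumptions made for the demo. One thread posts read requests, a second thread plays the memory system, and a third retrieves the responses.

```python
import queue
import threading

# Assumed toy memory: the contents of each address is the address times two.
MEMORY = {address: address * 2 for address in range(16)}

# Small bounded queues, so the requester really does block when they fill up.
read_requests = queue.Queue(maxsize=4)
read_responses = queue.Queue(maxsize=4)

received = []


def requester():
    # "Machine guns" read requests; put() blocks while the queue is full.
    for address in range(16):
        read_requests.put(address)
    read_requests.put(None)            # end marker, demo only


def memory_system():
    # Stands in for the memory hardware answering requests in order.
    while True:
        address = read_requests.get()  # blocks while the queue is empty
        if address is None:
            read_responses.put(None)
            return
        read_responses.put((MEMORY[address], address))


def responder():
    # Second program thread: drains the response queue.
    while True:
        response = read_responses.get()
        if response is None:
            return
        received.append(response)


threads = [threading.Thread(target=f)
           for f in (requester, memory_system, responder)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If the responder thread is removed, the response queue fills after 4 entries, the memory thread blocks, and soon the requester blocks as well, which is the stall described above.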
Alternatively, the instructions could be made non-blocking and return a
status code to indicate whether the operation succeeded. However, an
additional code or mode would then be necessary to specify whether an
instruction should block or not, which might make things a bit too complex,
but this is a hardware-maker decision. If sharing between many threads is
too difficult, impossible or too slow, then non-blocking might be better:
the thread can then cycle around the read responses and see if anything
came in so it can do something. However, this would lead to high CPU
usage, so for efficiency's sake blocking is preferred, or perhaps a context
switch until the thread no longer blocks. It would then still be necessary
for the thread to somehow deal with responses, so the blocking situation
seems to need multiple threads working together.
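The non-blocking variant with a status code might look roughly like this in software. Again this is a hedged Python sketch: PollingQueue, try_enqueue and try_dequeue are hypothetical names, and True/False is an assumed encoding of the status code. The loop at the end is the "cycle around the read responses" polling described above.

```python
from collections import deque


class PollingQueue:
    """Toy bounded queue whose operations never block (hypothetical)."""

    def __init__(self, max_count):
        self.max_count = max_count
        self._entries = deque()

    def try_enqueue(self, entry):
        # Returns a status instead of blocking: False means "queue full".
        if len(self._entries) >= self.max_count:
            return False
        self._entries.append(entry)
        return True

    def try_dequeue(self):
        # Returns (status, entry): (False, None) means "queue empty".
        if not self._entries:
            return False, None
        return True, self._entries.popleft()


responses = PollingQueue(max_count=2)

# The third enqueue fails because the queue only holds two entries.
enqueue_status = [responses.try_enqueue(value) for value in (10, 20, 30)]

# Polling loop: keep asking until the queue reports empty.
polls = 0
drained = []
while True:
    polls += 1
    ok, value = responses.try_dequeue()
    if not ok:
        break
    drained.append(value)
```

In a real program the loop would keep spinning while waiting for new responses instead of stopping at the first empty result, which is where the high CPU usage comes from.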
The memory system/chips would probably also need some modifications to be
able to deal with these memory requests and return responses.
Perhaps also special wiring/protocols to be able to "pipeline"/transfer as
many of these requests/responses back and forth as possible.
So what do you think of a "barrel"-like addition to current AMD/Intel
x86/x64 CPUs and their memory systems? Possible or not?
The idea described above is a bit messy, but it's the idea that counts.
If CPU manufacturers are interested, I might work it out some more to see
how it would flesh out/work exactly.
Bye,
Skybuck.