Can an x86/x64 cpu/memory system be changed into a barrel processor?

Discussion in 'Asus' started by Skybuck Flying, Jun 9, 2011.

  1. Hello,

    Question is:

    Can an x86/x64 cpu/memory system be changed into a barrel processor?

    I shall provide an idea here and then you guys figure out if it would be
    possible or not.

    What I would want as a programmer is something like the following:

    1. Request memory contents/addresses with an instruction which does not
    block, for example:

    EnqueueReadRequest address1

    Then it should be possible to "machine gun" these requests like so:

    EnqueueReadRequest address1
    EnqueueReadRequest address2
    EnqueueReadRequest address3
    EnqueueReadRequest address4
    EnqueueReadRequest address5

    2. Block on response queue and get memory contents

    DequeueReadResponse register1

    do something with register1, perhaps enqueue another read request

    DequeueReadResponse register2
    DequeueReadResponse register3

    If the queues act in order... then this would be sufficient.

    Otherwise extra information would be necessary to know which is what.

    So if the queues were out of order then the dequeue would need to provide
    which address the contents were for.

    DequeueReadResponse content_register1, address_register2

    The same would be done for writing as well:

    EnqueueWriteRequest address1, content_register
    EnqueueWriteRequest address2, content_register
    EnqueueWriteRequest address3, content_register

    There could then also be a response queue which notifies the thread when
    certain memory addresses were written.

    DequeueWriteResponse register1 (in order design)

    or

    DequeueWriteResponse content_register1, address_register2 (out of order
    design)


    There could also be some special instructions which would return queue
    status without blocking...

    Like queue empty count, queue full count, queue max count and perhaps a
    queue up count which could be used to change queue status in case something
    happened to the queue.

    For example each queue has a maximum number of entries available.

    The queueing/dequeueing instructions mentioned above would block until
    they succeed (meaning their request is placed on the queue or a response
    is removed from the queue).

    The counting instructions would not block.

    This way the cpu would have 4 queues at least:

    1. Read Request Queue
    2. Read Response Queue
    3. Write Request Queue
    4. Write Response Queue

    Each queue would have a certain maximum size.

    Each queue has counters to indicate how many "free entries" and how many
    "taken entries" there are.

    For example, these are also queryable via instructions and do not block
    the thread; the counters are protected via hardware mutexes or so because
    of queueing and dequeueing, but as long as nothing is happening these
    counters should be able to return properly.

    GetReadRequestQueueEmptyCount register
    GetReadRequestQueueFullCount register

    GetReadResponseQueueEmptyCount register
    GetReadResponseQueueFillCount register

    GetWriteRequestQueueEmptyCount register
    GetWriteRequestQueueFullCount register

    GetWriteResponseQueueEmptyCount register
    GetWriteResponseQueueFillCount register

    All instructions should be shareable by threads... so that for example one
    thread might be posting read requests and another thread might be
    retrieving those read responses.

    Otherwise the first thread might block because the read request queue is
    full, with nobody draining the response queue.

    Alternatively perhaps the instructions could also be made non-blocking,
    and return a status code to indicate whether the operation succeeded or
    not. However an additional code or mode would then also be necessary to
    specify if it should be blocking or non-blocking... which might make
    things a bit too complex, but this is a hardware-maker decision... in
    case sharing among many threads is too difficult or impossible or too
    slow, then non-blocking might be better: the thread can then cycle around
    the read responses and see if anything came in so it can do something...
    however this would lead to high cpu usage... so for efficiency's sake
    blocking is preferred, or perhaps a context switch until the thread no
    longer blocks. It would then still be necessary for the thread to somehow
    deal with responses... so this seems to need multiple threads working
    together in the blocking situation.

    The memory system/chips would probably also need some modifications to be
    able to deal with these memory requests and return responses.

    Perhaps also special wiring/protocols to be able to "pipeline"/"transfer"
    as many of these requests/responses back and forth as possible.
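    To make the idea more concrete, here is a rough software model in C.
    This is purely hypothetical: the names mirror the made-up instructions
    above and nothing here corresponds to a real x86 instruction. In real
    hardware a memory controller would drain the request queue and fill a
    separate response queue with contents; this toy reuses one structure just
    to show the interface shape.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    #define QUEUE_MAX 8   /* arbitrary maximum queue size for the model */

    typedef struct {
        uint64_t address[QUEUE_MAX];
        uint64_t content[QUEUE_MAX]; /* filled by the (imaginary) memory side */
        size_t   head, tail, count;
    } MemQueue;

    /* EnqueueReadRequest: non-blocking here; returns 0 on success,
     * -1 when the queue is full (a real design could block instead). */
    static int enqueue_read_request(MemQueue *q, uint64_t addr)
    {
        if (q->count == QUEUE_MAX)
            return -1;
        q->address[q->tail] = addr;
        q->tail = (q->tail + 1) % QUEUE_MAX;
        q->count++;
        return 0;
    }

    /* DequeueReadResponse, out-of-order style: hands back both the
     * contents and the address they belong to. */
    static int dequeue_read_response(MemQueue *q, uint64_t *content,
                                     uint64_t *addr)
    {
        if (q->count == 0)
            return -1;
        *addr    = q->address[q->head];
        *content = q->content[q->head];
        q->head = (q->head + 1) % QUEUE_MAX;
        q->count--;
        return 0;
    }

    /* The non-blocking status queries (Get...FillCount / Get...EmptyCount). */
    static size_t queue_fill_count(const MemQueue *q)  { return q->count; }
    static size_t queue_empty_count(const MemQueue *q) { return QUEUE_MAX - q->count; }
    ```

    A thread could then "machine gun" enqueue_read_request calls and later
    drain the responses, exactly as in the instruction listings above.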

    So what do you think of a "barrel"-like addition to current amd/intel
    x86/x64 cpu's and their memory systems ?!? Possible or not ?!?

    This idea described above is a bit messy... but it's the idea that
    counts... if cpu manufacturers are interested I might work it out some
    more to see how it would flesh out/work exactly ;)

    Bye,
    Skybuck.
     
    Skybuck Flying, Jun 9, 2011
    #1

  2. "Skybuck Flying" wrote in message
    news:b3cc5$4df0a2aa$5419acc3$1.nb.home.nl...

    [deletia]
    A little correction: "full" should have been "fill":

    GetReadRequestQueueEmptyCount register
    GetReadRequestQueueFillCount register

    GetReadResponseQueueEmptyCount register
    GetReadResponseQueueFillCount register

    GetWriteRequestQueueEmptyCount register
    GetWriteRequestQueueFillCount register

    GetWriteResponseQueueEmptyCount register
    GetWriteResponseQueueFillCount register

    [deletia]

    Bye,
    Skybuck.
     
    Skybuck Flying, Jun 9, 2011
    #2

  3. Skybuck Flying

    Joel Koltner Guest

    "Skybuck Flying" <> wrote in message
    news:b7aee$4df0a36d$5419acc3$1.nb.home.nl...
    > Can an x86/x64 cpu/memory system be changed into a barrel processor?


    [deletia]

    Not directly, but they... sort of... already are: The high-end Intel and AMD
    x86 CPUs are all superscalar designs, which means that internally the CPU is
    viewed as a collection of "resources" -- ALUs, instruction decoders, memory
    read units, memory write units, etc. -- and that there are (typically)
    multiple instances of each of these resources, and the CPU scheduler tries
    very hard to always keep all the resources busy, which effectively means that
    multiple instructions can be executed simultaneously (this effectively
    implements your "AddRequest, AddRequest, GetResponse, GetResponse" protocol
    that you'd like).

    Now, add on the hyper-threading that's been around for a number of years now,
    and I'd say you have a result that, in practice, is not that far from a barrel
    processor. In fact, it's probably better by popular metrics such as
    performance/(# of transistors * clock rate * power) or somesuch, in that
    the dynamic scheduling that a superscalar CPU performs is often more
    efficient than a straight barrel implementation when you're running
    "general purpose"
    code such as a web browser or word processor (although I would expect that
    barrel CPUs have instructions that provide "hints" to the schedulers to
    suggest it not switch threads or to keep or flush the caches or whatever just
    as superscalar CPUs do... but also recall that when HT was added to Intel's
    x86 CPUs, for certain workloads the HT actually slowed down the overall
    throughput a bit too...).

    As I think you've surmised, the trick to achieving high performance with CPUs
    is to prevent stalls. This is of course a non-trivial problem, and companies
    like Intel and AMD invest enormous resources into trying to get just a little
    bit better performance out of their designs; you can be certain that someone
    at these companies has very carefully considered which aspects of a barrel
    processor design they might "borrow" to improve their performance.

    ---Joel
     
    Joel Koltner, Jun 9, 2011
    #3
  4. The only thing my program needs to do is fire off memory requests.

    However it seems the x86 cpu blocks on the first memory request and does
    nothing else.

    This is AMD X2 3800+ processor.

    Perhaps newer processors don't have this problem anymore but I would
    seriously doubt that.

    So unless you come up with any proof I am going to dismiss your story as
    complex-non-relevant-bullshit.

    It's not so hard to write a program which requests random memory accesses.

    You apparently should try it sometime.

    Bye,
    Skybuck.
     
    Skybuck Flying, Jun 10, 2011
    #4
  5. Skybuck Flying

    Joel Koltner Guest

    "Skybuck Flying" <> wrote in message
    news:58426$4df16c1b$5419acc3$1.nb.home.nl...
    > The only thing my program needs to do is fire off memory requests.
    >
    > However it seems the x86 cpu blocks on the first memory request and does
    > nothing else.


    Hmm, it shouldn't do that, assuming there aren't any dependencies between the
    next handful of instructions and the first one there. (But note that if you
    perform a load operation and the data isn't in the caches, it takes *many tens
    to hundreds* of CPU cycles to fetch the data from external DRAM; hence you
    *will* stall. There actually are instructions in the x86 architecture these
    days for "warming up" the cache by pre-fetching data, though -- this can help
    a lot when you know in advance you'll need data, e.g., a few hundred cycles
    from now; if you're looping over big sets of data, you just pre-fetch the next
    block while you work on the current one.)
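    The block-at-a-time pre-fetch pattern Joel describes could be sketched in
    C roughly like this, using GCC/Clang's `__builtin_prefetch` (a real
    builtin; the block size of 64 elements is an arbitrary assumption here,
    and real tuning would depend on the CPU and cache-line size):

    ```c
    #include <stddef.h>

    #define BLOCK 64  /* elements per block; an assumed tuning value */

    /* Sum an array block by block, pre-fetching the next block while
     * working on the current one. The prefetch is only a hint, so the
     * result is identical either way; only the timing changes. */
    long sum_with_prefetch(const long *data, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; i += BLOCK) {
            /* Start pulling the next block toward the caches now. */
            if (i + BLOCK < n)
                __builtin_prefetch(&data[i + BLOCK], 0 /* read */,
                                   3 /* high temporal locality */);
            size_t end = (i + BLOCK < n) ? i + BLOCK : n;
            for (size_t j = i; j < end; j++)
                total += data[j];
        }
        return total;
    }
    ```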

    A program that requests random memory accesses will very quickly stall for a
    long time (after the first couple of instructions), as you quickly exhaust the
    number of "memory read" resources available and have near-constant cache
    misses. Few real-world programs exhibit behavior that bad AFAIK, although I
    expect that some large database applications (that have to run through
    multiple indices for each request, where the indices and/or data are too big
    for the caches) might approach it.

    ---Joel
     
    Joel Koltner, Jun 10, 2011
    #5
  6. Skybuck Flying

    Paul Guest

    Re: Can an x86/x64 cpu/memory system be changed into a barrel processor?

    Joel Koltner wrote:
    > "Skybuck Flying" <> wrote in message
    > news:58426$4df16c1b$5419acc3$1.nb.home.nl...
    >> The only thing my program needs to do is fire off memory requests.
    >>
    >> However it seems the x86 cpu blocks on the first memory request and
    >> does nothing else.

    > [deletia]


    The Intel processor also has prefetch options, and works with both
    incrementing memory access patterns or decrementing patterns. Using
    a "warm up" option is one thing, but the processor should also be
    able to handle prefetch on its own.

    Perhaps AMD has something similar ? Since this is posted to comp.arch,
    someone there should know. Skybuck's processor has an integrated memory
    controller, so there are possibilities.

    http://blogs.utexas.edu/jdm4372/201...ory-bandwidth-part-3-single-thread-read-only/

    Both Intel and AMD, will have documentation on their website, addressing
    the need to optimize programs to run on the respective processors. And
    that is a good place for a programmer to start, to find the secrets
    of getting best performance.

    Paul
     
    Paul, Jun 10, 2011
    #6
  7. Skybuck Flying

    Ken Hagan Guest

    Re: Can an x86/x64 cpu/memory system be changed into a barrel processor?

    On Fri, 10 Jun 2011 01:58:09 +0100, Skybuck Flying
    <> wrote:

    > The only thing my program needs to do is fire off memory requests.
    >
    > However it seems the x86 cpu blocks on the first memory request and does
    > nothing else.


    How do you know? The whole point about out-of-order execution is that it
    is transparent to the software, so it is not possible to write a program
    whose behaviour depends on whether blocking occurs or not.

    If you have a logic analyzer and you think you have results that prove
    in-order behaviour then you'll have to provide more details. That said,
    such things are well outside my comfort zone so I personally won't be able
    to help.
     
    Ken Hagan, Jun 10, 2011
    #7
  8. Skybuck Flying

    MitchAlsup Guest

    Re: Can an x86/x64 cpu/memory system be changed into a barrel processor?

    On Jun 10, 3:44 am, "Ken Hagan" <> wrote:
    > On Fri, 10 Jun 2011 01:58:09 +0100, Skybuck Flying  
    >
    > <> wrote:
    > > The only thing my program needs to do is fire off memory requests.

    >
    > > However it seems the x86 cpu blocks on the first memory request and does  
    > > nothing else.


    The CPU will not block if all of the outstanding accesses are to write-
    back cacheable memory.

    > How do you know? The whole point about out-of-order execution is that it  
    > is transparent to the software,


    No, the whole point of precise exceptions is to be transparent to
    software. The point of OoO is to improve performance; adding precise
    exceptions to OoO gives you high performance and is relatively
    transparent to software (but not entirely).

    > so it is not possible to write a program  
    > whose behaviour depends on whether blocking occurs or not.


    One can EASILY detect blocking (or not) by comparing the wall clock
    time on multi-million memory access codes. One can infer the latencies
    of the entire cache hierarchy including main memory, and whether or not
    main memory accesses are being processed with concurrency.
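    A minimal sketch of that wall-clock measurement in C (using POSIX
    `clock_gettime`; an illustration, not a calibrated benchmark). The random
    cycle forces every load to depend on the previous one, so the CPU cannot
    overlap the misses; comparing its time against an independent-access loop
    of the same length reveals how much memory concurrency the CPU achieves.

    ```c
    #include <stdlib.h>
    #include <time.h>

    /* Chase a cyclic permutation: each load depends on the previous one,
     * so the CPU cannot overlap the cache misses. */
    size_t chase(const size_t *next, size_t start, size_t steps)
    {
        size_t i = start;
        while (steps--)
            i = next[i];
        return i;
    }

    /* Time 'steps' dependent loads over an n-entry random cycle.
     * Returns elapsed wall-clock seconds, or -1.0 on allocation failure. */
    double seconds_to_chase(size_t n, size_t steps)
    {
        size_t *next = malloc(n * sizeof *next);
        if (!next)
            return -1.0;
        for (size_t i = 0; i < n; i++)
            next[i] = i;
        /* Sattolo's algorithm: shuffle into a single n-cycle. */
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        volatile size_t sink = chase(next, 0, steps);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        (void)sink;
        free(next);
        return (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    }
    ```

    With n large enough to blow out the caches, the per-step time approaches
    the full main-memory latency; with independent accesses it does not.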

    Mitch
     
    MitchAlsup, Jun 10, 2011
    #8
  9. "Ken Hagan" wrote in message news:eek:...

    On Fri, 10 Jun 2011 01:58:09 +0100, Skybuck Flying
    <> wrote:

    > The only thing my program needs to do is fire off memory requests.
    >
    > However it seems the x86 cpu blocks on the first memory request and does
    > nothing else.


    "
    How do you know?
    "

    Good question, but not really.

    Let's just say I have a lot of programming experience.

    Some programs can do a lot while some can do only a little bit.

    The last category falls into "a lot of memory accesses".

    I have done many tests by now to confirm this.

    The evidence is not 100% water tight or 100% certain but I would be very
    surprised if it was not the truth.

    Especially since the gpu seems to execute it much faster, and this was
    even on dx9 hardware instead of cuda. However those results are also in
    doubt, because it's almost too good to be true and it fluctuated a bit.

    "
    The whole point about out-of-order execution is that it
    is transparent to the software, so it is not possible to write a program
    whose behaviour depends on whether blocking occurs or not.
    "

    What does this have to do with what I wrote... ? It's up to the programmer
    if he wants to use the blocking instructions or not.

    It's not that much of a big deal... windows has plenty of thread-blocking
    api's, which are designed to be blocking on purpose, to save cpu.

    Threads even have an APC queue... where "messages/events" can be posted
    to... when they wake up... they can process them.

    "
    If you have a logic analyzer and you think you have results that prove
    in-order behaviour then you'll have to provide more details. That said,
    such things are well outside my comfort zone so I personally won't be able
    to help.
    "

    I have also read reports claiming that the cpu is 91% or so waiting on main
    memory.

    Bye,
    Skybuck.
     
    Skybuck Flying, Jun 10, 2011
    #9
  10. "Analyzing" the real world is pretty useless and the reason is very simple:

    Computer programs which were written which were slow would be dismissed by
    users.

    Programmers try to write programs so they will be fast.

    Do not think that slow programs would be released.

    Therefore it becomes a self-fulfilling prophecy...

    And by analyzing the current situation and adapting to that... you also
    keep the "chicken and egg" problem alive.

    No better hardware, then no better software.

    Or vice versa:

    No slow software, then no faster hardware needed.

    Lastly:

    Ask yourself one very important big question:

    What does the R stand for in RAM ?

    I also tried prefetch; it helps something like 1%, pretty fricking useless.

    Bye,
    Skybuck.
     
    Skybuck Flying, Jun 10, 2011
    #10
  11. Already tried prefetching for RAM; it's pretty useless...

    Especially for random access, especially for dependencies, especially when
    the software doesn't yet know what to ask next.

    However the problem may be parallelized.

    However the CPU still blocks.

    Therefore making it parallel doesn't help.

    Only threading helps, but with two or four cores that doesn't impress.

    I might read the article later on but I fear I will be wasting my time.

    I scanned it a little bit; the code assumes in-sequence memory... pretty
    lame, it has nothing to do with the R in RAM.

    Also my memory seeks are very short, 4 to 6 bytes, therefore fetching more
    is pretty useless.

    Bye,
    Skybuck.

    "Paul" wrote in message news:isru62$u64$...

    [deletia]
     
    Skybuck Flying, Jun 10, 2011
    #11
  12. Skybuck Flying

    Joel Koltner Guest

    "Skybuck Flying" <> wrote in message
    news:60cd6$4df281c5$5419acc3$1.nb.home.nl...
    > Programmers try to write programs so they be fast.


    Well, yes and no -- these days, the vast majority of programs are at least
    *initially* written more from the point of view of trying to get them to be
    maintainable and correct (bug-free); after that has occurred, if there are any
    significant performance bottlenecks (and in many programs, there may not be
    because the app is waiting on the human user for input or the Internet or
    something else quite slow), programmers go back and work on those
    performance-critical areas.

    "We should forget about small efficiencies, say about 97% of the time:
    premature optimization is the root of all evil" -- Donald Knuth -- see:
    http://en.wikipedia.org/wiki/Program_optimization, where there are other good
    quotes as well, such as "More computing sins are committed in the name of
    efficiency (without necessarily achieving it) than for any other single
    reason - including blind stupidity."

    Also keep in mind that since the vast majority of code is now written in a
    high-level language, it's primarily the purview of a *compiler* to generate
    "reasonably" efficient code -- it has much more intimate knowledge of the
    particular CPU architecture being targeted than the programmer usually does;
    most programmers should be concentrating on efficient *algorithms* rather than
    all these low level details regarding parallelism, caching, etc.

    I mean, when I first started programming and learned C (back in the early
    '90s), you were often told a couple of "tricks" to make the code run faster at
    the expense of readability. Today the advice is completely the opposite: Make
    the code as readable as possible; a good compiler will generally create output
    that's just as efficient as the old-school code ever was.

    > Do not think that slow programs would be released.


    "Slow" is kinda relative, though. As much as it pains me and others around
    here at times, it's hard to argue that just because a program is truly glacial
    on a 233MHz Pentium (the original "minimum hardware requirement" for Windows
    XP), if it's entirely snappy on a 2.4GHz CPU it's not *really* "slow."

    > No better hardware then no better software.


    Hardware has gotten better, and software design is still struggling to catch
    up: There's still no widely-adopted standard that has "taken over the world"
    insofar as programming efficiently for multi-core CPUs. Take a look at
    something like the GreenArrays CPUs: http://greenarraychips.com/ -- 144 CPU
    cores, and no fall-off-the-log easy method to get all of them to execute in
    parallel for many standard procedural algorithms.

    > Ask yourself one very important big question:
    > What does the R stand for in RAM ?


    Notice that your motherboard is populated with SDRAM, which is a rather
    different beast than "old school" RAM -- it's not nearly as "random" as you
    might like, at least insofar as what provides the maximum bandwidth.

    ---Joel
     
    Joel Koltner, Jun 10, 2011
    #12
  13. Skybuck Flying

    Joel Koltner Guest

    You know, Skybuck, if you don't like the way the x86 behaves, these days it's
    entirely straightforward to sit down and whip up your own "dream CPU" in an
    FPGA of your choice. :) OK, it's not going to run at 2.8GHz, but you can
    pretty readily get one to operate at 100MHz or so which, in some specialized
    applications, can be faster than a much more highly-clocked general-purpose
    CPU.

    And you'd get to start cross-posting to comp.arch.fpga as well! Wouldn't that
    be fun? Time to start reading up on VHDL or Verilog, perhaps? :)

    > I scanned it a little bit, the code assummes insequence memory... pretty
    > lame, it has nothing to do with R in RAM.


    As has been mentioned, these days RAM is so much slower than the CPU that, for
    truly random access, you can sometimes kill a hundred CPU clock cycles waiting
    for each result. Painful indeed...

    ---Joel
     
    Joel Koltner, Jun 10, 2011
    #13
  14. Skybuck Flying

    Dave Platt Guest

    In article <2ec60$4df2825b$5419acc3$1.nb.home.nl>,
    Skybuck Flying <> wrote:

    >Already tried prefetching for RAM it's pretty useless...
    >
    >Especially for random access, especially for dependancies, especially when
    >the software doesn't yet know what to ask next.
    >
    >However the problem may be parallized.
    >
    >However the CPU stills blocks.
    >
    >Therefore making it parallel doesn't help.
    >
    >Only threading helps, but with two or four cores that doesn't impress.


    And, depending on the threading model of the processor, it may not
    help at all. In many CPUs, multiple threads running in the same core
    are sharing the same local cache and memory bus - run two threads
    which "fight" over the bus doing the same sorts of inefficient
    accesses, and the throughput of each thread drops by half (roughly).

    >I might read the article later on but I fear I will be wasting my time.
    >
    >I scanned it a little bit, the code assummes insequence memory... pretty
    >lame, it has nothing to do with R in RAM.


    "Random access" does *not* imply "equal, constant-time access". I
    don't think it has ever done so, at least not in the computing
    industry. Certain forms of sequential or nearly-sequential access
    have always been faster, on most "random access" devices.

    All that "random" means, in this context, is that you are *allowed* to
    access memory locations in an arbitrary sequence - you are not being
    *forced* into a purely sequential mode of access.

    >Also my memory seeks are very short, 4 to 6 bytes, therefore fetching more
    >is pretty useless.


    You're facing a characteristic which is inherent in the way that DRAM
    works. Your "barrel processor" approach really won't help with this.

    The characteristic is this: DRAM is organized, internally, into
    blocks. It takes the DRAM chips a significant amount of time to
    prepare to transfer data in or out over the memory bus, and it takes a
    significant amount of time to transfer each byte (or word, or
    whatever) over the bus to/from the CPU. Every time you want to access
    a different area of the DRAM, you have to "pay the price" for the time
    needed to access that part of the chip and transfer the data.
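    The "pay the price per block" behavior described above can be sketched with
    a toy model. All numbers below (row size, hit and miss costs) are invented
    for illustration; they are not real DRAM timings.

```python
import random

# Toy model of DRAM row-buffer behavior. ROW_SIZE and the cost
# constants are made-up illustration values, not real timings.
ROW_SIZE = 1024     # addresses per internal DRAM row (assumed)
ROW_HIT_COST = 1    # cost units when the needed row is already open
ROW_MISS_COST = 10  # cost units to close one row and open another

def access_cost(addresses):
    """Total cost of servicing the addresses in the given order."""
    open_row = None
    cost = 0
    for addr in addresses:
        row = addr // ROW_SIZE
        if row == open_row:
            cost += ROW_HIT_COST
        else:
            cost += ROW_MISS_COST
            open_row = row
    return cost

seq = list(range(8192))        # sequential sweep: few row changes
rnd = seq[:]
random.shuffle(rnd)            # same addresses, random order
print(access_cost(seq), access_cost(rnd))
```

    In this toy model the sequential sweep pays the row-miss price only eight
    times, while the shuffled order pays it on almost every access, even though
    both orders touch exactly the same addresses.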

    This is, in a sense, no different than what happens when you access a
    hard drive (which is also "random access"). Time is required to move
    the head/arm, and wait for the platter to rotate.

    In the case of DRAM, the "motion" is that of electrical charge, rather
    than a physical arm... but it's motion nevertheless (it takes work and
    expends energy) and it takes time.

    In *any* CPU architecture (single, multi-threaded, multi-core, barrel,
    etc.) that depends on DRAM, you'll run into memory-bus stalls if you
    try accessing memory in patterns or ways which exceed the capacity of
    the CPU's own local (static) registers and cache.

    Your barrel architecture, with a queue of requests submitted but not
    yet satisfied by the DRAM controller, will run into trouble in just
    the same way. Eventually your queue of requests will fill up (unless
    your CPU has an infinite amount of queue space) and you won't be able
    to queue up any more requests until DRAM gets around to delivering
    some of the data you asked for a while ago.
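    The "eventually your queue of requests will fill up" point can be shown with
    a minimal producer/consumer loop. The queue depth and the DRAM retire rate
    below are arbitrary illustration values.

```python
from collections import deque

QUEUE_DEPTH = 4   # assumed finite queue of outstanding requests

requests = deque()
stalls = 0
for cycle in range(20):
    # The CPU tries to "machine gun" one request per cycle...
    if len(requests) < QUEUE_DEPTH:
        requests.append(cycle)
    else:
        stalls += 1            # queue full: the CPU must wait after all
    # ...but the DRAM retires a request only every third cycle.
    if cycle % 3 == 0 and requests:
        requests.popleft()

print(stalls)  # prints 9: once the queue fills, most cycles stall
```

    However deep the queue, a producer that issues faster than the consumer
    retires will eventually block; the queue only smooths bursts, it does not
    raise sustained throughput.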

    A big part of smart program design is figuring out when solving
    your problem in the "obvious" way (e.g. accessing memory at random) is
    going to be inherently inefficient, and then figuring out ways to
    "rewrite the problem" so that it's easier to solve more efficiently.

    A common approach (dating back many decades) is to figure out ways of
    sorting some of your inputs, so that you can process them in sorted
    order more efficiently.
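    The sort-your-inputs approach can be sketched as follows; the `gather`
    helper and its names are hypothetical, not taken from any particular
    program.

```python
def gather(table, keys):
    """Look up each key, returning results in the caller's key order."""
    # Sort the key positions so the table is swept low-to-high,
    # instead of jumping around it at random.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    results = [None] * len(keys)
    for i in order:
        results[i] = table[keys[i]]
    return results

table = [k * k for k in range(100)]
print(gather(table, [7, 3, 99, 0]))  # -> [49, 9, 9801, 0]
```

    The answers come back in the caller's original order, but the table itself
    is touched in ascending address order, which is friendlier to caches and
    DRAM rows than a random walk.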

    --
    Dave Platt <> AE6EO
    Friends of Jade Warrior home page: http://www.radagast.org/jade-warrior
    I do _not_ wish to receive unsolicited commercial email, and I will
    boycott any company which has the gall to send me such ads!
     
    Dave Platt, Jun 10, 2011
    #14
  15. What you wrote is old and foolish wisdom; I will give a simple example
    of how foolish it is:

    You can spend a great deal of time trying to come up with a better algorithm
    for the "travelling salesman" problem or whatever.

    But if you never take a look at the actual transportation device, and it
    turns out it was implemented with snails, it's useless nonetheless.

    Only God knows how many programmers have wasted time after time after time
    trying to implement something, some algorithm, some program, and ultimately
    ended up with useless slow crap that nobody in this world needs.

    If your software competitor understands the hardware better and comes up
    with an optimized design from the start, guess who is going to lose:

    You, you, you and you.

    To be able to write good/fast software at all requires some understanding of
    how the hardware works, what its performance characteristics are, what the
    numbers are, etc.

    The deeper the understanding the better; however, with all this "magic"
    (crap?) going on in the background/CPU tricks, it's hard for programmers to
    understand what's going on.

    These tricks might also be counter-productive; some have already mentioned
    hyperthreading as counter-productive.

    Compilers don't optimize algorithms; they don't determine your algorithm or
    data structure, or whether you should use blocking or non-blocking code.
    Compilers are usually about the little things: the instructions, some
    instruction optimizations here and there. These are usually little
    optimizations, perhaps up to 30% or so over human-written code, but that
    won't help if the program is 1000 to 10000% inefficient.

    Not all programmers are equal; some are noobs and some are frustrated
    "experts" or "experienced" programmers seeking more performance from their
    hardware.

    Noobs are nice, but when it comes to writing high-performance programs it's
    pretty safe to dismiss them, since they are still struggling to learn how to
    write decent programs and to pick up enough theory first.

    For the experts there is also a danger that knowing too much about the
    hardware, or trying to learn too much about the hardware, might actually
    prevent them from writing anything at all, because they either can't make up
    their minds, or they know it's not going to give the desired performance, or
    they are always seeking more.

    For some it might be wise not to write anything and to wait it out until
    some good hardware comes along so they can pour their energy into that.

    Shall we forget about the noobs for a moment and move on to the experts,
    who have actually already written many programs? Now these experts are
    looking for ways to make those programs run faster; the programs are trying
    to solve problems, and it takes a lot of time for each program to solve its
    problem.

    In other words they want to solve the problem faster.

    So far, perhaps, multi-core makes it possible because of local data caches:
    every core has its own data cache, and this could be one reason why
    multi-core works.

    However it could also be because of more parallel memory accesses; I am not
    yet sure which of the reasons leads to the higher performance.

    Is multi-core a "cache solution"? Or is it more like a "barrel processor"
    solution?

    ^ This is an important question and an important answer to find out.

    If it's the first case then it's not the second case, and my assumption
    that the second case might lead to better performance might be wrong.

    However, not necessarily, because a barrel processor could also "simply"
    divide its work onto multiple chips, which would also all be connected to
    their own processors.
    ^ Still a bit vague, but I am getting an idea which I shall sketch below:

    Memory cells:
    0         1         2
    012345678901234567890123456789
    ##############################

    Queues:

    Q Q Q

    Processors:
    P P P P P P P P


    Each queue takes responsibility for certain parts of the memory chips.

    Instead of the processors communicating directly with the entire memory chip,
    the processors start communicating with the queues and place their requests
    in the appropriate queue.

    This divides the work somewhat, especially for random access.


    The queues now communicate with the memory chips, the queues never overlap
    with each other's memory responsibility.

    So Q1 takes 0 to 9
    So Q2 takes 10 to 19
    So Q3 takes 20 to 29

    This way multiple memory address requests can be fulfilled at the same time.

    The processors might also be able to go on and not worry about it too much;
    the queues take care of it.

    The question is if the processors can queue it fast enough, probably so...

    Some queue locking might have to be done if multiple processors try to
    request from the same memory region... though smarter programmers/programs
    might not do that, and instead take responsibility for their own memory
    sections, use their own memory sections, and make sure they don't overlap.

    Seems like a pretty good plan to me... I would be kinda surprised if
    processors/memories don't already do this ?! ;)
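    The dispatch step of the design above can be sketched like this. The queue
    count and range size mirror the Q1/Q2/Q3 example; the function name is
    invented for illustration.

```python
from collections import deque

RANGE_SIZE = 10                      # Q1: 0-9, Q2: 10-19, Q3: 20-29
queues = [deque() for _ in range(3)]

def enqueue_read_request(address):
    """Route a read request to the queue owning its address range."""
    queues[address // RANGE_SIZE].append(address)

# "Machine gun" a burst of requests.
for addr in [3, 14, 25, 7, 29]:
    enqueue_read_request(addr)

print([list(q) for q in queues])  # -> [[3, 7], [14], [25, 29]]
```

    Each queue can then talk to its own slice of memory independently, so
    requests that land in different ranges can be serviced at the same time.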


    Bye,
    Skybuck.
     
    Skybuck Flying, Jun 11, 2011
    #15
  16. See my reply to one of the other guys; it contains a design for how multiple
    queues could be used to speed up the system.

    I shall repeat it here somewhat:

    Memory-------------------------------->

    Queues ------>

    Processors-------->

    Each queue is attached to a part of the memory.

    The processors feed their requests to the queues instead of directly to
    memory. This would free the processors to do something else.

    The queues also distribute the workload across the memory, and it's a form
    of distribution.

    In this way the memory could also start to work in parallel.

    Regarding your post:

    Anyway, one thread per core probably works because each core has its own
    cache, but the question remains whether multi-core would also increase speed
    with main memory.

    Also, the filling up of the queues does not have to be a problem, and
    probably is not a problem; it might even be desired.

    Since the idea is to keep the processor and the memory system at work,
    while one processor might be waiting for queue space to become available,
    others might already be processing a lot. It's about machine-gunning the
    memory system.

    You yourself write the memory system needs to do a lot of stuff.

    Ask yourself now the following question: what happens if the cpu is waiting
    for this ?!?

    Answer: nothing.

    The cpu does nothing.

    I suggest the CPU does something, for example prepare the next and the next
    and the next memory request; at least this might hide the latency of the CPU
    side of doing these memory requests, so that would at least help somewhat.

    Ultimately the CPU can only be serviced with memory as fast as the memory
    system can deliver it, so I agree with you on that part somewhat, but this
    is also the part that probably needs to be worked on...

    Memory systems need to be able to deliver memory faster, and perhaps more in
    parallel, to CPUs, and CPUs need to be able to issue more requests in
    parallel too.

    The CPU is probably ready to process data but in the current situation is
    being starved of data; not a good situation.

    Adding more and bigger caches to CPUs is probably not the answer, because
    this takes away from the number of cores that could be available, and takes
    away some of the processing power that could otherwise have been present.

    My fear would be that x86 will ultimately lose out because it applied trick
    after trick after trick after trick to try and keep itself alive, instead of
    trying to solve the inherent problem, which is a slow memory system.

    Trying to solve this by adding its own memory system with "cache" is
    probably not a good solution and will ultimately kill off x86.

    CPU manufacturers will have to work together with memory manufacturers to
    try and come up with a solution which will make the CPU able to work at
    full speed while being fed by memory.

    Bye,
    Skybuck.
     
    Skybuck Flying, Jun 11, 2011
    #16
  17. Skybuck Flying

    Jamie Guest

    Re: Can a x86/x64 cpu/memory system be changed into a barrel processor?

    Skybuck Flying wrote:

    > [snip]


    Give it up!

    You're so far behind the eight ball that you look like a dinosaur.

    Jamie
     
    Jamie, Jun 11, 2011
    #17
  18. Skybuck Flying

    Joel Koltner Guest

    "Skybuck Flying" <> wrote in message
    news:addb1$4df2b862$5419acc3$1.nb.home.nl...
    > If your software competitor does understand hardware better and does come up
    > with an optimized design from the start guess who is going to loose:
    >
    > You, you, you and you.


    Actually "conventional wisdom" in business today is that "first to market" is
    often far more important than "bug-free and feature-laden." Sadly this is
    true in many cases, although there are plenty of counter-examples as well:
    Tablet PCs were largely ignored (even though they'd been around for a decade
    or so) until Apple introduced the iPad, and now they're the fastest growing
    segment of PCs.

    > To be able to write good/fast software at all requires some understanding of
    > how the hardware works, what it's performance characteristics are, what the
    > numbers are etc.


    Again, it really depends on the application. If you're writing a web browser,
    of the dozen guys you might have on the team doing so, I doubt more than 1 or
    2 really need to understand the underlying hardware all that well. Heck, a
    lot of people -- myself included -- use library files for cross-platform
    development specifically so that we don't *have* to understand the low-level
    architecture of every last OS and CPU we're targeting; many applications just
    don't need every last ounce of CPU power available.

    > The deeper the understanding the better, however with all this "magic"
    > (crap?) going on in the background/cpu tricks it's hard for programmers to
    > understand what's going on.


    That's very true.

    But look... I grew up with a Commodore 64. It was very cool, and I knew a
    large fraction of everything there was to know about it, both at the hardware
    and the software levels. But today's PCs are different -- there's *no one
    single person at Intel who thoroughly understands every last little technical
    detail of a modern Pentium CPU*, just as there's *no one single person at
    Microsoft who thoroughly understands every last little technical detail of
    Windows*. That's just how it is for desktop PCs -- they're so complex, very
    few people are going to code at, e.g., the raw assembly level for an entire
    application (a notable exception might be someone like Steve Gibson -- and
    even there, his assembly code ends up calling OS routines that were written
    in C...); most people find some comfortable balance between development time
    and performance.

    (One can have that same sort of "Commodore 64" experience today with the
    myriad of microcontrollers available. Or heck, build your own system-on-chip
    in an FPGA... cool beans!)

    > Compilers don't optimize algorithms, they don't determine your algorithm or
    > data structure or if you should use blocking or non blocking code, compilers
    > are usually about the little things, the instructions, some instructions
    > optimizations here and there... these are usually little optimizations,
    > perhaps up to 30% or so from human written code, but that won't help if the
    > program is 1000 to 10000% inefficient.


    Agreed, although I think you underestimate just how good optimizing compilers
    are as well -- in many cases they're far better than the average programmer in
    rearranging code so as to optimize cache access and otherwise prevent pipeline
    stalls.

    > However it could also be because of more memory accesses, I am not yet sure
    > which of the reasons leads to the higher performance.


    Join the crowd. As has been mentioned, Intel and AMD spend many millions of
    dollars every year simulating all sorts of different CPU architectures in
    their attempts to improve performance.

    ---Joel
     
    Joel Koltner, Jun 11, 2011
    #18
  19. Skybuck Flying

    Guest

    On Fri, 10 Jun 2011 18:25:41 -0700, "Joel Koltner"
    <> wrote:

    >"Skybuck Flying" <> wrote in message
    >news:addb1$4df2b862$5419acc3$1.nb.home.nl...
    >> If your software competitor does understand hardware better and does come up
    >> with an optimized design from the start guess who is going to loose:
    >>
    >> You, you, you and you.

    >
    >Actually "conventional wisdom" in business today is that "first to market" is
    >often far more important than "bug-free and feature-laden."


    The real problem is "feature-laden" trumps "bug-free" every time.

    >Sadly this is
    >true in many cases, although there are plenty of counter-examples as well:
    >Tablet PCs were largely ignored (even though they'd been around for a decade
    >or so) until Apple introduced the iPad, and now they're the fastest growing
    >segment of PCs.


    Yup. Couldn't give 'em away until Jobs put the "cool" label on them. ...and
    there was still resistance. Anyone remember the iMaxi for the iPad?

    >> To be able to write good/fast software at all requires some understanding of
    >> how the hardware works, what it's performance characteristics are, what the
    >> numbers are etc.

    >
    >Again, it really depends on the application. If you're writing a web browser,
    >of the dozen guys you might have on the team doing so, I doubt more than 1 or
    >2 really need to understand the underlying hardware all that well. Heck, a
    >lot of people -- myself included -- use library files for cross-platform
    >development specifically so that we don't *have* to understand the low-level
    >architecture of every last OS and CPU we're targeting; many applications just
    >don't need every last ounce of CPU power available.


    He did state "good/fast" as assumptions. ;-)

    >> The deeper the understanding the better, however with all this "magic"
    >> (crap?) going on in the background/cpu tricks it's hard for programmers to
    >> understand what's going on.

    >
    >That's very true.


    Yup. Having debugged the "magic", even with insider scoop, I can agree that
    it's a bitch. ;-)

    >But look... I grew up with a Commodore 64. It was very cool, and I knew a
    >large fraction of everything there was to know about it, both at the hardware
    >and the software levels. But today's PCs are different -- there's *no one
    >single person at Intel who thoroughly understands every last little technical
    >detail of a modern Pentium CPU*, just as there's *no one single person at
    >Microsoft who thoroughly understands every last little technical detail of
    >Windows*. That's just how it is for desktop PCs -- they're so complex, very
    >few people are going to code at, e.g., the raw assembly level for an entire
    >application (a notable exception might be someone like Steve Gibson -- and
    >even there, his assembly code ends up calling OS routines that were written in
    >C...); most people find some comfortable balance between development time
    >and performance.


    If you "ignore" things like the process, physics, and other gooey stuff, I bet
    you're wrong. I can well imagine that there are CPU architects in Intel who
    do know all the gory details of a particular CPU. They may not know the
    circuit-level functioning but from a micro-architecture standpoint, I'm sure
    there are some who do.

    >(One can have that same sort of "Commodore 64" experience today with the
    >myriad of microcontrollers available. Or heck, build your own system-on-chip
    >in an FPGA... cool beans!)


    Too much like work. ;-)

    >> Compilers don't optimize algorithms, they don't determine your algorithm or
    >> data structure or if you should use blocking or non blocking code, compilers
    >> are usually about the little things, the instructions, some instructions
    >> optimizations here and there... these are usually little optimizations,
    >> perhaps up to 30% or so from human written code, but that won't help if the
    >> program is 1000 to 10000% inefficient.

    >
    >Agreed, although I think you underestimate just how good optimizing compilers
    >are as well -- in many cases they're far better than the average programmer in
    >rearranging code so as to optimize cache access and otherwise prevent pipeline
    >stalls.


    The compilers are smarter than the "average programmer"? That's supposed to
    be surprising?

    >> However it could also be because of more memory accesses, I am not yet sure
    >> which of the reasons leads to the higher performance.

    >
    >Join the crowd. As has been mentioned, Intel and AMD spend many millions of
    >dollars every year simulating all sorts of different CPU architectures in
    >their attempts to improve performance.
    >

    ...and millions more verifying that their CPUs actually do what they're
    supposed to.
     
    , Jun 11, 2011
    #19
  20. Skybuck Flying

    mikea Guest

    In alt.comp.periphs.mainboard.asus zzzzzzzz <> wrote:
    > [snip]
    >
    > ...and millions more verifying that their CPUs actually do what they're
    > supposed to.


    Can you say "F00F"? How about "2+2=3.9999999999999"? Sometimes they
    miss an important case -- or, even worse, an important _class_ of cases.

    --
    I suspect that if the whole world agreed to run on GMT, France would still
    insist on GMT+1 just to annoy the British.
    -- Seen in a newsgroup thread on Daylight Saving Time
     
    mikea, Jun 11, 2011
    #20