* Bart:
> Can you point me to any documentation that explains this any better?
> I've never heard of this about the 1k/2k blades. Most any multiprocess
> system architecture I am familiar with has some overhead with the
> memory controller.
Standard multiprocessor computers have a common memory controller and
common memory. This is called UMA (Uniform Memory Access) architecture.
It looks like this:

    [I/O]
       |
    [CPU0]---[CPU1]
       |
    [MEMORY CONTROLLER]-[MEMORY]

Good examples of UMA machines are most servers and workstations with
older Intel Xeon processors (before the Xeon 5500 series). These Xeons
use a shared FSB (or, with the Xeon 5000 series, one FSB per processor)
to communicate with an external memory controller (the Northbridge).
This means that memory performance is the same for every CPU, no matter
which CPU does the access and no matter which area of system memory is
accessed. However, because the memory controller sits outside the CPU,
every memory access has higher latency, and on older Xeons with a single
shared FSB that bus also limits the bandwidth available to system
memory.
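
To put the shared-FSB point in numbers, here is a rough sketch in Python.
The 8 GB/s bus figure and the function name are my own assumptions for
illustration, not measured values for any particular Xeon, and real
buses do not divide bandwidth this cleanly.

```python
def per_cpu_bandwidth(fsb_gb_per_s: float, active_cpus: int) -> float:
    """Upper bound on the memory bandwidth each CPU sees on one shared FSB."""
    if active_cpus < 1:
        raise ValueError("need at least one active CPU")
    # All CPUs compete for the same bus, so the ceiling is split between them.
    return fsb_gb_per_s / active_cpus


# Hypothetical 8 GB/s shared bus (illustrative number only).
one_cpu = per_cpu_bandwidth(8.0, 1)
two_cpus = per_cpu_bandwidth(8.0, 2)
print(one_cpu, two_cpus)
```

The point is only that the ceiling per CPU drops as more CPUs become
memory-bound at the same time, while latency stays uniform.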
The UltraSPARC III-based Sun machines like the SB1000/2000/2500, as
well as AMD Opteron-based computers, use a NUMA[1] (Non-Uniform Memory
Access) architecture instead. NUMA means that every processor has its
own memory controller (which in the case of UltraSPARC III and AMD
Opteron is built into the CPU) and its own local memory. NUMA looks like
this:

    [I/O]
       |
    [CPU0]-[MEMORY CONTROLLER]-[MEMORY]
       |
    [CPU1]-[MEMORY CONTROLLER]-[MEMORY]
       |
    [I/O]

The advantage is that every CPU has very fast access to its local RAM
(low latency) and doesn't have to share that bandwidth with the other
processor. However, as soon as a CPU has to access memory connected to
another CPU, things get much slower, because the request has to go
through the other processor to reach its memory. Memory performance now
depends on which part of system memory is accessed: if it is local it is
fast, if it is connected to another processor it is slow. Therefore NUMA
needs a NUMA-aware OS (like Solaris, Windows or Linux) which schedules
processes and allocates memory so that each process uses RAM connected
to the processor it runs on.
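
As a toy illustration of what "NUMA-aware" means, the sketch below
prefers RAM on the node where the process runs and only spills to
another node when local RAM is full. The Node class and allocate_page()
are hypothetical names of mine; this is not how Solaris, Windows or
Linux actually implement their allocators.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One CPU plus the RAM behind its own memory controller."""
    cpu: int
    free_pages: int


def allocate_page(nodes: list[Node], running_on_cpu: int) -> int:
    """Return the CPU whose RAM backs the new page, preferring the local node."""
    local = next(n for n in nodes if n.cpu == running_on_cpu)
    if local.free_pages > 0:
        local.free_pages -= 1
        return local.cpu
    # Local RAM is exhausted: fall back to the remote node with the most free pages.
    remote = max((n for n in nodes if n.cpu != running_on_cpu),
                 key=lambda n: n.free_pages, default=None)
    if remote is None or remote.free_pages == 0:
        raise MemoryError("no free pages on any node")
    remote.free_pages -= 1
    return remote.cpu


nodes = [Node(cpu=0, free_pages=1), Node(cpu=1, free_pages=2)]
first = allocate_page(nodes, running_on_cpu=0)   # local RAM still available
second = allocate_page(nodes, running_on_cpu=0)  # CPU0's RAM is full, spills to CPU1
print(first, second)
```

Every page that ends up on the other node is one that the process can
only reach through the other processor, which is the slow path described
above.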
As to the Sun Blade 1000/2000/2500, it is a crippled NUMA system which
basically looks like this:

    [I/O]
       |
    [CPU0]-[MEMORY CONTROLLER]-[MEMORY]
       |
    [CPU1]-[MEMORY CONTROLLER]

While both processors do have memory controllers, only the first CPU can
actually have physical RAM attached. This means all processes running on
the second processor have to go through the primary one to access RAM,
as the second CPU has no local memory. This has quite a large impact on
memory-intensive multiprocessor applications.
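
A small model of the difference, again in Python. LOCAL_NS and HOP_NS
are made-up latency figures chosen only to show the shape of the
problem; I have not measured them on a Sun Blade.

```python
LOCAL_NS = 100  # hypothetical latency to RAM behind the CPU's own controller
HOP_NS = 50     # hypothetical extra cost of going through the other CPU


def latencies(ram_owner_for_cpu: dict[int, int]) -> dict[int, int]:
    """Latency each CPU sees, given which CPU's RAM holds its pages."""
    return {
        cpu: LOCAL_NS if cpu == owner else LOCAL_NS + HOP_NS
        for cpu, owner in ram_owner_for_cpu.items()
    }


full_numa = latencies({0: 0, 1: 1})  # ideal placement: each CPU uses its own RAM
crippled = latencies({0: 0, 1: 0})   # all RAM hangs off CPU0
print(full_numa)
print(crippled)
```

In the second layout no OS placement policy can help CPU1, because there
is no local RAM to place its pages in.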
Ben
[1] http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access