
SGI takes Itanium & Linux to 1024-way

Discussion in 'Intel' started by Yousuf Khan, Jul 18, 2004.

  1. Yousuf Khan Guest

    Yousuf Khan, Jul 18, 2004
    #1

  2. Robert Myers Guest

    "The users get one memory image they have to deal with," he [Pennington,
    the interim director of NCSA] said. "This makes programming much easier,
    and we expect it to give better performance as well."

    Too early to call it a trend, but I'm encouraged to see the godfather of
    the "Top" 500 list talking some sense as well:

    callysto.hpcc.unical.it/hpc2004/talks/dongarra-survey.ppt

    slides 37 and 38.

    A single system image is no simple cure. It may not be a cure at all.
    But it's encouraging that somebody is taking it seriously enough to
    build a kilonode machine with a single address space.

    "Scalability" being a challenge for such installations (you can't just
    order more boxes and more cable and take another rural county out of
    agricultural production to move "up" the "Top" 500 list) the premium is
    on processors with high single-thread throughput.
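
    To make Pennington's "easier to program" point concrete, here is a
    rough sketch of a 1-D stencil update written both ways (illustrative
    only: the function names are mine, and the cluster side assumes the
    usual MPI calls):

        #include <mpi.h>

        /* Shared address space (the Altix model): a neighbour's boundary
           cell is just another array element; no communication code at all. */
        void update_shared(const double *u, double *u_new, int i)
        {
            u_new[i] = 0.5 * (u[i - 1] + u[i + 1]);
        }

        /* Message passing (the cluster model): the same neighbour cell
           lives in another process's memory and must be shipped over
           explicitly. */
        void update_mpi(double *u, double *u_new, int n, int rank, int nprocs)
        {
            double left_halo = 0.0, right_halo = 0.0;
            int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
            int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

            /* Trade boundary cells with both neighbours. */
            MPI_Sendrecv(&u[0], 1, MPI_DOUBLE, left, 0,
                         &right_halo, 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[n - 1], 1, MPI_DOUBLE, right, 1,
                         &left_halo, 1, MPI_DOUBLE, left, 1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            u_new[0]     = 0.5 * (left_halo + u[1]);
            u_new[n - 1] = 0.5 * (u[n - 2] + right_halo);
        }

    The interior points update identically in both versions; the difference
    is everything the second one has to say about *where* the data lives.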

    RM
     
    Robert Myers, Jul 18, 2004
    #2

  3. Robert Myers wrote:

    [SNIP]
    Hats off to SGI, kilonode SSI is a neat trick. :)

    Let's say you write code that makes use of a large single system
    image machine. Let's say SGI fall behind the curve and you need
    answers faster : Where can you go for another large single system
    image machine ?

    I see that kind of awfully clever machine as vendor lock-in waiting
    to happen. If you want to avoid lock-in you end up writing your
    code to the lowest common denominator, and in this case that will
    probably remove any advantage gained by SSI (application depending
    of course).

    Cheers,
    Rupert
     
    Rupert Pigott, Jul 18, 2004
    #3
  4. Robert Myers Guest

    What curve are we keeping up with these days?

    The difference in scalability between the Altix and Blue Gene is
    interesting mostly if you're trying to hit arbitrarily defined
    milestones in a Gantt chart.

    For hydro, a factor of ten in machine size is a 78% increase in number
    of grid points available to resolve a given scale (in three space
    dimensions plus a CFL-limited time step, cost grows as the fourth power
    of linear resolution, and 10^(1/4) ≈ 1.78): whoop-de-ding. Maybe
    there's something different about actinide-lanthanide decay series
    that's worth understanding. I'll get around to it some time--even
    though I strongly suspect I'm being led on a wild goose chase. The real
    justification for the milestones on the Gantt chart of the last of the
    big spenders is that a petaflop is a nice big round number for a goal.
    Blue Gene is now not awfully clever? :).

    Commodity chip, flat address space. That sounds pretty vanilla to me.
    How do you get more common than that? You can get an Itanium box with a
    flat address space to your own personal work area much more readily than
    you can get a Blue Gene.

    I can't avoid leaving you with the idea that I think single-image
    machines are the way to go. I don't know that, and I'm not even certain
    what course of investigation I would undertake to decide whether they
    are the way to go or not. What I like about the single address space is
    that it would appear to make the minimum architectural imposition on
    problem formulation.

    RM
     
    Robert Myers, Jul 19, 2004
    #4
  5. Extend that argument further and you are buying Xeons.

    The point is 1000 node machines with shared address spaces don't
    fall out of trees. Who said anything about BlueGene anyways ?
    Over the long run I think it will be very hard to justify the
    extra engineering and purchase cost over message passing gear.
    People made a similar argument for CISC machines too. VAX
    polynomial instructions come to mind. :)

    Cheers,
    Rupert
     
    Rupert Pigott, Jul 19, 2004
    #5
  6. Robert Myers Guest

    There is a fair question that could be asked of almost any application
    these days: why not IA-32 (probably with 64-bit extensions)? When
    you've got superlinear interconnect costs, you want each node to be as
    capable as possible. The application of that argument to Itanium in
    this particular case is wobbly, since the actual usefulness of
    Itanium may be just as theoretical as the usefulness of the clusters
    I've been worrying about.
    I did. Blue Gene was the best contrast I could think of to a single
    image Itanium machine in terms of cost, energy efficiency, and
    scalability. There is no fundamental reason why BlueGene couldn't
    become widely used and accepted, but it probably won't be, because it
    won't show up in the workspace of your average graduate student or
    postdoc.

    Your question is what do we do when we need more than 1000 nodes. It's
    a fair question, but not the only one you could ask. My questions are:
    where does the software that runs on the big machine come from, in what
    environment was it developed, at what cost, and with what opportunities
    for continued development?
    Hardware is cheap, software is expensive. If we've run out of
    interesting things to do with making processors astonishingly powerful
    and inexpensive, we certainly haven't run out of interesting things to
    do in making interconnect astonishingly powerful and inexpensive.
    The RISC/CISC argument went away when microprocessors were developed
    that could hide RISC execution behind a CISC programming model. The
    neat hardware insight (RISC) did not, in the end, impose itself on
    applications. No more should a particular hardware reality about
    multi-processor machines impose itself on applications.

    RM
     
    Robert Myers, Jul 19, 2004
    #6
  7. Cluster style systems should be fairly easy to come by at that level.

    Apps written for clusters should port to another cluster system more
    easily than apps written for a shared memory system will port to a
    cluster. It's a
    matter of choice over the long run... If you use the unique features
    of a kilonode Itanium box then you're basically locked-in. Clearly
    this is not an issue for some establishments, Cray customers are a
    good example. :p
    Of course. Like I said, I don't see 1000 node SSI machines falling
    out of trees. I do see depts assembling a few hundred beige boxes
    and a nightmarish hodgepodge of switches though. :)
    I think that is well in hand to be honest. Plenty of options there
    but everyone ends up going Ethernet anyways. :p
    I doubt this would have gone very far without the highly visible
    captive market (ie: WINTEL desktop).

    I have indirectly acknowledged that NUMA machines can in principle do
    the same job as clusters (it would be silly not to). They implement a
    superset of the comms functionality required by clusters.
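
    To put "superset" concretely (a toy sketch, no particular vendor's
    API): on shared memory you can synthesise a message channel out of
    nothing but ordinary memory and a lock, while a cluster cannot fake
    loads and stores to remote memory without hardware help.

        #include <pthread.h>
        #include <string.h>

        /* One-slot mailbox: message passing built from shared memory.
           Initialise with PTHREAD_MUTEX_INITIALIZER /
           PTHREAD_COND_INITIALIZER and full = 0. */
        typedef struct {
            pthread_mutex_t lock;
            pthread_cond_t  ready;
            int             full;
            double          payload[64];
        } mailbox_t;

        void mb_send(mailbox_t *mb, const double *msg, size_t n)
        {
            pthread_mutex_lock(&mb->lock);
            while (mb->full)                  /* wait for receiver to drain */
                pthread_cond_wait(&mb->ready, &mb->lock);
            memcpy(mb->payload, msg, n * sizeof *msg);
            mb->full = 1;
            pthread_cond_broadcast(&mb->ready);
            pthread_mutex_unlock(&mb->lock);
        }

        void mb_recv(mailbox_t *mb, double *msg, size_t n)
        {
            pthread_mutex_lock(&mb->lock);
            while (!mb->full)                 /* wait for sender to fill */
                pthread_cond_wait(&mb->ready, &mb->lock);
            memcpy(msg, mb->payload, n * sizeof *msg);
            mb->full = 0;
            pthread_cond_broadcast(&mb->ready);
            pthread_mutex_unlock(&mb->lock);
        }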

    HOWEVER... There are some differences in the market place. The captive
    market is rather small, there is less money to develop whizz bang
    solutions and amortize the cost than there was with x86. What we have
    seen are folks who can't afford to splash a few million dollars on a box building
    clusters that are "good enough" and that is how the market has been
    broadened. The SSI machines don't have the stranglehold on the market
    that x86 did.

    Opteron is interesting because it is sort of halfway there, but still
    constrained to small processor count machines. HT is not a spec that
    you can DL over the web and peruse at your leisure, but I note that
    folks who have are not particularly happy with its error handling.
    Apparently it just crashes and burns so you have to reset the sucker,
    which is UNACCEPTABLE on a SSI machine. Think of the fun^Wchallenge
    in handling that in the OS and applications.

    If they fix HT or someone proves to me that it recovers fine, then
    maybe I'll change my opinion. In the meanwhile I think interconnect for beige
    boxes will get even better, as icky as that might be... And yes, I
    *do* understand that some apps really don't fit clusters well, fair
    play, go tug your forelock at SGI's door. :)

    Cheers,
    Rupert
     
    Rupert Pigott, Jul 19, 2004
    #7
  8. Hank Oredson Guest

    Hank Oredson, Jul 19, 2004
    #8
  9. Robert Myers Guest

    They are, indeed, and they are widely used.
    You are apparently arguing for the desirability of folding the
    artificial computational boundaries of clusters into software. If
    that's a necessity of life, I can learn to live with it, but I'm having
    a hard time seeing it as desirable. Are we so fortunate as to live in a
    universe that presents itself to us in midtower-sized chunks? I'm
    worried. ;-).
    Can you give an example of something that you think would happen?
    Well, you said it; I didn't. If you have an environment where more
    flops are an end in themselves--and we do have such an environment--then
    you don't have to worry about how much productivity your nightmare
    produces as long as the photo in the alumni newsletter looks convincing.
    Even more depressing, if your goal is to crank out papers and Ph.D.
    theses, you may do pretty well with beige boxes and cheap labor and have
    very little impact on applied science and technology, because people
    trying to solve real world problems can't wait for a grad student and a
    post doc to spend a semester getting the cluster shaken down, and even
    if they could it wouldn't make any economic sense because the labor
    costs are too high.
    I really do think, now that PCI Express is here, that the day of
    infiniband, at least for this particular space, is finally at hand.

    I was actually imagining that there is really nothing to keep the
    prerequisites for a single image box from becoming more of a commodity.

    I take the current market fragmentation as confirmation of my world view
    that none of the tools we currently possess are really all that good. ;-).

    There is a national lab presentation that argues rather touchingly that
    supercomputers really can produce results that are qualitatively better
    than workstations. You think that successful bureaucrat would even have
    brought it up if he hadn't been challenged on the matter?

    The optimistic view is that the chaos we currently see is the HPC
    equivalent of the Cambrian explosion and that natural selection will
    eventually give us a mature and widely-adopted architecture. My purpose
    in starting this discussion was simply to opine that single image
    architectures have some features that make them seem promising as a
    survivor--not a widely-held view, I think.
    Geez, Rupert, they couldn't possibly be as bad as IBM used to be. :).
    I can live with clusters. It may be that living with clusters is an
    inevitable necessity. I'm not yet ready to give up on a single address
    space, though.

    RM
     
    Robert Myers, Jul 20, 2004
    #9
    That happens with SSI systems too. There is a load of information that
    has been published about scaling on SGI's Origin machines over the
    years. IIRC Altix is based on the same Origin 3000 design. You may
    remember that I quizzed Rob Warnock on this; he said that in practice
    there were little gotchas that tended to crop up at particular numbers
    of procs. He even noted that the gotcha processor counts tended to
    change with the particular generation of Origin.
    In my mind it's a question of fitting our computing effort to reality
    as opposed to living in an Ivory Tower. Some goals, while worthy,
    desirable, or even partially achievable, are basically impossible to
    achieve in reality. A genuinely *flat* address space is impossible
    right here and now. That SSI Altix box will *not* have a *flat* address
    space in terms of access time. It is a NUMA machine. :)
    Depends on the app. Stuff like memory mapping one large file for read
    and occasional write could cause some fantastic locking + latency
    issues when it comes to porting. :)
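
    The sort of thing I mean, sketched (plain POSIX, names illustrative):

        #include <fcntl.h>
        #include <stdlib.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <unistd.h>

        /* On an SSI box every CPU can map one big file, and the occasional
           write is just a store (plus whatever locking the app wants).
           Port this idiom to a cluster and you get to reinvent the
           coherence and the locking by hand. */
        double *map_dataset(const char *path, size_t *len)
        {
            int fd = open(path, O_RDWR);
            if (fd < 0)
                return NULL;
            struct stat st;
            if (fstat(fd, &st) < 0) { close(fd); return NULL; }
            *len = (size_t)st.st_size;
            void *p = mmap(NULL, *len, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            close(fd);                        /* the mapping outlives the fd */
            return p == MAP_FAILED ? NULL : (double *)p;
        }
        /* Readers just index the array; an occasional write is data[i] = x,
           plus msync() if it has to reach the disk. */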

    [SNIP]
    Shaking down large + fast machines has traditionally been a costly
    and risky business. Look at all those machines that spent hours
    with grads all over them and didn't really make an impact; I'm thinking
    of stuff like the bigger ETAs, and the TM-5s didn't seem to do much either.

    Shaking down Crays took some time too, although to be fair they do
    have a good rep for reliability once set up. However Crays are toys
    by comparison to contemporary big systems (component count etc)...

    In terms of sorting out clusters and stuff there is obviously a
    niche there, from what I read it appears to be getting filled too.
    Yeah, interconnect is catching up at bloody last. You will always
    have latency problems while we're communicating at less than c, though,
    regardless of whether you present your network to the application
    as a single address space or not.
    I mentioned Opteron, if HT really does suffer from crash+burn on
    comms failure then it is holding itself back. If that ain't the
    case I'd have figured that a tiny form factor Opteron + DRAM +
    router cards would be a reasonable component for high-density
    clusters and beige SSI machines. You'd need some facility for
    driving some links for longer distances than HT currently allows
    too ($$$). The next thing holding you back is tuning the OS + Apps
    to a myriad of possible configurations... :(

    [SNIP]
    I'm sure they'll have their place. But in the long run I think that
    PetaFLOP pressure will tend to push people towards message passing
    style machines. Consider this, though: the Internet is becoming more and
    more prominent in daily life. The Spooks must have a fair old time
    keeping up with the sheer volume of data flowing around the globe.
    Distributed processing is a natural fit here, SSI machines just would
    not make sense. More and more governments and their civil servants
    will want to make use of this surveillance resource too, check out
    the rate at which legislation is legitimising their intrusion on the
    individual's privacy. The War on Terror has added more fuel to that
    growth market too. :)
    Probably not, because they are a niche player beholden to a few very
    powerful customers.
    Fair enough. Just don't hold your breath waiting for a kilonode SSI
    machine to fall into your lap. :)

    Cheers,
    Rupert
     
    Rupert Pigott, Jul 20, 2004
    #10
  11. Robert Myers Guest

    Well, yes, it is. The spread in latencies is more like half a
    microsecond, as opposed to five microseconds for the latest and greatest
    of the DoE build-to-order specials.

    On the question of Ivory Towers vs. reality, I believe that I am on the
    side of the angels, naturally. If you believe the right question really
    is: "What's the least expensive way we can get a high Linpack score?",
    then clusters are a slam dunk, but I don't think that anybody worth
    talking to on the subject really thinks that's the right question to be
    asking.

    As to access to 1000-node and even bigger machines, I don't need them.
    What I need is to know what kind of machine a code is likely to run on
    when somebody decides an NCSA-type installation is required.

    How you will _ever_ scale _anything_ to the kinds of memory and
    compute requirements required to do even some very pedestrian problems
    properly is my real concern, and, from that point of view, no
    architecture currently on the table, short of specialized hardware, is
    even in the right universe.

    Given that _nothing_ currently available can really do the physics
    right--with the possible exception of things like the Cell-like chips
    the Columbia QCD people are using--and that nothing currently available
    really scales in a way that I can imagine, I'm inclined to give heavy
    emphasis to usability.
    I understand just enough about operating systems to know that building a
    1000-node image that runs on realizable hardware is a real
    tour-de-force. I also understand that you can take off-the-shelf copies
    of, say, RedHat Linux, and some easily-obtainable clustering software
    and (probably) get a thousand beige boxes to run like a kilonode
    cluster. Someone else (Linus, SGI, et al) wrote the Altix OS. Someone
    else (Linus, RedHat, et al) wrote the OS for the cluster nodes. I don't
    want to fiddle with either one. You want me to believe that I am better
    off synchronizing processes and exchanging data across infiniband stacks
    and through trips in and out of kernel and user space and with heaven
    only knows how many control handoffs for each exchange than I am reading
    and writing to my own user space under the control of a single OS, and I
    just don't.

    I'm guessing that, the promise of Opteron for HPC notwithstanding, HT is
    going to be marginalized by PCI Express/Infiniband.
    Nothing that _I_ say about distributed processing is going to slow it
    down, that's for sure, and that isn't my intent. If you've got a
    google-type task, you should use google-type hardware. Computational
    physics is not a google-type task.

    RM
     
    Robert Myers, Jul 20, 2004
    #11
  12. I find a claim of 500ns very hard to believe given the physical size of
    the machine... I suppose they could cheat and slow down all accesses to
    within 500ns of the worst case, but I don't believe SGI would compromise
    in that way. The DoE build-to-order specials were considerably larger
    when I looked at them last and that would make a significant difference
    even before you took interconnect into account.
    It's a question of which route is going to provide the solutions over
    the long haul. NUMA/SSI has to solve the exact same problems as Message
    Passing; it just hides them from the programmer (in theory). As a
    programmer I hate stuff that's swept under the carpet, as it usually
    trips me up sometime later.

    I had this debate with a friend who was convinced that threads were the
    way of the future... He ran into a wall pretty quickly and decided that
    they were OK only up to a point, because he ended up having to code in a
    message passing style despite using a thread mechanism. Performance and
    malleability were the key issues for his relatively modest problem.
    Last time I checked BlueGene/L and QCD shared people, design and
    expertise. No surprise they are similar to Cell in your estimation.
    I don't really believe in silver bullets, I have come to accept that
    there is no one true way to build MPP machines. Another way of putting
    it is that General Purpose machinery usually sucks for pushing the
    limits of a particular field.

    [SNIP]
    Hell yeah. Programmer nearly always has more domain knowledge than
    the Compiler, OS, Interconnect and Processor. Why not use it ?
    I don't think that is necessary. In fact I know it is not necessary,
    I had 100+ processes per processor in a 300 node grid back in the 90s
    and it was old hat then. No OS, no Ethernet, no TCP/IP, no Infiniband
    was necessary.

    TCP/IP & Ethernet (insert world+dog problem solving interconnect de
    jour) uber alles is not helping anyone.

    [SNIP]
    Sigh... Just more guff in the way of sanity and lightweight comms.
    If I was working at an outfit mucking with this kind of gear I'd wear
    a T-Shirt with "CUT THE CRAP" on it. :)

    SGI/Alpha 21364 get their latency figures by not trying to solve world
    + dog's problems with their interconnect. The interconnect is purpose
    built for the job. The performance is *not* a function of NUMA/SSI, it
    is a pre-requisite for NUMA/SSI. That is *precisely* where QCD/BlueGene
    are coming from too. Think about it...


    Cheers,
    Rupert
     
    Rupert Pigott, Jul 20, 2004
    #12
  13. Robert Myers Guest

    A NASA press release from last November
    http://www.arc.nasa.gov/aboutames-pressrelease.cfm?id=10000087 states
    the worst-case communication latency to be "less than a microsecond" for
    a 512-processor Altix. I've had my hands on sharper numbers, but I
    can't find them on the instant. The physical size of the machine can't
    be _that_ much of an issue: 3×10^8 m/s × 10^-6 s/µs = 300 m/µs.

    You have to lay the carpet somewhere. The question is: which details to
    hide and which details to force the user to worry over. We may not
    agree about what to hide, but you have to hide something, and people
    have plenty enough to think about without adding details that really
    have nothing to do with the actual calculation...
    ...and I find a featureless computational space attractive, even if the
    featurelessness is factitious. If the programming models were settled
    and the software tools mature, I might not think that making unnecessary
    details visible was such an imposition, but the programming models
    aren't settled and the software tools aren't mature.

    I'd rather have someone lay a computational model on a plain background,
    rather than justify the computational model as appropriate because it's
    what the hardware dictates--and that _is_ what has happened with
    BlueGene and RedStorm.

    Because there are so many other things to think about.
    Lightweight threads, lightweight comms. All possible, I guess. The
    people who have the resources to provide the leadership don't seem to
    find the enterprise interesting. Look at the fight for survival
    infiniband has had.
    Exactly so. If you want to get a single image to run on a kilonode,
    your comms have to be pretty slick. As a result, you won't have to pay
    too much attention to lame stories about "nearest neighbor" comms. I
    like that.

    RM
     
    Robert Myers, Jul 20, 2004
    #13
    Still sounds unlikely to me. There aren't any 512 node Altixen
    falling out of the trees round here so I am unable to independently
    confirm or deny their results. :)
    Who doesn't ?
    Ah, now there we differ. I have been bitten by too many corner
    cases, too much erroneous behaviour and far too often by insufficiently
    spec'd systems. It's just not funny anymore, and the root cause 9/10
    times is the vendor trying to do too much.
    Why should hardware solve that for software ? It's a software problem,
    not a hardware one.
    I believe you have been pointed at papers that detail specific
    applications that BlueGene was designed to solve. If the DoE dudes
    want to do something different with it, so be it. It wasn't designed
    in a vacuum.

    [SNIP]
    Eh ? The whole point of a programmer is to fit the domain knowledge
    to the tool (as far as that is possible).

    The argument for simple hardware is that the programmer has *fewer*
    corner cases to worry about and can spend less time fighting the
    hardware. Simplicity has other benefits that the customer does not
    see : It helps the Vendor do a more thorough validation of the
    platform.

    The amount of time I have spent working around errant HW and SW,
    wishing for something closer to the metal, you would not believe.

    This is from a guy who thinks C sucks for app programming too. :p
    Infiniband is about 10,000,000 miles away from lightweight comms,
    compare and contrast with IEEE-1355 for example.
    Point is though that HW and SW layered on top to present an illusion
    of a single address space isn't for free and you still trip over the
    stuff it hides under the red carpet.

    Cheers,
    Rupert
     
    Rupert Pigott, Jul 20, 2004
    #14
  15. Robert Myers Guest

    Now _that_ is discouraging. If we don't know how to put a large number
    of processors together so that the environment presented to the
    application is reliable, we are in trouble.

    But are you sure you want to go down this road? We were talking about a
    specific vendor here.
    Oh, I'm turning into a casual and careless human factors engineer. ;-).

    If you give people a hardware environment that invites confusion between
    the hardware and software model, people will, perforce, be confused.

    I do think that you have unrealistic expectations for the capabilities
    and instincts of the average practitioner of the computational arts. If
    you give people the opportunity to obsess about hardware details, that's
    what they will obsess about.
    Not to worry. I've actually had people say more intelligent and
    insightful things about the logic of the packet-switched architecture of
    BlueGene and RedStorm and there are more intelligent things written
    down, but I've heard and seen the nearest-neighbor argument often enough
    to believe that that's how too many people are thinking, no matter how
    wrong the logic may be. The argument is actually considerably more
    complicated, and the matter is far from settled or even clear in my own
    mind.
    This is an issue that might warrant the attentions of a careful
    anthropologist; viz, the difference between what people claim they would
    insist on in terms of reliability and what they actually accept and the
    practical consequences of self-delusion.

    If what you are saying is an accurate reflection of reality, then I
    would say: essay less.

    Someone who really does essay less, of course, risks losing a
    competitive advantage, possibly even to the extent of losing the
    opportunity to compete entirely.

    In the world of "good enough" commodity hardware, maybe "good enough"
    isn't good enough at all.
    Even your _own_ C code? ;-).
    I was putting infiniband forward only as an example of what happens to
    anything that isn't ethernet. :). A quick perusal of IEEE-1355 reveals
    that it has the same problem everything else has: bandwidth requirements
    are increasing faster than people can even write specs.
    Well, I do at least take your point.

    RM
     
    Robert Myers, Jul 21, 2004
    #15
  16. I maintain that this is best left to the application and OS to sort
    out. The HW can do a lot to assist by providing features to help with
    fault detection and isolation.
    Most big systems already have. What do you think checkpointing is
    about ?
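
    For the gallery, the idea in its most bare-bones form (a sketch, not
    any particular lab's scheme; real systems also coordinate the
    checkpoints across nodes):

        #include <stdio.h>

        /* Dump enough state to restart, so a crash costs one interval
           rather than the whole run. */
        int checkpoint(const char *path, const double *state, long n, long step)
        {
            FILE *f = fopen(path, "wb");
            if (!f)
                return -1;
            fwrite(&step, sizeof step, 1, f);
            fwrite(&n, sizeof n, 1, f);
            fwrite(state, sizeof *state, (size_t)n, f);
            return fclose(f);                 /* 0 on success */
        }

        int restore(const char *path, double *state, long n, long *step)
        {
            FILE *f = fopen(path, "rb");
            if (!f)
                return -1;                    /* no checkpoint: cold start */
            long n_saved = 0;
            if (fread(step, sizeof *step, 1, f) != 1 ||
                fread(&n_saved, sizeof n_saved, 1, f) != 1 ||
                n_saved != n ||
                fread(state, sizeof *state, (size_t)n, f) != (size_t)n) {
                fclose(f);
                return -1;
            }
            return fclose(f);
        }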

    FWIW I think SGI are one of the better outfits around, I like their
    NUMAFlex (hopefully remembered the right name) stuff, looks neat.

    [SNIP]
    If anything I think eschewing complexity in hardware would help clear
    this up somewhat. In case you haven't noticed, CISC machines have a habit
    of being treated as RISC machines. The confusion is "WTF are all these
    clever instructions around for ? Why don't I have enough registers ?".
    The gold diggers are getting down-sized and their jobs exported. The
    guys who care about their work will fight tooth and nail to keep their
    jobs and so I think the net result will be an improvement in skill
    levels.
    With MPP my feeling is that you *have* to obsess about hardware details
    at the moment.

    [SNIP]
    "Good enough" hardware has a habit of pushing the cost elsewhere,
    ie: Onto the developer.
    Hell yeah. I hate using C for stuff like string bashing for example,
    way too fiddly. Compare and contrast with something like Python.
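
    Case in point (an illustrative sketch; strtok_r is POSIX): what Python
    does in one line with fields = line.split(',') takes this much C, and
    it chews up the input buffer into the bargain:

        #include <string.h>

        /* Split a comma-separated line in place.  Returns the number of
           fields found; fields[] ends up pointing into line itself. */
        int split_fields(char *line, char *fields[], int max_fields)
        {
            int n = 0;
            char *save = NULL;
            for (char *tok = strtok_r(line, ",", &save);
                 tok != NULL && n < max_fields;
                 tok = strtok_r(NULL, ",", &save))
                fields[n++] = tok;
            return n;
        }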

    [SNIP]
    IEEE-1355 has its origins some fifteen years back. Switch to a
    diff PHY layer but keep the switching etc.

    The point about 1355 is : At the logical level it specs pretty
    much **** all. That mandates that vendors implement **** all,
    which means you have **** all to go wrong or get in the
    programmer's way. :)

    Cheers,
    Rupert
     
    Rupert Pigott, Jul 21, 2004
    #16
  17. Robert Myers Guest

    Possibly so.

    I would summarize our positions as "giving people the illusion of a
    featureless computational space frees users to think about other things"
    and "you're only kidding yourself; in the end, it won't help, because
    the complexity is there and you'll have to deal with it, anyway."

    As to gold-diggers and competence and whatnot, I believe the problem is
    harder than you seem to think. The seductive trap of computation is that you
    can almost always do _something_. The practitioner has no choice but to
    deal with issues, like instabilities that lead to floating point errors,
    that keep the computation from proceeding. Hardware and software issues
    that keep the computation from proceeding must similarly be dealt with.
    Once you've dealt with those issues, how much time is left for
    mathematics, science, and engineering? Often, not enough, and, if
    you've got a product, survival demands a declaration of victory and
    moving on.

    The subtext of the current push for more flops is that it will all get
    better when the computers get bigger. There are problems that you just
    cannot do without more muscle. To the extent that we acquire the
    ability to address larger classes of problems, things will, indeed, be
    getting better. As to the credibility and usefulness of computation,
    I'm not entirely certain that things are getting better.

    "But a single system image won't help," I'm sure you will say. Fair enough.

    As always, though, the complexity has to go somewhere. What I can see
    of IEEE 1355 looks like an open source project to me. With open source,
    you don't have critical information hidden behind NDA's and much of the
    decision making and discussion is out in the open and can easily be
    accessed. Better than vendor-driven committees? I certainly think so.
    You still wind up with many of the same problems, though: software
    encrusted with everybody's favorite feature and interfaces that get
    broken by changes that are made at a level you have no control over
    (like the kernel) and that ripple through everything, for example.

    A fair number of people who get involved in these discussions are people
    with a Physics/EE background who are fairly confident do-it-yourselfers,
    and a fair bit of the puttering comes from places where there are people
    wandering around with screwdrivers who also know C and a little physics.
    I wonder if part of what you object to with systems like Altix is that
    it seems like movement away from open systems and back to the bad old
    days. Could a bunch of geeks with a little money from, say, DARPA, do
    better? Maybe. I think it's been tried at least once. ;-).

    RM
     
    Robert Myers, Jul 21, 2004
    #17
  18. Yes. I am painfully aware of Mashey's concerns about pushing
    complexity from one place to another.
    LOL, not at all. It was a write-up of the T9000's VCP. Bits and pieces
    of that technology have made their way into proprietary solutions.

    [SNIP]
    Of course, but it's easier to change a kernel than it is to respin
    silicon, or replace several thousand busted boards, right ? A lot of
    MPP machines seem to give the customer access to the kernel source
    which makes it easier for the desperados to fix the problems. :)

    [SNIP]
    I don't have a problem with Altix at all. I have a *concern* that
    the SSI feature is rather like putting chrome on a Porsche 917K if
    you are really interested in getting good perf out of it on an
    arbitrary problem + dataset. Data locality is still a key issue.

    I don't deny that it will make some apps easier, but in those cases
    you are wide open to vendor lock-in IMO. There are worse vendors
    than SGI of course, and I don't think they would be quite as evil
    as IBM were reputed to be.

    For those two reasons I question the long term viability of SSI
    MPP machines.

    Cheers,
    Rupert
     
    Rupert Pigott, Jul 21, 2004
    #18
  19. Tony Nelson Guest

    Let's say, instead, that one has an application that seems to require a
    256 node machine, but that need might grow in the next couple of years.
    SGI's announcement takes the risk out of choosing SGI for that
    application.

    And after a few more years, a then-current 256 node machine will be able
    to take the place of a current 1024 node monster, if the application
    doesn't grow too much and one is only worried about the machine or SGI
    wearing out.
    ____________________________________________________________________
    TonyN.
     
    Tony Nelson, Jul 22, 2004
    #19
  20. Tony Nelson wrote:

    [SNIP]
    Regardless, you are still effectively locked in if you become dependent
    on the SSI feature.

    There are also some other factors to take into account... Such as does
    your application scale to 1024 on that mythical machine ? If it does
    not, who do you turn to if you are committed to SSI ?
    Assuming clock rate cranking continues to pay off and the compilers
    improve significantly. I figure it'll come down to how much cache Intel
    can cram onto an IA-64 die, and that is a diminishing returns game.

    BTW : If you read through the immense amount of opinionated stuff I
    posted you will see that I actually give SGI some credit. The question
    I raise though is : Is SSI really that useful given the lock-in factor ?

    Cheers,
    Rupert
     
    Rupert Pigott, Jul 22, 2004
    #20
