Where best to discuss support/bugs in 10.3 for XServe

Discussion in 'Apple' started by Frank Durda IV, Jan 4, 2004.

  1. I am new to the comp.sys.mac* groups (sorry for the cross post - please set
    any followups to the more appropriate place). It is not clear from the group
    descriptions which would be most appropriate to what I want to discuss,
    but one of these seemed most likely.

    My company has a number of XServes (normal and cluster/headless versions),
    each with a single fully-loaded XRAID attached. We have had some of these
    for 18 months or so, and apart from several really irritating (and in some
    cases, limiting) bugs, we did okay using these first-generation server units
    in a storage server configuration, even though we have to leave a big chunk
    of the disk space on each of the 2TB XRAID units unallocated at all times
    to avoid triggering a repeatable and fatal bug that we discovered.

    However, when we upgraded to a 10.3 Beta, not only were the old serious bugs
    that we had reported a year earlier not fixed, but we picked up some new
    serious bugs, including a weekly panic which we were eventually able to work
    around, but don't yet have a proper fix for. In fact, to be able to report
    that crash bug to Apple support, we had to upgrade to the official 10.3
    release and show it still failed. It did, and in going from the 10.3 beta
    to the official 10.3 release, our crash rate abruptly increased from one
    per week (which we had found a way to avoid) to as many as ten per day,
    which we can avoid only partly and not predictably.

    These systems have no third-party software installed (just what comes on
    the OSX release CD), and only Apple hardware, all configured in ways shown
    in Apple documents. Apart from the intermittent administrator, no one logs
    into these systems.

    One of today's crashes looks like this in the panic.log (BTW, 10.3.2 and
    10.3-RELEASE crash equally well and all the recent crashes are virtually
    identical):

    Sat Jan 3 07:34:07 2004

    Unresolved kernel trap(cpu 0): 0x300 - Data access DAR=0x0000000000000058 PC=0x00000000001A1348
    Latest crash info for cpu 0:
    Exception state (sv=0x2D173A00)
    PC=0x001A1348; MSR=0x00009030; DAR=0x00000058; DSISR=0x40000000; LR=0x001A12DC; R1=0x1C14BA10; XCP=0x0000000C (0x300 - Data access)
    Backtrace:
    0x001A12DC 0x0018EACC 0x001A2DC0 0x001A24B0 0x0023DAA4 0x00093C00 0x00740065
    Proceeding back via exception chain:
    Exception state (sv=0x2D173A00)
    previously dumped as "Latest" state. skipping...
    Exception state (sv=0x2C504500)
    PC=0x9005F4CC; MSR=0x0000F030; DAR=0x00003100; DSISR=0x0A000000; LR=0x000023A4; R1=0xBFFFFCC0; XCP=0x00000030 (0xC00 - System call)

    Kernel version:
    Darwin Kernel Version 7.0.0:
    Wed Sep 24 15:48:39 PDT 2003; root:xnu/xnu-517.obj~1/RELEASE_PPC


    panic(cpu 0): 0x300 - Data access
    Latest stack backtrace for cpu 0:
    Backtrace:
    0x000833B8 0x0008389C 0x0001ED8C 0x00090800 0x00093A6C
    Proceeding back via exception chain:
    Exception state (sv=0x2D173A00)
    PC=0x001A1348; MSR=0x00009030; DAR=0x00000058; DSISR=0x40000000; LR=0x001A12DC; R1=0x1C14BA10; XCP=0x0000000C (0x300 - Data access)
    Backtrace:
    0x001A12DC 0x0018EACC 0x001A2DC0 0x001A24B0 0x0023DAA4 0x00093C00 0x00740065
    Exception state (sv=0x2C504500)
    PC=0x9005F4CC; MSR=0x0000F030; DAR=0x00003100; DSISR=0x0A000000; LR=0x000023A4; R1=0xBFFFFCC0; XCP=0x00000030 (0xC00 - System call)

    Kernel version:
    Darwin Kernel Version 7.0.0:
    Wed Sep 24 15:48:39 PDT 2003; root:xnu/xnu-517.obj~1/RELEASE_PPC
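
    Since the recent crashes are described as virtually identical, one way to
    check that across machines is to reduce each panic.log entry to a small
    signature. What follows is only a minimal sketch, assuming entries in
    exactly the text format shown above; the function names are hypothetical,
    and the choice of PC, DAR and LR as the signature is an assumption, not
    anything Apple documents.

```python
import re

# Matches the register fields in dump lines such as
# "PC=0x001A1348; MSR=0x00009030; DAR=0x00000058; ..."
FIELD_RE = re.compile(r"\b(PC|MSR|DAR|DSISR|LR|R1|XCP)=0x([0-9A-Fa-f]+)")


def exception_states(panic_text):
    """Return one dict of register values per full register-dump line."""
    states = []
    for line in panic_text.splitlines():
        fields = {name: int(value, 16) for name, value in FIELD_RE.findall(line)}
        # Keep only full dumps; the one-line trap summary has no LR or XCP.
        if {"PC", "DAR", "LR", "XCP"} <= fields.keys():
            states.append(fields)
    return states


def crash_signature(panic_text):
    """Hypothetical helper: (PC, DAR, LR) of the first register dump, or None."""
    for state in exception_states(panic_text):
        return (state["PC"], state["DAR"], state["LR"])
    return None
```

    Two entries that reduce to the same tuple faulted at the same instruction
    on the same address, which is a reasonable (though not conclusive) hint
    that they are the same bug.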



    *********

    We are also a tad concerned about wacky system messages that every one of
    our systems produces on each boot (even after processors, logic board and
    RAM were replaced on one of them):

    ApplePMU::pMU FORCED SHUTDOWN, CAUSE = -93

    and
    localhost /usr/libexec/panicdump: Error (-1) setting variable - 'aapl,panic-info'
    localhost SystemStarter: Loading Shared IP extension
    localhost SystemStarter: crash reporter (208) did not complete successfully.

    and a variety of others.


    We don't seem to be getting any action out of Apple on fixing any of our big
    issues despite having support contracts, although they did recently suggest
    reporting our bugs to various open source mailing lists.
    (If that's what they really want a customer that is paying for maintenance
    to have to go and do...)

    Consequently, I'm curious as to which newsgroups these and the other OSX
    flaws we've uncovered should be discussed in further.

    Thanks.


    Frank Durda IV - only this address works:|"I used to take unwanted Microsoft
    <LOSEuhclem.jan04%nemesis.lonestar.org>  | Compact Discs and use them as
    You must remove the "LOSE" to mail me.   | drink coasters, but their CDs, like
    http://nemesis.lonestar.org              | their software, have holes in them."
    Copr. 2004, ask before reprinting.
     
    Frank Durda IV, Jan 4, 2004
    #1

  2. Jim Polaski Guest

    I think this is going to be a question best answered by one of the folks
    here who is a Mac Developer.
     
    Jim Polaski, Jan 4, 2004
    #2

  3. ZnU Guest

    You'd probably have better luck on the MacOSX-admin mailing list:

    http://www.omnigroup.com/developer/mailinglists/macosx-admin/

    [snip]

    --
    "Our country puts $1 billion a year up to help feed the hungry. And we're by far
    the most generous nation in the world when it comes to that, and I'm proud to
    report that. This isn't a contest of who's the most generous. I'm just telling
    you as an aside. We're generous. We shouldn't be bragging about it. But we are.
    We're very generous."
    -- George W. Bush in Washington, D.C., July 16, 2003
     
    ZnU, Jan 4, 2004
    #3
  4. Peter KERR Guest

    There are lists & discussion boards also at

    http://lists.apple.com/

    http://discussions.info.apple.com/

    This looks like a catch-all trap that is reported on desktop systems
    with faulty RAM, i.e. physically bad chips, or chips not meeting Apple's
    timing specs. But if you have factory or approved vendor fitted RAM...

    As for the boot messages, there are a swag of these that come and go with
    every incremental upgrade. Normally they don't do anything outright bad,
    but most are caused by a mismatch between what a StartupItem script asks
    for and what the current binary can actually do. I gave up trying to clean
    them out because the next upgrade brought them back and/or new ones...
     
    Peter KERR, Jan 4, 2004
    #4
  5. Peter KERR Guest

    As a curious aside, how many of those do you have stacked in one rack?
    & what temperatures are you seeing in them?

    I have a single X-Serve which I'm trying to build a hush kit for, so we
    can bring it "indoors". Without going into details, I have the internal
    ambient at 43C and cpu at 49C. Server Monitor redlines internal ambient
    at 55C, but has no redline for cpu.
     
    Peter KERR, Jan 4, 2004
    #5
  6. : As a curious aside, how many of those do you have stacked in one rack?
    : & what temperatures are you seeing in them?

    Well, the XServes alternate between XRAID units, so the 1U XServes are
    essentially 4U apart, but there is the XRAID with less depth between them.
    Our units are mounted in both 4-point rack enclosures and open-frame racks.

    Our earliest XServes would not mount in any of the 4-point racks we owned
    because the mounting hardware didn't align, so those units just rest on top
    of the XRAIDs they control. The mounting hardware that came with the XServe
    cluster/headless model would mount correctly in these 4-point racks, but the
    XServe lids still collapse when the insides are pulled out of the unit,
    requiring two and sometimes three people to re-insert the XServe into the
    lid pan. (That's one to hold the XServe, one to lift the collapsed XServe
    lid pan at the rear of the rack, and sometimes one to keep that silly power
    cord lock clip from catching on the machine immediately below during XServe
    re-insertion. The secret appears to be to connect the power cord to the
    XServe prior to insertion, attach the locking clip to the power cord, and
    then snake the cord to the back with the guy who already has to be in the
    back to hold the lid up. Then you only need two people.)

    At this exact moment, there aren't more than two XServes in close proximity
    to one another in our shop, as they are scattered around the facility.

    Since we don't do GUI, we don't see those GUI-only thermal displays very
    often and so I don't recall what range of temperature values I saw last.
    The rooms have a controlled ambient of about 69F 24x7, but hot spots
    immediately behind some of the 4-point racks get up into the 80s.
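
    The figures quoted earlier in the thread are in Celsius (43C internal
    ambient, 49C cpu), so a quick conversion of the room numbers may help when
    reading the two side by side. This is just a sketch of the arithmetic; the
    function name is illustrative, and 80 and 89 are assumed bounds standing in
    for "the 80s".

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


# Room figures from the paragraph above; 80 and 89 are assumed bounds for "the 80s".
readings = [("room ambient", 69), ("hot spot, low", 80), ("hot spot, high", 89)]
for label, deg_f in readings:
    print(f"{label}: {deg_f}F = {fahrenheit_to_celsius(deg_f):.1f}C")
```

    Note that these are room air temperatures, so they are not directly
    comparable with the internal readings Server Monitor reports.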

    Based on what I've seen, the 4-drive XServe could never seriously be used
    with multiple units touching one another in a normal enclosed rack. The
    air flow through the unit is just too feeble despite the "charcoal starter"
    grade fans inside. I think it really relies on the entire case acting as
    a heat sink, something impossible to do if units are tightly stacked, as
    only the top and bottom units will see any significant heat sinking.
    We've also had trouble with the little tiny vent holes in the front becoming
    clogged with dust, but don't know how much that affects processor temperature.

    The headless/cluster unit has far more reasonable looking airflow, but can
    be a pain to use for other reasons.

    I'll mention that most of our 1U PCs also suffer from heat problems when
    mounted against one another, even in open-frame racks, so we put some air
    gap between each system. This is true of slower Celeron/Duron 1U PCs too.
    The 1U PCs also tend to go through processor fans at an astounding rate.
    One year is about all the life we see from them before they lose enough
    revs to start causing problems or just die.


    : I have a single X-Serve which I'm trying to build a hush kit for, so we
    : can bring it "indoors".

    Yeah, XServes are noisy. Just about everyone who has visited our quieter
    data center has noticed the loudest machine in that part of the room,
    competing well in the decibel department against an Ascend TNT, which has
    something like ten fans in its case. I suspect that if Apple had made more
    and larger vent holes in the front of the XServe case, it would be easier
    to get fresh air inside and the fans inside could have been made
    smaller/quieter.


    Frank Durda IV - only this address works:|"Microsoft says no-pay open-source
    <LOSEuhclem.jan04%nemesis.lonestar.org>  | and public domain software are evil.
    You must remove the "LOSE" to mail me.   | Then they should pull all Internet
    http://nemesis.lonestar.org              | protocols, BASIC, and web/HTML
    Copr. 2004, ask before reprinting.       | support from their products at once."
     
    Frank Durda IV, Jan 5, 2004
    #6
