SABERTOOTH X58 clock drift

Discussion in 'Asus' started by DevilsPGD, Apr 6, 2011.

  1. DevilsPGD

    DevilsPGD Guest

    Anyone else out there with a SABERTOOTH X58 have trouble keeping the
    clock accurate?

    I've got two of these systems, both with i7-950 CPUs, and both seem to
    be losing several minutes an hour until NTP notices and corrects it,
    then the process starts over again.

    Updated the BIOS (long shot, I know). All applicable drivers on my
    primary system are as up to date as possible, with drivers from the
    individual manufacturers where available. On the second system, I'm
    running only ASUS reference drivers for the motherboard itself, plus
    applicable drivers for add-on hardware.

    The two systems have little else in common besides the CPU, motherboard,
    same brand and type of RAM (12GB and 6GB) and same keyboard; all other
    components are different. Both are running Windows 7 Pro SP1 x64.
     
    DevilsPGD, Apr 6, 2011
    #1

  2. DevilsPGD

    Paul Guest

    I find the best overview of timekeeping is the one provided by the
    virtual machine software writers. They do a much better job than any
    single description provided by an OS designer.

    http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf

    Network Time Protocol (NTP) can do a couple of things for you. At
    the instant you make a request to a server (sync), you get a
    single-point correction to the correct time. But, in addition,
    some clients will also track the nature of the offset, and can
    compute a "drift" factor. Say the hardware clock source (oscillator)
    that is used to pace the clock tick interrupts is 20 ppm slow.
    After enough observations have been made, NTP can "see" the steady drift.
    This allows two kinds of corrections to be made. You can make a
    correction each time you consult the server (say, every three days).
    But you can also "dribble out" corrections for the perceived drift
    at a much higher rate. Say, for example, you know you'll lose 10 seconds
    within the next three days. You could take the inverse of that, and
    every 7.2 hours (72 hours / 10) make an "unsynced" one-second
    correction. When the third day rolls around, if the timekeeping error
    is a simple drift, then your sync will be "bang on".
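
    As a rough sketch of that arithmetic (Python; the function names are
    made up for illustration, and this is not how any particular NTP
    client is implemented), the spacing of the small corrections falls
    straight out of the expected loss and the sync interval:

```python
def correction_interval_hours(expected_loss_s, sync_interval_h, step_s=1.0):
    """Hours between step_s-sized corrections that cancel a steady drift.

    expected_loss_s: seconds the clock is expected to lose between syncs
    sync_interval_h: hours between syncs against the time server
    step_s:          size of each small local correction, in seconds
    """
    if expected_loss_s <= 0 or step_s <= 0:
        raise ValueError("expected_loss_s and step_s must be positive")
    corrections_needed = expected_loss_s / step_s
    return sync_interval_h / corrections_needed


def drift_ppm(expected_loss_s, sync_interval_h):
    """The same steady drift, expressed in parts per million."""
    return expected_loss_s / (sync_interval_h * 3600.0) * 1e6


if __name__ == "__main__":
    # The example from the post: 10 seconds lost over three days (72 hours).
    print(correction_interval_hours(10, 72), "hours between corrections")
    print(round(drift_ppm(10, 72), 1), "ppm")
```

    Note that 10 seconds in three days works out to roughly 39 ppm, so
    it is a separate example from the 20 ppm figure mentioned earlier.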

    Because of that design possibility, with the right NTP client
    you can virtually eliminate initial oscillator accuracy as an issue.
    And this is why I wouldn't personally waste a lot of time belaboring
    the quality of motherboard implementations. If there is a steady
    drift, you can fix it.

    But drift isn't entirely like that. Perhaps you're 20 ppm slow, but
    there is also a slight temperature dependency (real oscillators use
    temperature compensation, via the temperature coefficient of some
    of the components in the circuit). So if the room gets
    warm, that creates a divergence from the simple model. While an NTP
    client can do the "first order dribble", it would need to take
    all the other physical dependencies into account, and curve fit
    them, to do more than that. Say, for example, the computer had an
    oscillator temperature sensor - temperature readings could be
    captured at the point of NTP time sync, and you could then build a
    temperature model.
    (You would then make much more frequent local temperature measurements,
    to compute the expected amount of required correction, and dribble
    that out too.)
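
    A minimal sketch of such a temperature model, in Python. It assumes a
    hypothetical oscillator temperature sensor and a purely linear
    relationship between temperature and drift, which real crystals
    generally don't follow over a wide range. The readings are invented
    just to exercise the code, and fit_line and predicted_drift_ppm are
    illustrative names, not part of any NTP client.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all temperature samples are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predicted_drift_ppm(temp_c, slope, intercept):
    """Drift expected at temp_c according to the fitted line."""
    return slope * temp_c + intercept


if __name__ == "__main__":
    # Invented samples: temperature (deg C) and measured drift (ppm slow),
    # as they might be logged at each NTP sync.
    temps_c = [20.0, 25.0, 30.0, 35.0]
    drift_ppm = [17.5, 20.0, 22.5, 25.0]
    slope, intercept = fit_line(temps_c, drift_ppm)
    # Between syncs, a fresh temperature reading gives the expected drift.
    print(predicted_drift_ppm(40.0, slope, intercept), "ppm expected at 40 C")
```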

    Now, in addition to the various ideas in the VMware document, on
    real computers you also have the issue of SMM. SMM allows a motherboard
    manufacturer to run BIOS code while the OS is running, without
    you knowing it or having any means to detect it directly. This
    can degrade the real-time responsiveness of a computer, mainly
    because the OS has no say in the matter - the interrupt that enters
    SMM (the SMI) cannot be masked by the OS.

    http://en.wikipedia.org/wiki/System_Management_Mode

    A possible reason for running such code at regular intervals is
    fan speed control. Or perhaps some other control function,
    such as something to do with the number of phases enabled on
    the Vcore regulator or the like.

    When SMM runs, the OS has no way of knowing it. If the SMM
    code execution time is short, no harm is done. If the SMM
    execution time exceeds the period of one clock tick interrupt,
    then a clock tick can get lost. This causes the software-maintained
    clock to run slow. And if the SMM code isn't causing the problem
    at a consistent rate, then NTP can't null out all the effects.
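
    To get a feel for the scale, here is a back-of-the-envelope model in
    Python. It assumes every lost tick costs exactly one tick period and
    uses a 15.625 ms tick (64 ticks per second, a common Windows default,
    but an assumption here rather than a measured value). The function
    names are made up for the example.

```python
def ticks_per_hour(tick_period_ms):
    """Number of clock tick interrupts expected in one hour."""
    return 3600.0 * 1000.0 / tick_period_ms


def ticks_lost_for_drift(drift_s_per_hour, tick_period_ms):
    """Ticks that must go uncounted per hour to lose drift_s_per_hour."""
    return drift_s_per_hour * 1000.0 / tick_period_ms


def lost_fraction(drift_s_per_hour):
    """Fraction of all ticks lost; independent of the tick period."""
    return drift_s_per_hour / 3600.0


if __name__ == "__main__":
    period_ms = 15.625   # assumed tick period
    drift_s = 300.0      # "several minutes an hour", taken as 5 minutes
    print(ticks_per_hour(period_ms), "ticks expected per hour")
    print(ticks_lost_for_drift(drift_s, period_ms), "ticks lost per hour")
    print(round(lost_fraction(drift_s) * 100, 2), "% of ticks lost")
```

    Under those assumptions, losing five minutes an hour means roughly
    one tick in twelve goes missing, which is a lot for background
    housekeeping code to be responsible for.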

    An indirect way to monitor this is to check "DPC service latency".
    DPC is a deferred procedure call. On your computer, when an interrupt
    routine services an interrupt, it saves the "heavyweight" part of
    the code for later. Only the most critical code runs at interrupt level,
    while code requiring longer run time runs afterwards at a lower
    priority (still in the kernel, not at user level). A DPC is
    scheduled for later, to finish servicing the interrupt. The following
    program measures how long it takes for a DPC sitting in the queue
    to finally get serviced. Long delays imply *something* is going
    on in the background.

    http://www.thesycon.de/deu/latency_check.shtml

    Now, one thing I've noticed on my system here is that there is a
    pretty good-sized spike when my video card changes in or out of 3D mode.
    But other than that, I don't see any signs I have an SMM problem.
    My RMS latency is pretty low, as seen in the DPC latency check window.

    Gigabyte released a couple of boards that needed BIOS updates to
    cure DPC latency issues. To track an issue like this, you find
    people who build audio workstations, and see what boards they're having
    problems with. The motherboard manufacturers never admit to what
    monkey business they're up to under the hood, so it's not
    possible to say much more about SMM code, and why a BIOS update
    may or may not fix it. They certainly won't admit "our SMM code
    exceeded one clock tick" in their release notes. Presumably more
    manufacturers than Gigabyte have had this problem - I'm not trying
    to pick on Gigabyte here, I just read a couple of long-running
    threads about attempts to get that kind of stuff fixed. Some people
    are very sensitive to the results from that DPC latency check tool
    above, and they'll toss motherboards that don't have good behavior.

    Another mechanism that destroys timekeeping is actual hardware
    defects. On the Nforce2 chipset, some kind of issue with the
    interrupt controller, while the chipset was slightly overclocked,
    caused really bad time problems. A log of the sync info from
    NTP showed +/- errors of large magnitude (sometimes fast, sometimes
    slow). So much so that even if NTP was cranked to the wall, the
    system clock was useless. Disabling APIC (one of the two flavors
    of interrupt controller, the other being the legacy PIC), or
    returning the FSB clock to a canonical value, would fix it (most
    of the time). Not every Nforce2 system experienced that - more info
    on that one can be found on Nforcershq.com.

    So maybe that'll give you some ideas to look into.

    HTH,
    Paul
     
    Paul, Apr 6, 2011
    #2

  3. DevilsPGD

    DevilsPGD Guest

    In message <ini61a$g2o$> someone claiming to be Paul
    I'm aware I can bandaid around it with NTP, but 5+ minutes per hour is a
    pretty serious defect, well beyond an acceptable level of clock drift
    for any purpose.

    I'm running a stock default BIOS configuration, except that I've enabled
    AHCI. I did experiment with turning off the spread-spectrum options (a
    poor implementation can apparently cause clock drift issues).

    Think it's worth playing with APIC or is that likely to be specific to
    Nforce2's implementation? At least from what I can tell the problem
    seems to be more related to Linux's APIC implementation.

    FWIW I actually don't even care about an admission; if Asus would
    release a BIOS update fixing it I'd be a happy guy. My desktop is on
    the latest, the other one is on the as-shipped BIOS, and both seem to
    be having the same issue.
     
    DevilsPGD, Apr 8, 2011
    #3
  4. DevilsPGD

    Paul Guest

    Well, I'm grasping at straws here.

    Timekeeping, while the OS is running, is based on counting clock tick
    interrupts. But alternate mechanisms exist, which do much the same thing.

    All schemes eventually trace back to a motherboard oscillator.

    Even with spread spectrum, the oscillator has a "mean" value, meaning if you
    count the pulses over an interval longer than milliseconds, the spread
    is no longer apparent. If the timekeeping function was interested in
    microsecond-level resolution, there might be a barely visible effect.
    But at the seconds level, this is all averaged out and invisible.

    Say you have a 100MHz oscillator. With "center spread" type enabled,
    the mean value is 100.0MHz when measured over a seconds-long interval.
    With "down spread", the mean value might be 99.5MHz. Even without
    calibration, I doubt down spread could account for a 5 minutes per hour
    level of error: 0.5% slow is only about 18 seconds per hour. And any
    timekeeping scheme should have initial calibration, to establish
    whether the tick rate has any relationship to a canonical value.
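
    The comparison can be worked through in a few lines of Python. This
    assumes, for the sake of argument, an uncalibrated clock that counts
    pulses from the oscillator as if it were running at its nominal
    frequency. The function names are made up for the example.

```python
def clock_error_s_per_hour(nominal_hz, actual_hz):
    """Seconds gained (+) or lost (-) per real hour by a clock that
    assumes nominal_hz while the oscillator really runs at actual_hz."""
    return 3600.0 * (actual_hz / nominal_hz - 1.0)


def frequency_for_error_hz(nominal_hz, error_s_per_hour):
    """Oscillator frequency that would produce the given hourly error."""
    return nominal_hz * (1.0 + error_s_per_hour / 3600.0)


if __name__ == "__main__":
    # Down spread example from the post: 99.5MHz mean instead of 100MHz.
    print(round(clock_error_s_per_hour(100e6, 99.5e6), 3), "s per hour")
    # Mean frequency needed to explain a loss of 5 minutes per hour.
    print(round(frequency_for_error_hz(100e6, -300.0) / 1e6, 2), "MHz")
```

    On those assumptions the oscillator would have to average under
    92MHz to explain five minutes an hour, far outside anything spread
    spectrum would do.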

    http://www.mecxtal.com/images/ssc_centerdown.gif

    An NTP client should be able to null out any "average" behavior. Say,
    for example, the motherboard oscillator was off by 1%, because the
    register in the oscillator chip didn't have a "near enough" value.
    (Some motherboards actually cheat, and the manufacturer bumps the
    clock a tiny bit, to win at benchmarking done in reviews.)
    By means of calibration against the RTC, or by the use of NTP,
    any deviation from the correct average value could be handled.
    By dribbling out corrections at regular intervals, you get correct
    clock time to the nearest second.

    It's a matter of finding out what is causing interrupts to get
    lost, to not be counted, or to show up too often, and so on. The
    DPC latency test is intended to show how much time any SMM routine
    might potentially be stealing. But other than that,
    it would be pretty hard to say anything definitive about the
    interrupt arrival rate itself. I've tried in the past to access
    any performance counters that might be available on a
    per-interrupt basis, but I was not successful at doing that.
    With the systems we used to build years ago, we had extensive
    interrupt monitoring capabilities, mainly because we fouled
    up interrupts so often :) About all I can get from a modern
    system is a total count of interrupts from all sources.
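
    One crude check that needs no special counters is to compare the
    wall clock against a second, independent time source over the same
    interval. A minimal Python sketch follows, using time.time() for
    the wall clock and time.perf_counter() as the reference. Which
    hardware backs each of these is up to the OS, so treat the result
    as a hint only; an NTP correction landing inside the interval will
    also show up as "drift". The function names are made up for the
    example.

```python
import time


def relative_drift_ppm(wall_start, wall_end, ref_start, ref_end):
    """Drift of the wall clock relative to the reference clock, in ppm.

    Negative means the wall clock ran slow over the interval.
    """
    wall_elapsed = wall_end - wall_start
    ref_elapsed = ref_end - ref_start
    if ref_elapsed <= 0:
        raise ValueError("reference interval must be positive")
    return (wall_elapsed - ref_elapsed) / ref_elapsed * 1e6


def sample_drift_ppm(duration_s=60.0):
    """Sleep for duration_s and compare how far the two clocks advanced."""
    wall_start, ref_start = time.time(), time.perf_counter()
    time.sleep(duration_s)
    wall_end, ref_end = time.time(), time.perf_counter()
    return relative_drift_ppm(wall_start, wall_end, ref_start, ref_end)


if __name__ == "__main__":
    # Longer intervals give a steadier number; 60 seconds is a starting point.
    print(f"{sample_drift_ppm(60.0):+.1f} ppm")
```

    For scale, a clock losing 5 minutes per hour would read about
    -83,000 ppm here, where a healthy system should be within a few
    tens of ppm.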

    If the problem would "stand still", NTP could fix it. It's situations
    where the problem occurs sporadically that prevent NTP from fixing
    it. In some cases, it's the use of a single application that upsets
    clock time. Since the clock tick interrupt, by design, has a very
    high priority, normally userland shouldn't be able to do anything
    like that (upset the clock tick).

    On a Linux system, the kernel boot line has some options so you
    can change the source used for timekeeping. For example, I've
    tried in the past to get Linux OSes running better within VPC2007,
    and my notes mention this as one of my test cases. I switched to PIT,
    because the virtual environment didn't happen to have HPET. The
    problem I had was audio playing at the wrong sampling rate
    ("chipmunks" problem). I found the best solution was to build
    up a Gentoo system and eliminate PulseAudio, after which the problem
    was mostly gone. Sound still didn't work quite as well as
    a Windows OS in the same virtual environment, but it was getting
    a lot closer. The VGA mode selected here is used to reduce the
    time it takes the OS to shut down at the end of a VPC2007 session.
    There is no equivalent effect on real hardware, so if I was booting
    this OS natively, I wouldn't need the VGA option.

    GRUB_CMDLINE_LINUX_DEFAULT="vga=786 noacpi clocksource=pit"
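
    To see which source a running Linux kernel actually ended up with,
    and which ones it would accept, the kernel exposes two small text
    files under sysfs. A minimal Python sketch, assuming the usual sysfs
    location (the base directory is a parameter in case a system keeps
    it elsewhere, and read_clocksources is a made-up name):

```python
from pathlib import Path

# Usual sysfs location of the kernel's clocksource information.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_DIR):
    """Return (current, available) clocksource names from sysfs.

    current is a single name such as "tsc", "hpet" or "pit";
    available is the list of names the kernel is willing to use.
    """
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    try:
        current, available = read_clocksources()
    except OSError as exc:
        # Not Linux, or sysfs is not mounted where expected.
        print(f"clocksource info not available: {exc}")
    else:
        print("current:  ", current)
        print("available:", " ".join(available))
```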

    Paul
     
    Paul, Apr 8, 2011
    #4