1. This forum section is a read-only archive which contains old newsgroup posts. If you wish to post a query, please do so in one of our main forum sections (here). This way you will get a faster, better response from the members on Motherboard Point.

A possible solution for using BWT (especially wheeler transform) in video decoders/playback.

Discussion in 'Nvidia' started by Skybuck Flying, Aug 31, 2009.

  1. Hello,

    I just had an idea how BWT, especially the wheeler transform algorithm could
    possibly be successfully used in video decoders.

    I haven't thought the idea through but I want to publish it just in case
    it's a good idea:

    The idea is as follows:

    1. Usually video's have multiple "key" frames followed by some
    "intermediate" frames.

    2. BWT could be started in parallel on multiple key frames.

    However this would require a little change in for example Microsoft's Video
    for Windows... which simply always start at just a single frame.

    Video for Windows would need to be turned into a "parallel" version where
    decoders would be fed multiple key frames... or let's call them "key

    So that a form of "video key-stream pipelining" occurs.

    This way multiple processors/cores/threads could be working on decoding
    different sections of the video.

    The drawback is probably:

    3. Requiring more bandwidth to store the results until actually needed.

    Why is this a solution you might be wondering and what was the problem in
    the first place ?

    I will try to explain...

    The problem with wheeler-transform is that it's a pointer-chasing problem
    for the decoder.
    The processor needs to fetch a pointer/index, wait for it... then fetch the
    next pointer/index, and the next and then next etc... this slows down the
    processor a lot... it looks like so:

    The cpu is not fast enough to decode this in time... the gpu can't even do
    this kind of decoding (or it's very hard/slow) so that's also not an option.

    However what if the cpu could start working way in advance before the frame
    is actually needed ?! this would give the cpu more time to work on it...

    only problem is with intermediate frames... they also need a lot of time...
    maybe limit
    bwt to key frames only or so ?!? ;) :)

    Well anyway... by letting multiple cpu's work on more key frames the decode
    speed should improve somewhat...

    Let's think of it as interleaving video key-stream decoding.


    CPU1: Frame 1 to 8
    CPU2: Frame 9 to 16
    CPU3: Frame 17 to 24
    CPU4: Frame 25 to 32

    This way 4 cpu's are being used instead of just one.

    Now the problem is ofcourse... can cpu1 decode 8 frames in the needed time

    The answer is ofcourse: no funny enough... because if it could then only one
    cpu would
    be necessary.

    However look what happens at cpu2... by the time cpu1 is done... cpu2 will
    be done, cpu3 will
    be done... cpu4 will be done...

    So those frames will all be done... So my conclusion so far is:

    Only the startup takes a while... but after that it should run smooth.

    Skybuck Flying, Aug 31, 2009
    1. Advertisements

  2. I also just had a possible "hack" idea for Video for Windows decoders to
    turn them into parallel decoders.

    The idea is too simply take the input from windows and immediatly return
    with a black screen... thereby fooling windows into believing that the input
    was actually decoded into output.

    But instead the decoder simply uses this trick to quickly fetch multiple
    frames from windows...

    Until the decoder has for example 30 frames... then it can start working on
    the frames in parallel...

    Then when windows ask for frame 31 to 60... in reality frames 1 to 30 are

    This would screw the bars in the "video controls" a little bit... but that
    could be acceptable LOL ;) :)

    Also another drawback is that the last 30 frames might not be played ?!?
    Hmm... :)

    Could also be acceptable for now LOL :)

    Skybuck Flying, Aug 31, 2009
    1. Advertisements

  3. However one last problem is with the audio... it would be off by 30 frames
    that could be a problem... This would require a audio decoder which would do
    the same to compensate for it... ouch ! ;)

    Skybuck Flying, Aug 31, 2009
  4. However I am not sure how windows would react to an immediate black screen
    being returned...

    Maybe windows would simply wait 1/30 of a second before requesting the next

    In that case this trick would ofcourse not work...

    (Maybe windows buffers some frames but I doubt it ;))

    Video for Windows is old... maybe other frameworks offer more possibilities
    for parallel/multi-threaded video decoders ?

    Skybuck Flying, Aug 31, 2009
  5. Skybuck Flying

    stefanbanev Guest

    Even BWT is one of the fastest context modeler it is the most trivial
    one and probably one of the most inefficient in term of its entropy
    output. Approaches similar to CALIC generalized for 3D case would be
    more efficient in this regard and frankly its context locality makes
    mach more sense for 2D/3D scalar data than unbounded BWT context which
    really works great for textual data where an exact contexts match is
    common but it is not the case for 2D/3D image data. However if you may
    come up with BWT idea for "fuzzy" contexts it may work very well for
    image compression and if you may generalize it for context-symmetric
    2D/3D cases it would be a monumental achievement ;o) I'm not aware
    about such development in public domain though.

    stefanbanev, Sep 2, 2009
  6. Another idea is to move as much processing as possible to the GPU... and
    leave the Wheeler running on the CPU... combining this with other
    compression algorithms might just give enough playback speed to be

    Skybuck Flying, Sep 3, 2009
  7. I never heard of CALIC before.

    I did find a document about it:


    And I also found some object oriented calic source code which seems quite


    (That posting contains the link to source code which is here:)


    (The dude in the posting says that there are even better compression
    algorithms out there... let me know if you find anything... since I didn't
    find much on those links... source code is required as well as prove ! ;))

    The document about CALIC only shows grey images why is that ? Does CALIC
    only work well for grey images ? I would assume it would work well for
    colored images as well ? ;)

    Also what would it's speed be for video... that's important question as well

    I will take a look at it some time and compare it to bwt just to see how
    calic performs ;)

    Skybuck Flying, Sep 3, 2009
  8. Skybuck Flying

    stefanbanev Guest

    Well, the paradigm better/worse is very data-dependent for data
    compression since universal compressor belongs to domain of
    "perpetuum mobile". For some kind of data some algorithms work better
    for another others are better.
    Look, I referenced to CALIC not as to some compression panacea; it
    just utilizes very short pixel-contexts what makes more sense for
    image compression then BWT. In fact in published form, in my opinion,
    it is a weak algorithm but in the context of discussion it was a good
    reference to make the point. Any published compression algorithm is
    just an illustration of some ideas but really advanced implementations
    are very-very far away from easy to digest published version. So, it
    is a waste of time trying to adapt something published as it is to get
    an outstanding result unless the goal is just to generate other

    stefanbanev, Sep 3, 2009
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.