Motherboard Forums


Reply
Thread Tools Display Modes

Skybuck's Universal Code 4 (memory efficient)

 
 





















Skybuck Flying
Guest
Posts: n/a

 
      05-26-2007, 08:01 AM


Skybuck's Universal Code 4:

The first version of Skybuck's Universal Code describes the basic idea but
it's memory inefficient for large ammounts of data/information.

For each data bit a marker bit was necessary.

Also the interleaving of bits could be cumbersome for software and maybe
even hardware

Thus I present to you a somewhat more efficient universal code, which is
still variable bit-based but more efficient, though I am not sure if I like
the bitness of it... maybe I'll think up a byte version for
intel system which are octal/byte based

For now however the basic concept for version 4 is:

Length1 + Length2 + Data

This represent 3 fields stuck together in the information stream.

The first field describes the length of the second field. The second field
describes the length of the data.

The first field is like a marker for the second field, and the second field
is length for the data in binary encoding.

So I could also be described as:

Marker + Length + Data

But this could confuse people with version 1... where the term marker ment a
full marker...

So maybe describe it better as:

LengthMarker + Length + Data

or

LengthOfLength + Length + Data.


The first field consists out of zero's with a 1 terminator to describe the
length of the second field, the second field uses binary encoding, and the
data can be anything.


An example:

Suppose the following data has to be encoded/stored:

1234567890123
Hello World !


13 characters.

Length = 13 + 1 for null terminator = 14.

Bits needed to encoded 14 in binary: 0x1 + 1x2 + 1x4 + 1x8 = 4 bits.

Thus also 4 bits needed for the marker.

The information stream as stored as follows:

length marker:
0001

length:
0111

data:
xxxxxxetc.

00010111xxxxxxetc


the length field describes the data length in bits to make it bit flexible.


So now software can read the information stream as follows:

// read length marker:

LengthMarkerCount := 0;
repeat
if ReadBit( Bit ) then
begin
LengthMarkerCount := LengthMarkerCount + 1;
end else
begin
break; // error.
end;
until Bit = 1;

// allocate length field.

SetLengthInBits( Length, LengthMarkerCount );

// read length:
LengthCount := 0;
repeat
if ReadBit( Bit ) then
begin
SetBit( Length, LengthCount, Bit );
LengthCount := LengthCount + 1;
end else
begin
break; // error, or retry, inform, etc.
end;
until LengthCount = LengthMarkerCount;

// allocate data field in bits.

SetLengthInBits( Data, Length );

// read data
DataCount := 0;
repeat
if ReadBit( Bit ) then
begin
SetBit( Data, DataCount, Bit );
DataCount := DataCount + 1;
end else
begin
break; // error or retry.
end;
until DataCount = Length;


// ofcourse the example would not be complete without a write example how to
encode information so let's do that too:

For example take the string as example.

DataCountInBits := Length( 'Hello World !'+#0 ) * 8;
LengthCountInBits := BitNeededFor( DataCountInBits ); // use log functions
to determine bits needed to encode the value in binary.
LengthMarkerCountInBits := LengthCountInBits; // same

// now first write the marker:

if LengthMarkerCountInBits = 0 then exit; // strange error which should not
happen.

LengthMarkerCount := 0;

repeat
if LengthMarkerCount = LengthMarkerCountInBits then
begin
// write terminating bit.
if WriteBit( 1 ) then
begin
LengthMarkerCount := LengthMarkerCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end else
begin
// write leading bit(s)
if WriteBit( 0 ) then
begin
LengthMarkerCount := LengthMarkerCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end;

until LengthMarkerCount = LengthMarkerCountInBits;

// now write the binary length field.

if LengthCountInBits = 0 then exit; // strange error which should not
happen.

LengthCount := 0;

repeat
if LengthCount = LengthCountInBits then
begin
if GetBit( Length, LengthCount, Bit ) then
begin
// write binary length bit.
if WriteBit( Bit ) then
begin
LengthCount := LengthCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end else
begin
break; // memory error.
end;
end;

until LengthCount = LengthCountInBits;

// now write the data field:

if DataCountInBits = 0 then exit; // strange error which should not happen.

DataCount := 0;

repeat
if DataCount = DataCountInBits then
begin
if GetBit( Data, DataCount, Bit ) then
begin
// write data bit.
if WriteBit( Bit ) then
begin
DataCount := DataCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end else
begin
break; // memory error.
end;
end;

until DataCount = DataCountInBits;


There, that should give you an idea of how to do it. Ofcourse this is only
concept code... not really tested it... but should work nicely.

Some random and some crappy thoughts about using it in software development
source codes (can be skipped for reading):

(*
I am not sure if I like working with bits like this.. so as I wrote
before... I might think up a byte version of the universal code... which
might be more easy to program and handle...

not sure yet... working with bits can be straight forward to... but then
again sometimes not... depends on the source/classes etc... don't know
really, haven't tried yet...

Extra classess, and methods definetly needed... would be handy to have some
code which can simply encoded any type of data really... like a pointer and
a size or so... probably
easy to make... but also everything has to be "serialized" and
"deserialized" which is a bit weird and such... also offsets go out the
window <- can almost no longer be used
since you don't know where you were or going to etc.. or maybe it's still
possible... probably not though...

First you have to serialize the previous fields before you can add the next
fields... that could be a drawback... but then again.. it doesn't really
have to be like that if you work
with seperate fields in memory... and then finally those fields can be
encoded and decoded seperately or they could be copied if they already
serialized / deserialized... so this kinda
creates a problem... since many possible design decissions possible...

Ofcourse it would be handy if the software works directly on universal
codes... this would make the software scale better automatically without
much or even any code changes necessary.

Working with encoded fields should therefore be recommanded with special
algorithm which take adventage of it, to create more flexiblity... and
finally for network and storage purposes
the fields could simply be copied... the serialize them after each other as
a sort of simple compression...

since on current hardware the fields will have some unused bits because the
memory works with bytes... etc.
*)

Though this code is complexer I like it better since it's more memory
efficient and actually also more speed efficient... less marker bits to
process.

Even for small numbers like 32 bits or 64 bits it's not that bad: for
example:

64 bits require 1x1,1x2,1x4,1x8,1x16,1x32,1x64 = 7 bits.

So that's a total of 7 markers bits, 7 length bits, 64 data bits = 78 bits.

Another extreme example, the opposite a large file:

20 Gigabyte = 20 * 8 = 160 gigabits.

8 bits for marker, 8 bits for length field, 160 gigabits = 160 gigabits plus
a little.

Compare that with 320 gigabits for version 1.

So a major efficiency improvement and still very probably just as flexible !


Well a bit long and messy post, but I am lazy nowadays =D

But thinking about starting to use this stuff to make my software more
future-ready =DDD

BYYYYYYYYYEEEEEE,
BYEEEEEEEEEEEEEEEEEEEEEE,
Skybuck.

P.S.: My you enjoy universal codes, and maybe the Skybuck's Power be with
you =D


 
Reply With Quote
 
MooseFET
Guest
Posts: n/a

 
      05-26-2007, 02:49 PM
On May 26, 12:01 am, "Skybuck Flying" <s...@hotmail.com> wrote:
[...]
> Length1 + Length2 + Data
>
> This represent 3 fields stuck together in the information stream.
>
> The first field describes the length of the second field. The second field
> describes the length of the data.
>
> The first field is like a marker for the second field, and the second field
> is length for the data in binary encoding.



If you go back and look in the responces to the earlier posts you made
on this subject, you will find a couple far more efficient and less
limiting suggestions were made back then. They would have allowed you
to transmit 8 bit data in far less than the 14 bits your method is
suggesting.

This method you are now posting is nothing new. I was something like
it implemented in the 1980s.

BTW: You can do better if you leave the LSBs off the numbers.



 
Reply With Quote
 
Skybuck Flying
Guest
Posts: n/a

 
      05-26-2007, 03:57 PM

"MooseFET" <> wrote in message
news: ps.com...
> On May 26, 12:01 am, "Skybuck Flying" <s...@hotmail.com> wrote:
> [...]
>> Length1 + Length2 + Data
>>
>> This represent 3 fields stuck together in the information stream.
>>
>> The first field describes the length of the second field. The second
>> field
>> describes the length of the data.
>>
>> The first field is like a marker for the second field, and the second
>> field
>> is length for the data in binary encoding.

>
>
> If you go back and look in the responces to the earlier posts you made
> on this subject, you will find a couple far more efficient and less
> limiting suggestions were made back then.


I read all reactions and I can not remember seeing such a suggestion.

> They would have allowed you
> to transmit 8 bit data in far less than the 14 bits your method is
> suggesting.


It's funny to see you did not get the method correct or maybe you just made
a sloppy mistake.

For 8 bits the encoding will be:

8 bit for data.

Prefixed with:

The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits.

Prefixed with:

The marker for the length field which is: 4 bits as well.

For a total of 8 + 8 = 16 bits

> This method you are now posting is nothing new. I was something like
> it implemented in the 1980s.


Maybe, maybe not.

You made same claims without backing them up with evidence

> BTW: You can do better if you leave the LSBs off the numbers.


I do not understand what you are hinting at.

Leaving out the least significant bit leads to inprecision ?

I don't want that !

Thanks anyway, funny to see america wake up and see my post around this time


Bye,
Skybuck.


 
Reply With Quote
 
MooseFET
Guest
Posts: n/a

 
      05-26-2007, 08:40 PM
On May 26, 7:57 am, "Skybuck Flying" <s...@hotmail.com> wrote:
> "MooseFET" <kensm...@rahul.net> wrote in message

[....]
> > On May 26, 12:01 am, "Skybuck Flying" <s...@hotmail.com> wrote:
> > [...]
> >> Length1 + Length2 + Data

>
> >> This represent 3 fields stuck together in the information stream.

>
> >> The first field describes the length of the second field. The second
> >> field
> >> describes the length of the data.

>
> >> The first field is like a marker for the second field, and the second
> >> field
> >> is length for the data in binary encoding.

>
> > If you go back and look in the responces to the earlier posts you made
> > on this subject, you will find a couple far more efficient and less
> > limiting suggestions were made back then.

>
> I read all reactions and I can not remember seeing such a suggestion.
>
> > They would have allowed you
> > to transmit 8 bit data in far less than the 14 bits your method is
> > suggesting.

>
> It's funny to see you did not get the method correct or maybe you just made
> a sloppy mistake.


No, it appears I gave your credit for understanding something you
didn't.

>
> For 8 bits the encoding will be:
>
> 8 bit for data.
>
> Prefixed with:
>
> The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits.


That is in effiecient. You never need to code for a zero length so
the length field would only need 3 bits.

>
> Prefixed with:
>
> The marker for the length field which is: 4 bits as well.


Again it is not needed to allow for the zero length case so this is
only 3 bits.


> For a total of 8 + 8 = 16 bits


So it is worse than I thought.

>
> > This method you are now posting is nothing new. I was something like
> > it implemented in the 1980s.

>
> Maybe, maybe not.
>
> You made same claims without backing them up with evidence


Oh well, you can decide not to believe me if you like. Its up to you.

>
> > BTW: You can do better if you leave the LSBs off the numbers.

>
> I do not understand what you are hinting at.
>
> Leaving out the least significant bit leads to inprecision >?


There is no need to have an LSB on the length specifier. The number
can have an extra MSB in it for the case when needed to round it up to
the next length that can be specified. This reduces the length field
and the length of the length field.

It is still quite an inefficient way to do it. It is also limited in
length. There are far better options. Take a look at Huffman codes
you will see how they handle using differing numbers of bits. Any of
several systems based on those ideas would work better for you.





 
Reply With Quote
 
MooseFET
Guest
Posts: n/a

 
      05-26-2007, 09:19 PM
On May 26, 7:57 am, "Skybuck Flying" <s...@hotmail.com> wrote:
> "MooseFET" <kensm...@rahul.net> wrote in message

[....]
> > On May 26, 12:01 am, "Skybuck Flying" <s...@hotmail.com> wrote:
> > [...]
> >> Length1 + Length2 + Data

>
> >> This represent 3 fields stuck together in the information stream.

>
> >> The first field describes the length of the second field. The second
> >> field
> >> describes the length of the data.

>
> >> The first field is like a marker for the second field, and the second
> >> field
> >> is length for the data in binary encoding.

>
> > If you go back and look in the responces to the earlier posts you made
> > on this subject, you will find a couple far more efficient and less
> > limiting suggestions were made back then.

>
> I read all reactions and I can not remember seeing such a suggestion.
>
> > They would have allowed you
> > to transmit 8 bit data in far less than the 14 bits your method is
> > suggesting.

>
> It's funny to see you did not get the method correct or maybe you just made
> a sloppy mistake.


No, it appears I gave your credit for understanding something you
didn't.

>
> For 8 bits the encoding will be:
>
> 8 bit for data.
>
> Prefixed with:
>
> The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits.


That is in effiecient. You never need to code for a zero length so
the length field would only need 3 bits.

>
> Prefixed with:
>
> The marker for the length field which is: 4 bits as well.


Again it is not needed to allow for the zero length case so this is
only 3 bits.


> For a total of 8 + 8 = 16 bits


So it is worse than I thought.

>
> > This method you are now posting is nothing new. I was something like
> > it implemented in the 1980s.

>
> Maybe, maybe not.
>
> You made same claims without backing them up with evidence


Oh well, you can decide not to believe me if you like. Its up to you.

>
> > BTW: You can do better if you leave the LSBs off the numbers.

>
> I do not understand what you are hinting at.
>
> Leaving out the least significant bit leads to inprecision >?


There is no need to have an LSB on the length specifier. The number
can have an extra MSB in it for the case when needed to round it up to
the next length that can be specified. This reduces the length field
and the length of the length field.

It is still quite an inefficient way to do it. It is also limited in
length. There are far better options. Take a look at Huffman codes.



>
> I don't want that !
>
> Thanks anyway, funny to see america wake up and see my post around this time
>
>
> Bye,
> Skybuck.



 
Reply With Quote
 
Skybuck Flying
Guest
Posts: n/a

 
      05-27-2007, 03:18 AM
Ok,

Now I understand what you're saying.

Marker:

1

Length

0

Data

Missing.

Encoding an empty universal code could be handy for software which uses
classess to implement universal codes and simply uses this encoding to
describe "no data".

You mention other systems, are they universal codes, meaning they describe
their own length and scalable indefinetly ?

I know static huffman does not scale, it's static afterall

Nor will dynamic huffman probably. How will you encode the new character ?

Again you do not seem to crash the powerfull concept of a universal code =D
or you mistakenly think other systems are universal codes

Nor will I modify the universal code just to save one bit and create a
mismatch in binary system... cause that what your suggestion would mean.

0 would no longer represent zero... but 1 <- not gonna happen way to
confusing and I am lazy

Thanks anyway, it did show you have nice ideas

Bye,
Skybuck.


 
Reply With Quote
 
MooseFET
Guest
Posts: n/a

 
      05-27-2007, 04:07 AM
On May 26, 7:18 pm, "Skybuck Flying" <s...@hotmail.com> wrote:
> Ok,
>
> Now I understand what you're saying.
>
> Marker:
>
> 1
>
> Length
>
> 0
>
> Data
>
> Missing.


You've used more than 2 bits to say "no data" and 3 bits to say 1 or
zero.

Better coding based on the idea behind Huffman :

00 - Nodata
01 - True
10 - False
11 - All other cases

Now we have the three simple cases done now we can decide how
most efficient path goes from here. I'm lazy so I will only state a
possible path that stays more efficent than your suggestion:

At 2 bits, your method would take over 4 bits so we can step directly
to the 4 bit groups and always be better than your way:

1100 - 0 plus the following bit as a pair
1101 - 1 plus the following bit as a pair
1110 - Three bits follow
1111 - All other cases

This method can be extended without limit and will always be better.

The last time you posted this, someone pointed out a better pattern
than this but this one is good enough to prove the point.







 
Reply With Quote
 
Skybuck Flying
Guest
Posts: n/a

 
      05-27-2007, 06:08 AM

"MooseFET" <> wrote in message
news: oups.com...
> On May 26, 7:18 pm, "Skybuck Flying" <s...@hotmail.com> wrote:
>> Ok,
>>
>> Now I understand what you're saying.
>>
>> Marker:
>>
>> 1
>>
>> Length
>>
>> 0
>>
>> Data
>>
>> Missing.

>
> You've used more than 2 bits to say "no data" and 3 bits to say 1 or
> zero.
>
> Better coding based on the idea behind Huffman :
>
> 00 - Nodata
> 01 - True
> 10 - False
> 11 - All other cases


Seems like you need 2 bits to describe true.

While my encoding only need 1 bit, sure there is overhead, but that overhead
is not part of the original data.

The original data can have any encoding that's the beauty of Skybuck's
universal code, existing encodings can be integrated into the universal
encoding.

No other special encodings are needed to represent data.

The data can stay in it's original form.

Only the marker and the length fields are prefixed to it.

Alternatively new encodings for the data section can be used as well for
example to represent larger numbers for arbitrary precision math.

> Now we have the three simple cases done now we can decide how
> most efficient path goes from here. I'm lazy so I will only state a
> possible path that stays more efficent than your suggestion:
>
> At 2 bits, your method would take over 4 bits so we can step directly
> to the 4 bit groups and always be better than your way:
>
> 1100 - 0 plus the following bit as a pair
> 1101 - 1 plus the following bit as a pair
> 1110 - Three bits follow
> 1111 - All other cases
>
> This method can be extended without limit and will always be better.
>
> The last time you posted this, someone pointed out a better pattern
> than this but this one is good enough to prove the point.


I do not completely understand why you are making a big deal out of trying
to come up with a better encoding for the 1 bit of data case...

Also it seems you are trying to use a new encoding... that's not too
shabby... it can probably be integrated into the universal code data field
as well lol... but not really necessary.. that would just be double.

So for now I do not agree that "it's better" or even "more efficient" and I
definelty do not see how this would be "more flexible" or "self describing
length ?".

That's a good question: where is the length field in your scenerio ?

Seems like you need the universal code after all

The data of the universal code can use any other encoding... isn't that
beautifull ? =D

Also what did you describe: mangled encoding ?

Or did you describe a new encoding for one of the fields: Marker, Length,
Data ?

I like keeping those fields nicely seperated

So I definetly do not want a mangled encoding. <- bleh .

Please clearify or try again and then come again =D

Bye,
Skybuck.


 
Reply With Quote
 
Skybuck Flying
Guest
Posts: n/a

 
      05-27-2007, 06:19 AM
Also,

Your reasoning must be inheritly flawed.

Huffman compression is based on the idea that some characters or data will
occur more often then others.

Binary encoding is the most efficient "general" encoding where no occurence
statistics are known before hand.

Then there is also dynamic huffman compression which learns the occurence as
it views the data... however this requires describing new characters...

And a very important question is how do you describe/encode the new
characters ?

Fixed bits for new characters/data is ofcourse flawed, because it's not
variable bit, thus not an universal code

Some without completely understand what your little tiny encoding thingy was
about.. it's probably based on flawed reasoning in the first place

:::

Bye,
Skybuck

P.S.: I like my universal encoding to be efficient for every possible
case/scenerio =D not optimized for certain scenerio's and it definetly
should self describe the length of itself/scale/be flexible ofcourse
otherwise it wouldn't be a universal code ! DUH =D

"MooseFET" <> wrote in message
news: oups.com...
> On May 26, 7:18 pm, "Skybuck Flying" <s...@hotmail.com> wrote:
>> Ok,
>>
>> Now I understand what you're saying.
>>
>> Marker:
>>
>> 1
>>
>> Length
>>
>> 0
>>
>> Data
>>
>> Missing.

>
> You've used more than 2 bits to say "no data" and 3 bits to say 1 or
> zero.
>
> Better coding based on the idea behind Huffman :
>
> 00 - Nodata
> 01 - True
> 10 - False
> 11 - All other cases
>
> Now we have the three simple cases done now we can decide how
> most efficient path goes from here. I'm lazy so I will only state a
> possible path that stays more efficent than your suggestion:
>
> At 2 bits, your method would take over 4 bits so we can step directly
> to the 4 bit groups and always be better than your way:
>
> 1100 - 0 plus the following bit as a pair
> 1101 - 1 plus the following bit as a pair
> 1110 - Three bits follow
> 1111 - All other cases
>
> This method can be extended without limit and will always be better.
>
> The last time you posted this, someone pointed out a better pattern
> than this but this one is good enough to prove the point.



 
Reply With Quote
 
Rudy Velthuis
Guest
Posts: n/a

 
      05-27-2007, 11:09 AM
27.05.2007 07:19:03, Skybuck Flying wrote:

> Also,
>
> Your reasoning must be inheritly flawed.


Yours is. Read up on Huffman coding and be amazed.
--
Rudy Velthuis http://rvelthuis.de

"Sometimes a scream is better than a thesis."
-- Ralph Waldo Emerson (1803-1882)
 
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Skybuck's Universal Code 4,Delphi Demonstration Program Skybuck Flying Asus 8 05-28-2007 11:30 PM
Re: Skybuck's Universal Code 4 (memory efficient) Skybuck Flying Asus 0 05-28-2007 08:40 PM
Re: Skybuck's Universal Code 4 (memory efficient) Skybuck Flying Asus 1 05-28-2007 04:04 PM
Re: Skybuck's Universal Code 4 (memory efficient) Skybuck Flying Asus 0 05-28-2007 07:48 AM
P5P800 SE PCI Slots and shared interrupts John Asus 4 02-24-2007 11:38 AM


All times are GMT. The time now is 06:06 AM.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43