Skybuck's Universal Code 4:
The first version of Skybuck's Universal Code describes the basic idea, but
it's memory inefficient for large amounts of data/information: for each data
bit a marker bit was necessary.
Also, the interleaving of bits could be cumbersome for software and maybe
even hardware.
Thus I present to you a more efficient universal code. It is still
variable-length and bit-based, though I am not sure if I like the bitness of
it... maybe I'll think up a byte version for Intel systems, which are
octet/byte based.
For now however the basic concept for version 4 is:
Length1 + Length2 + Data
This represents 3 fields stuck together in the information stream.
The first field describes the length of the second field. The second field
describes the length of the data.
The first field is like a marker for the second field, and the second field
is the length of the data in binary encoding.
So it could also be described as:
Marker + Length + Data
But this could confuse people with version 1... where the term marker meant a
full marker...
So maybe it's better described as:
LengthMarker + Length + Data
or
LengthOfLength + Length + Data.
The first field consists of zeros with a 1 terminator, and its total length
(terminator included) is the length of the second field; the second field
uses binary encoding, and the data can be anything.
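The two length fields described above can be sketched in a few lines of Python (not the Pascal-style concept code used elsewhere in this post). The function name encode_header is made up for this sketch, and writing the length least significant bit first is an assumption, not something the format description states outright.

```python
def encode_header(data_bits: int) -> str:
    """Return LengthOfLength + Length as a string of '0'/'1' characters."""
    if data_bits <= 0:
        raise ValueError("this sketch needs at least one data bit")
    width = data_bits.bit_length()        # bits needed for the binary length
    marker = "0" * (width - 1) + "1"      # zeros, then the 1 terminator
    # Assumption: the length is written least significant bit first.
    length = "".join(str((data_bits >> i) & 1) for i in range(width))
    return marker + length

print(encode_header(5))   # 5 is binary 101: marker 001, length 101
```

The marker and the length field always come out the same size, so a reader only has to count bits up to the first 1 to know how many length bits follow.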
An example:
Suppose the following data has to be encoded/stored:
1234567890123
Hello World !
13 characters.
Length = 13 + 1 for null terminator = 14 characters = 14 x 8 = 112 bits.
Bits needed to encode 112 in binary:
0x1 + 0x2 + 0x4 + 0x8 + 1x16 + 1x32 + 1x64 = 7 bits.
Thus also 7 bits needed for the marker.
The information stream is stored as follows:
length marker:
0000001
length (least significant bit first):
0000111
data:
xxxxxxetc.
00000010000111xxxxxxetc
The length field describes the data length in bits to make it bit flexible.
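The "bits needed" step in the example can be checked with a small integer loop. This is only a sketch in Python; the name bits_needed_for is made up here. A shift loop is used rather than floating-point logarithms because rounding can, for very large values, push a log-based count off by one near powers of two.

```python
def bits_needed_for(value: int) -> int:
    """Number of bits needed to write `value` in binary (0 needs none here)."""
    count = 0
    while value > 0:
        value >>= 1     # drop the lowest bit
        count += 1      # and count it
    return count

print(bits_needed_for(14), bits_needed_for(112))   # 4 7
```

In Python the built-in int.bit_length() gives the same answer; the loop is spelled out because it carries over directly to other languages.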
So now software can read the information stream as follows:
// read length marker:
LengthMarkerCount := 0;
repeat
  if ReadBit( Bit ) then
  begin
    LengthMarkerCount := LengthMarkerCount + 1;
  end else
  begin
    break; // error.
  end;
until Bit = 1;

// allocate length field.
SetLengthInBits( Length, LengthMarkerCount );

// read length:
LengthCount := 0;
repeat
  if ReadBit( Bit ) then
  begin
    SetBit( Length, LengthCount, Bit );
    LengthCount := LengthCount + 1;
  end else
  begin
    break; // error, or retry, inform, etc.
  end;
until LengthCount = LengthMarkerCount;

// allocate data field in bits.
SetLengthInBits( Data, Length );

// read data
DataCount := 0;
repeat
  if ReadBit( Bit ) then
  begin
    SetBit( Data, DataCount, Bit );
    DataCount := DataCount + 1;
  end else
  begin
    break; // error or retry.
  end;
until DataCount = Length;
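For comparison, here is a compact sketch of the same three read steps in Python, working on a string of '0'/'1' characters instead of a real bit stream. The name decode_field, the string representation and the least-significant-bit-first length are assumptions of this sketch; a truncated stream is reported with an exception instead of a break.

```python
def decode_field(bits: str, pos: int = 0):
    """Decode one field starting at bits[pos]; return (data_bits, next_pos)."""
    # 1. Length marker: count bits up to and including the terminating 1.
    width = 0
    while True:
        if pos >= len(bits):
            raise ValueError("stream ended inside the length marker")
        bit = bits[pos]
        pos += 1
        width += 1
        if bit == "1":
            break
    # 2. Length: `width` bits, assumed least significant bit first.
    if pos + width > len(bits):
        raise ValueError("stream ended inside the length field")
    length = sum(1 << i for i in range(width) if bits[pos + i] == "1")
    pos += width
    # 3. Data: `length` raw bits.
    if pos + length > len(bits):
        raise ValueError("stream ended inside the data field")
    return bits[pos:pos + length], pos + length

# marker 001, length 101 (= 5), data 11010, then three unrelated bits
data, next_pos = decode_field("001" + "101" + "11010" + "111")
print(data, next_pos)   # 11010 11
```

The returned position is where the next field would start, so several fields stored back to back can be read by calling the function again with that position.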
Of course the example would not be complete without a write example showing
how to encode information, so let's do that too.
Take the string from the example above:
DataCountInBits := Length( 'Hello World !'+#0 ) * 8;
// use log functions to determine bits needed to encode the value in binary.
LengthCountInBits := BitNeededFor( DataCountInBits );
LengthMarkerCountInBits := LengthCountInBits; // same
// now first write the marker:
if LengthMarkerCountInBits = 0 then exit; // strange error which should not happen.
LengthMarkerCount := 0;
repeat
  if LengthMarkerCount = LengthMarkerCountInBits - 1 then
  begin
    // last position: write terminating bit.
    if WriteBit( 1 ) then
    begin
      LengthMarkerCount := LengthMarkerCount + 1;
    end else
    begin
      // error, or retry, inform, etc.
    end;
  end else
  begin
    // write leading bit(s)
    if WriteBit( 0 ) then
    begin
      LengthMarkerCount := LengthMarkerCount + 1;
    end else
    begin
      // error, or retry, inform, etc.
    end;
  end;
until LengthMarkerCount = LengthMarkerCountInBits;
// now write the binary length field (the value of DataCountInBits).
if LengthCountInBits = 0 then exit; // strange error which should not happen.
LengthCount := 0;
repeat
  if GetBit( DataCountInBits, LengthCount, Bit ) then
  begin
    // write binary length bit.
    if WriteBit( Bit ) then
    begin
      LengthCount := LengthCount + 1;
    end else
    begin
      // error, or retry, inform, etc.
    end;
  end else
  begin
    break; // memory error.
  end;
until LengthCount = LengthCountInBits;
// now write the data field:
if DataCountInBits = 0 then exit; // strange error which should not happen.
DataCount := 0;
repeat
  if GetBit( Data, DataCount, Bit ) then
  begin
    // write data bit.
    if WriteBit( Bit ) then
    begin
      DataCount := DataCount + 1;
    end else
    begin
      // error, or retry, inform, etc.
    end;
  end else
  begin
    break; // memory error.
  end;
until DataCount = DataCountInBits;
There, that should give you an idea of how to do it. Of course this is only
concept code... I have not really tested it... but it should work nicely.
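The concept code leans on WriteBit and ReadBit without showing them. One possible shape for such primitives, sketched in Python over a byte buffer, is given below. The class names BitWriter and BitReader are made up, and filling each byte from its most significant bit is an arbitrary choice of this sketch.

```python
class BitWriter:
    def __init__(self):
        self.buf = bytearray()
        self.count = 0                 # total bits written so far

    def write_bit(self, bit: int) -> None:
        if self.count % 8 == 0:
            self.buf.append(0)         # start a new byte
        if bit:
            # Assumption: bits fill each byte from the most significant end.
            self.buf[-1] |= 0x80 >> (self.count % 8)
        self.count += 1


class BitReader:
    def __init__(self, buf: bytes, count: int):
        self.buf, self.count, self.pos = buf, count, 0

    def read_bit(self):
        """Return the next bit, or None when the stream is exhausted."""
        if self.pos >= self.count:
            return None
        bit = (self.buf[self.pos // 8] >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit


w = BitWriter()
for ch in "00110111010":               # 11 arbitrary bits
    w.write_bit(int(ch))
print(w.buf.hex(), w.count)            # 3740 11
```

The bit count is kept next to the buffer because the last byte is usually only partly used; without it a reader could not tell padding from data.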
Some random and some crappy thoughts about using it in software development
source code (can be skipped):
(*
I am not sure if I like working with bits like this... so as I wrote
before... I might think up a byte version of the universal code... which
might be easier to program and handle...
not sure yet... working with bits can be straightforward too... but then
again sometimes not... depends on the source/classes etc... don't know
really, haven't tried yet...
Extra classes and methods are definitely needed... it would be handy to have
some code which can simply encode any type of data really... like a pointer
and a size or so... probably easy to make... but also everything has to be
"serialized" and "deserialized", which is a bit weird and such... also
offsets go out the window <- they can almost no longer be used,
since you don't know where you were or where you are going etc... or maybe
it's still possible... probably not though...
First you have to serialize the previous fields before you can add the next
fields... that could be a drawback... but then again... it doesn't really
have to be like that if you work with separate fields in memory... and then
finally those fields can be encoded and decoded separately, or they could be
copied if they are already serialized / deserialized... so this kind of
creates a problem... since many design decisions are possible...
Of course it would be handy if the software works directly on universal
codes... this would make the software scale better automatically without
much or even any code changes necessary.
Working with encoded fields should therefore be recommended, with special
algorithms which take advantage of it, to create more flexibility... and
finally for network and storage purposes the fields could simply be
copied... then serialize them after each other as a sort of simple
compression... since on current hardware the fields will have some unused
bits because the memory works with bytes... etc.
*)
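The remark about offsets can be made concrete: with variable-sized fields stored back to back, finding where field number N starts means walking over every field before it. A minimal sketch in Python, on a string of '0'/'1' characters; the name field_start and the least-significant-bit-first length are assumptions, and a truncated stream simply raises an exception here.

```python
def field_start(bits: str, index: int) -> int:
    """Return the bit position where field number `index` (0-based) begins."""
    pos = 0
    for _ in range(index):
        # Marker length, counted up to and including the terminating 1.
        width = bits.index("1", pos) - pos + 1
        pos += width
        # Length field: `width` bits, assumed least significant bit first.
        length = sum(1 << i for i in range(width) if bits[pos + i] == "1")
        pos += width + length          # skip the length field and the data
    return pos

# field 0: marker 1, length 1, data 0; field 1: marker 001, length 101, data 11010
stream = "1" + "1" + "0" + "001" + "101" + "11010"
print(field_start(stream, 1), field_start(stream, 2))   # 3 14
```

The data itself is never inspected while skipping, only the two length fields, so the walk costs a few bit reads per field rather than a pass over all the data.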
Though this code is more complex I like it better since it's more memory
efficient and actually also more speed efficient... fewer marker bits to
process.
Even for small numbers like 32 bits or 64 bits it's not that bad, for
example:
The length 64 is 1000000 in binary (1x64 and 0 for all the lower bits) = 7 bits.
So that's a total of 7 marker bits, 7 length bits, 64 data bits = 78 bits.
Another extreme example, the opposite, a large file:
20 Gigabyte = 20 * 8 = 160 gigabits.
Writing the number 160 giga (about 1.6 x 10^11) in binary takes 38 bits, so:
38 bits for marker, 38 bits for length field, 160 gigabits of data = 160
gigabits plus a little (76 bits).
Compare that with 320 gigabits for version 1.
So a major efficiency improvement and still very probably just as flexible !
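The overhead figures can be recomputed with a few lines of Python. This is only a sketch: the name overhead_bits is made up, and a gigabit is taken as 10^9 bits (with 2^30 the bit count of the length comes out the same).

```python
def overhead_bits(data_bits: int) -> int:
    """Marker bits plus length bits for a field of `data_bits` data bits."""
    return 2 * data_bits.bit_length()

for data_bits in (64, 20 * 8 * 10**9):
    v4 = data_bits + overhead_bits(data_bits)
    v1 = 2 * data_bits        # version 1: one marker bit per data bit
    print(data_bits, overhead_bits(data_bits), v4, v1)
```

For 64 data bits the overhead is 14 bits (78 in total), and for 160 gigabits it is 76 bits, because the overhead grows with the number of bits in the length and not with the length itself.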
Well, a bit of a long and messy post, but I am lazy nowadays =D
But I am thinking about starting to use this stuff to make my software more
future-ready =DDD
BYYYYYYYYYEEEEEE,
BYEEEEEEEEEEEEEEEEEEEEEE,
Skybuck.
P.S.: May you enjoy universal codes, and may Skybuck's Power be with
you =D