Data Structures
-
State
reliable
-
Theory Audit
n/a
-
Edit this section
-
section-appendix.data_structures
-
State
reliable
-
Theory Audit
n/a
- Edit this section
-
section-appendix.data_structures
RLE+ Bitset Encoding
-
State
reliable
-
Theory Audit
n/a
-
Edit this section
-
section-appendix.data_structures.rle-bitset-encoding
-
State
reliable
-
Theory Audit
n/a
- Edit this section
-
section-appendix.data_structures.rle-bitset-encoding
RLE+ is a lossless compression format based on RLE. Its primary goal is to reduce the size in the case of many individual bits, where RLE breaks down quickly, while keeping the same level of compression for large sets of contiugous bits.
In tests it has shown to be more compact than RLE itself, as well as Concise and Roaring.
Format
-
State
reliable
-
Theory Audit
n/a
-
Edit this section
-
section-appendix.data_structures.format
-
State
reliable
-
Theory Audit
n/a
- Edit this section
-
section-appendix.data_structures.format
The format consists of a header, followed by a series of blocks, of which there are three different types.
The format can be expressed as the following BNF grammar.
<encoding> ::= <header> <blocks>
<header> ::= <version> <bit>
<version> ::= "00"
<blocks> ::= <block> <blocks> | ""
<block> ::= <block_single> | <block_short> | <block_long>
<block_single> ::= "1"
<block_short> ::= "01" <bit> <bit> <bit> <bit>
<block_long> ::= "00" <unsigned_varint>
<bit> ::= "0" | "1"
An <unsigned_varint>
is defined as specified
here.
Blocks
-
State
reliable
-
Theory Audit
n/a
-
Edit this section
-
section-appendix.data_structures.blocks
-
State
reliable
-
Theory Audit
n/a
- Edit this section
-
section-appendix.data_structures.blocks
The blocks represent how many bits, of the current bit type there are. As 0
and 1
alternate in a bit vector
the inital bit, which is stored in the header, is enough to determine if a length is currently referencing
a set of 0
s, or 1
s.
Block Single
-
State
reliable
-
Theory Audit
n/a
-
Edit this section
-
section-appendix.data_structures.block-single
-
State
reliable
-
Theory Audit
n/a
- Edit this section
-
section-appendix.data_structures.block-single
If the running length of the current bit is only 1
, it is encoded as a single set bit.
Block Short
-
State
reliable
-
Theory Audit
n/a
-
Edit this section
-
section-appendix.data_structures.block-short
-
State
reliable
-
Theory Audit
n/a
- Edit this section
-
section-appendix.data_structures.block-short
If the running length is less than 16
, it can be encoded into up to four bits, which a short block
represents. The length is encoded into a 4 bits, and prefixed with 01
, to indicate a short block.
Block Long
-
State
reliable
-
Theory Audit
n/a
-
Edit this section
-
section-appendix.data_structures.block-long
-
State
reliable
-
Theory Audit
n/a
- Edit this section
-
section-appendix.data_structures.block-long
If the running length is 16
or larger, it is encoded into a varint, and then prefixed with 00
to indicate
a long block.
Note: The encoding is unique, so no matter which algorithm for encoding is used, it should produce the same encoding, given the same input.
Bit Numbering
-
State
reliable
-
Theory Audit
n/a
-
Edit this section
-
section-appendix.data_structures.bit-numbering
-
State
reliable
-
Theory Audit
n/a
- Edit this section
-
section-appendix.data_structures.bit-numbering
For Filecoin, byte arrays representing RLE+ bitstreams are encoded using LSB 0 bit numbering.
HAMT
-
State
reliable
-
Theory Audit
n/a
-
Edit this section
-
section-appendix.data_structures.hamt
-
State
reliable
-
Theory Audit
n/a
- Edit this section
-
section-appendix.data_structures.hamt
See the draft IPLD hash map spec for details on implementing the HAMT used for the global state tree map and throughout the actor code.
Other Considerations
-
State
reliable
-
Theory Audit
n/a
-
Edit this section
-
section-appendix.data_structures.other-considerations
-
State
reliable
-
Theory Audit
n/a
- Edit this section
-
section-appendix.data_structures.other-considerations
- The maximum size of an Object should be 1MB (2^20 bytes). Objects larger than this are invalid.