Stacked DRG PoRep
This section describes Stacked DRG PoRep (SDR), the specific Proof-of-Replication (PoRep) used in Filecoin. In this construction, the prover encodes the original data into a replica and commits to it. An offline PoRep proves that the commitment to the replica is a valid commitment of the encoded original data.
SDR was presented by Ben Fisch at EUROCRYPT 2019.
Introduction
Background on Proof-of-Replication
Proof-of-Replication enables a prover P to convince a verifier V that P is storing a replica R, a physically independent copy of some data D, unique to P. The scheme is defined by a tuple of polynomial-time algorithms (Setup, Replicate, Prove, Verify). The assumption is that generating a second replica after Replicate has run must be difficult (if not impossible).
- Setup: On setup, the public parameters of the proving system are set.
- Replicate: On replication, one party or both (depending on the scheme; in our case the prover only) generate a unique permutation of the original data D, which we call the replica R.
- Prove: On receiving a challenge, the prover must generate a proof that it is in possession of the replica and that the replica was derived from data D. The prover can only respond to the challenge successfully if it is in possession of the replica, since it would be difficult (if not impossible) to generate a replica that can be used to generate the proof at this stage.
- Verify: On receiving the proof, the verifier checks its validity and accepts or rejects it.
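The four algorithms above can be sketched as a Go interface. All type and method names here are illustrative assumptions, not the actual filproofs API:

```go
package main

// PublicParams holds the output of Setup. Field names are illustrative.
type PublicParams struct {
	SectorSize uint64
	Layers     uint
}

// PoRep sketches the (Setup, Replicate, Prove, Verify) tuple described
// above. This is a hypothetical shape, not the Filecoin implementation.
type PoRep interface {
	// Setup fixes the public parameters of the proving system.
	Setup() PublicParams
	// Replicate derives the unique replica R from data D and a
	// prover-specific identifier, and commits to it.
	Replicate(data []byte, replicaID [32]byte) (replica []byte, commR [32]byte)
	// Prove answers a challenge using the stored replica.
	Prove(replica []byte, challenge []byte) (proof []byte)
	// Verify checks a proof against the commitment and challenge.
	Verify(commR [32]byte, challenge, proof []byte) bool
}
```

Note that only the prover runs Replicate; the verifier needs only the commitment commR to run Verify.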
Time-bounded Proof-of-Replication
Timing assumption. Time-bounded Proofs-of-Replication are PoRep constructions with a timing assumption: generating the replica (hence Replicate) takes some time t that is substantially larger than both the time it takes to produce a proof (hence time(Prove)) and the round-trip time (RTT) for sending a challenge and receiving a proof.
Distinguishing malicious provers. A malicious prover that does not have R must obtain or regenerate it before the Prove step. A verifier can therefore distinguish an honest prover from a malicious one, since the malicious prover will take too long to answer the challenge. The verifier rejects if receiving the proof takes longer than a timeout (chosen between the proving time and the replication time).
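The verifier's timing check can be sketched as follows; the choice of timeout halfway between the two bounds is an illustrative assumption, not a prescribed value:

```go
package main

import "time"

// acceptWithinTimeout sketches the timing check described above: the
// verifier rejects a proof that arrives later than a timeout chosen
// between the honest proving time and the replication time. Placing the
// timeout at the midpoint is an arbitrary illustrative choice.
func acceptWithinTimeout(elapsed, proveTime, replicateTime time.Duration) bool {
	timeout := proveTime + (replicateTime-proveTime)/2
	return elapsed <= timeout
}
```

An honest prover answers in roughly time(Prove) + RTT, well under the timeout; a malicious prover must first re-run the slow replication and overshoots it.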
Background on Stacked DRG PoRep
Stacked DRG PoRep (SDR) is the specific Proof-of-Replication construction used in Filecoin. SDR was designed by Ben Fisch and presented at EUROCRYPT 2019. At a high level, SDR ensures that the Replicate step is a slow, non-parallelizable sequential process by using a special type of graph called a Depth Robust Graph (DRG).
Encoding using DRGs. A key is generated by sequentially labeling nodes in the graph such that each label depends on the labels of its parents. The depth robustness property of these graphs ensures that the sequential labeling steps are not parallelizable. The final labels are used as a key to encode the original data.
TODO: This probably needs a more thorough rewrite.
Stacked DRGs. SDR builds on the above by stacking DRG graphs into LAYERS layers. Each layer is connected to the previous one by a Bipartite Expander Graph. The combination of DRGs and expander graphs guarantees the security property of PoRep. As before, the key produced by the final layer is used to encode the original data, yielding the replica.
Generating SDR proofs. Given the following public parameters:

- ReplicaID is a unique replica identifier (see the Filecoin Proofs spec for details).
- CommD is the Merkle tree root hash of the input data to the first layer.
- CommC is the Merkle tree root hash of the SDR column commitments.
- CommRLast is the Merkle tree root hash of the replica.
- CommR is the on-chain commitment to the replica, derived as the hash of the concatenation of CommC and CommRLast.
An SDR proof proves that some data whose commitment is CommD has been used to run a Replicate algorithm and generated some data. CommR is the on-chain commitment to both the replicated data and the intermediate stages required to prove Replicate was performed correctly.
An SDR proof consists of a set of challenged DRG nodes for each layer, a set of parent nodes for each challenged node and a Merkle tree inclusion proof for each node provided. The verifier can then verify the correct labeling of each node and that the nodes given were consistent with the prover’s commitments.
Making proofs succinct with SNARKs: The proof size in SDR is too large for blockchain usage (~100MB TODO: check this), mostly due to the large amount of Merkle tree inclusion proofs required to achieve security. We use SNARKs to generate a proof of knowledge of a correct SDR proof. In other words, we implement the SDR proof verification algorithm in an arithmetic circuit and use SNARKs to prove that it was evaluated correctly.
The SNARK circuit proves that, given Merkle roots CommD and CommR, the prover correctly derived the labels at each layer and correctly performed the final encoding.
PoRep in Filecoin
Proof-of-Replication proves that a Storage Miner is dedicating unique storage for each sector. Filecoin Storage Miners collect new clients' data in a sector, run a slow encoding process (called Seal) and generate a proof (SealProof) that the encoding was generated correctly.
In Filecoin, PoRep provides two guarantees: (1) space-hardness: Storage Miners cannot lie about the amount of space they are dedicating to Filecoin in order to gain more power in the consensus; (2) replication: Storage Miners are dedicating unique storage for each copy of their clients' data.
Glossary:

- sector: a fixed-size block of data of SECTOR_SIZE bytes which generally contains clients' data.
- unsealed sector: a concrete representation (on disk or in memory) of a sector that follows the "Storage Format" described in Client Data Processing (currently paddedfr32v1 is the required default).
- sealed sector: a concrete representation (on disk or in memory) of the unique replica generated by Seal from an unsealed sector. A sector contains one or more pieces.
- piece: a block of data of at most SECTOR_SIZE bytes which is generally a client's file or part of one.
Stacked DRG Construction
Public Parameters
The following public parameters are used in the Stacked DRG Replication and Proof Generation algorithms:
TODO: The Appendix should explain why we picked those values. Just interpolate a table of the Orient parameters and reconcile naming.
| name | type | description | value |
|---|---|---|---|
| SECTOR_SIZE | uint | Number of nodes in the DRG in bytes | 68,719,476,736 |
| LAYERS | uint | Number of Depth Robust Graph stacked layers. | 10 |
| BASE_DEGREE | uint | In-degree of each Depth Robust Graph. | 6 |
| EXPANSION_DEGREE | uint | Degree of each Bipartite Expander Graph to extend dependencies between layers. | 8 |
| GRAPH_SEED | uint | Seed used for random number generation in baseParents. | TODO |
| NODE_SIZE | uint | Size of each node in bytes. | 32B |
The following constants are computed from the public parameters:
| name | type | description | computation | value |
|---|---|---|---|---|
| PARENTS_COUNT | uint | Total number of parent nodes | EXPANSION_DEGREE + BASE_DEGREE | 14 |
| GRAPH_SIZE | uint | Number of nodes in the graph | SECTOR_SIZE / NODE_SIZE | 2,147,483,648 |
| TREE_DEPTH | uint | Height of the Merkle tree of a sector | LOG_2(GRAPH_SIZE) | 31 |
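The computation column above can be checked mechanically. The following Go sketch recomputes the derived constants from the public parameters (constant names are illustrative, not from filproofs):

```go
package main

// Public parameters from the table above.
const (
	SectorSize      = 68719476736 // bytes
	NodeSize        = 32          // bytes
	BaseDegree      = 6
	ExpansionDegree = 8

	// Derived constants, recomputing the "computation" column.
	ParentsCount = ExpansionDegree + BaseDegree // total parents per node
	GraphSize    = SectorSize / NodeSize        // number of nodes in the graph
)

// treeDepth returns LOG_2(n) for a power-of-two n: the height of the
// sector's balanced binary Merkle tree.
func treeDepth(n uint64) uint {
	d := uint(0)
	for n > 1 {
		n >>= 1
		d++
	}
	return d
}
```

Note that with BASE_DEGREE = 6 and EXPANSION_DEGREE = 8 the total parent count is 14.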
The following additional public parameters are required:

- TAPER : Float : Fraction of each layer's challenges by which to reduce the next-lowest layer's challenge count.
- TAPER_LAYERS : uint : Number of layers which should be tapered. FIXME: update for current tapering.

Data is a byte array initialized to the content of the unsealed sector; it will be mutated in place by the replication process.
Hash Functions
We describe the following hash functions:

| name | description | size of input | size of output | construction |
|---|---|---|---|---|
| KDFHash | Hash function used as a KDF to derive the key used to label a single node. | TODO | 32B | SHA256 |
| ColumnHash | Hash function used to hash the labeled leaves of each layer (see SDR Column Commitments). | TODO | 32B | JubjubPedersen |
| RepCompress | Collision-resistant hash function used for the Merkle tree. | 2 x 32B + integer height | 32B | JubjubPedersen |
| RepHash | Balanced binary Merkle tree used to generate commitments to sealed sectors, unsealed sectors, piece commitments, and intermediate parts of the Proof-of-Replication. | TODO | 32B | Uses RepCompress |
RepHash
RepHash is a vector commitment used to generate commitments to sealed sectors, unsealed sectors, piece commitments and intermediate steps of the Proof-of-Replication. Filecoin uses a balanced binary Merkle tree for RepHash. The leaves of the Merkle tree are pairs of adjacent nodes.

RepHash inputs MUST respect a valid Storage Format. (TODO: What does this mean?)
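A minimal sketch of the tree shape: the real construction compresses pairs of children with RepCompress (JubjubPedersen); SHA-256 is used below only as a stand-in so the balanced binary structure is concrete.

```go
package main

import "crypto/sha256"

// merkleRoot sketches RepHash as a balanced binary Merkle tree over
// 32-byte nodes. SHA-256 replaces RepCompress here purely for
// illustration; len(leaves) is assumed to be a power of two.
func merkleRoot(leaves [][32]byte) [32]byte {
	level := leaves
	for len(level) > 1 {
		next := make([][32]byte, len(level)/2)
		for i := range next {
			// Compress each pair of adjacent nodes into a parent.
			h := sha256.New()
			h.Write(level[2*i][:])
			h.Write(level[2*i+1][:])
			copy(next[i][:], h.Sum(nil))
		}
		level = next
	}
	return level[0]
}
```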
Stacked DRG Graph
The slow sequential encoding required is enforced by the depth robustness property of the SDR graph.
Encoding with SDR: The data from a sector (of size SECTOR_SIZE) is divided into nodes of NODE_SIZE bytes (for a total of GRAPH_SIZE nodes) and arranged in a directed acyclic graph. The structure of the graph is used to label the nodes sequentially to generate a key with which to encode the original data: in order to label a node, its parents must be labeled first (see the "Layer Labeling" section below). We repeat this process for LAYERS layers, where the input to each layer is the output of the previous one.
Generating the SDR graph: The SDR graph is divided into LAYERS layers. Each layer is a directed acyclic graph combining a Depth Robust Graph (DRG) and a Bipartite Expander graph.

TODO: this isn't quite right.
We provide an algorithm (SDR) which computes the parents of a node. At a high level, the parents of a node are computed by combining two algorithms: some parents (BASE_DEGREE of them) are computed via the BucketSample algorithm extended with a direct ordering of nodes, and the others (EXPANSION_DEGREE of them) are computed via the Chung algorithm.
SDRGraph: SDR Graph algorithm
Overview: Compute the DRG and Bipartite Expander parents using BucketSample and ChungExpander respectively.
Inputs
| name | description | Type |
|---|---|---|
| node | The node for which the parents are being computed | uint |
| layer | The layer of the SDR graph | uint |
Outputs
| name | description | Type |
|---|---|---|
| parents | The ordered parents of node node on layer layer | [PARENTS_COUNT]uint |
Algorithm
- If layer = 1:
  - Compute drgParents = BucketSample(node)
  - Set parents to be drgParents.
- If layer > 1:
  - Compute drgParents = BucketSample(node)
  - Compute expanderParents = ChungExpander(node)
  - Set parents to be the concatenation of drgParents and expanderParents.
We provide below a more succinct representation of the algorithm:
TODO: Reference to code in filproofs/algorithms.go — or restructure all this.
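The branching above can be sketched in Go; bucketSample and chungExpander are passed in as stand-ins for the real algorithms, which are specified in the following sections:

```go
package main

// sdrParents sketches the SDR parent computation: nodes in layer 1
// take only DRG parents, while nodes in later layers additionally take
// expander parents (whose indexes refer to the previous layer). The
// two sampling functions are stubs supplied by the caller.
func sdrParents(node, layer uint64,
	bucketSample func(uint64) []uint64,
	chungExpander func(uint64) []uint64) []uint64 {
	parents := bucketSample(node)
	if layer > 1 {
		parents = append(parents, chungExpander(node)...)
	}
	return parents
}
```

With BASE_DEGREE = 6 and EXPANSION_DEGREE = 8, a node in layer 1 has 6 parents and a node in any later layer has 14.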
Time-space tradeoff
Computing the parents using both BucketSample and ChungExpander for every layer can be an expensive operation; however, this cost can be avoided by caching the parents.
BucketSample: Depth Robust Graphs algorithm
This section describes how to compute the “base parents” of the SDR graph, which is the equivalent of computing the parents of a Depth Robust Graph.
The properties of DRG graphs guarantee that a sector has been encoded with a slow, non-parallelizable process. We use the BucketSample algorithm, which is based on DRSample (ABH17), described in FBGB18, and generates a directed acyclic graph of in-degree BASE_DEGREE.
BucketSample DRG graphs are random graphs that can be deterministically generated from a seed; different seeds lead with high probability to different graphs. In SDR, we use the same seed GRAPH_SEED for each layer of the SDR graph such that they are all based on the same underlying DRG graph.

The parents of any node can be computed locally without computing the entire graph. We call the parents of a node calculated in this way base parents.

BucketSample extends BucketSampleInner to include the node's immediate predecessor. Each node except the first in a DRG generated by BucketSample has the node whose index is one less than its own as a parent. This ensures that visiting nodes whose indexes are sequential results in a graph traversal in topological order.
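The immediate-predecessor wrapper can be sketched as follows; bucketSampleInner stands in for the DRSample-based inner algorithm, which is not reproduced here:

```go
package main

// bucketSample sketches the wrapper described above: it extends the
// inner DRG parent sampling with the node's immediate predecessor
// (index node-1), so labeling nodes in index order is a traversal in
// topological order. bucketSampleInner is a caller-supplied stub.
func bucketSample(node uint64, bucketSampleInner func(uint64) []uint64) []uint64 {
	if node == 0 {
		// The first node has no predecessor and no parents.
		return nil
	}
	parents := bucketSampleInner(node)
	return append(parents, node-1)
}
```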
ChungExpander: Bipartite Expander Graphs
TODO: explain why we link nodes in the current layer
Each node in layers other than the first has EXPANSION_DEGREE parents generated via the ChungExpander algorithm. Note that the indexes returned refer to labels from the previous layer. TODO: Make this all clearer with explicit notation.
TODO: link to relevant filproofs/algorithms.go
Time-Space tradeoff
Computing these parents can be expensive (especially due to the hashing required by the Feistel algorithm). A miner can avoid this computation by storing the expansion parents.
Feistel construction
We use three rounds of Feistel to generate a permutation used to compute the parents of the Bipartite Expander graph.
TODO: link to filproofs/feistel
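The structural point of the Feistel construction is that any round function yields an invertible permutation of the domain, which is what makes it suitable for generating expander parents. The sketch below uses a placeholder round function (a keyed multiply-xor); the real construction derives the round function from hashing:

```go
package main

// feistel3 sketches a three-round Feistel network over a domain split
// into two 32-bit halves. round is a placeholder, not the filproofs
// round function; the permutation property holds for any round function.
func feistel3(left, right uint32, keys [3]uint32) (uint32, uint32) {
	for _, k := range keys {
		left, right = right, left^round(right, k)
	}
	return left, right
}

// feistel3Inverse undoes feistel3 by applying the rounds in reverse,
// demonstrating that the construction is a permutation.
func feistel3Inverse(left, right uint32, keys [3]uint32) (uint32, uint32) {
	for i := len(keys) - 1; i >= 0; i-- {
		left, right = right^round(left, keys[i]), left
	}
	return left, right
}

// round is an illustrative non-linear round function.
func round(x, k uint32) uint32 {
	return x*2654435761 ^ k
}
```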
Replication
The Replication phase turns an unsealed sector into a sealed sector by first generating a key, then using the key to encode the original data.

Before running the Replicate algorithm, the prover must ensure that the sector is correctly formatted with a valid "Storage Format" (currently paddedfr32v1 is the required default).
TODO: inputs are missing
The Replication Algorithm proceeds as follows:

1. Calculate ReplicaID using Hash (SHA256):

ReplicaID is a 32-byte array constructed by hashing the concatenation of the following values:

- ProverID is a 32-byte array uniquely identifying a prover.
- SectorNumber is an unsigned 64-bit integer in little-endian encoding represented as an 8-byte array.
- RandomSeed is a 32-byte array of randomness extracted from the chain.
- CommD is the Merkle root obtained by performing RepHash on the original data represented in paddedfr32v1.

ReplicaID := Hash(ProverID || SectorNumber || RandomSeed || CommD)

2. Perform RepHash on Data to yield CommD and TreeD:

CommD, TreeD = RepHash(Data)
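Step 1 can be sketched directly from the definition above. Any domain separation or field truncation used by the real implementation is omitted here:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// replicaID sketches the derivation
// ReplicaID := Hash(ProverID || SectorNumber || RandomSeed || CommD),
// with SectorNumber encoded as 8 little-endian bytes as described above.
func replicaID(proverID [32]byte, sectorNumber uint64, randomSeed, commD [32]byte) [32]byte {
	h := sha256.New()
	h.Write(proverID[:])
	var sn [8]byte
	binary.LittleEndian.PutUint64(sn[:], sectorNumber)
	h.Write(sn[:])
	h.Write(randomSeed[:])
	h.Write(commD[:])
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}
```

Because the sector number (and prover ID) are hashed in, two sectors sealed by the same prover get distinct ReplicaIDs, which is what makes each replica unique.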
Layer Labeling
TODO: Define Graph. We need to decide if this is an object we'll explicitly define or if its properties (e.g., GRAPH_SIZE) are just part of the replication parameters and all the functions just refer to the same graphs being manipulated across the entire replication process. (At the moment I've avoided defining a Graph structure as in other specs I didn't see any object methods, just standalone functions.)
TODO: link to filproofs/algorithms
Proof Generation
Overview:
- Challenge Derivation
- Proof Generation
- Circuit Proof Generation
TODO: write a single algorithm which includes the spec below
Challenge Generation
TODO: Link to filproofs/algorithms
Calculate LAYER_CHALLENGES : [LAYERS]uint: the number of challenges per layer. (This will be passed to the SDR circuit proof.)

Derive the challenges for each layer (call DeriveChallenges()).
Witness Generation
TODO: Link to filproofs/algorithms
Layer Challenge Counts
TODO: we should just list the current parameters and show this as a calculation for correctness; implementing it should not be mandatory.
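As an illustration of what such a calculation could look like, the sketch below derives per-layer challenge counts from TAPER and TAPER_LAYERS as described in the Public Parameters section: each of the TAPER_LAYERS layers below the top has its count reduced by the fraction TAPER. The exact formula used in filproofs may differ; this only illustrates the parameters' meaning.

```go
package main

// layerChallenges sketches tapering: counts[layers-1] is the top
// layer's challenge count; walking downward, the count is reduced by
// the fraction taper for taperLayers steps, then held constant.
// This formula is an illustrative assumption, not the filproofs one.
func layerChallenges(layers, topCount uint, taper float64, taperLayers uint) []uint {
	counts := make([]uint, layers)
	count := float64(topCount)
	for i := int(layers) - 1; i >= 0; i-- {
		counts[i] = uint(count)
		// Taper only the configured number of layers below the top.
		if uint(int(layers)-1-i) < taperLayers {
			count = count * (1 - taper)
		}
	}
	return counts
}
```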