Stacked DRG PoRep

Stacked DRG PoRep

This section describes Stacked DRG PoRep (SDR), the specific Proof-of-Replication (PoRep) used in Filecoin. In this construction, the prover encodes the original data into a replica and commits to it. An offline PoRep proves that the commitment to the replica is a valid commitment of the encoded original data.

SDR has been presented by Ben Fisch at EUROCRYPT19.

Introduction

Background on Proof-of-Replication

Proof-of-Replication enables a prover P to convince a verifier V that P is storing a replica R, a physically independent copy of some data D, unique to P. The scheme is defined by a tuple of polynomial time algorithms (Setup, Replication, Prove, Verify). The assumption is that generation of a replica after Replicate must be difficult (if not impossible) to generate.

  • Setup: On setup, the public parameters of the proving systems are set.

  • Replicate: On replication, either a party or both (depending on the scheme, in our case the prover only!) generate a unique permutation of the original data D, which we call replica R.

  • Prove: On receiving a challenge, the prover must generate a proof that it is in possession of the replica and that it was derived from data D. The prover must only be able to respond to the challenge successfully if it is in possession of the replica, since would be difficult (if not impossible) to generate a replica that can be used to generate the proof at this stage

  • Verify: On receiving the proof, the verifier checks the validity of the proof and accepts or rejects the proof.

Time-bounded Proof-of-Replication

Timing assumption. Time-bounded Proof-of-Replication are constructions of PoRep with timing assumptions. The assumption is that generation of the replica (hence the Replication) takes some time t that is substantially larger than the time it takes to produce a proof (hence time(Prove)) and the round-trip time (RTT) for sending a challenge and receiving a proof.

Distinguishing Malicious provers. A malicious prover that does not have R, must obtain it (or generate it), before the Prove step. A verifier can distinguish an honest prover from a malicious prover, since the malicious one will take too long to answer the challenge. A verifier will reject if receiving the proof from the prover takes longer than a timeout (bounded between proving time and replication time).

Background on Stacked DRG PoRep

Stacked DRG PoRep (SDR) is a specific Proof-of-Replication construction that we use in Filecoin. SDR has been designed by Ben Fisch at EUROCRYPT19. At a high level, SDR ensures that the Replicate step is a slow non-parallelizable sequential process by using a special type of graph called Depth Robust Graphs (DRG).

Encoding using DRGs. A key is generated by sequentially labeling nodes in the graph such that each label depends on the labels of its parents. The depth robustness property of these graphs ensure that the sequential labeling steps are not parallelizable. The final labels are used as a key to encode the original data.

TODO: This probably needs a more thorough rewrite.

Stacked DRGs. SDR builds on the above by stacking DRG graphs into LAYERS layers. Each layer is connected to the previous by a Bipartite Expander Graph. The combination of DRGs and expander graphs guarantee the security property of PoRep. As before, the key produced by the final layer is used to encode the original data, yielding the replica.

Generating SDR proofs. Given the following public parameters:

  • ReplicaId is a unique replica identifier (see the Filecoin Proofs spec for details).
  • CommD is the Merkle tree root hash of the input data to the first layer.
  • CommC is the Merkle tree root hash of the SDR column commitments.
  • CommRLast is the Merkle tree root hash of the replica.
  • CommR is the on-chain commitment to the replica, dervied as the hash of the concatenation of CommC and CommRLast.

An SDR proof proves that some data whose committment is CommD has been used to run a Replicate algorithm and generated some data. CommR is the on-chain commitment to both the replicated data and to intermediate stages required to prove Replicate was performed correctly.

An SDR proof consists of a set of challenged DRG nodes for each layer, a set of parent nodes for each challenged node and a Merkle tree inclusion proof for each node provided. The verifier can then verify the correct labeling of each node and that the nodes given were consistent with the prover’s commitments.

Making proofs succinct with SNARKs: The proof size in SDR is too large for blockchain usage (~100MB TODO: check this), mostly due to the large amount of Merkle tree inclusion proofs required to achieve security. We use SNARKs to generate a proof of knowledge of a correct SDR proof. In other words, we implement the SDR proof verification algorithm in an arithmetic circuit and use SNARKs to prove that it was evaluated correctly.

The SNARK circuit proves that given Merkle roots CommD, and CommR, the prover correctly derived the labels at each layer and correctly performed the final encoding.

PoRep in Filecoin

Proof-of-Replication proves that a Storage Miner is dedicating unique storage for each sector. Filecoin Storage Miners collect new clients’ data in a sector, run a slow encoding process (called Seal) and generate a proof (SealProof) that the encoding was generated correctly.

In Filecoin, PoRep provides two guarantees: (1) space-hardness: Storage Miners cannot lie about the amount of space they are dedicating to Filecoin in order to gain more power in the consensus; (2) replication: Storage Miners are dedicating unique storage for each copy of their clients data.

Glossary:

  • sector: a fixed-size block of data of SECTOR_SIZE bytes which generally contains clients’ data.
  • unsealed sector: a concrete representation (on disk or in memory) of a sector’s that follows the “Storage Format” described in Client Data Processing (currently paddedfr32v1 is the required default).
  • sealed sector: a concrete representation (on disk or in memory) of the unique replica generated by Seal from an unsealed sector. A sector contains one or more pieces.
  • piece: a block of data of at most SECTOR_SIZE bytes which is generally a client’s file or part of.

Stacked DRG Construction

Public Parameters

The following public parameters are used in the Stacked DRG Replication and Proof Generation algorithms:

TODO:
The Appendix should explain why we picked those values

Just interpolate a table of the Orient parameters and reconcile naming.

name type description value
SECTOR_SIZE uint Number of nodes in the DRG in bytes 68,719,476,736
LAYERS uint Number of Depth Robust Graph stacked layers. 10
BASE_DEGREE uint In-Degree of each Depth Robust Graph. 6
EXPANSION_DEGREE uint Degree of each Bipartite Expander Graph to extend dependencies between layers. 8
GRAPH_SEED uint Seed used for random number generation in baseParents. TODO
NODE_SIZE uint Size of each node in bytes. 32B

The following constants are computed from the public parameters:

name type description computation value
PARENTS_COUNT uint Total number of parent nodes EXPANSION_DEGREE + BASE_DEGREE 13
GRAPH_SIZE uint Number of nodes in the graph SECTOR_SIZE / NODE_SIZE 2,147,483,648
TREE_DEPTH uint Height of the Merkle Tree of a sector LOG_2(GRAPH_SIZE) 31

The following additional public parameters are required:

  • TAPER : Float: Fraction of each layer’s challenges by which to reduce next-lowest layer’s challenge count.
  • TAPER_LAYERS: uint: Number of layers which should be tapered. FIXME: update for current tapering. Data is a byte array initialized to the content of unsealed sector and will be mutated in-place by the replication process.

Hash Functions

We have describe three hash functions:

name description size of input size of output construction
KDFHash Hash function used as a KDF to derive the key used to label a single node. TODO 32B SHA256
ColumnHash Hash function used to hash the labeled leaves of each layer (see SDR Column Commitments). TODO 32B JubjubPedersen
RepCompress Collision Resistant Hash function used for the Merkle tree. 2 x 32B + integer height 32B JubjubPedersen
RepHash Balanced binary Merkle tree based used to generate commitments to sealed sectors, unsealed sectors, piece commitments, and intermediate parts of the Proof-of-Replication. TODO 32B Uses RepCompress
RepHash

RepHash is a vector commitment used to generate commitments to sealed sectors, unsealed sectors, piece commitments and intermediate stepds of the Proof-of-Replication. Filecoin uses a balanced binary Merkle tree for RepHash. The leaves of the Merkle tree are pairs of adjacent nodes.

RepHash inputs MUST respect a valid Storage Format. (TODO: What does this mean?)

Stacked DRG Graph

The slow sequential encoding required is enforced by the depth robustness property of the SDR graph.

Encoding with SDR: The data from a sector (of size SECTOR_SIZE) is divided in NODE_SIZE nodes (for a total of GRAPH_SIZE nodes) and arranged in a directed acyclic graph. The structure of the graph is used to label the nodes sequentially to generate a key with which to encode the original data: in order to label a node, its parents must be labeled (see the “Layer Labeling” section below). We repeat this process for LAYERS layers, where the input to a next layer is the output of the previous one.

Generating the SDR graph: The SDR graph is divided in LAYERS layers. Each layer is a directed acyclic graph and it combines a Depth Robust Graph (DRG) and a Bipartite Expander graph.

TODO:
this isn’t quite right.

We provide an algorithm (SDR) which computes the parents of a node. In high level, the parents of a node are computed by combining two algorithms: some parents (BASE_DEGREE of them) are computed via the BucketSample algorithm extended with a direct ordering of nodes, others (EXPANSION_DEGREE of them) are computed via the Chung algorithm.

SDRGraph: SDR Graph algorithm

Overview: Compute the DRG and Bipartite Expander parents using respectively BucketSample and ChungExpander.

Inputs
name description Type
node The node for which the parents are being computed uint
layer The layer of the SDR graph uint
Outputs
name description Type
parents The ordered parents of node node on layer layer [PARENTS_COUNT]uint
Algorithm
  • If layer = 1:

    • Compute drgParents = BucketSample(node)
    • Set parents to be drgParents.
  • If layer > 1:

    • Compute drgParents = BucketSample(node)
    • Compute expanderParents = ChungExpander(node)
    • Set parents to be the concatenation of drgParents and expanderParents

We provide below a more succinct representation of the algorithm:

TODO: Reference to code in filproofs/algorithms.go — or restructure all this.
Time-space tradeoff

Computing the parents using both BucketSample and ChungExpander for every layer can be an expensive operation, however, this can be avoided by caching the parents.

BucketSample: Depth Robust Graphs algorithm

This section describes how to compute the “base parents” of the SDR graph, which is the equivalent of computing the parents of a Depth Robust Graph.

The properties of DRG graphs guarantee that a sector has been encoded with a slow, non-parallelizable process. We use the BucketSample algorithm that is based on DRSample ( ABH17) and described in FBGB18 and generates a directed acyclic graph of in-degree BASE_DEGREE.

BucketSample DRG graphs are random graphs that can be deterministically generated from a seed; different seeds lead with high probability to different graphs. In SDR, we use the same seed GRAPH_SEED for each layer of the SDR graph such that they are all based on the same underlying DRG graph.

The parents of any node can be locally computed without computing the entire graph. We call the parents of a node calculated in this way base parents.

BucketSample extends BucketSampleInner to include the node’s ‘immediate predecessor’. Each node except the first in a DRG generated by BucketSample has the node whose index is one less than its own as a parent. This ensures that visiting nodes whose indexes are sequential will result in a graph traversal in topological order.

ChungExpander: Bipartite Expander Graphs
TODO: explain why we link nodes in the current layer

Each node in layers other than the first has EXPANSION_DEGREE parents generated via the ChungExpander algorithm. Note that the indexes returned refer to labels from the previous layer. TODO: Make this all clearer with explicit notation.

TODO: link to relevant filproofs/algorithms.go
Time-Space tradeoff

Computing these parents can be expensive (especially due to the hashing required by the Feistel algorithm). A miner can trade this computation by storing the expansion parents.

Feistel construction

We use three rounds of Feistel to generate a permutation to compute the parents of the Bipartite Expander graph.

TODO: link to filproofs/feistel

Replication

The Replication phase turns an unsealed sector into a sealed sector by first generating a key, then using the key to encode the orignal data.

Before running the Replicate algorithm, the prover must ensure that the sector is correctly formatted with a valid “Storage Format” (currently paddedfr32v1 is the required default).

TODO: inputs are missing

The Replication Algorithm proceeds as follows:

  • Calculate ReplicaID using Hash (SHA256):

ReplicaID is a 32-byte array constructed by hashing the concatenation of the following values:

  • ProverId is a 32-byte array uniquely identifying a prover.
  • SectorNumber is an unsigned 64-bit integer in little-endian encoding represented as an 8-byte array.
  • RandomSeed is a 32-byte array of randomness extracted from the chain.
  • CommD is the Merkle root obtained by performing RepHash on the original data represented in paddedfr32v1.
ReplicaID := Hash(ProverID || SectorNumber || RandomSeed || CommD)
  • Perform RepHash on Data to yield CommD and TreeD:
CommD, TreeD = RepHash(data)

Layer Labeling

TODO: Define Graph. We need to decide if this is an object we’ll explicitly define or if its properties (e.g., GRAPH_SIZE) are just part of the replication parameters and all the functions just refer to the same graphs being manipulated across the entire replication process. (At the moment I’ve avoided defining a Graph structure as in other specs I didn’t see any object methods, just standalone functions.)
TODO: link to filproofs/algorithms

Proof Generation

Overview:

  • Challenge Derivation
  • Proof Generation
  • Circuit Proof Generation
TODO: write a single algorithm which includes the spec below

Challenge Generation

TODO: Link to filproofs/algorithms

Calculate LAYER_CHALLENGES : [LAYERS]uint: Number of challenges per layer. (This will be passed to the SDR circuit proof.)

Derive challenges for each layer (call DeriveChallenges()).

Witness Generation

TODO: Link to filproofs/algorithms

Layer Challenge Counts

TODO: we should just list current parameters and show this as a calculation for correctness, this should not mandatory to implement.