Kadena: The First Real Private Blockchain

Note: This blog will be broken into 3 sections in order to explain how the shortcomings of the Raft consensus algorithm were first addressed by Tangaroa and Juno and then finally solved by their distant relative, Kadena. The 3 sections are:
  1. Introduction and the Raft Consensus Algorithm
  2. Kadena’s predecessors – Tangaroa and Juno
  3. Kadena’s blockchain – ScalableBFT, Pact, and Pervasive Determinism

 Part 1: Introduction and the Raft Consensus Algorithm

Foreword

This series of posts will cover Kadena’s blockchain. It uses ScalableBFT to offer high performance (8k-12k transactions per second) with full replication and distribution at previously impossible scales (the capacity for more than 500 participating nodes). This, along with a multi-layered security model and incremental hashing, allows for a truly robust blockchain. Based on Raft and Juno, Kadena embeds a full smart contract language (Pact) into its blockchain that can be run as either public (plain-text) or private (Double-Ratchet encrypted) transactions. It is a huge step forward in the blockchain space, possibly representing a new generation of blockchain technology entirely through its introduction of the idea of “Pervasive Determinism.”

Similar to Bitcoin, Kadena’s blockchain is tightly integrated; understanding what it is capable of and what these capabilities imply requires covering a considerable amount of ground. As such, I’ve broken the post into 3 parts: Introduction & Raft, Kadena’s predecessors – Tangaroa & Juno, and Kadena’s Blockchain – ScalableBFT, Pact and Pervasive Determinism.

 

Introduction

The history behind Kadena is an interesting case study in the new field of blockchain consensus algorithms and distributed computing.  Kadena is a “distant relative” of the Raft consensus algorithm. Raft was followed by Tangaroa (a Byzantine Fault Tolerant (BFT) Raft) and the JP Morgan project Juno (a fork of Tangaroa), neither of which is still under active development.  JP Morgan’s new blockchain, Quorum, is quite different from Juno and uses a fusion of ideas from sidechains and Ethereum: public smart contracts are allowed on the blockchain in addition to private contracts, which are represented as encrypted hashes and replicated via side-channels.   Kadena is the “next-generation Juno.” It uses a new, but related, protocol called ScalableBFT that was spawned from the open-source code of the Juno project and was built by the two key developers who built Juno.  Before diving deep into Kadena, a brief history and description of Raft and the predecessors to Kadena are in order.

 

Raft Consensus

The Raft consensus algorithm is a single leader-based system for managing a replicated log.  It uses a replicated state machine architecture and produces a result equivalent to Paxos but is structurally different. Keeping the replicated log consistent is the job of the consensus algorithm.  In this model, the leader does most of the work because it issues all log updates, validates transactions, and generally manages the cluster. Raft consensus guarantees a strict ordering and replication of messages.  It does not care what the messages contain.

A new leader is elected using randomized timeouts, which fire if a follower receives no communication (heartbeats) from the leader within the timeout period.  If the follower receives no communication over this period, it becomes a candidate and initiates an election.  A candidate that receives votes from a majority of the full cluster (the nodes in the network) becomes the new leader.  Leaders typically operate until they fail.  The heartbeats are sent out to make sure the leader is still there; if nothing is received, a new election takes place.

The following stages are how Raft comes to consensus:

 

1.     A cluster of Raft node servers is started with every node launching as a Follower. Eventually, one node will time out, become a candidate, gain a majority of the votes and become the leader.

2.     Each node stores a log containing commands. It is the Leader’s job to accept new commands, strictly order the commands in its log, replicate its log to its followers, and finally inform followers when to commit logs that they have replicated. The consensus algorithm thus ensures that each server’s log contains the same commands in the same order.

3.     Logs are “committed” when they have been replicated to a majority of nodes. The leader gathers the replication count and, upon a majority being seen, commits its own new log entries and informs its followers to do the same.

4.     Upon “commit,” the command in each log entry is evaluated by a state machine. Because Raft is indifferent to the body of the command, any state machine can process committed entries. Moreover, consensus ensures that command execution always takes place in the same order as the commands appear in the log, which is strictly ordered.

5.     State machines will remain consistent as long as command executions are deterministic.

6.     When a client sends a command to one of the servers, that server either forwards the command to the leader or is itself the leader. The leader collects the new command, assigns it a Log Index, encapsulates it in a Log Entry, and adds the command to the uncommitted portion of its log.

7.     Whenever the leader has uncommitted entries, it replicates this portion of the log to its followers. When the leader is informed of successful replication by a majority of the cluster, it commits the new entries and orders its followers to do the same.

8.     Whenever a new log entry is committed, consensus about this entry has been reached. It is then evaluated by each server’s state machine.

9.     From this point on, Raft is finished and implementers can decide how to handle responses: replying to the client or waiting for the client to query for the result.

Responses to the client are generally asynchronous.
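To make steps 6 through 8 concrete, here is a minimal Python sketch of how a leader might track replication acknowledgements and advance its commit index once a majority of the cluster holds an entry. The class and method names (`RaftLeader`, `on_replication_ack`, etc.) are illustrative only and are not taken from any particular Raft implementation.

```python
from dataclasses import dataclass, field

@dataclass
class LogEntry:
    term: int
    index: int
    command: str          # Raft does not care what the command contains

@dataclass
class RaftLeader:
    node_id: str
    cluster_size: int                              # total nodes, including the leader
    log: list = field(default_factory=list)
    commit_index: int = -1                         # highest index known to be committed
    acks: dict = field(default_factory=dict)       # log index -> nodes that replicated it
    current_term: int = 1

    def append(self, command: str) -> LogEntry:
        """Step 6: accept a new command, assign a log index, add it uncommitted."""
        entry = LogEntry(self.current_term, len(self.log), command)
        self.log.append(entry)
        self.acks[entry.index] = {self.node_id}    # the leader holds it by definition
        return entry

    def on_replication_ack(self, follower_id: str, index: int) -> None:
        """Step 7: a follower confirmed it replicated everything up to `index`."""
        for i in range(index + 1):
            self.acks.setdefault(i, set()).add(follower_id)
        self._advance_commit_index()

    def _advance_commit_index(self) -> None:
        """Step 8: commit entries replicated by a majority of the cluster."""
        majority = self.cluster_size // 2 + 1
        while (self.commit_index + 1 < len(self.log)
               and len(self.acks[self.commit_index + 1]) >= majority):
            self.commit_index += 1
            # here the committed command would be handed to the state machine

# usage: in a 3-node cluster, one follower ack plus the leader is a majority of 2
leader = RaftLeader("n0", cluster_size=3)
leader.append("set x = 1")
leader.on_replication_ack("n1", 0)
print(leader.commit_index)   # 0 -> the entry is committed
```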

The Raft consensus protocol is just that – a consensus algorithm. It has no notion of client identity or permissions and is, by default, fully open to any client issuing commands. The only participation restriction it makes is on which nodes exist at a given time. Moreover, the leader has absolute authority over the cluster and orders the followers to replicate and commit. It does not defend against Byzantine attacks; it only needs to handle crash faults, because nodes are assumed to be altruistic.

 


 Part 2: Kadena’s predecessors – Tangaroa and Juno


 

Tangaroa: The first step towards a BFT Raft

Tangaroa is a Byzantine Fault Tolerant (BFT) variant of the Raft consensus algorithm inspired by the original Raft algorithm and the Practical Byzantine Fault Tolerance (PBFT) algorithm.  Byzantine fault tolerance refers to a class of failures caused by malicious nodes attacking the network.  If some of the nodes go down, it is imperative for the network to continue running without stopping. In standard Raft, you need to replicate a log entry to a majority of nodes in the cluster before committing it. For BFT consensus algorithms, including Tangaroa, the required cluster size is at least 2f + 1, where f is the number of failures you want to tolerate (including both crashed nodes and compromised nodes). Consensus is achieved by a majority vote of the cluster; if f = 3, then the cluster size is 7 and at least 4 nodes are non-Byzantine. Some BFT protocols can even require 3f + 1 nodes.
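As a quick sanity check on the sizing arithmetic above, here is a small, purely illustrative helper that computes the cluster sizes for a given fault tolerance f (2f + 1 is the sizing the paragraph describes for Tangaroa-style BFT Rafts; 3f + 1 is the classic PBFT requirement):

```python
def cluster_sizes(f: int) -> dict:
    """Minimum cluster sizes to tolerate f faulty (crashed or Byzantine) nodes."""
    return {
        "faults_tolerated": f,
        "bft_raft_cluster": 2 * f + 1,   # Tangaroa-style sizing
        "pbft_cluster": 3 * f + 1,       # classic PBFT sizing
        "honest_majority": f + 1,        # non-Byzantine nodes in a 2f+1 cluster
    }

print(cluster_sizes(3))
# {'faults_tolerated': 3, 'bft_raft_cluster': 7, 'pbft_cluster': 10, 'honest_majority': 4}
```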

A Byzantine Leader can decide to arbitrarily increase the commit index of other nodes before log entries have been sufficiently replicated, thus causing safety violations when nodes fail later on. Tangaroa shifts the commit responsibility away from the leader, and every node can verify for itself that a log entry has been safely replicated to a quorum of nodes and that this quorum agrees on an ordering.

Tangaroa allows clients to interrupt the current leadership if it fails to make progress, in the same way that other BFT consensus algorithms allow the client to behave as a trusted oracle and depose certain nodes. This allows Tangaroa to prevent Byzantine leaders from starving the system, but it is very trusting of the client.

 

Leader Election and Stages

Tangaroa uses Raft as the foundation for consensus; thus there is a single leader.  In Tangaroa, as in Raft, each node is in one of three states: leader, follower, or candidate. Similar to Raft, every node starts as a follower, one of which will eventually time out and call an election. The winner of the election serves as the leader for the rest of the term; terms end when a new leader is elected. Sometimes, an election will result in a split vote, and the term will end with no leader; in this case, a follower will again time out (timeouts are reset when a vote is cast or an election is called) and start the voting process again.

To begin an election, a follower increments its current term and sends RequestVote (RV) Remote Procedure Calls (RPCs) in parallel to each of the other nodes in the cluster asking for their vote. The RPCs Tangaroa uses are similar to Raft’s RPCs, with the exception that every RPC is signed and validated via PPK (public/private key) signatures. RPCs allow for a data exchange between different computers residing on a network, and the signatures allow receiving nodes to verify which node sent the RPC, in addition to allowing any node to forward any other node’s RPC at any time.

When a Tangaroa node receives an RV RPC with a valid signature, it grants a vote immediately only if it does not currently have a leader (this only occurs at startup). Otherwise, it begins the process that Tangaroa calls a “LazyVote.” Lazy voting’s purpose is to protect non-Byzantine followers from electing a new leader when the current leader is not faulty; without lazy voting, a Byzantine node could trigger repeated elections at any time and starve the system. When a new RV is received by a follower, it saves the RV and waits for all of the following conditions to be met:

a)    The follower’s election timeout fires before it handles a heartbeat from its current leader. If a heartbeat is received, the lazy vote is cleared.

b)    The RV’s new term is greater than its current term.

c)     The request sender is an eligible candidate (valid PPK signature and the client hasn’t banned the node).

d)    The node receiving the RV has not voted for another leader for the proposed term.

e)    The candidate shares a log prefix with the node that contains all committed entries. A node always rejects the request if it is still receiving heartbeat messages from the current leader, and it ignores the RequestVote RPC if the proposed term has already begun.

If a RequestVote is valid and for a new term, and the candidate has a sufficiently up-to-date log, but the recipient is still receiving heartbeats from the current leader, it will record its vote locally and then send a vote response only if the node itself undergoes an election timeout or hears from a client that the current leader is unresponsive. Under lazy voting, a node does not grant a vote to a candidate unless it believes the current leader is faulty.  This prevents nodes that start unnecessary elections from gaining the requisite votes to become leader and starving the system.
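A minimal sketch of the lazy-voting decision, expressing conditions (a) through (e) above as a single predicate. The data structures and field names are hypothetical simplifications, and signature checking is reduced to a boolean flag.

```python
from dataclasses import dataclass

@dataclass
class RequestVote:
    candidate_id: str
    term: int
    log_prefix_ok: bool      # candidate's log contains every entry this node has committed
    signature_valid: bool    # the PPK signature on the RPC checked out

@dataclass
class FollowerState:
    current_term: int
    voted_for: dict                 # term -> candidate already voted for in that term
    banned_candidates: set          # candidates the client has banned
    heard_from_leader: bool         # heartbeat received within the election timeout
    election_timeout_fired: bool

def should_release_lazy_vote(rv: RequestVote, st: FollowerState) -> bool:
    """Return True only when all lazy-voting conditions (a)-(e) are satisfied."""
    return (
        st.election_timeout_fired and not st.heard_from_leader    # (a)
        and rv.term > st.current_term                             # (b)
        and rv.signature_valid
        and rv.candidate_id not in st.banned_candidates           # (c)
        and st.voted_for.get(rv.term) in (None, rv.candidate_id)  # (d)
        and rv.log_prefix_ok                                      # (e)
    )

st = FollowerState(current_term=3, voted_for={}, banned_candidates=set(),
                   heard_from_leader=False, election_timeout_fired=True)
rv = RequestVote(candidate_id="n2", term=4, log_prefix_ok=True, signature_valid=True)
print(should_release_lazy_vote(rv, st))   # True
```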

Nodes wait until they believe an election needs to occur before ever casting a vote. Once a vote is sent, the node updates its term number. It does not assume that the node it voted for won the election, however, and it will still reject AppendEntries (AE) RPCs from the candidate if none of them contain a set of votes proving the candidate won the election. AEs serve the dual purpose of heartbeats and carriers of new log entries that need replication. The candidate continues in the candidate state until one of three things happens:

(a)  It wins the election by receiving a majority vote from the cluster. A candidate must save these votes—RequestVoteResponse (RVR) RPCs—for future distribution.

(b)  Another node establishes itself as a leader

(c)   A period of time goes by with no winner (i.e., it experiences another election timeout)

A candidate that wins the election then promotes itself to the leader state and sends an AE heartbeat message that contains the votes that elected it and the updated term number, establishing its authority and preventing new elections. The signed votes effectively prevent a Byzantine node from arbitrarily promoting itself as the leader of a higher term. Moreover, each follower performs a recount of the aforementioned majority vote, validating and counting each vote the new leader transmitted, to independently verify the validity of the election.
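The recount step might look roughly like the following sketch. Keyed BLAKE2b hashes stand in here for the PPK signatures the real protocol uses (a symmetric stand-in chosen only to keep the example self-contained); the message layout and key handling are made up for illustration.

```python
import hashlib

# Keyed blake2b stands in for real PPK signatures; keys and layout are illustrative.
node_keys = {f"n{i}": f"secret-{i}".encode() for i in range(5)}   # 5-node cluster

def sign_vote(voter: str, candidate: str, term: int) -> bytes:
    msg = f"vote:{candidate}:{term}".encode()
    return hashlib.blake2b(msg, key=node_keys[voter]).digest()

def vote_is_valid(voter: str, candidate: str, term: int, sig: bytes) -> bool:
    return sig == sign_vote(voter, candidate, term)

def election_is_valid(candidate: str, term: int, votes: dict, cluster_size: int) -> bool:
    """A follower recounts the signed votes shipped with the new leader's first AE."""
    valid = sum(vote_is_valid(v, candidate, term, sig) for v, sig in votes.items())
    return valid >= cluster_size // 2 + 1

# n1 claims leadership for term 2 with three signed votes out of five nodes
votes = {v: sign_vote(v, "n1", 2) for v in ("n0", "n1", "n2")}
print(election_is_valid("n1", 2, votes, cluster_size=5))   # True
```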

 

Governance

Like Raft, Tangaroa uses randomized timeouts to trigger leader elections. The leader of each term periodically sends heartbeat messages (empty AE RPCs) to maintain its authority. If a follower receives no communication from a leader over a randomly chosen period of time, the election timeout, then it becomes a candidate and initiates a new election.  In addition to these spontaneous follower-triggered elections, Tangaroa also allows client intervention: when a client observes no progress with a leader for a period of time called the progress timeout, it broadcasts UpdateLeader RPCs to all nodes, telling them to ignore future heartbeats from what the client believes to be the current leader in the current term. These followers will then ignore heartbeat messages in the current term and time out as though the current leader had failed, starting a new election.

 

Data Received

The data (new commands) comes from clients of the Raft cluster, who send requests to the leader. The leader replicates these requests to the cluster and responds to the client when a quorum is reached in the cluster on that request. What constitutes a "request" is system-dependent, as is how data is stored.  It is important for state to persist to disk so that nodes can recover and remember the information they have committed to (which nodes they voted for, what log entries they have committed, etc.). The protocol cannot work without this.
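A minimal sketch of that persistence requirement, assuming a simple JSON file as the durable store; the file layout and field names are invented for illustration, but the idea is that a node writes its term, its vote, and its log to disk before acknowledging anything.

```python
import json, os, tempfile

def persist_state(path: str, current_term: int, voted_for: str, log: list) -> None:
    """Atomically write the durable consensus state to disk."""
    state = {"current_term": current_term, "voted_for": voted_for, "log": log}
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())        # make sure it is really on disk before we ack
    os.replace(tmp, path)           # atomic rename: a crash never leaves half a file

def load_state(path: str) -> dict:
    with open(path) as f:
        return json.load(f)

persist_state("raft_state.json", current_term=3, voted_for="n2",
              log=[{"index": 0, "term": 1, "command": "set x = 1"}])
print(load_state("raft_state.json")["voted_for"])   # n2
```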

Tangaroa adds BFT to Raft evolution

Juno

The JP Morgan project Juno is a fork of Tangaroa and was a proof of concept that was able to scale Tangaroa to include up to 50 nodes and get transaction speed up to 5,000 transactions per second. The JPM team behind Juno saw the potential that a Tangaroa-like approach represented: a high-performance private blockchain. They iterated on the idea for a year and open sourced the project in February ‘16; they added a smart contract language, fixed some design mistakes, and succeeded in achieving a 10x performance increase.  Juno also allowed cluster membership (the set of voting nodes) to change while the system was running, so nodes could be added and removed.  It was a permissioned distributed system in which all of the nodes in the network were known. 

The stages of the mechanism and the leader election process are the same as in Tangaroa (see above).  Similarly, a transaction is considered live once it is fully replicated and committed to the log.  The leader decides the order of the commands and every node validates.  Each node independently decides when to commit a log entry based on evidence it receives from other nodes.  Every log entry is individually committed and incrementally hashed against the previous entry.  It takes approximately 5 ms for a single log entry to go from the leader receiving the entry to full consensus being reached, plus network latency.

 


 Part 3: Kadena’s blockchain – ScalableBFT, Pact, and Pervasive Determinism


 

Cryptography

Unlike in Raft, each replica in a BFT Raft system (a family of algorithms that includes Tangaroa, Juno, and Kadena’s ScalableBFT) computes a cryptographic hash every time it appends a new entry to its log. The hash is computed over the previous hash and the newly appended log entry. A node can sign its last hash to prove that it has replicated the entirety of a log, and other servers can verify this quickly using the signature and the hash.  BFT Raft nodes and clients always sign before sending messages and reject messages that do not include a valid signature.

BFT Rafts use incremental hashing, enabling nodes to be certain that both the contents and ordering of other nodes’ logs match their own. Using this knowledge, nodes can independently commit log entries safely, because both the contents and ordering of other nodes’ logs are attested to via matching incremental hashes. BFT Rafts also use digital signatures extensively to authenticate messages and verify their integrity.  This prevents a Byzantine leader from modifying message contents or forging messages and protects the cluster generally from a large number of Byzantine attacks.
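A minimal sketch of incremental hashing, using BLAKE2b (the hash family mentioned later in this post); the genesis value and byte layout are arbitrary choices made for the example.

```python
import hashlib

def incremental_hash(prev_hash: bytes, entry: bytes) -> bytes:
    """Hash of the previous hash concatenated with the new log entry."""
    return hashlib.blake2b(prev_hash + entry).digest()

def log_head_hash(entries: list) -> bytes:
    """Fold the whole log into a single hash; equal hashes imply equal logs."""
    h = b"\x00" * 64                      # arbitrary genesis value
    for e in entries:
        h = incremental_hash(h, e)
    return h

log_a = [b"tx-1", b"tx-2", b"tx-3"]
log_b = [b"tx-1", b"tx-3", b"tx-2"]       # same entries, different order
print(log_head_hash(log_a) == log_head_hash(log_b))   # False: ordering matters
# a node would sign log_head_hash(log_a) to attest to its entire log at once
```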

 

Consensus

In Raft, a Leader is elected via randomized timeouts that trigger a Follower to propose itself as a Candidate and request votes. ScalableBFT also does this, but in a cryptographically secured way. For instance, if a Leader becomes unreachable, a timeout triggers a new election, but the election process is robust against Byzantine nodes declaring elections. ScalableBFT fixes the issues that Juno and Tangaroa encountered regarding lazy voting.

The Leader’s only unique capabilities are (1) ordering of new transactions prior to replication and (2) replicating new transactions to Follower nodes. From that point on, all nodes independently prove both consensus validity and individual transaction integrity.

The removal of anonymous participation is a design requirement for private blockchains, and it allows a high-performance BFT consensus mechanism to replace mining. ScalableBFT’s primary addition to the family of BFT Rafts is the ability to scale into the thousands of nodes without decreasing the system’s throughput.

Every transaction is replicated to every node. When a majority of nodes have replicated the transaction, the transaction is committed. Nodes collect and distribute information (incremental hashes) about what they have replicated, and use this information to independently decide when to commit (i.e., when more than 50% of other nodes have sent them incremental hashes they agree with for uncommitted transactions). It basically works by doing a majority vote on what to commit. Committing a transaction doesn’t mean it will be executed, just that it has been permanently replicated by a majority of the cluster. Bad transactions, ones that error or have bad signatures, are replicated as well, because consensus’ job is to provide perfect, ordered replication. Committing a transaction allows each node to then independently evaluate (parse/decrypt/validate crypto/execute/etc.) each transaction in an identical way. Every transaction gets paired with an output, which can range from “bad crypto” to the output of the smart contract layer (which can also be an error).

Finally, besides the leader replicating new transactions to every node, the nodes are more or less independent. Instead of “syncing,” they broadcast “I have replicated up to log index N and it has an incremental hash of H” to the cluster and collect this information from other nodes; based on the results from other nodes, each node can independently decide whether the cluster has replicated past the bar needed to commit (majority replication for some as-yet-uncommitted log index N). Here’s the subtle part: the incremental hash implies replication of all that came before it. If the leader replicates 8k new transactions (which is what it currently does), each node need only distribute and gather evidence for the last transaction of that batch, as it implies correct replication of the ones that came before it. Instead of sending 8k messages (one for each transaction) attesting to proper replication, nodes only discuss the latest transaction. This is why Kadena needed so much pipelining: the team figured out how to commit 8k transactions in the same time it takes to commit a single transaction.
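Here is a rough sketch of that evidence-gathering, under the simplifying assumption that each attestation carries a log index and the sender’s incremental hash at that index; the class and method names are hypothetical, not ScalableBFT’s actual interfaces.

```python
from collections import defaultdict

class CommitTracker:
    """Decide when to commit based on (index, incremental hash) evidence from peers."""

    def __init__(self, my_hashes: dict, cluster_size: int):
        self.my_hashes = my_hashes            # my own log index -> incremental hash
        self.majority = cluster_size // 2 + 1
        self.agree = defaultdict(set)         # log index -> peers whose hash matches mine
        self.commit_index = -1

    def on_evidence(self, peer: str, index: int, inc_hash: bytes) -> int:
        # evidence only counts if the peer's incremental hash matches our own;
        # a matching hash at index N implies agreement on every entry before N
        if self.my_hashes.get(index) == inc_hash:
            self.agree[index].add(peer)
            if len(self.agree[index]) + 1 >= self.majority:   # +1 for ourselves
                self.commit_index = max(self.commit_index, index)
        return self.commit_index

# 5-node cluster; this node and two peers agree on the hash at index 7999,
# which is enough to commit the whole 8k-entry batch at once
my_hashes = {7999: b"H"}
tracker = CommitTracker(my_hashes, cluster_size=5)
tracker.on_evidence("n1", 7999, b"H")
print(tracker.on_evidence("n2", 7999, b"H"))   # 7999
```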

ScalableBFT represents a breakthrough in the field of BFT consensus, as it is the first and only deterministic BFT consensus mechanism that can scale past hundreds of nodes with full replication and encryption.  ScalableBFT also provides a unique security model known as pervasive determinism, which provides security not just at the transaction level but at the consensus level as well, while encrypting each and every transaction using the Noise Protocol (discussed below).

 

Kadena Uses Deterministic Consensus

A consensus mechanism is deterministic if the consensus process is fully specified in the protocol and this process does not employ randomness. As stated above, Raft, for example, uses randomized timeouts to trigger elections when a leader goes down (because the leader can't communicate "I'm about to crash," there is instead a timeout that trips to prompt a node to check whether the leader is down), but the election isn't part of consensus at the transaction level; it is instead a means to finding a node to orchestrate consensus.

ScalableBFT is deterministic and hardened such that:

 

1.     Each node will commit only when it has a majority of the cluster agreeing with it.

2.     The evidence of agreement must be fully auditable at any time.

3.     When lacking evidence of agreement, do nothing.

Kadena is specifically designed for permissioned networks, and as such it assumes that certain attacks (like a DoS) are unlikely and are out of its control. If one were to occur, the system would either lock up (all nodes would eventually time out, but an election would never succeed) or sit idle. Once such an event ends, the nodes will come back into consensus and things will return to normal. Moreover, in a permissioned network administrators have full control and can kill the connection causing the issue.

 

 

Leader Election

Leader election is very similar to Raft in that any node can be elected leader, every node gets one vote per term, and elections are called when the randomized timeout of one of the nodes fires (the timer is reset every time a node hears from the leader). The biggest difference is that in Raft a node that gets enough votes assumes leadership, whereas in ScalableBFT a node that gets a majority of votes distributes those votes to every other node to demonstrate (in a BFT way) that it has been elected the leader by the cluster. ScalableBFT’s mechanism fixes issues seen in Juno and Tangaroa, like the “Runaway Candidate,” where a non-Byzantine node has timed out due to a network partition but, because its term has been incremented, it can’t come back into consensus and instead continues to time out and increment its term again (hence “runaway”).

Raft consensus guarantees a strict ordering and replication of messages; it doesn’t matter what’s in each message, which can range from random numbers to ciphertext to plain-text smart contracts. Kadena leverages the log layer as a messaging service when running in an encrypted context; much like Signal can run Noise protocol encryption over SMS, ScalableBFT runs Noise over a blockchain. ScalableBFT adds consensus robustness, which the layer that interprets the messages assumes as a guarantee, as well as incremental hashes that assure perfect replication of messages. The Noise protocol slots between consensus and smart contract execution, encrypting and decrypting messages as needed; because the messages are ciphertext, only some of the normal tricks for avoiding a Cartesian blowup of live tests can be run per message without leaking information.

Security Model/Pervasive Determinism

Kadena uses the term “pervasive determinism” to describe “the idea of a blockchain that uses PPK-Sig based cryptography for authorship guarantees (like bitcoin) and is composed of a fully deterministic consensus layer in addition to a Turing-incomplete, single-assignment smart contract layer. The implications of a ‘pervasively deterministic’ blockchain are rather profound, as it allows for a bitcoin-ledger class of auditability to be extended deep into the consensus layer by chaining together multiple layers of cryptographic trust. Take as an example a transaction that loads a new smart contract module called “loans”. Say “loans” imports another module called “payments” that is already present in the chain. The successful import of “payments” alone implies the following (with each being fully auditable by cryptographic means):

      who signed the transaction that loaded “payments”

      what consensus nodes were in the cluster at the time of loading

      what consensus nodes agreed that the transaction was valid

      what nodes voted for the current leader at the time of loading

      who the leader was

      who the previous leader was

      etc.

 A pervasively deterministic system allows new transactions to leverage not only the cryptographic trust that naturally occurs as transactions are chained together in a blockchain, but also the trust of how those transactions entered the ledger in the first place. In so doing, you can create a system more secure than Bitcoin because the consensus process becomes cryptographically trusted, auditable, and entangled as well, with transaction-level executions implying that specific consensus-level events occurred and with each implication being cryptographically verifiable.“

This provides BFT not just for the consensus layer but for the transaction layer as well (Bitcoin already does the latter).  This is different from, say, PBFT, which assumes that transactions sent from the client’s server are valid, leaving it open to compromise. Moreover, non-Raft BFTs generally entrust the client with the ability to depose/ban nodes. Pervasive determinism takes an alternative viewpoint: trust nothing, audit everything.

Allowing ScalableBFT to incorporate pervasive determinism creates a completely paranoid system that is robust at each and every layer via permanent security (i.e., a form of cryptographic security that can be saved to disk). It has Bitcoin’s security model for transactions, extends this model to the consensus level, and adds smart contracts without the need for mining or the tradeoffs that most in the industry have become accustomed to. It’s a real blockchain that’s fast and scalable.

I asked Will Martino (co-founder of Kadena) for the specifics of how this worked for each layer:

What is your consensus-level security model?

 For replication, Kadena uses an incrementally hashed log of transactions that is identically replicated by each node. Nodes agree on the contents of the log via distributed signed messages containing the incremental hash at a given log index, which are then collected by other nodes and used to individually reason about when a commit is warranted. No duplicates are allowed in the log, and replication messages from the leader containing any duplicates are rejected immediately. We use blake2 hashes and the Term number to define uniqueness, allowing clients of the system to not worry about accidentally sending duplicates, nor about a malicious node or man-in-the-middle (MITM) resubmitting commands. We employ permanent security, a PPK-sig-based approach to authorship verification (or any type of approach that can be saved to disk) that is very similar to how bitcoin verifies transactions, but at the consensus level (in addition to the transaction level). This is opposed to ephemeral security, which uses secured channels (TLS) for authorship validation, a vastly inferior approach where the question “who sent transaction X?” is answered not via PPK cryptography but via a consensus-level query, because any individual node is incapable of providing a BFT answer.
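As a rough illustration of the duplicate-rejection rule described above (uniqueness defined by a blake2 hash plus the term number), here is a minimal sketch; the class is hypothetical and ignores everything else a real log would track.

```python
import hashlib

class DedupLog:
    """Reject any command whose (blake2 hash, term) pair has already been logged."""

    def __init__(self):
        self.seen = set()
        self.entries = []

    def try_append(self, command: bytes, term: int) -> bool:
        key = (hashlib.blake2b(command).hexdigest(), term)
        if key in self.seen:
            return False              # duplicate: accidental resubmission or MITM replay
        self.seen.add(key)
        self.entries.append((term, command))
        return True

log = DedupLog()
print(log.try_append(b"transfer 10 from A to B", term=4))   # True
print(log.try_append(b"transfer 10 from A to B", term=4))   # False: duplicate rejected
```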

 What is your transaction-level security model?

 The ideas of ephemeral and permanent security span both the consensus and transaction level, as it is consensus that hands the smart contract execution layer individual transactions. At the smart contract/transaction level we also use permanent security, supporting row-level public key authorization natively in Pact. This is important because ephemeral security implies that an attacker is one server away from impersonating an entity; secured channels work by point-to-point distribution of new transactions by the client/submitter to the cluster nodes over TLS, and consensus secures that a given transaction should be committed and replicated. However, if an attacker hacks the client server holding the other end of the TLS connection, they can transact as if they were the client without the cluster being any the wiser. Permanent security, on the other hand, has many keys for individual roles in a given entity, thus requiring an attacker to gain access to the individual keys; further, with permanent security the CEO’s transactions are signed with a different key than the Mail Clerk’s transactions, versus ephemeral security where the “who is sending this transaction” question is answered by a “from: X” field. If the same TLS connection is used to submit both the CEO’s and the Clerk’s transactions, then the authorship and authorization logic is a “because I said so/trust me” model, versus a PPK-sig approach where you verify against the appropriate key before execution. Kadena’s blockchain is designed to trust as little as possible; if we knew of a more paranoid or fine-grained approach than row-level PPK signatures, we’d use that instead.

 What is your confidential transaction model?

 We use the Double-Ratchet protocol (what Signal, WhatsApp, etc. use for encrypted communications) embedded into the blockchain (encrypted transaction bodies) for multi-party privacy-preserving use cases. We work with the notion of disjoint databases via the `pact` primitive in Pact; it describes a multiphase commit workflow over disjoint databases via encrypted messages.

 

Smart Contracts

Pact is a full smart contract language whose interpreter is built in Haskell.  In Kadena, every transaction is a smart contract, and the Pact smart contract language is open sourced. Pact is database-focused, transactional, Turing-incomplete, and single-assignment (variables cannot be changed in their lifetime), and thus highly amenable to static verification. Pact is also interpreted (the code you write is what executes on chain) whereas Solidity is compiled, making it difficult to verify code and impossible to rectify security issues in old language versions once compiled. Pact ships with its own interpreter, but can run in any deterministic-input blockchain and can support different backends, including commercial RDBMSs. In the ScalableBFT blockchain, it runs with a fast SQLite storage layer.

 

 

Characteristics of the Kadena Blockchain

The Kadena Blockchain contains all these features:

 

In conclusion, Kadena has developed a fully replicating, scalable, and deterministic consensus algorithm for private blockchains with high performance.  This blockchain solution can be a giant leap forward for financial services companies looking to employ a real private solution that remains true to many of the key bitcoin blockchain features (minus mining/Proof of Work, anonymity, and censorship resistance) while catering to the key design features that financial services firms are craving, particularly scalability and confidentiality.

 

 

 

 

 

 

 

 

The Trend Towards Blockchain Privacy: Zero Knowledge Proofs

One of the bigger trends in the blockchain world, particularly when it comes to financial services and specifically capital markets operations, has been a need for privacy and confidentiality in the course of daily business.  This has meant that blockchain solutions are being designed with this primary need in mind, which has led to all the private blockchain solutions being developed today.

When you build for privacy and confidentiality there are tradeoffs that come with that. Mainly, you lose transparency, which was the major feature of the first blockchain: Bitcoin.  As originally designed, a blockchain is a transparency machine.  In this system, the computers are distributed and no one entity controls the network.  Not only that, but anyone can be a validator and anyone can write to or read from the network.  Clients and validators can be anonymous and all the data gets stored locally in every node (replication).  This makes all transaction data public. The security of Bitcoin is made possible by a verification process in which all participants can individually and autonomously validate transactions.  While Bitcoin addresses the privacy problem by issuing pseudonymous addresses, it is still possible to find out whose addresses they are through various techniques.

This is the polar opposite of what is happening in the private blockchain world, where decentralization and transparency are not deemed necessary for many capital markets use cases.  What is important is privacy and confidentiality, latency (speed), and scalability (the ability to maintain high performance as more nodes are added to the blockchain).  Many designs use encrypted node-to-node (n2n) transactions where only the two parties involved in the transaction receive data; in many of these systems there are opt-ins for third-party nodes (regulators) to be a part of the transaction.  Other systems being developed for similar purposes, which have been written about on this blog, have one designated block generator which collects and validates proposed transactions, periodically batching them together into a new-block proposal.  Consensus is provided by a generator that applies rules (validation) agreed to by the nodes (chain cores) to the block, along with designated block signers.

In these systems, decentralization is simply not necessary because all the nodes are known parties.  In private blockchains the nodes must be known in order to satisfy certain regulatory and compliance requirements. The focus has been on how to preserve privacy and confidentiality while achieving speed, scalability, and network stability.  Because the parties are known, there are also avenues for legal recourse even between parties who don't necessarily trust each other.

Strong, Durable Cryptographic Identification

What is Cryptography and Encryption?

As noted above with privacy and confidentiality being pivotal, encryption has become a major focus for all blockchains.  Many of these solutions are using advanced cryptographic techniques that provide strong mathematically provable guarantees for the privacy of data & transactions. 

In a recent blog post titled "A Gentle Reminder About Encryption," Kathleen Breitman of R3CEV succinctly provides a great working definition:

"Encryption refers to the operation of disguising plaintext, information to be concealed. The set of rules to encrypt the text is called the encryption algorithm. The operation of an algorithm depends on the encryption key, or an input to the algorithm with the message. For a user to obtain a message from the output of an algorithm, there must be a decryption algorithm which, when used with a decryption key, reproduces the plaintext."

If the encryption scheme allows certain computations to be carried out directly on the ciphertext, you get homomorphic encryption, and this (combined with digital signature techniques) is the basis for the cryptographic techniques which will be discussed in this post.  Homomorphic encryption allows computations to be done on encrypted data without first having to decrypt it.  In other words, this technique allows the privacy of the data/transaction to be preserved while computations are performed on it, without revealing that data/transaction.  Only those with decryption keys can access what exactly that data/transaction was.

Homomorphic encryption means that decrypt(encrypt(A) + encrypt(B)) == A+B. This is known as homomorphic under addition.

In other words, a computation performed on the encrypted data, when decrypted, equals the same computation performed on the plaintext data.
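A concrete toy example of this property is the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The sketch below uses fixed, deliberately insecure parameters purely to demonstrate that decrypt(enc(A) combined with enc(B)) == A + B.

```python
import math, random

# Toy Paillier cryptosystem (additively homomorphic).
# WARNING: fixed, tiny, insecure parameters -- for illustration only.
p, q = (2**31 - 1), (2**61 - 1)         # two known (Mersenne) primes
n = p * q
n_sq = n * n
g = n + 1                               # standard choice of generator
lam = math.lcm(p - 1, q - 1)            # Carmichael function of n
mu = pow(lam, -1, n)                    # modular inverse of lambda mod n

def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < n under the public key (n, g)."""
    while True:
        r = random.randrange(2, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c: int) -> int:
    """Decrypt with the private key (lam, mu)."""
    L = (pow(c, lam, n_sq) - 1) // n
    return (L * mu) % n

a, b = 1200, 34
# "Adding" ciphertexts in Paillier means multiplying them modulo n^2.
c_sum = (encrypt(a) * encrypt(b)) % n_sq
assert decrypt(c_sum) == a + b          # homomorphic under addition
print(decrypt(c_sum))                   # 1234
```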

The key question being asked is: How can you convince a system of a change of state without revealing too much information?

After all, blockchains want to share a (change of) state, not information.  On a blockchain, some business process is at state X and now moves to state Y; this needs to be recorded and proved while preserving privacy and not sharing a lot of information.  Furthermore, this change of state needs to happen legally, otherwise there is a privacy breach.

Cryptographic techniques like zero knowledge proofs (ZKPs), which use different types of homomorphic encryption, separate:

1) reaching a conclusion on a state of affairs

2) the information needed to reach that state of affairs

3) showing that the state is valid.

The rest of this post will discuss how the trend towards privacy has led to cryptographic techniques, some old and some new, being used to encrypt transactions and the data associated with them from everyone except the parties involved.  The focus will be on Zero Knowledge Proofs, zk-SNARKs, Hawk, confidential transactions, state channels and homomorphic encryption.

The privacy problem on a blockchain is the main gap for deployment for all of the cryptographic solutions talked about below.

Outside of blockchains, there are examples of homomorphic encryption in practice. CryptDB is an example of a system that uses homomorphic encryption and other attribute-preserving encryption techniques to query databases securely. It is used in production at Google and Microsoft, amongst other places. It does have limitations, though: you have to define the kinds of queries you want ahead of time and it is easy to leak data.  CryptDB provides confidentiality for data content and for names of columns and tables; however, CryptDB does not hide the overall table structure, the number of rows, the types of columns, or the approximate size of data in bytes.  One method CryptDB uses to encrypt data items is onioning, which places each data item in layers of increasingly stronger encryption.

Confidential Transactions

Gregory Maxwell designed a cryptographic tool, Confidential Transactions (CT), to improve the privacy and security of Bitcoin-style blockchains. It keeps the amounts transferred visible only to participants in the transaction. CTs make the transaction amounts and balances private on a blockchain through encryption, specifically additively homomorphic encryption.  What users can see is the balances of their own accounts and the transactions they are receiving.  Zero knowledge proofs are needed to demonstrate to the blockchain that none of the encrypted outputs contain a negative value.
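Confidential Transactions are built on additively homomorphic (Pedersen-style) commitments. The toy sketch below works over integers modulo a prime rather than an elliptic curve and uses arbitrary generators, so it is not secure; it only illustrates how a validator can check that the amounts in a transaction balance without ever seeing them.

```python
# Toy Pedersen-style commitments -- illustrative only, not secure.
# A real CT scheme uses an elliptic curve, and `h` must have an unknown
# discrete log with respect to `g`; here both are arbitrary small values.
p = 2**127 - 1          # a Mersenne prime
g, h = 3, 5             # toy generators

def commit(value: int, blinding: int) -> int:
    """C(v, r) = g^v * h^r mod p  (additively homomorphic in v and r)."""
    return (pow(g, value, p) * pow(h, blinding, p)) % p

# an input of 50 is split into outputs of 30 and 20; blinding factors must also balance
r1, r2 = 11111, 22222
c_input = commit(50, r1 + r2)
c_outputs = (commit(30, r1) * commit(20, r2)) % p

# validators see only the commitments, yet can check that the amounts balance
print(c_input == c_outputs)   # True

# a range proof (a zero knowledge proof) is still needed to show no output is negative
```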

The problem with Confidential Transactions is that they only allow for very limited proofs, as mentioned above.  zk-SNARKs and Zero Knowledge Proofs (ZKPs), which will be described in detail below, allow you to prove virtually any kind of transaction validation while keeping all inputs private.

Zero Knowledge Proofs (ZKPs)

Zero Knowledge Proofs (ZKPs) are not new.  They were first conceptualized in 1985 in the paper "The Knowledge Complexity of Interactive Proof Systems."  A ZKP is a cryptographic technique which allows two parties (a prover and a verifier) to prove that a proposition is true without revealing any information about it apart from the fact that it is true. In the case of cryptocurrencies and blockchains, this will generally be data about transactional information.

"A zero-knowledge proof must satisfy three properties:

  1. Completeness: if the statement is true, the honest verifier (that is, one following the protocol properly) will be convinced of this fact by an honest prover.
  2. Soundness: if the statement is false, no cheating prover can convince the honest verifier that it is true, except with some small probability.
  3. Zero-knowledge: if the statement is true, no cheating verifier learns anything other than this fact. This is formalized by showing that every cheating verifier has some simulator that, given only the statement to be proved (and no access to the prover), can produce a transcript that "looks like" an interaction between the honest prover and the cheating verifier.

The first two of these are properties of more general interactive proof systems. The third is what makes the proof zero-knowledge."

zk-SNARKs

A zk-SNARK (zero-knowledge Succinct Non-Interactive Argument of Knowledge) is a zero knowledge proof that provides a way to prove some computational fact about data without revealing the data.  zk-SNARKs are the underlying cryptographic tool used in Zcash and Hawk, both of which are building blockchains with ZKPs and both of which will be explained later.  In the case of Zcash these SNARKs are used for verifying transactions, and in the case of Hawk they are used for verifying smart contracts.  This is done while still protecting users' privacy.

A zk-SNARK is a non-interactive zero-knowledge proof of knowledge that is succinct: proofs are very short and easy to verify.  They can be thought of as little logic circuits that need to generate a proof of a statement to verify each and every transaction.  They do this by taking a snapshot of each transaction, generating a proof, and then convincing the receiving side that the calculation was done correctly without revealing any data except the proof itself.  The basic operation of a SNARK execution is a coded input into this circuit which can be decrypted.

 Since zk-SNARKs can be verified quickly, and the proofs are small, they can protect the integrity of the computation without burdening non-participants. It should be noted that this technology is just now starting to mature but still has limitations.  Proof generation is very CPU intensive and can take up to a minute per proof, so scaling is still an issue that needs to be resolved.

The very first data point for zk-SNARKs will be Zcash, which is a combination of distributed state and proofs that you own the assets.

Zcash

Zcash can be described as an encrypted, open, permissionless, replicated ledger: a cryptographic protocol for putting private data on a public blockchain.  Zcash can be thought of as an extension of the bitcoin protocol; basically, Zcash added some fields to the bitcoin transaction format to support encrypted transactions.  Zcash uses SNARKs (ZKPs) to encrypt all of the data and only gives decryption keys to authorized parties to see that data.   This could not be done on a public blockchain until now, because if everything were encrypted it would prevent miners from checking whether transactions are valid.  ZKPs have made this possible by allowing the creator of a transaction to make a proof that the transaction is true without revealing the sender's address, the receiver's address, or the transaction amount.  Zooko describes this by saying bitcoin has 3 columns, the three mentioned above (sender address, receiver address, transaction amount), and Zcash has 4.  The 4th-column proof doesn’t reveal the sender address, the receiver address, or the amount transferred, but it does establish that nobody could have created the proof that comes with the encrypted values unless they held a secret key with sufficient value to cover the amount being transacted.  This is a proof that the data inside the encryption correctly satisfies the validity constraints. This allows the prevention of double spends and transactions of less than zero.

Zcash is mostly the same as bitcoin.  The miners and full nodes are transaction validators. Zcash uses PoW; miners check the ZKPs attached to each transaction and get a reward for validating those transactions.  Full nodes are the same, except that if you have the private keys you can detect whether some transactions contain money that is there for you.  SNARKs make it so that miners can reject a transaction from someone if their private key doesn’t have enough money for that transaction.  By keeping all data private except for the 4th column, Zcash prevents information from leaking onto the public blockchain, where everyone would otherwise be able to view information about transactions.  Zcash has selective transparency while bitcoin has mandatory transparency.  This means that Zcash can reveal specific things to specific people by permissioning; it reveals specific transactions that anyone looking at them can verify in the blockchain.

Some differences from bitcoin, as described in the Zcash whitepaper:

"Value in Zcash is carried by notes, which specify an amount and a paying key. The paying key is part of a payment address, which is a destination to which notes can be sent. As in Bitcoin, this is associated with a private key that can be used to spend notes sent to the address; in Zcash this is called a spending key.

A payment address includes two public keys: a paying key matching that of notes sent to the address, and a transmission key for a key-private asymmetric encryption scheme. “Key-private” means that ciphertexts do not reveal information about which key they were encrypted to, except to a holder of the corresponding private key, which in this context is called the viewing key. This facility is used to communicate encrypted output notes on the block chain to their intended recipient, who can use the viewing key to scan the block chain for notes addressed to them and then decrypt those notes.

The basis of the privacy properties of Zcash is that when a note is spent, the spender only proves that some commitment for it had been revealed, without revealing which one. This implies that a spent note cannot be linked to the transaction in which it was created."

Zcash is what's known as a decentralized anonymous payment scheme (DAP scheme).  A DAP scheme enables users to directly pay each other privately: the corresponding transaction hides the payment’s origin, destination, and transferred amount.   In Zcash, transactions are less than 1 kB and take under 6 ms to verify, orders of magnitude more efficient than the less-anonymous Zerocoin and competitive with Bitcoin.  However, the privacy achieved is significantly greater than with Bitcoin.  De-anonymizing bitcoin has become much easier through services that track and monitor bitcoin movements and the data associated with them.  Mixer services allow coins to be changed as they move through the system via a central party, but this is still not sufficient.  The Zcash whitepaper states:

"mixes suffer from three limitations: (i) the delay to reclaim coins must be large to allow enough coins to be mixed in; (ii) the mix can trace coins; and (iii) the mix may steal coins. For users with “something to hide,” these risks may be acceptable.  But typical legitimate users (1) wish to keep their spending habits private from their peers, (2) are risk-averse and do not wish to expend continual effort in protecting their privacy, and (3) are often not sufficiently aware of their compromised privacy."

The major motivations for ZKPs and the Zcash protocol are 1) privacy and 2) fungibility.  Fungibility is being able to substitute individual units of something like a commodity or money for an equal amount.  This can be a real problem when some units of value are deemed worth less because they are considered "dirty."  Hiding the metadata history does not allow a coin with a bad history to be rejected by a merchant or exchange.  Gregory Maxwell said, "Insufficient privacy can also result in a loss of fungibility--where some coins are treated as more acceptable than others--which would further undermine Bitcoin's utility as money."

Zcash is expected to launch soon, and with that comes the genesis block of the Zcash blockchain.  Like the bitcoin blockchain, it will allow anyone in the world to mine Zcash. It will be an open, permissionless system (fully decentralized).  Users will be able to send it to anyone using zero-knowledge privacy.

Zcash’s use of cutting-edge cryptographic techniques comes with substantial risks. A cryptographic attack that permits the forging of zero knowledge proofs would allow an attacker to invisibly create unlimited currency and debase the value of Zcash. Attacks of this kind have been found and fixed in the recent past. Fortunately, the metadata-hiding techniques used in Zcash are more production-hardened and can be considered less risky.

 

Hawk

Andrew Miller, in his whitepaper "Hawk: The Blockchain Model of Cryptography and Privacy-Preserving Smart Contracts," has developed a programmable smart contract system which works in much the same way as Zcash but for smart contracts.  Hawk does not store financial transactions in the clear on the blockchain, and it keeps the code of the contract, the data sent to the contract, and the money sent and received by the contract private from the public.  Only the proof can be seen; all other useful information is hidden. Like Zcash, transparency in Hawk is selective and wouldn't need to be used by all smart contracts, but rather applied based on use cases and the preferences of the parties involved.  It also aims to tackle the issues of privacy and fungibility in much the same way as the Zcash protocol.

The Hawk whitepaper does a great job of describing the motivation for the contractual security it seeks to provide for financial transactions:

"While on-chain privacy protects contractual parties’ privacy against the public (i.e., parties not involved in the financial contract), contractual security protects parties in the same contractual agreement from each other. Hawk assumes that contractual parties act selfishly to maximize their own financial interest. In particular, they can arbitrarily deviate from the prescribed protocol or even abort prematurely. Therefore, contractual security is a multi-faceted notion that encompasses not only cryptographic notions of confidentiality and authenticity, but also financial fairness in the presence of cheating and aborting behavior."

According to Andrew Miller, Hawk is based on several cryptographic primitives.  It uses the same zero knowledge proof library as Zcash, which is called libsnark.  Hawk also uses custom implementations of a lattice-based hash function and public key encryption.  Hawk uses the jSnark tool, which is open sourced.

In Hawk, each party generates their own secret keys. Miller stated that "For each contract, there is also a trusted public parameter, similar to Zcash. The only way to generate these parameters is a process that involves generating a secret value in an intermediate step, which needs to be erased at the end of the protocol. To borrow Zcash's term for this, it's like a "toxic waste byproduct" of the setup procedure, and like all industrial waste, it must be securely disposed of. There are many options... we could do what Zcash does and use a multi-party computation to generate these parameters, simply let a trusted party do it (the trusted party only needs to be used once and can go offline afterwards), or use trusted hardware like SGX."

Miller has said there are some differences between Ethereum contracts and Hawk contracts.  Unlike Ethereum, the input language for private contracts in Hawk is C code.  A private Hawk contract is not a long-running stateful process like an Ethereum contract, but rather a one-shot contract that proceeds in phases: it first receives the inputs from each party and then computes the outputs for each party. After the outputs are computed, the contract is finished and no longer holds any balance. So, it is a slightly different computing model. Hawk supports both private contracts as described above and public contracts which are exactly like those in Ethereum (no privacy guarantees are provided for the public contracts, though).

As in Zcash, there are some challenges to blockchain scaling and to optimizing cryptographic schemes so they are efficient when using ZKPs.  Hawk tries to do as much computation off chain as possible, because in public blockchains on-chain computation gets replicated to every node and slows things down dramatically.  Producing the proof can take up to several minutes (which is long) and can be costly; nodes checking the proof take only milliseconds.  Data from the whitepaper: in Hawk, it takes about a minute of CPU time for each participant in a Hawk contract, while on-chain computation takes about 9 to 20 milliseconds.

Hawk has not announced a release date yet as they are still working on optimizing their snark compiling tools to enhance performance.  

State Channels

State channels are payment channels that operate off chain and allow updates to any type of application that has a change of state.  Like the Lightning Network, two or more users can exchange payments that would normally require a blockchain transaction without needing to publish them on the blockchain or wait for confirmations, except when setting up or closing out the channel.

Vitalik Buterin explains this in his paper for R3CEV, "Ethereum Platform Review":

"State channels are a strategy that aims to solve the scalability challenge by keeping the underlying blockchain protocol the same, instead changing how the protocol is used: rather than using the blockchain as the primary processing layer for every kind of transaction, the blockchain is instead used purely as a settlement layer, processing only the final transaction of a series of interactions, and executing complex computations only in the event of a dispute.

State channels are not a perfect solution; particularly, it is less clear how they extend to massively-multi-user applications, and they offer no scalability improvements over the original blockchain in terms of its ability to store a large state size - they only increase de-facto transaction throughput. However, they have a number of benefits, perhaps the most important of which is that on top of being a scalability solution they are also a privacy solution, as the blockchain does not see any of the intermediate payments or contracts except for the final settlement and any disputes, and a latency solution, as state channel updates between two parties are instant - much faster than any direct on-blockchain solution, private or public, possibly could be, and potentially even faster than centralized approaches as channel updates from A to B can be secure without going through a centralized server."

State channels aim to address the scalability issues, privacy issues and confirmation delays associated with public blockchains while allowing actors who don't necessarily trust each other to transact.

 

Do You Need A Blockchain At All? Is Consensus Needed?

For many people, all of these cryptographic methods that mask transactional data will come as a surprise.  The blockchain is supposed to be a transparency machine in which anyone can join the network and, as a result, view all information on that network.  Even in private blockchains, there is a more open view into the data than in the protocols mentioned in this post.   Another question that might come to mind is whether consensus is even needed if everything is private except the proof.   If the proof is only between the two parties involved in the transaction, why is consensus needed, and why use a public blockchain?  It may seem counterintuitive, but the answer is that yes, a public blockchain is needed, and so is consensus, precisely because of the privacy of the proofs.  Essentially, complete transparency is needed to maintain the privacy of the proofs.

ZKPs and blockchains complement each other; you can't just use one to replace the other.  A blockchain is used to make sure the entire network can agree on some state, which may or may not be encrypted, while ZKPs allow you to be confident about some properties of that state.  In this scenario, you still need a canonical source of truth.  A view key, for example, can reveal all incoming transactions but not outgoing ones.  In order for this to work, you need a fully decentralized ledger with consensus, where everyone agrees with the data written there.

For example, Zcash's ledger contains information that is useless and unreadable to most actors: it is a database of commitments and opaque pieces of data, essentially just a way to synchronize data between actors.  (Zooko Wilcox has publicly stated that if Chainalysis graphed this out, it would just be a series of timestamps of when transactions occurred.)  In cases where the number of transactions is low, timing attacks could reveal the originator of transactions; imagine the equivalent of just one node connected to a Tor network.

The real emphasis is on the wallet side, because this is what allows actors to spend money and move assets around.  In Bitcoin, you can take a private key and move bitcoin; now it's more than that.  It's a private key plus a set of secrets you keep in order to prove previous proofs and to generate new proofs that you use to convince others.  For this, a fully decentralized ledger with consensus is needed, where everyone agrees with the data written there.

A blockchain is necessary because you need a consensus layer that includes everyone: there must be agreement on the proofs in the ledger in order to move assets around later on.  If a proof isn't available on every node, then you can't convince anyone of that proof when you need to move assets.  These proofs need to be stored in an open way so they can be seen as verified and accepted by receiving parties.

There are two different layers: 1) there needs to be agreement on what proofs everyone accepts, and 2) there needs to be agreement on what you can prove, what happens upon a zero-knowledge proof, and what happens once you know the information.  How do you generate a proof and pass that information to the next person?  The key is to get authority over the transaction by adding a proof or metadata to the transaction with some type of conditional script (if-then statements for transaction acceptance).   This code contains the transaction validity rules.  A person sees the proof from the outside but doesn't know whether the rule itself has been triggered or not.  Now that you have privacy from ZKPs, in order to comply with the transaction you need to prove that it abides by the rules.  So you can take two proofs and create new proofs that the receiving party can point at and verify are accepted by the entire network.  Once the proofs have meaning to you based on the rules, you can agree they were proved in the past and can be used in the future to transact and transfer money.

Limitations

ZKPs are moving out of the realm of theory and becoming production strength, and now is the time to see how practical they are.  They are only now going to get real-world tests, and they still suffer from big scalability issues.  The work of generating a proof is enormous and has massive computation costs.  As mentioned before, in Zcash, creating the proof needed to spend money takes between 45 seconds and 1 minute on a really strong computer.  Presently, people are working on making SNARKs and ZKPs more efficient by allowing for more proofs per second, or for more elaborate proofs in the same amount of time.

Deep architectural changes need to be made to blockchain technology in order to accommodate ZKPs, and you need to understand the constraints of what you can prove and at what scale.

Very Special Thanks to Zaki Manian (@zmanian), Andrew Miller (@socrates1024), Jonathan Rouach (@jonrouach), and Anish Mohammed (@anishmohammed)

Hawk section provided by Andrew Miller from a series of questions I asked.

 

https://github.com/zcash/zips/blob/zips27.reorganisation.1/protocol/protocol.pdf

Chain: Simplified Byzantine Fault Tolerance (SBFT)

This post aims to look at some of the key features of the Chain Open Standard, a permissioned blockchain, specifically its consensus mechanism. 

Blockchain startup Chain recently released an open-source permissioned blockchain built in collaboration with 10 global financial firms and telcos.  The platform is made for financial applications that require high scalability (thousands of transactions per second or more), robust security, and near-absolute privacy, and it must also be built for the regulatory requirements of these institutions.  These are attributes the financial services sector requires.  If speed is the key characteristic of the platform, network stability becomes very important in any solution designed, and Chain was built with this design assumption in mind.

Partners in the project include Capital One, Citi, Fidelity, First Data, Fiserv, Mitsubishi UFJ, Nasdaq, Orange, State Street and Visa, all of which have contributed to the technology. The platform is being called the Chain Open Standard.  Chain Core is the software implementation of the Chain Open Standard and is designed to run in enterprise IT environments.

Note: Chain Core is the name Chain has given to nodes on its platform. 

Consensus Mechanism: Simplified Byzantine Fault Tolerance (SBFT)

In SBFT, one designated block generator collects and validates proposed transactions, periodically batching them together into a new-block proposal.  Consensus is provided by the generator, which applies validation rules agreed to by the nodes (chain cores), and by the designated block signers.  Multiple designated block signers ratify the proposed block with their signatures.   All network members know the identities of the block signers (it is a permissioned blockchain) and accept blocks only if signed by a sufficient number of signers.  A block signer validates all transactions in each block and adds a signature, and only blocks with sufficient signatures are accepted into the chain.  This is intended to prevent the double-spending problem by ensuring that competing transactions get resolved.
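A rough Python sketch of that acceptance rule may help.  The names and the threshold here are hypothetical, not Chain's actual API: one generator proposes a block, the known signers validate and sign it, and nodes accept the block only if it carries signatures from enough of them (for example 3 of 4, matching the fault-tolerance numbers below).

```python
# Illustrative only: a threshold-signature block acceptance rule.
THRESHOLD = 3                                          # e.g. 3 of 4 signers
SIGNERS = {"bank_a", "bank_b", "bank_c", "bank_d"}     # known, permissioned signers

def validate_transactions(block) -> bool:
    # Placeholder for the validation rules agreed to by the nodes (chain cores).
    return all(tx.get("valid", False) for tx in block["transactions"])

def sign_block(block, signer: str) -> None:
    # A real signer would produce a cryptographic signature; here we just record it.
    if validate_transactions(block):
        block["signatures"].add(signer)

def accept_block(block) -> bool:
    # Nodes accept a block only if enough known signers have ratified it.
    valid_sigs = block["signatures"] & SIGNERS
    return validate_transactions(block) and len(valid_sigs) >= THRESHOLD

block = {"transactions": [{"valid": True}], "signatures": set()}
for s in ["bank_a", "bank_b", "bank_c"]:
    sign_block(block, s)
print(accept_block(block))   # True: 3 of the 4 known signers have signed
```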

Using one generator (a master replicator) in a trusted, private environment effectively allows for the kind of scale and speed needed for transactions, and for the signers to validate them.  The signers are configurable, meaning they can be added to or removed from the system at any time.  The same goes for the nodes (chain cores) in the network: they can be added or deleted, since it is a private network, and this adds an extra layer of security, particularly when dealing with what could be a malicious actor.

As a result of using one generator instead of multiple, synchronization does not occur.  (Synchronization is the process of establishing consistency of data between two or more entities.)  This allows scalability and speed to remain unaffected for an enterprise-grade solution.   Since the blockchain is private and the entities are known, multiple generators could be seen as a redundancy.  Not all nodes need to be online for the platform to function; at a minimum, one generator and one signer are needed.  Typically, however, the platform allows around 100 participants to interact and needs only 5 signers, 1 generator and 1 issuer (some regulatory body).  The fault tolerance in this setup allows blocks to be accepted with 3 out of 4 or 4 out of 5 signers.

The Privacy section below goes into the details of how the Chain Open Standard tackles confidentiality of information for the platform participants.  Open, permissionless blockchains like Bitcoin are transparency machines in that all participants can view information on the network.  Chain has built a solution for those who need privacy as a main feature.  Because complete transparency is not required and not all nodes (chain cores) receive transactional information, scalability does not get sacrificed, but transparency does.  All systems have trade-offs.  In this system, the nodes (chain cores) would only receive block proofs from the platform.

The node (core) itself could store all the blockchain data, or only a snapshot (balance sheet) and a limited history of activity from the Account Manager (described below).

 

Stages

  1. The Asset Issuer (presumably a node on the platform) creates what can be an unlimited number of cryptographically unique Asset IDs.  (Creation Phase)
  2. Units of these assets are issued on a blockchain.  (Submission Phase)
  3. An Asset ID is associated with an Asset Definition.  (Asset Definitions can be self-enforcing rules to meet specific conditions depending on the use case, and can carry an unlimited amount of reference data.)  (Validation Phase)
  4. Once issued, units of an asset ID can be stored in and transferred between accounts, programmatically transacted via smart contracts, or retired from circulation.  (Signing Phase and Pulling into Nodes Phase)
  5. After the Signing Phase the transaction goes live.

One of the interesting features of this system is the Account Manager, which serves many key roles.  It stores assets in secure accounts, and this is where transaction data gets stored.  These accounts can contain any combination of assets and can be created for many different types of users; they can be thought of as digitally secure wallets.  In addition to storing assets, the Account Manager enables the transferability of assets into and out of accounts via blockchain transactions (within the same Core or between different Cores in the network).  The Account Manager builds the smart contracts for all the different types of transactions (see the Smart Contracts section); each transaction is a smart contract.

Ownership of the assets flows through the system using a push model.  Addresses are provided by the other parties, and the unique Asset IDs and accounts that get created are used to designate ownership of the assets.  The smart contract (transaction) defines what actions a designated party can take.
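As a toy illustration of that push model (my own sketch, not Chain's API), the flow of issuing units of a unique Asset ID and pushing them to an address supplied by another party might look like this:

```python
# Illustrative only: accounts hold units of cryptographically unique asset IDs,
# and a transaction pushes units to an address provided by the counterparty.
import uuid

accounts = {}                        # account/address -> {asset_id: units}

def create_account() -> str:
    account_id = str(uuid.uuid4())   # stands in for an address provided to senders
    accounts[account_id] = {}
    return account_id

def issue_asset(issuer_account: str, units: int) -> str:
    asset_id = str(uuid.uuid4())     # stands in for a cryptographically unique Asset ID
    accounts[issuer_account][asset_id] = units
    return asset_id

def transfer(sender: str, receiver_address: str, asset_id: str, units: int) -> None:
    # The sender pushes units to the address the other party supplied.
    assert accounts[sender].get(asset_id, 0) >= units, "insufficient units"
    accounts[sender][asset_id] -= units
    accounts.setdefault(receiver_address, {})
    accounts[receiver_address][asset_id] = accounts[receiver_address].get(asset_id, 0) + units

issuer, buyer = create_account(), create_account()
bond = issue_asset(issuer, 1_000)
transfer(issuer, buyer, bond, 250)   # 250 units pushed to the buyer's address
```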

Privacy & Security

The Chain Open Standard is a private network in which confidentiality of information is one of the top priorities.  The platform has been designed to support selective disclosure of sensitive information, using three techniques: one-time-use addresses, zero-knowledge proofs, and encrypted metadata.

A one-time address is created each time an account holder wishes to receive assets.  These differing addresses prevent other observers of the blockchain from associating transactions with each other or with a particular party.
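A minimal sketch of the idea, assuming a hypothetical derivation scheme rather than Chain's actual one: a fresh address is derived for every receipt, so an outside observer cannot link two payments to the same account just by comparing addresses.

```python
# Illustrative only: derive a new receiving address per payment from an account
# key and an incrementing counter.
import hashlib

def one_time_address(account_pubkey: bytes, counter: int) -> str:
    return hashlib.sha256(account_pubkey + counter.to_bytes(8, "big")).hexdigest()[:40]

pubkey = b"\x02" + b"\x11" * 32
addr_1 = one_time_address(pubkey, 1)
addr_2 = one_time_address(pubkey, 2)
print(addr_1 != addr_2)   # True: each receipt uses a distinct, unlinkable address
```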

To cryptographically conceal the contents (assets and amounts) of a transaction, zero-knowledge proofs are used, while still allowing the entire network to validate the integrity of the contents.  Zero-knowledge proofs (ZKPs) do this by letting one party prove to another party that a given statement is true without conveying any information (in this case, about the transaction) apart from the fact that the statement is indeed true.  Only the counterparties (and those granted access) can view the details of the transaction.

Transaction metadata can also be encrypted with traditional PKI to conceal details from all but the relevant parties.  The platform uses keys to prove the verifiable authenticity (signatures) of the messages delivered between the nodes (chain cores).

The keys are generated by creating an unlimited number of cryptographically unique Asset IDs, and they get rotated every 2-3 weeks.  Rotating keys is the process of decrypting data with an old key and re-encrypting it under a new key (re-keying).  The keys should probably be kept in different places or data centers; if one key gets compromised, the other key can be used to generate backup keys and transfer all assets over to a new key.  Key management and rotation are essential to managing secure digital assets.  These keys also allow and restrict access to certain activities.
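A sketch of the re-keying step, using the Fernet recipe from Python's `cryptography` package purely for illustration (Chain's actual key-management scheme is not public in this detail): data encrypted under the old key is decrypted and re-encrypted under the new key, after which the old key can be retired.

```python
# Illustrative only: rotate encrypted data from an old key to a new key.
from cryptography.fernet import Fernet

old_key = Fernet.generate_key()
ciphertext = Fernet(old_key).encrypt(b"transaction metadata")

def rotate(ciphertext: bytes, old_key: bytes, new_key: bytes) -> bytes:
    # Decrypt with the key being retired and re-encrypt with its replacement.
    plaintext = Fernet(old_key).decrypt(ciphertext)
    return Fernet(new_key).encrypt(plaintext)

new_key = Fernet.generate_key()
ciphertext = rotate(ciphertext, old_key, new_key)   # old_key can now be retired
```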

Chain Core also integrates with industry-standard hardware security module (HSM) technology.  All block and transaction signing takes place within hardened HSM firmware, and multi-signature accounts using independent HSMs can further increase blockchain security by eliminating single points of failure.

 

Smart Contracts

The Chain Open Standard platform has designed a framework in which all transactions are smart contracts that allow outside events or data to trigger the execution of certain clauses in the contract.  It also allows each transaction to contain metadata, such as the information required for Know Your Customer (KYC) and anti-money laundering (AML) regulations.  The smart contracts have a built-in identity feature.

Some of the use cases Chain is exploring for financial transactions, each of which generates a smart contract on a transaction-by-transaction basis, are:


  1. Asset Issuance - Digitize existing assets for transacting on a blockchain network
  2. Simple Payment - Transfer assets from one account to another
  3. Bilateral Trade - Swap one asset for another with no counterparty risk
  4. Order book - Name your sale price and let a buyer find you
  5. Collateralized Loan - Lend assets, guaranteed by locked collateral
  6. Auction - Set a minimum price and sell your assets to the highest bidder 

 

Architecture Concept

Chain Core is the software implementation of the Chain Open Standard. It is designed to run in enterprise IT environments. 

http://chain.com/core/


 

 Conclusion: Trade-offs for Scalability and Speed

Key characteristics of the Chain Open Standard include scalability, speed and privacy.  With this in mind, as with any blockchain, trade-offs were made to achieve high transaction speeds.  Chain created a private blockchain open only to members of the platform.  Data privacy is a major problem private blockchains aim to solve without losing other key features of a blockchain network; decentralization and transparency are lost as a result.  For the types of clients Chain has, this is a non-issue and is necessary to ensure privacy and confidentiality of transactions at scale.  Having one generator effectively act as a master replicator for the private network of known signers and participants also allows transactions to scale into the tens of thousands per second.  That being the case, synchronization becomes a waste of effort and hurts scalability, so it has been discarded as well.  If limitless scalability is a design principle, network stability (consistency of the data) and speed cannot be sacrificed; transparency and decentralization can.

Scale is also achieved in the Chain Open Standard through sharding and replication of the storage layer.  Sharding allows large databases to be partitioned into smaller ones, which makes them faster and much easier to manage (Ethereum aspires to this as well).  The one thing in the near term that may hurt this enormous scalability is the use of zero-knowledge proofs, which are not known to scale at this point in time.

Networks can be fitted and repurposed for any size of market.  However, use cases centered on decentralization (privacy), scalability (speed of transactions), and consistency (stability of the network) will dictate what consensus model gets used.  Illiquid markets will not need the same type of solution as highly liquid ones, and the same can be said of use cases where absolute privacy is necessary.  Within each network, the different levels of participation by different institutions are also important for deciding what type of blockchain you will build.

Sources:

http://chain.com/core/

https://chain.com/os/

 

 

 

 

A New Approach to Consensus: Swirlds HashGraph

(Special thanks to Leemon Baird, creator of the Swirlds Hashgraph Consensus Algorithm)

As many people here know, my interest in consensus mechanisms runs far and wide.  In the KPMG research report I co-authored, "Consensus: Immutable Agreement for the Internet of Value", many consensus mechanisms were discussed, and in Appendix 3 of the paper many of the major players in the space described their consensus methodologies.  One consensus mechanism which wasn't in the paper was the Swirlds Hashgraph Consensus Algorithm.  That whitepaper is a great read, and this consensus mechanism holds quite a lot of promise.  I have had many discussions with its creator, Leemon Baird, and this blog post comes from those conversations, questions and emails.  At the end of the blog I also asked Leemon to fill out the consensus questionnaire from the KPMG report, which he graciously did; his answers appear at the end of this post.

What exactly is a hashgraph? 

A "hashgraph" is a data structure, storing a certain type of information, and updated according to a certain algorithm.   The data structure is a directed acyclic graph, where each vertex contains the hash of its two parent vertices. This could be called a Merkle DAG, and is used in git, and IPFS, and in other software.

The stored information is a history of how everyone has gossiped.  When Alice tells Bob everything she knows during a gossip sync, Bob commemorates that occurrence by creating a new "event", which is a vertex in the graph containing the hash of his most recent event and the hash of Alice's most recent event.  It also contains a timestamp and any new transactions that Bob wants to create at that moment.  Bob digitally signs this event.  The "hashgraph" is simply the set of all known events.

The hashgraph is updated by gossip: each member repeatedly chooses another member at random, and gives them all the events that they don't yet know.  As the local copy of the hashgraph grows, the member runs the algorithm in the paper to determine the consensus order for the events (and the consensus timestamps).  That determines the order of the transactions, so they can be applied to the state, as specified by the app.
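A minimal sketch of that data structure and update rule in Python (the helper names are my own and the signature is a placeholder): each event records the hash of its creator's previous event, the hash of the gossip partner's latest event, a timestamp and any new transactions, and a gossip sync simply hands over the events the other member doesn't yet have.

```python
# Illustrative only: hashgraph events and a gossip sync.
import hashlib, json, time

def event_hash(event: dict) -> str:
    return hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()

def new_event(creator, self_parent_hash, other_parent_hash, transactions):
    event = {
        "creator": creator,
        "self_parent": self_parent_hash,    # hash of the creator's most recent event
        "other_parent": other_parent_hash,  # hash of the gossip partner's most recent event
        "timestamp": time.time(),
        "transactions": transactions,
    }
    event["signature"] = f"sig({creator})"  # placeholder for a real digital signature
    return event

def gossip_sync(sender_events: dict, receiver_events: dict) -> None:
    # The sender shares every event the receiver does not already know about.
    for h, event in sender_events.items():
        receiver_events.setdefault(h, event)

# Alice tells Bob everything she knows; Bob commemorates the sync with a new event.
alice, bob = {}, {}
e = new_event("alice", None, None, ["tx1"])
alice[event_hash(e)] = e
gossip_sync(alice, bob)
b = new_event("bob", None, event_hash(e), ["tx2"])
bob[event_hash(b)] = b
```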

 

What are gossip protocols?

A "gossip protocol" means that information is spread by each computer calling up another computer at random, and sharing everything it knows that the other one doesn't.  It's been used for all sorts of things through the decades. I think the first use of the term "gossip protocol" was for sharing identity information, though the idea probably predates the term. There's a Wikipedia article with more of the history. In Bitcoin, the transactions are gossiped, and the mined blocks are gossiped.  

It's widely used because it's so fast (information spreads exponentially fast) and reliable (a single computer going down can't stop the gossip).

The "gossip about gossip" idea is new with hashgraph, as far as I know.  There are many types of information that can be spread by gossip.  But having the information to gossip, be the history of the gossip itself is a novel idea.  

In hashgraph, it's called "gossip about gossip" rather than "gossip of gossip".  Similar to how your friends might "gossip about what Bob did" rather than "gossip of what Bob did".

Key Characteristics of Swirlds Hashgraph Consensus

  1.  Ordering and fairness of transactions are the centerpiece of Swirlds. Simply put, Swirlds seeks to fix the ordering problem found in the blockchain world today (due to different consensus methodologies that have trouble addressing this problem) by using Hashgraph Consensus and "gossip about gossip".
  2. Hashgraph can achieve consensus with no Proof of Work. So it can be used as an open system (non-permissioned) using Proof of Stake, or it can be used as a permissioned system without POW or POS. 
  3.  There's no mining. Any member can create a block (called an "event") at any time.
  4. It supports smart contract creation.
  5. Blocksize can be whatever size you want. When you create a block ("event"), you put in it any new transactions you want to create at that time, plus a few bytes of overhead. So the block ranges from a few bytes (for no transactions), to as big as you want it (for many transactions).  But since you're creating many blocks per second, there's no reason to make any particular block terribly big.
  6. The core hashgraph system is for distributed consensus of a set of transactions. So all nodes receive all data.  One can build a sharded, hierarchical system on top of that. But the core system is a replicated state machine. Data is  stored on each machine. But for the core system, the data is replicated.

Other Questions I asked Leemon Baird about the Whitepaper

Below are some questions I asked Leemon after reading the whitepaper. His answers are elaborate and very useful for those seeking to not only understand Hashgraph Consensus but also the inner workings of blockchains and the consensus algorithms that power them. 

1)  Why is fairness important?

Fairness allows new kinds of applications that weren't possible before.  This creates the fourth generation of distributed trust.

For some applications, fairness doesn't matter. If two coins are spent at about the same time, we don't care which one counts as "first", as long as we all agree.  If two people record their driver's license in the ledger at about the same time, we don't care which counts as being recorded "first". 

On the other hand, there are applications where fairness is of critical importance.  If you and I both bid on a stock on the New York Stock Exchange at the same time, we absolutely care which bid counts as being first!  The same is true if we both try to patent the same thing at the same time. Or if we both try to buy the same domain name at the same time. Or if we are involved in an auction. Or if we are playing an online game: if you shoot me and I dodge, it matters whether I dodged BEFORE you shot, or AFTER you shot.

So hashgraph can do all the things block chain does (with better speed, cost, proofs, etc).  But hashgraph can also do entirely new kinds of things that you wouldn't even consider doing with a block chain.

It's useful to think about the history of distributed trust as being in 4 generations:

1. Cryptocurrency

2. Ledgers

3. Smart Contracts

4. Markets

I think it's inevitable. Once you have a cryptocurrency, people will start thinking about storing other information in it, which turns it into a public ledger with distributed trust. 

Once you have the ledger storing both money and property, people will start thinking about smart contracts to allow you to sell property for money with distributed trust.

Once you have the ability to do smart contracts, people will start thinking about fair markets to match buyers and sellers.  And to do all the other things that fairness allows (like games, auctions, patent offices, etc).

Swirlds is the first system of the fourth generation.  It can do all the things of the first 3 generations (with speed, etc). But it can also do the things of the 4th generation.

 

2) You mention internet speed and how faster bandwidth matters.  So it acts like the current state of electronic trading in the stock market.  Are you not worried about malicious actors with high-speed connections taking over the network?  Kind of like how High Frequency Trading does in the stock market, where low-latency trading mechanisms, co-locality and huge bandwidth are extremely advantageous for "winning", as Michael Lewis talks about in "Flash Boys"?

In hashgraph, a fast connection doesn't allow you to "take over the network". It simply allows you to get your message out to the world faster. If Alice creates a transaction, it will spread through gossip to everyone else exponentially fast, through the gossip protocol. This will take some number of milliseconds, depending on the speed of her Internet connection, and the size of the community. If Bob has a much faster connection, then he might create a transaction a few milliseconds later than her, but get it spread to the community before hers.  However, once her transaction has spread to most people, it is then too late for Bob to count as being earlier than her, even if Bob has infinite bandwidth.

This is analogous to the current stock market, except for one nice feature. If Bob wants an advantage of a few milliseconds, he can't just build a single, fast pipe to the single, central server. He instead needs a fast connection to everyone in the network. And the network might be spread across every continent.  So he'll just need to have a fast connection to the Internet backbone. That's the best he can do, and anyone can do that, so it isn't "unfair". 

In other words, the advantage of a fast connection is smaller than the advantage he could get in the current stock market. And it's fair. If the "server" is the entire community, then it is fair to say that whichever transaction reached the entire community first, will count as being "first". Bob's fast connection benefits him a little, but it also benefits the community by making the entire system work faster, so it's good.

"Flash Boys" was a great book, and I found it inspiring. Our system mitigates the worst parts of the existing system, where people pay to have their computers co-located in the same building as the central server, or pay huge amounts to use a single fast pipe tunneled through mountains. In a hashgraph system, there is no central server, so that kind of unfairness can't happen.

3) You mention in the whitepaper that increasing block size "can make the system of fairness worse". Why is that?

That's true for a POW system like Bitcoin.  If Alice submits a transaction, then miner Bob will want to include it in his block, because he's paid a few cents to do so.  But if Carol wants to get her transaction recorded in history before Alice's, she can bribe Bob to ignore Alice's transaction, and include only Carol's in the block. If Bob succeeds in mining the block, then Alice's transaction is unfairly moved to a later point in history, because she has to wait for the next miner to include her transaction.

If each block contains 1 transaction, then Alice has suffered a 1-slot delay in where her transaction appears in history. If each block contains a million transactions, then Alice has suffered a million-slot delay. In that sense, big blocks are worse than small blocks. Big blocks allow dishonest people to delay your transactions into a later position in the consensus order.

The comment about block size doesn't apply to leader-based systems like Paxos. In them, there isn't really a "block". The unfairness simply comes from the current leader accepting a transaction from Alice, but then delaying a long time before sending it out to be recorded by the community.  The comment also doesn't apply to hashgraph.

4) Can you explain how not remembering old blocks works? And why one just needs to know the most frequent blocks and how this doesn't fly in the face of longest chain rule?  

Hashgraph doesn't have a "longest chain rule".  In blockchain, you absolutely must have a single "chain", so if it ever forks to give you two chains, the community must choose to accept one and reject the other. They do so using the longest chain rule. But in hashgraph, forking is fine. Every block is accepted.  The hashgraph is an enormous number of chains, all woven together to form a single graph. We don't care about the "longest chain". We simply accept all blocks.  (In hashgraph, a block is called an "event").

What we have to remember is not the "most frequent block". Instead, we remember the state that results from the consensus ordering of the transactions. Imagine a cryptocurrency, where each transaction is a statement "transfer X coins from wallet Y to wallet Z". At some point, the community will reach a consensus on the exact ordering of the first 100 transactions. At that time, each member of the community can calculate exactly how many coins are in each wallet after processing those 100 transactions (in the consensus order), before processing transaction number 101.  They will therefore agree on the "state", which is the list of amounts of coins in all the non-empty wallets. Each of them digitally signs that state. They gossip their signatures. So then each member will end up having a copy of the state along with the signatures from most of the community.  This combination of the state and list of signatures is something that mathematically proves exactly how much money everyone had after transaction 100.  It proves it in a way that is transferable: a member could show this to a court of law, to prove that Alice had 10 coins after transaction 100 and before transaction 101.

At that point, each member can discard those first 100 transactions, and they can discard all the blocks ("events") that contained those 100 transactions. There's no need to keep the old blocks and transactions, because you still have the state itself, signed by most of the community, proving that there was consensus on it.

Of course, you're also free to keep that old information. Maybe you want to have a record of it, or want to do audits, or whatever. But the point is that there's no harm in throwing it away. 
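To make the signed-state idea concrete, here is a rough sketch (my own illustration, not Swirlds code): apply the first N transactions in their consensus order, hash the resulting balances, and collect signatures on that hash; the state plus a supermajority of signatures is the transferable proof, and the old events can then be discarded.

```python
# Illustrative only: computing and signing the consensus state.
import hashlib, json

def apply_transactions(ordered_txs):
    balances = {}
    for frm, to, amount in ordered_txs:          # (from_wallet, to_wallet, amount)
        if frm is not None:
            balances[frm] = balances.get(frm, 0) - amount
        balances[to] = balances.get(to, 0) + amount
    return balances

def state_hash(balances: dict) -> str:
    return hashlib.sha256(json.dumps(balances, sort_keys=True).encode()).hexdigest()

consensus_order = [(None, "alice", 10), ("alice", "bob", 4)]
state = apply_transactions(consensus_order)
digest = state_hash(state)
signatures = {m: f"sig({m}, {digest[:8]})" for m in ["alice", "bob", "carol"]}
# With signatures from most of the community on this digest, the old events
# that contained these transactions can safely be thrown away.
```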

5) You mention that blockchains don't have a guarantee of Byzantine agreement, because a member never reaches certainty that agreement has been achieved. Can you elaborate on this and explain why hashgraph can achieve this?

Bitcoin doesn't have Byzantine fault tolerance, because of how that's defined.  Hashgraph has it, because of the math proof in the paper.

In computer science, there is a famous problem called "The Byzantine Generals Problem".  Here's a simplified version. You and I are both generals in the Byzantine army. We need to decide whether to attack at dawn. If we both attack or both don't attack, we will be fine. But if only one of us attacks alone, he will be defeated, because he doesn't have enough forces to win by himself.

So, how can we coordinate? This is in an age before radio, so you can send me a messenger telling me to attack. But what if the messenger is captured, so I never get the message?  Clearly, I'm going to need to send a reply by messenger to let you know I got the message.  But what if the reply is lost?  Clearly, you need to send a reply to my reply to let me know it got through. But what if that is lost?  We could spend eternity replying to each other, and never really know for sure we are in agreement. There was actually a theater play that dramatized this problem.

The full problem is more complicated, with more generals, and with two types of generals. But that's the core of the problem.  The standard definition is that a computer system is "Byzantine fault tolerant", if it solves the problem in the following sense:

- assume there are N computers, communicating over the Internet

- each computer starts with a vote of YES or NO

- all computers need to eventually reach consensus, where we all agree on YES, or all agree on NO

- all computers need to know when the consensus has been reached

- more than 2/3 of the computers are "honest", which means they follow the algorithm correctly, and although an honest computer may go down for a while (and stop communicating), it will eventually come back up and start communicating again

- the internet is controlled by an attacker, who can delay and delete messages at will (except, if Alice keeps sending messages to Bob, the attacker eventually must allow one to get through; then if she keeps sending, he must eventually allow another one to get through, and so on)

- each computer starts with a vote (YES or NO), and can change that vote many times, but eventually a time must come when the computer "decides" YES or NO.  After that point, it must never again change its mind.

- all honest computers must eventually decide (with probability one), and all must decide the same way, and it must match the initial vote of at least one honest member.

That's just for a single YES/NO question.  But Byzantine fault tolerance can also be applied to more general problems.  For example, the problem of deciding the exact ordering of the first 100 transactions in history.

So if a system is Byzantine fault tolerant, that means eventually all the honest members will eventually know the exact ordering of the first 100 transactions. And, furthermore, each member will reach a point in time where they know that they know it. In other words, their opinion doesn't just stop changing. They actually know a time when it is guaranteed that consensus has been achieved. 

Bitcoin doesn't do that. Your probability of reaching consensus grows after each confirmation. You might decide that after 6 confirmations, you're "sure enough".  But you're never mathematically certain. So Bitcoin doesn't have Byzantine fault tolerance. 

There are a number of discussions online about whether this matters. But, at least for some people, this is important.  

If you're interested in more details on Bitcoin's lack of Byzantine fault tolerance, we can talk about what happens if the internet is partitioned for some period of time. When you start thinking about the details, you actually start to see why Byzantine fault tolerance matters.

6) You mention in the whitepaper, "In hashgraph, every container is used, and none are discarded"? Why is this important and why is this not a waste?

In Bitcoin, you may spend lots of time and electricity mining a block, only to discover later that someone else mined a block at almost the same time, and the community ends up extending their chain instead of yours. So your block is discarded. You don't get paid. That's a waste. Furthermore, Alice may have given you a transaction that ended up in your block but not in that other one. So she thought her transaction had become part of the blockchain, and then later learned that it hadn't.  That's unfortunate.

In hashgraph, the "block" (event) definitely becomes part of the permanent record as soon as you gossip it. Every transaction in it definitely becomes part of the permanent record.  It may take some number of seconds before you know exactly what position it will have in history. But you **immediately** know that it will be part of history. Guaranteed.

In the terminology of Bitcoin, the "efficiency" of hashgraph is 100%.  Because no block is wasted.

Of course, after the transactions have become part of the consensus order and the consensus state is signed, then you're free to throw away the old blocks.  But that isn't because they failed to be used.  That's because they **were** used, and can now be safely discarded, having served their purpose.  That's different from the discarded blocks in Bitcoin, which are not used, and whose transactions aren't guaranteed to ever become part of the history / ledger.

7) On page 8 of the whitepaper you wrote "Suppose Alice has hashgraph A and Bob has hashgraph B. These hashgraphs may be slightly different at any given moment, but they will always be consistent. Consistent means that if A and B both contain event X, then they will both contain exactly the same set of ancestors for X, and will both contain exactly the same set of edges between those ancestors. If Alice knows of X and Bob does not, and both of them are honest and actively participating, then we would expect Bob to learn of X fairly quickly, through the gossip protocol. But the consensus algorithm does not make any assumptions about how fast that will happen. The protocol is completely asynchronous, and does not make assumptions about timeout periods, or the speed of gossip, or the rate at which progress is made."   What if they are not honest?

If Alice is honest, then she will learn what the group's consensus is.

If Bob is NOT honest, then he might fool himself into thinking the consensus was something other than what it was. That only hurts himself.

If more than 2/3 of the members are honest, then they are guaranteed to achieve consensus, and each of them will end up with a signed state that they can use to prove to outsiders what the consensus was.  

In that case, the dishonest members can't stop the consensus from happening.  The dishonest members can't get enough signatures to forge a bad "signed state".  The dishonest members can't stop the consensus from being fair.

By the way, that "2/3" number up above is optimal.  There is a theorem that says no algorithm can achieve Byzantine fault tolerance with a number better than 2/3. So that number is as good as it can be.

8) Are the elections mentioned in the whitepaper  to decide the order of transactions or information?

Yes.  Specifically, the elections decide which witness events are famous witnesses.  Then those famous witness events determine the order of events. Which determines the order of transactions (and consensus timestamps).

9) What makes yellow "strongly see" from the chart on page 8 of the whitepaper?

If Y is an ancestor of X, then X can "see" Y, because there is a path from X to Y that goes purely downward in the diagram.  If there are **many** such paths from X to Y, which pass through more than 2/3 of the members, then X can "strongly see" Y.  That turns out to be the foundation of the entire math proof.

(To be complete: for X to see Y, it must also be the case that no forks by the creator of Y are ancestors of X. But normally, that doesn't happen.)
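Here is a deliberately unoptimized sketch of those two notions (my own illustration, not the whitepaper's algorithm), assuming each event is represented as a dict with a creator and a list of parent events; fork detection is omitted, as in the simplification above.

```python
# Illustrative only: "see" and "strongly see" over a toy hashgraph.
def ancestors(event):
    # Walk downward through parent links, yielding every ancestor event once.
    seen, stack = set(), list(event["parents"])
    while stack:
        e = stack.pop()
        if id(e) in seen:
            continue
        seen.add(id(e))
        yield e
        stack.extend(e["parents"])

def sees(x, y) -> bool:
    # X sees Y if Y is X itself or an ancestor of X (forks ignored here).
    return x is y or any(a is y for a in ancestors(x))

def strongly_sees(x, y, members) -> bool:
    # X strongly sees Y if X sees events, created by more than 2/3 of the
    # members, each of which itself sees Y.
    if not sees(x, y):
        return False
    creators = {e["creator"] for e in ancestors(x) if sees(e, y)}
    creators.add(x["creator"])    # X itself sees Y
    return len(creators) > 2 * len(members) / 3
```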

10) What's the difference between weak BFT (Byzantine Fault Tolerance) and strong BFT? Which are you using?

Hashgraph is BFT.  It is strong BFT.

"Weak BFT" means "not really BFT, but we want to use the term anyway".  

Those aren't really technical terms.  A google search for "weak byzantine fault tolerance" (in quotes) says that phrase doesn't  occur even once on the entire web.  And "weak BFT" (in quotes) occurs 6 times, none of which refer to Byzantine stuff.

People like to use terms like "Byzantine" in a weaker sense than their technical definition.  The famous paper "Practical Byzantine Fault Tolerance" describes a system that, technically, isn't Byzantine Fault Tolerant at all.  My paper references two other papers that talk about that fact.  So speaking theoretically, those systems aren't actually BFT.  Hashgraph truly is BFT.

We can also talk about it practically, rather than theoretically.  The paper I referenced in my tech report talks about how simple attacks on the network can almost completely paralyze leader-based systems like PBFT or Paxos.  That's not too surprising. If everything is coordinated by a leader, then you can just flood that leader's single computer with packets, and shut down the entire network.  If there is a mechanism for them choosing a new leader (as Paxos has), you can switch to attacking the new leader.  

Systems without leaders, like Bitcoin and hashgraph, don't have that problem.

Some people have also used "Byzantine" in a weaker sense that is called being "synchronous".  This means that you assume an honest computer will **always** respond to messages within X seconds, for some fixed constant X.  Of course, that's not a realistic assumption if we are worried about attacks like I just described.  That's why it's important that systems like both Bitcoin and hashgraph are "asynchronous".  Some people even like to abuse that term by saying a system is "partially asynchronous". So to be clear, I would say that hashgraph is "fully asynchronous" or "completely asynchronous".  That just means we don't have to make any assumptions about how fast a computer might respond.  Computers can go down for arbitrarily-long periods of time. And when they come back up, progress continues where it left off, without missing a beat.

11) Do "Famous witnesses" decide which transactions come first?

Yes. They decide the consensus order of all the events. And they decide the consensus time stamp for all the events.  And that, in turn, determines the order and timestamp for the transactions contained within the events.

It's worth pointing out that a "witness" or a "famous witness" is an event, not a computer. There isn't a computer acting as a leader to make these decisions.  These "decisions" are virtually being made by the events in the hashgraph. Every computer looks at the hashgraph and calculates what the famous witness is saying. So they all get the same answer. There's no way to cheat.

12) On page 8 of the whitepaper you write, "This virtual voting has several benefits. In addition to saving bandwidth, it ensures that members always calculate their votes according to the rules." Who makes the rules?

The "rules" are simply the consensus algorithm given in the paper.  Historically, Byzantine systems that aren't leader based have been based on rounds of voting.  In those votes, the "rules" are, for example, that Alice must vote in round 10 in accordance with the majority of the votes she received from other people in round 9.  But since Alice is a person (or a computer), she might cheat, and vote differently. She might cheat by voting NO in round 10, even though she received mostly YES votes from others in round 9. 

But in the hashgraph, every member looks at the hashgraph and decides how Alice is supposed to vote in round 10, given the virtual votes she is supposed to have received in round 9.  Therefore, the real Alice can't cheat. Because the "voting" is done by the "virtual Alice" that lives on everyone else's computers.

There are also higher-level rules that are enforced by the particular app built on top of the Swirlds platform. For example, the rule that you can't spend the same coin twice.  But that's not what that sentence was talking about.

13) How are transactions validated and who validates them?

The Swirlds platform runs a given app on the computers of every member who is part of that shared world (a "swirld").  In Bitcoin terminology, the community of members is a "network" of "full nodes" (or of "miners"). The hashgraph consensus algorithm ensures that every app sees the same transactions in the same order. The app is then responsible for updating the state according to the rules of the application.  For example, in a cryptocurrency app, a "transaction" is a statement that X coins should be transferred from wallet Y to wallet Z. The app checks whether wallet Y has that many coins. If it does, the app performs the transfer, by updating its local record of how much is in Y and how much is in Z.  If Y doesn't have that many coins, then the app does nothing, because it knew the transaction was invalid.

Since everyone is running the same app (which is Java code, running in a sandbox), and since everyone ends up with the same transactions in the same order, then everyone will end up with the same state.  They will all agree exactly how many coins are in Y after the first 100 transactions. They will all agree on which transfers were valid and which were invalid.  And so, they will all sign that state. And that signed state is the replicated, immutable ledger.
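A small sketch of that app-level rule (my own illustration): every member applies the same transactions in the same consensus order, invalid transfers are simply ignored, and so every honest member computes the identical state.

```python
# Illustrative only: a replicated state machine applying transfers in consensus order.
def apply_in_consensus_order(state: dict, ordered_transactions) -> dict:
    for tx in ordered_transactions:        # tx = {"from": Y, "to": Z, "coins": X}
        if state.get(tx["from"], 0) >= tx["coins"]:
            # Valid transfer: wallet Y has enough coins, so move them to Z.
            state[tx["from"]] -= tx["coins"]
            state[tx["to"]] = state.get(tx["to"], 0) + tx["coins"]
        # Otherwise the transaction is invalid and the app simply does nothing.
    return state

state = apply_in_consensus_order({"Y": 100}, [{"from": "Y", "to": "Z", "coins": 30},
                                              {"from": "Y", "to": "Z", "coins": 500}])
print(state)   # {'Y': 70, 'Z': 30} on every honest node
```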

14) What was the original motivation for creating Swirlds?

We can use the cloud to collaborate on a business document, or play a game, or run an auction. But it bothered me that "cloud" meant a central server, with all the costs and security issues that implies.  It bothered me a lot. 

It should be possible for anyone to create a shared world on the internet, and invite as many participants as they want, to collaborate, or buy and sell, or play, or create, or whatever.  There shouldn't be any expensive server. It should be fast and fair and Byzantine.  And the rules of the community should be enforced, even if no single individual is trusted by everyone. This should be what the internet looks like.  This is my vision for how cyberspace should run.  This is what we need.

But no such system existed.  Whenever I tried to design such a system, I kept running into roadblocks. It clearly needed to be built on a consensus system that didn't use much computation, didn't use much bandwidth, and didn't use much storage, yet would be completely fair, fast, and cheap.

I would work hard on it for days until I finally convinced myself it was impossible. Then, a few weeks later, it would start nagging at me again, and I'd have to go back to working intensely on it, until I was again convinced it was impossible.

This went on for a long time, until I finally found the answer. If there's a hashgraph, with gossip about gossip, and virtual voting, then you get fairness and speed and a math proof of Byzantine fault tolerance. When I finally had the complete algorithm and math proof, I then built the software and a company. The entire process was a pretty intense 3 years.  But in the end, it turned out to be a system that is very simple.  And which seems obvious in retrospect.

 SUMMARY:

The DAG with hashes is not new, and has been widely used. Using it to store the history of gossip ("gossip about gossip") is new.  

The consensus algorithm looks similar to voting-based Byzantine algorithms that have been around for decades. But the idea of using "virtual voting" (where no votes ever have to cross the internet) is new. 

A distributed database with consensus (a "replicated state machine") is not new. But a platform for apps that can respond to both the non-consensus and consensus order is new.

It appears that hashgraph and the Swirlds platform can do all the things that are currently being done with blockchain, and that hashgraph has greater efficiency. But hashgraph also offers new kinds of properties, which will allow new kinds of applications to be built.

Overall Consensus Methodology

What is the underlying methodology of the used consensus?

The Swirlds hashgraph consensus system is used to achieve consensus on the fair order of transactions. It also gives the consensus timestamps on when each transaction was received by the community. It also gives consensus on enforcement of rules, such as in smart contracts.

How many nodes are needed to validate a transaction (% vs number)?  How would this impact a limited-participation network?

Consensus is achieved when more than 2/3 of the community is online and participating. Almost a third of the community could be attackers, and they would be unable to stop consensus, or to unfairly bias what order becomes the consensus for the transactions.

Do all nodes need to be online for system to function?   Number of current nodes?

Over 2/3 of the nodes need to be online for consensus. If fewer are online, the transactions are still communicated to everyone online very quickly, and everyone will immediately know for certain that those transactions are guaranteed to be part of the immutable ledger. They just won't know the consensus order until more than 2/3 come online.

Does the algorithm have the underlying assumption that the participants in the network are known ahead of time? 

No, that's not necessary.  Though it can be run that way, if desired.

Ownership of nodes - Consensus Provider or Participants of Network?

The platform can be used to create a network that is permissioned or not.

What are current stages of mechanism?

Transactions are put into "events", which are like blocks, where each miner can mine many blocks per second. There is never a need to slow down mining to avoid forking the chain. The events are spread by a gossip protocol. When Alice gossips with Bob, she tells Bob all of the events that she knows that he doesn't, and vice versa. After Bob receives those, he creates a new event commemorating that gossip sync, which contains the hash of the last event he created and the hash of the last event Alice created before syncing with him. He can also include in the event any new transactions he wants to create at that moment. And he signs the event. That's it. There is no need for any other communication, such as voting. There is no need for proof of work to slow down mining, because anyone can create events at any time. 

When is a transaction considered "safe" or "live"?

As soon as Alice hears of a transaction, she immediately verifies it and knows for certain that it will be part of the official history. And so does anyone she gossips with after that. After a short delay (seconds to a minute or two), she will know its EXACT location in history, and have a mathematical guarantee that this is the consensus order. That knowledge is not probabilistic (as in, after 6 confirmations, you're pretty sure). It's a mathematical guarantee.

What is the Fault Tolerance?  (How many nodes need to be compromised before everything is shut down?)

This is Byzantine fault tolerant as long as less than 1/3 of the nodes are faulty / compromised / attacking.  The math proof assumes the standard assumptions: attacking nodes can collude, and are allowed to mostly control the internet. Their only limit on control of the internet is that if Alice repeatedly sends Bob messages, they must eventually allow Bob to receive one.

Is there a forking vulnerability?

The consensus can't fork as long as less than 1/3 are faulty / attacking.

How are the incentives defined within a permissioned system for the participating nodes?

Different incentive schemes can be built on top of this platform.

How does a party take ownership of an asset?

This is a system for allowing nodes to create transactions, and the community to reach consensus on what transactions occurred, and in what order. Concepts like "assets" can be built on top of this platform, as defined by an app written on it.

Cryptography/Strength of Algorithm:

How are the keys generated?

Each member (node) generates its own public-private key pair when it joins.

Does the algorithm have a leader or no?

No leader.

How is a node behavior currently measured for errors?

If a node creates an invalid event (bad hashes or bad signature) then that invalid event is ignored by honest nodes during syncs. Errors in a node can't hurt the system as long as less than 1/3 of the nodes have errors.

Governance:

How are controls/governance enforced?

If an organization uses the platform to build a network, then that organization can structure governance in the way they desire.

Tokenization (if used):

Are there any transaction signing mechanisms?

Every event is signed, which acts as a signature on the transactions within it. An app can be built on top of this platform that would define tokens or cryptocurrencies.

Performance:

What is the current time measurement?  For a transaction to be validated? For consensus to be achieved?

The software is in an early alpha stage. The answers to this questionnaire refer to what the platform software will have when it is complete. For a replicated database (every node gets every transaction), it should be able to run at the bandwidth limit, where it handles as many transactions per second as the bandwidth of each node allows, where each node receives and sends each transaction once (on average) plus a small amount of overhead bytes (a few percent size increase). For a hierarchical, sharded system (where a transaction is only seen by a subset of the nodes, and most nodes never see it), it should be possible to scale beyond that limit. But for now, the platform is assuming a replicated system where every node receives every transaction.

Security:

Does your mechanism have digital signatures?

Yes, it uses standards for signatures, hashes, and encryption (ECDSA, SHA-256, AES, SSL/TLS)

How does the system ensure the synchrony of the network (what time is needed for the nodes to sync up with the network)?

No synchrony is assumed. There is no assumption that an honest node will always respond within a certain number of seconds. The Byzantine fault tolerance proofs are for a fully asynchronous system. The community simply makes progress on consensus whenever the communication happens. If every computer goes to sleep, then progress continues as soon as they wake up.  It should even work well over sneaker-net, where devices only sync when they are in physical proximity, and it might take days or months for gossip to reach everyone. Even in that situation, the consensus mechanism should be fine, working slowly as the communication slowly happens. In normal internet connections with a small group, consensus can happen in less than a second.

Do the nodes have access to an internal clock/time mechanism to stay sufficiently accurate?

There is a consensus timestamp on an event, which is the median of the clocks of those nodes that received it. This median will be as accurate as the typical honest computer's clock. This consensus timestamp does NOT need to be accurate for reaching consensus on the ordering of the events, or for anything important in the algorithm. But it can be useful to the applications built on top of this platform.
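A tiny sketch of how such a median timestamp behaves (illustrative only): a few wildly wrong or dishonest clocks barely move the median away from the typical honest clock.

```python
# Illustrative only: the consensus timestamp as a median of receive times.
from statistics import median

received_at = {          # node -> local clock reading when it first received the event
    "alice": 1_700_000_010.2,
    "bob":   1_700_000_010.4,
    "carol": 1_700_000_010.3,
    "dave":  1_700_000_500.0,   # an outlier clock barely moves the median
}

consensus_timestamp = median(received_at.values())
print(consensus_timestamp)   # 1700000010.35
```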

Privacy:

How does system ensure privacy?

The platform allows each member to define their own key pair, and use that as their identity. If an app is built on top of this platform to establish a network, the app designer can decide how members will be allowed to join, such as by setting up a CA for their keys, or by having votes for each member, or by using proof-of-stake based on a cryptocurrency, etc.  The app can also create privacy, such as by allowing multiple wallets for one user. But the platform simply manages consensus based on a key pair per node.

Does the system require verifiable authenticity of the messages delivered between the nodes (Is signature verification in place?)

Yes, everything is signed, and all comm channels are SSL encrypted. 

How does data encryption work?

All comm during a gossip sync is SSL/TLS encrypted, using a session key negotiated using the keys of the two participants.  If an app wants further encryption, such as encrypting data inside a transaction so that only a subset of the members can read it, then the app is free to do so, and some of the API functions in the platform help to make such an app easier to write.

Implementation Approach

What are the current use cases for the consensus mechanism?

In addition to traditional use cases (cryptocurrency, public ledger, smart contracts), the consensus mechanism also gives fairness in the transaction ordering.  This can enable use cases where the order must be fair, such as a stock market, or an auction, or a contest, or a patent office, or a massively multiplayer online (MMO) game.

Who is currently working with (Venture Capitalist,  Banks, Credit Card companies, etc.) 

Ping Identity has announced a proof of concept product for Distributed Session Management built on the Swirlds platform. Swirlds, Inc. is currently funded by a mixture of investors including venture capital, strategic partner, and angel funding.

Consensus: A Deeper Dive into the State of Blockchain

Q&A with George Samman, co-author of KPMG’s report: “Consensus: Immutable agreement for the Internet of value”

This interview is posted on both www.sammantics.com and www.bitsonblocks.net

Interviewer is Antony Lewis (AL) and interviewee is George Samman (GS).

AL

George, it’s a pleasure to chat with you.  The KPMG report “Consensus: Immutable agreement for the Internet of value” you co-authored was an interesting read and shone a light on some of the challenges facing private blockchains and distributed ledgers.  How would you summarise the findings?


CONSENSUS

GS

One of the key findings is that getting consensus right is really hard and some of the brightest minds in the space are coming to terms with this and re-examining a lot of the work they have done or researched.  Trying to re-model existing blockchain technology turns out not to be the answer.   

AL

When you say “getting consensus right”, what do you mean?  Do you mean multiple databases all reaching the same state quickly, or do you mean something else?

GS

Consensus has been around for as long as human beings have formed societies and needed to come to a state of agreement without necessarily trusting each other. For the purposes of this interview, we can refer to consensus computing for distributed systems. In this context, it's a way for nodes to agree on the validity of a transaction and to update the ledger with a coherent set of confirmed facts.

AL

How would you describe the problems around achieving consensus?

GS

Keeping data in sync and ordering transactions are what consensus mechanisms are supposed to do.  The main problem that is being researched is around network stability and latency.

AL

Why is getting consensus right hard?  Why is the consensus methodology important?

GS

Most of the material on the subject of consensus comes from academic papers and applications in other industries like air traffic control or stabilizing airplanes.  The challenges are very different to the consensus challenges in capital markets - this hasn’t been done before and the issues are different.

For example, ordering becomes really important when you are dealing with stock market order books. If multiple people are bidding for a stock at the same price, who is the first one to get that price? An issue of fairness also comes into play, which some blockchain systems suffer from because of how they attempt to achieve consensus. Leader-based consensus systems have this problem because the leader selects the ordering of data, so you end up with a centralisation of control, which is what we are trying to avoid. So depending on the use case, the consensus mechanisms themselves become extremely important.

Further, with certain consensus systems, it turns out that there are a maximum number of nodes you can have before the system breaks. This is certainly an additional complexity if you need a lot of nodes in a network where parties do not trust each other but want to have equivalent write-access to the same ledger.

Getting consensus right is critical particularly when nodes can be located all over the world, and network latency adds another layer of complexity to system stabilization.

AL

Point taken on pre-trade orderbooks - I suspect that’s why this isn’t an area of focus any more for private blockchain vendors to financial service companies.

In terms of node distribution or decentralisation, I don’t see any reason why nodes in a high throughput distributed ledger will end up being scattered across the world.  Although with Bitcoin, we currently see geographical distribution for nodes, I think that any successful distributed ledger for the traditional financial industry will have nodes clustered in the same data centres, in the same building, where a number of banks rent hardware sitting physically next to each other, connected with short cables.  This should help to reduce some of the latency issues.  Of course this will be replicated to other data centres as redundant backups in case the main one fails.

To summarise, the ‘distributed’ in ‘distributed ledger technology’  will be ownership distribution rather than geographic distribution.

GS

That makes sense. Although, if you want true distribution of the information, geographically distributing the nodes and using different cloud providers for them adds an extra layer of distribution and security.


SCALABILITY

AL

Moving on from consensus, to the concept of scalability and transaction throughput.  In financial markets a lot of tickets are printed, from the start of the process with orders being submitted and cancelled within milliseconds, through to matched trades and eventually settlement.  Clearly you need throughput.

GS

The problem of consensus becomes harder by orders of magnitude when dealing with the volume of transactions financial institutions make. It's essential for the network to be up and running all the time. Scaling to tens of thousands of transactions per second and beyond while keeping the network up and running is extremely difficult. This is why there aren't many projects in production that are able to do this as of today. It's a big challenge. One general principle worth considering is to run two consensus mechanisms, one locally and one globally, and make them intersect; this could be done at intervals of time in a baby-step giant-step (BSGS) manner.

 

Regarding scalability, the notion that you can start a blockchain with an endless lifetime is still preliminary. The reasoning for that is threefold:

  1. Public blockchains are supposed to be everlastingly immutable, but are immature and have not yet figured out how to deal with unknown issues, as we have seen with the recent issues with The DAO.

  2. Technological innovation has yet to come up with a suitable solution for the transaction volume common in the financial sector, and this also then becomes a consensus problem as well.

  3. Configurations - you can’t deploy a single service solution until you’ve tested and retested the correct configurations for such a vital network railroad.

 

AL

I have seen internal “Proof of Concepts” where a single or double node blockchain is spun up, with a user-friendly front end.  They seem to work, at a rudimentary level.  Surely it’s now a case of institutionalising the technology?

GS

Yes, you are right: the Proof of Concepts are validating that the technology "might be able to live up to its promise." They also have great marketing value. However, this is a long way off from institutionalised technology and the inherent stability necessary for it. Institutional technology needs to be battle-tested and hardened, and has to be as antifragile as possible. Hence, I believe the cycle to get the technology up to an acceptable level to justify a switchover will be longer than people think. There can be no room for mistakes even if there are inherent benefits in the technology.

AL

Ok, aside from consensus and scalability, what are the other challenges facing the private “DLT” space?

GS

I think one of the challenges continues to be a lack of industry standards. The longer that common standards and protocols aren’t agreed and written up by industry participants, the more harmful it can be when trying to integrate different solutions, and the further away from interoperability we become.  Is distributed ledger technology creating the next legacy system problem?

Another problem is a technical problem around interoperability with existing systems and potentially between different blockchain networks.  I think this directly correlates to the above point about standards and protocols.  How will these ledgers interact if they are built separately for different types of assets and then work with existing market infrastructure technology?

What we are seeing is sort of the exact opposite of a common framework being adopted where people are trying all sorts of different things.

AL

Sure, but that's what you would expect with a new set of technologies - and then some sort of Darwinian selection process will occur, and the best will win. At the heart of it, this seems to be an interoperability and standards discussion. APIs and inter-system communication come to mind here. It seems that a lot of the interoperability issues could be fixed by creating standards for common APIs. You then remove the privacy concerns of shared ledgers. But APIs don't solve for immutability and decentralised control - if that's really what's wanted.


EMERGING SOLUTIONS


AL

An interesting takeaway is that R3 is not building a blockchain.  That’s surprising to some people - one of the world’s most well known “blockchain companies” is not building a blockchain!

 

GS

I think it’s surprising because some people thought that private “distributed ledger technology” would be the panacea to cure us of all the “ills” of public blockchains (total transparency of network, mining and potential centralization, anonymous actors and the slow speed of transaction times) - however we have seen that is not the case.  In my opinion, R3 realized that the financial problems they aim to solve are not blockchain compatible at the present time.

We are seeing the number of nodes in these distributed ledger networks shrink all the way down to two - i.e. back to bilateral communication, with "consensus" being the two nodes agreeing. This is centralisation and the exact opposite of what blockchains try to solve for. A blockchain is supposed to offer immutable consensus. The benefits of transparency, no middleman, p2p transacting without needing trust, and speed are what appealed to me about blockchains to begin with.

This also applies to replication of the data: while this can certainly be permissioned to allow certain nodes to take certain actions, those in the network benefit by knowing that whatever business logic was supposed to happen did happen the way it was supposed to.

Well, in every system and with every tool there are tradeoffs. When you are performing certain capital market operations, and privacy and confidentiality are most important, a distributed ledger may not be your best tool. Particularly when we are still trying to get consensus right for scaling to hundreds of thousands of transactions per second.

Corda solves the consensus and ordering problems by not being a blockchain requiring network consensus: instead, computers communicate and agree bilaterally.  This gets rid of a lot of the complexity involved with the privacy issues of forming a network with your competitors.  This also brings in a great debate about whether or not a blockchain will be the end solution and if that solution will need consensus. Let the debate begin!  In my opinion Corda can be considered more of an oracle that can connect to blockchains if necessary.

AL

What do you mean, an oracle that can connect to blockchains?

GS

What I mean by oracle is a bridge between a blockchain and the outside world, so in the case of Corda it’s a platform that can speak to blockchains but is not a blockchain itself.

AL

On 2 June this year, Morgan Stanley Research published a report stating “For ASX, blockchain deployment seeks to reduce market costs, defend the clearing monopoly and grow new revenue streams”.  It’s amazing that we have moved from blockchains being “disruptive” to blockchains being used to “defend the clearing monopoly” so quickly!  No wonder there is confusion!  I tried to clarify this here: https://bitsonblocks.net/2016/05/09/confused-by-blockchains-revolution-vs-evolution/

GS

You are getting a lot of Orwellian doublethink from the banks. This is the ability to hold two contradictory thoughts in your head and believe that they both are true. In this case: that blockchains will change the world, but we can't use them properly for certain things we need to do, in a way we are comfortable doing them.

There have also been cautious, or even negative sentiments in recent days about the utility of blockchain technology. The CEO of ICE is not convinced about blockchains.

AL

Sure, some will hate, some will love.  What are you getting at?

GS

I would just say be cautious of false narratives and that there is a deep need for understanding what this technology is really good at, and what it might not be good at.

For me, consensus is a feature not a bug.  A blockchain is a transparency machine like nothing that has come before it. Therefore, if you want a blockchain, look for use cases where total transparency is suitable. There are three questions that need to be answered in order to help you guide your decision making:

1) Who are you?
 

2) What do you want to achieve?

3) Who will be the nodes?

If you can answer these questions then you are on your way to figuring out consensus and the types of permissions you want to configure in your blockchain experiment.

Knowing the types of entities that are going to be involved in the transaction, as well as the types of assets being created, is also a big step. Once you have a handle on this, figuring out the consensus layer is much easier and you can start to build applications on top of the stack.

 

AL

What about the Proof of Concepts that we are seeing in the media?

GS

A lot of the use cases that companies are going after right now don't need a blockchain - or at the very least, the end clients, often banks, don't need a blockchain solution for them yet. A lot of the blockchain companies are also Proof of Concepts themselves and have still not been taken "out of the box." This is where separating hype from reality starts. I also think a lot of the use cases people are looking at for a blockchain to solve are things that aren't meant to be solved by a blockchain.

From the company side it is important to define your identity: Are you a blockchain company or a company using a blockchain? There is a big difference. For example, if you are working on solving issues in trade finance and you are using a blockchain as a solution, unless you are designing your own platform from scratch, you are just improving efficiencies using technology, but you are still a trade finance company.


 

INDUSTRY


AL

Clearly we’re at the start of the innovation cycle and the problem is just that the hype and promise has accelerated and deviated from how quickly we can deliver.  This is an unfortunate reality, but sometimes necessary to attract the investment needed to light up a technology.  Can we reach the promised land of $20bn reduced annual cost by using distributed ledgers?

GS

I think eventually we do reach the $20 billion mark, and that's nice, but it's not revolutionary. It's also a drop in the bucket compared to what banks spend today. In order to get there and switch systems, the costs saved will need to outweigh the money invested to do so. That hurdle may be too large to jump. Maybe the way to think about it is: are there other accrued costs which will also be saved, aside from just reducing settlement costs and back-office savings? The answer to this is yes.

While the cost savings are appealing to banks for many reasons talked about, I think the more relevant story will be how can we generate revenue from the apps built on DLT technology.  While ideas are floating around now, the reality will probably look very different from their original conception.

AL

You’re talking about how blockchain / DLT technology providers will monetise?

GS

Yes, the VC funding cycle has become long in the tooth and the IPO market is no longer attractive. Some of the private company tech darlings, including Airbnb, are getting valuation writedowns. The FinTech narrative is starting to question monetisation paths, and where the revenues will come from when VC money dries up.

AL

Scary picture - when and how will VC money dry up?

GS

This can come from rate hikes in the future, or a recession, or some shock to the system. It's hard to predict; however, the funding cycle has become long in the tooth. Global growth has slowed and even Mary Meeker pointed this out in exquisite detail in her latest state of the union.

Particularly in the blockchain space, The DAO should be looked at as the top. It is really madness in a lot of ways, and the sheer amount of money that was raised is astounding. I think we are post peak-hype and reality will start to set in sooner rather than later.

AL

The DAO raised the USD equivalent of $150 million from pseudonymous investors, to fund unknown projects, to be voted on by unknown participants.  That really does seem pretty nuts.  It was also hacked recently - or at least it behaved exactly as it was coded to, to the detriment of many of the investors who had not scrutinised the code.

So the billion dollar question - as celebrated startups move towards becoming technology providers, unable to monetise on a “per ticket” basis, how are the company valuations justified?  Who should we invest in?

GS

Valuations seem to use whatever the last round someone else raised at as a starting point, particularly for the bigger startups raising later-stage rounds.

The financial service companies investing in these large rounds will not be taken for fools.  They understand valuation like no other.  What is interesting is the lack of high profile VC’s investing in these bigger rounds. The funding seems to be coming from the titans of finance and those that are at risk of being “disintermediated” by cryptocurrencies.  It’s a good narrative-control play.

The funding source from finance titans can also come back and bite DLT startups. If they are beholden to only the established incumbents, they might not be able to design the disruptive ecosystems promised by blockchain technology.

I think it’s way too early to predict any clear cut winners. I would be investing in the cloud companies that will be hosting all these companies, their data and their applications, and also the companies that are using blockchain technology properly.  This is not an easy thing to do when people are trying to fit square pegs into round holes. Simplicity always wins.

AL

What’s next for the DLT / Blockchain industry?

GS

Companies need to deliver, and companies need to make money to stay in business. Therefore, if you are under time constraints to make something people want and there are still inherent problems in the technology you want to use, you pivot to making things that can improve existing workflows.

This is what you have called “industry workflow tools” in your blog and although some costs may be saved, this doesn’t transform finance any more than the next efficiency technology.  In fact in many ways it exposes us to the same risks as have been seen in the past because privacy and confidentiality are more important than anything else for banks performing capital market operations.

The problem with this thinking is that it does nothing to benefit the consumer except maybe faster transaction times. The customer experience should be a major focus for banks, as they are already one of the most hated brands among younger consumers.

AL

Perhaps some of the cost savings will be passed to consumers, settlement will speed up, and collateral released so that businesses can make better use of working capital.

GS

We all hope so!

AL

Thanks for your time George!

 

Interviewee: George Samman is a blockchain advisor and consultant to global companies and startups as well as Entrepreneur in Residence at Startupbootcamp for bitcoin and blockchain in New York City. George also writes a blog on blockchain technology and use cases at sammantics.com and he can be found on twitter @sammantic

Interviewer: Antony Lewis is a cryptocurrency and blockchain consultant to financial institutions, policymakers, and professional services firms.  Antony lives in Singapore and writes a blog about bitcoins and blockchains, found at www.bitsonblocks.net

 

‘Immutable Me’ A Discussion Paper Exploring Data Provenance to Enable New Value Chains

The ID2020 Annual Summit will bring together industry leaders, NGOs, governments, emerging technology pioneers and cross-industry experts from both the public and private sector. The aim is that, together, participants will foster a global conversation and build a working coalition to identify and create the enabling conditions for a legal digital identity for all individuals at risk.

In advance of ID2020, all participants were requested to submit a paper on decentralised identity, or on specific problems that could be solved via decentralisation or web-of-trust solutions.

The following paper was authored and submitted by George Samman and Katryna Dow for the Web-of-Trust Workshop following ID2020. This Meeco paper explores the idea of an ‘Immutable Me’ – a step towards individuals having the means to decentralise attributes, claims, verification and provenance of their personal data. George Samman will represent Meeco at ID2020 and the Web-of-Trust Conference.

Problem Statement

With the advent of blockchain, is there an opportunity to add a distributed layer between the data value and the consumer of personal data?

Furthermore, does the verification and provenance of the data enable an attestation, provided by a relying party, to eliminate the need to give up Personally Identified Information (PII) at all?

How can we enable people to access all the data they generate with full control of their master record and permission data on their terms using verified attributes without sacrificing their privacy?

“Up until now the power to capture, analyse and profit from personal data has resided with business, government and social networks. What if you and I had the same power?”
– Meeco Manifesto 2012

According to the QUT and PwC Identity 3.0 white paper:

“Developed economies are moving from an economy of corporations to an economy of people. More than ever, people produce and share value amongst themselves, and create value for corporations through co-creation and by sharing their data. This data remains in the hands of corporations and governments, but people want to regain control. Digital identity 3.0 gives people that control, and much more.”

Identity is moving beyond issued instruments like passports, social security cards and ID cards. It is moving towards contextual identity in which “I can prove who I am” (persona) in the context of what I am doing.

Government issued identity instruments are relatively easy to forge. Every day, stolen identities are exploited through organised crime and on-line hacking activities. Conversely, personal attributes, behaviour, social and reputational data is more difficult to forge, in part because it makes up an immutable timeline of our life.

Increasingly, the sum of the parts of our digital exhaust and social presence creates a strong identity. However, the opportunity for individuals to use this for their own benefit is limited.

Proposal

The movement from User Centric Identity to Self Sovereign Identity is underway and becoming a key trend for the future of how individuals will control their own attributes.

Using blockchain technology to record data provenance, Meeco is working at the forefront of this movement and currently working on a proof of concept that allows individuals to add verification and provenance attestations to their data attributes. This is in addition to the existing permission management of their attributes.

Meeco aims to be blockchain agnostic (since what type of ledger is used will be use case dependent), thus enabling individuals to link provenance to data/attributes to support a range of personas and enable progressive disclosure. This approach also supports the option for individuals to use private, public, permissioned and permissionless chains.

The identity pieces (data and attributes) can be use-case sensitive, thus creating context-based personas by unifying only the relevant attributes across different chains.

Personal control is central to increasing the power individuals hold over the range of attributes and claims that make up their identity. Enabling identity markers to be thin sliced, refined and contextual provides added privacy protection. The combination of attribute, verification and provenance provides the capability for data governance to move from data collection of personally identifiable information (PII), to binary pull requests, i.e. over 18 years (yes/no) versus date of birth.
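
As a hypothetical sketch of such a binary pull request (function names and the attestation format are illustrative only, not Meeco's API), the relying party learns a yes/no answer and can anchor a hash of the attestation, while the date of birth never leaves the individual:

    # Hypothetical sketch: answer "over 18?" without disclosing the date of birth,
    # and produce a simple hash-based attestation that could be anchored on-chain.
    from datetime import date
    import hashlib

    def is_over_18(date_of_birth, on):
        """Yes/no answer derived from, but not revealing, the underlying attribute."""
        age = on.year - date_of_birth.year - (
            (on.month, on.day) < (date_of_birth.month, date_of_birth.day)
        )
        return age >= 18

    def attestation(answer, verifier_id):
        """A toy provenance record: a hash binding the answer to who verified it."""
        return hashlib.sha256(f"{verifier_id}:over18={answer}".encode()).hexdigest()

    dob = date(1990, 5, 17)                      # stays with the individual
    answer = is_over_18(dob, date(2016, 7, 1))   # True
    print(answer, attestation(answer, "verifier-001"))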

This approach provides protection as the individual solely has the power to bring these attributes together with the added value of verification. For the relying party, the option exists to store the provenance rather than the attribute on public and private blockchains and distributed ledgers, thus providing an immutable audit trail for assurance without the compliance risk of collecting, managing and holding the data.

Why add Provenance?

Provenance refers to the tracking of supply chains and provides a record of ownership via an audit trail, which can be ordered (the specific order matters). In the case of attributes and claims, it is important that the data can point back to a reliable record (the golden record) and that this is shown to be immutable.

It's important for purposes of integrity, transparency and the prevention of counterfeiting that this asset and its path be known or provable every step of the way. A supply chain refers to the creation of a network in which an asset is moved, touching different actors, before arriving at a destination. It helps bring time and distance together in a manageable way. The tracking of this asset has real value to the actors involved. This equally applies to identity and all the components that make up that identity.

This approach is a pathway to turning data (single attributes) into networked data flows that compound in value and become assets, similar to financial synthetics such as asset-backed securities (ABS), or to anything else in the world considered an asset, such as Norwegian salmon, diamonds, prescription drugs, Letters of Credit (LOCs) and Bills of Lading (BOLs).

It is important to note that this is not one master identity (record) token that is tokenized and travels through a network, but rather the attributes associated with that identity that are called upon based on context.

In order for data provenance to be effective (from a technology standpoint), it must fulfil certain requirements and characteristics that would upgrade supply chain management and monitoring. According to Sabine Bauer's paper titled "Data Provenance in the Internet of Things," they are:

  • Completeness: every single action which has ever been performed is gathered.
  • Integrity: data has not been manipulated or modified by an adversary.
  • Availability: the possibility to verify the collected information. In this context, availability is comparable to auditing.
  • Confidentiality: access to the information is only reserved for authorized individuals.
  • Efficiency: provenance mechanisms to collect relevant data should have a reasonable expenditure.

In addition, Meeco believes these requirements must be complemented by the further characteristics of:

  • Privacy: the ability to control the access, permission and visibility of personal attributes.
  • Unlinkability: a state where this personal data must be secure and not get into the wrong hands.
  • Transparency: of the chain, and total traceability of the transactions, particularly when it comes to actions and modifications.

A Blockchain fulfils all of these requirements.

How Will Meeco Link Data Provenance To Attributes on a Blockchain?

Blockchain allows for Digital Identity 3.0. Quoting from the PwC paper:

“Digital identity 3.0 is a private and integrated master record that exists independently of any immediate commercial or legal context. It empowers people to create new attributes, share these attributes selectively as they connect with others, and create experiences and value beyond what can be predicted”.

Master Record

For preservation of privacy there must be some way to protect the master record and the range of attributes which can be associated with a master record where explicit permission has not been granted.

 

The primary purpose of a master record is to create value specifically for the individual. However, the existence of this verified master record can in return, if permissioned, create significant value for receiving parties; i.e. organisations, enterprises, institutions and other individuals.

The master record is not intended to be stored or visible on the blockchain. The intention is not to permission the master record, but to reference back to its immutable existence. This way the master record can support infinite links to data attributes of association, without linking all the attributes in one chain. This master record will have an anonymous unique identifier that is only known to the owner via private keys.

Once these attributes are created, they can be put on a supply chain without the need to share the entire identity; only the value that is needed to validate its integrity is shared.

The Value of Provenance For Privacy Preservation

Tracking the origin and movement of data across a supply chain in a verifiable way is a difficult thing to do. In supply chains stretching across time and distance, all of these items could suffer from counterfeiting and theft. Establishing a chain of custody that is authenticated, time-stamped and replicated to all interested parties is paramount to creating a robust solution.

The problem can be addressed using blockchains in the following way:

  1. When the data is created, a corresponding digital token is issued by a trusted entity, which acts to authenticate its point of origin (the attribute).
  2. Then, every time the data changes hands - that is, the attribute associated with that identity (persona) - the digital token is moved in parallel, so that the real-world chain of custody is precisely mirrored by a chain of transactions on the blockchain.
  3. A tokenized attribute is an on-chain representation of an item of value transferred between participants; in this case, proof of the golden record.

This token acts as a virtual 'assertion of identity', which is far harder to steal or forge than a piece of paper. Upon receiving the digital token, the final recipient of the item, whether a bank, distributor, retailer or customer, can verify the chain of custody all the way back to the point of origin.

This digital identity token can also act as a mechanism for collectively recording, notarising and linking any type of data back to the master record. A blockchain can provide a distributed database in which all the records and assertions about an attribute are written and linked, accompanied by a timestamp and a proof of origin that ties back to the golden record token in a verifiable way. An example could be a hash of a record showing that a certain element of my attribute or claim was verified when I engaged in a certain type of action. This distributed database also deters corruption and theft by storing the multiple pieces of our attributes in a highly distributed manner that requires the proper keys to open and put them back together.
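
A minimal, hypothetical sketch of this chain-of-custody idea (field names are illustrative, not Meeco's data model): each transfer of an attribute token is recorded as a hash-linked, time-stamped entry, so the final recipient can verify provenance back to the point of origin:

    # Toy chain of custody: every transfer is hash-linked to the previous entry,
    # so tampering with any record breaks verification of the whole chain.
    import hashlib
    import json
    import time

    def record_transfer(chain, token_id, holder):
        """Append a custody record linked to the previous one by hash."""
        prev_hash = chain[-1]["hash"] if chain else "origin"
        entry = {
            "token_id": token_id,
            "holder": holder,
            "timestamp": time.time(),
            "prev_hash": prev_hash,
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        chain.append(entry)
        return entry

    def verify_chain(chain):
        """Check that every link points at the hash of its predecessor."""
        prev = "origin"
        for entry in chain:
            body = {k: v for k, v in entry.items() if k != "hash"}
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if entry["prev_hash"] != prev or recomputed != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    chain = []
    record_transfer(chain, "attr-over18", "issuing-authority")  # point of origin
    record_transfer(chain, "attr-over18", "individual")         # handed to the data subject
    record_transfer(chain, "attr-over18", "relying-party")      # shared under permission
    print(verify_chain(chain))  # True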

This approach is designed to counter the current problem of how companies collect personally identifying data and then use it to track individuals without explicit or informed consent. The data is used to target individuals with the aim of moulding and influencing behaviour. This current approach to tracing our identity does not allow individuals to control their attributes or the elements that link them; as a result, we don't get to realise the value of, or monetise, the data companies collect on us.

How Will The Data Get Stored?

The Multichain blog eloquently describes how data can be optimally recorded on the blockchain and this approach is informing how Meeco is approaching proof-of-concepts:

“In terms of the actual data stored on the blockchain, there are three popular options:

  1. Unencrypted data. This can be read by every participant in the pa