First, the essential differences between blockchains and distributed databases
1. Consensus algorithms and redundant data storage:
Consensus algorithms and redundant data storage are where the two technologies are most similar. Even so, their technical goals differ fundamentally. A blockchain uses these techniques to build a world that is as decentralized as possible, in which data assets are permanently protected and can be transferred freely. A distributed database uses them to build a system that behaves, as far as possible, like a single logical center, providing high-performance, low-cost, and scalable services.
2. The impossible triangle:
Both face the challenge of an impossible triangle, but the trade-offs they must balance differ in essence: a blockchain must balance security, decentralization, and scalability, whereas a distributed database must balance support for business requirements, engineering complexity, and hardware requirements.
3. Consistency:
Consistency means different things in the two kinds of system: in a blockchain, it refers to the ability of multiple nodes to jointly maintain a single data state; in a distributed database, it refers to the state that multiple replicas present to the outside world.
4. Differences in the security level of consensus algorithms
Blockchain systems must tolerate Byzantine faults, in which nodes may behave arbitrarily or maliciously. The mainstream algorithms are PoW/PoS (probabilistic) and PBFT (deterministic). The consensus result of a probabilistic algorithm such as PoW/PoS is provisional: as time passes and the result is reinforced (for example, as more blocks are built on top of it), the probability that it will be overturned keeps shrinking until it becomes the de facto result. A deterministic algorithm such as PBFT, by contrast, is irreversible once consensus is reached; the consensus is the final result. Byzantine fault-tolerant algorithms of this kind generally perform relatively poorly and can only tolerate fewer than one third of the nodes being faulty. Distributed database systems handle non-Byzantine faults (crashes and similar failures). Mainstream algorithms include Paxos and Raft; these fault-tolerant algorithms tend to perform better and process requests faster, and they keep working as long as fewer than half of the nodes fail.
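The fault-tolerance thresholds above follow from standard quorum arithmetic: a Byzantine protocol such as PBFT needs n ≥ 3f + 1 nodes to tolerate f faulty ones, while a crash-fault protocol such as Paxos or Raft needs n ≥ 2f + 1. The Python sketch below computes both bounds for a given cluster size. The function names are hypothetical, and the Byzantine quorum size is derived from the usual requirement that any two quorums overlap in at least f + 1 nodes, so that they always share at least one honest node.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 (PBFT-style Byzantine fault tolerance)."""
    return (n - 1) // 3


def max_crash_faults(n: int) -> int:
    """Largest f such that n >= 2f + 1 (Paxos/Raft-style crash fault tolerance)."""
    return (n - 1) // 2


def byzantine_quorum(n: int) -> int:
    """Smallest q whose pairwise overlap 2q - n is at least f + 1,
    so any two quorums share at least one honest node."""
    f = max_byzantine_faults(n)
    return (n + f + 2) // 2  # = ceil((n + f + 1) / 2); equals 2f + 1 when n = 3f + 1


def crash_quorum(n: int) -> int:
    """Simple majority: any two majorities share at least one node."""
    return n // 2 + 1


for n in (4, 5, 7, 10):
    print(f"n={n}: Byzantine f={max_byzantine_faults(n)}, quorum={byzantine_quorum(n)}; "
          f"crash f={max_crash_faults(n)}, quorum={crash_quorum(n)}")
```

With seven nodes, for instance, a Byzantine protocol survives two faulty nodes while a crash-fault protocol survives three, which is one reason Paxos and Raft can reach the same resilience with smaller clusters.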
Second, examining blockchains and distributed databases through their core values
The core value of the blockchain
The core value of a blockchain is not to provide services to the outside world but to build its own world of data assets. That world is a state updated by transactions, and its storage is traceable. The main data structures fall into two categories, transactions and blocks, as follows:
A transaction is driven by the outside world and updates the blockchain's world state. It contains two kinds of data: transaction inputs and transaction outputs. The inputs indicate where the transaction's data assets come from, and the outputs indicate where they go.
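The input/output model just described is essentially Bitcoin's UTXO (unspent transaction output) model. The Python sketch below is a deliberately simplified illustration, not Bitcoin's actual data format: it omits signatures and scripts, uses a hypothetical `owner` string in place of a locking script, and derives transaction ids by hashing a Python `repr`. It shows how a transaction's inputs consume earlier outputs and its outputs create new ones, which is how the world state is updated.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TxInput:
    prev_txid: str  # id of the earlier transaction whose output is being spent
    index: int      # position of that output in the earlier transaction


@dataclass(frozen=True)
class TxOutput:
    owner: str      # hypothetical stand-in for a locking script / address
    amount: int


@dataclass(frozen=True)
class Transaction:
    inputs: tuple[TxInput, ...]
    outputs: tuple[TxOutput, ...]

    def txid(self) -> str:
        # Simplified id: SHA-256 of the Python repr, NOT Bitcoin's serialization.
        return hashlib.sha256(repr((self.inputs, self.outputs)).encode()).hexdigest()


# World state: unspent outputs keyed by (txid, output index).
UTXOSet = dict[tuple[str, int], TxOutput]


def apply_transaction(utxos: UTXOSet, tx: Transaction) -> None:
    """Spend the outputs referenced by tx's inputs and add tx's outputs (no signature checks)."""
    if len(set(tx.inputs)) != len(tx.inputs):
        raise ValueError("the same output is spent twice in one transaction")
    total_in = 0
    for inp in tx.inputs:
        source = utxos.get((inp.prev_txid, inp.index))
        if source is None:
            raise ValueError(f"output {inp.prev_txid}:{inp.index} is unknown or already spent")
        total_in += source.amount
    # Any surplus of inputs over outputs would be the miner's fee in Bitcoin.
    if sum(out.amount for out in tx.outputs) > total_in:
        raise ValueError("outputs exceed inputs")
    for inp in tx.inputs:
        del utxos[(inp.prev_txid, inp.index)]
    new_id = tx.txid()
    for i, out in enumerate(tx.outputs):
        utxos[(new_id, i)] = out


# Usage: seed the state with one output, then alice pays bob 30 of her 50.
utxos: UTXOSet = {("genesis", 0): TxOutput("alice", 50)}
payment = Transaction(
    inputs=(TxInput("genesis", 0),),
    outputs=(TxOutput("bob", 30), TxOutput("alice", 20)),
)
apply_transaction(utxos, payment)
```

After the payment, the original 50-unit output is gone and two new outputs (30 for bob, 20 of change for alice) have taken its place; replaying the same transaction fails because its input has already been spent.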
A block stores transaction data and consists mainly of a block header and a block body. The block header records the version number, the hash of the previous block, the Merkle root, the block creation timestamp, the proof-of-work difficulty target, and the nonce (the parameter value varied while searching for a hash that meets the target); the block body contains the number of transactions and the complete transaction data.
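The header fields and the Merkle root can be illustrated with a short Python sketch. It assumes Bitcoin's conventions: an 80-byte header whose integer fields are little-endian, hashes kept in internal byte order, double SHA-256 for both block hashes and Merkle tree nodes, and pairing of the last hash with itself when a Merkle level has an odd number of entries. Because each header embeds the previous block's hash, altering any historical block changes every hash after it, which is what makes the stored history traceable. The function names are illustrative, not from any library.

```python
import hashlib
import struct


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, used by Bitcoin for block hashes and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes: list[bytes]) -> bytes:
    """Merkle root of 32-byte transaction hashes given in internal byte order.
    When a level has an odd number of entries, its last hash is paired with itself."""
    if not tx_hashes:
        raise ValueError("a block contains at least one transaction")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def block_header(version: int, prev_hash: bytes, merkle: bytes,
                 timestamp: int, bits: int, nonce: int) -> bytes:
    """Serialize the six header fields into Bitcoin's 80-byte layout."""
    if len(prev_hash) != 32 or len(merkle) != 32:
        raise ValueError("hashes must be 32 bytes in internal byte order")
    # version is a signed 32-bit int; time, bits (compact target) and nonce are unsigned.
    return (struct.pack("<i", version) + prev_hash + merkle
            + struct.pack("<III", timestamp, bits, nonce))


def block_hash_hex(header: bytes) -> str:
    """Block hash as conventionally displayed: double SHA-256, byte-reversed, hex."""
    return dsha256(header)[::-1].hex()
```

Fed the published header fields of Bitcoin's genesis block (with its displayed Merkle root byte-reversed into internal order), `block_hash_hex` should reproduce that block's well-known hash.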
The core value of the distributed database
The core value of a distributed database is to provide data access services to business systems. Business (transactional, OLTP) databases are operation-oriented and mainly serve business products and their development; data warehouses (OLAP) are analysis-oriented and mainly serve analysts.
Third, lifting the veil on blockchains and distributed databases from the perspective of storage technology
Blockchain
From Bitcoin in 2008 to Blockchain 3.0, the most basic storage technology of blockchains has changed little. Take Bitcoin's storage as an example. The Bitcoin/blocks/ folder holds files such as blk00000.dat, shown in Figure 1, which store the block data. Each file is about 128 MB, and all block data is stored in this folder.
The Bitcoin/blocks/index/ folder stores index data for all blocks as key/value pairs in a LevelDB database.
Each block has a size limit (originally 1 MB; since the SegWit upgrade, a block weight limit of 4 million weight units), and block data is stored in the block files (such as blk00000.dat in Figure 1). Blocks are delimited by a "magic number" (such as 0xF9BEB4D9 in Figure 3) written in front of each one. One file can store the data of many blocks, but the file has a maximum size: once it would exceed 128 MB, a new file (such as blk00001.dat) is created.
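The on-disk layout described above can be read with a few lines of Python. This sketch assumes the record format used by Bitcoin Core's blk*.dat files: the 4-byte mainnet magic (bytes f9 be b4 d9), a 4-byte little-endian block length, then the serialized block, whose first 80 bytes are the block header. It stops at zero padding, because block files are preallocated in chunks, and it does not handle the XOR obfuscation that newer Bitcoin Core releases can apply to block files. `iter_blocks` is a hypothetical helper name.

```python
import struct
from typing import BinaryIO, Iterator

# Mainnet magic bytes as they appear on disk in front of every block.
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def iter_blocks(f: BinaryIO) -> Iterator[bytes]:
    """Yield raw serialized blocks from a blkNNNNN.dat-style stream.

    Assumed record layout: 4-byte magic, 4-byte little-endian length, block bytes.
    """
    while True:
        magic = f.read(4)
        if len(magic) < 4 or magic == b"\x00\x00\x00\x00":
            return  # end of file, or zero-filled preallocated space
        if magic != MAINNET_MAGIC:
            raise ValueError(f"unexpected magic {magic.hex()} at offset {f.tell() - 4}")
        length_field = f.read(4)
        if len(length_field) < 4:
            raise ValueError("truncated length field")
        (size,) = struct.unpack("<I", length_field)
        block = f.read(size)
        if len(block) != size:
            raise ValueError("truncated block record")
        yield block  # block[:80] is the header, the rest is the transaction data
```

A typical use would open a file such as blk00000.dat in binary mode, iterate over `iter_blocks(f)`, and slice the first 80 bytes of each record to inspect its header.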
Distributed database
Distributed databases took off around 2005, beginning with the NoSQL wave. The primary problem these databases, such as HBase, Cassandra, and MongoDB, set out to solve was that the data could no longer all fit on a single machine. Then came the RDBMS counterattack: alongside NoSQL, relational database systems also made great efforts to adapt to changing business needs, mainly through middleware and database/table sharding schemes for relational databases. After that came NewSQL. In 2012 and 2013, Google published the Spanner and F1 papers, which showed the industry for the first time that the relational model could be combined with NoSQL-style scalability in a large-scale production system.