Autonomys Network Unveiled: Deep Dive into Hourglass Schemes
1 Introduction
1.1 About the research
The purpose of this is to conduct a detailed study of the Autonomys Network, which utilizes the Proofs-of-Archival-Storage (PoAS) consensus mechanism to address key challenges in blockchain technology, such as decentralization, energy efficiency, and reliable data storage. This study will focus particularly on the analysis and description of the Hourglass Scheme, which ensures data security and prevents data compromise within the Autonomys Network.
The objectives of this research are as follows:
- To examine the architecture of the Autonomys Network and explain how it addresses the issue known as the "Farmer’s Dilemma", enabling farmers to participate in consensus by archiving blockchain data.
- To analyze the Hourglass Scheme in terms of its operational algorithms and present the mathematical models used to ensure correct data storage and protection against fraud.
- To explore the algorithms for generating and verifying unique data replicas using pseudorandom permutations, applied within Autonomys Network to prevent data compression and duplication.
- To compare the effectiveness of Hourglass Schemes with alternative consensus approaches, such as Proof-of-Work, Proof-of-Stake, and other storage mechanisms like Proof-of-Space, based on their ability to address the issues of decentralization and energy efficiency.
1.1 The Importance of Decentralization and Energy Efficiency in Blockchains
Decentralization is a fundamental characteristic of blockchain networks as it ensures resilience to censorship, enhances network security, and eliminates reliance on centralized, trusted authorities. In decentralized systems, each node contributes to maintaining consensus and defending the network against attacks. However, many modern blockchains, such as Bitcoin and Ethereum, face challenges related to centralization, despite their foundational principles. This centralization is driven by the increasing computational complexity and resource requirements, which have led to the consolidation of mining power into the hands of a few large entities, such as mining pools and data centers.
Energy efficiency is another critical issue associated with Proof-of-Work (PoW) blockchains. PoW-based mining requires an enormous amount of computational resources to solve complex mathematical problems, leading to excessive energy consumption. For instance, the Bitcoin network consumes as much energy annually as a small country, creating significant environmental and economic concerns. Addressing this challenge is one of the key goals of current blockchain research.
Autonomys tackles both of these issues through its implementation of Proofs-of-Archival-Storage (PoAS) — a consensus mechanism that requires participants to store the blockchain’s history rather than perform energy-intensive computations. This approach offers several advantages:
- Increased decentralization:
Unlike PoW, which encourages the formation of large mining pools, PoAS allows users with standard hardware to participate in the consensus. This reduces barriers to entry and decreases the risk of network centralization.
- Energy efficiency:
Since PoAS is based on data storage rather than computational work, energy consumption is significantly lower than in PoW networks. Autonomys farmers store the blockchain’s history and participate in consensus by utilizing existing disk space, minimizing the costs associated with energy consumption and infrastructure.
Thus, decentralization and energy efficiency have become key factors that define the future of distributed ledger technology. Autonomys Network, with its unique PoAS-based approach, provides a solution that preserves the core principles of decentralization while minimizing environmental impact and reducing the costs of network maintenance.
1.2 Introduction to Proofs-of-Archival-Storage
Proofs-of-Archival-Storage (PoAS) is a consensus mechanism that leverages data storage rather than computational resources to maintain the security and decentralization of the network. Unlike traditional approaches such as Proof-of-Work (PoW) or Proof-of-Stake (PoS), which require significant computational power or capital to participate in consensus, PoAS focuses on utilizing disk space to archive the blockchain’s history.
The core concept of PoAS is that farmers (network participants) are responsible for storing not only the current state of the blockchain but also its entire historical record. Each farmer stores unique, verifiable replicas of the blockchain’s history, and based on this archival storage, they are granted the ability to participate in the generation of new blocks. This ensures that the blockchain’s data is always accessible while simultaneously preventing the concentration of power in the hands of a few participants, as often happens in PoW or PoS systems.
The key components of Proofs-of-Archival-Storage include:
- Blockchain history storage:
Farmers are required to store an archived version of the blockchain’s history. This is not merely a copy of the data, but unique replicas that can be cryptographically verified. As a result, each participant in the network can prove they are indeed storing the data, rather than merely simulating its presence.
- Audit process:
The network periodically initiates audits of farmers to ensure they are actually storing the archived data. This process utilizes pseudorandom access to fragments of the stored data, enabling efficient verification of the storage without needing to transmit the entire dataset.
- Cryptographic uniqueness of replicas:
To prevent duplication and fraud, each farmer generates unique replicas of the blockchain’s history using cryptographic algorithms. These replicas cannot simply be copied from other participants, which enhances the network’s decentralization and ensures an even distribution of data across participants.
- Economic incentives:
Participants are rewarded for correctly storing the data, which incentivizes them to maintain their storage systems and participate in the consensus. This mechanism prevents scenarios where farmers might try to conserve resources by neglecting to store the complete history.
Proofs-of-Archival-Storage addresses the so-called “farmer’s dilemma,” where blockchain participants may prefer to optimize resource usage at the expense of network security. In PoAS, instead of choosing between storing blockchain data and maximizing profits, farmers are rewarded for reliably archiving data. This makes participation in the network more accessible to a broader range of users, as the computational resource requirements are minimal.
Thus, the PoAS concept not only ensures a high degree of decentralization and security but also makes blockchain systems more environmentally sustainable and accessible to a wider audience by minimizing the costs of hardware and energy.
2. Autonomys Network Overview
2.1 Limitations and Challenges of Proof-of-Capacity Consensus Mechanism
The Proof-of-Capacity (PoC) consensus mechanism was introduced as an alternative to traditional energy-intensive blockchain consensus mechanisms, such as Proof-of-Work (PoW). PoC replaces the need for computational power with the use of disk space, allowing for significant reductions in energy consumption. The core idea behind PoC is that network participants (farmers) allocate a portion of their disk space to store data, granting them the ability to participate in the creation of new blocks. However, while PoC offers advantages such as energy efficiency and accessibility for users with limited resources, this approach faces several significant challenges that undermine its effectiveness and decentralized nature.
The primary goal of PoC is to eliminate the reliance on powerful computational devices, which dominate PoW blockchains, and to create a more democratic and decentralized network. Instead of spending enormous resources on solving complex computational puzzles, farmers can utilize their existing disk space, making PoC blockchains more resilient and accessible. However, this goal proves difficult to achieve in practice due to several key challenges:
- The Farmer’s Dilemma:
One of the most serious issues faced by PoC blockchains is the Farmer’s Dilemma, which network participants encounter. Farmers must choose between storing useful data, such as the current state and history of the blockchain, and maximizing their potential rewards by allocating as much space as possible for participating in consensus. Rational farmers are incentivized to optimize their participation for maximum profit, which encourages them to forgo storing useful data in favor of dedicating more space to consensus. This leads to network participants becoming “light clients” who do not store the full history and state of the network, thereby undermining its decentralization and security.
- Centralization via Pools:
Similar to PoW blockchains, PoC is also prone to centralization. Large farming pools with substantial disk space gain more opportunities to generate new blocks, leading to the concentration of power in the hands of a limited number of participants. This contradicts the original goal of decentralizing blockchains and creates issues similar to those found in PoW systems, where the network becomes reliant on large participants, jeopardizing its resilience and democratic structure.
- The Data Storage Problem:
PoC blockchains require participants to store large amounts of data, which becomes increasingly problematic over time. As the blockchain grows, the volume of stored information also increases, and farmers may begin to abandon the storage of blockchain history, focusing only on the current state. This results in the centralization of historical data storage, where new network participants depend on a few large nodes or third-party services for synchronization and access to the history, increasing the risk of centralization.
- Lack of Incentives to Store Full History:
In PoC blockchains, farmers do not have explicit incentives to store the full history of the blockchain, as their primary role is to allocate space for consensus participation. Storing old data does not provide them with any benefits, leading to a situation where a significant portion of the network does not engage in storing the blockchain’s history, relying on a small number of participants, which reduces decentralization and security.
Therefore, while PoC offers an energy-efficient and cost-effective alternative, its architecture faces significant challenges related to decentralization and the reliability of data storage. These issues threaten the long-term sustainability of PoC-based networks, making the development of more robust solutions essential. The Autonomys Network, based on Proofs-of-Archival-Storage (PoAS), addresses these problems by ensuring the storage of the entire blockchain history and enabling consensus participation that supports network decentralization.
2.2 The Farmer’s Dilemma and Its Impact on Decentralization
The Farmer’s Dilemma is a central challenge faced by Proof-of-Capacity (PoC) blockchains, as well as other blockchains that use disk space for consensus. This dilemma arises from the conflict between two primary tasks for farmers: storing useful data, such as the blockchain’s history and current state, and allocating as much disk space as possible for participating in consensus. Rational farmers will always prioritize maximizing their short-term profits, leading them to forgo data storage in favor of increasing their chances of earning rewards for generating new blocks.
2.2.1 The key aspects of the Farmer’s Dilemma are:
- Conflict between data storage and consensus participation:
In PoC blockchains, farmers must allocate disk space to store data in order to participate in the consensus process. However, their rewards are not based on how much useful data they store, but on how much space they dedicate to storing cryptographically defined data used to prove their participation in the consensus. As a result, farmers are incentivized to reduce the amount of space used for storing the current state and history of the blockchain, in order to maximize the space dedicated to the consensus process.
- Lack of incentives to store the full blockchain history:
Historically, PoC blockchains have not provided explicit economic incentives for farmers to store the complete history of the blockchain. Storing older blocks and historical information does not generate additional income, so farmers opt to conserve resources by not engaging in archival storage. This leads to data loss and makes it more difficult for new participants to synchronize with the blockchain’s history.
- Focus on short-term profit:
Rational farmers seek to maximize short-term gains from their participation in the network. This encourages them to abandon or minimize the storage of useful data and instead focus exclusively on maximizing their participation in consensus. This behavior directly undermines the decentralization of the network, as farmers function as “light clients” who do not store the entire blockchain’s information.
2.2.2 Impact on Decentralization
- Centralization of data storage:
When farmers begin to abandon the storage of the full blockchain history, new network participants lose access to the complete information needed to verify the blockchain. This reliance on a small number of “archive” nodes that store the full blockchain history leads to centralization of data storage, which contradicts the core principles of decentralization and openness in blockchains.
- Increased role of farming pools:
To minimize risks and smooth out the randomness of rewards, farmers often join pools where their collective disk space is used to increase the chances of block rewards. However, these pools represent points of centralization within the network. If a pool grows too large, it begins to dominate the network, reducing security and undermining decentralization.
- Dependence on third-party solutions:
In the absence of incentives for farmers to store the blockchain history, users increasingly depend on third-party services, such as centralized nodes or APIs, to access blockchain data. This creates the risk of monopolizing access to data and reduces the transparency of the system.
- Vulnerability to attacks:
Farmers who do not store the full blockchain history are vulnerable to attacks because they cannot independently verify the blockchain’s data. This opens the door for malicious actors to manipulate new participants or execute attacks that disrupt data synchronization.
In summary, the Farmer’s Dilemma in PoC blockchains creates serious threats to the decentralization and security of the network. This issue highlights the need for new consensus mechanisms, such as Proofs-of-Archival-Storage (PoAS), which address this problem and provide economically viable solutions to ensure the complete blockchain history is stored, facilitating equitable participation for farmers and protecting the network from centralization and attacks.
2.3 Autonomys Network Solution: How Proofs-of-Archival-Storage Work
2.3.1 Storing Blockchain History and Its Role in Consensus
A key aspect of the Autonomys Network is the mandatory storage of the full blockchain history. Unlike traditional blockchains, where participants may choose what data to store (e.g., only the current state or a specific part of the history), Autonomys Network requires farmers to store the entire archived history of the blockchain. This provides several important advantages:
- Consensus based on data archival:
In Autonomys Network, consensus is built on Proofs-of-Archival-Storage (PoAS), meaning that farmers participating in consensus must store the entire blockchain history, not just the current state. This prevents network fragmentation, where different participants might store different pieces of data, and ensures that the complete blockchain history is available at all times for new network participants.
- Unique historical replicas:
Farmers do not merely store copies of the blockchain. Autonomys Network uses cryptographic techniques, such as pseudorandom permutations (PRP), to ensure that each farmer holds a unique replica of the data. This adds an additional layer of security, as attackers cannot simply duplicate another farmer’s data to simulate participation in the consensus.
- Audit process and storage proofs:
An important aspect of PoAS is the regular verification of data presence. The network periodically initiates requests to verify that farmers are indeed storing the required data. This is achieved through random audits, where the network requests specific fragments of the blockchain history from farmers. Farmers who cannot promptly provide the requested data are excluded from participating in the consensus, ensuring network reliability and maintaining decentralized storage.
- Protection against data compression:
PoAS also includes mechanisms to prevent data compression. For example, farmers cannot simply store hashes or encoded versions of the blockchain and restore them “on-the-fly.” Through cryptographic algorithms and auditing, farmers are required to store full data, ensuring their active participation in the network and preventing fraudulent behavior.
By using the blockchain history in the consensus process, the network’s resilience is enhanced, as each node holds critical information about the network’s past, which is necessary for synchronizing new nodes or verifying the authenticity of current data. This approach prevents the centralization of historical data storage and reliance on third-party services.
2.3.2 Separation of Roles Between Farmers and Executors
The Autonomys Network introduces a significant distinction between two key participants in the network — farmers and executors. This separation allows the network to remain decentralized while addressing computational load and security challenges.
- Farmers:
The primary role of farmers in Autonomys Network is to store the archival blockchain history and participate in the consensus. Farmers are not responsible for processing transactions or computing blockchain states — their task is focused solely on data storage and providing proofs that they are indeed storing this data. This removes the need for farmers to possess powerful hardware for complex computations, lowering the barriers to entry and enabling a larger number of participants to join the network using standard computers and low-power servers.
- Executors:
Executors are responsible for tasks related to computing blockchain states and processing transactions. They receive transactions, perform the computations necessary to update the blockchain state, and create new blocks reflecting these changes. Executors are required to stake tokens, ensuring the integrity of their actions. If an executor violates the rules or attempts to manipulate the blockchain state, their stake can be forfeited, providing strong motivation for honest behavior.
- Decentralization and load optimization:
The separation of functions between farmers and executors reduces the load on farmers and allows nodes to specialize in specific tasks. This increases the efficiency of the network and decreases the likelihood of performance issues that might arise if all participants were performing both functions simultaneously. This separation also helps achieve higher network throughput, as farmers focus exclusively on consensus and storage, while executors handle transaction processing.
- Independence of consensus and computation processes:
This is a key characteristic of Autonomys. The separation of the consensus process (ordering transactions) from computation (processing those transactions) enhances the security of the network. Farmers and executors operate independently, making coordinated attacks on the network more difficult, as an attacker would need to simultaneously control both data storage and transaction execution.
This clear division of roles between farmers and executors not only improves the network’s efficiency and decentralization but also strengthens its resilience to attacks and manipulation.
2.4 Advantages of PoAS Compared to Other Approaches (Proof-of-Work, Proof-of-Stake)
The Proofs-of-Archival-Storage (PoAS) mechanism used in the Autonomys Network is a unique consensus method that differs from traditional approaches like Proof-of-Work (PoW) and Proof-of-Stake (PoS). PoAS addresses many of the challenges inherent to these systems and offers a more efficient and decentralized solution for modern blockchains.
2.4.1 Energy Efficiency
One of the key advantages of PoAS over PoW is its energy efficiency.
Proof-of-Work requires an enormous amount of computational resources to solve complex mathematical puzzles, leading to significant energy consumption. For instance, the Bitcoin network consumes as much energy as small countries. This immense resource expenditure is often criticized for its environmental impact, as it directly contributes to global carbon emissions.
In contrast, Proofs-of-Archival-Storage relies on the use of disk space, minimizing energy costs. Farmers in the Autonomys Network do not engage in computation but instead use available disk space to store the blockchain’s history. This makes PoAS an environmentally sustainable solution, allowing participation in consensus without the need for massive expenditures on hardware and energy.
2.4.2 Decentralization
Compared to PoW and PoS, PoAS has clear advantages in maintaining and enhancing the decentralization of the network.
Over time, Proof-of-Work has led to the centralization of networks like Bitcoin due to the high cost of hardware and the need for cheap electricity. This has resulted in the concentration of power in the hands of large mining pools and data centers, which can dominate the network and capture most of the block rewards.
Proof-of-Stake also tends toward centralization because control of the network is transferred to those who hold large amounts of tokens. In PoS systems, those with more tokens have more influence over consensus, which can promote capital centralization and dominance by wealthier participants.
Proofs-of-Archival-Storage, on the other hand, emphasizes participation through disk space, which is a more democratic resource. In a PoAS network, farmers can participate in consensus by using available disk space, which is far more accessible than specialized hardware or large capital reserves. This lowers entry barriers, supporting a higher level of decentralization and power distribution.
2.4.3 Lower Barriers to Participation
One of the main barriers to participation in PoW networks is the need for expensive hardware (ASIC devices) and access to cheap electricity. In PoS systems, the primary barrier is the amount of capital required for staking.
Proof-of-Archival-Storage in Autonomys Network significantly lowers these barriers, as only disk space is required for participation, which is accessible to most users, even with ordinary home computers or low-power servers. This allows a broader range of participants to join the network and support its operation without the need for significant investments in equipment or tokens.
2.4.4 Incentives for Data Storage
In PoW and PoS systems, network participants are primarily interested in earning rewards for creating blocks or validating transactions, which often leads to a reduction in the amount of useful data stored, such as the blockchain’s history.
In PoAS, farmers are incentivized to store the full blockchain history, as this is an integral part of the consensus process. This not only increases the reliability of the network but also ensures the preservation of all blockchain data, which is important for verifying the authenticity of transactions and supporting decentralization.
2.4.5 Security and Resistance to Attacks
In PoW and PoS systems, networks are vulnerable to attacks such as the 51% attack, where attackers controlling more than 50% of the computational power or staked tokens can manipulate the network.
In PoAS, a 51% attack is theoretically possible, but executing such an attack would require enormous storage resources, making it extremely costly. Additionally, PoAS uses data storage verification mechanisms through regular audits, further reducing the likelihood of a successful attack on the network.
2.4.6 Data Longevity and Accessibility
One of the drawbacks of PoW and PoS systems is the lack of direct incentives to store the full history of the blockchain. Farmers and validators often reduce the amount of data they store, which creates a risk of losing important blockchain data.
In contrast, Proofs-of-Archival-Storage makes the storage of the blockchain’s history a fundamental element of the consensus. This means that the network preserves the entire history, and new participants can easily access all data for synchronization and transaction verification. This enhances the long-term durability of the network and ensures data availability over time.
3. Theoretical Foundation of Hourglass Schemes
3.1 Introduction to Hourglass Schemes: Data Protection and Fraud Prevention
Hourglass Schemes are a crucial component in ensuring data storage security within the Autonomys Network. These schemes ensure that farmers cannot “cheat” by compressing or restoring data only upon request, instead of storing the complete and authentic version of the blockchain history. The Hourglass Schemes mechanism guarantees that the data is indeed stored in an archived form, providing protection against various forms of storage-related fraud.
The core idea of Hourglass Schemes is to apply time constraints and asymmetric computational transformations that make it difficult for farmers to restore data on-the-fly for audits or compress it to reduce storage space.
3.1.1 Key Aspects of Data Protection with Hourglass Schemes
- Slow encoding:
Hourglass Schemes utilize the concept of slow encoding, which requires farmers to go through a computationally expensive process to encode data before storing it. This process cannot be executed instantly, meaning farmers cannot simply compress the data and restore it on-the-fly during an audit. This ensures that data is indeed archived and not dynamically generated upon request.
- Application of pseudorandom permutations (PRP):
To ensure that each farmer stores unique copies of the data, cryptographically secure pseudorandom permutations (PRP) are used to transform the data before it is stored. This makes it impossible for farmers to simply copy data from another participant in the network, as each farmer’s data storage is unique and requires a specific encoding process.
- Authenticity verification through audits:
Regular audits, which request specific data segments from farmers, provide proof that the farmers are genuinely storing the encoded data. These requests occur randomly, making it difficult for farmers to predict which data will be requested and virtually impossible to generate data on-the-fly.
3.1.2 Fraud Prevention
Hourglass Schemes create a computational constraint for farmers, making it difficult to compress data or manipulate the storage process. Let’s consider some key aspects of fraud prevention:
- Cost of data recovery:
If a farmer decides not to store the data fully and attempts to restore it only upon request, they will need to redo all steps of the encoding process, which takes considerable time. Since the time to respond to audits is limited, farmers who do not fully store the data will not be able to pass the audit in time.
- Economic incentives:
Storing compressed or partially restored data requires significant computational resources for the farmer each time a request is made. This makes such behavior economically disadvantageous, as it is simpler and cheaper to store the data fully than to attempt to recover it for every audit.
- The hourglass principle:
Hourglass Schemes follow the analogy of an hourglass: just as sand slowly flows through the narrow neck of the glass, data passes through a complex, slow transformation. This process sets a minimum time frame for encoding data, which cannot be bypassed, ensuring reliable verification of the authenticity of stored data.
3.2 Mechanism of the Scheme: Slow Data Encoding
The core mechanism of Hourglass Schemes is based on the concept of slow data encoding. This encoding represents a process in which farmers must perform computationally expensive operations to transform the data before storing it. Slow encoding makes it impossible to generate data “on the fly” when an audit is requested, as the process of recovering the data takes more time than the available window for responding. This ensures reliable protection of data from manipulation and fraud.
3.2.1 Principle of Slow Encoding
At the heart of slow encoding is the idea that the process of writing data to disk should be slow and asymmetric relative to the time required to read that data later. In other words, encoding the data is a difficult task requiring significant computational resources, while decoding (recovering) the stored encoded data is a comparatively simple task. This time asymmetry protects the data, as farmers cannot compress or dynamically restore it during the audit process.
3.2.2 Slow Encoding Algorithm
The implementation of slow encoding within Hourglass Schemes follows a series of cryptographic and mathematical steps. The general encoding algorithm can be described as follows:
- Data partitioning:
The original blockchain data (e.g., transaction history) is divided into multiple equally sized blocks D1,D2,…,Dn. Each block will undergo further processing and encoding.
- Application of pseudorandom permutation (PRP):
For each data block, a cryptographically secure pseudorandom permutation π is applied, which is dependent on the farmer’s unique identifier. Let π be a permutation defined on the set of block indices {1,2,…,n}. This permutation transforms each data block into a unique encoded block Dπ(i). The encoding formula is:
where Ci is the encoded block, K is the farmer’s unique encoding key, and PRP is the pseudorandom permutation function. This ensures that all farmers store unique replicas of the same data.
- Time constraints:
The applied transformation must take a significant amount of time to process each data block, ensuring that it is impossible to quickly restore data to respond to audits. The time delay at this stage is enforced by the algorithm, which is optimized to increase encoding time while keeping the complexity of the reverse decoding process low.
- Storing the encoded data:
Once the encoding process is complete, all encoded blocks C1,C2,…,Cn are stored by the farmer on disk. This encoded representation of the data becomes the primary element that the farmer must retain and provide in case of an audit.
3.2.3 Example of Slow Encoding Using SLOTH
One possible implementation of slow encoding is the use of the SLOTH algorithm (slow-time hash function), which is specifically designed to create controlled slow cryptographic transformations. SLOTH is based on solving a complex mathematical problem, such as calculating a square root modulo a prime number. The formula for SLOTH operation can be described as:
where p is a large prime number defining the computational difficulty. This process ensures slow encoding of the data block Di and guarantees that it cannot be reproduced on the fly during an audit.
3.2.4 Mathematical Model of Slow Encoding
Slow encoding can be formalized as a task of minimizing the response time for audits while ensuring data is stored. Let Tencode be the time required to perform the complete data encoding process, and let Taudit be the available time for responding during an audit. The Hourglass Schemes algorithm is designed to always satisfy the condition:
This ensures that farmers who attempt to compress data and only encode it when requested cannot respond within the allotted audit time.
3.2.5 Protection Against Data Compression
Hourglass Schemes also prevent data compression. If farmers attempt to store compressed versions of data and restore them upon request, they will face the challenge of having to repeat all the steps of slow encoding from the beginning. This would make the recovery process too lengthy, leading to audit failure.
3.3 Mathematical Model of Hourglass Schemes: Time Constraints and Asymmetric Computation
Hourglass Schemes are based on the principle of asymmetric computation, where the process of encoding data is significantly more computationally expensive and time-consuming compared to the process of decoding or reading the already encoded data. This key feature introduces a strong time constraint for farmers, preventing them from manipulating the data storage process or recovering the data “on the fly.” The mathematical model of these schemes must take into account both the time required for encoding and the ability to quickly access the data for audits.
3.3.1 Key Elements of the Model
- Encoding Time (Tencode):
This is the time required for the farmer to complete the full data encoding process before storing it. Algorithms employing slow encoding ensure that this time is significant, requiring complex operations such as data permutations, hash function calculations, or cryptographic transformations.
- Decoding Time (Tdecode):
The decoding process, which retrieves the original data from its encoded form, must be significantly faster than the encoding process. This time must be short enough to allow farmers to quickly provide access to their data during an audit.
- Audit Response Time (Taudit):
This is the time allocated for a farmer to respond to an audit request. It is set by the protocol in such a way that farmers cannot re-encode the data within the audit window if they are not storing it in an archived form.
3.3.2 Asymmetric Computation Equation
The main objective of Hourglass Schemes is to establish a balance between data encoding and decoding that makes dynamic data recovery unfeasible. Let Tencode represent the time required for encoding, and Taudit the time available to respond to an audit request. The model is built on the principle that:
This equation indicates that the time needed for re-encoding the data is significantly greater than the time available for responding to an audit request. At the same time, the decoding time T decode must satisfy the condition:
This ensures that farmers who store the data correctly can easily and quickly provide it during an audit, while those attempting to cheat will not be able to re-encode the data within the allotted time.
3.3.3 Asymmetric Computation in Hourglass Schemes
An example of asymmetric computation in Hourglass Schemes is the use of cryptographically secure permutations and other methods that ensure a significant delay during data encoding. Consider a simple case where a one-way hash function is used to encode the data. Let H(x) be a hash function that takes a data block Di and transforms it into the encoded value Ci:
Hashing itself is a relatively fast process, but if the data requires multiple iterations of hashing (e.g., N repetitions), it creates a significant delay:
This process significantly increases the encoding time Tencode, as computing multiple hashes requires substantial computational resources. At the same time, decoding in this case is a trivial task, as it simply involves providing the encoded data Ci upon request.
3.3.4 Time Constraints
The Autonomys Network enforces strict time constraints for performing audits. Let Taudit be the fixed time allocated for a farmer to respond to a request, which is much shorter than the time required for re-encoding the data. An example of time constraints:
These values mean that a farmer who does not store the data correctly and attempts to re-encode it will not be able to do so within the audit window, ensuring their attempt to deceive the system will fail.
3.3.5 Economic Incentives Through Time Constraints
One of the important characteristics of Hourglass Schemes is the economic incentive created by time constraints. If a farmer attempts to bypass the system and avoid storing the data fully, their costs for re-encoding the data in response to a request will be significantly higher compared to the costs of storing the encoded data. As a result, the only rational choice for the farmer is to store the data correctly, making the scheme both secure and economically efficient.
3.3.6 Formalizing the Model
The mathematical model for time constraints and asymmetric computation can be described by the following equations:
where N is the number of hashing iterations or other computations, and D is the amount of data to be encoded.
where C represents the encoded data that the farmer must provide. The decoding process does not involve complex operations, so the decoding time is significantly less than the encoding time.
This is the fixed audit time, which is less than the data encoding time Tencode.
Thus, the asymmetry in computation time makes dynamic data recovery impossible and ensures the security of data in the Autonomys Network.
3.4 Algorithms for File Storage Verification Using Hourglass Schemes
An essential element of Hourglass Schemes is the process of regular audits, which ensure that farmers are genuinely storing the data and not attempting to manipulate the storage process. The verification algorithms in Hourglass Schemes are based on a challenge-response mechanism, where farmers must provide proof of data storage upon request from the network. These algorithms are designed to perform fast and reliable data audits with minimal computational costs for the verifier, while imposing significant computational burden on farmers attempting to cheat.
3.4.1 Key Steps of the Verification Algorithm
The audit algorithms in Hourglass Schemes can be broken down into several key steps, ensuring reliable file storage verification:
Audit initialization:
The network randomly selects farmers and data blocks for verification. This process can be based on a pseudorandom function, making it impossible for farmers to predict which data will be requested. Let the auditor randomly choose an index iii of the data block that the farmer must provide in response to the request.
Challenge generation:
The auditor sends a challenge to the farmer to provide proof of data storage, such as the encoded data block Ci, which was encoded during the slow encoding process. The challenge formula is:
where i is the index of the data block, and σ is a random string used for request authentication to prevent the reuse of old responses.
Response generation:
The farmer must provide the encoded data block Ci that was stored during the slow encoding process. Importantly, the farmer cannot generate this data block “on the fly,” as the encoding process requires significant time. The response may include:
where Ci is the encoded data block, and Proof i is proof that the farmer is storing the data correctly. The proof can be based on hashing or another cryptographic function linked to the original data Di.
Verification: The auditor verifies the received data by comparing it with the expected result. If the farmer is storing the data correctly, the response should match the encoded data held by the auditor or available through verification mechanisms, such as hashing or Merkle Trees.
The verification formula is:
If the proof is correct, the farmer is deemed honest, and their participation in the network continues.
Sanctions: If the farmer cannot provide the correct data or if the response is delayed (indicating an attempt to re-encode data “on the fly”), sanctions may be imposed, such as losing their stake or being excluded from consensus for a certain period.
3.4.2 Proof-of-Retrievability (PoR) Algorithm
One of the main methods used in Hourglass Schemes to verify the existence of data is the Proof-of-Retrievability (PoR) algorithm, which ensures that the data is stored in full. PoR includes the following steps:
Data preparation:
Initially, the farmer divides the data into blocks and applies hash functions to generate tags. These tags are stored separately and used during verification.
For each data block Di, a tag Ti is computed using a cryptographic hash function:
where H is a secure hash function that generates the data tag.
Data request (Challenge):
During the audit, the farmer receives a request to provide a specific data block and its corresponding tag:
where i is the index of the requested block, and σ is a random value to ensure the uniqueness of the request.
Response:
The farmer provides the data block and its corresponding tag:
where Di is the data block, and Ti is the corresponding tag.
Data verification:
The auditor verifies the response by computing the hash of the provided data block and comparing it with the tag:
If the hash matches the tag, the farmer successfully passes the audit.
3.4.3 Proof-of-Storage (PoS) Algorithm
Another method of verification is Proof-of-Storage (PoS), where the farmer provides evidence that they are storing the archived data over an extended period. In PoS, storage verification involves providing multiple sequential responses to requests, which proves long-term data storage.
Initialization:
The network sends the farmer a request to provide several data blocks at random intervals.
Responses:
The farmer responds to each request by providing data blocks and corresponding proofs, similar to the PoR algorithm.
Consistency verification:
The auditor checks the consistency of the provided data and tags for each block. If the farmer correctly responds to several requests over a specified period, this proves that the data has been stored continuously.
4. Integration of Hourglass Schemes into Autonomys Network
4.1 Application of Hourglass Schemes for Blockchain History Storage and Verification
The Autonomys Network integrates Hourglass Schemes as a crucial element for ensuring the security of storing and verifying the entire blockchain history. This scheme is embedded into the protocol to ensure that farmers not only allocate disk space but also genuinely store the archived blockchain data over long periods. This is achieved through a combination of slow encoding, cryptographic methods, and regular audits, which allow the network to verify that farmers are properly storing the data.
4.1.1 Principles of Hourglass Schemes Integration into Autonomys
- Storage of the full blockchain history:
One of Autonomys’s key objectives is to ensure decentralized and reliable storage of the entire blockchain history. Unlike traditional blockchains, where participants may choose which data to store, Autonomys Network requires the full transaction history to be mandatorily stored. This addresses the issue of centralized storage and reliance on third-party services for data access. Farmers must store unique replicas of the blockchain’s history, and Hourglass Schemes ensure that they meet this requirement.
- Data protection through slow encoding:
Every farmer in Autonomys applies slow encoding to the data, which prevents manipulation or compression attempts. This encoding process, as described earlier, involves cryptographically secure transformations that require significant time. Farmers cannot re-encode the data “on the fly” when requested, making fraud both economically and computationally unfeasible.
Each data block Di stored by the farmer undergoes slow encoding using functions such as multiple hashing or pseudorandom permutations. This slow process guarantees that the data has been genuinely archived and cannot be dynamically generated upon request.
For instance, data Di undergoes hashing with multiple iterations:
where Ci is the encoded data block that the farmer must store and provide during an audit.
- Regular audits:
To verify that farmers are indeed storing the data, the network conducts regular audits. This is a key part of integrating Hourglass Schemes into Autonomys. The requests for data are random, and farmers cannot predict which data block will be requested. This makes dynamic data recovery impossible and forces farmers to store the data persistently.
In each audit, the network selects a random block index i and sends the farmer a request to provide this block and proof of its authenticity. The farmer must respond within the allotted time frame Taudit by providing the encoded data block Ci and a cryptographic proof Proof i.
The response time is limited and is always shorter than the time required for the farmer to re-encode the data if they do not store it. This is mathematically described by the equation:
Ensuring data integrity through Merkle Trees:
To provide additional security and data integrity verification in Autonomys, data structures such as Merkle Trees are used. Farmers can use these trees to store hashed values of data blocks. When the network requests storage proofs, farmers provide the corresponding data block and its path in the Merkle Tree, allowing the auditor to quickly verify the data’s integrity and authenticity.
Let’s assume we have a set of data blocks D1,D2,…,Dn, for which a Merkle Tree is constructed. The root hash Hroot is stored in the blockchain, and the hashes of individual blocks H(Di) are stored locally by the farmer. During an audit, the farmer provides the data block Di and its hash H(Di), along with the proof path through the Merkle Tree:
This allows the auditor to verify the authenticity of the data without needing to download the entire blockchain history.
- Pseudorandom permutations for replica uniqueness:
To prevent simple data copying from other farmers, Autonomys applies pseudorandom permutations. Each data replica stored by the farmer is unique and created using cryptographic transformations. This ensures that farmers store their own data and not duplicates of other participants’ storage.
4.1.2 Example of Hourglass Schemes Integration Algorithm
The algorithm for integrating Hourglass Schemes into Autonomys can be outlined as follows:
- Data storage initialization:
The farmer receives the blockchain data D1,D2,…Dn and applies slow encoding.
For each block Di, an encoded block Ci is generated using multiple hashing or another cryptographic function.
The encoded data blocks are stored on the farmer’s disk, and their hashes can be used to build a Merkle Tree.
- Audit initialization:
The network randomly selects farmers for audit and requests specific data blocks.
For each selected block Ci, the farmer must provide the encoded data block and proof of its integrity through the Merkle Tree or other mechanisms.
- Verification and validation:
The auditor verifies the block Ci and its proofs provided by the farmer. If all proofs are correct, the farmer is considered honest.
If the farmer fails to provide the data in the allotted time or the data is incorrect, the farmer is subject to sanctions, such as loss of stake or temporary exclusion from the consensus.
4.2 Pseudorandom Permutations and History Encoding to Prevent Data Compression
Pseudorandom permutations are a critical component of the Hourglass Schemes in the Autonomys Network, providing protection against data compression and manipulation by farmers. The purpose of pseudorandom permutations is to guarantee the uniqueness and authenticity of encoded data, making it impossible to duplicate or tamper with the stored data. Additionally, it prevents attempts to compress the data, where farmers might compress the data to reduce storage volumes.
4.2.1 Key Principles of Pseudorandom Permutations
- Uniqueness of data replicas:
Pseudorandom permutations (PRP) ensure that each data replica stored by a farmer is unique. Even if two farmers store the same portion of blockchain history, the result of encoding using PRP will differ, preventing farmers from sharing or duplicating stored data.
- Prevention of data compression:
Since each farmer receives a unique version of the data through pseudorandom permutation, data compression becomes impractical or impossible. Even if a farmer tries to compress the data, they will not be able to dynamically restore it to its original form, as the data will have been encoded and permutated in such a way that its original structure becomes obscure.
4.2.2 Mathematical Model of Pseudorandom Permutations
Pseudorandom permutations can be mathematically described as a function that reorders data indices based on a secret key K, available only to the farmer. Let D1,D2,…,Dn represent the set of original blockchain history data blocks. The pseudorandom permutation π determines a new arrangement of the blocks:
Thus, after applying the permutation, the data DiD_iDi is written into a new block Cπ(i). This ensures the permutation of all data blocks according to the key K, making the data structure unique for each farmer:
Where Encode is the encoding function applied to each data block. For example, a hash function or another cryptographic algorithm can be used. The permutation π depends on the key K, which is unique to each farmer.
4.2.3 Pseudorandom Permutation Algorithm
The algorithm for pseudorandom permutation in Autonomys can be described as follows:
- Initialization:
The farmer receives the blockchain history data, divided into blocks D1,D2,…,Dn, and a unique key K, which will be used for permutating the data blocks.
- Applying pseudorandom permutation:
For each data block, the farmer calculates a new position in the data array using the permutation π(K). This can be implemented using a cryptographic hash function H, which depends on the key K:
where i is the index of the block, K is the farmer’s secret key, and nnn is the total number of data blocks. This provides a unique new position for each data block.
- Data encoding:
After applying the permutation, each data block Di undergoes an encoding process, which may include hashing, cryptographic signatures, or other protection mechanisms. The encoded data block is stored at the permutated index:
Thus, each data block is written to a new unique location in the storage.
- Data storage:
The farmer stores the encoded blocks C1,C2,…,Cn on their disk. These blocks represent a unique encrypted and permutated version of the blockchain’s history.
4.2.4 Advantages of Using Pseudorandom Permutations
- Protection against duplication: Since each farmer applies a unique pseudorandom permutation based on their key K, even if two farmers store the same portion of the blockchain history, their data replicas will differ. This eliminates the possibility of data duplication between farmers.
- Prevention of compression: Attempts to compress the data become impossible due to the complexity of the data structure after applying permutations. Farmers cannot compress the data in a predictable order and restore it “on the fly,” as this would require re-encoding and re-permutating the data, which cannot be performed quickly.
- Complexity of recovery: If a farmer attempts to recover the original data without storing all the data blocks in archived form, they would need to perform the entire permutation again, which incurs computational costs. The time required to perform the permutation exceeds the time allocated for an audit, making fraud impossible.
4.2.5 Example of Pseudorandom Permutations in Practice
Suppose a farmer is required to store 10 data blocks D1,D2,…,D10. By applying a pseudorandom permutation with the secret key K, the farmer obtains a new arrangement of the data blocks:
Thus, data block D1 is moved to the third position, block D2 to the first position, and so on. After the permutation, the farmer encodes each data block and stores it in the new arrangement. This ensures the uniqueness and continuity of data storage.
4.3 Description of Algorithms for Data Decoding and Verification
The algorithms for decoding and verifying data in Hourglass Schemes are key components for ensuring secure data storage and integrity within the Autonomys Network. Once data has undergone pseudorandom permutation and encoding, it must be decoded and verified during an audit to ensure its correctness and authenticity. These algorithms guarantee that farmers are genuinely storing the data, rather than dynamically regenerating or manipulating it during the verification process.
4.3.1 Key Elements of Decoding and Verification Algorithms
- Data decoding:
Decoding is the reverse process of encoding, which allows the recovery of the original data from encoded blocks. In Hourglass Schemes, farmers are required to store archived data, and the decoding process must be significantly simpler and faster than encoding, ensuring the possibility of quick data verification during an audit.
- Data authenticity verification:
The verification of data is done by checking its integrity and matching it with the original data stored in the network. This is achieved through the use of hash functions, cryptographic proofs, and data structures such as Merkle Trees.
4.3.2 Data Decoding Algorithm
The data decoding process includes the following key steps:
- Receiving a request for decoding (audit):
When the network conducts an audit, the farmer receives a request to provide the encoded data block Ci. This request occurs randomly, and the farmer cannot predict which data block will be requested, which eliminates the possibility of dynamically restoring data.
- Inverse pseudorandom permutation:
Since the data was encoded and permuted during storage, the farmer applies the inverse permutation to retrieve the original data index. Let π⁻¹(i) be the inverse permutation for the block Ci, then the farmer calculates the original data index:
where j is the index of the encoded block in storage. This step allows the farmer to access the original data Di, which was encoded.
- Inverse encoding (decoding):
Once the farmer finds the original data index, they apply inverse encoding to recover the original data block Di from the encoded block Ci. The decoding formula depends on the encoding algorithm used (e.g., hashing or encryption):
where Decode is the decoding function corresponding to the used algorithm.
- Response to the request:
The farmer provides the decoded data Di and proofs of its authenticity to confirm that the data was indeed archived and is being stored correctly.
4.3.3 Data Verification Algorithm
Once the farmer provides the decoded data, the network must verify its authenticity and integrity. The data verification algorithm may include the following steps:
- Hash function verification (Proof-of-Storage):
One method of verifying data is through hash function verification. If a hash of the data block Di was generated during encoding, the auditor computes the hash of the provided data and compares it with the original hash stored in the network:
where TiT_iTi is the tag obtained during the encoding stage, and H(Di) is the hash of the provided data.
- Merkle Tree verification:
If a Merkle Tree structure was used during encoding, data verification is performed by reconstructing the path through the tree. Let H(Di) be the hash of the data block, and Proof i the proof of the path through the Merkle Tree, then the auditor checks whether the path from the data block Di to the root hash is correct:
If the provided Merkle Tree path is valid, this proves that the data has not been altered, and the farmer is indeed storing it in full.
- Time constraints for verification:
Since audits in Autonomys occur within limited time frames, the farmer must decode and provide the data before the audit time Taudit expires. The time for verification must be sufficient to perform all necessary steps but still shorter than the time required for the farmer to re-encode the data. This ensures that farmers cannot dynamically restore the data.
- Verification of consecutive blocks:
In some cases, the network may request the farmer to provide several consecutive data blocks. This is necessary to check the continuity of storage and protection against partial data loss. In such cases, the farmer must decode and provide several blocks Di,Di+1,…,Di+k, and the auditor verifies their integrity using similar hashing or Merkle Tree methods.
4.3.4 Example of Algorithm Operation
Suppose a farmer is storing encoded data blocks C1,C2,…,Cn, and the network initiates an audit to verify block C5. The auditor sends a request for data block D5 and its proof.
- The farmer uses the inverse pseudorandom permutation to find the index i=5 and recovers the original data D5, applying the decoding algorithm.
- The farmer’s response includes the data block D5 and its hash H(D5), along with the Merkle Tree path if necessary.
- The auditor computes the hash of the provided data and compares it with the stored tag or verifies the Merkle Tree path. If the data is correct, the farmer successfully passes the audit.
4.4 Use of Proof-of-Replication (PoR) for Additional Data Integrity Verification
Proof-of-Replication (PoR) is a mechanism that ensures that a farmer not only stores data but also stores a unique copy of that data, specifically created for them. In the context of the Autonomys Network and Hourglass Schemes, PoR is used for additional data integrity verification, guaranteeing that farmers are storing unique, archived copies of the blockchain history rather than duplicating data from other farmers or using compressed versions to bypass audits.
PoR extends the standard Proof-of-Storage (PoS) mechanism by adding cryptographic verification of data uniqueness and integrity, eliminating the possibility of manipulation in the storage process.
4.4.1 Key Principles of Proof-of-Replication
- Unique data copies:
The main feature of PoR is the requirement that farmers store a unique copy of the data, specifically created for them. These data copies differ for each farmer due to the application of pseudorandom permutations and other cryptographic transformations. This ensures that farmers cannot simply copy data from other network participants but must store their own, uniquely encoded versions of the data.
- Protection against compression and duplication:
PoR prevents farmers from compressing data or storing shared replicas of data with other network participants. Since each copy of the data is unique and verified through the PoR mechanism, farmers cannot use standard compression techniques or regenerate data “on the fly.” This makes compression and fraud strategies unprofitable for storing data.
- Proof of proper storage:
PoR enables the network to verify that farmers are storing the original data rather than compressed or regenerated replicas. This is achieved by checking not only the fact of storage but also cryptographic proofs that the farmer is storing a complete, unique version of the data.
4.4.2 Mathematical Model of Proof-of-Replication
Proof-of-Replication can be described using mathematical functions that confirm the farmer is storing unique data. Let’s assume we have the original data D1,D2,…,Dn that the farmer must encode and store using pseudorandom permutations and slow encoding. For each farmer node, a unique replica of the data is created:
where π(K,Di) is a pseudorandom permutation of the data, dependent on the unique key K, belonging to the farmer. This key generates a unique order for storing the data, ensuring that duplication or shared storage with other farmers is impossible.
The PoR algorithm requires farmers to prove the existence of these unique data replicas by providing corresponding cryptographic proofs during an audit.
4.4.3 Proof-of-Replication Algorithm
The Proof-of-Replication algorithm in the context of Autonomys and Hourglass Schemes includes the following steps:
- Initialization of a unique data replica:
When the farmer receives the original data D1,D2,…,Dn, they apply cryptographically secure permutations π(K), based on their unique key KKK. These permutations ensure that the order of storing data blocks C1,C2,…,Cn differs for each farmer. The encoded data is then stored by the farmer.
- Generation of data tags:
For each data block, the farmer generates a tag Ti using a hash function dependent on the original data block Di and the unique key K:
These tags are used to prove that the farmer is storing the data correctly and in the correct order.
- Proof-of-Replication audit:
During the audit, the network requests several data blocks and their tags from the farmer. For example, to verify data blocks Di,Dj, the farmer must provide not only the data blocks but also their tags Ti,Tj and proofs that these tags correspond to the unique key of the farmer’s node.
- Verification of Proof-of-Replication:
The network verifies the correctness of the provided data and tags using the following verification function:
If the verification is successful, it proves that the farmer is storing unique data replicas that have been correctly encoded.
4.4.4 Advantages of Proof-of-Replication
- Elimination of duplication:
Thanks to PoR, farmers cannot store identical copies of data with other farmers. Each data replica is unique, making it impossible to use a shared database to circumvent storage requirements.
- High level of security:
The uniqueness of each data replica significantly increases storage security. Even if farmers attempt to compress or regenerate data, they must go through all the stages of encoding and data permutation, making such an attempt unprofitable.
- Efficient verification:
The PoR mechanism does not require downloading all the data for verification. Instead, the network can request a few data tags and their proofs from the farmer, significantly reducing the load on the network and participants.
4.4.5 Example of Proof-of-Replication Algorithm Operation
Consider a scenario where a farmer stores a unique copy of the data D1,D2,…,Dn, encoded using pseudorandom permutation. The network initiates an audit to verify blocks D3 and D7. The farmer must provide not only these data blocks but also the corresponding tags T3 and T7, which were generated using their unique key K.
The network checks that the tags T3 and T7 are correct and correspond to the hash functions H(K,D3) and H(K,D7). If the verification is successful, the farmer confirms that they are storing unique data.
5. Algorithms and Mathematical Models of Hourglass Schemes
5.1 Algorithm for Generating Pseudorandom Permutations
Pseudorandom permutations (PRP) play a key role in Hourglass Schemes to ensure the uniqueness of the data stored by each farmer. These permutations create individual versions of data archives for each network participant, ensuring that data duplication is impossible and preventing fraud. The main goal of the PRP algorithm is to transform the original data into a unique but verifiable storage order, determined by the farmer’s secret key.
5.1.1. Key Principles of the Pseudorandom Permutation Generation Algorithm
- Individuality of replicas:
Each farmer is assigned a unique key K, which is used to generate data permutations. This key makes the farmer’s data replica unique, as no other farmer can reproduce the same storage order without that key.
- Verifiability:
Although the permutations are pseudorandom, they are verifiable. This means that anyone can verify the correctness of the data permutation with the key K, but no one other than the farmer can reproduce it without the key.
- Prevention of duplication:
The use of a unique key to generate the permutation prevents data duplication among farmers. Even if they store the same blockchain history, the data storage order will differ, preventing shared use of the same dataset.
5.1.2 Description of the Pseudorandom Permutation Generation Algorithm
The algorithm for generating pseudorandom permutations can be described as follows:
- Data initialization:
Let’s assume we have a dataset D1,D2,…,Dn, representing blockchain history data blocks that need to be encoded and permuted for storage.
- Generation of a unique key K:
The farmer generates a unique cryptographic key K, which will be used to generate the permutation. This key can be the result of an encryption or hashing process and is only known to the farmer. In some cases, this key can be linked to the farmer’s identifier.
- Pseudorandom permutation:
For each data block Di, the farmer applies a pseudorandom permutation dependent on the key K. The permutation can be computed using a cryptographic hash function or a PRP algorithm such as AES in permutation mode. The formula to calculate the index π(i) can be as follows:
where H is a cryptographic hash function, K∥i is the concatenation of the key and the data index, and n is the total number of data blocks. The result is a new position for the data block Di, determined by the permutation.
- Encoded data:
After applying the permutation, each data block Di is moved to a new position π(i) and encoded. Encoding can be performed using hashing or encryption. The formula for the encoded data block is:
Thus, the data block Di is encoded and moved to a new position Cπ(i), according to the result of the permutation.
- Data storage:
The encoded data C1,C2,…,Cn is stored by the farmer on the disk in a new order, determined by the permutation. This data represents unique archived copies, which cannot be restored without knowing the key K and the sequence of permutations.
5.1.3 Mathematical Model of the Permutation
The mathematical model of a pseudorandom permutation can be formalized using the PRP function. Let π(K) be the pseudorandom permutation function that maps the indices of the original data blocks D1,D2,…,Dn to new storage indices:
The permutation is deterministic for each key K, but pseudorandom from the perspective of an observer without the key. Each farmer node uses a unique key K, leading to different permutation results for the same data across different farmers.
Example of the Permutation Generation Algorithm
Suppose a farmer has 5 data blocks D1,D2,D3,D4,D5, and they must encode and store them using the unique key K. The farmer applies the hash function H to each data block:
As a result, each data block is placed in a new location, and the farmer then applies encoding to each, storing them in a unique encrypted form.
5.2 Formal Proofs of the Correctness of Hourglass Schemes
Hourglass Schemes in the Autonomys Network ensure the security of data storage through a combination of slow encoding, pseudorandom permutations, and audit mechanisms. To guarantee that these schemes operate correctly and are resistant to fraud, formal proofs must be provided using mathematical models and cryptographic principles. These proofs confirm that farmers cannot compress, restore, or alter data without being detected during an audit.
5.2.1 Key Components of the Proof of Correctness:
- Asymmetric computation between encoding and decoding processes ensures that farmers cannot dynamically restore data “on-the-fly” during an audit request.
- Pseudorandom permutations guarantee the uniqueness of stored data, eliminating the possibility of duplication between different farmers.
- Slow encoding protects data from compression, increasing computational complexity for those trying to dynamically restore data.
- Cryptographic proofs provide reliable methods for verifying the integrity and authenticity of stored data during audits.
Proof 1: Asymmetry Between Encoding and Decoding Data
The primary goal of slow encoding in Hourglass Schemes is to create a time gap between the encoding process and the ability to decode data, preventing dynamic restoration of data during an audit. Formally, the correctness of this mechanism can be demonstrated through the following inequality:
where Tencode is the time required to perform the full encoding process, and Tdecode is the time required for quick data decoding.
To prove this inequality, assume that data encoding is performed using a cryptographic hash function or other mathematically complex operation with N-times iterations. For example:
Each hashing iteration requires time TH, and the total data encoding time can be estimated as:
On the other hand, decoding the data, i.e., verifying the hashed data or simply transmitting the encoded block Ci, takes significantly less time:
where Ttransmit is the time to transmit or provide already encoded data, which is much shorter than Tencode. Therefore, if N is sufficiently large, the inequality holds, and farmers cannot re-encode the data during an audit, ensuring data security.
Proof 2: Uniqueness of Storage Through Pseudorandom Permutations
To prevent data duplication among farmers, Hourglass Schemes use pseudorandom permutations dependent on each farmer’s unique key K. The proof that farmers store unique data replicas lies in the fact that for each farmer, using their own key K, the data permutation π(K) will be unique. Formally, this can be expressed as:
Let π(K) be the pseudorandom permutation that maps the indices of the original data to new storage indices. If farmers have different keys K1 and K2, their data permutations will differ, and a farmer with key K1 cannot reproduce the permutation of K2, as K2 is unknown.
Thus, each data replica is unique, and farmers cannot duplicate or jointly store data, protecting the system from compression and fraud. This property is guaranteed by the cryptographic robustness of the PRP function, which prevents the prediction or reproduction of the permutation without knowledge of the key.
Proof 3: Protection From Data Compression
Slow encoding in Hourglass Schemes also prevents farmers from using data compression to save space. To compress data, a farmer would need to re-execute the encoding process for each audit request, which would take much longer than the time allowed to respond to an audit request Taudit.
Let Tcompress be the time required to compress and restore data on request. If a farmer chooses not to store full data but uses a compressed version, they would first need to decompress the data and then perform the encoding process for presentation during the audit. The total time required to perform these operations will be:
For the scheme to be secure, this time must exceed the time allowed for the audit:
This equation guarantees that farmers will not be able to use data compression and will be forced to store data in an encoded form to pass audits successfully.
Proof 4: Correctness of Audits and Verification
The correctness of the audit mechanism in Hourglass Schemes relies on cryptographic proofs such as Proof-of-Replication (PoR) or Proof-of-Storage (PoS). For each farmer node undergoing an audit, the network randomly requests several data blocks and verifies them using cryptographic tags TiT_iTi that were generated during the encoding process.
Verification is carried out as follows:
The farmer provides the data block Di and the tag Ti, which was generated using the hash function during the encoding process:
The network checks the hash of the data against the tag:
If the check is successful, the farmer confirms that they store the data correctly. If the data does not match the tag, the farmer fails the audit, and sanctions may be imposed.
5.3 Time Constraints and Their Impact on Security
Time constraints in Hourglass Schemes play a central role in ensuring the security of the Autonomys Network. These constraints ensure reliable data storage audits and prevent manipulation attempts such as on-the-fly data restoration or the use of compressed data. The mechanism of time constraints establishes strict time limits for farmers, who must provide proof of data storage in response to an audit request within the set time frame. If a farmer fails to respond within the allotted time, it indicates that they are likely attempting to restore the data dynamically or manipulate the storage process.
5.3.1 Key Principles of Time Constraints
- Response Time Limitation:
The time allocated to a farmer to respond to an audit request (audit time) is strictly limited. This time must be long enough for farmers who are storing data correctly to quickly provide proof of storage but short enough to prevent those attempting to restore data dynamically from responding.
- Data Recovery Complexity:
For a farmer trying to deceive the system, re-encoding or compressing the data will take time. The time constraints ensure that the farmer cannot perform these operations within the allotted audit time.
- Asymmetry Between Encoding and Auditing:
As discussed earlier, the process of encoding data is significantly more complex and time-consuming than decoding or providing data. Time constraints are set in such a way that a farmer cannot re-encode the data within the audit window.
5.3.2 Mathematical Model of Time Constraints
To ensure the effectiveness of time constraints, an inequality must be established that guarantees a farmer not storing data correctly cannot complete re-encoding or restoration within the audit period. Let:
- Tencode be the time required to encode the data;
- Trestore be the time required to restore the data (including re-encoding or decompression);
- Taudit be the time allotted for the farmer to respond during the audit.
For security, the time constraints must satisfy the following condition:
This ensures that the time needed for a farmer to restore or re-encode data is significantly greater than the audit time. Meanwhile, farmers who store data correctly can provide it within Taudit without the need for re-encoding.
5.3.3 Impact of Time Constraints on Security
- Preventing On-the-Fly Data Recovery:
One of the key aspects of security is protection against attempts to compress data and restore it upon request. Time constraints ensure that a farmer who chooses to compress data cannot decompress and re-encode it in time for the audit. Formally:
This equation confirms that a farmer cannot compress data, as restoring it would take significantly longer than the time allotted for the audit.
- Protection from Encoding Manipulation:
If a farmer tries to avoid storing complete data and instead encode it on-the-fly, they will need to perform full data encoding during the audit request. However, time constraints ensure that the farmer cannot perform encoding quickly enough to meet the audit time:
This inequality means that a farmer who does not store encoded data cannot pass the audit.
- No Security Compromises:
Time constraints eliminate any compromises in security. If a farmer attempts to use partially stored or compressed data, the time required to restore it would be too long, making such manipulations computationally and economically unfeasible. As a result, time constraints incentivize farmers to store data in full form.
5.3.4 Example of Time Constraint Impact
Suppose the audit time is Taudit = 10 seconds. A farmer storing data correctly can provide the encoded data block and proof of storage within this time, as the process of decoding and providing proof takes much less time. Let the decoding time be Tdecode = 1 second, which is significantly less than Taudit.
However, if the farmer decides to compress data or use dynamic recovery, they would need to decompress and re-encode the data. Let this take Tcompress + Tencode = 30 seconds. Clearly, the farmer cannot restore the data and provide it in time, resulting in audit failure.
5.3.5 Balancing Security and Efficiency
Time constraints also play a role in optimizing security without excessively burdening farmers. The audit time must be chosen carefully: short enough to prevent manipulation but long enough to allow correct data storage to be provided by honest farmers. This is achieved by fine-tuning the time frames, considering farmers’ computational capabilities and the complexity of the data.
5.4 Examples of Calculations for System Delay Checks
To assess delays in the system and ensure that farmers cannot dynamically restore data or manipulate the coding process during audits, Hourglass Schemes use calculations based on time constraints. These calculations help determine how long it takes to perform various operations, such as encoding, decoding, compression, and data transmission. The assessment of delays demonstrates that farmers must store complete data; otherwise, they cannot pass the audit.
5.4.1 Key Components of Delays
- Encoding Time (𝑇encode):
This is the time required for the farmer to perform the full process of encoding the data. It includes operations like hashing, permutations, and encrypting the data. 𝑇encode is typically the largest delay in the system, as the data encoding process is intentionally slow to prevent the possibility of quickly restoring data during an audit.
- Decoding Time (𝑇decode):
Decoding time refers to the time required to restore the original data from the encoded form. It should be significantly shorter than encoding time, allowing farmers to quickly provide proof of storage during audits.
- Transmission Time (𝑇transmit):
This is the time needed to send data or proof to the network after decoding it. It depends on the data size and network bandwidth.
- Restoration Time (𝑇restore):
If a farmer tries to compress data or dynamically restore it during an audit request, they will need to decompress and re-encode it. This time includes operations related to restoring compressed data and re-encoding it.
5.4.2 Examples
Example 1: Encoding Time Check
Assume a farmer needs to encode a 1 MB data block using delayed hashing with 𝑁 iterations of a hash function 𝐻. The time for one iteration of hashing, 𝑇𝐻, is 10 ms.
The total encoding time for the data can be calculated as:
If 𝑁 = 1000 iterations of hashing, then:
Thus, the time required to encode one data block is 10 seconds.
Example 2: Decoding and Transmission Time Check
After the data has been encoded, the farmer must provide it when an audit request is made. Suppose the farmer needs to send a 1 MB data block over a network with a bandwidth of 1 MB/s. The transmission time for the data is calculated as:
The decoding time, 𝑇decode, is generally much smaller than the encoding time. For instance, if the farmer simply provides the encoded data without the need for re-encoding, the decoding time can be considered negligible compared to encoding.
Thus, the total decoding and transmission time is:
Example 3: Data Restoration Check
Now, consider a situation where the farmer tries to compress data and restore it “on-the-fly.” Assume the farmer compresses data with a 2x compression ratio, reducing the size from 1 MB to 500 KB. The time to decompress the data is 2 seconds, and then the farmer must perform the full encoding process.
The total restoration time is:
In this case, the farmer would need 12 seconds to restore and encode the data, which is much longer than the time allocated for the audit. If the audit time is 𝑇audit = 5 seconds, the farmer will not be able to restore the data in time and will fail the audit.
Example 4: Time Constraint Check
Suppose the audit time set by the Autonomys Network is 𝑇audit = 5 seconds. This time should be sufficient for an honest farmer to decode and transmit the data but too short for a farmer who attempts to restore or re-encode the data.
- For the honest farmer, the response time 𝑇response = 1 second, which is less than the audit time, so the farmer successfully passes the audit.
- For the farmer attempting to compress and restore data, 𝑇restore = 12 seconds, which exceeds the audit time, so the farmer cannot restore the data in time.
Thus, time constraints effectively prevent fraud and ensure that farmers store data properly.
5.5 Compression Prevention Algorithm and Response Time Optimization
One of the primary threats to data storage systems like Hourglass Schemes in the Autonomys Network is farmers’ attempts to use data compression to reduce storage volumes and then dynamically restore the data during an audit. To prevent this, the protocol implements an algorithm that not only eliminates data compression but also optimizes the farmer’s response time to audit requests. This algorithm ensures that farmers store the data in full and cannot manipulate the data recovery process to bypass the system.
5.5.1 Key Principles of the Compression Prevention Algorithm
- Direct Access to Data:
The algorithm assumes that data should be directly accessible to farmers without requiring restoration or decompression. Otherwise, the farmer will not be able to provide the data within the audit time Taudit.
- Restriction on Data Restoration:
The time required for data restoration or decompression far exceeds the audit time. This makes it unprofitable for farmers to compress data since they will not meet the time limits for the audit.
- Cryptographic Data Binding:
The algorithm ensures cryptographically bound data, making it impossible to compress without significant information loss. This is achieved using hash functions, pseudorandom permutations, and other cryptographic mechanisms that complicate compression.
- Response Time Optimization:
For farmers who store data correctly, the algorithm optimizes the time for decoding and data transmission to ensure the fastest possible response during an audit.
5.5.2 The Data Compression Prevention Algorithm
The compression prevention algorithm in Hourglass Schemes can be divided into the following steps:
- Initialization of Cryptographically Bound Data:
The farmer receives data D1,D2,…,Dn that needs to be archived and encoded. Before saving to disk, each data block undergoes a hashing process using a cryptographic hash function H.
For each data block Di, a hash is computed:
These hashed data are stored by the farmer and become inaccessible for compression since any change or compression will break the hash structure.
- Pseudorandom Permutations:
Each farmer’s data is also subjected to a pseudorandom permutation based on a unique key K, preventing common or identical versions of the data, which makes compression difficult.
The permutation of indices π(i) is calculated as:
This guarantees the uniqueness of the data structure for each farmer, excluding compression.
- Response Time Optimization:
To help farmers pass audits efficiently, the algorithm focuses on minimizing the time for data decoding and transmission. Data is stored in a form suitable for quick delivery. The decoding time must be minimal, and data transmission should match network bandwidth.
Decoding is performed quickly, as it consists only of reading and providing pre-archived data.
Transmission time depends on network speed and block size. Farmers can optimize transmission by choosing appropriately sized data blocks to speed up audit responses.
- Protection Against Data Restoration: The algorithm sets a time limit on data recovery. If a farmer decides to compress the data to save space, they will not be able to restore it in time for a successful audit.
The time for data decompression Tcompress plus the time for recoding Tencode will always exceed the audit time Taudit:
5.5.3 Mathematical Model of the Algorithm
Mathematically, the compression prevention and response time optimization algorithm can be described as follows:
- The time required for farmers to provide data in response to an audit request:
where Tdecode is the time to decode data, and Ttransmit is the time to transmit data over the network.
- The time required for a farmer using compression to restore data:
where Tcompress is the time for data decompression, and Tencode is the time for data recoding.
For system security, the inequality must hold:
Thus, a farmer who stores data correctly will quickly pass the audit, while a farmer attempting to use compression will not be able to restore the data in time.
Example of Algorithm Operation
Suppose a farmer stores data totaling 1 MB. The audit time is set to Taudit==5 seconds. An honest farmer, storing the data in full, can decode it in Tdecode==1 second and transmit the data in Ttransmit=1 second. Thus, the total response time will be:
A farmer who decides to compress the data to 500 KB will spend Tcompress=2 seconds decompressing the data and another Tencode=5 seconds recoding it, resulting in:
The farmer will not be able to restore the data in time and will fail the audit since:
6. Conclusion
6.1 Key Achievements of Autonomys in Solving the Farmer’s Dilemma
One of the key challenges faced by decentralized data storage networks is the so-called Farmer’s Dilemma. This dilemma occurs when participants (farmers), who provide their storage space, may be tempted to minimize costs by compromising on data storage — such as through compression, on-the-fly recovery, or duplication. The Autonomys Network, through the use of Hourglass Schemes, effectively solves this dilemma by ensuring long-term data storage, preventing compression, and guaranteeing unique copies of data for each farmer.
6.1.1 Key Achievements of Autonomys in Addressing the Farmer’s Dilemma:
- Protection Against Data Compression and Recovery:
One of the most important achievements of Autonomys is the introduction of a slow encoding mechanism that eliminates the possibility of data compression and on-the-fly recovery. This is achieved by making the data encoding process significantly longer than the allowed audit time. The time constraints imposed on farmers make dynamic data recovery impossible, forcing farmers to store data in full.
- Unique Data Copies for Each Farmer:
By using pseudorandom permutations and unique cryptographic keys, each farmer in the Autonomys network stores a unique replica of the data. This approach eliminates the possibility of data duplication between farmers and prevents fraud attempts, such as multiple farmers sharing the same dataset. Unlike other networks, farmers cannot simply copy data from one another to pass audits.
- Proofs-of-Archival-Storage (PoAS):
Autonomys employs the Proofs-of-Archival-Storage (PoAS) mechanism, which requires farmers to store not only the current state of the blockchain but also its entire history. This is a key achievement in solving the Farmer’s Dilemma, as farmers must store the entire history of the network, ensuring long-term data integrity and accessibility. Unlike other solutions like Chia or Filecoin, which focus on proving space allocation or user data storage, Autonomys guarantees the preservation of the entire blockchain history, enhancing the network’s reliability.
- Energy Efficiency and Low Costs:
Autonomys has made significant progress in solving the Farmer’s Dilemma with minimal energy and resource costs. Using disk space instead of powerful computational resources, as in Proof of Work systems, keeps the network energy-efficient. Farmers can use their available disk space without needing to invest in expensive hardware or bear high electricity costs, making participation in the network more accessible.
- High Fraud Resistance:
Autonomys provides a high level of protection against farmer fraud. Through regular audits and the use of cryptographic proofs, the system can quickly identify farmers attempting to manipulate the storage process. Farmers who fail to provide proof of data storage are excluded from the consensus, ensuring system reliability and preventing fraud.
- Scalability and Durability:
The Autonomys Network scales with the growth of both farmers and data volumes. Solving the Farmer’s Dilemma within Autonomys is achieved not only through data storage mechanisms but also through scalable infrastructure to support audits and data verification. This allows for long-term data storage and ensures easy synchronization of new nodes.
- Economic Incentives for Honest Farmers:
Honest farmers who store data in compliance with the protocol receive economic incentives through rewards for participating in the consensus and successfully passing audits. While farmers who attempt to minimize their costs by compromising on data storage inevitably face audit failures and loss of rewards, honest participants receive steady payments, making the system economically balanced.
6.2 Advantages of Using Hourglass Schemes for Security and Decentralization
The Hourglass Schemes implemented in the Autonomys Network represent an innovative technology that enhances both the security of the network and its decentralization. These schemes offer effective methods to protect data from compression, duplication, and fraud, making them a vital part of the Autonomys ecosystem. In this subsection, we will explore the main advantages of Hourglass Schemes in terms of security and decentralization.
6.2.1 Protection Against Data Compression and Recovery
One of the key advantages of Hourglass Schemes is their ability to prevent farmers from compressing data and dynamically recovering it during an audit. Through the slow encoding mechanism, data is encoded in such a way that its recovery requires significantly more time than is allocated for an audit. This prevents farmers from cheating and makes any attempt to compress data unprofitable.
- High Time Costs for Re-Encoding:
The time required to encode data in Hourglass Schemes significantly exceeds the time allocated for an audit. This ensures that farmers who do not store the full data set will not be able to re-encode the data in time to pass the audit.
- Guaranteed Data Integrity:
Since the data must be stored in fully archived form, farmers cannot bypass the system by minimizing storage costs.
6.2.2 Unique Data and Protection Against Duplication
Hourglass Schemes use a mechanism of pseudorandom permutations, which guarantees that each farmer stores unique copies of the data. This protects the network from attempts by farmers to duplicate data from one another or jointly store the same copy of the data.
- Cryptographically Strong Permutations:
Each copy of the data is unique for each farmer due to the use of cryptographically strong pseudorandom permutations. Even if two farmers receive the same portion of data, their storage setups will differ after applying the permutation.
- Prevention of Duplication:
Unique keys for each farmer eliminate the possibility of data duplication, making it impossible for multiple farmers to share the same data set. This increases the level of decentralization and the reliability of data storage.
6.2.3 Secure and Regular Audits
Hourglass Schemes provide regular and secure audits using cryptographic proofs, such as Proof-of-Replication (PoR) and Proofs-of-Archival-Storage (PoAS). Auditors can request proof of data storage from farmers at random intervals, making it difficult for farmers to predict audit requests and dynamically recover data.
- Regular Audits to Verify Storage:
The Autonomys Network conducts regular audits by requesting random data blocks from farmers. This ensures that data is truly stored in full and not dynamically recovered during an audit.
- Unpredictability of Audit Requests:
Since audit requests are made randomly, farmers cannot predict which data will be requested. This makes it impossible to temporarily recover certain portions of data just to pass an audit.
6.2.4 Support for Network Decentralization
Hourglass Schemes promote network decentralization by preventing the centralization of data storage and encouraging the participation of a wide range of farmers. Each farmer stores their own unique copy of the data, which makes data storage distributed throughout the network. Additionally, the slow encoding and pseudorandom permutation mechanisms make network participation accessible and less resource-intensive.
- Lack of Centralized Storage:
Each farmer stores unique data that cannot be duplicated with other farmers. This eliminates the risk of data storage centralization and maintains the distributed architecture of the network.
- Accessibility for Participants:
Unlike more resource-intensive systems like Proof-of-Work, Autonomys allows a wide range of farmers to participate with minimal hardware requirements. This increases decentralization and the number of participants, making the network more resilient.
6.2.5 Durability and Availability of Data
The Proofs-of-Archival-Storage mechanism in Hourglass Schemes ensures the long-term storage of the entire blockchain history. Unlike other protocols where farmers may store only part of the data or only the current data, in Autonomys, every farmer is required to store the complete copy of the network’s history.
- Complete Blockchain History:
Unlike solutions that focus on storing the current state of the network (such as Chia or Filecoin), Autonomys requires storing the entire history, which is important for the longevity and auditability of the entire network. This makes the network resistant to long-term changes and ensures that data will always be available.
- Guaranteed Synchronization:
Full data storage ensures that new nodes can easily synchronize with the network by downloading the complete blockchain history, which enhances the network’s stability and availability.
6.2.6 Economic Efficiency and Sustainability
The use of Hourglass Schemes makes Autonomys an economically efficient solution for network participants. Farmers receive rewards for honest data storage, and the system requires minimal computational resources to maintain consensus and audits. Unlike solutions like Proof-of-Work, which require high costs for equipment and electricity, Autonomys can scale with minimal energy consumption.
- Lower Hardware Requirements:
Unlike mining in PoW networks, participation in Autonomys does not require powerful equipment, reducing barriers to entry and supporting decentralization.
- Energy Efficiency:
Slow encoding and pseudorandom permutations use disk space and minimal computational power, making the system more energy-efficient and environmentally sustainable compared to other approaches.