Trust and Privacy
Private and Secure AI Inference Compute
In a permissionless network, two critical challenges arise during AI inference: ensuring output accuracy and protecting the privacy of both user data and model weights.
Trust in Output Accuracy
When requesting inference using a large model like Llama3.1-405B (405 billion parameters), how can we ensure that the network isn’t actually using a smaller model like Llama3.1-8B (8 billion parameters), or a highly quantized version of the original model? **Verifiable AI** inference ensures that the generated output is correctly derived from the specified model and inputs.
This challenge is similar to what Web3 users encounter with blockchain technologies, which often rely on high compute replication to ensure transaction integrity. Blockchains typically have hundreds or thousands of validators, with each transaction executed by a supermajority of nodes (usually two-thirds of the network).
However, while blockchain transactions are typically limited to a few kilobytes, modern AI models require gigabytes of memory. This makes it infeasible to run AI inference on-chain due to the high cost of replication, necessitating alternative verification methods.
Privacy Concerns
Equally important is the need to protect the privacy of both user data and model weights throughout the inference process. This includes:
- Data in Transit: Ensuring that user inputs and model outputs are encrypted during transmission between the user and the network nodes.
- Data at Rest: Protecting stored user data, ideally in encrypted format, and model weights from unauthorized access when not in use.
- Data During Execution: Safeguarding user inputs and model weights while they are processed in memory, through the use of secure enclaves (TEEs).
Privacy preservation is crucial for several reasons:
- User Trust: Users need assurance that their potentially sensitive inputs won’t be exposed or misused.
- Intellectual Property Protection: Model owners must be confident that their valuable AI models won’t be reverse-engineered or stolen.
- Regulatory Compliance: Many jurisdictions have strict data protection laws that require safeguarding personal information.
Addressing both trust and privacy concerns is essential for building a robust, secure, and widely adopted AI compute network. The following sections explore various approaches to tackling these challenges, including Atoma’s novel Sampling Consensus protocol and the use of TEEs on the Atoma Network.
Overview of Common Verifiable Computation Algorithms
Byzantine Fault Tolerant Protocols
Byzantine Fault Tolerant (BFT) consensus refers to a class of algorithms used in distributed systems to agree on the state of a global ledger modified by a given transaction. These protocols ensure consensus among nodes even if a proportion of the nodes in the system are unreliable or malicious. BFT consensus guarantees that the system as a whole can still make reliable and unanimous decisions, even in the presence of faulty or malicious actors.
Key Characteristics of BFT Protocols:
- Typically require agreement from a supermajority (usually around 2/3) of nodes.
- Provide strong consistency and finality guarantees.
- Can tolerate up to f faulty nodes in a system of 3f + 1 total nodes.
Challenges in Applying BFT to AI Inference:
- Computational Overhead: Each node must independently perform the same computations, leading to significant redundancy.
- Communication Complexity: Multiple rounds of communication are required among a supermajority of nodes, increasing latency.
- Scalability Issues: As the number of nodes increases, the communication overhead grows quadratically.
- State Explosion: AI compute pipelines generate large amounts of intermediate state data, exacerbating the above issues.
While BFT protocols offer strong security guarantees, their application to AI inference faces significant challenges in terms of computational power, latency, and scalability. These limitations make BFT less suitable for large-scale, high-performance AI inference tasks.
Zero Knowledge Machine Learning (ZKML)
Zero Knowledge Machine Learning (ZKML) applies zero knowledge proofs (ZKPs) to AI and machine learning models. This approach ensures the correctness of machine learning model outputs while maintaining model confidentiality (e.g., weights and architecture).
Key Features of ZKML:
- Model Privacy: Owners can keep models self-hosted without disclosing internal details.
- Verifiable Correctness: ZK proofs provide near-absolute certainty of correct execution.
- Non-interactive Proofs: Once generated, proofs can be verified quickly without further interaction.
Challenges and Limitations:
- Computational Overhead: Generating ZK proofs for AI models incurs significant computational costs, often orders of magnitude higher than native execution.
- Complexity: Compiling neural networks into ZK circuits is extremely complex and resource-intensive.
- Scalability Issues: Current ZKML techniques struggle with large models, particularly LLMs.
- Lack of Practical Implementations: As of now, there are no deployments of ZK proof generation for large language models.
- Lack of Support for Floating-Point Arithmetic: Current ZK proving systems are not tailored to proving native floating-point arithmetic operations, making them unsuitable for proving highly optimized AI workflows.
While ZKML can potentially offer verifiability guarantees, its current state makes it impractical for most real-world AI inference tasks due to performance and scalability limitations. However, ongoing research in this field may lead to more efficient implementations in the future.
Optimistic Machine Learning (OpML)
Optimistic Machine Learning (OpML) adopts an “optimistic” approach to verification, assuming computations are correct unless proven otherwise. This method aims to balance efficiency with security by avoiding immediate verification costs.
Key Concepts of OpML:
- Assume-valid Execution: Initial computations are accepted without immediate verification.
- Challenge Periods: Time windows during which participants can contest the validity of computations.
- Dispute Resolution: Mechanisms to correct errors when valid challenges are raised.
Advantages:
- Reduced Upfront Costs: Avoiding immediate verification improves efficiency for most transactions.
- Scalability: Can handle a higher throughput of computations compared to always-verify approaches.
Challenges and Limitations:
- Delayed Finality: Challenge periods introduce delays in confirming the final state of computations.
- Inconsistent States: The possibility of successful challenges can lead to state reversions and inconsistencies.
- Complex Incentives: Verifiers must compute results to check validity but are only rewarded for successful challenges.
- Security Trade-offs: Relies on economic incentives and the assumption that malicious behavior will be caught and punished.
Practical Considerations:
- Suitable for scenarios where immediate finality is less critical and the cost of occasional reversions is acceptable.
- May require careful tuning of challenge periods and incentive structures to balance security and efficiency.
- Can be combined with other techniques (e.g., bonding or staking) to enhance security guarantees.
While OpML offers a pragmatic approach to verification in distributed systems, its reliance on challenge periods and potential for state reversions make it suboptimal for applications requiring high assurance or immediate finality in AI inference tasks.
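As an illustration of the assume-valid pattern, the sketch below (ours, with hypothetical names; real OpML systems differ in the details) models a result that only becomes final once its challenge window closes without a successful dispute:

```python
import time
from dataclasses import dataclass

@dataclass
class OptimisticResult:
    output: str
    submitted_at: float            # epoch seconds
    challenge_window_secs: float
    disputed: bool = False

    def is_final(self) -> bool:
        """Optimistically accepted: final only after the challenge window
        elapses with no successful dispute."""
        window_closed = time.time() >= self.submitted_at + self.challenge_window_secs
        return window_closed and not self.disputed

def challenge(result: OptimisticResult, recomputed_output: str) -> bool:
    """A verifier re-executes the computation; a mismatch raises a dispute,
    triggering dispute resolution and a possible state reversion."""
    if recomputed_output != result.output:
        result.disputed = True
    return result.disputed

# Usage: a freshly submitted result is not final until its window has passed.
r = OptimisticResult("y = f(x)", submitted_at=time.time(), challenge_window_secs=3600)
print(r.is_final())                 # False: window still open
print(challenge(r, "y = f(x)"))     # False: recomputation matches, no dispute
```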
Atoma’s Sampling Consensus
Atoma has developed a novel sampling consensus protocol that addresses the limitations of existing approaches while providing a robust, efficient, and scalable solution for verifiable AI compute. Here’s how it operates and why it outperforms alternatives:
- Random Node Selection: For each request, the Atoma Network protocol selects a small number of nodes uniformly at random. This approach:
  - Ensures load balancing across the network, unlike BFT, which requires a supermajority of nodes.
  - Improves scalability compared to BFT, as it doesn’t require multiple rounds of communication among a large number of nodes.
  - Accelerates finality compared to OpML, which relies on lengthy challenge periods.
- Deterministic Execution: The protocol selects nodes with identical hardware and software specifications, ensuring:
  - Consistent outputs across nodes, even for floating-point arithmetic-heavy workloads like AI inference.
  - Verifiable compute without the significant computational overhead of ZKML proofs.
- Incentive Structure: Nodes must execute requests or risk collateral slashing, which:
  - Encourages honesty more effectively than OpML, where verifiers are rewarded only for successful disputes.
  - Provides a clear economic incentive for participation, unlike BFT, where all nodes must process all transactions.
- Consensus Mechanism:
  - Honest nodes will always disagree with dishonest ones, ensuring high integrity.
  - Consensus occurs only if all selected nodes execute the computation honestly.
  - This approach is more efficient than BFT, which requires agreement from a supermajority of nodes.
- Timeout Handling: Each request has a specified timeout period, with collateral slashing for non-responsive nodes. This ensures:
  - Timely execution of requests, unlike OpML, which can have long challenge periods.
  - A self-regulating network that penalizes underperforming nodes.
- Near-Native Finality:
  - The node completing computation first can immediately share the output with the user.
  - The final verifiability attestation is provided when the slowest node commits.
  - This approach offers much faster finality than OpML and is more efficient than BFT’s multiple rounds of communication.
- Flexible Verifiability: Unlike ZKML, which always incurs a high overhead, Atoma’s approach allows for:
  - Adjustable levels of verifiability specific to the use case.
  - Cost-effective solutions for low to medium security needs.
  - High-security options for critical applications without the substantial overhead of ZKML.
- Scalability: The protocol scales well with network growth, unlike BFT, which becomes slower and more communication-intensive as the network expands.
- Cost-Effectiveness: By selecting a small number of nodes, Atoma’s approach is more cost-effective than full replication (BFT) or the high computational costs of ZKML.
- Adaptability: The protocol can be fine-tuned by adjusting the number of selected nodes, allowing for a balance between security and efficiency that other approaches lack.
In summary, Atoma’s Sampling Consensus combines the best aspects of existing solutions while mitigating their drawbacks. It offers the verifiability of BFT without its scalability issues, the efficiency of native execution without sacrificing security, and the flexibility to cater to various security needs without the substantial overheads of ZKML or the finality delays of OpML.
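The following sketch (our simplification; node identities, execution, and slashing are placeholder abstractions) illustrates the core loop of Sampling Consensus: sample a quorum uniformly at random, collect output commitments, and finalize only on unanimous agreement:

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Node:
    id: str
    execute: Callable[[str], bytes]  # deterministic inference function

def commit(output: bytes) -> str:
    """Nodes publish hash commitments of their outputs rather than raw bytes."""
    return hashlib.sha256(output).hexdigest()

def sampling_consensus(nodes: list[Node], request: str, quorum_size: int,
                       rng: random.Random) -> dict:
    # 1. Select a small quorum uniformly at random: this load-balances the
    #    network and keeps selection unpredictable to would-be colluders.
    quorum = rng.sample(nodes, quorum_size)

    # 2. Every selected node executes the request; identical hardware and
    #    software make outputs bit-identical, so commitments must match.
    commitments = {n.id: commit(n.execute(request)) for n in quorum}

    # 3. Finality requires unanimous agreement within the quorum.
    if len(set(commitments.values())) == 1:
        return {"status": "final", "commitment": next(iter(commitments.values()))}

    # 4. Any disagreement opens a dispute; dishonest nodes risk slashing.
    return {"status": "dispute", "commitments": commitments}

# Example: honest nodes running the same deterministic function always agree.
honest = [Node(f"node-{i}", lambda req: f"out:{req}".encode()) for i in range(10)]
print(sampling_consensus(honest, "prompt", quorum_size=3, rng=random.Random(42)))
```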
Probability Considerations on Atoma’s Sampling Consensus
Atoma’s protocol enforces deterministic execution and randomly selects nodes to replicate compute. The number of sampled nodes directly determines the level of output integrity.

Consider a scenario where a dishonest participant controls a fraction $r$ of the network, and each request is replicated across a quorum of $N + 1$ randomly selected nodes. Since honest nodes always disagree with tampered outputs, a tampered result can only reach consensus if every selected node is dishonest. The probability of this happening is:

$$P_{\text{tamper}} = r^{N+1}$$

For example (illustrative values):

- With $N + 1 = 2$ nodes and $r = 10\%$: $P_{\text{tamper}} = 1\%$.
- With $N + 1 = 3$ nodes and $r = 10\%$: $P_{\text{tamper}} = 0.1\%$.

This demonstrates that even a small set of randomly selected nodes can provide high trust guarantees against output tampering. However, achieving this level of trust requires users to pay $N + 1$ times the native cost of compute.
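The decay is easy to check numerically; the following snippet evaluates the formula above for the illustrative values used in the examples:

```python
def tamper_probability(r: float, quorum_size: int) -> float:
    """Probability that every sampled node is dishonest, i.e. that a
    tampered output can pass unanimously: P = r ** quorum_size."""
    return r ** quorum_size

for quorum_size in (2, 3, 4):
    p = tamper_probability(r=0.10, quorum_size=quorum_size)
    print(f"quorum={quorum_size}: P(tamper) = {p:.4%}")
# quorum=2: P(tamper) = 1.0000%
# quorum=3: P(tamper) = 0.1000%
# quorum=4: P(tamper) = 0.0100%
```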
Cost Considerations
- For high-value use cases, such as AI-driven trading strategies in DeFi, the additional cost is often justified by the potential returns.
- Comparison to other methods:
  - ZKML remains impractical for applications relying on LLM compute.
  - OpML relies on economic incentives for verifiers, even in the absence of disputes.
- Low to Medium Verifiability Needs: For applications like chatbots or AI NFT enhancements, the full cost of replication may be excessive.
To address these varying needs and reduce costs, we’ve introduced two improvements to our original Sampling Consensus protocol, detailed in the following sections.
Cross Validation Sampling Consensus
To address the replication costs of the original Sampling Consensus, we developed Cross Validation Sampling Consensus. This protocol offers lower costs with a slight impact on verifiability.
The process works as follows:
- A single randomly chosen node executes each incoming request.
- After the node commits its output, we select a quorum of $N$ additional nodes with probability $p$:
  - If no nodes are selected (probability $1 - p$), the initial output is considered final.
  - If nodes are selected, they provide their own output commitments; agreement with the original output finalizes the result, while disagreement triggers a dispute.
- After dispute resolution, the collateral of malicious nodes is distributed among the honest nodes in the quorum.
This approach relies on replication at rate $p$, meaning that a fraction $1 - p$ of requests are executed by a single node. Nodes cannot predict whether their output will be verified, which strengthens honesty incentives. However, this method doesn’t provide full verifiability for high-security requests and introduces additional latency compared to the original protocol.
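Here is a minimal sketch of the Cross Validation flow (ours; the Node type and dispute handling are simplified placeholders), highlighting that the executor cannot predict whether it will be checked:

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Node:
    id: str
    execute: Callable[[str], bytes]  # deterministic inference function

def cross_validation(nodes: list[Node], request: str, p: float,
                     n_verifiers: int, rng: random.Random) -> dict:
    # 1. A single randomly chosen node executes the request.
    executor = rng.choice(nodes)
    output = executor.execute(request)

    # 2. With probability 1 - p, the unverified output is final. The executor
    #    cannot predict which branch is taken, so cheating is always a gamble.
    if rng.random() >= p:
        return {"status": "final", "output": output, "verified": False}

    # 3. Otherwise, sample N verifiers; agreement finalizes, mismatch disputes.
    others = [n for n in nodes if n.id != executor.id]
    verifiers = rng.sample(others, n_verifiers)
    if all(v.execute(request) == output for v in verifiers):
        return {"status": "final", "output": output, "verified": True}
    return {"status": "dispute", "executor": executor.id}  # slashing follows
```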
Game-theoretical Analysis
Assuming rational participants, the system reaches a Nash equilibrium in honest behavior under the conditions below. Let $p$ be the replication (verification) rate and $r$ the proportion of malicious nodes in the network.

For an honest node, expected rewards are:

$$E_{\text{honest}} = R - C$$

where $R$ is the reward and $C$ is the execution cost.

For a malicious node, expected rewards are:

$$E_{\text{malicious}} = (1 - p) \cdot H + p \cdot (L - S)$$

where $H$ is the dishonest reward without verification, $L$ is the reward with verification, and $S$ is the slashed collateral.

A node is incentivized to be honest when $E_{\text{honest}} \geq E_{\text{malicious}}$, that is, when:

$$p \geq \frac{H - (R - C)}{H - L + S}$$

With realistic values ($r = 10\%$, $N = 1$, $H = R$, $L = 2R$, a reward margin of $20\%$ of $C$) and sufficiently large collateral $S$, the replication rate $p$ can be lower than $1\%$, significantly reducing verification overhead.
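Given the payoff model above, the minimum replication rate follows directly; the snippet below evaluates it for illustrative values (the collateral size is our assumption):

```python
def min_replication_rate(R: float, C: float, H: float, L: float, S: float) -> float:
    """Smallest p such that honesty dominates:
    R - C >= (1 - p) * H + p * (L - S)  =>  p >= (H - (R - C)) / (H - L + S)."""
    return (H - (R - C)) / (H - L + S)

# Illustrative values: cost C = 1, a 20% reward margin (R = 1.2), H = R,
# L = 2R, and collateral S assumed to be 150x the execution cost.
C = 1.0
R = 1.2 * C
p_min = min_replication_rate(R=R, C=C, H=R, L=2 * R, S=150 * C)
print(f"minimum replication rate p = {p_min:.3%}")  # ~0.672%, well under 1%
```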
NOTE: This protocol is best suited for low to medium security applications, such as web chat applications, content creation, and LLM-powered tools (e.g., AI-powered IDEs). However, the security of Cross Validation Sampling Consensus is insufficient for high-stakes scenarios where potential malicious rewards exceed protocol fees by orders of magnitude. In such cases, or when full verifiability is required, the original Sampling Consensus is recommended.
Note: Hyperbolic Labs independently developed and published similar ideas in a research paper.
Node Obfuscation
To address the increased time to finality in Cross Validation Sampling Consensus, we have introduced a node obfuscation mechanism. This approach leverages cryptographic techniques to enhance both efficiency and security, as follows:
- Quorum Selection: Upon request submission, N+1 nodes (where N ≥ 1) are selected to form the quorum.
- Information Isolation: Through advanced cryptographic obfuscation, each node is only aware of its own selection, with no knowledge of the other participants in the quorum.
- Parallel Execution: All selected nodes process the request concurrently, eliminating the need for a separate verification round.
- Output Comparison: A designated aggregator node collects and compares the outputs from all quorum members.
- Finality Determination: If all outputs match, the result is considered final; otherwise, a dispute resolution process is initiated.
Key Benefits:
- Reduced Latency: Eliminates one round of communication compared to the basic Cross Validation approach.
- Maintained Security: The game-theoretical analysis and security guarantees remain intact.
- Enhanced Privacy: Nodes cannot collude as easily, since they are unaware of the other quorum members.
This obfuscation mechanism combines the speed of parallel processing with the security of cross-validation, while preserving economic incentives for honest behavior in the network.
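The sketch below (ours; a real deployment would use private channels and VRF-style selection so that quorum membership is genuinely hidden from other nodes) captures the aggregator’s single-round compare-and-finalize step:

```python
import hashlib

def is_selected(node_id: str, seed: bytes, nodes: list[str], quorum_size: int) -> bool:
    """Stand-in for obfuscated quorum selection: each node derives only its
    own membership. (A hash ranking is shown for brevity; real information
    isolation requires private, VRF-style selection.)"""
    rank = sorted(nodes, key=lambda n: hashlib.sha256(seed + n.encode()).digest())
    return node_id in rank[:quorum_size]

def aggregate(outputs: dict[str, bytes]) -> dict:
    """The designated aggregator compares outputs from all quorum members in
    a single round, since execution and verification run in parallel."""
    if len(set(outputs.values())) == 1:
        return {"status": "final", "output": next(iter(outputs.values()))}
    return {"status": "dispute", "nodes": sorted(outputs)}

# All outputs match -> immediate finality, with no second verification round.
print(aggregate({"a": b"out", "b": b"out", "c": b"out"}))
```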
Trusted Execution Environments: Enabling Secure and Private AI Compute
What are TEEs?
Trusted Execution Environments (TEEs) are isolated regions within a processor designed to securely execute code and protect data. They are essential for verifiable and private AI tasks, providing a hardware-backed solution to critical security concerns in distributed computing.
TEEs ensure:
- Confidentiality: Ensures the privacy of code and data within the TEE.
- Integrity: Verifies that code and data remain untampered.
- Attestation: Provides cryptographic proof of the TEE’s state and executing code.
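To illustrate the attestation property, here is a schematic verification check (our sketch with hypothetical field names; real attestation evidence and verification flows are vendor-specific):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AttestationReport:
    measurement: str   # hash of the code/data loaded into the TEE
    nonce: str         # freshness challenge supplied by the verifier
    signature: bytes   # signed by a hardware-rooted attestation key

def verify_attestation(report: AttestationReport,
                       expected_measurement: str,
                       expected_nonce: str,
                       vendor_signature_ok: Callable[[AttestationReport], bool]) -> bool:
    """A relying party accepts a TEE only if the evidence is signed by the
    vendor's hardware-rooted key chain, the code measurement matches the
    expected build, and the report answers a fresh nonce (no replay)."""
    return (vendor_signature_ok(report)
            and report.measurement == expected_measurement
            and report.nonce == expected_nonce)
```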
Major TEE Technologies
Several CPU manufacturers offer TEE solutions:
- Intel TDX (Trust Domain Extensions): Designed for modern workloads including AI/ML, offering improved performance and larger memory capacity compared to its predecessor, Intel SGX (Software Guard Extensions).
- AMD SEV (Secure Encrypted Virtualization): Secures virtual machines, particularly for cloud computing.
- ARM TrustZone: Prevalent in mobile and embedded devices, creating a secure world separate from the normal operating environment.
- RISC-V MultiZone: Open-source TEE solution for the RISC-V architecture, gaining traction in IoT and edge computing.
- NVIDIA Confidential Computing: Delivers TEE capabilities tailored for GPU-accelerated workloads, enabling secure and efficient AI/ML computations.
TEEs on NVIDIA GPUs
NVIDIA’s approach to TEEs is particularly significant for the AI industry, as it addresses the need for secure, high-performance computing in AI workloads. By extending confidential computing to GPUs, NVIDIA enables a new class of secure AI applications that can leverage the power of GPU acceleration while maintaining strong privacy and security guarantees.
- NVIDIA Hopper Architecture:
  - Introduces Confidential Computing features in the H100 and H200 GPUs.
  - Provides hardware-based isolation and memory encryption.
- NVIDIA Blackwell Architecture:
  - Announced in 2024, further enhances confidential computing capabilities.
- Confidential AI:
  - Enables secure execution of AI workloads on GPUs.
  - Protects both AI models and data during computation.
- Key Features:
  - Secure Boot: Ensures firmware integrity.
  - Memory Encryption: Safeguards data in GPU memory.
  - Secure Multi-Tenant Execution: Supports multiple isolated workloads on a single GPU.
- Integration with CPU TEEs:
  - Works in conjunction with CPU-based TEEs, such as Intel TDX.
  - Provides end-to-end protection for hybrid CPU-GPU workloads.
- Use Cases:
  - Federated Learning: Enables secure collaborative AI training.
  - Model Inference: Protects proprietary AI models during deployment.
  - Healthcare and Financial Services: Allows processing of sensitive data on GPUs.
- Challenges:
  - Performance Impact: Encryption and isolation can affect computational speed.
  - Ecosystem Adoption: Requires updates to existing AI frameworks and tools.
TEEs for Verifiable and Private AI Tasks
TEEs offer significant potential for AI computations:
- Model Protection:
  - Proprietary AI models are executed within TEEs, preventing unauthorized access.
  - Enables “AI-as-a-Service” without exposing model internals.
- Data Privacy:
  - Sensitive data can be processed securely within TEEs.
  - Allows for computations on private data without exposing it to the processor owner.
- Verifiable Execution: Remote attestation proves the integrity of AI models and input data.
- Secure Multi-Party Computation: TEEs can facilitate secure collaborations between multiple parties without exposing their individual data.
- AI at the Edge: TEEs on edge devices enable local execution of AI models, preserving privacy and minimizing data transfer.
Challenges and Considerations
While TEEs offer robust security, some challenges remain:
- Side-Channel Attacks: While rare, sophisticated attacks may extract information from TEEs.
- Performance Overhead: Encryption and isolation may impact computational speed.
- Limited Memory: Some TEE implementations restrict enclave size.
- Ecosystem Support: Not all software and frameworks are optimized for TEE execution.
Atoma’s Approach to TEEs
Atoma Network leverages TEEs to enhance its Sampling Consensus mechanism:
- Secure Node Execution: The Atoma network will leverage TEE nodes to protect AI computations. These nodes must include TEE-compatible hardware accelerators, such as NVIDIA Hopper or Blackwell GPUs.
- Verifiable Outputs: TEE attestations contribute to the overall verifiability of the network.
- Privacy-Preserving Consensus: Nodes can participate in consensus without exposing sensitive data.
By integrating TEEs into its architecture, Atoma provides a powerful solution for running verifiable and private AI tasks at scale, addressing key concerns in distributed AI compute.
Sampling Consensus and TEEs
Atoma Network uniquely combines its Sampling Consensus mechanism with Trusted Execution Environments (TEEs) to create a robust, secure, and verifiable AI compute platform. This synergistic approach addresses multiple security concerns simultaneously, providing a comprehensive solution for trustworthy AI computations.
Integrated Security Approach
- Dual-Layer Verification:
  - Sampling Consensus provides network-level verification.
  - TEEs ensure hardware-level security and integrity at the node level.
- Enhanced Privacy:
  - TEEs protect sensitive data and models during execution.
  - Sampling Consensus rotates nodes across different requests, greatly reducing the risk of nodes manipulating their physical hardware to tamper with specific user requests (such as accessing enterprise- or government-grade confidential data).
- Scalable Trust:
  - TEEs provide local guarantees on each node.
  - Sampling Consensus extends trust across the network.
Implementation Details
- Node Selection and Execution:
  - Nodes are selected according to the Sampling Consensus protocol.
  - Selected nodes execute AI tasks within their TEEs.
- Attestation Chain:
  - Each node provides TEE attestations along with its computation results.
  - The network verifies both the consensus and the individual TEE proofs.
- Dispute Resolution:
  - In case of discrepancies, TEE-compatible nodes can be used to resolve disputes.
  - Malicious behavior can be definitively identified and penalized.
- Dynamic Security Levels:
  - Users can request higher replication (N) for critical tasks.
  - TEE security complements lower replication for less sensitive tasks.
- Model and Data Protection:
  - Proprietary AI models are executed within TEEs, preventing unauthorized access.
  - Input data remains encrypted outside the TEE, protecting user privacy.
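Putting the two layers together, a verifier would accept a result only if every quorum member presents valid TEE evidence and all attested outputs agree. A minimal sketch (ours, reusing the hypothetical attestation fields from the TEE section):

```python
from typing import Callable

def verify_quorum_result(results: list[dict],
                         expected_measurement: str,
                         expected_nonce: str,
                         vendor_signature_ok: Callable[[dict], bool]) -> dict:
    """Dual-layer check: hardware-level TEE attestation for every node,
    then network-level unanimity across the sampled quorum."""
    for r in results:
        report = r["attestation"]
        if not (vendor_signature_ok(report)
                and report["measurement"] == expected_measurement
                and report["nonce"] == expected_nonce):
            return {"status": "rejected", "node": r["node_id"]}  # bad attestation

    if len({r["output"] for r in results}) == 1:
        return {"status": "final", "output": results[0]["output"]}
    return {"status": "dispute"}  # escalate to TEE-backed dispute resolution
```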
Benefits of the Combined Approach
- Comprehensive Security:
  - Protects against both network-level and hardware-level attacks.
  - Mitigates risks of collusion and hardware tampering.
- Verifiable Privacy:
  - Enables privacy-preserving computations that are still verifiable.
  - Supports use cases involving sensitive or regulated data.
- Flexible Trust Model:
  - Adaptable to different security requirements and threat models.
  - Allows for a spectrum of trust levels, from high replication to TEE-only execution.
- Enhanced Performance:
  - TEEs reduce the need for high replication in many scenarios.
  - Sampling Consensus allows for efficient distribution of workloads.
- Future-Proofing:
  - The combined approach is adaptable to advancements in both consensus mechanisms and TEE technologies.
By leveraging both Sampling Consensus and TEEs, Atoma Network creates a unique security ecosystem that is greater than the sum of its parts. This approach not only enhances the overall security and privacy of AI computations but also opens up new possibilities for trustworthy, decentralized AI applications across various industries and use cases.