Compute Layer
Describes Atoma’s AI compute cloud for private and verifiable AI
Introduction
Atoma is revolutionizing the AI landscape with its innovative decentralized compute infrastructure. This section outlines the core components and unique features of Atoma’s Compute Layer, highlighting how it addresses the growing demand for secure, efficient, and scalable AI services.
Atoma’s Decentralized Verifiable and Private AI Cloud
Atoma’s compute layer is powered by a decentralized network of execution nodes that handle AI workloads. This network pools compute power from permissionless nodes equipped with GPUs or AI-specific hardware such as TPUs and XPUs, and is designed to meet the growing demand for decentralized AI services.
Built for efficiency, the network combines economic incentives and robust tokenomics to serve that demand. Unlike conventional GPU-based DePIN networks, Atoma introduces advanced performance and security mechanisms tailored specifically to AI computation.
Atoma aggregates compute from professional data centers equipped with the latest high-performance GPUs, as well as from consumer-grade machines with consumer GPUs, including MacBook Pros, by using MLX, Metal kernels, and similar technologies.
Key Differentiators from DePIN Networks
While DePIN networks focus on pooling computational resources and managing transactions, Atoma takes a more specialized approach. Nodes in the Atoma Network specialize in specific AI processing tasks, such as AI inference (executing models on input data), model refinement, AI data embedding, and model development.
Additionally, Atoma distinguishes itself through its robust security protocols. Through its Sampling Consensus protocol and Trusted Execution Environments (TEEs), the network ensures every computation is protected from tampering. This is vital for the integrity of generative AI outputs, especially in user-facing applications where reliable results are critical.
Atoma’s Free Market for Compute
Atoma implements a dynamic, efficient marketplace for AI compute resources:
- Intelligent Request Routing: User requests are automatically directed to the most suitable nodes based on multiple criteria (a simplified routing sketch follows this list), including:
  - Cost
  - Uptime
  - Privacy features
  - Response times
  - Hardware capabilities
  - Current workload
- Optimized Performance: This smart routing ensures each request is processed efficiently, balancing performance and cost-effectiveness and resulting in a fairer market for accessing AI compute resources.
- Sampling Consensus for Trust: Atoma’s Sampling Consensus algorithm, combined with TEEs, provides high-assurance verification of node reliability, fostering a trustworthy ecosystem.
- Transparent Pricing: Node operators offer their compute power at competitive, fair market rates, while users benefit from clear, market-driven pricing and the flexibility to select the best option for their needs.
- Flexible Resource Allocation: The network adapts in real time to fluctuating demand, scaling resources as needed.
This approach creates a robust, decentralized marketplace for AI compute power, combining reliability, efficiency, and economic incentives.
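To make the routing criteria above concrete, here is a minimal, hypothetical sketch of criteria-weighted node selection in Rust. The NodeInfo struct, the weights, and the scoring rule are illustrative assumptions for this example, not Atoma’s actual scheduler.

```rust
// Hypothetical sketch of criteria-weighted request routing.
#[derive(Debug)]
struct NodeInfo {
    id: u32,
    price_per_token: f64, // lower is better
    uptime: f64,          // 0.0..=1.0, higher is better
    avg_latency_ms: f64,  // lower is better
    supports_tee: bool,   // privacy feature
    load: f64,            // 0.0..=1.0 current utilization
}

/// Score a node for a request; higher scores are routed to first.
fn score(node: &NodeInfo) -> f64 {
    let cost_term = 1.0 / (1.0 + node.price_per_token);
    let latency_term = 1.0 / (1.0 + node.avg_latency_ms / 100.0);
    let capacity_term = 1.0 - node.load;
    // Weighted combination of cost, uptime, latency, and spare capacity (weights are assumed).
    0.35 * cost_term + 0.25 * node.uptime + 0.20 * latency_term + 0.20 * capacity_term
}

fn route(nodes: &[NodeInfo], needs_privacy: bool) -> Option<&NodeInfo> {
    nodes
        .iter()
        .filter(|n| !needs_privacy || n.supports_tee) // drop nodes that cannot satisfy privacy needs
        .max_by(|a, b| score(a).total_cmp(&score(b)))
}

fn main() {
    let nodes = vec![
        NodeInfo { id: 1, price_per_token: 0.002, uptime: 0.999, avg_latency_ms: 80.0, supports_tee: true, load: 0.4 },
        NodeInfo { id: 2, price_per_token: 0.001, uptime: 0.95, avg_latency_ms: 200.0, supports_tee: false, load: 0.1 },
    ];
    if let Some(best) = route(&nodes, true) {
        println!("routing request to node {}", best.id);
    }
}
```

In a real deployment the weights would be tuned per request type, and hard requirements (privacy, hardware capabilities) act as filters before any scoring takes place.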
Node Reputation and Incentives
Node Reputation Mechanisms
Atoma’s network uses a robust reputation system to ensure high-quality service and maintain network integrity:
- Performance Metrics: Nodes are evaluated on key factors including:
  - Availability
  - Execution speed
  - Task completion rate
  - Output accuracy
  - Hardware capabilities
- Reward System: Nodes earn rewards for:
  - Successful task completion
  - Maintaining high uptime
  - Consistently meeting performance benchmarks
- Collateral Requirement: Nodes must stake collateral to participate, which can be:
  - Increased for higher-tier tasks
  - Slashed for malicious behavior or repeated poor performance
- Dynamic Task Allocation: Higher-reputation nodes receive priority for:
  - More complex AI workloads
  - Higher-value tasks
  - Sensitive or privacy-focused computations
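As an illustration of how such a reputation and collateral scheme can work, the following Rust sketch tracks a rolling reputation score and slashes stake on malicious behavior. The struct, thresholds, and update rule are assumptions made for this example, not the network’s actual on-chain logic.

```rust
// Hypothetical sketch of reputation tracking and collateral slashing.
struct NodeAccount {
    reputation: f64, // 0.0..=1.0 rolling score
    collateral: u64, // staked amount in the network's smallest token unit
}

enum TaskOutcome {
    Completed { on_time: bool },
    Failed,
    Malicious, // e.g. failed Sampling Consensus verification
}

impl NodeAccount {
    /// Exponential moving average over task outcomes; malicious behavior also slashes stake.
    fn record(&mut self, outcome: TaskOutcome) {
        const ALPHA: f64 = 0.1;          // smoothing factor for the reputation average
        const SLASH_FRACTION: f64 = 0.2; // fraction of collateral slashed on malicious behavior
        let sample = match outcome {
            TaskOutcome::Completed { on_time: true } => 1.0,
            TaskOutcome::Completed { on_time: false } => 0.7,
            TaskOutcome::Failed => 0.0,
            TaskOutcome::Malicious => {
                self.collateral -= (self.collateral as f64 * SLASH_FRACTION) as u64;
                0.0
            }
        };
        self.reputation = (1.0 - ALPHA) * self.reputation + ALPHA * sample;
    }

    /// Higher-reputation nodes qualify for higher-value or privacy-sensitive tasks.
    fn eligible_for_high_value_tasks(&self) -> bool {
        self.reputation >= 0.9 && self.collateral > 0
    }
}

fn main() {
    let mut node = NodeAccount { reputation: 0.8, collateral: 1_000_000 };
    node.record(TaskOutcome::Completed { on_time: true });
    node.record(TaskOutcome::Malicious);
    println!("reputation: {:.3}, collateral: {}", node.reputation, node.collateral);
    println!("eligible for high-value tasks: {}", node.eligible_for_high_value_tasks());
}
```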
Trust and Security Measures
- Sampling Consensus: Randomly selected nodes verify computations, ensuring result integrity without centralized oversight (see the verification sketch below).
- Trusted Execution Environments (TEEs): Hardware-level isolation protects sensitive data and ensures tamper-proof execution.
- Transparent Reporting: Node performance metrics are publicly available, building trust and supporting informed user decisions.
This approach establishes an ecosystem that incentivizes high performance, security, and reliability across the Atoma network.
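The sketch below illustrates the general idea of sampling-based verification: a small, pseudo-randomly selected subset of nodes re-executes a task and their output commitments are compared. The sampling rule, the hash-based commitment, and all names are simplified assumptions, not the exact Sampling Consensus protocol.

```rust
// Hypothetical sketch of sampling-based verification of node outputs.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn output_commitment(output: &str) -> u64 {
    // Stand-in for a cryptographic commitment to the node's output.
    let mut h = DefaultHasher::new();
    output.hash(&mut h);
    h.finish()
}

/// Pseudo-randomly pick `k` verifier indices from `n` nodes using a shared seed
/// (e.g. derived from on-chain randomness), so the selection is deterministic and auditable.
fn sample_verifiers(n: usize, k: usize, seed: u64) -> Vec<usize> {
    let mut picked = Vec::new();
    let mut state = seed;
    while picked.len() < k.min(n) {
        // xorshift-style PRNG, deterministic for a given seed
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let idx = (state % n as u64) as usize;
        if !picked.contains(&idx) {
            picked.push(idx);
        }
    }
    picked
}

fn main() {
    let executor_output = "generated text...";
    let commitment = output_commitment(executor_output);

    // Verifiers re-run the same request and must reproduce the commitment.
    let verifiers = sample_verifiers(100, 3, 0xA70A);
    let all_agree = verifiers
        .iter()
        .all(|_| output_commitment("generated text...") == commitment);
    println!("verifiers {:?} agree: {}", verifiers, all_agree);
}
```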
Atoma’s Optimized Infrastructure
Atoma leverages Rust’s low-level speed and memory safety to power its decentralized AI infrastructure. A de facto language for high-performance systems programming, Rust also integrates well with security technologies such as TEEs and with GPU programming frameworks like CUDA and Metal, making it an ideal foundation for the network.
Instead of using large legacy libraries like PyTorch, which often lead to high memory usage and slower execution, Atoma adopts Candle, a lightweight, Rust-native AI framework maintained by Hugging Face. Candle’s compact binaries allow nodes, even at the network edge, to execute AI tasks more efficiently.
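For a flavor of what Candle code looks like, here is a minimal dense-layer example, assuming the candle-core crate; exact APIs may differ between versions.

```rust
// A minimal Candle sketch: one dense layer computed on CPU.
// Assumes the `candle-core` crate; nodes with GPUs would select a CUDA or Metal device instead.
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;
    // Random input batch (4 samples, 16 features) and layer parameters.
    let x = Tensor::randn(0f32, 1.0, (4, 16), &device)?;
    let w = Tensor::randn(0f32, 0.1, (16, 8), &device)?;
    let b = Tensor::randn(0f32, 0.1, (1, 8), &device)?;
    // y = x * W + b, a single dense layer.
    let y = x.matmul(&w)?.broadcast_add(&b)?;
    println!("output shape: {:?}", y.dims());
    Ok(())
}
```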
For large-scale AI processing, such as long-context LLM inference with the largest AI models, Atoma incorporates advanced techniques like CUDA-based FlashAttention and PagedAttention, enhancing performance for both inference and training tasks. These optimizations ensure efficient scheduling of workloads, maximizing GPU utilization and enabling nodes to handle parallel requests seamlessly.
Atoma’s network scales both vertically and horizontally, supporting a growing number of nodes and cores to accommodate increasing computational demand.
Atoma at the Edge: Empowering Local AI
Atoma extends its reach beyond decentralized cloud infrastructure to the edge, delivering powerful AI capabilities directly on users’ devices:
- WASM and WebGPU Compatibility: We are building a cutting-edge software stack that leverages WebAssembly (WASM) and WebGPU technologies, enabling high-performance AI applications to run natively in browsers and on local devices (see the minimal sketch at the end of this section).
- Edge LLM Deployment: Users can run compact yet powerful language models directly on their devices, ensuring privacy and reducing latency for AI-driven tasks.
- Comprehensive SDK: Atoma provides developers with a robust toolkit to create edge AI applications that integrate with our decentralized compute layer.
- Data Ownership and Monetization: This edge-centric approach empowers users and developers to retain control over AI-generated data. Through Atoma’s tokenomics, this data can be monetized in decentralized data marketplaces.
- Fueling Next-Gen AI: Aggregated edge data becomes a valuable resource for training future AI models, driving continuous innovation within the Atoma ecosystem.
By bridging edge computing with our decentralized infrastructure, Atoma is fostering a new paradigm of accessible, private, and user-centric AI applications.
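For illustration, here is a minimal sketch of how a Rust inference entry point could be exposed to the browser through WebAssembly, assuming the wasm-bindgen crate; the run_local_llm function and its behavior are hypothetical placeholders, not Atoma’s SDK API.

```rust
// Hypothetical sketch: exposing a local inference entry point to JavaScript via WebAssembly.
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn run_local_llm(prompt: &str) -> String {
    // In a real edge deployment this would call into a WASM/WebGPU-compiled model;
    // here we only echo the prompt to keep the sketch self-contained.
    format!("model output for: {prompt}")
}
```

Built with a tool such as wasm-pack, a library like this compiles to a module that browser code can call directly, keeping prompts and outputs on the user’s device.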
Atoma’s AI Infrastructure
Inference, Text Embeddings, and Fine-tuning
Atoma’s infrastructure is fully optimized to handle AI tasks such as inference, text embeddings, and fine-tuning. The network implements advanced techniques to accelerate inference, including:
- Flash Attention 2 and 3: These techniques reduce the number of reads and writes to and from HBM (High Bandwidth Memory) on GPUs, significantly improving speed in AI inference and training workloads. This results in faster processing times and more efficient use of hardware resources, particularly for large language models (LLMs).
- vAttention: A memory management mechanism that reserves large amounts of virtual memory for models and efficiently assigns physical memory at runtime using minimal CPU and GPU resources.
- vLLM: Inspired by OS paging techniques, vLLM manages the memory of AI inference requests in fixed-size blocks (PagedAttention), so that large model requests are processed smoothly without fragmenting GPU memory.
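To illustrate the paging idea behind PagedAttention-style memory management, here is a simplified Rust sketch that allocates KV-cache blocks from a shared pool only as a request grows. Block size, struct names, and the allocation policy are assumptions made for this example.

```rust
// Hypothetical sketch of paged KV-cache management: each request's KV cache is stored
// in fixed-size blocks allocated on demand from a shared pool.
struct BlockPool {
    free_blocks: Vec<usize>, // indices of free physical KV-cache blocks
    block_size: usize,       // tokens per block
}

struct Request {
    id: u64,
    block_table: Vec<usize>, // logical-to-physical block mapping for this request
    num_tokens: usize,
}

impl BlockPool {
    fn new(num_blocks: usize, block_size: usize) -> Self {
        Self { free_blocks: (0..num_blocks).collect(), block_size }
    }

    /// Allocate a new block only when the request's last block is full,
    /// so memory grows on demand instead of being reserved up front.
    fn append_token(&mut self, req: &mut Request) -> Result<(), &'static str> {
        if req.num_tokens % self.block_size == 0 {
            let block = self.free_blocks.pop().ok_or("out of KV-cache memory")?;
            req.block_table.push(block);
        }
        req.num_tokens += 1;
        Ok(())
    }

    /// Return all of a finished request's blocks to the pool for other requests.
    fn release(&mut self, req: Request) {
        self.free_blocks.extend(req.block_table);
    }
}

fn main() {
    let mut pool = BlockPool::new(1024, 16);
    let mut req = Request { id: 1, block_table: Vec::new(), num_tokens: 0 };
    for _ in 0..40 {
        pool.append_token(&mut req).expect("allocation failed");
    }
    println!("request {} uses {} blocks", req.id, req.block_table.len()); // 40 tokens -> 3 blocks
    pool.release(req);
}
```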
Multi-GPU Serving and Quantization Techniques
Atoma enables multi-GPU serving, allowing large language models (LLMs) to be deployed across multiple GPUs for handling complex computations. This capability supports the deployment of some of the largest available open-source models.
To further enhance performance, the network utilizes various quantization techniques, such as:
- INT8/INT4 Quantization: Techniques that reduce the precision of model weights to minimize memory usage while maintaining accuracy.
- FP8/FP4 Quantization: Methods that use lower-precision floating-point formats for faster computation and reduced hardware requirements.
These techniques enable more efficient model execution by reducing memory usage and computation costs while maintaining high performance.
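As a concrete illustration of this trade-off, here is a minimal sketch of symmetric per-tensor INT8 weight quantization; real INT4/FP8/FP4 kernels are considerably more involved, and this example only shows the general technique.

```rust
// Minimal sketch of symmetric per-tensor INT8 quantization: weights are scaled into the
// int8 range for storage and dequantized back at compute time.
fn quantize_int8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn dequantize_int8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let weights = [0.12f32, -0.98, 0.45, 0.0, 0.77];
    let (q, scale) = quantize_int8(&weights);
    let restored = dequantize_int8(&q, scale);
    // Memory drops from 4 bytes to 1 byte per weight; the small rounding error is the
    // accuracy/efficiency trade-off discussed above.
    println!("scale = {scale:.5}, quantized = {q:?}, restored = {restored:?}");
}
```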
RAG (Retrieval-Augmented Generation) Implementation
Atoma incorporates Retrieval-Augmented Generation (RAG) to enhance AI model performance by combining data retrieval with content generation. This approach improves the accuracy of AI outputs by using relevant external data during inference, making responses more contextually rich and reliable.
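The following self-contained Rust sketch shows the RAG flow end to end: embed the query, retrieve the most similar documents, and prepend them to the prompt before generation. The toy embedding function and in-memory document store are placeholder assumptions; production systems use learned text embeddings and a vector database.

```rust
// Hypothetical end-to-end RAG sketch: embed, retrieve, augment the prompt.
fn embed(text: &str) -> Vec<f32> {
    // Stand-in for a real text-embedding model: a tiny character-frequency vector.
    let mut v = vec![0f32; 26];
    for c in text.to_lowercase().chars().filter(|c| c.is_ascii_lowercase()) {
        v[(c as u8 - b'a') as usize] += 1.0;
    }
    v
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return the `k` documents most similar to the query by cosine similarity.
fn retrieve<'a>(query: &str, docs: &[&'a str], k: usize) -> Vec<&'a str> {
    let q = embed(query);
    let mut scored: Vec<(&str, f32)> = docs.iter().map(|d| (*d, cosine(&q, &embed(d)))).collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(d, _)| d).collect()
}

fn main() {
    let docs = [
        "Atoma nodes run AI inference inside TEEs.",
        "Sampling Consensus verifies node outputs.",
        "Tokyo is the capital of Japan.",
    ];
    let query = "How are inference results verified?";
    let context = retrieve(query, &docs, 2).join("\n");
    // The augmented prompt is what would be sent to the LLM for generation.
    let prompt = format!("Context:\n{context}\n\nQuestion: {query}\nAnswer:");
    println!("{prompt}");
}
```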
Future Roadmap: Decentralized AI Training and Data Production
Integration of Decentralized AI Training
Atoma will introduce decentralized AI training, leveraging the latest NVIDIA GPUs, such as the Hopper and Blackwell families, integrated with TEEs to ensure secure and efficient AI training processes.
Real and Synthetic Data Production
Through the Atoma Network, significant volumes of real and synthetic data will be generated to support decentralized AI training. This data will be carefully labeled and curated through advanced mechanisms, further strengthening the network’s long-term AI training initiatives.