Multi-Chain Data Ingestion: The Fuel for AI Agents

Modular Ingestion Pipeline

Overview

The Modular Ingestion Pipeline in Project Zero manages blockchain data ingestion end to end, providing seamless, real-time access to structured on-chain data. A critical component of this pipeline is the Block Stream Module, which continuously fetches, organizes, and exposes blockchain data in a structured format. The Block Stream Module runs as a separate instance for each blockchain, giving every chain dedicated data handling and processing.
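
Because each chain gets its own instance of the module, a deployment can be described as a list of per-chain configurations. The TypeScript shape below is a hypothetical sketch; the field names and endpoints are illustrative, not Project Zero's actual configuration schema.

```typescript
// Hypothetical per-chain configuration: each entry drives one independent
// Block Stream Module (Orchestrator + Workers + Reader) for that chain.
interface ChainConfig {
  chainId: number;       // network identifier, e.g. 1 for Ethereum mainnet
  wsUrl: string;         // WebSocket endpoint the Orchestrator subscribes to
  rpcUrl: string;        // HTTP RPC endpoint Workers call for block data
  storagePrefix: string; // S3/EFS prefix where Workers persist datasets
  workerCount: number;   // how many Workers to run for this chain
}

const chains: ChainConfig[] = [
  {
    chainId: 1,
    wsUrl: "wss://eth.example/ws",
    rpcUrl: "https://eth.example/rpc",
    storagePrefix: "blocks/ethereum/",
    workerCount: 4,
  },
  {
    chainId: 137,
    wsUrl: "wss://polygon.example/ws",
    rpcUrl: "https://polygon.example/rpc",
    storagePrefix: "blocks/polygon/",
    workerCount: 8,
  },
];
```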

Block Stream Module

block-stream.png

The Block Stream Module is the backbone of the ingestion pipeline. It ensures data consistency, handles chain reorganizations, and facilitates seamless retrieval and processing of blockchain data.

Orchestrator

The Orchestrator acts as the central controller, managing the workflow of the ingestion process. It performs the following functions:

  • Listening for new blocks: Subscribes to WebSocket connections from blockchain nodes to receive real-time updates about newly mined blocks.
  • Registering incoming blocks: Records each new block as it arrives.
  • Placing blocks in a queue: Enqueues blocks for processing, ensuring an organized and efficient workflow.
  • Tracking blocks: Maintains an internal database of all processed blocks, ensuring no data is lost or duplicated.
  • Handling chain reorganizations: Detects and resolves chain reorgs, ensuring that only valid blocks are processed and stored.
  • Tracking downloaded datasets: Keeps track of all datasets that have been successfully downloaded and processed, ensuring data integrity.
  • Scheduling tasks: Manages task distribution among workers, optimizing resource usage.
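
As an illustration of the listening, registration, queuing, and reorg-detection steps above, the following TypeScript sketch subscribes to new block headers over WebSocket and enqueues them. It assumes an EVM-style chain (eth_subscribe with the newHeads topic is standard Ethereum JSON-RPC); the endpoint URL, in-memory queue, and block registry are placeholders rather than Project Zero's actual implementation.

```typescript
import WebSocket from "ws";

// Placeholder in-memory queue and block registry; a real Orchestrator
// would back these with a durable queue and database.
const queue: { number: number; hash: string }[] = [];
const seen = new Map<number, string>(); // block number -> hash of accepted block

// Subscribe to new block headers from an EVM node (placeholder endpoint).
const ws = new WebSocket("wss://eth.example/ws");

ws.on("open", () => {
  ws.send(
    JSON.stringify({ jsonrpc: "2.0", id: 1, method: "eth_subscribe", params: ["newHeads"] })
  );
});

ws.on("message", (raw: WebSocket.RawData) => {
  const msg = JSON.parse(raw.toString());
  if (msg.method !== "eth_subscription") return; // ignore the subscription ack

  const head = msg.params.result;
  const number = parseInt(head.number, 16);
  const hash: string = head.hash;
  const parentHash: string = head.parentHash;

  // Reorg handling: if the parent hash doesn't match the block already
  // accepted at (number - 1), the chain has reorganized and the earlier
  // block must be re-fetched before this one is processed.
  const prev = seen.get(number - 1);
  if (prev !== undefined && prev !== parentHash) {
    console.warn(`reorg detected at block ${number - 1}; re-queuing`);
    queue.push({ number: number - 1, hash: parentHash });
  }

  // Register the block and place it in the queue for a Worker to pick up.
  seen.set(number, hash);
  queue.push({ number, hash });
});
```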

Worker

The Worker component is responsible for fetching blockchain data from RPC providers and storing it in long-term storage solutions such as Amazon S3 or EFS. Its primary tasks include:

  • Processing queue jobs: Retrieves blocks from the orchestrator queue and processes them.
  • Making RPC calls: Interacts with blockchain nodes via RPC to fetch raw transaction and block data.
  • Extracting necessary datasets: Collects essential blockchain information, including transactions, smart contract interactions, and event logs.
  • Storing data: Persists structured data into cloud-based storage for further processing and retrieval.
  • Scaling horizontally: Workers can be scaled out to improve processing speed and handle larger workloads efficiently.
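
A minimal Worker step might look like the sketch below: it takes a job shaped like the Orchestrator's queue entries, fetches the full block over standard Ethereum JSON-RPC (eth_getBlockByNumber), and persists the result to S3 with the AWS SDK. The bucket name, key layout, and RPC endpoint are hypothetical, and a production Worker would also extract receipts, logs, and traces into separate datasets.

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

// Hypothetical job shape matching the Orchestrator's queue entries.
interface BlockJob { number: number; hash: string }

const s3 = new S3Client({ region: "us-east-1" });
const RPC_URL = "https://eth.example/rpc";  // placeholder RPC endpoint
const BUCKET = "project-zero-block-data";   // placeholder bucket name

// Fetch a full block (with transaction objects) over standard Ethereum JSON-RPC.
async function fetchBlock(number: number): Promise<unknown> {
  const res = await fetch(RPC_URL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "eth_getBlockByNumber",
      params: ["0x" + number.toString(16), true], // true = include full transactions
    }),
  });
  const { result } = await res.json();
  return result;
}

// Process one queue job: fetch the raw block and persist it as JSON.
async function processJob(job: BlockJob): Promise<void> {
  const block = await fetchBlock(job.number);
  await s3.send(
    new PutObjectCommand({
      Bucket: BUCKET,
      Key: `blocks/ethereum/${job.number}.json`,
      Body: JSON.stringify(block),
      ContentType: "application/json",
    })
  );
}
```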

Reader

The Reader component provides an efficient and structured way to access and stream blockchain data. It exposes a gRPC interface through which clients can retrieve both historical and real-time blockchain data. Its functionalities include:

  • Streaming data: Serving blockchain data on demand to clients via gRPC, enabling real-time analysis.
  • Querying datasets: Allowing users to retrieve specific blockchain records based on predefined filters.
  • Ensuring data consistency: Verifying the integrity of stored datasets before serving responses.
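
A client consuming the Reader's gRPC stream could look like the following sketch, built with @grpc/grpc-js and @grpc/proto-loader. The block_stream.proto file, the blockstream.Reader service, and the StreamBlocks method are assumed names used for illustration; Project Zero's actual gRPC schema may differ.

```typescript
import * as grpc from "@grpc/grpc-js";
import * as protoLoader from "@grpc/proto-loader";

// Load a hypothetical service definition describing the Reader's API.
const definition = protoLoader.loadSync("block_stream.proto");
const proto = grpc.loadPackageDefinition(definition) as any;

const client = new proto.blockstream.Reader(
  "reader.example:50051",              // placeholder Reader endpoint
  grpc.credentials.createInsecure()
);

// Server-streaming call: request blocks from a starting height and keep
// receiving new ones as they are ingested.
const call = client.StreamBlocks({ chainId: 1, fromBlock: 19_000_000 });

call.on("data", (block: { number: number; hash: string }) => {
  console.log(`block ${block.number} ${block.hash}`);
});
call.on("error", (err: Error) => console.error("stream error:", err));
call.on("end", () => console.log("stream closed"));
```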

reader.png

The Block Stream Module plays a crucial role in the Modular Ingestion Pipeline, ensuring that blockchain data is ingested, processed, and exposed efficiently. This module is designed to function independently for each blockchain, providing a dedicated ingestion workflow per chain. By structuring the ingestion pipeline with dedicated components—Orchestrator, Worker, and Reader—Project Zero ensures high availability, scalability, and reliability of blockchain data for AI-powered applications and decentralized ecosystems. The ability to horizontally scale workers further enhances the system’s efficiency, allowing for increased throughput and optimized resource utilization.
