Crypto analytics with ClickHouse
Take your crypto analytics to the next level with ClickHouse. This guide shows you how to build a production-grade data pipeline for massive datasets and lightning-fast queries.
From local to production scale
When your analytics needs grow beyond a single machine and you require a database designed for production scale, it’s time to consider ClickHouse. ClickHouse is built for handling billions of rows with sub-second query times, making it perfect for production analytics, real-time dashboards, and enterprise-grade data analysis.
Looking for other analytics solutions? Check out our full list of API Tutorials for more step-by-step guides.
Why ClickHouse for crypto analytics?
- Massive scale: Built to handle billions of rows and petabytes of data, far beyond the scope of local, in-process databases.
- Lightning speed: Optimized columnar storage delivers queries that are 10-100x faster than traditional row-based systems.
- Real-time ingestion: Built for continuous data streaming and updates.
- Production ready: Used by companies like Uber, Cloudflare, and Spotify for analytics at scale.
The goal: By the end of this guide, you will have a production-grade ClickHouse setup that can:
- Ingest 15-minute OHLCV data for the top 250 Uniswap v3 pools (7 days of history)
- Handle real-time data updates via streaming
- Run complex analytical queries in milliseconds
- Enable AI-powered analysis through MCP server integration
Step 1: ClickHouse Setup
Install and configure ClickHouse for crypto analytics.
Step 2: ETL Pipeline
Create a robust data pipeline for high-frequency data ingestion.
Step 3: Advanced Queries
Run complex analytics on 15-minute interval data.
Step 4: MCP Integration
Enable AI-powered analysis through MCP server integration.
Step 1: Setting up ClickHouse
Option A: Local installation (recommended for learning)
Install ClickHouse locally for development and testing:
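A typical quick install looks like the following; these are the standard ClickHouse installation routes, so adjust for your platform and check the official install docs if anything differs:

```bash
# macOS: install via Homebrew (installs the Cask discussed below)
brew install clickhouse

# Linux or macOS: official install script, downloads a single clickhouse binary
curl https://clickhouse.com/ | sh
```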
macOS specifics: Cask installation
The `brew install clickhouse` command now installs a Cask, not a standard formula. This provides a single `clickhouse` binary that acts as a multi-tool for both the server and the client, so commands that refer to `clickhouse-server` or `brew services` will not work. Use the following commands instead:
Option B: ClickHouse Cloud (recommended for production)
For production workloads, use ClickHouse Cloud:
- Sign up for a free trial at clickhouse.com/cloud
- Create a new service
- Note your connection details (host, port, username, password)
Moving forward:
The rest of this tutorial assumes you are using a local ClickHouse installation (Option A); the Python scripts are configured for this by default. If you use ClickHouse Cloud instead, update the `CLICKHOUSE_HOST`, `CLICKHOUSE_PORT`, `CLICKHOUSE_USER`, and `CLICKHOUSE_PASSWORD` variables in the scripts accordingly.
Test your connection
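A minimal connection check, assuming the `clickhouse-connect` Python driver (one common choice; the same connection variables are used by the scripts later in this guide):

```python
# test_connection.py - minimal sketch: verify that ClickHouse is reachable
import clickhouse_connect

CLICKHOUSE_HOST = "localhost"
CLICKHOUSE_PORT = 8123        # default HTTP port for a local install
CLICKHOUSE_USER = "default"
CLICKHOUSE_PASSWORD = ""

client = clickhouse_connect.get_client(
    host=CLICKHOUSE_HOST,
    port=CLICKHOUSE_PORT,
    username=CLICKHOUSE_USER,
    password=CLICKHOUSE_PASSWORD,
)

# A trivial query proves the server is up and the credentials are valid
print("ClickHouse version:", client.command("SELECT version()"))
```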
Step 2: Build the production ETL pipeline
Create a new file named `build_clickhouse_db.py`. This script efficiently handles high-frequency data for the top 250 pools, incorporating robust error handling and API management strategies. It leverages two key endpoints: the Top Pools on a DEX endpoint to discover pools, and the Pool OHLCV Data endpoint to fetch historical price data.
The script is designed for performance and reliability, applying several good practices common in data pipelines (a condensed sketch of these patterns follows the list):

- Asynchronous operations: By using `asyncio` and `aiohttp`, the script can make many API requests concurrently instead of one by one.
- Dynamic windowing: The `fetch_pool_ohlcv_paginated` function calculates how much data to request per API call based on the `OHLCV_API_LIMIT`.
- Concurrency control & throttling: An `asyncio.Semaphore`, combined with carefully tuned `BATCH_SIZE` and `asyncio.sleep()` calls, makes sure we don't hit the API's rate limit.
- Resiliency: The `fetch_with_retry` function automatically retries failed requests with an exponential backoff delay.
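Here is a condensed, hypothetical sketch of those patterns; the endpoint URL and query parameters are placeholders rather than the real API's contract, and the full script wires these helpers into the complete discovery-and-ingest flow:

```python
# Sketch of the retry, throttling, and windowing patterns described above.
import asyncio
import aiohttp

BATCH_SIZE = 10          # how many requests may be in flight at once
OHLCV_API_LIMIT = 1000   # max candles a single API call may return
MAX_RETRIES = 3

semaphore = asyncio.Semaphore(BATCH_SIZE)   # concurrency control & throttling

async def fetch_with_retry(session: aiohttp.ClientSession, url: str, params: dict) -> dict:
    """GET a URL, retrying failures with exponential backoff (1s, then 2s)."""
    for attempt in range(MAX_RETRIES):
        try:
            async with semaphore:
                async with session.get(url, params=params) as resp:
                    resp.raise_for_status()
                    return await resp.json()
        except (aiohttp.ClientError, asyncio.TimeoutError):
            if attempt == MAX_RETRIES - 1:
                raise
            await asyncio.sleep(2 ** attempt)

async def fetch_pool_ohlcv_paginated(session: aiohttp.ClientSession,
                                     url: str, total_candles: int) -> list:
    """Fetch OHLCV history in windows sized to the per-call API limit."""
    candles = []
    for offset in range(0, total_candles, OHLCV_API_LIMIT):
        page = await fetch_with_retry(
            session, url, {"limit": OHLCV_API_LIMIT, "offset": offset}
        )
        candles.extend(page.get("data", []))
        await asyncio.sleep(0.2)   # brief pause between windows to respect rate limits
    return candles
```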
Required libraries
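The sketch above (and the full script) relies on an async HTTP client and a ClickHouse driver. Assuming `aiohttp` and `clickhouse-connect`, the install is:

```bash
pip install aiohttp clickhouse-connect
```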
Step 3: Lightning-fast analytics (Optional)
Once your database is populated, you can query it directly using any ClickHouse-compatible SQL client or a Python script. While the next step (AI Integration) is recommended for the most powerful analysis, running queries directly is a great way to verify your data.
You can create a file named `query_clickhouse.py` to see how fast ClickHouse can process complex analytical queries on the millions of rows you've ingested.
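A sketch of what such a script might look like; the table and column names (`pool_ohlcv`, `pool_address`, `close`, `volume_usd`) are illustrative assumptions, so match them to the schema your ETL script actually created:

```python
# query_clickhouse.py - sketch: run a few analytical queries and time them
import time
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="localhost", port=8123, username="default", password="",
    database="crypto_analytics",
)

QUERIES = {
    "Top 10 pools by total volume": """
        SELECT pool_address, sum(volume_usd) AS total_volume
        FROM pool_ohlcv
        GROUP BY pool_address
        ORDER BY total_volume DESC
        LIMIT 10
    """,
    "Most volatile pools (std dev of close price)": """
        SELECT pool_address, stddevPop(close) AS volatility
        FROM pool_ohlcv
        GROUP BY pool_address
        ORDER BY volatility DESC
        LIMIT 10
    """,
}

for title, sql in QUERIES.items():
    start = time.perf_counter()
    result = client.query(sql)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"\n{title} ({elapsed_ms:.1f} ms)")
    for row in result.result_rows:
        print(row)
```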
You can run the script by executing it from your terminal:
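For a file saved under the name above, that is simply:

```bash
python query_clickhouse.py
```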
Now, let’s move on to the recommended final step: connecting your database to an AI assistant.
Step 4: AI-powered analysis with an MCP server
Enable seamless analysis of your local ClickHouse database through the ClickHouse MCP Server. This allows AI assistants like Claude Desktop to connect to your database, list tables, and run `SELECT` queries securely.
1. Install the MCP server
The server is a Python package that can be installed via `pip`:
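The distribution name below is an assumption based on the executable this guide references; check the MCP server's own documentation for the exact package name:

```bash
# Assumed package name - verify against the ClickHouse MCP server docs
pip install clickhouse-mcp-server
```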
2. Configure your AI client
Next, configure your AI client (e.g., Claude Desktop) to use the server. You’ll need to edit its configuration file.
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%/Claude/claude_desktop_config.json`
Add the following JSON block to the `mcpServers` section of the file. This tells the client how to run the server and provides the connection details for your local ClickHouse instance.
Finding the command path
The most common point of failure is an incorrect command path. The command should be the absolute path to the `clickhouse-mcp-server` executable that `pip` installed. Find this path by running `which clickhouse-mcp-server` in your terminal and use the output in the `command` field below.
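A configuration entry of roughly the following shape should work; the keys and environment variable names shown are assumptions based on this guide's connection variables, so consult the MCP server's documentation for the definitive schema, and replace the `command` value with the path returned by `which clickhouse-mcp-server`:

```json
{
  "mcpServers": {
    "clickhouse": {
      "command": "/absolute/path/to/clickhouse-mcp-server",
      "env": {
        "CLICKHOUSE_HOST": "localhost",
        "CLICKHOUSE_PORT": "8123",
        "CLICKHOUSE_USER": "default",
        "CLICKHOUSE_PASSWORD": "",
        "CLICKHOUSE_DATABASE": "crypto_analytics"
      }
    }
  }
}
```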
3. Restart and analyze
Save the configuration file and restart your AI client. Once restarted, you can start asking it to analyze the data in your `crypto_analytics` database.
Troubleshooting & important notes
- “Server disconnected” error: This almost always means the `command` path in your configuration is incorrect. Double-check the absolute path using `which clickhouse-mcp-server`.
- AI connects to the `default` database: We observed that the AI client might sometimes choose to connect to the `default` database on its own, even if `crypto_analytics` is specified in the config. This will result in it seeing no tables.
- Solution: Be explicit: To ensure the AI works correctly, always specify the database in your prompt. This overrides the AI's tendency to use the default.
Good example prompts:
- “Using the clickhouse-mcp-server, connect to the `crypto_analytics` database and then list the tables.”
- “In the `crypto_analytics` database, show me the top 10 pools by volume from the `pools` table.”
- “Calculate the average daily volume for the top 5 most volatile pools from the `crypto_analytics` database.”
What you’ve built: A production-grade analytics pipeline
Congratulations! You’ve successfully built a scalable crypto analytics pipeline with ClickHouse. You’ve ingested a large dataset of OHLCV data, and you’ve enabled a powerful AI assistant to securely query and analyze that data.
Key achievements:
- Built a production-ready ETL pipeline: You have a reusable, high-performance Python script that can create a comprehensive, multi-million row database from any supported DEX and network.
- Unlocked lightning-fast SQL: You can now perform complex analytical queries on a massive dataset in milliseconds, directly on your machine.
- Mastered a scalable workflow: This “local-first” data strategy, combined with ClickHouse’s power, provides a solid foundation for building real-time dashboards, conducting in-depth market research, and developing sophisticated trading algorithms.
- Enabled secure AI analysis: By connecting your database to an AI assistant via an MCP server, you’ve created a powerful and secure way to explore your data using natural language.