Howwe'veimprovedDuneAPIusingDuckDB

Learn how Dune's enhanced API enables pagination, larger results, and new features like filtering and sorting.

Research•July 30, 2024•4 min read

dune-api

At Dune, we value our customers’ feedback and are committed to continuously improving our services. This is the story of how a simple, prioritized feature request for DuneAPI —supporting query result pagination for larger results—evolved into a comprehensive improvement involving the adoption of DuckDB at Dune.

We’ve learned a lot during this journey and are excited to share our experiences and the new functionalities we’ve been building.

Motivation & Context

The journey began with user feedback and a repeated feature request: “Dune API doesn’t support pagination, and the maximum size of query results is limited (~1GB).” Users needed to read larger results, which required supporting pagination. At the end of 2023, we finally prioritized resolving this issue.

Why no pagination and 1GB limit?

To address this question, let’s start with understanding our initial architecture and its limitations.

Our original capabilities were designed to serve the needs of Dune Analytics, a platform focused on visualizing crypto data through dashboards and graphs.

This use case leveraged the following architectural decisions:

Query-Driven Visualizations: Each visualization on a dashboard was tied to a specific SQL query. This setup allowed for consistent and static data views, which were suitable for our initial visualization-centric use cases.
Powerful Query Execution: SQL Queries offer the expressiveness and capability for rich and complex data manipulations required to query and aggregate large datasets
Small, Reusable Query Results: Visualizations typically require manageable data sizes, optimized for quick rendering on dashboards. Large datasets were unnecessary, as visual elements have limited pixel space and do not need millions of data points.
Caching: To improve performance, we cached query results. This approach was suitable for dashboards that repeatedly accessed the same query results, reducing the need for re-execution.
1GB Result Cap: The API was capped at 1GB per query result because larger results were unnecessary for visualization purposes and could overwhelm the system’s memory.
No Pagination: Since visualizations generally required the entire dataset at once, there was no need for pagination.
Complex Query Execution: our query execution is powerful but massively complex queries can be too slow
Search and Filter: Jesse needs an app to search and filter large query results by wallet address or time window, requiring advanced filtering and efficient handling of large datasets.
Data for Charts: Logan needs to feed specific data into charts for mobile, requiring low network data usage and dynamic sorting and filtering.
Data Science: Another use case involves performing data science on large sets of crypto transactions, emphasizing the need for pagination and efficient data handling.
Flexibility and Integration: Developing an API that easily integrates with existing Dune functionalities.
Real-Life Applications: Using real-life applications, like Dune dashboarding, to guide feature development.
Embrace new Technology: Using new technologies to bridge user needs and our engineering capabilities.
Maximizing Value: Extending the use cases we cover with existing Dune queries and query results, maximizing the value provided to our users.
Reading/Querying Cached Results: Support our cached Results format.
Low-Latency Response: Queries must return within 100-200 milliseconds for interactive use.
Cost-Effective Execution: Ensuring execution is inexpensive enough to allow multiple requests per user interaction.
Load Query Results into Another Database: This approach would involve selecting a database with better cost-effective performance suited to our use case than Trino.
Implement Features Directly: Building our functionalities on top of our existing query-result format (which was compressed JSON).

Considering Future Growth

Beyond our immediate needs, we also had to plan for future functionalities:

Aggregation Functions: Allowing for aggregation operations on specific columns.
Re-ordering Results: Enabling re-ordering by any column.
Handling Larger Results: Increasing the query result limit from 1GB to potentially 20GB or more.
Parquet Query Result Format: Improving our query results format, moving from compressed JSON to Parquet.
Support Our Query Result Format: Compatible with compressed JSON and Parquet.
Fast Query Execution: Return results in less than 100 milliseconds.
Cost-Effective: Lightweight and inexpensive, allowing multiple queries per minute without significant costs.
Data Format Support: It supports querying and loading data directly from JSON and Parquet.
High-Performance SQL Engine: Its robust and modern SQL engine supported our required usecases within our latency and cost requirements.
Flexibility: The ability to handle advanced query functionalitities and larger datasets without significant changes.
Easy to use: We had a working prototype of loading and querying our query results in an single day.
Pagination: Retrieve data in manageable chunks to handle large datasets.
Filtering: Apply filters to query results based on specific columns and criteria. relevance.
Sorting: Organize query results in a specified order.
Sampling: Retrieve an uniform sample of the dataset for efficient data analysis and visualization.
Custom Endpoints, which allow users to create and manage API endpoints from Dune queries.
Preset Endpoints, which provide quick access to standardized and essential data, eliminating the need for custom queries for frequently needed information.
- Contracts: Data on blockchain contracts.
- DEX Metrics: Information on decentralized exchanges.
- EigenLayer and Farcaster: Metrics and data related to specific projects and technologies.
- Marketplace Marketshare: Key market indicators and trends.
- Linea: Insights and data on specific blockchain project

Conclusion

As we navigated the journey of enhancing DuneAPI, we experienced firsthand “feature creep”— that moment when you start with a simple request and end up redesigning half the system. But feature creep doesn’t have to be scary. By stepping back and looking at the bigger picture of our customers’ needs, we found opportunities to innovate and build a better service.

Incorporating DuckDB was not just about addressing a single feature request; it was about rethinking how we can serve our users more effectively. By being open to evolving our approach and investing in scalable, efficient technologies, we’ve expanded the capabilities of DuneAPI, making it a powerful tool for developers.

We hope you find these new features and improvements valuable. As always, we’re excited to see how you’ll leverage them to create even more amazing applications and insights. Stay tuned for more updates, and keep those feedback and feature requests coming—they’re the real MVPs!

For more details, visit our API documentation.

Special thanks to Alsie, Jonathan and Wilhelm for their review and feedback.

Howwe'veimprovedDuneAPIusingDuckDB

Learn how Dune's enhanced API enables pagination, larger results, and new features like filtering and sorting.

Motivation & Context

Considering Future Growth

Conclusion

Related

Make real impact with onchain data