DuckDB — open source GitHub project

What It Is

DuckDB is a high-performance analytical SQL database that runs inside the application process. It is often called “SQLite for analytics,” which is a helpful analogy rather than a full definition.

DuckDB is designed for fast analytical queries over data that sits close to an application or notebook: local files, tables, intermediate exports, and other local datasets.

DuckDB emphasizes portability and ease of use: install a client library, open a database file or query a DataFrame directly, and run SQL without a separate database server.

What Is Inside

DuckDB supports a rich SQL dialect: nested and correlated subqueries, window functions, collations, complex types such as arrays, structs, and maps, plus several extensions designed to make SQL easier to write.

The project is available as a standalone command-line application and as client libraries for Python, R, Java, Wasm, and other environments. Deep pandas and dplyr integrations are especially useful for analysts.

Unlike server analytical databases, DuckDB does not require a separate service. That is convenient for local analysis, tests, embedded analytics, and file processing.

How People Use It

A typical scenario is reading a CSV or Parquet file, or a DataFrame, running SQL over it, and getting aggregates without configuring a cluster. For medium-sized data, that removes most of the setup and makes iteration much faster.

Developers use DuckDB in applications that need analytics close to the user or inside the process: reports, data checks, export processing, and local prototypes.

The limitation is that DuckDB does not replace a multi-user transactional database. Continuous writes from many clients, role-based access control, and network access all need another layer.


Strengths And Limits

DuckDB’s strength is analytical SQL in places where teams previously wrote large amounts of custom data-processing code. It works well as a local query engine.

The weak point is the product boundary. Centralized storage, access control, persistent network connections, and multi-user scale require additional systems.

DuckDB fits analysts, data engineers, report developers, and local-tool authors. For high-write transactional workloads, a server database is the better starting point.

Example

SQL Over A DataFrame

The example shows how DuckDB can run an analytical query over data already loaded in Python.

Language: Python

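import duckdb
import pandas as pd

# A small DataFrame that already lives in the Python process.
sales = pd.DataFrame({
    "region": ["EU", "EU", "US"],
    "amount": [120, 80, 150],
})

# DuckDB resolves the table name "sales" to the local DataFrame variable,
# so no explicit import step is needed. ORDER BY makes the output order stable.
result = duckdb.sql("""
    SELECT region, sum(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
""").df()  # .df() converts the result back into a pandas DataFrame
print(result)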