Pathway — open source GitHub project

Pathway is a Python framework for streaming data processing, real-time analytics, and AI pipelines.

What it is

Pathway is a Python framework for live data processing. It targets scenarios where data is constantly arriving from files, queues, databases, documents, or external sources.

It is used in real-time analytics and AI pipeline work: RAG indexes, event processing, updates after source changes, vector databases, and LLM applications.

How the model works

Pathway lets users describe data transformations declaratively while the engine handles execution. You define schemas, sources, computations, and outputs.

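The key difference from a batch script is continuous updating. A batch script reads a snapshot, computes once, and exits. Pathway keeps running, and when the input changes, the engine updates the result and emits the changes as a stream.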
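To make the contrast with a batch script concrete, here is a toy sketch in plain Python. It does not use Pathway and does not reflect how the engine is implemented; the `RunningTotal` class and the `(value, diff)` change format are illustrative assumptions. The idea is that each change adjusts the stored result, so the input is never rescanned.

```python
from dataclasses import dataclass


@dataclass
class RunningTotal:
    """Toy incrementally maintained sum of positive values (hypothetical helper)."""

    total: int = 0

    def apply(self, value: int, diff: int) -> int:
        # diff is +1 when a row is added and -1 when a row is retracted.
        if value > 0:
            self.total += diff * value
        return self.total


def batch_total(values):
    # Batch style: rescan every row each time the answer is needed.
    return sum(v for v in values if v > 0)


state = RunningTotal()
# Three rows arrive, then the row with value 5 is retracted.
changes = [(5, +1), (-2, +1), (7, +1), (5, -1)]
history = [state.apply(value, diff) for value, diff in changes]
print(history)  # [5, 5, 12, 7]

# The incremental result matches a full recomputation over the surviving rows.
assert history[-1] == batch_total([-2, 7])
```

The incremental version does constant work per change, while the batch version does work proportional to the whole input every time it runs.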

Small streaming job

This example shows the general style: define a schema, connect a source, describe a transformation, and write the results.

Language: Python

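import pathway as pw

# Schema: every input row carries a single integer column.
class Input(pw.Schema):
    value: int

# Source: watch the ./data directory for CSV files and keep reading as they change.
table = pw.io.csv.read("./data", schema=Input, mode="streaming")

# Transformation: keep positive values, then reduce them to one running total.
positive = table.filter(table.value > 0)
result = positive.groupby().reduce(total=pw.reducers.sum(positive.value))

# Output: append every update of the total to a JSON Lines file.
pw.io.jsonlines.write(result, "./out.jsonl")

# Nothing executes until pw.run() starts the engine.
pw.run()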
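Because the job emits updates rather than one final answer, a consumer of the output file has to replay the changes to find the current value. The sketch below shows one way to do that in plain Python. It assumes each output line carries a `diff` field (+1 for an added row, -1 for a retracted row) and a `time` field alongside the data columns; check these names against the actual output before relying on them. The sample lines are invented for illustration.

```python
import json

# Hypothetical sample of ./out.jsonl; the values are invented for illustration.
SAMPLE_LINES = [
    '{"total": 5, "diff": 1, "time": 2}',
    '{"total": 5, "diff": -1, "time": 4}',
    '{"total": 12, "diff": 1, "time": 4}',
]


def current_rows(lines):
    """Replay a change stream and return the rows that are still live."""
    counts = {}
    for line in lines:
        record = json.loads(line)
        diff = record.pop("diff")      # assumed field: +1 insert, -1 retraction
        record.pop("time", None)       # assumed field: update timestamp, unused here
        key = json.dumps(record, sort_keys=True)
        counts[key] = counts.get(key, 0) + diff
    # A row is live when its insertions outnumber its retractions.
    # Duplicate live rows are collapsed to a single entry in this sketch.
    return [json.loads(key) for key, count in counts.items() if count > 0]


live = current_rows(SAMPLE_LINES)
print(live)  # [{'total': 12}]
```

In this sample the total 5 is first emitted and later retracted when a new row raises the total to 12, so only the row with 12 remains.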

What is inside

The repository contains the framework core, Python API, connectors, examples, AI templates, and deployment documentation. It connects data engineering with modern LLM application needs.

Pathway is useful when data must stay current: a document changes, an index updates, and the answering system should see the new version without a full manual rebuild.
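The document-change scenario can be illustrated with a toy inverted index in plain Python. This is not Pathway's API or its indexing method; `LiveIndex`, `upsert`, and `search` are hypothetical names used only to show that a change to one document needs to touch only that document's entries.

```python
from collections import defaultdict


class LiveIndex:
    """Toy inverted index that re-indexes only the document that changed."""

    def __init__(self):
        self.docs = {}                     # doc_id -> current text
        self.postings = defaultdict(set)   # term -> set of doc_ids

    def upsert(self, doc_id, text):
        # Drop postings left by the previous version of this document, if any.
        for term in self.docs.get(doc_id, "").lower().split():
            self.postings[term].discard(doc_id)
        # Store the new version and index its terms.
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


index = LiveIndex()
index.upsert("a.txt", "refund policy is thirty days")
index.upsert("b.txt", "shipping takes five days")
before = index.search("thirty")        # ['a.txt']

# The document changes; only its own postings are rewritten.
index.upsert("a.txt", "refund policy is sixty days")
after_old = index.search("thirty")     # []
after_new = index.search("sixty")      # ['a.txt']
print(before, after_old, after_new)
```

A query made after the update sees the new wording immediately, and the unchanged document `b.txt` is never re-read.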

Strengths

The main strength is live data orientation, which separates it from scheduled ETL scripts that see only snapshots.

The Python interface also matters for teams already building analytics and AI applications in Python.

Limits

Pathway requires an understanding of the streaming model. If the job is a static report that runs once a month, a simple script or SQL query may be clearer.

Production systems still need operations, memory planning, monitoring, error handling, and reproducibility.