Scrapling

D4Vinci/Scrapling

Scrapling is a Python web scraping framework for requests, crawling, and adaptive data extraction.

Forks 6,403
Author D4Vinci
Language Python
License BSD-3-Clause
Synced 2026-06-20

What it is

Scrapling is a Python framework for extracting data from websites. It covers the path from one-off requests to full crawlers and focuses on adaptive extraction for modern sites.

It targets the cases where a plain `requests` call is often not enough: pages render content dynamically, markup changes, data is hidden behind interface state, and extraction has to survive small changes.

The approach

Scrapling combines plain HTTP requests, sessions, browser-backed fetching, element selection, and spiders in one package. You can start simple and add complexity only when the site requires it.
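
A minimal sketch of the "start simple" idea in plain Python, assuming each fetcher is just a callable that takes a URL and returns HTML. The names `fetch_with_escalation` and `looks_complete` are hypothetical and this is not Scrapling's API; in practice the cheap fetcher might wrap a plain HTTP request and the expensive one a browser-backed fetch.

```python
from typing import Callable, List, Tuple

# Hypothetical fetcher type: takes a URL, returns the page HTML.
FetchFn = Callable[[str], str]


def fetch_with_escalation(
    url: str,
    fetchers: List[Tuple[str, FetchFn]],
    looks_complete: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try fetchers from cheapest to most expensive and return
    (fetcher_name, html) for the first response that passes the check."""
    for name, fetch in fetchers:
        html = fetch(url)
        if looks_complete(html):
            return name, html
    raise RuntimeError(f"no fetcher produced usable HTML for {url}")


# Stand-in fetchers: the static HTML lacks the cards that a browser would render.
static = lambda url: "<div id='app'></div>"
browser = lambda url: "<div id='app'><div class='article-card'>...</div></div>"

name, html = fetch_with_escalation(
    "https://example.com/articles",
    [("static", static), ("browser", browser)],
    looks_complete=lambda h: "article-card" in h,
)
print(name)  # browser
```

Checking for a marker the extraction depends on (here the `article-card` class) is cheaper than launching a browser for every page.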

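Adaptivity matters for long-running extractors. If a selector breaks after every small markup change, maintaining the extractor turns into constant manual repair. Scrapling tries to reduce that fragility by remembering the properties of elements it has matched and relocating them by similarity when a selector stops matching.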
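
To make the idea concrete, here is a toy version of similarity-based relocation built only on the standard library. It is not Scrapling's algorithm: it fingerprints an element as its tag, attributes, and direct text, then picks the most similar element on the changed page with `difflib.SequenceMatcher`. The `relocate` function and the 0.6 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Elements that never have a closing tag, so they are not kept on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementCollector(HTMLParser):
    """Record every element as {tag, attrs, text}, where text is the
    data that appears directly inside that element."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Dict] = []
        self._stack: List[Dict] = []

    def handle_starttag(self, tag, attrs):
        el = {"tag": tag, "attrs": dict(attrs), "text": ""}
        self.elements.append(el)
        if tag not in VOID_TAGS:
            self._stack.append(el)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if self._stack:
            self._stack[-1]["text"] += data.strip()


def fingerprint(el: Dict) -> str:
    attrs = " ".join(f"{k}={v}" for k, v in sorted(el["attrs"].items()))
    return f"{el['tag']} {attrs} {el['text']}"


def relocate(html: str, saved: Dict, threshold: float = 0.6) -> Optional[Dict]:
    """Return the element on the new page most similar to a saved one,
    or None if nothing is similar enough."""
    parser = ElementCollector()
    parser.feed(html)
    parser.close()
    target = fingerprint(saved)
    best, best_score = None, 0.0
    for el in parser.elements:
        score = SequenceMatcher(None, target, fingerprint(el)).ratio()
        if score > best_score:
            best, best_score = el, score
    return best if best_score >= threshold else None


# The title's class changed from "title" to "post-title" in a redesign.
saved = {"tag": "h2", "attrs": {"class": "title"}, "text": "Scrapling"}
new_html = '<div><h2 class="post-title">Scrapling</h2><p class="x">Other</p></div>'
found = relocate(new_html, saved)
print(found["attrs"])  # {'class': 'post-title'}
```

A selector like `h2.title` would return nothing on the new page, while the similarity search still finds the heading because its tag and text are unchanged.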

Single extraction

This example shows the simple starting point: fetch a page, select elements, and collect data. Real projects add delays, storage, and respectful access rules.

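```python
from urllib.parse import urljoin

from scrapling.fetchers import Fetcher

START_URL = "https://example.com/articles"
page = Fetcher.get(START_URL)  # one-off HTTP request

for card in page.css(".article-card"):
    heading = card.css_first("h2")
    link = card.css_first("a")
    href = link.attrib.get("href") if link is not None else None
    # Skip cards that lack the expected markup instead of crashing on None.
    if heading is None or not href:
        continue
    # Resolve relative links against the page URL.
    print(heading.text, urljoin(START_URL, href))
```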
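
The example above leaves out the access rules that real projects add. A minimal sketch of that step with the standard library's `urllib.robotparser`: it filters URLs through robots.txt and pauses between requests, using the site's `Crawl-delay` when one is set. `polite_urls`, the user-agent string, and the one-second default are assumptions; in production you would load robots.txt with `set_url()` and `read()` instead of parsing a string.

```python
import time
from typing import Callable, Iterable, Iterator
from urllib.robotparser import RobotFileParser


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that was already downloaded."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def polite_urls(
    urls: Iterable[str],
    rp: RobotFileParser,
    user_agent: str = "example-scraper",
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield only URLs that robots.txt allows, pausing between them."""
    delay = rp.crawl_delay(user_agent) or default_delay
    first = True
    for url in urls:
        if not rp.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        if not first:
            sleep(delay)
        first = False
        yield url


robots = load_robots("User-agent: *\nDisallow: /private/\nCrawl-delay: 2\n")
urls = [
    "https://example.com/articles",
    "https://example.com/private/drafts",
    "https://example.com/about",
]
for url in polite_urls(urls, robots):
    print(url)  # each allowed URL would be handed to the fetcher here
```

The `sleep` parameter is injectable so the pacing can be tested without actually waiting.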

What is inside

The repository contains the framework code, usage examples, the spider machinery, several page-fetching approaches, and extraction documentation. It is meant for maintained projects, not only throwaway scripts.

Scrapling does not remove the ethical and technical limits of data collection. Good scraping follows site rules and terms of use, keeps request rates modest, stays within legal conditions, and caches responses instead of refetching them.
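
Caching is the easiest of these habits to put into code. A minimal sketch, assuming any `fetch(url) -> str` callable: it keeps responses in memory for a fixed time so repeated runs during development do not hit the site again. `CachedFetcher` and the one-hour default TTL are hypothetical, and unlike a disk-backed cache this one does not survive a restart.

```python
import time
from typing import Callable, Dict, Tuple


class CachedFetcher:
    """Wrap a fetch(url) -> str callable with an in-memory TTL cache."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: no network request
        body = self._fetch(url)
        self._store[url] = (now, body)
        return body


calls = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return f"<html>{url}</html>"


fetcher = CachedFetcher(fake_fetch, ttl_seconds=60)
fetcher.get("https://example.com/a")
fetcher.get("https://example.com/a")
print(len(calls))  # 1: the second call was served from the cache
```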

Strengths

The main strength is a single path from a simple request to a larger crawler. That helps when a small script grows to need sessions, retries, queues, and resilient selectors.
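
A minimal sketch of how those pieces fit together once a script outgrows a single request: a breadth-first queue with deduplication, a page limit, and retries with exponential backoff. It is plain Python, not Scrapling's spider API; `crawl`, `fetch`, and `extract_links` are hypothetical, and treating `OSError` (which includes connection errors) as retryable is an assumption.

```python
import time
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def crawl(
    start_urls: Iterable[str],
    fetch: Callable[[str], str],
    extract_links: Callable[[str, str], Iterable[str]],
    max_pages: int = 50,
    max_retries: int = 2,
    backoff: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Dict[str, str], List[str]]:
    """Breadth-first crawl. Returns (pages by URL, URLs that kept failing)."""
    queue = deque(start_urls)
    seen = set(queue)
    pages: Dict[str, str] = {}
    failed: List[str] = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html: Optional[str] = None
        for attempt in range(max_retries + 1):
            try:
                html = fetch(url)
                break
            except OSError:
                if attempt < max_retries:
                    sleep(backoff * 2 ** attempt)  # backoff=1.0 gives 1s, 2s, 4s, ...
        if html is None:
            failed.append(url)
            continue
        pages[url] = html
        for link in extract_links(url, html):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return pages, failed


site = {"/": ["/a", "/b", "/broken"], "/a": ["/"], "/b": [], "/broken": []}
flaky = {"/a": 1}  # "/a" fails once, then succeeds


def fake_fetch(url: str) -> str:
    if url == "/broken":
        raise ConnectionError("always down")
    if flaky.get(url, 0) > 0:
        flaky[url] -= 1
        raise ConnectionError("transient")
    return f"<html>{url}</html>"


waits = []
pages, failed = crawl(["/"], fake_fetch, lambda url, html: site[url], sleep=waits.append)
print(sorted(pages), failed, waits)  # ['/', '/a', '/b'] ['/broken'] [1.0, 1.0, 2.0]
```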

It is also oriented toward the modern web, where a single download of the initial HTML often misses content that JavaScript renders later.

Limits

Web data collection always depends on the source. No framework can guarantee that a site will not change, restrict access, or forbid automation.

Aggressive crawling should also be avoided. A technically working script can still harm someone else’s infrastructure or violate terms of use.
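
One concrete guard against aggressive crawling is a minimum interval between requests to the same host. A minimal sketch; the `HostRateLimiter` name and the one-second default are assumptions, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class HostRateLimiter:
    """Block until at least `min_interval` seconds have passed since the
    previous request to the same host."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Dict[str, float] = {}  # host -> time of last request

    def wait(self, url: str) -> None:
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last.get(host)
        if last is not None:
            remaining = self._min_interval - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last[host] = now


limiter = HostRateLimiter(min_interval=1.0)
for url in ["https://example.com/a", "https://example.com/b", "https://example.org/"]:
    limiter.wait(url)  # call right before each request
    print("fetching", url)
```

Different hosts do not wait on each other. The limiter is not thread-safe, so a concurrent crawler would need a lock around `wait`.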

Who it fits

Scrapling fits developers and analysts who need web extraction as a reproducible process rather than a one-off command.

The best start is to scrape a small set of pages, check that the selectors stay stable, and only then scale the crawl.
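
A selector stability check can be as simple as running every field extractor against saved snapshots of the target pages and reporting which fields came back empty. In this sketch each extractor is any callable from HTML to an optional string; in a real project it would wrap selector code such as the `css_first` calls in the earlier example. `check_stability`, `first_group`, and the snapshot names are hypothetical, and the regex extractors are only stand-ins for a real selector engine.

```python
import re
from typing import Callable, Dict, List, Optional

Extractor = Callable[[str], Optional[str]]


def check_stability(
    snapshots: Dict[str, str], extractors: Dict[str, Extractor]
) -> Dict[str, List[str]]:
    """Map each snapshot name to the fields its extractors failed to fill."""
    report: Dict[str, List[str]] = {}
    for name, html in snapshots.items():
        missing = [field for field, extract in extractors.items() if not extract(html)]
        if missing:
            report[name] = missing
    return report


def first_group(pattern: str) -> Extractor:
    """Stand-in extractor; real code would use a proper selector engine."""
    def extract(html: str) -> Optional[str]:
        match = re.search(pattern, html)
        return match.group(1) if match else None
    return extract


extractors = {
    "title": first_group(r"<h2[^>]*>(.*?)</h2>"),
    "url": first_group(r'<a[^>]*href="([^"]+)"'),
}
snapshots = {
    "before-redesign": '<h2>Post</h2><a href="/post">read</a>',
    "after-redesign": '<h3>Post</h3><a href="/post">read</a>',
}
print(check_stability(snapshots, extractors))  # {'after-redesign': ['title']}
```

An empty report is the signal to scale up; a non-empty one names the exact field to fix before the crawl grows.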