GitHub - danclaudiupop/robox: Simple library for exploring/scraping the web or testing a website you’re developing

Overview

Robox is a simple library with a clean interface for exploring/scraping the web or testing a website you’re developing. Robox can fetch a page, click on links and buttons, and fill out and submit forms. Robox is built on top of two excellent libraries: httpx and beautifulsoup4.

Robox has all the standard features of httpx, including async, plus:

clean API
caching
downloading files
history
retry
parsing tables
understands robots.txt
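
To illustrate what honoring robots.txt involves, here is a minimal sketch that uses Python's standard-library urllib.robotparser rather than Robox's own API, which this README does not show. The rules, the user-agent name example-bot, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a given user agent may fetch each URL under these rules.
allowed = parser.can_fetch("example-bot", "https://example.com/index.html")
blocked = parser.can_fetch("example-bot", "https://example.com/private/data")
print(allowed, blocked)
```

A client that respects robots.txt would run a check like this before each request and skip the URLs that are disallowed.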

Examples

from robox import Robox


with Robox() as robox:
    page = robox.open("https://httpbin.org/forms/post")
    form = page.get_form()
    form.fill_in("custname", value="foo")
    form.check("topping", values=["Onion"])
    form.choose("size", option="Medium")
    form.fill_in("comments", value="all good in the hood")
    form.fill_in("delivery", value="13:37")
    page = page.submit_form(form)
    assert page.url == "https://httpbin.org/post"

Or use the async version:

import asyncio

from pprint import pprint
from robox import AsyncRobox


async def main():
    async with AsyncRobox(follow_redirects=True) as robox:
        page = await robox.open("https://www.google.com")
        form = page.get_form()
        form.fill_in("q", value="python")
        consent_page = await page.submit_form(form)
        form = consent_page.get_form()
        page = await consent_page.submit_form(form)
        links = page.get_links()
        pprint([link for link in links if "Python" in link.text])


asyncio.run(main())

Caching can be configured via httpx-cache:

from robox import DictCache, FileCache, Options, Robox


with Robox(options=Options(cache=DictCache())) as robox:
    p1 = robox.open("https://httpbin.org/get")
    assert not p1.from_cache
    p2 = robox.open("https://httpbin.org/get")
    assert p2.from_cache
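
The idea behind the from_cache flag can be sketched in plain Python. This is not how Robox or httpx-cache is implemented; CachingFetcher and fake_fetch are hypothetical names, and a real HTTP cache would typically also consider response headers and expiry.

```python
class CachingFetcher:
    """Wrap a fetch function and remember responses by URL (hypothetical helper)."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}

    def open(self, url):
        """Return (body, from_cache) for the given URL."""
        if url in self._cache:
            return self._cache[url], True
        body = self._fetch(url)
        self._cache[url] = body
        return body, False


network_calls = []


def fake_fetch(url):
    # Stand-in for a real HTTP request; records each call it receives.
    network_calls.append(url)
    return f"response for {url}"


fetcher = CachingFetcher(fake_fetch)
_, first_from_cache = fetcher.open("https://httpbin.org/get")
_, second_from_cache = fetcher.open("https://httpbin.org/get")
print(first_from_cache, second_from_cache, len(network_calls))
```

The second call is answered from memory, so only one request reaches the underlying fetch function.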

Failed requests that are potentially caused by temporary problems such as a connection timeout or HTTP 500 error can be retried:

with Robox(
    options=Options(
        retry=True,
        retry_max_attempts=2,
        raise_on_4xx_5xx=True,
    )
) as robox:
    page = robox.open("https://httpbin.org/status/503,200")
    assert page.status_code == 200
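
The retry behaviour described above can be sketched without any network access. This is an illustration of the general technique, not Robox's implementation: fetch_with_retry, flaky_fetch, and the set of retryable status codes are assumptions, and a production version would usually wait between attempts.

```python
# Status codes treated as temporary failures (an assumption for this sketch).
RETRYABLE_STATUS = {500, 502, 503, 504}


def fetch_with_retry(fetch, max_retries=2):
    """Call fetch() and repeat it on a retryable status, up to max_retries extra tries."""
    status = fetch()
    for _ in range(max_retries):
        if status not in RETRYABLE_STATUS:
            break
        status = fetch()
    return status


# Hypothetical server that answers 503 once and then 200.
responses = iter([503, 200])
calls = []


def flaky_fetch():
    status = next(responses)
    calls.append(status)
    return status


final_status = fetch_with_retry(flaky_fetch)
print(final_status, calls)
```

The first attempt sees 503, the single retry sees 200, and no further attempts are made.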

Parse tables with rowspan and colspan:

with Robox() as robox:
    page = robox.open("https://html.com/tables/rowspan-colspan/")
    tables = page.get_tables()
    for table in tables:
        pprint(table.get_rows())

[['65', '65', '40', '40', '20', '20'],
 ['Men', 'Women', 'Men', 'Women', 'Men', 'Women'],
 ['82', '85', '78', '82', '77', '81']]
 ...
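
To show what expanding rowspan and colspan means, here is a small sketch that turns spanned cells into a rectangular grid. It is not Robox's parser: expand_spans and the (text, rowspan, colspan) tuples are a hypothetical representation, and the first input is reconstructed from the output above on the assumption that the ages in the header row each span two columns.

```python
def expand_spans(rows):
    """Expand (text, rowspan, colspan) cells into a rectangular grid of strings."""
    grid = {}  # maps (row, col) to the cell text occupying that position
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already taken by a cell spanning down from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


colspan_rows = [
    [("65", 1, 2), ("40", 1, 2), ("20", 1, 2)],
    [(label, 1, 1) for label in ["Men", "Women", "Men", "Women", "Men", "Women"]],
    [(value, 1, 1) for value in ["82", "85", "78", "82", "77", "81"]],
]
colspan_grid = expand_spans(colspan_rows)

# A cell with rowspan=2 is repeated in the row below it.
rowspan_grid = expand_spans([[("A", 2, 1), ("B", 1, 1)], [("C", 1, 1)]])
print(colspan_grid[0], rowspan_grid)
```

Repeating the spanned text in every position it covers keeps each row the same length, which makes the result easy to load into a CSV writer or a data frame.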

An example of how to reuse authentication state with cookies:

import os


with Robox() as robox:
    page = robox.open("https://news.ycombinator.com/login")
    form = page.get_forms()[0]
    # Credentials are read from the environment instead of being hard-coded.
    form.fill_in("acct", value=os.getenv("USERNAME"))
    form.fill_in("pw", value=os.getenv("PASSWORD"))
    page.submit_form(form)
    robox.save_cookies("cookies.json")


with Robox() as robox:
    robox.load_cookies("cookies.json")
    page = robox.open("https://news.ycombinator.com/")
    assert page.parsed.find("a", attrs={"id": "logout"})
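
The save-then-load pattern amounts to a round trip through a file. The sketch below shows that round trip with the standard library only; dump_cookies and read_cookies are hypothetical helpers, the cookie name and value are invented, and the actual layout of the cookies.json file written by Robox is not documented here and may differ.

```python
import json
import tempfile
from pathlib import Path


def dump_cookies(cookies, path):
    """Write a name -> value cookie mapping to a JSON file."""
    Path(path).write_text(json.dumps(cookies))


def read_cookies(path):
    """Read the cookie mapping back from the JSON file."""
    return json.loads(Path(path).read_text())


with tempfile.TemporaryDirectory() as tmp:
    cookie_file = Path(tmp) / "cookies.json"
    # First session: persist the cookies obtained after logging in.
    dump_cookies({"session": "abc123"}, cookie_file)
    # Later session: restore them instead of logging in again.
    restored = read_cookies(cookie_file)

print(restored)
```

Because the file holds live session credentials, it should be treated like a password and kept out of version control.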

See the examples folder for more detailed examples.

Installation

Using pip:

pip install robox

Robox requires Python 3.8+. See Changelog for changes.
