PyMongoArrow

PyMongoArrow is a companion library to PyMongo that contains tools for loading MongoDB query result sets as Apache Arrow tables, Pandas DataFrames or NumPy arrays.

>>> from pymongoarrow.monkey import patch_all
>>> patch_all()  # adds the find_*_all methods to PyMongo collections
>>> from pymongoarrow.api import Schema
>>> schema = Schema({"_id": int, "qty": float})
>>> from pymongo import MongoClient
>>> client = MongoClient()
>>> client.db.data.insert_many(
...     [{"_id": 1, "qty": 25.4}, {"_id": 2, "qty": 16.9}, {"_id": 3, "qty": 2.3}]
... )
>>> # Query the same collection the documents were inserted into
>>> data_frame = client.db.data.find_pandas_all({}, schema=schema)
>>> data_frame
   _id   qty
0    1  25.4
1    2  16.9
2    3   2.3
>>> arrow_table = client.db.data.find_arrow_all({}, schema=schema)
>>> # The schema may also be omitted
>>> arrow_table = client.db.data.find_arrow_all({})
>>> arrow_table
pyarrow.Table
_id: int64
qty: double
>>> ndarrays = client.db.data.find_numpy_all({}, schema=schema)
>>> ndarrays
{'_id': array([1, 2, 3]), 'qty': array([25.4, 16.9,  2.3])}

PyMongoArrow is the recommended way to materialize MongoDB query result sets as contiguous-in-memory, typed arrays suited for in-memory analytical processing applications.

Installing PyMongoArrow

PyMongoArrow is available on PyPI:

python -m pip install pymongoarrow

To use PyMongoArrow with MongoDB Atlas' mongodb+srv:// URIs, you will also need to install PyMongo with the srv extra:

python -m pip install 'pymongo[srv]' pymongoarrow

To use the PyMongoArrow APIs that return query result sets as pandas DataFrame instances, you will also need the pandas package installed:

python -m pip install pandas
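
Since pandas is an optional dependency, it can be useful to confirm which packages are importable before calling the DataFrame APIs. The following is a minimal sketch that uses only the Python standard library; the helper name `missing_packages` is hypothetical and is not part of PyMongoArrow.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be imported."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Packages assumed to be needed for the pandas-returning APIs.
    missing = missing_packages(["pymongo", "pymongoarrow", "pandas"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages needed for the DataFrame APIs are importable.")
```

Any name reported as missing can be installed with the pip commands shown above.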

Note: pymongoarrow is not supported or tested on big-endian systems (e.g. Linux s390x).
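
To find out whether a given host falls under this note, the interpreter's byte order can be inspected before relying on the library. This is a small sketch using the standard library's `sys.byteorder`; the function name `is_supported_byte_order` is a hypothetical helper, not something PyMongoArrow provides.

```python
import sys


def is_supported_byte_order(byteorder=sys.byteorder):
    """Return True on little-endian hosts, False on big-endian ones."""
    # sys.byteorder is "little" or "big" depending on the native byte order.
    return byteorder == "little"


if __name__ == "__main__":
    if is_supported_byte_order():
        print("Little-endian host detected.")
    else:
        print("Big-endian host: pymongoarrow is not supported or tested here.")
```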

Development Install

See the instructions in the [Contributing Guide](./CONTRIBUTING.md).

Documentation

Full documentation is available on Read the Docs.