Hashquery is a Python framework for defining and querying BI models in your data warehouse.
Hashquery expressions are defined in Python, compiled into SQL, and run directly against your data warehouse. It is capable of expressing complex, multi-layered data queries, way beyond the capabilities of standard SQL. It natively integrates with upstream semantic layers and can be used together with Hashboard as a headless BI interface.
Hashquery is currently in early beta. This means we’ll do our best to keep it stable, but there are significant caveats to keep in mind as you try it out:
- Backwards compatibility will not be preserved between version upgrades, and you may need to update your package regularly to keep things working.
- You may encounter dialect-specific SQL syntax errors. If this happens, let us know by reporting an issue.
- There may be differences in the SQL logic generated by hashquery compared to the Hashboard app.
- Some errors might be hard to debug on your own (but we're happy to help!)
Project overview and live examples: https://hashquery.dev/
Full documentation: https://hashquery.dev/docs
Install hashquery with:
pip install hashquery
You can verify everything's working by running models against the demo project:
>>> from hashquery.demo import demo_project
>>> demo_project.models.products.limit(5).run().df
id pizza_size pizza_shape pizza_type price
0 14278024243148112051 Large Round Custom 13.20
1 9154428932967574098 Medium Round Custom 11.00
2 2699960497371169210 Large Square Margherita 11.52
3 12910763551309720704 Medium Round Margherita 8.00
4 1363512534661457390 Medium Square Veggie 13.20
A full guide to querying your own datasources can be found in the documentation.
from hashquery import attr, column, func
from hashquery.demo import demo_project
# Simple metric
sales = demo_project.models.sales
sales_by_year = (
sales.aggregate(
groups=[column("timestamp").by_year],
measures=[func.count()],
)
.sort(column("timestamp"))
)
print(sales_by_year.run().df)
"""
timestamp count
0 2018-01-01 1025
1 2019-01-01 1190
2 2020-01-01 2720
3 2021-01-01 4941
4 2022-01-01 4018
5 2023-01-01 3589
6 2024-01-01 642
"""
# Funnel analysis
events = demo_project.models.events_model
funnel = (
events
.with_activity_schema(
group=attr.user_id,
timestamp=attr.timestamp,
event_key=attr.event_type
)
.funnel(
"ad_impression",
"add_to_cart",
"buy")
)
print(funnel.run().df)
"""
step count
0 count 501
1 ad_impression 495
2 add_to_cart 465
3 buy 264
"""
During the beta, hashquery integrates with Hashboard to compile and execute SQL queries and supports the following databases:
- BigQuery
- Snowflake
- PostgreSQL
- MotherDuck / DuckDB
- Redshift
- Databricks
- AWS Athena
- ClickHouse
- How does hashquery fetch data? Hashquery models are compiled into SQL and run directly against your data warehouse. The backend caches results for performance, but there are no intermediate datastores — the computation all happens on your own infrastructure.
- How does this differ from other analytics DSLs like LookML or Malloy? We built hashquery because we loved the expressiveness and power of data modeling languages like Malloy but wanted to avoid the portability and developer experience drawbacks of custom DSLs. Hashquery is a python package that can run anywhere, and it can be used as a querying language, a semantic layer, or a headless BI API — or all three at once.
- Do I need to be a Hashboard user to use hashquery? During the beta, the hashquery SQL compiler is not available to run locally, so you do need to define your data connections inside of Hashboard and use its API to execute your queries. But we do plan on making the full hashquery stack available to run locally in the near future.