Daft SQL#
Daft's SQL dialect closely follows both DuckDB and PostgreSQL. For a full list of SQL operations, check out our SQL Reference.
Example#
Please see Sessions and Catalogs for a detailed look at connecting data sources to Daft SQL.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
|
Usage#
SQL with DataFrames#
Daft's daft.sql
function automatically detects any daft.DataFrame
objects in your current Python environment to let you query them easily by name.
1 2 3 4 5 6 7 |
|
Output | |
---|---|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
In the above example, we query the DataFrame called "my_special_df"
by simply referring to it in the SQL command. This produces a new DataFrame sql_df
which can natively integrate with the rest of your Daft query. You can also use table functions to query sources directly.
1 2 |
|
SQL Expressions#
SQL has the concept of expressions as well. Here is an example of a simple addition expression, adding columns A
and B
in SQL to produce a new column C
.
We also present here the equivalent query for SQL and DataFrame. Notice how similar the concepts are!
1 2 3 |
|
1 2 3 4 5 |
|
Output | |
---|---|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
In the above query, both the SQL version of the query and the DataFrame version of the query produce the same result.
Under the hood, they run the same Expression col("A") + col("B")
!
One really cool trick you can do is to use the daft.sql_expr
function as a helper to easily create Expressions. The following are equivalent:
1 2 |
|
1 2 |
|
Output | |
---|---|
1 2 |
|
This means that you can pretty much use SQL anywhere you use Python expressions, making Daft extremely versatile at mixing workflows which leverage both SQL and Python.
As an example, consider the filter query below and compare the two equivalent Python and SQL queries:
1 2 3 4 5 6 |
|
1 2 3 4 5 6 |
|
Output | |
---|---|
1 2 3 4 5 6 7 8 9 |
|
Pretty sweet! Of course, this support for running Expressions on your columns extends well beyond arithmetic as we'll see in the next section on SQL Functions.
SQL Functions#
SQL also has access to all of Daft's powerful daft.Expression
functionality through SQL functions.
However, unlike the Python Expression API which encourages method-chaining (e.g. col("a").url.download().image.decode()
), in SQL you have to do function nesting instead (e.g. "image_decode(url_download(a))"
).
Note
A full catalog of the available SQL Functions in Daft is available in SQL Reference
.
Note that it closely mirrors the Python API, with some function naming differences vs the available Python methods. We also have some aliased functions for ANSI SQL-compliance or familiarity to users coming from other common SQL dialects such as PostgreSQL and SparkSQL to easily find their functionality.
Here is an example of an equivalent function call in SQL vs Python:
1 2 3 4 5 6 7 |
|
1 2 3 4 5 6 7 |
|
Output | |
---|---|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|