Window Functions#

Window functions allow you to perform calculations across a set of rows that are related to the current row. They operate on a group of rows (called a window frame) and return a result for each row based on the values in its window frame, without collapsing the result into a single row like aggregate functions do. Learn more about Window Functions in the Daft User Guide.
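
The contrast with aggregate functions can be modelled in plain Python. This is only an illustrative sketch of the semantics, not Daft code: `aggregate_sum` and `window_sum` are hypothetical helpers, and the rows are made-up sample data.

```python
from collections import defaultdict

# Made-up sample rows for illustration only.
rows = [
    {"category": "A", "value": 1},
    {"category": "A", "value": 2},
    {"category": "B", "value": 10},
]


def aggregate_sum(rows):
    """Group-by style aggregate: each category collapses into one total."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["category"]] += row["value"]
    return dict(totals)


def window_sum(rows):
    """Window-style sum: every input row is kept and gains its partition total."""
    totals = aggregate_sum(rows)
    return [{**row, "category_total": totals[row["category"]]} for row in rows]


print(aggregate_sum(rows))  # one entry per category
print(window_sum(rows))     # one entry per input row
```

The aggregate returns one entry per category, while the window version returns as many rows as it received.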

Window #

Window()

Describes how to partition data and in what order to apply the window function.

This class provides a way to specify window definitions for window functions. Window functions operate on a group of rows (called a window frame) and return a result for each row based on the values in its window frame.

Examples:

>>> from daft import Window, col
>>>
>>> # Basic window aggregation with a single partition column:
>>> window_spec = Window().partition_by("category")
>>> df = df.select(
...     col("value").sum().over(window_spec).alias("category_total"),
...     col("value").mean().over(window_spec).alias("category_avg"),
... )
>>>
>>> # Partitioning by multiple columns:
>>> window_spec = Window().partition_by(["department", "category"])
>>> df = df.select(col("sales").sum().over(window_spec).alias("dept_category_total"))
>>>
>>> # Using window aggregations in expressions:
>>> window_spec = Window().partition_by("category")
>>> df = df.select((col("value") / col("value").sum().over(window_spec)).alias("pct_of_category"))

Methods:

Name            Description
order_by        Orders rows within each partition by specified columns or expressions.
partition_by    Partitions the dataset by one or more columns or expressions.
range_between   Restricts each window to a range-based frame between start and end boundaries.
rows_between    Restricts each window to a row-based frame between start and end boundaries.

Attributes:

Name Type Description
current_row
unbounded_following
unbounded_preceding
Source code in daft/window.py
def __init__(self) -> None:
    self._spec = _WindowSpec.new()

current_row #

current_row = offset(0)

unbounded_following #

unbounded_following = unbounded_following()

unbounded_preceding #

unbounded_preceding = unbounded_preceding()

order_by #

order_by(
    *cols: ManyColumnsInputType,
    desc: bool | list[bool] = False,
) -> Window

Orders rows within each partition by specified columns or expressions.

Parameters:

Name Type Description Default
*cols ManyColumnsInputType

Columns or expressions to determine ordering within the partition. Can be column names as strings, Expression objects, or iterables of these.

()
desc bool | list[bool]

Sort descending (True) or ascending (False). Can be a single boolean value applied to all columns, or a list of boolean values corresponding to each column. Default is False (ascending).

False

Returns:

Name Type Description
Window Window

A window specification with the given ordering.

Examples:

>>> from daft import Window, col
>>> # Order by 'date' ascending (default)
>>> window_spec = Window().partition_by("category").order_by("date")
>>> # Order by 'sales' descending
>>> window_spec_desc = Window().partition_by("category").order_by("sales", desc=True)
>>> # Order by 'date' ascending and 'sales' descending
>>> window_spec_multi = Window().partition_by("category").order_by("date", "sales", desc=[False, True])
Source code in daft/window.py
def order_by(self, *cols: ManyColumnsInputType, desc: bool | list[bool] = False) -> Window:
    """Orders rows within each partition by specified columns or expressions.

    Args:
        *cols: Columns or expressions to determine ordering within the partition.
               Can be column names as strings, Expression objects, or iterables of these.
        desc: Sort descending (True) or ascending (False). Can be a single boolean value applied to all columns,
             or a list of boolean values corresponding to each column. Default is False (ascending).

    Returns:
        Window: A window specification with the given ordering.

    Examples:
        >>> from daft import Window, col
        >>> # Order by 'date' ascending (default)
        >>> window_spec = Window().partition_by("category").order_by("date")
        >>> # Order by 'sales' descending
        >>> window_spec_desc = Window().partition_by("category").order_by("sales", desc=True)
        >>> # Order by 'date' ascending and 'sales' descending
        >>> window_spec_multi = Window().partition_by("category").order_by("date", "sales", desc=[False, True])
    """
    expressions = []
    for c in cols:
        expressions.extend(column_inputs_to_expressions(c))

    if isinstance(desc, bool):
        desc_flags = [desc] * len(expressions)
    else:
        if len(desc) != len(expressions):
            raise ValueError("Length of descending flags must match number of order by columns")
        desc_flags = desc

    window = Window()
    window._spec = self._spec.with_order_by([expr._expr for expr in expressions], desc_flags)
    return window
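
The way `desc` is broadcast across the ordering columns can be tried in isolation. The sketch below mirrors the flag handling in the listing above using plain Python; `normalize_desc_flags` and `sort_partition` are hypothetical helpers written for illustration, and the multi-key sort is a stand-in for what an engine would do, not Daft's implementation.

```python
def normalize_desc_flags(num_columns, desc=False):
    """Return one descending flag per ordering column."""
    if isinstance(desc, bool):
        # A single boolean applies to every ordering column.
        return [desc] * num_columns
    if len(desc) != num_columns:
        raise ValueError("Length of descending flags must match number of order by columns")
    return list(desc)


def sort_partition(rows, columns, desc=False):
    """Sort rows (dicts) by several columns, each with its own direction."""
    flags = normalize_desc_flags(len(columns), desc)
    ordered = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields a correct multi-column ordering.
    for column, descending in reversed(list(zip(columns, flags))):
        ordered.sort(key=lambda row: row[column], reverse=descending)
    return ordered


rows = [
    {"date": 1, "sales": 5},
    {"date": 1, "sales": 9},
    {"date": 0, "sales": 3},
]
# 'date' ascending, then 'sales' descending within equal dates.
print(sort_partition(rows, ["date", "sales"], desc=[False, True]))
```

Passing a list whose length differs from the number of ordering columns raises `ValueError`, matching the check in `order_by`.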

partition_by #

partition_by(*cols: ManyColumnsInputType) -> Window

Partitions the dataset by one or more columns or expressions.

Parameters:

Name Type Description Default
*cols ManyColumnsInputType

Columns or expressions on which to partition data. Can be column names as strings, Expression objects, or iterables of these.

()

Returns:

Name Type Description
Window Window

A window specification with the given partitioning.

Raises:

Type Description
ValueError

If no partition columns are specified.

Examples:

>>> from daft import Window, col
>>> # Partition by a single column 'category'
>>> window_spec = Window().partition_by("category")
>>> # Partition by multiple columns 'department' and 'region'
>>> window_spec_multi = Window().partition_by("department", "region")
Source code in daft/window.py
def partition_by(self, *cols: ManyColumnsInputType) -> Window:
    """Partitions the dataset by one or more columns or expressions.

    Args:
        *cols: Columns or expressions on which to partition data.
               Can be column names as strings, Expression objects, or iterables of these.

    Returns:
        Window: A window specification with the given partitioning.

    Raises:
        ValueError: If no partition columns are specified.

    Examples:
        >>> from daft import Window, col
        >>> # Partition by a single column 'category'
        >>> window_spec = Window().partition_by("category")
        >>> # Partition by multiple columns 'department' and 'region'
        >>> window_spec_multi = Window().partition_by("department", "region")
    """
    if not cols:
        raise ValueError("At least one partition column must be specified")

    expressions = []
    for c in cols:
        expressions.extend(column_inputs_to_expressions(c))

    if not expressions:
        raise ValueError("At least one partition column must be specified")

    window = Window()
    window._spec = self._spec.with_partition_by([expr._expr for expr in expressions])
    return window
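
Conceptually, partitioning buckets rows by the values of the partition columns, and each bucket is then processed independently. A minimal plain-Python model of that idea follows; `partition_rows` is a hypothetical helper and says nothing about how Daft actually distributes or executes the work.

```python
from collections import defaultdict


def partition_rows(rows, *columns):
    """Group rows (dicts) into buckets keyed by the partition column values."""
    if not columns:
        # Same validation message as partition_by in the listing above.
        raise ValueError("At least one partition column must be specified")
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in columns)
        partitions[key].append(row)
    return dict(partitions)


rows = [
    {"department": "toys", "region": "east", "sales": 4},
    {"department": "toys", "region": "west", "sales": 7},
    {"department": "toys", "region": "east", "sales": 1},
]
for key, members in partition_rows(rows, "department", "region").items():
    print(key, [row["sales"] for row in members])
```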

range_between #

range_between(
    start: Any, end: Any, min_periods: int = 1
) -> Window

Restricts each window to a range-based frame between start and end boundaries.

This defines a window frame based on a range of values relative to the current row's value in the ordering column. Requires exactly one order_by column, which must be numeric or temporal type.

Parameters:

Name Type Description Default
start Any

Boundary definition for the start of the window's range. Can be:

  • Window.unbounded_preceding: Start the range at the beginning of the partition, with no lower bound on the order value.
  • Window.current_row: Start range at the current row's order value.
  • Offset value (e.g., -10, datetime.timedelta(days=-1)): The start of the range is defined as current_row_order_value + start. The type of the offset must match the order-by column type. Negative values indicate a lower bound less than the current row's value. Positive values indicate a lower bound more than the current row's value.
required
end Any

Boundary definition for the end of the window's range. Syntax is similar to start. Negative values indicate an upper bound less than the current row's value. Positive values indicate an upper bound more than the current row's value.

required
min_periods int

Minimum number of rows required in the window frame to compute a result (default = 1). If fewer rows exist in the frame, the function returns NULL.

1

Returns:

Name Type Description
Window Window

A window specification with the given range-based frame bounds.

Raises:

Type Description
NotImplementedError

This feature is not yet implemented.

Examples:

>>> from daft import Window, col
>>> import datetime
>>> # Assume df has columns 'sensor_id', 'timestamp', 'reading'
>>> # Frame includes rows within 10 units *before* the current row's reading
>>> val_window = Window().partition_by("sensor_id").order_by("reading").range_between(-10, Window.current_row)
>>> # Frame includes rows from 1 day before to 1 day after the current row's timestamp
>>> time_window = (
...     Window()
...     .partition_by("sensor_id")
...     .order_by("timestamp")
...     .range_between(datetime.timedelta(days=-1), datetime.timedelta(days=1))
... )
Source code in daft/window.py
def range_between(
    self,
    start: Any,
    end: Any,
    min_periods: int = 1,
) -> Window:
    """Restricts each window to a range-based frame between start and end boundaries.

    This defines a window frame based on a range of values relative to the current row's
    value in the ordering column. Requires exactly one `order_by` column, which must be
    numeric or temporal type.

    Args:
        start: Boundary definition for the start of the window's range. Can be:

            *   ``Window.unbounded_preceding``: Include all rows with order value <= current row's order value + start offset.
            *   ``Window.current_row``: Start range at the current row's order value.
            *   Offset value (e.g., ``-10``, ``datetime.timedelta(days=-1)``): The start of the range is defined as
                `current_row_order_value + start`. The type of the offset must match the order-by column type.
                Negative values indicate a lower bound *less than* the current row's value. Positive values indicate a lower bound
                *more than* the current row's value.
        end: Boundary definition for the end of the window's range. Syntax is similar to `start`.
            Negative values indicate an upper bound *less than* the current row's value. Positive values indicate an upper bound
            *more than* the current row's value.
        min_periods: Minimum number of rows required in the window frame to compute a result (default = 1).
            If fewer rows exist in the frame, the function returns NULL.

    Returns:
        Window: A window specification with the given range-based frame bounds.

    Raises:
        NotImplementedError: This feature is not yet implemented.

    Examples:
        >>> from daft import Window, col
        >>> import datetime
        >>> # Assume df has columns 'sensor_id', 'timestamp', 'reading'
        >>> # Frame includes rows within 10 units *before* the current row's reading
        >>> val_window = Window().partition_by("sensor_id").order_by("reading").range_between(-10, Window.current_row)
        >>> # Frame includes rows from 1 day before to 1 day after the current row's timestamp
        >>> time_window = (
        ...     Window()
        ...     .partition_by("sensor_id")
        ...     .order_by("timestamp")
        ...     .range_between(datetime.timedelta(days=-1), datetime.timedelta(days=1))
        ... )
    """
    if isinstance(start, _PyWindowBoundary):
        start_boundary = start
    else:
        start_expr = Expression._to_expression(start)
        start_boundary = _PyWindowBoundary.range_offset(start_expr._expr)

    if isinstance(end, _PyWindowBoundary):
        end_boundary = end
    else:
        end_expr = Expression._to_expression(end)
        end_boundary = _PyWindowBoundary.range_offset(end_expr._expr)

    frame = _WindowFrame(
        start=start_boundary,
        end=end_boundary,
    )

    new_window = Window()
    new_window._spec = self._spec.with_frame(frame).with_min_periods(min_periods)
    return new_window
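
To make the offset arithmetic concrete, here is a plain-Python model of a range-based frame over a single ordering column. It is a sketch under stated assumptions, not Daft's implementation: `range_frame` is a hypothetical helper, `None` stands in for an unbounded boundary, and both bounds are assumed to be inclusive.

```python
from datetime import datetime, timedelta


def range_frame(order_values, index, start, end):
    """Return the order values that fall inside the frame of row `index`.

    The frame is [current + start, current + end]; inclusive bounds are an
    assumption of this sketch. `None` means the boundary is unbounded.
    """
    current = order_values[index]
    lower = None if start is None else current + start
    upper = None if end is None else current + end
    return [
        value
        for value in order_values
        if (lower is None or value >= lower) and (upper is None or value <= upper)
    ]


# Numeric ordering column: rows within 10 units before the current reading.
readings = [1, 5, 12, 14, 30]
print(range_frame(readings, 3, -10, 0))

# Temporal ordering column: from 1 day before to 1 day after the current timestamp.
stamps = [datetime(2024, 1, 1), datetime(2024, 1, 2), datetime(2024, 1, 4)]
print(range_frame(stamps, 1, timedelta(days=-1), timedelta(days=1)))
```

Unlike a row-based frame, the number of rows in the frame depends on how the order values are spaced, not on row positions.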

rows_between #

rows_between(
    start: int | PyWindowBoundary,
    end: int | PyWindowBoundary,
    min_periods: int = 1,
) -> Window

Restricts each window to a row-based frame between start and end boundaries.

This defines a sliding window based on row offsets relative to the current row.

Parameters:

Name Type Description Default
start int | PyWindowBoundary

Boundary definitions for the start of the window. Can be:

  • Window.unbounded_preceding: Include all rows before the current row.
  • Window.current_row: Start at the current row.
  • Integer value (e.g., -3): A negative integer indicates the number of rows preceding the current row.
  • Integer value (e.g., 1): A positive integer indicates the number of rows following the current row.
required
end int | PyWindowBoundary

Boundary definitions for the end of the window. Can be:

  • Window.unbounded_following: Include all rows after the current row.
  • Window.current_row: End at the current row.
  • Integer value (e.g., 1): A positive integer indicates the number of rows following the current row.
  • Integer value (e.g., -1): A negative integer indicates the number of rows preceding the current row.
required
min_periods int

Minimum number of rows required in the window frame to compute a result (default = 1). If fewer rows exist in the frame, the function returns NULL.

1

Returns:

Name Type Description
Window Window

A window specification with the given row-based frame bounds.

Examples:

>>> from daft import Window
>>> # Frame includes the current row and the 2 preceding rows
>>> window_spec = Window().partition_by("cat").order_by("val").rows_between(-2, Window.current_row)
>>> # Frame includes all rows from the beginning of the partition up to the current row
>>> cum_window = (
...     Window()
...     .partition_by("cat")
...     .order_by("val")
...     .rows_between(Window.unbounded_preceding, Window.current_row)
... )
>>> # Frame includes the preceding row, current row, and following row
>>> sliding_window = Window().partition_by("cat").order_by("val").rows_between(-1, 1)
>>> # Frame includes the current row and the 3 following rows, requiring at least 2 rows
>>> lookahead_window = (
...     Window().partition_by("cat").order_by("val").rows_between(Window.current_row, 3, min_periods=2)
... )
Source code in daft/window.py
def rows_between(
    self,
    start: int | _PyWindowBoundary,
    end: int | _PyWindowBoundary,
    min_periods: int = 1,
) -> Window:
    """Restricts each window to a row-based frame between start and end boundaries.

    This defines a sliding window based on row offsets relative to the current row.

    Args:
        start: Boundary definitions for the start of the window. Can be:

            *   ``Window.unbounded_preceding``: Include all rows before the current row.
            *   ``Window.current_row``: Start at the current row.
            *   Integer value (e.g., ``-3``): A negative integer indicates the number of rows *preceding* the current row.
            *   Integer value (e.g., ``1``): A positive integer indicates the number of rows *following* the current row.
        end: Boundary definitions for the end of the window. Can be:

            *   ``Window.unbounded_following``: Include all rows after the current row.
            *   ``Window.current_row``: End at the current row.
            *   Integer value (e.g., ``1``): A positive integer indicates the number of rows *following* the current row.
            *   Integer value (e.g., ``-1``): A negative integer indicates the number of rows *preceding* the current row.
        min_periods: Minimum number of rows required in the window frame to compute a result (default = 1).
            If fewer rows exist in the frame, the function returns NULL.

    Returns:
        Window: A window specification with the given row-based frame bounds.

    Examples:
        >>> from daft import Window
        >>> # Frame includes the current row and the 2 preceding rows
        >>> window_spec = Window().partition_by("cat").order_by("val").rows_between(-2, Window.current_row)
        >>> # Frame includes all rows from the beginning of the partition up to the current row
        >>> cum_window = (
        ...     Window()
        ...     .partition_by("cat")
        ...     .order_by("val")
        ...     .rows_between(Window.unbounded_preceding, Window.current_row)
        ... )
        >>> # Frame includes the preceding row, current row, and following row
        >>> sliding_window = Window().partition_by("cat").order_by("val").rows_between(-1, 1)
        >>> # Frame includes the current row and the 3 following rows, requiring at least 2 rows
        >>> lookahead_window = (
        ...     Window().partition_by("cat").order_by("val").rows_between(Window.current_row, 3, min_periods=2)
        ... )
    """
    if isinstance(start, int):
        start = _PyWindowBoundary.offset(start)
    if isinstance(end, int):
        end = _PyWindowBoundary.offset(end)

    frame = _WindowFrame(
        start=start,
        end=end,
    )

    new_window = Window()
    new_window._spec = self._spec.with_frame(frame).with_min_periods(min_periods)
    return new_window
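
The interaction between row offsets and `min_periods` can be modelled on a single, already ordered partition. The following is an illustrative plain-Python sketch, not Daft's implementation: `rows_frame_sum` is a hypothetical helper, `None` stands in for the unbounded boundaries, and Python's `None` plays the role of NULL.

```python
def rows_frame_sum(values, start, end, min_periods=1):
    """Sum over a row-based frame for every row of one ordered partition.

    `start` and `end` are row offsets relative to the current row
    (negative = preceding, positive = following, None = unbounded).
    """
    results = []
    last = len(values) - 1
    for i in range(len(values)):
        lo = 0 if start is None else max(i + start, 0)
        hi = last if end is None else min(i + end, last)
        frame = values[lo:hi + 1] if lo <= hi else []
        # Too few rows in the frame yields None (NULL in the documentation).
        results.append(sum(frame) if len(frame) >= min_periods else None)
    return results


values = [1, 2, 3, 4, 5]
print(rows_frame_sum(values, -2, 0))                # current row and 2 preceding rows
print(rows_frame_sum(values, None, 0))              # cumulative frame
print(rows_frame_sum(values, -1, 1))                # preceding, current, following
print(rows_frame_sum(values, 0, 3, min_periods=2))  # look-ahead, at least 2 rows
```

In the last call the final row has only itself in the frame, so with `min_periods=2` its result is `None`.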

Applying Window Functions#

over #

over(window: Window) -> Expression

Apply the expression as a window function.

Parameters:

Name Type Description Default
window Window

The window specification (created using daft.Window) defining partitioning, ordering, and framing.

required

Examples:

>>> import daft
>>> from daft import Window, col
>>> df = daft.from_pydict(
...     {
...         "group": ["A", "A", "A", "B", "B", "B"],
...         "date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05", "2020-01-06"],
...         "value": [1, 2, 3, 4, 5, 6],
...     }
... )
>>> window_spec = Window().partition_by("group").order_by("date")
>>> df = df.with_column("cumulative_sum", col("value").sum().over(window_spec))
>>> df.sort(["group", "date"]).show()
╭───────┬────────────┬───────┬────────────────╮
│ group ┆ date       ┆ value ┆ cumulative_sum │
│ ---   ┆ ---        ┆ ---   ┆ ---            │
│ Utf8  ┆ Utf8       ┆ Int64 ┆ Int64          │
╞═══════╪════════════╪═══════╪════════════════╡
│ A     ┆ 2020-01-01 ┆ 1     ┆ 1              │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A     ┆ 2020-01-02 ┆ 2     ┆ 3              │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A     ┆ 2020-01-03 ┆ 3     ┆ 6              │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B     ┆ 2020-01-04 ┆ 4     ┆ 4              │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B     ┆ 2020-01-05 ┆ 5     ┆ 9              │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B     ┆ 2020-01-06 ┆ 6     ┆ 15             │
╰───────┴────────────┴───────┴────────────────╯

(Showing first 6 of 6 rows)

Returns:

Name Type Description
Expression Expression

The result of applying this expression as a window function.

Source code in daft/expressions/expressions.py
@ExpressionPublicAPI
def over(self, window: Window) -> Expression:
    """Apply the expression as a window function.

    Args:
        window: The window specification (created using ``daft.Window``)
            defining partitioning, ordering, and framing.

    Examples:
        >>> import daft
        >>> from daft import Window, col
        >>> df = daft.from_pydict(
        ...     {
        ...         "group": ["A", "A", "A", "B", "B", "B"],
        ...         "date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05", "2020-01-06"],
        ...         "value": [1, 2, 3, 4, 5, 6],
        ...     }
        ... )
        >>> window_spec = Window().partition_by("group").order_by("date")
        >>> df = df.with_column("cumulative_sum", col("value").sum().over(window_spec))
        >>> df.sort(["group", "date"]).show()
        ╭───────┬────────────┬───────┬────────────────╮
        │ group ┆ date       ┆ value ┆ cumulative_sum │
        │ ---   ┆ ---        ┆ ---   ┆ ---            │
        │ Utf8  ┆ Utf8       ┆ Int64 ┆ Int64          │
        ╞═══════╪════════════╪═══════╪════════════════╡
        │ A     ┆ 2020-01-01 ┆ 1     ┆ 1              │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ A     ┆ 2020-01-02 ┆ 2     ┆ 3              │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ A     ┆ 2020-01-03 ┆ 3     ┆ 6              │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B     ┆ 2020-01-04 ┆ 4     ┆ 4              │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B     ┆ 2020-01-05 ┆ 5     ┆ 9              │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B     ┆ 2020-01-06 ┆ 6     ┆ 15             │
        ╰───────┴────────────┴───────┴────────────────╯
        <BLANKLINE>
        (Showing first 6 of 6 rows)

    Returns:
        Expression: The result of applying this expression as a window function.
    """
    expr = self._expr.over(window._spec)
    return Expression._from_pyexpr(expr)

Aggregate Functions#

Standard aggregate functions (e.g., sum, mean, count, min, max) can be used as window functions by applying them with .over. They work with all valid window specifications (partition by only, partition + order by, partition + order by + frame). Refer to the Expressions API for a full list of aggregate functions.

Note

When using aggregate functions with both partition by and order by, the default window frame includes all rows from the start of the partition up to the current row, equivalent to rows between unbounded preceding and current row.
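
The default frame described in this note amounts to a running aggregate per partition. The plain-Python sketch below models that behaviour on the same sample data as the `over` example; `cumulative_sum` is a hypothetical helper used only to illustrate the semantics, not Daft code.

```python
from itertools import groupby
from operator import itemgetter

# (group, date, value) tuples, matching the data in the `over` example.
rows = [
    ("A", "2020-01-01", 1),
    ("A", "2020-01-02", 2),
    ("A", "2020-01-03", 3),
    ("B", "2020-01-04", 4),
    ("B", "2020-01-05", 5),
    ("B", "2020-01-06", 6),
]


def cumulative_sum(rows):
    """Running sum per group, ordered by date within each group."""
    ordered = sorted(rows, key=itemgetter(0, 1))
    output = []
    for _, members in groupby(ordered, key=itemgetter(0)):
        running = 0
        for group, date, value in members:
            # The frame spans the start of the partition up to this row.
            running += value
            output.append((group, date, value, running))
    return output


for row in cumulative_sum(rows):
    print(row)
```

The running totals restart at each new group, which is why group B begins again at 4.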

Ranking Functions#

These functions compute ranks within a window partition. They require an order_by clause without a rows_between or range_between clause in the window specification.

row_number #

row_number() -> Expression

Return the row number of the current row (used for window functions).

Examples:

>>> import daft
>>> from daft.window import Window
>>> from daft.functions import row_number
>>> df = daft.from_pydict({"category": ["A", "A", "A", "A", "B", "B", "B", "B"], "value": [1, 7, 2, 9, 1, 3, 3, 7]})
>>>
>>> # Ascending order
>>> window = Window().partition_by("category").order_by("value")
>>> df = df.with_column("row", row_number().over(window))
>>> df = df.sort("category")
>>> df.show()
╭──────────┬───────┬────────╮
│ category ┆ value ┆ row    │
│ ---      ┆ ---   ┆ ---    │
│ Utf8     ┆ Int64 ┆ UInt64 │
╞══════════╪═══════╪════════╡
│ A        ┆ 1     ┆ 1      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ A        ┆ 2     ┆ 2      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ A        ┆ 7     ┆ 3      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ A        ┆ 9     ┆ 4      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B        ┆ 1     ┆ 1      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B        ┆ 3     ┆ 2      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B        ┆ 3     ┆ 3      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B        ┆ 7     ┆ 4      │
╰──────────┴───────┴────────╯

(Showing first 8 rows)

Returns:

Name Type Description
Expression Expression

An expression that returns the row number of the current row.

Source code in daft/functions/functions.py
def row_number() -> Expression:
    """Return the row number of the current row (used for window functions).

    Examples:
        >>> import daft
        >>> from daft.window import Window
        >>> from daft.functions import row_number
        >>> df = daft.from_pydict({"category": ["A", "A", "A", "A", "B", "B", "B", "B"], "value": [1, 7, 2, 9, 1, 3, 3, 7]})
        >>>
        >>> # Ascending order
        >>> window = Window().partition_by("category").order_by("value")
        >>> df = df.with_column("row", row_number().over(window))
        >>> df = df.sort("category")
        >>> df.show()
        ╭──────────┬───────┬────────╮
        │ category ┆ value ┆ row    │
        │ ---      ┆ ---   ┆ ---    │
        │ Utf8     ┆ Int64 ┆ UInt64 │
        ╞══════════╪═══════╪════════╡
        │ A        ┆ 1     ┆ 1      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ A        ┆ 2     ┆ 2      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ A        ┆ 7     ┆ 3      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ A        ┆ 9     ┆ 4      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ B        ┆ 1     ┆ 1      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ B        ┆ 3     ┆ 2      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ B        ┆ 3     ┆ 3      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ B        ┆ 7     ┆ 4      │
        ╰──────────┴───────┴────────╯
        <BLANKLINE>
        (Showing first 8 rows)

    Returns:
        Expression: An expression that returns the row number of the current row.
    """
    return Expression._from_pyexpr(native.row_number())
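
What `row_number` computes within one partition can be modelled in a few lines of plain Python. This is an illustrative sketch, not Daft's implementation: `row_numbers` is a hypothetical helper, and breaking ties by input order is an assumption of the sketch rather than a documented guarantee.

```python
def row_numbers(values, desc=False):
    """Return the 1-based position of each value in sorted order.

    Tied values still receive distinct, consecutive numbers. Ties are
    broken by input order here, which is an assumption of this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=desc)
    numbers = [0] * len(values)
    for position, index in enumerate(order, start=1):
        numbers[index] = position
    return numbers


# The two partitions from the example above, in input order.
print(row_numbers([1, 7, 2, 9]))  # category A
print(row_numbers([1, 3, 3, 7]))  # category B: the two 3s get different numbers
```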

rank #

rank() -> Expression

Return the rank of the current row (used for window functions).

Examples:

>>> import daft
>>> from daft.window import Window
>>> from daft.functions import rank
>>> df = daft.from_pydict({"category": ["A", "A", "A", "A", "B", "B", "B", "B"], "value": [1, 3, 3, 7, 7, 7, 4, 4]})
>>>
>>> window = Window().partition_by("category").order_by("value", desc=True)
>>> df = df.with_column("rank", rank().over(window))
>>> df = df.sort("category")
>>> df.show()
╭──────────┬───────┬────────╮
│ category ┆ value ┆ rank   │
│ ---      ┆ ---   ┆ ---    │
│ Utf8     ┆ Int64 ┆ UInt64 │
╞══════════╪═══════╪════════╡
│ A        ┆ 7     ┆ 1      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ A        ┆ 3     ┆ 2      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ A        ┆ 3     ┆ 2      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ A        ┆ 1     ┆ 4      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B        ┆ 7     ┆ 1      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B        ┆ 7     ┆ 1      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B        ┆ 4     ┆ 3      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B        ┆ 4     ┆ 3      │
╰──────────┴───────┴────────╯

(Showing first 8 rows)

Returns:

Name Type Description
Expression Expression

An expression that returns the rank of the current row.
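
The output above shows that tied values share a rank and that the following rank is skipped (1, 2, 2, 4). A plain-Python model of that behaviour for a single partition is sketched below; `ranks` is a hypothetical helper written for illustration, not Daft's implementation.

```python
def ranks(values, desc=False):
    """Rank with gaps: ties share the position of their first occurrence."""
    ordered = sorted(values, reverse=desc)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        # Keep only the first (smallest) position seen for each value.
        first_position.setdefault(value, position)
    return [first_position[value] for value in values]


# The two partitions from the example above, in input order, ranked descending.
print(ranks([1, 3, 3, 7], desc=True))  # category A
print(ranks([7, 7, 4, 4], desc=True))  # category B
```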

Source code in daft/functions/functions.py
def rank() -> Expression:
    """Return the rank of the current row (used for window functions).

    Examples:
        >>> import daft
        >>> from daft.window import Window
        >>> from daft.functions import rank
        >>> df = daft.from_pydict({"category": ["A", "A", "A", "A", "B", "B", "B", "B"], "value": [1, 3, 3, 7, 7, 7, 4, 4]})
        >>>
        >>> window = Window().partition_by("category").order_by("value", desc=True)
        >>> df = df.with_column("rank", rank().over(window))
        >>> df = df.sort("category")
        >>> df.show()
        ╭──────────┬───────┬────────╮
        │ category ┆ value ┆ rank   │
        │ ---      ┆ ---   ┆ ---    │
        │ Utf8     ┆ Int64 ┆ UInt64 │
        ╞══════════╪═══════╪════════╡
        │ A        ┆ 7     ┆ 1      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ A        ┆ 3     ┆ 2      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ A        ┆ 3     ┆ 2      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ A        ┆ 1     ┆ 4      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ B        ┆ 7     ┆ 1      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ B        ┆ 7     ┆ 1      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ B        ┆ 4     ┆ 3      │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ B        ┆ 4     ┆ 3      │
        ╰──────────┴───────┴────────╯
        <BLANKLINE>
        (Showing first 8 rows)

    Returns:
        Expression: An expression that returns the rank of the current row.
    """
    return Expression._from_pyexpr(native.rank())

dense_rank #

dense_rank() -> Expression

Return the dense rank of the current row (used for window functions).

The dense rank is the rank of the current row without gaps: tied rows share a rank, and the next distinct value receives the next consecutive rank.

Examples:

>>> import daft
>>> from daft.window import Window
>>> from daft.functions import dense_rank
>>> df = daft.from_pydict({"category": ["A", "A", "A", "A", "B", "B", "B", "B"], "value": [1, 3, 3, 7, 7, 7, 4, 4]})
>>>
>>> window = Window().partition_by("category").order_by("value", desc=True)
>>> df = df.with_column("dense_rank", dense_rank().over(window))
>>> df = df.sort("category")
>>> df.show()
╭──────────┬───────┬────────────╮
│ category ┆ value ┆ dense_rank │
│ ---      ┆ ---   ┆ ---        │
│ Utf8     ┆ Int64 ┆ UInt64     │
╞══════════╪═══════╪════════════╡
│ A        ┆ 7     ┆ 1          │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A        ┆ 3     ┆ 2          │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A        ┆ 3     ┆ 2          │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A        ┆ 1     ┆ 3          │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B        ┆ 7     ┆ 1          │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B        ┆ 7     ┆ 1          │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B        ┆ 4     ┆ 2          │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B        ┆ 4     ┆ 2          │
╰──────────┴───────┴────────────╯

(Showing first 8 rows)
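
The gap-free numbering in the output above can be sketched in plain Python. This is an illustration of the dense-rank rule for a single partition sorted in descending order, not Daft's implementation; the helper name `dense_rank_desc` is hypothetical.

```python
def dense_rank_desc(values):
    """Dense rank in descending order: ties share a rank and no rank is skipped."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for index, value in enumerate(ordered):
        if index == 0:
            ranks.append(1)              # the first row always has rank 1
        elif value == ordered[index - 1]:
            ranks.append(ranks[-1])      # tie: reuse the previous row's rank
        else:
            ranks.append(ranks[-1] + 1)  # new value: next consecutive rank
    return list(zip(ordered, ranks))


# The two partitions from the example above (hypothetical variable names).
dense_a = dense_rank_desc([1, 3, 3, 7])
dense_b = dense_rank_desc([7, 7, 4, 4])
print(dense_a)
print(dense_b)
```

Here the value 1 in category A gets rank 3, whereas a gap-leaving rank would assign it 4.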

Returns:

Name        Type        Description
Expression  Expression  An expression that returns the dense rank of the current row.

Source code in daft/functions/functions.py
def dense_rank() -> Expression:
    """Return the dense rank of the current row (used for window functions).

    The dense rank is the rank of the current row without gaps.

    Examples:
        >>> import daft
        >>> from daft.window import Window
        >>> from daft.functions import dense_rank
        >>> df = daft.from_pydict({"category": ["A", "A", "A", "A", "B", "B", "B", "B"], "value": [1, 3, 3, 7, 7, 7, 4, 4]})
        >>>
        >>> window = Window().partition_by("category").order_by("value", desc=True)
        >>> df = df.with_column("dense_rank", dense_rank().over(window))
        >>> df = df.sort("category")
        >>> df.show()
        ╭──────────┬───────┬────────────╮
        │ category ┆ value ┆ dense_rank │
        │ ---      ┆ ---   ┆ ---        │
        │ Utf8     ┆ Int64 ┆ UInt64     │
        ╞══════════╪═══════╪════════════╡
        │ A        ┆ 7     ┆ 1          │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ A        ┆ 3     ┆ 2          │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ A        ┆ 3     ┆ 2          │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ A        ┆ 1     ┆ 3          │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B        ┆ 7     ┆ 1          │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B        ┆ 7     ┆ 1          │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B        ┆ 4     ┆ 2          │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B        ┆ 4     ┆ 2          │
        ╰──────────┴───────┴────────────╯
        <BLANKLINE>
        (Showing first 8 rows)

    Returns:
        Expression: An expression that returns the dense rank of the current row.
    """
    return Expression._from_pyexpr(native.dense_rank())

Lead/Lag Functions#

These functions access data from preceding or succeeding rows within a window partition. The window specification must include an order_by clause and must not include a rows_between or range_between clause.

lag #

lag(
    offset: int = 1, default: Any | None = None
) -> Expression

Get the value from a previous row within a window partition.

Parameters:

Name     Type        Description                                                            Default
offset   int         The number of rows to shift backward. Must be >= 0.                    1
default  Any | None  Value to use when no previous row exists. Can be a column reference.   None

Returns:

Name        Type        Description
Expression  Expression  Value from the row `offset` positions before the current row.

Examples:

>>> import daft
>>> from daft import Window, col
>>> df = daft.from_pydict(
...     {
...         "category": ["A", "A", "A", "B", "B", "B"],
...         "value": [1, 2, 3, 4, 5, 6],
...         "default_val": [10, 20, 30, 40, 50, 60],
...     }
... )
>>>
>>> # Simple lag with null default
>>> window = Window().partition_by("category").order_by("value")
>>> df = df.with_column("lagged", col("value").lag(1).over(window))
>>>
>>> # Lag with column reference as default
>>> df = df.with_column("lagged_with_default", col("value").lag(1, default=col("default_val")).over(window))
>>> df.sort(["category", "value"]).show()
╭──────────┬───────┬─────────────┬────────┬─────────────────────╮
│ category ┆ value ┆ default_val ┆ lagged ┆ lagged_with_default │
│ ---      ┆ ---   ┆ ---         ┆ ---    ┆ ---                 │
│ Utf8     ┆ Int64 ┆ Int64       ┆ Int64  ┆ Int64               │
╞══════════╪═══════╪═════════════╪════════╪═════════════════════╡
│ A        ┆ 1     ┆ 10          ┆ None   ┆ 10                  │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A        ┆ 2     ┆ 20          ┆ 1      ┆ 1                   │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A        ┆ 3     ┆ 30          ┆ 2      ┆ 2                   │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B        ┆ 4     ┆ 40          ┆ None   ┆ 40                  │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B        ┆ 5     ┆ 50          ┆ 4      ┆ 4                   │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B        ┆ 6     ┆ 60          ┆ 5      ┆ 5                   │
╰──────────┴───────┴─────────────┴────────┴─────────────────────╯

(Showing first 6 of 6 rows)
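
To make the partition-then-order behaviour in the example above concrete, here is a plain-Python sketch that reproduces both lagged columns. It is an illustration under simple assumptions (rows held as dicts, partitioning by "category", ordering by "value"), not Daft's implementation; `lag_by_partition` and `default_key` are hypothetical names.

```python
from collections import defaultdict

rows = [
    {"category": "A", "value": 1, "default_val": 10},
    {"category": "A", "value": 2, "default_val": 20},
    {"category": "A", "value": 3, "default_val": 30},
    {"category": "B", "value": 4, "default_val": 40},
    {"category": "B", "value": 5, "default_val": 50},
    {"category": "B", "value": 6, "default_val": 60},
]


def lag_by_partition(rows, offset=1, default_key=None):
    """Return the lagged "value" for each row, per category, ordered by value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["category"]].append(row)

    lagged = []
    for category in sorted(partitions):
        ordered = sorted(partitions[category], key=lambda r: r["value"])
        for index, row in enumerate(ordered):
            if index - offset >= 0:
                # A previous row exists inside this partition.
                lagged.append(ordered[index - offset]["value"])
            elif default_key is not None:
                # No previous row: fall back to the current row's default column.
                lagged.append(row[default_key])
            else:
                lagged.append(None)
    return lagged


lagged = lag_by_partition(rows)
lagged_with_default = lag_by_partition(rows, default_key="default_val")
print(lagged)
print(lagged_with_default)
```

The lookup never crosses a partition boundary, which is why the first row of category B falls back to the default instead of reading the last row of category A.
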
Source code in daft/expressions/expressions.py
@ExpressionPublicAPI
def lag(self, offset: int = 1, default: Any | None = None) -> Expression:
    """Get the value from a previous row within a window partition.

    Args:
        offset: The number of rows to shift backward. Must be >= 0.
        default: Value to use when no previous row exists. Can be a column reference.

    Returns:
        Expression: Value from the row `offset` positions before the current row.

    Examples:
        >>> import daft
        >>> from daft import Window, col
        >>> df = daft.from_pydict(
        ...     {
        ...         "category": ["A", "A", "A", "B", "B", "B"],
        ...         "value": [1, 2, 3, 4, 5, 6],
        ...         "default_val": [10, 20, 30, 40, 50, 60],
        ...     }
        ... )
        >>>
        >>> # Simple lag with null default
        >>> window = Window().partition_by("category").order_by("value")
        >>> df = df.with_column("lagged", col("value").lag(1).over(window))
        >>>
        >>> # Lag with column reference as default
        >>> df = df.with_column("lagged_with_default", col("value").lag(1, default=col("default_val")).over(window))
        >>> df.sort(["category", "value"]).show()
        ╭──────────┬───────┬─────────────┬────────┬─────────────────────╮
        │ category ┆ value ┆ default_val ┆ lagged ┆ lagged_with_default │
        │ ---      ┆ ---   ┆ ---         ┆ ---    ┆ ---                 │
        │ Utf8     ┆ Int64 ┆ Int64       ┆ Int64  ┆ Int64               │
        ╞══════════╪═══════╪═════════════╪════════╪═════════════════════╡
        │ A        ┆ 1     ┆ 10          ┆ None   ┆ 10                  │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ A        ┆ 2     ┆ 20          ┆ 1      ┆ 1                   │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ A        ┆ 3     ┆ 30          ┆ 2      ┆ 2                   │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B        ┆ 4     ┆ 40          ┆ None   ┆ 40                  │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B        ┆ 5     ┆ 50          ┆ 4      ┆ 4                   │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B        ┆ 6     ┆ 60          ┆ 5      ┆ 5                   │
        ╰──────────┴───────┴─────────────┴────────┴─────────────────────╯
        <BLANKLINE>
        (Showing first 6 of 6 rows)
    """
    if default is not None:
        default = Expression._to_expression(default)
    expr = self._expr.offset(-offset, default._expr if default is not None else None)
    return Expression._from_pyexpr(expr)

lead #

lead(
    offset: int = 1, default: Any | None = None
) -> Expression

Get the value from a future row within a window partition.

Parameters:

Name     Type        Description                                                          Default
offset   int         The number of rows to shift forward. Must be >= 0.                   1
default  Any | None  Value to use when no future row exists. Can be a column reference.   None

Returns:

Name        Type        Description
Expression  Expression  Value from the row `offset` positions after the current row.

Examples:

>>> import daft
>>> from daft import Window, col
>>> df = daft.from_pydict(
...     {
...         "category": ["A", "A", "A", "B", "B", "B"],
...         "value": [1, 2, 3, 4, 5, 6],
...         "default_val": [10, 20, 30, 40, 50, 60],
...     }
... )
>>>
>>> # Simple lead with null default
>>> window = Window().partition_by("category").order_by("value")
>>> df = df.with_column("lead", col("value").lead(1).over(window))
>>>
>>> # Lead with column reference as default
>>> df = df.with_column("lead_with_default", col("value").lead(1, default=col("default_val")).over(window))
>>> df.sort(["category", "value"]).show()
╭──────────┬───────┬─────────────┬───────┬───────────────────╮
│ category ┆ value ┆ default_val ┆ lead  ┆ lead_with_default │
│ ---      ┆ ---   ┆ ---         ┆ ---   ┆ ---               │
│ Utf8     ┆ Int64 ┆ Int64       ┆ Int64 ┆ Int64             │
╞══════════╪═══════╪═════════════╪═══════╪═══════════════════╡
│ A        ┆ 1     ┆ 10          ┆ 2     ┆ 2                 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A        ┆ 2     ┆ 20          ┆ 3     ┆ 3                 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A        ┆ 3     ┆ 30          ┆ None  ┆ 30                │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B        ┆ 4     ┆ 40          ┆ 5     ┆ 5                 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B        ┆ 5     ┆ 50          ┆ 6     ┆ 6                 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B        ┆ 6     ┆ 60          ┆ None  ┆ 60                │
╰──────────┴───────┴─────────────┴───────┴───────────────────╯

(Showing first 6 of 6 rows)
Source code in daft/expressions/expressions.py
@ExpressionPublicAPI
def lead(self, offset: int = 1, default: Any | None = None) -> Expression:
    """Get the value from a future row within a window partition.

    Args:
        offset: The number of rows to shift forward. Must be >= 0.
        default: Value to use when no future row exists. Can be a column reference.

    Returns:
        Expression: Value from the row `offset` positions after the current row.

    Examples:
        >>> import daft
        >>> from daft import Window, col
        >>> df = daft.from_pydict(
        ...     {
        ...         "category": ["A", "A", "A", "B", "B", "B"],
        ...         "value": [1, 2, 3, 4, 5, 6],
        ...         "default_val": [10, 20, 30, 40, 50, 60],
        ...     }
        ... )
        >>>
        >>> # Simple lead with null default
        >>> window = Window().partition_by("category").order_by("value")
        >>> df = df.with_column("lead", col("value").lead(1).over(window))
        >>>
        >>> # Lead with column reference as default
        >>> df = df.with_column("lead_with_default", col("value").lead(1, default=col("default_val")).over(window))
        >>> df.sort(["category", "value"]).show()
        ╭──────────┬───────┬─────────────┬───────┬───────────────────╮
        │ category ┆ value ┆ default_val ┆ lead  ┆ lead_with_default │
        │ ---      ┆ ---   ┆ ---         ┆ ---   ┆ ---               │
        │ Utf8     ┆ Int64 ┆ Int64       ┆ Int64 ┆ Int64             │
        ╞══════════╪═══════╪═════════════╪═══════╪═══════════════════╡
        │ A        ┆ 1     ┆ 10          ┆ 2     ┆ 2                 │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ A        ┆ 2     ┆ 20          ┆ 3     ┆ 3                 │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ A        ┆ 3     ┆ 30          ┆ None  ┆ 30                │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B        ┆ 4     ┆ 40          ┆ 5     ┆ 5                 │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B        ┆ 5     ┆ 50          ┆ 6     ┆ 6                 │
        ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ B        ┆ 6     ┆ 60          ┆ None  ┆ 60                │
        ╰──────────┴───────┴─────────────┴───────┴───────────────────╯
        <BLANKLINE>
        (Showing first 6 of 6 rows)
    """
    if default is not None:
        default = Expression._to_expression(default)
    expr = self._expr.offset(offset, default._expr if default is not None else None)
    return Expression._from_pyexpr(expr)
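
In the source listings, lag passes -offset and lead passes offset to the same underlying offset call. The plain-Python sketch below illustrates that sign convention for a single, already ordered partition. It assumes that an out-of-range position falls back to the default, as the example outputs suggest; `shift`, `lag_values`, and `lead_values` are hypothetical names, not Daft APIs.

```python
def shift(values, offset, default=None):
    """For each index i, return values[i + offset], or default when out of range."""
    size = len(values)
    return [
        values[i + offset] if 0 <= i + offset < size else default
        for i in range(size)
    ]


def lag_values(values, n=1, default=None):
    # Looking backward is a negative offset.
    return shift(values, -n, default)


def lead_values(values, n=1, default=None):
    # Looking forward is a positive offset.
    return shift(values, n, default)


partition = [1, 2, 3]  # category "A" from the examples, ordered by value
lagged = lag_values(partition)
led = lead_values(partition)
print(lagged)
print(led)
```

With n = 0 both helpers return the input unchanged, which is consistent with the documented constraint that offset must be >= 0.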