Skip to content

Scalar Functions#

Daft provides a set of built-in operations that can be applied to DataFrame columns. This page provides an overview of all the functions provided by Daft. Learn more about scalar or column functions in Daft User Guide.

functions #

Functions:

Name Description
columns_avg

Average values across columns. Akin to columns_mean.

columns_max

Find the maximum value across columns.

columns_mean

Average values across columns. Akin to columns_avg.

columns_min

Find the minimum value across columns.

columns_sum

Sum values across columns.

format

Format a string using the given arguments.

monotonically_increasing_id

Generates a column of monotonically increasing unique ids.

columns_avg #

columns_avg(*exprs: Expression | str) -> Expression

Average values across columns. Akin to columns_mean.

Parameters:

Name Type Description Default
exprs Expression | str

The columns to average across.

()

Examples:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
>>> import daft
>>> from daft.functions import columns_avg
>>> df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df = df.with_column("avg", columns_avg("a", "b"))
>>> df.show()
╭───────┬───────┬─────────╮
│ a     ┆ b     ┆ avg     │
│ ---   ┆ ---   ┆ ---     │
│ Int64 ┆ Int64 ┆ Float64 │
╞═══════╪═══════╪═════════╡
│ 1     ┆ 4     ┆ 2.5     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2     ┆ 5     ┆ 3.5     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 3     ┆ 6     ┆ 4.5     │
╰───────┴───────┴─────────╯

(Showing first 3 of 3 rows)
Source code in daft/functions/functions.py
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
def columns_avg(*exprs: Expression | str) -> Expression:
    """Average values across columns. Akin to `columns_mean`.

    Args:
        exprs: The columns to average across.

    Examples:
        >>> import daft
        >>> from daft.functions import columns_avg
        >>> df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
        >>> df = df.with_column("avg", columns_avg("a", "b"))
        >>> df.show()
        ╭───────┬───────┬─────────╮
        │ a     ┆ b     ┆ avg     │
        │ ---   ┆ ---   ┆ ---     │
        │ Int64 ┆ Int64 ┆ Float64 │
        ╞═══════╪═══════╪═════════╡
        │ 1     ┆ 4     ┆ 2.5     │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
        │ 2     ┆ 5     ┆ 3.5     │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
        │ 3     ┆ 6     ┆ 4.5     │
        ╰───────┴───────┴─────────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)
    """
    if not exprs:
        raise ValueError("columns_avg requires at least one expression")
    return list_(*exprs).list.mean().alias("columns_avg")

columns_max #

columns_max(*exprs: Expression | str) -> Expression

Find the maximum value across columns.

Parameters:

Name Type Description Default
exprs Expression | str

The columns to find the maximum of.

()

Examples:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
>>> import daft
>>> from daft.functions import columns_max
>>> df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df = df.with_column("max", columns_max("a", "b"))
>>> df.show()
╭───────┬───────┬───────╮
│ a     ┆ b     ┆ max   │
│ ---   ┆ ---   ┆ ---   │
│ Int64 ┆ Int64 ┆ Int64 │
╞═══════╪═══════╪═══════╡
│ 1     ┆ 4     ┆ 4     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2     ┆ 5     ┆ 5     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3     ┆ 6     ┆ 6     │
╰───────┴───────┴───────╯

(Showing first 3 of 3 rows)
Source code in daft/functions/functions.py
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
def columns_max(*exprs: Expression | str) -> Expression:
    """Find the maximum value across columns.

    Args:
        exprs: The columns to find the maximum of.

    Examples:
        >>> import daft
        >>> from daft.functions import columns_max
        >>> df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
        >>> df = df.with_column("max", columns_max("a", "b"))
        >>> df.show()
        ╭───────┬───────┬───────╮
        │ a     ┆ b     ┆ max   │
        │ ---   ┆ ---   ┆ ---   │
        │ Int64 ┆ Int64 ┆ Int64 │
        ╞═══════╪═══════╪═══════╡
        │ 1     ┆ 4     ┆ 4     │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
        │ 2     ┆ 5     ┆ 5     │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
        │ 3     ┆ 6     ┆ 6     │
        ╰───────┴───────┴───────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)
    """
    if not exprs:
        raise ValueError("columns_max requires at least one expression")
    return list_(*exprs).list.max().alias("columns_max")

columns_mean #

columns_mean(*exprs: Expression | str) -> Expression

Average values across columns. Akin to columns_avg.

Parameters:

Name Type Description Default
exprs Expression | str

The columns to average.

()

Examples:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
>>> import daft
>>> from daft.functions import columns_mean
>>> df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df = df.with_column("mean", columns_mean("a", "b"))
>>> df.show()
╭───────┬───────┬─────────╮
│ a     ┆ b     ┆ mean    │
│ ---   ┆ ---   ┆ ---     │
│ Int64 ┆ Int64 ┆ Float64 │
╞═══════╪═══════╪═════════╡
│ 1     ┆ 4     ┆ 2.5     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2     ┆ 5     ┆ 3.5     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 3     ┆ 6     ┆ 4.5     │
╰───────┴───────┴─────────╯

(Showing first 3 of 3 rows)
Source code in daft/functions/functions.py
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
def columns_mean(*exprs: Expression | str) -> Expression:
    """Average values across columns. Akin to `columns_avg`.

    Args:
        exprs: The columns to average.

    Examples:
        >>> import daft
        >>> from daft.functions import columns_mean
        >>> df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
        >>> df = df.with_column("mean", columns_mean("a", "b"))
        >>> df.show()
        ╭───────┬───────┬─────────╮
        │ a     ┆ b     ┆ mean    │
        │ ---   ┆ ---   ┆ ---     │
        │ Int64 ┆ Int64 ┆ Float64 │
        ╞═══════╪═══════╪═════════╡
        │ 1     ┆ 4     ┆ 2.5     │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
        │ 2     ┆ 5     ┆ 3.5     │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
        │ 3     ┆ 6     ┆ 4.5     │
        ╰───────┴───────┴─────────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)
    """
    if not exprs:
        raise ValueError("columns_mean requires at least one expression")
    return list_(*exprs).list.mean().alias("columns_mean")

columns_min #

columns_min(*exprs: Expression | str) -> Expression

Find the minimum value across columns.

Parameters:

Name Type Description Default
exprs Expression | str

The columns to find the minimum of.

()

Examples:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
>>> import daft
>>> from daft.functions import columns_min
>>> df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df = df.with_column("min", columns_min("a", "b"))
>>> df.show()
╭───────┬───────┬───────╮
│ a     ┆ b     ┆ min   │
│ ---   ┆ ---   ┆ ---   │
│ Int64 ┆ Int64 ┆ Int64 │
╞═══════╪═══════╪═══════╡
│ 1     ┆ 4     ┆ 1     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2     ┆ 5     ┆ 2     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3     ┆ 6     ┆ 3     │
╰───────┴───────┴───────╯

(Showing first 3 of 3 rows)
Source code in daft/functions/functions.py
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
def columns_min(*exprs: Expression | str) -> Expression:
    """Find the minimum value across columns.

    Args:
        exprs: The columns to find the minimum of.

    Examples:
        >>> import daft
        >>> from daft.functions import columns_min
        >>> df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
        >>> df = df.with_column("min", columns_min("a", "b"))
        >>> df.show()
        ╭───────┬───────┬───────╮
        │ a     ┆ b     ┆ min   │
        │ ---   ┆ ---   ┆ ---   │
        │ Int64 ┆ Int64 ┆ Int64 │
        ╞═══════╪═══════╪═══════╡
        │ 1     ┆ 4     ┆ 1     │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
        │ 2     ┆ 5     ┆ 2     │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
        │ 3     ┆ 6     ┆ 3     │
        ╰───────┴───────┴───────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)
    """
    if not exprs:
        raise ValueError("columns_min requires at least one expression")
    return list_(*exprs).list.min().alias("columns_min")

columns_sum #

columns_sum(*exprs: Expression | str) -> Expression

Sum values across columns.

Parameters:

Name Type Description Default
exprs Expression | str

The columns to sum.

()

Examples:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
>>> import daft
>>> from daft.functions import columns_sum
>>> df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df = df.with_column("sum", columns_sum("a", "b"))
>>> df.show()
╭───────┬───────┬───────╮
│ a     ┆ b     ┆ sum   │
│ ---   ┆ ---   ┆ ---   │
│ Int64 ┆ Int64 ┆ Int64 │
╞═══════╪═══════╪═══════╡
│ 1     ┆ 4     ┆ 5     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2     ┆ 5     ┆ 7     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3     ┆ 6     ┆ 9     │
╰───────┴───────┴───────╯

(Showing first 3 of 3 rows)
Source code in daft/functions/functions.py
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
def columns_sum(*exprs: Expression | str) -> Expression:
    """Sum values across columns.

    Args:
        exprs: The columns to sum.

    Examples:
        >>> import daft
        >>> from daft.functions import columns_sum
        >>> df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
        >>> df = df.with_column("sum", columns_sum("a", "b"))
        >>> df.show()
        ╭───────┬───────┬───────╮
        │ a     ┆ b     ┆ sum   │
        │ ---   ┆ ---   ┆ ---   │
        │ Int64 ┆ Int64 ┆ Int64 │
        ╞═══════╪═══════╪═══════╡
        │ 1     ┆ 4     ┆ 5     │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
        │ 2     ┆ 5     ┆ 7     │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
        │ 3     ┆ 6     ┆ 9     │
        ╰───────┴───────┴───────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)
    """
    if not exprs:
        raise ValueError("columns_sum requires at least one expression")
    return list_(*exprs).list.sum().alias("columns_sum")

format #

format(
    f_string: str, *args: Expression | str
) -> Expression

Format a string using the given arguments.

Parameters:

Name Type Description Default
f_string str

The format string.

required
*args Expression | str

The arguments to format the string with.

()

Returns:

Name Type Description
Expression Expression

A string expression with the formatted result.

Examples:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
>>> import daft
>>> from daft.functions import format
>>> from daft import col
>>> df = daft.from_pydict({"first_name": ["Alice", "Bob"], "last_name": ["Smith", "Jones"]})
>>> df = df.with_column("greeting", format("Hello {} {}", col("first_name"), "last_name"))
>>> df.show()
╭────────────┬───────────┬───────────────────╮
│ first_name ┆ last_name ┆ greeting          │
│ ---        ┆ ---       ┆ ---               │
│ Utf8       ┆ Utf8      ┆ Utf8              │
╞════════════╪═══════════╪═══════════════════╡
│ Alice      ┆ Smith     ┆ Hello Alice Smith │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Bob        ┆ Jones     ┆ Hello Bob Jones   │
╰────────────┴───────────┴───────────────────╯

(Showing first 2 of 2 rows)
Source code in daft/functions/functions.py
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
def format(f_string: str, *args: Expression | str) -> Expression:
    """Format a string using the given arguments.

    Args:
        f_string: The format string.
        *args: The arguments to format the string with.

    Returns:
        Expression: A string expression with the formatted result.

    Examples:
        >>> import daft
        >>> from daft.functions import format
        >>> from daft import col
        >>> df = daft.from_pydict({"first_name": ["Alice", "Bob"], "last_name": ["Smith", "Jones"]})
        >>> df = df.with_column("greeting", format("Hello {} {}", col("first_name"), "last_name"))
        >>> df.show()
        ╭────────────┬───────────┬───────────────────╮
        │ first_name ┆ last_name ┆ greeting          │
        │ ---        ┆ ---       ┆ ---               │
        │ Utf8       ┆ Utf8      ┆ Utf8              │
        ╞════════════╪═══════════╪═══════════════════╡
        │ Alice      ┆ Smith     ┆ Hello Alice Smith │
        ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ Bob        ┆ Jones     ┆ Hello Bob Jones   │
        ╰────────────┴───────────┴───────────────────╯
        <BLANKLINE>
        (Showing first 2 of 2 rows)
    """
    if f_string.count("{}") != len(args):
        raise ValueError(
            f"Format string {f_string} has {f_string.count('{}')} placeholders but {len(args)} arguments were provided"
        )

    parts = f_string.split("{}")
    exprs = []

    for part, arg in zip(parts, args):
        if part:
            exprs.append(lit(part))

        if isinstance(arg, str):
            exprs.append(col(arg))
        else:
            exprs.append(arg)

    if parts[-1]:
        exprs.append(lit(parts[-1]))

    if not exprs:
        return lit("")

    result = exprs[0]
    for expr in exprs[1:]:
        result = result + expr

    return result

monotonically_increasing_id #

monotonically_increasing_id() -> Expression

Generates a column of monotonically increasing unique ids.

The implementation puts the partition number in the upper 28 bits, and the row number in each partition in the lower 36 bits. This allows for 2^28 ≈ 268 million partitions and 2^40 ≈ 68 billion rows per partition.

Returns:

Name Type Description
Expression Expression

An expression that generates monotonically increasing IDs

Examples:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
>>> import daft
>>> from daft.functions import monotonically_increasing_id
>>> daft.context.set_runner_ray()
>>>
>>> df = daft.from_pydict({"a": [1, 2, 3, 4]}).into_partitions(2)
>>> df = df.with_column("id", monotonically_increasing_id())
>>> df.show()
╭───────┬─────────────╮
│ a     ┆ id          │
│ ---   ┆ ---         │
│ Int64 ┆ UInt64      │
╞═══════╪═════════════╡
│ 1     ┆ 0           │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2     ┆ 1           │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3     ┆ 68719476736 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4     ┆ 68719476737 │
╰───────┴─────────────╯

(Showing first 4 of 4 rows)
Source code in daft/functions/functions.py
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
def monotonically_increasing_id() -> Expression:
    """Generates a column of monotonically increasing unique ids.

    The implementation puts the partition number in the upper 28 bits, and the row number in each partition
    in the lower 36 bits. This allows for 2^28 ≈ 268 million partitions and 2^40 ≈ 68 billion rows per partition.

    Returns:
        Expression: An expression that generates monotonically increasing IDs

    Examples:
        >>> import daft
        >>> from daft.functions import monotonically_increasing_id
        >>> daft.context.set_runner_ray()  # doctest: +SKIP
        >>>
        >>> df = daft.from_pydict({"a": [1, 2, 3, 4]}).into_partitions(2)
        >>> df = df.with_column("id", monotonically_increasing_id())
        >>> df.show()  # doctest: +SKIP
        ╭───────┬─────────────╮
        │ a     ┆ id          │
        │ ---   ┆ ---         │
        │ Int64 ┆ UInt64      │
        ╞═══════╪═════════════╡
        │ 1     ┆ 0           │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ 2     ┆ 1           │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ 3     ┆ 68719476736 │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ 4     ┆ 68719476737 │
        ╰───────┴─────────────╯
        <BLANKLINE>
        (Showing first 4 of 4 rows)

    """
    f = native.get_function_from_registry("monotonically_increasing_id")
    return Expression._from_pyexpr(f())

llm_generate #

Functions:

Name Description
llm_generate

A UDF for running LLM inference over an input column of strings.

llm_generate #

llm_generate(
    input_column: Expression,
    model: str = "facebook/opt-125m",
    provider: Literal["vllm"] = "vllm",
    concurrency: int = 1,
    batch_size: int = 1024,
    num_cpus: int | None = None,
    num_gpus: int | None = None,
    **generation_config: dict[str, Any],
) -> Expression

A UDF for running LLM inference over an input column of strings.

This UDF provides a flexible interface for text generation using various LLM providers. By default, it uses vLLM for efficient local inference.

Parameters:

Name Type Description Default
model str

str, default="facebook/opt-125m" The model identifier to use for generation

'facebook/opt-125m'
provider Literal['vllm']

str, default="vllm" The LLM provider to use for generation. Supported values: "vllm"

'vllm'
concurrency int

int, default=1 The number of concurrent instances of the model to run

1
batch_size int

int, default=1024 The batch size for the UDF

1024
num_cpus int | None

float, default=None The number of CPUs to use for the UDF

None
num_gpus int | None

float, default=None The number of GPUs to use for the UDF

None
generation_config dict[str, Any]

dict, default={} Configuration parameters for text generation (e.g., temperature, max_tokens)

{}

Examples:

1
2
3
4
5
6
>>> import daft
>>> from daft import col
>>> from daft.functions import llm_generate
>>> df = daft.read_csv("prompts.csv")
>>> df = df.with_column("response", llm_generate(col("prompt"), model="facebook/opt-125m"))
>>> df.collect()
Note

Make sure the required provider packages are installed (e.g. vllm, transformers).

Source code in daft/functions/llm_generate.py
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
def llm_generate(
    input_column: Expression,
    model: str = "facebook/opt-125m",
    provider: Literal["vllm"] = "vllm",  # vllm is the only supported provider for now
    concurrency: int = 1,
    batch_size: int = 1024,
    num_cpus: int | None = None,
    num_gpus: int | None = None,
    **generation_config: dict[str, Any],
) -> Expression:
    """A UDF for running LLM inference over an input column of strings.

    This UDF provides a flexible interface for text generation using various LLM providers.
    By default, it uses vLLM for efficient local inference.

    Args:
        model: str, default="facebook/opt-125m"
            The model identifier to use for generation
        provider: str, default="vllm"
            The LLM provider to use for generation. Supported values: "vllm"
        concurrency: int, default=1
            The number of concurrent instances of the model to run
        batch_size: int, default=1024
            The batch size for the UDF
        num_cpus: float, default=None
            The number of CPUs to use for the UDF
        num_gpus: float, default=None
            The number of GPUs to use for the UDF
        generation_config: dict, default={}
            Configuration parameters for text generation (e.g., temperature, max_tokens)

    Examples:
        >>> import daft
        >>> from daft import col
        >>> from daft.functions import llm_generate
        >>> df = daft.read_csv("prompts.csv")
        >>> df = df.with_column("response", llm_generate(col("prompt"), model="facebook/opt-125m"))
        >>> df.collect()

    Note:
        Make sure the required provider packages are installed (e.g. vllm, transformers).
    """
    if provider == "vllm":
        cls = _vLLMGenerator
    else:
        raise ValueError(f"Unsupported provider: {provider}")

    llm_generator = udf(
        return_dtype=DataType.string(),
        batch_size=batch_size,
        num_cpus=num_cpus,
        num_gpus=num_gpus,
        concurrency=concurrency,
    )(cls).with_init_args(
        model=model,
        generation_config=generation_config,
    )

    return llm_generator(input_column)