DataTypes#

Daft provides the simple DataTypes that are ubiquitous in DataFrames, such as numbers, strings, and dates, as well as more complex types like tensors and images. Learn more about DataTypes in the Daft User Guide.

DataType #

DataType()

A Daft DataType defines the type of all the values in an Expression or DataFrame column.
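
For example, a DataType is typically produced by one of the creator methods below and attached to a column by casting an Expression. A minimal sketch (daft.from_pydict, daft.col, DataFrame.with_column and Expression.cast are assumed to behave as in current Daft releases):

>>> import daft
>>> from daft import DataType, col
>>> df = daft.from_pydict({"a": [1, 2, 3]})
>>> # Cast the inferred integer column to float64 using a DataType constructor.
>>> df = df.with_column("a_float", col("a").cast(DataType.float64()))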

Methods:

Name Description
binary

Create a Binary DataType: A string of bytes.

bool

Create the Boolean DataType: Either True or False.

date

Create a Date DataType: A date with a year, month and day.

decimal128

Fixed-precision decimal.

duration

Duration DataType.

embedding

Create an Embedding DataType: embeddings are fixed size arrays, where each element in the array has a numeric dtype and each array has a fixed length of size.

extension
fixed_size_binary

Create a FixedSizeBinary DataType: A fixed-size string of bytes.

fixed_size_list

Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type dtype and each list has length size.

float32

Create a 32-bit float DataType.

float64

Create a 64-bit float DataType.

from_arrow_type

Maps a PyArrow DataType to a Daft DataType.

from_numpy_dtype

Maps a Numpy datatype to a Daft DataType.

image

Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.

int16

Create a 16-bit integer DataType.

int32

Create a 32-bit integer DataType.

int64

Create a 64-bit integer DataType.

int8

Create an 8-bit integer DataType.

interval

Interval DataType.

is_binary

Check if this is a binary type.

is_boolean

Check if this is a boolean type.

is_date

Check if this is a date type.

is_decimal128

Check if this is a decimal128 type.

is_duration

Check if this is a duration type.

is_embedding

Check if this is an embedding type.

is_extension

Check if this is an extension type.

is_fixed_shape_image

Check if this is a fixed shape image type.

is_fixed_shape_sparse_tensor

Check if this is a fixed shape sparse tensor type.

is_fixed_shape_tensor

Check if this is a fixed shape tensor type.

is_fixed_size_binary

Check if this is a fixed size binary type.

is_fixed_size_list

Check if this is a fixed size list type.

is_float32

Check if this is a 32-bit float type.

is_float64

Check if this is a 64-bit float type.

is_image

Check if this is an image type.

is_int16

Check if this is a 16-bit integer type.

is_int32

Check if this is a 32-bit integer type.

is_int64

Check if this is a 64-bit integer type.

is_int8

Check if this is an 8-bit integer type.

is_integer

Check if this is an integer type.

is_interval

Check if this is an interval type.

is_list

Check if this is a list type.

is_logical

Check if this is a logical type.

is_map

Check if this is a map type.

is_null

Check if this is a null type.

is_numeric

Check if this is a numeric type.

is_python

Check if this is a python object type.

is_sparse_tensor

Check if this is a sparse tensor type.

is_string

Check if this is a string type.

is_struct

Check if this is a struct type.

is_temporal

Check if this is a temporal type.

is_tensor

Check if this is a tensor type.

is_time

Check if this is a time type.

is_timestamp

Check if this is a timestamp type.

is_uint16

Check if this is an unsigned 16-bit integer type.

is_uint32

Check if this is an unsigned 32-bit integer type.

is_uint64

Check if this is an unsigned 64-bit integer type.

is_uint8

Check if this is an unsigned 8-bit integer type.

list

Create a List DataType: Variable-length list, where each element in the list has type dtype.

map

Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.

null

Creates the Null DataType: Always the Null value.

python

Create a Python DataType: a type which refers to an arbitrary Python object.

sparse_tensor

Create a SparseTensor DataType: SparseTensor arrays implemented as 'COO Sparse Tensor' representation of n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.

string

Create a String DataType: A string of UTF8 characters.

struct

Create a Struct DataType: a nested type which has names mapped to child types.

tensor

Create a tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.

time

Time DataType. Supported timeunits are "us", "ns".

timestamp

Timestamp DataType.

to_arrow_dtype
uint16

Create an unsigned 16-bit integer DataType.

uint32

Create an unsigned 32-bit integer DataType.

uint64

Create an unsigned 64-bit integer DataType.

uint8

Create an unsigned 8-bit integer DataType.

Attributes:

Name Type Description
dtype DataType

If the datatype contains an inner type, return the inner type, otherwise an attribute error is raised.

fields dict[str, DataType]

If this is a struct type, return the fields, otherwise an attribute error is raised.

image_mode ImageMode | None

If this is an image type, return the (optional) image mode, otherwise an attribute error is raised.

key_type DataType

If this is a map type, return the key type, otherwise an attribute error is raised.

precision int

If this is a decimal type, return the precision, otherwise an attribute error is raised.

scale int

If this is a decimal type, return the scale, otherwise an attribute error is raised.

shape tuple[int, ...]

If this is a fixed shape type, return the shape, otherwise an attribute error is raised.

size int

If this is a fixed size type, return the size, otherwise an attribute error is raised.

timeunit TimeUnit

If this is a time or timestamp type, return the timeunit, otherwise an attribute error is raised.

timezone str | None

If this is a timestamp type, return the timezone, otherwise an attribute error is raised.

use_offset_indices bool

If this is a sparse tensor type, return whether the indices are stored as offsets, otherwise an attribute error is raised.

value_type DataType

If this is a map type, return the value type, otherwise an attribute error is raised.

Source code in daft/datatype.py
def __init__(self) -> None:
    raise NotImplementedError(
        "We do not support creating a DataType via __init__ "
        "use a creator method like DataType.int32() or use DataType.from_arrow_type(pa_type)"
    )

dtype #

dtype: DataType

If the datatype contains an inner type, return the inner type, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.list(daft.DataType.int64())
>>> assert dtype.dtype == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.dtype
... except AttributeError:
...     pass

fields #

fields: dict[str, DataType]

If this is a struct type, return the fields, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
>>> fields = dtype.fields
>>> assert fields["a"] == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.fields
... except AttributeError:
...     pass

image_mode #

image_mode: ImageMode | None

If this is an image type, return the (optional) image mode, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.image(mode="RGB")
>>> assert dtype.image_mode == daft.ImageMode.RGB
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.image_mode
... except AttributeError:
...     pass

key_type #

key_type: DataType

If this is a map type, return the key type, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.key_type == daft.DataType.string()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.key_type
... except AttributeError:
...     pass

precision #

precision: int

If this is a decimal type, return the precision, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.precision == 10
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.precision
... except AttributeError:
...     pass

scale #

scale: int

If this is a decimal type, return the scale, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.scale == 2
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.scale
... except AttributeError:
...     pass

shape #

shape: tuple[int, ...]

If this is a fixed shape type, return the shape, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.shape == (2, 3)
>>> dtype = daft.DataType.tensor(daft.DataType.float32())
>>> try:
...     dtype.shape
... except AttributeError:
...     pass

size #

size: int

If this is a fixed size type, return the size, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.fixed_size_binary(size=10)
>>> assert dtype.size == 10
>>> dtype = daft.DataType.binary()
>>> try:
...     dtype.size
... except AttributeError:
...     pass

timeunit #

timeunit: TimeUnit

If this is a time or timestamp type, return the timeunit, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.time(timeunit="ns")
>>> dtype.timeunit
TimeUnit(ns)
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.timeunit
... except AttributeError:
...     pass

timezone #

timezone: str | None

If this is a timestamp type, return the timezone, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns", timezone="UTC")
>>> assert dtype.timezone == "UTC"
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.timezone
... except AttributeError:
...     pass

use_offset_indices #

use_offset_indices: bool

If this is a sparse tensor type, return whether the indices are stored as offsets, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), use_offset_indices=True)
>>> assert dtype.use_offset_indices
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.use_offset_indices
... except AttributeError:
...     pass

value_type #

value_type: DataType

If this is a map type, return the value type, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.value_type == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.value_type
... except AttributeError:
...     pass

binary #

binary() -> DataType

Create a Binary DataType: A string of bytes.

Source code in daft/datatype.py
@classmethod
def binary(cls) -> DataType:
    """Create a Binary DataType: A string of bytes."""
    return cls._from_pydatatype(PyDataType.binary())

bool #

bool() -> DataType

Create the Boolean DataType: Either True or False.

Source code in daft/datatype.py
@classmethod
def bool(cls) -> DataType:
    """Create the Boolean DataType: Either ``True`` or ``False``."""
    return cls._from_pydatatype(PyDataType.bool())

date #

date() -> DataType

Create a Date DataType: A date with a year, month and day.

Source code in daft/datatype.py
@classmethod
def date(cls) -> DataType:
    """Create a Date DataType: A date with a year, month and day."""
    return cls._from_pydatatype(PyDataType.date())

decimal128 #

decimal128(precision: int, scale: int) -> DataType

Fixed-precision decimal.

Source code in daft/datatype.py
@classmethod
def decimal128(cls, precision: int, scale: int) -> DataType:
    """Fixed-precision decimal."""
    return cls._from_pydatatype(PyDataType.decimal128(precision, scale))
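
For example (a small sketch using the precision and scale attributes documented above):

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.precision == 10 and dtype.scale == 2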

duration #

duration(timeunit: TimeUnit | str) -> DataType

Duration DataType.

Source code in daft/datatype.py
@classmethod
def duration(cls, timeunit: TimeUnit | str) -> DataType:
    """Duration DataType."""
    if isinstance(timeunit, str):
        timeunit = TimeUnit.from_str(timeunit)
    return cls._from_pydatatype(PyDataType.duration(timeunit._timeunit))
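
For example, the timeunit may be given as a string or a TimeUnit (a small sketch; see also the is_duration check documented on this page):

>>> import daft
>>> dtype = daft.DataType.duration("us")
>>> assert dtype.is_duration()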

embedding #

embedding(dtype: DataType, size: int) -> DataType

Create an Embedding DataType: embeddings are fixed size arrays, where each element in the array has a numeric dtype and each array has a fixed length of size.

Parameters:

Name Type Description Default
dtype DataType

DataType of each element in the list (must be numeric)

required
size int

length of each list

required
Source code in daft/datatype.py
@classmethod
def embedding(cls, dtype: DataType, size: int) -> DataType:
    """Create an Embedding DataType: embeddings are fixed size arrays, where each element in the array has a **numeric** ``dtype`` and each array has a fixed length of ``size``.

    Args:
        dtype: DataType of each element in the list (must be numeric)
        size: length of each list
    """
    if not isinstance(size, int) or size <= 0:
        raise ValueError("The size for a embedding must be a positive integer, but got: ", size)
    return cls._from_pydatatype(PyDataType.embedding(dtype._dtype, size))

extension #

extension(
    name: str,
    storage_dtype: DataType,
    metadata: str | None = None,
) -> DataType
Source code in daft/datatype.py
@classmethod
def extension(cls, name: str, storage_dtype: DataType, metadata: str | None = None) -> DataType:
    return cls._from_pydatatype(PyDataType.extension(name, storage_dtype._dtype, metadata))

fixed_size_binary #

fixed_size_binary(size: int) -> DataType

Create a FixedSizeBinary DataType: A fixed-size string of bytes.

Source code in daft/datatype.py
@classmethod
def fixed_size_binary(cls, size: int) -> DataType:
    """Create a FixedSizeBinary DataType: A fixed-size string of bytes."""
    if not isinstance(size, int) or size <= 0:
        raise ValueError("The size for a fixed-size binary must be a positive integer, but got: ", size)
    return cls._from_pydatatype(PyDataType.fixed_size_binary(size))

fixed_size_list #

fixed_size_list(dtype: DataType, size: int) -> DataType

Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type dtype and each list has length size.

Parameters:

Name Type Description Default
dtype DataType

DataType of each element in the list

required
size int

length of each list

required
Source code in daft/datatype.py
@classmethod
def fixed_size_list(cls, dtype: DataType, size: int) -> DataType:
    """Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type ``dtype`` and each list has length ``size``.

    Args:
        dtype: DataType of each element in the list
        size: length of each list
    """
    if not isinstance(size, int) or size <= 0:
        raise ValueError("The size for a fixed-size list must be a positive integer, but got: ", size)
    return cls._from_pydatatype(PyDataType.fixed_size_list(dtype._dtype, size))

float32 #

float32() -> DataType

Create a 32-bit float DataType.

Source code in daft/datatype.py
@classmethod
def float32(cls) -> DataType:
    """Create a 32-bit float DataType."""
    return cls._from_pydatatype(PyDataType.float32())

float64 #

float64() -> DataType

Create a 64-bit float DataType.

Source code in daft/datatype.py
@classmethod
def float64(cls) -> DataType:
    """Create a 64-bit float DataType."""
    return cls._from_pydatatype(PyDataType.float64())

from_arrow_type #

from_arrow_type(arrow_type: pa.lib.DataType) -> DataType

Maps a PyArrow DataType to a Daft DataType.

Source code in daft/datatype.py
@classmethod
def from_arrow_type(cls, arrow_type: pa.lib.DataType) -> DataType:
    """Maps a PyArrow DataType to a Daft DataType."""
    if pa.types.is_int8(arrow_type):
        return cls.int8()
    elif pa.types.is_int16(arrow_type):
        return cls.int16()
    elif pa.types.is_int32(arrow_type):
        return cls.int32()
    elif pa.types.is_int64(arrow_type):
        return cls.int64()
    elif pa.types.is_uint8(arrow_type):
        return cls.uint8()
    elif pa.types.is_uint16(arrow_type):
        return cls.uint16()
    elif pa.types.is_uint32(arrow_type):
        return cls.uint32()
    elif pa.types.is_uint64(arrow_type):
        return cls.uint64()
    elif pa.types.is_float32(arrow_type):
        return cls.float32()
    elif pa.types.is_float64(arrow_type):
        return cls.float64()
    elif pa.types.is_string(arrow_type) or pa.types.is_large_string(arrow_type):
        return cls.string()
    elif pa.types.is_binary(arrow_type) or pa.types.is_large_binary(arrow_type):
        return cls.binary()
    elif pa.types.is_fixed_size_binary(arrow_type):
        return cls.fixed_size_binary(arrow_type.byte_width)
    elif pa.types.is_boolean(arrow_type):
        return cls.bool()
    elif pa.types.is_null(arrow_type):
        return cls.null()
    elif pa.types.is_decimal128(arrow_type):
        return cls.decimal128(arrow_type.precision, arrow_type.scale)
    elif pa.types.is_date32(arrow_type):
        return cls.date()
    elif pa.types.is_date64(arrow_type):
        return cls.timestamp(TimeUnit.ms())
    elif pa.types.is_time64(arrow_type):
        timeunit = TimeUnit.from_str(pa.type_for_alias(str(arrow_type)).unit)
        return cls.time(timeunit)
    elif pa.types.is_timestamp(arrow_type):
        timeunit = TimeUnit.from_str(arrow_type.unit)
        return cls.timestamp(timeunit=timeunit, timezone=arrow_type.tz)
    elif pa.types.is_duration(arrow_type):
        timeunit = TimeUnit.from_str(arrow_type.unit)
        return cls.duration(timeunit=timeunit)
    elif pa.types.is_list(arrow_type) or pa.types.is_large_list(arrow_type):
        assert isinstance(arrow_type, (pa.ListType, pa.LargeListType))
        field = arrow_type.value_field
        return cls.list(cls.from_arrow_type(field.type))
    elif pa.types.is_fixed_size_list(arrow_type):
        assert isinstance(arrow_type, pa.FixedSizeListType)
        field = arrow_type.value_field
        return cls.fixed_size_list(cls.from_arrow_type(field.type), arrow_type.list_size)
    elif pa.types.is_struct(arrow_type):
        assert isinstance(arrow_type, pa.StructType)
        fields = [arrow_type[i] for i in range(arrow_type.num_fields)]
        return cls.struct({field.name: cls.from_arrow_type(field.type) for field in fields})
    elif pa.types.is_interval(arrow_type):
        return cls.interval()
    elif pa.types.is_map(arrow_type):
        assert isinstance(arrow_type, pa.MapType)
        return cls.map(
            key_type=cls.from_arrow_type(arrow_type.key_type),
            value_type=cls.from_arrow_type(arrow_type.item_type),
        )
    elif isinstance(arrow_type, getattr(pa, "FixedShapeTensorType", ())):
        scalar_dtype = cls.from_arrow_type(arrow_type.value_type)
        return cls.tensor(scalar_dtype, tuple(arrow_type.shape))
    elif isinstance(arrow_type, pa.PyExtensionType):
        # TODO(Clark): Add a native cross-lang extension type representation for PyExtensionTypes.
        raise ValueError(
            "pyarrow extension types that subclass pa.PyExtensionType can't be used in Daft, since they can't be "
            f"used in non-Python Arrow implementations and Daft uses the Rust Arrow2 implementation: {arrow_type}"
        )
    elif isinstance(arrow_type, pa.BaseExtensionType):
        name = arrow_type.extension_name

        if (get_context().get_or_create_runner().name == "ray") and (
            type(arrow_type).__reduce__ == pa.BaseExtensionType.__reduce__
        ):
            raise ValueError(
                f"You are attempting to use a Extension Type: {arrow_type} with the default pyarrow `__reduce__` which breaks pickling for Extensions"
                "To fix this, implement your own `__reduce__` on your extension type"
                "For more details see this issue: "
                "https://github.com/apache/arrow/issues/35599"
            )
        try:
            metadata = arrow_type.__arrow_ext_serialize__().decode()
        except AttributeError:
            metadata = None

        if name == "daft.super_extension":
            assert metadata is not None
            return cls._from_pydatatype(PyDataType.from_json(metadata))
        else:
            return cls.extension(
                name,
                cls.from_arrow_type(arrow_type.storage_type),
                metadata,
            )
    else:
        # Fall back to a Python object type.
        # TODO(Clark): Add native support for remaining Arrow types.
        return cls.python()
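
For example, nested Arrow types map recursively (a small sketch; pa.list_ and pa.int64 are standard PyArrow constructors, not part of Daft):

>>> import pyarrow as pa
>>> import daft
>>> assert daft.DataType.from_arrow_type(pa.list_(pa.int64())) == daft.DataType.list(daft.DataType.int64())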

from_numpy_dtype #

from_numpy_dtype(np_type: np.dtype[Any]) -> DataType

Maps a Numpy datatype to a Daft DataType.

Source code in daft/datatype.py
@classmethod
def from_numpy_dtype(cls, np_type: np.dtype[Any]) -> DataType:
    """Maps a Numpy datatype to a Daft DataType."""
    arrow_type = pa.from_numpy_dtype(np_type)
    return cls.from_arrow_type(arrow_type)
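
For example (a small sketch; the NumPy dtype is converted via PyArrow, as in the source above):

>>> import numpy as np
>>> import daft
>>> assert daft.DataType.from_numpy_dtype(np.dtype(np.float32)) == daft.DataType.float32()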

image #

image(
    mode: str | ImageMode | None = None,
    height: int | None = None,
    width: int | None = None,
) -> DataType

Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.

Each image in the array has an ImageMode, which describes the pixel dtype (e.g. uint8) and the number of image channels/bands and their logical interpretation (e.g. RGB).

If the height, width, and mode are the same for all images in the array, specifying them when constructing this type is advised, since that will allow Daft to create a more optimized physical representation of the image array.

If the height, width, or mode may vary across images in the array, leaving these fields unspecified when creating this type will cause Daft to represent this image array as a heterogeneous collection of images, where each image can have a different mode, height, and width. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

Parameters:

Name Type Description Default
mode str | ImageMode | None

The mode of the image. By default, this is inferred from the underlying data. If height and width are specified, the mode must also be specified.

None
height int | None

The height of the image. By default, this is inferred from the underlying data. Must be specified if the width is specified.

None
width int | None

The width of the image. By default, this is inferred from the underlying data. Must be specified if the height is specified.

None
Source code in daft/datatype.py
@classmethod
def image(
    cls, mode: str | ImageMode | None = None, height: int | None = None, width: int | None = None
) -> DataType:
    """Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.

    Each image in the array has an :class:`~daft.ImageMode`, which describes the pixel dtype (e.g. uint8) and
    the number of image channels/bands and their logical interpretation (e.g. RGB).

    If the height, width, and mode are the same for all images in the array, specifying them when constructing
    this type is advised, since that will allow Daft to create a more optimized physical representation
    of the image array.

    If the height, width, or mode may vary across images in the array, leaving these fields unspecified when
    creating this type will cause Daft to represent this image array as a heterogeneous collection of images,
    where each image can have a different mode, height, and width. This is much more flexible, but will result
    in a less compact representation and may make some operations less efficient.

    Args:
        mode: The mode of the image. By default, this is inferred from the underlying data.
            If height and width are specified, the mode must also be specified.
        height: The height of the image. By default, this is inferred from the underlying data.
            Must be specified if the width is specified.
        width: The width of the image. By default, this is inferred from the underlying data.
            Must be specified if the height is specified.
    """
    if isinstance(mode, str):
        mode = ImageMode.from_mode_string(mode.upper())
    if mode is not None and not isinstance(mode, ImageMode):
        raise ValueError(f"mode must be a string or ImageMode variant, but got: {mode}")
    if height is not None and width is not None:
        if not isinstance(height, int) or height <= 0:
            raise ValueError("Image height must be a positive integer, but got: ", height)
        if not isinstance(width, int) or width <= 0:
            raise ValueError("Image width must be a positive integer, but got: ", width)
    elif height is not None or width is not None:
        raise ValueError(
            f"Image height and width must either both be specified, or both not be specified, but got height={height}, width={width}"
        )
    return cls._from_pydatatype(PyDataType.image(mode, height, width))
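
For example, the two flavours side by side (a small sketch based on the constructor and the is_fixed_shape_image check documented on this page):

>>> import daft
>>> variable_shape = daft.DataType.image()  # mode, height and width may vary per image
>>> fixed_shape = daft.DataType.image(mode="RGB", height=224, width=224)
>>> assert fixed_shape.is_fixed_shape_image()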

int16 #

int16() -> DataType

Create a 16-bit integer DataType.

Source code in daft/datatype.py
@classmethod
def int16(cls) -> DataType:
    """Create an 16-bit integer DataType."""
    return cls._from_pydatatype(PyDataType.int16())

int32 #

int32() -> DataType

Create a 32-bit integer DataType.

Source code in daft/datatype.py
@classmethod
def int32(cls) -> DataType:
    """Create an 32-bit integer DataType."""
    return cls._from_pydatatype(PyDataType.int32())

int64 #

int64() -> DataType

Create a 64-bit integer DataType.

Source code in daft/datatype.py
@classmethod
def int64(cls) -> DataType:
    """Create an 64-bit integer DataType."""
    return cls._from_pydatatype(PyDataType.int64())

int8 #

int8() -> DataType

Create an 8-bit integer DataType.

Source code in daft/datatype.py
@classmethod
def int8(cls) -> DataType:
    """Create an 8-bit integer DataType."""
    return cls._from_pydatatype(PyDataType.int8())

interval #

interval() -> DataType

Interval DataType.

Source code in daft/datatype.py
@classmethod
def interval(cls) -> DataType:
    """Interval DataType."""
    return cls._from_pydatatype(PyDataType.interval())

is_binary #

is_binary() -> bool

Check if this is a binary type.

Examples:

>>> import daft
>>> dtype = daft.DataType.binary()
>>> assert dtype.is_binary()
Source code in daft/datatype.py
def is_binary(self) -> builtins.bool:
    """Check if this is a binary type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.binary()
        >>> assert dtype.is_binary()
    """
    return self._dtype.is_binary()

is_boolean #

is_boolean() -> bool

Check if this is a boolean type.

Examples:

>>> import daft
>>> dtype = daft.DataType.bool()
>>> assert dtype.is_boolean()
Source code in daft/datatype.py
def is_boolean(self) -> builtins.bool:
    """Check if this is a boolean type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.bool()
        >>> assert dtype.is_boolean()
    """
    return self._dtype.is_boolean()

is_date #

is_date() -> bool

Check if this is a date type.

Examples:

>>> import daft
>>> dtype = daft.DataType.date()
>>> assert dtype.is_date()
Source code in daft/datatype.py
def is_date(self) -> builtins.bool:
    """Check if this is a date type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.date()
        >>> assert dtype.is_date()
    """
    return self._dtype.is_date()

is_decimal128 #

is_decimal128() -> bool

Check if this is a decimal128 type.

Examples:

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.is_decimal128()
Source code in daft/datatype.py
def is_decimal128(self) -> builtins.bool:
    """Check if this is a decimal128 type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.decimal128(precision=10, scale=2)
        >>> assert dtype.is_decimal128()
    """
    return self._dtype.is_decimal128()

is_duration #

is_duration() -> bool

Check if this is a duration type.

Examples:

>>> import daft
>>> dtype = daft.DataType.duration(timeunit="ns")
>>> assert dtype.is_duration()
Source code in daft/datatype.py
def is_duration(self) -> builtins.bool:
    """Check if this is a duration type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.duration(timeunit="ns")
        >>> assert dtype.is_duration()
    """
    return self._dtype.is_duration()

is_embedding #

is_embedding() -> bool

Check if this is an embedding type.

Examples:

>>> import daft
>>> dtype = daft.DataType.embedding(daft.DataType.float32(), 512)
>>> assert dtype.is_embedding()
Source code in daft/datatype.py
def is_embedding(self) -> builtins.bool:
    """Check if this is an embedding type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.embedding(daft.DataType.float32(), 512)
        >>> assert dtype.is_embedding()
    """
    return self._dtype.is_embedding()

is_extension #

is_extension() -> bool

Check if this is an extension type.

Examples:

>>> import daft
>>> dtype = daft.DataType.extension("custom", daft.DataType.int64())
>>> assert dtype.is_extension()
Source code in daft/datatype.py
def is_extension(self) -> builtins.bool:
    """Check if this is an extension type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.extension("custom", daft.DataType.int64())
        >>> assert dtype.is_extension()
    """
    return self._dtype.is_extension()

is_fixed_shape_image #

is_fixed_shape_image() -> bool

Check if this is a fixed shape image type.

Examples:

>>> import daft
>>> dtype = daft.DataType.image(mode="RGB", height=224, width=224)
>>> assert dtype.is_fixed_shape_image()
Source code in daft/datatype.py
def is_fixed_shape_image(self) -> builtins.bool:
    """Check if this is a fixed shape image type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.image(mode="RGB", height=224, width=224)
        >>> assert dtype.is_fixed_shape_image()
    """
    return self._dtype.is_fixed_shape_image()

is_fixed_shape_sparse_tensor #

is_fixed_shape_sparse_tensor() -> bool

Check if this is a fixed shape sparse tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.is_fixed_shape_sparse_tensor()
Source code in daft/datatype.py
def is_fixed_shape_sparse_tensor(self) -> builtins.bool:
    """Check if this is a fixed shape sparse tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), shape=(2, 3))
        >>> assert dtype.is_fixed_shape_sparse_tensor()
    """
    return self._dtype.is_fixed_shape_sparse_tensor()

is_fixed_shape_tensor #

is_fixed_shape_tensor() -> bool

Check if this is a fixed shape tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.is_fixed_shape_tensor()
Source code in daft/datatype.py
def is_fixed_shape_tensor(self) -> builtins.bool:
    """Check if this is a fixed shape tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
        >>> assert dtype.is_fixed_shape_tensor()
    """
    return self._dtype.is_fixed_shape_tensor()

is_fixed_size_binary #

is_fixed_size_binary() -> bool

Check if this is a fixed size binary type.

Examples:

>>> import daft
>>> dtype = daft.DataType.fixed_size_binary(size=10)
>>> assert dtype.is_fixed_size_binary()
Source code in daft/datatype.py
def is_fixed_size_binary(self) -> builtins.bool:
    """Check if this is a fixed size binary type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.fixed_size_binary(size=10)
        >>> assert dtype.is_fixed_size_binary()
    """
    return self._dtype.is_fixed_size_binary()

is_fixed_size_list #

is_fixed_size_list() -> bool

Check if this is a fixed size list type.

Examples:

>>> import daft
>>> dtype = daft.DataType.fixed_size_list(daft.DataType.int64(), size=10)
>>> assert dtype.is_fixed_size_list()
Source code in daft/datatype.py
def is_fixed_size_list(self) -> builtins.bool:
    """Check if this is a fixed size list type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.fixed_size_list(daft.DataType.int64(), size=10)
        >>> assert dtype.is_fixed_size_list()
    """
    return self._dtype.is_fixed_size_list()

is_float32 #

is_float32() -> bool

Check if this is a 32-bit float type.

Examples:

>>> import daft
>>> dtype = daft.DataType.float32()
>>> assert dtype.is_float32()
Source code in daft/datatype.py
def is_float32(self) -> builtins.bool:
    """Check if this is a 32-bit float type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.float32()
        >>> assert dtype.is_float32()
    """
    return self._dtype.is_float32()

is_float64 #

is_float64() -> bool

Check if this is a 64-bit float type.

Examples:

>>> import daft
>>> dtype = daft.DataType.float64()
>>> assert dtype.is_float64()
Source code in daft/datatype.py
def is_float64(self) -> builtins.bool:
    """Check if this is a 64-bit float type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.float64()
        >>> assert dtype.is_float64()
    """
    return self._dtype.is_float64()

is_image #

is_image() -> bool

Check if this is an image type.

Examples:

>>> import daft
>>> dtype = daft.DataType.image()
>>> assert dtype.is_image()
Source code in daft/datatype.py
def is_image(self) -> builtins.bool:
    """Check if this is an image type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.image()
        >>> assert dtype.is_image()
    """
    return self._dtype.is_image()

is_int16 #

is_int16() -> bool

Check if this is a 16-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int16()
>>> assert dtype.is_int16()
Source code in daft/datatype.py
def is_int16(self) -> builtins.bool:
    """Check if this is a 16-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int16()
        >>> assert dtype.is_int16()
    """
    return self._dtype.is_int16()

is_int32 #

is_int32() -> bool

Check if this is a 32-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int32()
>>> assert dtype.is_int32()
Source code in daft/datatype.py
def is_int32(self) -> builtins.bool:
    """Check if this is a 32-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int32()
        >>> assert dtype.is_int32()
    """
    return self._dtype.is_int32()

is_int64 #

is_int64() -> bool

Check if this is a 64-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int64()
>>> assert dtype.is_int64()
Source code in daft/datatype.py
def is_int64(self) -> builtins.bool:
    """Check if this is a 64-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int64()
        >>> assert dtype.is_int64()
    """
    return self._dtype.is_int64()

is_int8 #

is_int8() -> bool

Check if this is an 8-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int8()
>>> assert dtype.is_int8()
Source code in daft/datatype.py
def is_int8(self) -> builtins.bool:
    """Check if this is an 8-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int8()
        >>> assert dtype.is_int8()
    """
    return self._dtype.is_int8()

is_integer #

is_integer() -> bool

Check if this is an integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int64()
>>> assert dtype.is_integer()
Source code in daft/datatype.py
def is_integer(self) -> builtins.bool:
    """Check if this is an integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int64()
        >>> assert dtype.is_integer()
    """
    return self._dtype.is_integer()

is_interval #

is_interval() -> bool

Check if this is an interval type.

Examples:

>>> import daft
>>> dtype = daft.DataType.interval()
>>> assert dtype.is_interval()
Source code in daft/datatype.py
def is_interval(self) -> builtins.bool:
    """Check if this is an interval type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.interval()
        >>> assert dtype.is_interval()
    """
    return self._dtype.is_interval()

is_list #

is_list() -> bool

Check if this is a list type.

Examples:

>>> import daft
>>> dtype = daft.DataType.list(daft.DataType.int64())
>>> assert dtype.is_list()
Source code in daft/datatype.py
def is_list(self) -> builtins.bool:
    """Check if this is a list type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.list(daft.DataType.int64())
        >>> assert dtype.is_list()
    """
    return self._dtype.is_list()

is_logical #

is_logical() -> bool

Check if this is a logical type.

Examples:

>>> import daft
>>> dtype = daft.DataType.bool()
>>> assert not dtype.is_logical()
Source code in daft/datatype.py
def is_logical(self) -> builtins.bool:
    """Check if this is a logical type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.bool()
        >>> assert not dtype.is_logical()
    """
    return self._dtype.is_logical()

is_map #

is_map() -> bool

Check if this is a map type.

Examples:

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.is_map()
Source code in daft/datatype.py
def is_map(self) -> builtins.bool:
    """Check if this is a map type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
        >>> assert dtype.is_map()
    """
    return self._dtype.is_map()

is_null #

is_null() -> bool

Check if this is a null type.

Examples:

>>> import daft
>>> dtype = daft.DataType.null()
>>> dtype.is_null()
True
Source code in daft/datatype.py
def is_null(self) -> builtins.bool:
    """Check if this is a null type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.null()
        >>> dtype.is_null()
        True
    """
    return self._dtype.is_null()

is_numeric #

is_numeric() -> bool

Check if this is a numeric type.

Examples:

>>> import daft
>>> dtype = daft.DataType.float64()
>>> assert dtype.is_numeric()
Source code in daft/datatype.py
def is_numeric(self) -> builtins.bool:
    """Check if this is a numeric type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.float64()
        >>> assert dtype.is_numeric()
    """
    return self._dtype.is_numeric()

is_python #

is_python() -> bool

Check if this is a python object type.

Examples:

>>> import daft
>>> dtype = daft.DataType.python()
>>> assert dtype.is_python()
Source code in daft/datatype.py
def is_python(self) -> builtins.bool:
    """Check if this is a python object type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.python()
        >>> assert dtype.is_python()
    """
    return self._dtype.is_python()

is_sparse_tensor #

is_sparse_tensor() -> bool

Check if this is a sparse tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32())
>>> assert dtype.is_sparse_tensor()
Source code in daft/datatype.py
def is_sparse_tensor(self) -> builtins.bool:
    """Check if this is a sparse tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32())
        >>> assert dtype.is_sparse_tensor()
    """
    return self._dtype.is_sparse_tensor()

is_string #

is_string() -> bool

Check if this is a string type.

Examples:

>>> import daft
>>> dtype = daft.DataType.string()
>>> assert dtype.is_string()
Source code in daft/datatype.py
def is_string(self) -> builtins.bool:
    """Check if this is a string type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.string()
        >>> assert dtype.is_string()
    """
    return self._dtype.is_string()

is_struct #

is_struct() -> bool

Check if this is a struct type.

Examples:

>>> import daft
>>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
>>> assert dtype.is_struct()
Source code in daft/datatype.py
def is_struct(self) -> builtins.bool:
    """Check if this is a struct type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
        >>> assert dtype.is_struct()
    """
    return self._dtype.is_struct()

is_temporal #

is_temporal() -> bool

Check if this is a temporal type.

Examples:

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns")
>>> assert dtype.is_temporal()
Source code in daft/datatype.py
def is_temporal(self) -> builtins.bool:
    """Check if this is a temporal type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.timestamp(timeunit="ns")
        >>> assert dtype.is_temporal()
    """
    return self._dtype.is_temporal()

is_tensor #

is_tensor() -> bool

Check if this is a tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32())
>>> assert dtype.is_tensor()
Source code in daft/datatype.py
def is_tensor(self) -> builtins.bool:
    """Check if this is a tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.tensor(daft.DataType.float32())
        >>> assert dtype.is_tensor()
    """
    return self._dtype.is_tensor()

is_time #

is_time() -> bool

Check if this is a time type.

Examples:

>>> import daft
>>> dtype = daft.DataType.time(timeunit="ns")
>>> assert dtype.is_time()
Source code in daft/datatype.py
def is_time(self) -> builtins.bool:
    """Check if this is a time type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.time(timeunit="ns")
        >>> assert dtype.is_time()
    """
    return self._dtype.is_time()

is_timestamp #

is_timestamp() -> bool

Check if this is a timestamp type.

Examples:

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns")
>>> assert dtype.is_timestamp()
Source code in daft/datatype.py
def is_timestamp(self) -> builtins.bool:
    """Check if this is a timestamp type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.timestamp(timeunit="ns")
        >>> assert dtype.is_timestamp()
    """
    return self._dtype.is_timestamp()

is_uint16 #

is_uint16() -> bool

Check if this is an unsigned 16-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint16()
>>> assert dtype.is_uint16()
Source code in daft/datatype.py
def is_uint16(self) -> builtins.bool:
    """Check if this is an unsigned 16-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint16()
        >>> assert dtype.is_uint16()
    """
    return self._dtype.is_uint16()

is_uint32 #

is_uint32() -> bool

Check if this is an unsigned 32-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint32()
>>> assert dtype.is_uint32()
Source code in daft/datatype.py
def is_uint32(self) -> builtins.bool:
    """Check if this is an unsigned 32-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint32()
        >>> assert dtype.is_uint32()
    """
    return self._dtype.is_uint32()

is_uint64 #

is_uint64() -> bool

Check if this is an unsigned 64-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint64()
>>> assert dtype.is_uint64()
Source code in daft/datatype.py
def is_uint64(self) -> builtins.bool:
    """Check if this is an unsigned 64-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint64()
        >>> assert dtype.is_uint64()
    """
    return self._dtype.is_uint64()

is_uint8 #

is_uint8() -> bool

Check if this is an unsigned 8-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint8()
>>> assert dtype.is_uint8()
Source code in daft/datatype.py
def is_uint8(self) -> builtins.bool:
    """Check if this is an unsigned 8-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint8()
        >>> assert dtype.is_uint8()
    """
    return self._dtype.is_uint8()

list #

list(dtype: DataType) -> DataType

Create a List DataType: Variable-length list, where each element in the list has type dtype.

Parameters:

Name Type Description Default
dtype DataType

DataType of each element in the list

required
Source code in daft/datatype.py
@classmethod
def list(cls, dtype: DataType) -> DataType:
    """Create a List DataType: Variable-length list, where each element in the list has type ``dtype``.

    Args:
        dtype: DataType of each element in the list
    """
    return cls._from_pydatatype(PyDataType.list(dtype._dtype))
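
For example, the element type can be read back via the dtype attribute documented above:

>>> import daft
>>> dtype = daft.DataType.list(daft.DataType.string())
>>> assert dtype.dtype == daft.DataType.string()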

map #

map(key_type: DataType, value_type: DataType) -> DataType

Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.

Parameters:

Name Type Description Default
key_type DataType

DataType of the keys in the map

required
value_type DataType

DataType of the values in the map

required
Source code in daft/datatype.py
@classmethod
def map(cls, key_type: DataType, value_type: DataType) -> DataType:
    """Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.

    Args:
        key_type: DataType of the keys in the map
        value_type: DataType of the values in the map
    """
    return cls._from_pydatatype(PyDataType.map(key_type._dtype, value_type._dtype))
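
For example, the key and value types can be read back via the key_type and value_type attributes documented above:

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.key_type == daft.DataType.string()
>>> assert dtype.value_type == daft.DataType.int64()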

null #

null() -> DataType

Creates the Null DataType: Always the Null value.

Source code in daft/datatype.py
@classmethod
def null(cls) -> DataType:
    """Creates the Null DataType: Always the ``Null`` value."""
    return cls._from_pydatatype(PyDataType.null())

python #

python() -> DataType

Create a Python DataType: a type which refers to an arbitrary Python object.

Source code in daft/datatype.py
@classmethod
def python(cls) -> DataType:
    """Create a Python DataType: a type which refers to an arbitrary Python object."""
    return cls._from_pydatatype(PyDataType.python())

sparse_tensor #

sparse_tensor(
    dtype: DataType,
    shape: tuple[int, ...] | None = None,
    use_offset_indices: bool = False,
) -> DataType

Create a SparseTensor DataType: SparseTensor arrays implemented as 'COO Sparse Tensor' representation of n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.

If a shape is given, each ndarray in the column will have this shape.

If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

The use_offset_indices parameter determines how the indices of the SparseTensor are stored:

- False (default): Indices represent the actual positions of nonzero values.
- True: Indices represent the offsets between consecutive nonzero values. This can improve compression efficiency, especially when nonzero values are clustered together, as offsets between them are often zero, making them easier to compress.

Parameters:

Name Type Description Default
dtype DataType

The type of the data contained within the tensor elements.

required
shape tuple[int, ...] | None

The shape of each SparseTensor in the column. This is None by default, which allows the shapes of each tensor element to vary.

None
use_offset_indices bool

Determines how indices are represented. Defaults to False (storing actual indices). If True, stores offsets between nonzero indices.

False
Source code in daft/datatype.py
@classmethod
def sparse_tensor(
    cls,
    dtype: DataType,
    shape: tuple[int, ...] | None = None,
    use_offset_indices: builtins.bool = False,
) -> DataType:
    """Create a SparseTensor DataType: SparseTensor arrays implemented as 'COO Sparse Tensor' representation of n-dimensional arrays of data of the provided ``dtype`` as elements, each of the provided ``shape``.

    If a ``shape`` is given, each ndarray in the column will have this shape.

    If ``shape`` is not given, the ndarrays in the column can have different shapes. This is much more flexible,
    but will result in a less compact representation and may make some operations less efficient.

    The ``use_offset_indices`` parameter determines how the indices of the SparseTensor are stored:
    - ``False`` (default): Indices represent the actual positions of nonzero values.
    - ``True``: Indices represent the offsets between consecutive nonzero values.
    This can improve compression efficiency, especially when nonzero values are clustered together,
    as offsets between them are often zero, making them easier to compress.

    Args:
        dtype: The type of the data contained within the tensor elements.
        shape: The shape of each SparseTensor in the column. This is ``None`` by default, which allows the shapes of
            each tensor element to vary.
        use_offset_indices: Determines how indices are represented.
            Defaults to `False` (storing actual indices). If `True`, stores offsets between nonzero indices.
    """
    if shape is not None:
        if not isinstance(shape, tuple) or not shape or any(not isinstance(n, int) for n in shape):
            raise ValueError("SparseTensor shape must be a non-empty tuple of ints, but got: ", shape)
    return cls._from_pydatatype(PyDataType.sparse_tensor(dtype._dtype, shape, use_offset_indices))
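
For example (a small sketch mirroring the attribute and predicate examples documented on this page):

>>> import daft
>>> fixed = daft.DataType.sparse_tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert fixed.is_fixed_shape_sparse_tensor()
>>> offsets = daft.DataType.sparse_tensor(daft.DataType.float32(), use_offset_indices=True)
>>> assert offsets.use_offset_indices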

string #

string() -> DataType

Create a String DataType: A string of UTF8 characters.

Source code in daft/datatype.py
@classmethod
def string(cls) -> DataType:
    """Create a String DataType: A string of UTF8 characters."""
    return cls._from_pydatatype(PyDataType.string())

struct #

struct(fields: dict[str, DataType]) -> DataType

Create a Struct DataType: a nested type which has names mapped to child types.

Examples:

>>> struct_type = DataType.struct({"name": DataType.string(), "age": DataType.int64()})

Parameters:

Name Type Description Default
fields dict[str, DataType]

Nested fields of the Struct

required
Source code in daft/datatype.py
@classmethod
def struct(cls, fields: dict[str, DataType]) -> DataType:
    """Create a Struct DataType: a nested type which has names mapped to child types.

    Examples:
        >>> struct_type = DataType.struct({"name": DataType.string(), "age": DataType.int64()})

    Args:
        fields: Nested fields of the Struct
    """
    return cls._from_pydatatype(PyDataType.struct({name: datatype._dtype for name, datatype in fields.items()}))

tensor #

tensor(
    dtype: DataType, shape: tuple[int, ...] | None = None
) -> DataType

Create a tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.

If a shape is given, each ndarray in the column will have this shape.

If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

Parameters:

Name Type Description Default
dtype DataType

The type of the data contained within the tensor elements.

required
shape tuple[int, ...] | None

The shape of each tensor in the column. This is None by default, which allows the shapes of each tensor element to vary.

None
Source code in daft/datatype.py
@classmethod
def tensor(
    cls,
    dtype: DataType,
    shape: tuple[int, ...] | None = None,
) -> DataType:
    """Create a tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided ``dtype`` as elements, each of the provided ``shape``.

    If a ``shape`` is given, each ndarray in the column will have this shape.

    If ``shape`` is not given, the ndarrays in the column can have different shapes. This is much more flexible,
    but will result in a less compact representation and may make some operations less efficient.

    Args:
        dtype: The type of the data contained within the tensor elements.
        shape: The shape of each tensor in the column. This is ``None`` by default, which allows the shapes of
            each tensor element to vary.
    """
    if shape is not None:
        if not isinstance(shape, tuple) or not shape or any(not isinstance(n, int) for n in shape):
            raise ValueError("Tensor shape must be a non-empty tuple of ints, but got: ", shape)
    return cls._from_pydatatype(PyDataType.tensor(dtype._dtype, shape))
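
For example, a fixed-shape tensor exposes its shape via the shape attribute documented above:

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.is_fixed_shape_tensor()
>>> assert dtype.shape == (2, 3)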

time #

time(timeunit: TimeUnit | str) -> DataType

Time DataType. Supported timeunits are "us", "ns".

Source code in daft/datatype.py
@classmethod
def time(cls, timeunit: TimeUnit | str) -> DataType:
    """Time DataType. Supported timeunits are "us", "ns"."""
    if isinstance(timeunit, str):
        timeunit = TimeUnit.from_str(timeunit)
    return cls._from_pydatatype(PyDataType.time(timeunit._timeunit))

timestamp #

timestamp(
    timeunit: TimeUnit | str, timezone: str | None = None
) -> DataType

Timestamp DataType.

Source code in daft/datatype.py
@classmethod
def timestamp(cls, timeunit: TimeUnit | str, timezone: str | None = None) -> DataType:
    """Timestamp DataType."""
    if isinstance(timeunit, str):
        timeunit = TimeUnit.from_str(timeunit)
    return cls._from_pydatatype(PyDataType.timestamp(timeunit._timeunit, timezone))
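
For example (a small sketch using the timezone attribute and is_timestamp check documented on this page):

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns", timezone="UTC")
>>> assert dtype.is_timestamp()
>>> assert dtype.timezone == "UTC"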

to_arrow_dtype #

to_arrow_dtype() -> pa.DataType
Source code in daft/datatype.py
def to_arrow_dtype(self) -> pa.DataType:
    return self._dtype.to_arrow()
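
A sketch of the round trip back to PyArrow (assuming the straightforward mapping for primitive types; the exact Arrow types produced for nested or logical DataTypes may differ):

>>> import pyarrow as pa
>>> import daft
>>> assert daft.DataType.int64().to_arrow_dtype() == pa.int64()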

uint16 #

uint16() -> DataType

Create an unsigned 16-bit integer DataType.

Source code in daft/datatype.py
@classmethod
def uint16(cls) -> DataType:
    """Create an unsigned 16-bit integer DataType."""
    return cls._from_pydatatype(PyDataType.uint16())

uint32 #

uint32() -> DataType

Create an unsigned 32-bit integer DataType.

Source code in daft/datatype.py
@classmethod
def uint32(cls) -> DataType:
    """Create an unsigned 32-bit integer DataType."""
    return cls._from_pydatatype(PyDataType.uint32())

uint64 #

uint64() -> DataType

Create an unsigned 64-bit integer DataType.

Source code in daft/datatype.py
@classmethod
def uint64(cls) -> DataType:
    """Create an unsigned 64-bit integer DataType."""
    return cls._from_pydatatype(PyDataType.uint64())

uint8 #

uint8() -> DataType

Create an unsigned 8-bit integer DataType.

Source code in daft/datatype.py
@classmethod
def uint8(cls) -> DataType:
    """Create an unsigned 8-bit integer DataType."""
    return cls._from_pydatatype(PyDataType.uint8())