# I/O

Daft can create a DataFrame by reading from a variety of data sources (in-memory data, files, data catalogs, and integrations) and can write a DataFrame back out to many of the same destinations. See the Daft I/O API docs for full API details.
## In-Memory
## CSV

| Function | Description |
|----------|-------------|
| read_csv | Read a CSV file or multiple CSV files into a DataFrame |
| write_csv | Write a DataFrame to CSV files |
## Delta Lake

See also the Delta Lake page for details on the integration.
## Hudi

| Function | Description |
|----------|-------------|
| read_hudi | Read a Hudi table into a DataFrame |

See also the Apache Hudi page for details on the integration.
## Iceberg

See also the Iceberg page for details on the integration.
## JSON

| Function | Description |
|----------|-------------|
| read_json | Read a JSON file or multiple JSON files into a DataFrame |
## Lance

| Function | Description |
|----------|-------------|
| read_lance | Read a Lance dataset into a DataFrame |
| write_lance | Write a DataFrame to a Lance dataset |
## Parquet

| Function | Description |
|----------|-------------|
| read_parquet | Read a Parquet file or multiple Parquet files into a DataFrame |
| write_parquet | Write a DataFrame to Parquet files |
## SQL

| Function | Description |
|----------|-------------|
| read_sql | Read data from a SQL database into a DataFrame |
## WARC

| Function | Description |
|----------|-------------|
| read_warc | Read a WARC file or multiple WARC files into a DataFrame |
## User-Defined

| Function | Description |
|----------|-------------|
| DataSink | Interface for writing data from DataFrames |
| DataSource | Interface for reading data into DataFrames |
| DataSourceTask | Represents a partition of data that can be processed independently |
| WriteResult | Wrapper for intermediate results written by a DataSink |
| write_sink | Write a DataFrame to the given DataSink |
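The lifecycle implied by the table can be sketched in pseudocode. This is a hypothetical outline only: the method names below are assumptions for illustration, not the exact Daft API, and the real interfaces should be checked in the Daft I/O docs.

```
# Reading: a DataSource splits the source into DataSourceTasks,
# each of which produces one partition and can run independently.
class MySource(DataSource):
    def tasks(self):
        for partition in split_source():
            yield DataSourceTask(partition)

# Writing: a DataSink receives partitions of a DataFrame, emits a
# WriteResult per partition, and combines them in a final step.
class MySink(DataSink):
    def write(self, partition) -> WriteResult: ...
    def finalize(self, results): ...

# write_sink drives the sink with a DataFrame.
df.write_sink(MySink())
```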