
Roadmap

Last updated: May 2025

What is in store for Daft in 2025? This roadmap outlines what the Daft team plans to work on in the coming year, along with some of the features to expect from that work.

Please note that items on this roadmap are subject to change at any time. If there are features you would like to implement, we welcome and encourage open source contributions! Our team is happy to provide guidance, help scope the work, and review PRs. Feel free to open an issue or PR on GitHub or join our Daft Slack Community.

Multimodality

  • Support generic data source and data sink interfaces that can be implemented outside of Daft
  • Enhanced support for JSON with a VARIANT data type and JSON_TABLE
  • More built-in, optimized expressions for multimodal and nested datatypes

AI

  • Higher level abstractions for building AI applications on top of Daft
  • Better AI-specific observability and metrics in AI functions
    • Tokens per second
    • Estimated API costs
  • Better primitives for AI workloads (discussion #3547)
    • Async UDFs
    • Streaming UDFs
  • Native LLM inference functions with Pydantic integration (discussion #2774)

Performance & Scalability

  • Incorporate our local streaming execution engine (Swordfish) into the distributed Ray runner
    • Handle Map-only workloads at any scale factor (100TB+)
    • Handle 10TB+ Shuffle workloads
  • More powerful cost-based optimizer, implementing advanced optimizations
    • Improve the join ordering algorithm to be dynamic-programming based
    • Semi-join reduction
    • Common subquery elimination
  • To complement our blazing fast S3 readers, we aim to build the fastest S3 writers in the wild west
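
To make the "dynamic-programming based" join ordering item above more concrete, here is a minimal sketch of the textbook technique: enumerate table subsets from small to large and keep the cheapest plan for each. This is not Daft's optimizer or its API. The function name `best_join_order`, the cost model (sum of estimated intermediate row counts), and the cardinality and selectivity inputs are all assumptions made for illustration.

```python
from itertools import combinations


def best_join_order(cardinalities, selectivities):
    """Find a cheap join tree with dynamic programming over table subsets.

    cardinalities: dict mapping table name -> estimated row count.
    selectivities: dict mapping frozenset({a, b}) -> join selectivity.
        Pairs that are missing are treated as cross products (1.0).
    Returns (cost, plan), where plan is a nested tuple of table names.
    Hypothetical cost model: sum of estimated rows of every intermediate join.
    """
    tables = sorted(cardinalities)

    def estimated_rows(subset):
        # Product of base cardinalities, scaled by every selectivity
        # between two tables inside the subset.
        rows = 1.0
        for table in subset:
            rows *= cardinalities[table]
        for a, b in combinations(sorted(subset), 2):
            rows *= selectivities.get(frozenset((a, b)), 1.0)
        return rows

    # Base case: scanning a single table costs nothing in this toy model.
    best = {frozenset([t]): (0.0, t) for t in tables}

    for size in range(2, len(tables) + 1):
        for combo in combinations(tables, size):
            subset = frozenset(combo)
            out_rows = estimated_rows(subset)
            first, rest = combo[0], combo[1:]
            chosen = None
            # Pin the first table to the left side so mirrored splits
            # are not evaluated twice; the right side is never empty.
            for k in range(len(rest)):
                for extra in combinations(rest, k):
                    left = frozenset((first,) + extra)
                    right = subset - left
                    left_cost, left_plan = best[left]
                    right_cost, right_plan = best[right]
                    cost = left_cost + right_cost + out_rows
                    if chosen is None or cost < chosen[0]:
                        chosen = (cost, (left_plan, right_plan))
            best[subset] = chosen

    return best[frozenset(tables)]


if __name__ == "__main__":
    cost, plan = best_join_order(
        {"A": 1000, "B": 100, "C": 10},
        {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1},
    )
    print(plan, cost)
```

With the made-up statistics in the usage example, the search joins the two small tables first and brings in the large one last. A production optimizer would also prune cross products and use a richer cost model, which this sketch deliberately leaves out.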

Out-of-the-box Experience

Future Work

The following features would be valuable additions to Daft, but are not currently on our immediate development roadmap. We're sharing these to highlight opportunities for open source contributions, invite discussion around implementation approaches, and provide visibility into longer-term possibilities. These features have been tagged with "help wanted" and "good first issue" in the Daft repo.

If you are interested in working on any of these features, feel free to open an issue or start a discussion on GitHub or join our Daft Slack Community. Our team can provide technical direction and help scope the work appropriately. Thank you in advance 💜