Data + AI Summit 2025

Data Engineering and Streaming

A Comprehensive Guide to Streaming on the Data Intelligence Platform

A Comprehensive Guide to Streaming on the Data Intelligence Platform - Data + AI Summit 2025 | Databricks
YouTube recording: A Comprehensive Guide to Streaming on the Data Intelligence Platform

We’re building it with you using the latest capabilities in Apache Spark™ Structured Streaming. The session covers new advanced features, from state transformations to real-time mode; how Lakeflow Declarative Pipelines simplify managing streaming pipelines; and when to use your own streaming jobs versus Lakeflow Declarative Pipelines.

Ray Zhu (Product Team)
Indrajit Roy (Engineering Team)
Streaming versus batch
- Streaming
  - Technologies
    - Kafka
    - Amazon Kinesis
    - Confluent
    - Apache Spark Structured Streaming
    - Apache Flink
    - Google Cloud Pub/Sub
  - Characteristics
    - Continuous, low-latency processing
  - Semantics
    - Processes only the new data from the source on each iteration
    - Uses a checkpointing mechanism to track which source data has already been processed (stateful)
    - The streaming source doesn’t have to be a message bus system (files in cloud storage or a Delta table can also serve as sources)
    - Stream processing can run in either triggered or continuous mode
  - Why streaming:
    - Efficient incremental processing
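The semantics above (process only new data, track progress in a checkpoint, run triggered or continuously) can be illustrated with a small pure-Python model. This is a toy sketch under stated assumptions, not how Spark implements checkpointing. The class name `IncrementalJob`, the JSON checkpoint layout, and the use of a Python list as an append-only source (where a record's index is its offset) are all hypothetical. Each triggered run loads the last committed offset and state, processes only the records past that offset, updates a running count per key, and commits both back.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class IncrementalJob:
    """Toy model of checkpointed incremental processing (hypothetical, not Spark's code)."""

    checkpoint_path: str
    offset: int = 0  # index of the next unprocessed record in the source
    counts: dict = field(default_factory=dict)  # running state: count per key

    def _load_checkpoint(self) -> None:
        # Restore the committed offset and state from a previous run, if any.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                saved = json.load(f)
            self.offset = saved["offset"]
            self.counts = saved["counts"]

    def _commit_checkpoint(self) -> None:
        # Persist offset and state in one file after the batch is processed.
        with open(self.checkpoint_path, "w") as f:
            json.dump({"offset": self.offset, "counts": self.counts}, f)

    def run_triggered(self, source: list) -> list:
        """Process only records added since the last commit, then stop."""
        self._load_checkpoint()
        new_records = source[self.offset:]
        for key in new_records:
            self.counts[key] = self.counts.get(key, 0) + 1
        self.offset = len(source)
        self._commit_checkpoint()
        return new_records


# Usage: an append-only "source" that grows between two triggered runs.
ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
source = ["a", "b", "a"]
first = IncrementalJob(ckpt).run_triggered(source)  # all 3 records are new

source += ["b", "c"]  # new data arrives
job2 = IncrementalJob(ckpt)  # a fresh object simulates a job restart
second = job2.run_triggered(source)
print(second, job2.counts)  # ['b', 'c'] {'a': 2, 'b': 2, 'c': 1}
```

Creating a new `IncrementalJob` for each run stands in for restarting the job, so progress survives only through the checkpoint. A continuous run would call `run_triggered` in a loop (or block waiting for new records) instead of stopping after one pass. Because the sketch commits after processing, a crash between the two steps would reprocess the last batch on restart. In Spark Structured Streaming itself, the checkpoint is set with the `checkpointLocation` option on `writeStream`, and the execution mode is chosen with `trigger(...)`: for example, `availableNow=True` processes the available data and stops, while `processingTime="10 seconds"` keeps a micro-batch stream running.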
Two sides of incremental processing
New features
