Data Engineering and Streaming
A Comprehensive Guide to Streaming on the Data Intelligence Platform
- Session page: A Comprehensive Guide to Streaming on the Data Intelligence Platform (Data + AI Summit 2025, Databricks)
- YouTube recording: A Comprehensive Guide to Streaming on the Data Intelligence Platform
We’re building it with you using the latest capabilities in Apache Spark™ Structured Streaming. The session covers new advanced features, from state transformations to real-time mode; how Lakeflow Declarative Pipelines simplifies managing streaming pipelines; and when to use your own streaming jobs versus Lakeflow Declarative Pipelines.
Speakers:
- Ray Zhu (Product Team)
- Indrajit Roy (Engineering Team)
Streaming versus batch
- Streaming
  - Technologies
    - Kafka
    - Amazon Kinesis
    - Confluent
    - Apache Spark Structured Streaming
    - Apache Flink
    - Pub/Sub
  - Characteristics
    - Continuous, low-latency processing
  - Semantics
    - Processes only new data from the source, incrementally
    - Uses a checkpointing mechanism to track which data from the source has already been processed (stateful)
    - The streaming source doesn’t have to be a message bus system
    - Stream processing can run in either triggered or continuous mode
  - Why streaming:
    - Efficient incremental processing
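The semantics above (process only new data, track progress with a checkpoint, run triggered or continuously) can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not the Structured Streaming API: `CheckpointedStream`, `trigger_once`, and `run_continuously` are hypothetical names, the "source" is just an append-only list, and the checkpoint is a JSON file holding one offset. In Spark itself, the checkpoint is set with `.option("checkpointLocation", ...)` on the stream writer and the run style with `.trigger(...)` (for example `availableNow=True` for a triggered run).

```python
import json
import os
import tempfile
import time


class CheckpointedStream:
    """Toy model of checkpointed incremental processing (hypothetical, not a Spark API).

    The source is an append-only list of records; the checkpoint is a JSON file
    holding the offset of the next record that has not been processed yet.
    """

    def __init__(self, source, checkpoint_path):
        self.source = source
        self.checkpoint_path = checkpoint_path

    def _read_offset(self):
        # No checkpoint file yet: nothing has been processed, start at offset 0.
        if not os.path.exists(self.checkpoint_path):
            return 0
        with open(self.checkpoint_path) as f:
            return json.load(f)["next_offset"]

    def _write_offset(self, offset):
        # Write a temp file and atomically rename it, so a crash cannot leave
        # a half-written checkpoint behind.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_offset": offset}, f)
        os.replace(tmp_path, self.checkpoint_path)

    def trigger_once(self, process):
        """Triggered run: process only records added since the last checkpoint, then return."""
        start = self._read_offset()
        end = len(self.source)
        batch = self.source[start:end]
        if batch:
            process(batch)
            # Commit progress only after process() returns, so a failed batch is retried.
            self._write_offset(end)
        return batch

    def run_continuously(self, process, poll_seconds=1.0, max_polls=None):
        """Always-on run: keep checking for new data (max_polls bounds the loop for demos)."""
        polls = 0
        while max_polls is None or polls < max_polls:
            self.trigger_once(process)
            polls += 1
            time.sleep(poll_seconds)


# Usage: each trigger picks up only the records that arrived since the last one.
with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint = os.path.join(tmp_dir, "checkpoint.json")
    log = ["a", "b"]
    seen = []
    stream = CheckpointedStream(log, checkpoint)
    stream.trigger_once(seen.extend)   # processes ["a", "b"]
    log.append("c")
    stream.trigger_once(seen.extend)   # processes only ["c"]
    stream.trigger_once(seen.extend)   # nothing new: no-op
    # A fresh instance (as after a restart) resumes from the checkpoint.
    log.append("d")
    CheckpointedStream(log, checkpoint).run_continuously(seen.extend, poll_seconds=0, max_polls=2)
    print(seen)  # ['a', 'b', 'c', 'd']
```

Committing the offset only after `process` returns gives at-least-once behavior in this toy: if processing fails mid-batch, the next trigger replays that batch. Real engines add per-source offsets, state stores, and exactly-once sinks on top of this basic idea.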
Two sides of incremental processing
New features