Data Engineering and Streaming

A Comprehensive Guide to Streaming on the Data Intelligence Platform

We’re building streaming on the platform with you, using the latest capabilities in Apache Spark™ Structured Streaming. This guide covers new advanced features, from state transformations to real-time mode; how Lakeflow Declarative Pipelines simplifies managing streaming pipelines; and when to use your own streaming jobs versus Lakeflow Declarative Pipelines.

  • Ray Zhu (Product Team)

  • Indrajit Roy (Engineering Team)

  • Streaming versus batch

    • Streaming
      • Technologies
        • Kafka
        • Amazon Kinesis
        • Confluent
        • Apache Spark Structured Streaming
        • Apache Flink
        • Pub/Sub
      • Characteristics
        • Continuous and Low Latency Processing
      • Semantics
        • Processes only new data iteratively from the source
        • Uses a checkpointing mechanism to track which data has already been processed from the source (stateful)
        • The streaming source doesn’t have to be a message bus system
        • Stream processing can run in either triggered or continuous mode
      • Why streaming:
        • Efficient incremental processing
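
The semantics listed above (process only new data, track progress with a checkpoint, read from a source that is not a message bus, run in triggered mode) can be illustrated with a small conceptual sketch. This is plain Python, not the Structured Streaming API: the function names (`load_checkpoint`, `save_checkpoint`, `run_once`), the JSON checkpoint layout, and the in-memory list standing in for the source are all hypothetical choices made for illustration.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the offset of the next unprocessed record (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_offset"]


def save_checkpoint(path, next_offset):
    """Persist progress so that a later run resumes where this one stopped."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp, path)  # replace the old checkpoint in one step


def run_once(source, checkpoint_path, sink):
    """One triggered run: process only records that arrived after the checkpoint."""
    start = load_checkpoint(checkpoint_path)
    new_records = source[start:]
    sink.extend(record.upper() for record in new_records)  # stand-in transformation
    save_checkpoint(checkpoint_path, start + len(new_records))
    return len(new_records)


# Hypothetical setup: an append-only list plays the role of the source,
# showing that the source need not be a message bus.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
source = ["a", "b", "c"]
sink = []

first = run_once(source, checkpoint_path, sink)   # processes a, b, c
source.extend(["d", "e"])                         # new data arrives
second = run_once(source, checkpoint_path, sink)  # processes only d, e
third = run_once(source, checkpoint_path, sink)   # nothing new to process
```

In this sketch, triggered mode corresponds to calling `run_once` on a schedule, while a continuous mode would keep a loop running and process records as they arrive. Either way, the checkpoint is what makes the processing incremental: no run re-reads data that an earlier run already handled.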
  • Two sides of incremental processing

  • New features