layout: global
displayTitle: Structured Streaming Programming Guide
title: Structured Streaming Programming Guide
license: |
Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Miscellaneous Notes

  • Several configurations are not modifiable after the query has run. To change them, discard the checkpoint and start a new query. These configurations include:
    • spark.sql.shuffle.partitions
      • This is due to the physical partitioning of state: state is partitioned by applying a hash function to the key, so the number of state partitions must remain unchanged.
      • If you want to run fewer tasks for stateful operations, coalesce can help by avoiding unnecessary repartitioning.
        • After coalesce, the reduced number of tasks is kept unless another shuffle happens.
    • spark.sql.streaming.stateStore.providerClass: To read the previous state of the query properly, the state store provider class must remain unchanged.
    • spark.sql.streaming.multipleWatermarkPolicy: Changing this would lead to an inconsistent watermark value when the query contains multiple watermarks, so the policy must remain unchanged.
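
The reason spark.sql.shuffle.partitions must stay fixed can be illustrated with a small standalone sketch. It is not Spark code: the `partition_for` helper, the `user-N` keys, and the partition counts 200 and 50 are all hypothetical, and `zlib.crc32` is only a deterministic stand-in for the hash function Spark uses internally. The sketch shows that when a key's partition is computed as hash modulo the partition count, changing the count sends many keys to a partition that does not hold their previously saved state.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a state partition (illustrative stand-in, not Spark's partitioner)."""
    # crc32 is used only because it is deterministic across runs;
    # it is NOT the hash function Spark uses internally.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical grouping keys of a stateful streaming query.
keys = [f"user-{i}" for i in range(1000)]

# Partition assignment in effect when the state was written to the checkpoint.
old_layout = {k: partition_for(k, 200) for k in keys}

# Partition assignment if the partition count were changed on restart.
new_layout = {k: partition_for(k, 50) for k in keys}

# Keys whose state would now be looked up in a different partition.
moved = [k for k in keys if old_layout[k] != new_layout[k]]
print(f"{len(moved)} of {len(keys)} keys would look for state in a different partition")
```

With an unchanged partition count, every key maps to the same partition on every run, which is the property the checkpointed state relies on.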

Related Resources

Further Reading

Talks

Migration Guide

The migration guide is now archived on this page.