Spark Release 3.4.4

Spark 3.4.4 is the last maintenance release containing security and correctness fixes. This release is based on the branch-3.4 maintenance branch of Spark. We strongly recommend that all 3.4 users upgrade to this stable release.

Notable changes

  • [SPARK-43242]: Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
  • [SPARK-45988]: Fix typehints to handle list GenericAlias in Python 3.11+
  • [SPARK-46535]: Fix NPE when describe extended a column without col stats
  • [SPARK-46957]: Decommission migrated shuffle files should be able to cleanup from executor
  • [SPARK-47129]: Make ResolveRelations cache connect plan properly
  • [SPARK-47172]: Add support for AES-GCM for RPC encryption
  • [SPARK-47828]: DataFrameWriterV2.overwrite fails with invalid plan
  • [SPARK-47895]: group by all should be idempotent
  • [SPARK-47897]: Fix ExpressionSet performance regression in scala 2.12
  • [SPARK-47927]: Fix nullability attribute in UDF decoder
  • [SPARK-48016]: Fix a bug in try_divide function when with decimals
  • [SPARK-48019]: Fix incorrect behavior in ColumnVector/ColumnarArray with dictionary and nulls
  • [SPARK-48037]: Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data
  • [SPARK-48081]: Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type
  • [SPARK-48105]: Fix the race condition between state store unloading and snapshotting
  • [SPARK-48128]: For BitwiseCount / bit_count expression, fix codegen syntax error for boolean type inputs
  • [SPARK-48155]: AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec
  • [SPARK-48172]: Fix escaping issues in JDBCDialects
  • [SPARK-48248]: Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
  • [SPARK-48292]: Revert SPARK-39195 Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
  • [SPARK-48484]: Fix: V2Write use the same TaskAttemptId for different task attempts
  • [SPARK-48642]: False SparkOutOfMemoryError caused by killing task on spilling
  • [SPARK-48710]: Limit NumPy version to supported range (>=1.15,<2)
  • [SPARK-48759]: Add migration doc for CREATE TABLE AS SELECT behavior change since Spark 3.4
  • [SPARK-48791]: Fix perf regression caused by the accumulators registration overhead using CopyOnWriteArrayList
  • [SPARK-48930]: Redact awsAccessKeyId by including accesskey pattern
  • [SPARK-48934]: Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
  • [SPARK-48965]: Use the correct schema in Dataset#toJSON
  • [SPARK-48991]: Move path initialization into try-catch block in FileStreamSink.hasMetadata
  • [SPARK-49000]: Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
  • [SPARK-49005]: Use 17-jammy tag instead of 17-jre to prevent Python 3.12
  • [SPARK-49038]: SQLMetric should report the raw value in the accumulator update event
  • [SPARK-49039]: Reset checkbox when executor metrics are loaded in the Stages tab
  • [SPARK-49094]: Fix ignoreCorruptFiles non-functioning for hive orc impl with mergeSchema off
  • [SPARK-49176]: Fix spark.ui.custom.executor.log.url docs by adding K8s
  • [SPARK-49179]: Fix v2 multi bucketed inner joins throw AssertionError
  • [SPARK-49182]: Stop publish site/docs/{version}/api/python/_sources dir
  • [SPARK-49193]: Improve the performance of RowSetUtils.toColumnBasedSet
  • [SPARK-49197]: Redact Spark Command output in launcher module
  • [SPARK-49261]: Don’t replace literals in aggregate expressions with group-by expressions
  • [SPARK-49352]: Avoid redundant array transform for identical expression
  • [SPARK-49385]: Fix getReusablePVCs to use podCreationTimeout instead of podAllocationDelay
  • [SPARK-49408]: Use IndexedSeq in ProjectingInternalRow
  • [SPARK-49595]: Fix DataFrame.unpivot/melt in Spark Connect Scala Client
  • [SPARK-49628]: ConstantFolding should copy stateful expression before evaluating
  • [SPARK-49750]: Mention delegation token support in K8s mode
  • [SPARK-49760]: Correct handling of SPARK_USER env variable override in app master
  • [SPARK-49804]: Fix to use the exit code of executor container always
  • [SPARK-49836]: Fix possibly broken query when window is provided to window/session_window fn
  • [SPARK-49843]: Fix change comment on char/varchar columns
  • [SPARK-49959]: Fix ColumnarArray.copy() to read nulls from the correct offset
  • [SPARK-50021]: Fix ApplicationPage to hide App UI links when UI is disabled
  • [SPARK-50022]: Fix MasterPage to hide App UI links when UI is disabled
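
As an illustration of the NumPy constraint in SPARK-48710, the following is a minimal sketch of a check against the range stated above (>=1.15,<2). The helper name numpy_version_supported is hypothetical and is not part of PySpark; it only compares the major and minor components of a version string.

```python
def numpy_version_supported(version: str) -> bool:
    """Return True if `version` is within the range >=1.15,<2.

    Hypothetical helper for illustration only; not a PySpark API.
    Assumes a version string of the form "MAJOR.MINOR[.PATCH...]".
    """
    parts = version.split(".")
    # Only the numeric major and minor components are compared.
    major, minor = int(parts[0]), int(parts[1])
    return (1, 15) <= (major, minor) < (2, 0)


if __name__ == "__main__":
    for candidate in ("1.14.6", "1.15.0", "1.26.4", "2.0.0"):
        print(candidate, numpy_version_supported(candidate))
```

In practice the installed version could be passed in as numpy.__version__ before starting a PySpark job on this release.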

Dependency Changes

Although this is a maintenance release, we still upgraded some dependencies in this release.

You can consult JIRA for the detailed changes.

We would like to acknowledge all community members for contributing patches to this release.

