
Commit d4e32c8

[SPARK-35654][CORE] Allow ShuffleDataIO control DiskBlockManager.deleteFilesOnStop
### What changes were proposed in this pull request?

This PR aims to change `DiskBlockManager` as follows, to allow `ShuffleDataIO` to decide the behavior of shuffle file deletion.

```scala
- private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolean)
+ private[spark] class DiskBlockManager(conf: SparkConf, var deleteFilesOnStop: Boolean)
```

### Why are the changes needed?

`SparkContext`
1. creates `SparkEnv` (with `BlockManager` and its `DiskBlockManager`),
2. loads `ShuffleDataIO`, and
3. initializes the block manager:

```scala
_env = createSparkEnv(_conf, isLocal, listenerBus)
...
_shuffleDriverComponents = ShuffleDataIOUtils.loadShuffleDataIO(config).driver()
_shuffleDriverComponents.initializeApplication().asScala.foreach { case (k, v) =>
  _conf.set(ShuffleDataIOUtils.SHUFFLE_SPARK_CONF_PREFIX + k, v)
}
...
_env.blockManager.initialize(_applicationId)
...
```

`DiskBlockManager` is created first, in the `BlockManager` constructor, so `ShuffleDataIO` cannot change `deleteFilesOnStop` afterwards. By switching the parameter to a `var`, we can implement an enhanced shuffle data management feature via `ShuffleDataIO`, like #32730.

```
val diskBlockManager = {
  // Only perform cleanup if an external service is not serving our shuffle files.
  val deleteFilesOnStop =
    !externalShuffleServiceEnabled || executorId == SparkContext.DRIVER_IDENTIFIER
  new DiskBlockManager(conf, deleteFilesOnStop)
}
```

### Does this PR introduce _any_ user-facing change?

No. This is a private class.

### How was this patch tested?

N/A

Closes #32784 from dongjoon-hyun/SPARK-35654.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
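To make the new capability concrete, here is a minimal, hypothetical sketch of a driver-side plugin component that opts out of shuffle file deletion. The class and package names are invented for illustration; only `ShuffleDriverComponents`, `SparkEnv`, and the now-mutable `DiskBlockManager.deleteFilesOnStop` are existing Spark pieces, and #32730 tracks the actual feature built on this change.

```scala
// Hypothetical package; it must live under org.apache.spark so that the
// private[spark] BlockManager and DiskBlockManager members are visible.
package org.apache.spark.shuffle.example

import java.util.Collections

import org.apache.spark.SparkEnv
import org.apache.spark.shuffle.api.ShuffleDriverComponents

/**
 * Hypothetical driver-side ShuffleDataIO component. SparkContext loads
 * ShuffleDataIO after creating SparkEnv, so when initializeApplication()
 * runs, the driver's DiskBlockManager already exists; because
 * deleteFilesOnStop is now a var, the plugin can still disable cleanup.
 */
class KeepShuffleFilesDriverComponents extends ShuffleDriverComponents {

  override def initializeApplication(): java.util.Map[String, String] = {
    // Possible only after SPARK-35654 made deleteFilesOnStop mutable.
    SparkEnv.get.blockManager.diskBlockManager.deleteFilesOnStop = false
    Collections.emptyMap[String, String]()
  }

  override def cleanupApplication(): Unit = {
    // Shuffle files are intentionally left in place for external management.
  }
}
```

A complete plugin would also implement `ShuffleDataIO` itself (returning these components from `driver()`) and provide executor-side components; the sketch only shows the driver hook that the new `var` makes possible.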
1 parent 5e30666 commit d4e32c8

1 file changed (+4, -1 lines)
core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala

```diff
@@ -32,8 +32,11 @@ import org.apache.spark.util.{ShutdownHookManager, Utils}
  *
  * Block files are hashed among the directories listed in spark.local.dir (or in
  * SPARK_LOCAL_DIRS, if it's set).
+ *
+ * ShuffleDataIO also can change the behavior of deleteFilesOnStop.
  */
-private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolean) extends Logging {
+private[spark] class DiskBlockManager(conf: SparkConf, var deleteFilesOnStop: Boolean)
+  extends Logging {
 
   private[spark] val subDirsPerLocalDir = conf.get(config.DISKSTORE_SUB_DIRECTORIES)
 
```