Commit f5d2165
[SPARK-33440][CORE] Use current timestamp with warning log in HadoopFSDelegationTokenProvider when the issue date for token is not set up properly
### What changes were proposed in this pull request?

This PR proposes to use the current timestamp, with a warning log, when the issue date for a token is not set up properly. The next section explains the rationale in detail.

### Why are the changes needed?

Unfortunately, not every implementation respects the `issue date` in `AbstractDelegationTokenIdentifier`, which Spark relies on in its calculations. The default value of the issue date is 0L, which is far from the actual issue date. That breaks the logic for calculating the next renewal date under some circumstances, leading to a 0 interval (immediate) when rescheduling token renewal.

In HadoopFSDelegationTokenProvider, Spark calculates the token renewal interval as below:

https://github.com/apache/spark/blob/2c64b731ae6a976b0d75a95901db849b4a0e2393/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala#L123-L134

The interval is calculated as `token.renew() - identifier.getIssueDate`, which gives the correct interval assuming both `token.renew()` and `identifier.getIssueDate` produce correct values, but it goes wrong when `identifier.getIssueDate` provides 0L (the default value), like below:

```
20/10/13 06:34:19 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 1603175657000 for token S3ADelegationToken/IDBroker
20/10/13 06:34:19 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 86400048 for token HDFS_DELEGATION_TOKEN
```

Fortunately, we pick the minimum value as a safety guard (so in this case, `86400048` is picked up), but here the safety guard has an unintended bad impact.

https://github.com/apache/spark/blob/2c64b731ae6a976b0d75a95901db849b4a0e2393/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala#L58-L71

Spark takes the interval calculated above, the "minimum" of the intervals, and blindly adds it to each token's issue date to calculate the next renewal date for the token, and then picks the "minimum" value again. In the problematic case, the value would be `86400048` (86400048 + 0), which is far smaller than the current timestamp.

https://github.com/apache/spark/blob/2c64b731ae6a976b0d75a95901db849b4a0e2393/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala#L228-L234

The current timestamp is then subtracted from the next renewal date to get the interval, which is multiplied by the configured ratio to produce the final schedule interval. In the problematic case, this value goes negative.

https://github.com/apache/spark/blob/2c64b731ae6a976b0d75a95901db849b4a0e2393/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala#L180-L188

There's a safety guard that disallows negative values, but it simply yields 0, meaning "schedule immediately". This triggers the next calculation of the next renewal date, which produces the same result, hence the delegation token is updated immediately and continuously.

As we fetch the token just before the calculation happens, the actual issue date is likely only slightly earlier, so it's not that dangerous to use the current timestamp as the issue date for a token whose issue date has not been set up properly. Still, it's better not to leave the token implementation as it is, so we log a warning message to let end users consult the token implementer.
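To make the failure mode above concrete, here is a minimal, self-contained Scala sketch (illustration only, not code from this PR): the timestamps are hypothetical values mirroring the log excerpt, and the 0.75 ratio stands in for the `CREDENTIALS_RENEWAL_INTERVAL_RATIO` setting.

```scala
object RenewalCalcSketch {
  def main(args: Array[String]): Unit = {
    val now = 1602570859000L   // assumed "current" timestamp in ms (around 2020-10-13 06:34 UTC)
    val ratio = 0.75           // assumed renewal ratio

    // Renewal interval = token.renew() - identifier.getIssueDate.
    // HDFS token: issue date is set properly, so the interval is roughly one day.
    val hdfsInterval = 86400048L
    // S3A/IDBroker token: issue date is left at 0L, so the "interval" is really
    // the absolute expiration timestamp, not a duration.
    val s3aInterval = 1603175657000L

    // Spark keeps the minimum interval as a safety guard.
    val interval = math.min(hdfsInterval, s3aInterval)            // 86400048

    // Next renewal date = issue date + interval; for the broken token this is
    // 0L + 86400048, which lies far in the past relative to `now`.
    val nextRenewal = 0L + interval

    // Schedule delay = ratio * (nextRenewal - now): negative, then clamped to 0,
    // i.e. "renew immediately", which repeats on every renewal attempt.
    val delay = (ratio * (nextRenewal - now)).toLong
    println(s"interval=$interval nextRenewal=$nextRenewal delay=$delay clamped=${math.max(0L, delay)}")
  }
}
```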
### Does this PR introduce _any_ user-facing change?

Yes. End users won't encounter the tight loop of scheduled token renewal after this PR. From the end users' perspective, there's nothing they need to change.

### How was this patch tested?

Manually tested with a problematic environment.

Closes #30366 from HeartSaVioR/SPARK-33440.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
1 parent c699435 commit f5d2165

2 files changed: +27 additions, -4 deletions

core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala

Lines changed: 3 additions & 1 deletion
```diff
@@ -178,7 +178,7 @@ private[spark] class HadoopDelegationTokenManager(
 
   private def scheduleRenewal(delay: Long): Unit = {
     val _delay = math.max(0, delay)
-    logInfo(s"Scheduling renewal in ${UIUtils.formatDuration(delay)}.")
+    logInfo(s"Scheduling renewal in ${UIUtils.formatDuration(_delay)}.")
 
     val renewalTask = new Runnable() {
       override def run(): Unit = {
@@ -230,6 +230,8 @@ private[spark] class HadoopDelegationTokenManager(
     val now = System.currentTimeMillis
     val ratio = sparkConf.get(CREDENTIALS_RENEWAL_INTERVAL_RATIO)
     val delay = (ratio * (nextRenewal - now)).toLong
+    logInfo(s"Calculated delay on renewal is $delay, based on next renewal $nextRenewal " +
+      s"and the ratio $ratio, and current time $now")
     scheduleRenewal(delay)
     creds
   }
```
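As a side note on the one-character logging fix in the first hunk above: the scheduler clamps a negative delay to zero before scheduling, but the old log line formatted the raw `delay`, so it could report a misleading negative duration. A small standalone sketch of the distinction (with a trivial stand-in for `UIUtils.formatDuration`, so not the exact Spark code):

```scala
object DelayLoggingSketch {
  // Trivial stand-in for UIUtils.formatDuration, for illustration only.
  def formatDuration(ms: Long): String = s"$ms ms"

  def main(args: Array[String]): Unit = {
    val delay = -1602484458952L          // hypothetical negative delay from a broken token
    val _delay = math.max(0, delay)      // what scheduleRenewal actually uses
    println(s"before: Scheduling renewal in ${formatDuration(delay)}.")   // misleading negative value
    println(s"after:  Scheduling renewal in ${formatDuration(_delay)}.")  // matches what is scheduled
  }
}
```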

core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala

Lines changed: 24 additions & 3 deletions
```diff
@@ -63,7 +63,8 @@ private[deploy] class HadoopFSDelegationTokenProvider
       val identifier = token
         .decodeIdentifier()
         .asInstanceOf[AbstractDelegationTokenIdentifier]
-      identifier.getIssueDate + interval
+      val tokenKind = token.getKind.toString
+      getIssueDate(tokenKind, identifier) + interval
     }
     if (nextRenewalDates.isEmpty) None else Some(nextRenewalDates.min)
   }
@@ -126,13 +127,33 @@ private[deploy] class HadoopFSDelegationTokenProvider
       Try {
         val newExpiration = token.renew(hadoopConf)
         val identifier = token.decodeIdentifier().asInstanceOf[AbstractDelegationTokenIdentifier]
-        val interval = newExpiration - identifier.getIssueDate
-        logInfo(s"Renewal interval is $interval for token ${token.getKind.toString}")
+        val tokenKind = token.getKind.toString
+        val interval = newExpiration - getIssueDate(tokenKind, identifier)
+        logInfo(s"Renewal interval is $interval for token $tokenKind")
         interval
       }.toOption
     }
     if (renewIntervals.isEmpty) None else Some(renewIntervals.min)
   }
+
+  private def getIssueDate(kind: String, identifier: AbstractDelegationTokenIdentifier): Long = {
+    val now = System.currentTimeMillis()
+    val issueDate = identifier.getIssueDate
+    if (issueDate > now) {
+      logWarning(s"Token $kind has set up issue date later than current time. (provided: " +
+        s"$issueDate / current timestamp: $now) Please make sure clocks are in sync between " +
+        "machines. If the issue is not a clock mismatch, consult token implementor to check " +
+        "whether issue date is valid.")
+      issueDate
+    } else if (issueDate > 0L) {
+      issueDate
+    } else {
+      logWarning(s"Token $kind has not set up issue date properly. (provided: $issueDate) " +
+        s"Using current timestamp ($now) as issue date instead. Consult token implementor to fix " +
+        "the behavior.")
+      now
+    }
+  }
 }
 
 private[deploy] object HadoopFSDelegationTokenProvider {
```
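For illustration only (not part of the commit), a condensed standalone sketch of the three branches the new `getIssueDate` helper distinguishes and what each returns:

```scala
object IssueDateFallbackSketch {
  // Condensed version of the fallback logic above, without the warning logs.
  def fallbackIssueDate(issueDate: Long, now: Long): Long =
    if (issueDate > now) issueDate       // future date: kept, but a clock-skew warning is logged
    else if (issueDate > 0L) issueDate   // normal case: trust the token's issue date
    else now                             // unset (0L default): use the current timestamp instead

  def main(args: Array[String]): Unit = {
    val now = System.currentTimeMillis()
    println(fallbackIssueDate(now - 60000L, now))  // issued a minute ago: kept as-is
    println(fallbackIssueDate(now + 60000L, now))  // issued "in the future": kept, warned about
    println(fallbackIssueDate(0L, now))            // default 0L: replaced with `now`
  }
}
```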
