-
Notifications
You must be signed in to change notification settings - Fork 125
Redo HTTP cookie parsing #510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hmm, seems like some older versions of Swift won't import the function using the header shim. We'll need an actual C function to forward to |
116faed
to
d790ad4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a great idea, thanks! I've left some notes in the diff.
Also, the API breakage script doesn't seem to be able to build the module (it's failing to build/find the shims library, from what I can gather). I took a look at it, but it uses a lot of swift commands I'm not familiar with (like |
Just an FYI: I'm a bit swamped at the moment, but I'll push an update to address the feedback later this week |
450ac3e
to
36c6010
Compare
OK apologies for taking so long, and I'm aware people will likely be on holiday so maybe not able to review it for a while. I think I've addressed all the feedback: C library has been renamed, cookie parsing now uses the UTF-8 view, the Foundation Date's setter has been restored, etc. I've also added a lot more tests for invalid date strings. The test failures on 5.5/nightly appear to be related to async/Sendable stuff, so nothing to do with this patch. |
d4d7188
to
68c51ea
Compare
return nil | ||
} | ||
let timestampString: String | ||
if timestampUTF8.reversed().starts(with: "TMG".utf8) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there is no general "hasSuffix" function, so this is the best I could do without adding more one-off helper functions 😅
I thought about doing .reversed().starts(with: "GMT".utf8.reversed())
, but the double-reversing could have a performance/size impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I think a one-off helper function will help with clarity here. 😉
if !hasPrefix(dquote) || !hasSuffix(dquote) { | ||
return self | ||
fileprivate init(utf8Slice: String.UTF8View.SubSequence, in base: String) { | ||
self.init(base[utf8Slice.startIndex..<utf8Slice.endIndex]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it necessary for us to slice the base? Isn't utf8Slice
already the subsequence we're passing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
String.init?(String.UTF8View.Subsequence)
can fail if the indexes are not aligned to character boundaries. We don't expect that to happen, but the SSWG guidelines also say to avoid force-unwrapping wherever possible. -
String.init(Substring)
(used here) can not fail. If the indexes are not aligned to character boundaries, it will adjust them. It's... not unreasonable fallback behaviour, and means we can avoid a bunch of optionals that won't be ever be nil.
In terms of performance, they both allocate independent String values, so are both O(n). I would suppose the String.init(Substring)
initializer can be slightly faster, as by widening the bounds to aligned character boundaries, it may be able to skip UTF-8 validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a doc-comment, explaining how this is different from String.init?(String.UTF8View.Subsequence)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd rather have the optionals, tbh. The SSWG guidelines are understandable, but we're asserting that these are ASCII strings, and so there can never be a case where we have alignment issues. However, I'd like to hear from @fabianfett (SSWG member with a lot of interest in AHC) and @tomerd (SSWG member) for some more guidance here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, technically we're only asserting that our processing (trimming, etc) operates on whole scalars, which is fine since we only do ASCII-level processing. The strings could contain Unicode and there still wouldn't be any possibility of non-aligned indexes.
That being said, this automatic alignment happens all the time - you've only noticed it here because I wrote it in a weird way, but it happens whenever you create a Substring
from a Substring.UTF8View
. I'm uploading a patch which makes it a little less ugly, but there isn't a great way to avoid it completely. The options are:
-
String?(Substring.UTF8View)
, meaning allocating a String, and dealing with the optionals somehow. And you've been consistent about wanting to avoid allocations (and I agree with that sentiment), and the optionals are gross, so I assume this is undesirable. -
Substring(Substring.UTF8View)
, which does not allocate but performs automatic alignment.
I don't mind either way, but those are the trade-offs.
case ("expires", .some(let value))?: | ||
self.expires_timestamp = parseCookieTime(value, in: header) | ||
case ("max-age", .some(let value))?: | ||
self.maxAge = Int(String(utf8Slice: value, in: header)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid reifying the String
here? We can construct Int
from an arbitrary StringProtocol
, which will save us the conversion cost.
guard !trimmedName.isEmpty else { | ||
return nil | ||
} | ||
return (String(decoding: trimmedName, as: UTF8.self).lowercased(), trimmedValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do a decoding
flow here? I think we can just construct the String
directly from the Substring.UTF8View
. Additionally, can we apply lowercased
on the Substring.UTF8View
ourselves? As we believe the text is entirely ASCII it's a simple bitwise operation, while lowercasing a Unicode string is complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, nothing checks that the string is ASCII, and that's the sort of optimisation which I would be very disappointed if the standard library didn't do by itself (especially since it has internal flags which means it already knows whether a String is ASCII).
But adding ASCII checks is only part of it - the set of allowed bytes depends on which part of the cookie string you're talking about (RFC-6265 - note that cookie-octet
is not the same as path-value
or extension-av
, and is actually a restricted subset of ASCII which excludes control characters and certain delimiters).
It would be useful to add those checks at some point, and the move to parsing at the UTF-8 level makes it easier to add that as a follow-up. But those should be accompanied by much more comprehensive tests which would bloat this PR, IMO, and distract from its focus which is timestamp parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The standard library does perform it, though it still has to alloc a new string to do it. C'est la vie. For now I'm happy to delay on this.
return nil | ||
} | ||
let timestampString: String | ||
if timestampUTF8.reversed().starts(with: "TMG".utf8) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I think a one-off helper function will help with clarity here. 😉
let timestampString: String | ||
if timestampUTF8.reversed().starts(with: "TMG".utf8) { | ||
let timezoneStart = timestampUTF8.index(timestampUTF8.endIndex, offsetBy: -3) | ||
let trimmedTimestampUTF8 = timestampUTF8[..<timezoneStart].trimmingASCIISpaces() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than doing index(_:offsetBy:)
here, just do dropLast(3)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need the index, and this function already includes variables called timestampUTF8
, trimmedTimestampUTF8
, timestampString
, so storing a single index called timezoneStart
seems most easily understandable IMO.
And we can't do something like:
var trimmedTimestampUTF8 = timestampUTF8.dropLast(3)
trimmedTimestampUTF8 = trimmedTimestampUTF8.trimmingASCIISpaces()
...because the point is we need to check that trimmedTimestampUTF8.endIndex != timezoneStart
(i.e. there was at least one space between the timezone and rest of the timestamp).
@Lukasa , I've made the following changes:
The |
OK, 2 more commits:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really minor notes here, this looks fantastic. Happy to take your feedback here, take whichever of the notes you feel good about, then let me know and I'll hit the big green button.
/// - maxAge: The cookie's age in seconds, defaults to nil. | ||
/// - httpOnly: Whether this cookie should be used by HTTP servers only, defaults to false. | ||
/// - secure: Whether this cookie should only be sent using secure channels, defaults to false. | ||
public init(name: String, value: String, path: String = "/", domain: String? = nil, expires: Date? = nil, maxAge: Int? = nil, httpOnly: Bool = false, secure: Bool = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is public API, should we make some room for ourselves by making this failable now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making it failable would be an API-breaking change (semver-major) though, right?
It would be good to do, but I'd rather land this and check that out as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I lost track of the fact that this is an existing init
. We'd need to deprecate-and-replace it. Happy to delay that to a follow-up.
for component in components { | ||
switch component.parseCookieComponent() { | ||
case ("path", let value)?: | ||
// Unlike other values, unspecified and empty paths reset to the default path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we cite comments like this? I love to see them and I'm really pleased this one is here, but it makes my life much easier as a reviewer (and future reader) if I can rapidly follow the citation (in this case, RFC 6265 § 5.2.4).
parsedPath = value | ||
case ("domain", let value)?: | ||
guard var value = value, !value.isEmpty else { | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really mild nit and you can ignore it if you feel strongly, but I kinda hate break
in a switch
nested in a loop. It takes me a second or two to figure out to what it applies. Can we use continue
here instead?
} | ||
self.expires_timestamp = timestamp | ||
case ("max-age", let value)?: | ||
guard let value = value, let age = Int(Substring(value)) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fun task for the future: let's just write our own Int.init(_ bytes: Substring.UTF8View)
initializer. It doesn't really matter, but it'd be fun and very slightly faster. Let's not do it now though, just planting seeds to keep you engaged in the future 😉.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's not too difficult. I already did a similar thing for WebURL, because the standard library's version only accepts StringProtocol
rather than arbitrary buffers of bytes: https://github.com/karwa/swift-url/blob/1f1263f94e25eb43d8688dc668de6638e092f748/Sources/WebURL/Util/ASCII.swift#L502
But yeah, it can be a follow-up. I'll have a patch up which addresses the nits in a little bit.
@Lukasa okay, addressed 2 nits:
As for making the memberwise public initialiser failable, I'm not sure how API-breaking/semver-major changes are handled (leave the PR sitting around until the next release is scheduled? conditional compilation?), but it could be done by a follow-up PR. |
This PR does a couple of things:
Use
strptime
for time parsing.This is what Apple's documentation recommends for fixed, locale-independent formats. Unfortunately, Glibc needs us to define the
_GNU_SOURCE
flag, which requires the use of a C shim module.Besides URL (which is debatable as to how "required" it is - you can make requests without URLs), this is the only "required" functionality which depends on Foundation.
Rewrite cookie parsing
The previous implementation had more allocations and copies than were necessary, and some really subtle uses of Foundation APIs such as
.components(separatedBy: ";")
. I've replaced it with something that tries to keep things as Substrings as much as possible and uses only the standard library.Move Foundation APIs on HTTPClient.Cookie to a different file
It turns out, there are really not many places where AsyncHTTPClient uses Foundation types in public APIs. There are a couple of initializers on Request which take a URL, Cookie has this Date, and Body has a convenience for working with Data. But that's it. By putting these APIs in a separate file, it will be easier to make them opt-out for no-foundation builds.