Skip to content

Parity: URLCache #2341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Parity: URLCache #2341

wants to merge 1 commit into from

Conversation

millenomi
Copy link
Contributor

This strictly implements the class. We need a separate adoption patch to hook it up to _HTTPProtocol, URLSession and URLSessionConfiguration. It has on-disk and in-memory caching. This also fixes a couple issues with archiving.

Implement it all.
@millenomi
Copy link
Contributor Author

cc @drodriguez @karthikkeyan

@millenomi
Copy link
Contributor Author

@swift-ci please test linux

@millenomi
Copy link
Contributor Author

I'm strongly considering _FTPProtocol not employing caching.

@millenomi
Copy link
Contributor Author

I mean obviously _HTTPURLProtocol/_FTPURLProtocol, whoops.

Copy link
Contributor

@karthikkeyan karthikkeyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice implementation @millenomi I really like it.

The only improvement I would want to suggest is to reduces the number of loops in evictFrom* methods. Overall the methods time is O(n), but I would try to reduce number of loops if possible; that can be done in subsequent PRs, or I will try when I get some spare time.


private func identifier(for request: URLRequest) -> String? {
guard let url = request.url?.absoluteString else { return nil }
return Data(url.utf8).base64EncodedString()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good use of base64EncodedString(), I totally forgot about the existence of this API when I was implementing.

Just one thing about this implementation, what if the same URL is used for different HTTP-Methods, doesn't this implementation produce the same identifier for different URLRequest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this identifier should not be returned for many cases of URLRequest. Some methods should not be cached (PUT, POST?), some HTTP headers might instruct not to be cached. I also think that fragments in an URL should not be taken into account while caching, and I'm not sure about query parameters.

I also don't like it is unbounded, and an URL can be up to 4KB in size or something like that, but file names cannot be that long in some systems.

It is a simple start, but I would not mind a TODO here.

@karthikkeyan
Copy link
Contributor

@millenomi Something like this,

func evictFromMemoryCacheAssumingLockHeld(maximumSize: Int) {
    var totalSize = 0
    var orderedEntryProperties: [(cost: Int, index: Int)] = []
    for (index, identifier) in inMemoryCacheOrder.enumerated() {
        let cost = inMemoryCacheContents[identifier]!.cost
        orderedEntryProperties.append((cost, index))
        totalSize += cost
    }
    
    guard totalSize > maximumSize else { return }
    
    var i = 0
    for entryProperty in orderedEntryProperties {
        inMemoryCacheContents.removeValue(forKey: inMemoryCacheOrder[entryProperty.index])
        totalSize -= entryProperty.cost
        if totalSize < maximumSize {
            break
        }
        
        i += 1
    }
    inMemoryCacheOrder = inMemoryCacheOrder[(i + 1)..<inMemoryCacheOrder.count]
}

This way we can reduce the 6 different loops down to just 2. I am not sure if it is worth doing, but what you think about the above approach?

Copy link
Contributor

@drodriguez drodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that there's no locking around the disk operations. There's moments that the disk files are removed and created, and I don't think some of those operations will behave as expected when several threads are trying to store/recover cached contents for the same key.

It is also not ideal that the disk operations block the caller. But once this is in, I can propose a lazy implementation for the disk part, and check that it works correctly.

Thanks for the work!

}

let nsmimetype = aDecoder.decodeObject(of: NSString.self, forKey: "NS.mimeType")
self.mimeType = nsmimetype as String?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This optional member uses one code pattern (directly assigning nil), while the two below use another code pattern (if let). As far as I see there's no difference in results, but using two different patterns might indicate intention. I would recommend using the same pattern in all three.

try FileManager.default.removeItem(at: writableTestDirectoryURL)
try FileManager.default.createDirectory(at: writableTestDirectoryURL, withIntermediateDirectories: true)

XCTAssertNil(cache.cachedResponse(for: request))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn’t it be better to recreate a second cache in the same directory and query that new cache, forcing to use the disk, since the memory of the first cache is gone?

super.setUp()

let pid = ProcessInfo.processInfo.processIdentifier
writableTestDirectoryURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent("org.swift.TestFoundation.TestURLCache.\(pid)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processIdentifier will not change from test to test. It might be better to use ProcessInfo.globallyUniqueString instead, to avoid reusing the same directory again and again.


inMemoryCacheLock.performLocked {
if inMemoryCacheContents[identifier] != nil {
inMemoryCacheOrder.removeAll(where: { $0 == identifier })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there should be only one, maybe avoid the removeAll, and because we know it should exist if it is present in the dictionary, we can use forced unwrap.

inMemoryCacheOrder.remove(at: inMemoryCacheOrder.firstIndex(where: { $0 == identifier })!)

inMemoryCacheOrder = []
}

evictFromDiskCache(maximumSize: 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if removing the directory and recreating it would be faster, instead of enumerating and deleting each file one by one.

if entry.value.date > date {
identifiersToRemove.insert(entry.key)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let identifiersToRemove = inMemoryCacheContents.values.filter { $0.date > date }.map { $0.identifier }

for toRemove in identifiersToRemove {
inMemoryCacheContents.removeValue(forKey: toRemove)
}
inMemoryCacheOrder.removeAll { identifiersToRemove.contains($0) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be done iterating the array, and using the dictionary for lookups, instead of ending doing O(n*m) in this final step.

inMemoryCacheOrder.removeAll { identifier in
  if inMemoryCacheContents[identifier]!.date > date {
    inMemoryCacheContents.removeValue(forKey: identifier)
    return true
  } else {
    return false
  }
}

This way, the code above is unnecessary.


for url in urlsToRemove {
try? FileManager.default.removeItem(at: url)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the implementation of enumerateDiskEntries, since the contentsOfDirectory(at:…) are generated before invoking the block, I think you can merge both loops into one and forget about urlsToRemove.

enumerateDiskEntries { entry, _ in
  if entry.date > data {
    try? FileManager.default.removeItem(at: url)
  }
}

@millenomi
Copy link
Contributor Author

Reviving this.

@millenomi
Copy link
Contributor Author

Actually: I'm going to close this, and post a consolidated PR that includes the HTTP correctness fixes discussed above (but not your suggestions yet, @drodriguez — I'm going to go through them.)

@millenomi
Copy link
Contributor Author

Moved to #2401.

@millenomi millenomi closed this Jul 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants