-
Notifications
You must be signed in to change notification settings - Fork 234
Reduce allocations when parsing files #715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be able to replace all of this with calling
That will be platform agnostic - or rather leave it up to .NET Core to do the right thing.
Link:
https://docs.microsoft.com/en-us/dotnet/api/system.environment.newline?view=netcore-2.0#System_Environment_NewLine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. You'd either end up with too many empty strings, or not enough. If you're expecting the index into the list of lines to correspond to the line number in the file, it wouldn't be correct...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not replicating that behaviour you're seeing there:
I'd imagine we want to keep empty lines for fidelity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it's a core vs framework difference. Core has an overload for
string
, the framework does not. So on PS 5.1, it picks the (I think?)params char[]
version and coerces the string to that and we get the undesired behavior.Core has a
string
version and the options has a default value (https://github.com/dotnet/coreclr/blob/0fbd855e38bc3ec269479b5f6bf561dcfd67cbb6/src/System.Private.CoreLib/shared/System/String.Manipulation.cs#L1244), so that gets picked.All of that said, I think it's still broken:
There are lots of valid reasons to be dealing with Unix line-broken files on Windows (and to a lesser extent vice-versa), so I don't think relying on
Environment.NewLine
works.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good point. In that case I'm reasonably satisfied the complexity is unavoidable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about one of these solutions:
https://stackoverflow.com/questions/1547476/easiest-way-to-split-a-string-on-newlines-in-net
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringReader
appears promising,Split
not so much, especially on the full framework. The complicated code still edges out StringReader on execution time, but not allocations, and it's not by a lot. Let me know if you'd prefer the StringReader method and I'll update the PR. With either implementation, giving the List a hint on capacity with a crude heuristic of lines averaging 50 chars (the actual average across a few hundred thousand lines of real PS I had handy was 43) seems like a marginal win with little downside.Code: https://gist.github.com/mattpwhite/c2ec7bcded6645d529574969f65e7d4f
Results (using a ~1300 line, ~55000 char source file)
Framework:
Core:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringReader
seems like the way to go