Skip to content

Reduce allocations when parsing files #715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 7, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -1096,6 +1096,7 @@ protected async Task HandleCommentHelpRequest(
triggerLine1b,
out helpLocation);
var result = new CommentHelpRequestResult();
IList<string> lines = null;
if (functionDefinitionAst != null)
{
var funcExtent = functionDefinitionAst.Extent;
Expand All @@ -1104,7 +1105,7 @@ protected async Task HandleCommentHelpRequest(
{
// check if the previous character is `<` because it invalidates
// the param block the follows it.
var lines = ScriptFile.GetLines(funcText).ToArray();
lines = ScriptFile.GetLines(funcText);
var relativeTriggerLine0b = triggerLine1b - funcExtent.StartLineNumber;
if (relativeTriggerLine0b > 0 && lines[relativeTriggerLine0b].IndexOf("<") > -1)
{
Expand All @@ -1122,8 +1123,12 @@ protected async Task HandleCommentHelpRequest(
requestParams.BlockComment,
true,
helpLocation));

var help = analysisResults?.FirstOrDefault()?.Correction?.Edits[0].Text;
result.Content = help == null ? null : ScriptFile.GetLines(help).ToArray();
result.Content = help != null
? (lines ?? ScriptFile.GetLines(funcText)).ToArray()
: null;

if (helpLocation != null &&
!helpLocation.Equals("before", StringComparison.OrdinalIgnoreCase))
{
Expand Down
25 changes: 22 additions & 3 deletions src/PowerShellEditorServices/Workspace/ScriptFile.cs
Original file line number Diff line number Diff line change
Expand Up @@ -197,14 +197,33 @@ public ScriptFile(
/// </summary>
/// <param name="text">Input string to be split up into lines.</param>
/// <returns>The lines in the string.</returns>
public static IEnumerable<string> GetLines(string text)
public static IList<string> GetLines(string text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be able to replace all of this with calling

text.Split(Environment.NewLine)

That will be platform agnostic - or rather leave it up to .NET Core to do the right thing.

Link:
https://docs.microsoft.com/en-us/dotnet/api/system.environment.newline?view=netcore-2.0#System_Environment_NewLine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. You'd either end up with too many empty strings, or not enough. If you're expecting the index into the list of lines to correspond to the line number in the file, it wouldn't be correct...

PS C:\> "foo`r`nbar".Split([environment]::newline).Length
3
PS C:\> "foo`r`n`r`nbar".Split([environment]::newline, [StringSplitOptions]::RemoveEmptyEntries).Length
2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not replicating that behaviour you're seeing there:

> "foo`r`nbar".Split("`r`n").Length
2
> "foo`nbar".Split([environment]::NewLine).Length  
2

I'd imagine we want to keep empty lines for fidelity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it's a core vs framework difference. Core has an overload for string, the framework does not. So on PS 5.1, it picks the (I think?) params char[] version and coerces the string to that and we get the undesired behavior.

PS C:\> $psversiontable

Name                           Value
----                           -----
PSVersion                      5.1.15063.1206
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.15063.1206
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1


PS C:\> "foo`r`nbar".Split("`r`n").Length
3
PS C:\> 'a'.split

OverloadDefinitions
-------------------
string[] Split(Params char[] separator)
string[] Split(char[] separator, int count)
string[] Split(char[] separator, System.StringSplitOptions options)
string[] Split(char[] separator, int count, System.StringSplitOptions options)
string[] Split(string[] separator, System.StringSplitOptions options)
string[] Split(string[] separator, int count, System.StringSplitOptions options)



PS C:\> "foo`r`nbar".Split(([string[]]"`r`n"), [StringSplitOptions]::None).Length
2

Core has a string version and the options has a default value (https://github.com/dotnet/coreclr/blob/0fbd855e38bc3ec269479b5f6bf561dcfd67cbb6/src/System.Private.CoreLib/shared/System/String.Manipulation.cs#L1244), so that gets picked.

PS C:\> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      6.1.0-preview.4
PSEdition                      Core
GitCommitId                    6.1.0-preview.4
OS                             Microsoft Windows 10.0.15063
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0


PS C:\>  "foo`r`nbar".Split("`r`n").Length
2
PS C:\> 'a'.split

OverloadDefinitions
-------------------
string[] Split(char separator, System.StringSplitOptions options)
string[] Split(char separator, int count, System.StringSplitOptions options)
string[] Split(Params char[] separator)
string[] Split(char[] separator, int count)
string[] Split(char[] separator, System.StringSplitOptions options)
string[] Split(char[] separator, int count, System.StringSplitOptions options)
string[] Split(string separator, System.StringSplitOptions options)
string[] Split(string separator, int count, System.StringSplitOptions options)
string[] Split(string[] separator, System.StringSplitOptions options)
string[] Split(string[] separator, int count, System.StringSplitOptions options)



PS C:\>  "foo`r`nbar".Split("`r`n".ToCharArray()).Length
3

All of that said, I think it's still broken:

PS C:\>  "foo`nbar".Split("`r`n").Length
1

There are lots of valid reasons to be dealing with Unix line-broken files on Windows (and to a lesser extent vice-versa), so I don't think relying on Environment.NewLine works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point. In that case I'm reasonably satisfied the complexity is unavoidable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringReader appears promising, Split not so much, especially on the full framework. The complicated code still edges out StringReader on execution time, but not allocations, and it's not by a lot. Let me know if you'd prefer the StringReader method and I'll update the PR. With either implementation, giving the List a hint on capacity with a crude heuristic of lines averaging 50 chars (the actual average across a few hundred thousand lines of real PS I had handy was 43) seems like a marginal win with little downside.

Code: https://gist.github.com/mattpwhite/c2ec7bcded6645d529574969f65e7d4f

Results (using a ~1300 line, ~55000 char source file)

Framework:

// * Summary *

BenchmarkDotNet=v0.11.0, OS=Windows 10.0.15063.1206 (1703/CreatorsUpdate/Redstone2)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
Frequency=3515626 Hz, Resolution=284.4444 ns, Timer=TSC
  [Host]    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.3130.0
  RyuJitX64 : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3130.0

Job=RyuJitX64  Jit=RyuJit  Platform=X64

                    Method |      Mean |     Error |    StdDev |    Gen 0 |    Gen 1 |    Gen 2 | Allocated |
-------------------------- |----------:|----------:|----------:|---------:|---------:|---------:|----------:|
                  Original | 192.61 us | 0.6213 us | 0.5811 us | 133.3008 |  66.6504 |  66.6504 | 581.12 KB |
               Complicated |  74.19 us | 0.3941 us | 0.3686 us |  33.5693 |  14.6484 |        - | 167.45 KB |
  ComplicatedGuessCapacity |  70.97 us | 0.3468 us | 0.3244 us |  32.8369 |  11.8408 |        - | 161.47 KB |
                     Split | 349.20 us | 1.2177 us | 1.1391 us | 133.3008 | 133.3008 | 133.3008 | 582.84 KB |
              StringReader |  78.63 us | 0.3658 us | 0.3422 us |  34.7900 |  12.3291 |        - | 167.49 KB |
 StringReaderGuessCapacity |  77.80 us | 0.2394 us | 0.2123 us |  33.3252 |  10.9863 |        - | 161.51 KB |

Core:

BenchmarkDotNet=v0.11.0, OS=Windows 10.0.15063.1206 (1703/CreatorsUpdate/Redstone2)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
Frequency=3515626 Hz, Resolution=284.4444 ns, Timer=TSC
.NET Core SDK=2.1.302
  [Host]    : .NET Core ? (CoreCLR 4.6.26628.05, CoreFX 4.6.26629.01), 64bit RyuJIT
  RyuJitX64 : .NET Core 2.1.2 (CoreCLR 4.6.26628.05, CoreFX 4.6.26629.01), 64bit RyuJIT

Job=RyuJitX64  Jit=RyuJit  Platform=X64

                    Method |      Mean |     Error |    StdDev |   Gen 0 |   Gen 1 | Allocated |
-------------------------- |----------:|----------:|----------:|--------:|--------:|----------:|
                  Original | 134.68 us | 0.7214 us | 0.6748 us | 62.2559 | 21.2402 | 299.61 KB |
               Complicated |  69.97 us | 0.4752 us | 0.4213 us | 34.9121 | 11.8408 | 167.45 KB |
  ComplicatedGuessCapacity |  67.13 us | 0.3257 us | 0.2887 us | 34.9121 |  8.1787 | 161.47 KB |
                     Split | 207.69 us | 1.1943 us | 1.0587 us | 31.0059 |  9.5215 | 145.49 KB |
              StringReader |  80.09 us | 0.3387 us | 0.2828 us | 34.5459 | 11.8408 | 167.48 KB |
 StringReaderGuessCapacity |  77.07 us | 0.3251 us | 0.3041 us | 33.4473 |  9.7656 |  161.5 KB |

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringReader seems like the way to go

{
if (text == null)
{
throw new ArgumentNullException(nameof(text));
}

return text.Split('\n').Select(line => line.TrimEnd('\r'));
// ReadLine returns null immediately for empty string, so special case it.
if (text.Length == 0)
{
return new List<string> {string.Empty};
}

using (var reader = new StringReader(text))
{
// 50 is a rough guess for typical average line length, this saves some list
// resizes in the common case and does not hurt meaningfully if we're wrong.
var list = new List<string>(text.Length / 50);
string line;

while ((line = reader.ReadLine()) != null)
{
list.Add(line);
}

return list;
}
}

/// <summary>
Expand Down Expand Up @@ -517,7 +536,7 @@ private void SetFileContents(string fileContents)
{
// Split the file contents into lines and trim
// any carriage returns from the strings.
this.FileLines = GetLines(fileContents).ToList();
this.FileLines = GetLines(fileContents);

// Parse the contents to get syntax tree and errors
this.ParseFileContents();
Expand Down
29 changes: 29 additions & 0 deletions test/PowerShellEditorServices.Test/Session/ScriptFileTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -296,6 +296,35 @@ public void CanGetRangeAtLineBoundaries()

Assert.Equal(expectedLines, lines);
}

[Fact]
public void CanSplitLines()
{
Assert.Equal(TestStringLines, scriptFile.FileLines);
}

[Fact]
public void CanGetSameLinesWithUnixLineBreaks()
{
var unixFile = ScriptFileChangeTests.CreateScriptFile(TestString.Replace("\r\n", "\n"));
Assert.Equal(scriptFile.FileLines, unixFile.FileLines);
}

[Fact]
public void CanGetLineForEmptyString()
{
var emptyFile = ScriptFileChangeTests.CreateScriptFile(string.Empty);
Assert.Equal(1, emptyFile.FileLines.Count);
Assert.Equal(string.Empty, emptyFile.FileLines[0]);
}

[Fact]
public void CanGetLineForSpace()
{
var spaceFile = ScriptFileChangeTests.CreateScriptFile(" ");
Assert.Equal(1, spaceFile.FileLines.Count);
Assert.Equal(" ", spaceFile.FileLines[0]);
}
}

public class ScriptFilePositionTests
Expand Down