Skip to content

Replace bad StringReader usage with String.Split() #784

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 6, 2018

Conversation

rjmholt
Copy link
Contributor

@rjmholt rjmholt commented Oct 23, 2018

Fixes #781.

See also: https://github.com/dotnet/corefx/issues/32990.

StringReader.ReadLine() likes to ditch the last line in a file if it's a newline.

The simplest approach is to replace with a String.Split(), which I've done in this PR.

This would probably help up with all the off-by-one errors we keep seeing.

If we don't think String.Split() is efficient enough (or want to deduplicate the creation of the array and feeding it into a list), I would be very happy to rewrite this manually.


return list;
}
return new List<string>(text.Split(s_newlines, StringSplitOptions.None));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I take it this splits by \r\n if it finds \r\n and then splits by \n if it just finds \n? Just want to make sure that it doesn't get split by \n if it finds a \r\n

Copy link
Contributor Author

@rjmholt rjmholt Oct 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you raise an excellent point! I need to reorder the array:

To avoid ambiguous results when strings in separator have characters in common, the Split operation proceeds from the beginning to the end of the value of the instance, and matches the first element in separator that is equal to a delimiter in the instance. The order in which substrings are encountered in the instance takes precedence over the order of elements in separator.

Copy link
Member

@TylerLeonhardt TylerLeonhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 after you get CI to pass

Copy link
Collaborator

@SeeminglyScience SeeminglyScience left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... do we really need to store each line every single edit? Always struck me as odd but I never looked into why we were doing it. It'd be great if we could maybe just store the offsets until something actually called GetLines().

I don't know how efficient the full framework implementation of string.Split is but I'm sure the net core is more than good enough. It's probably not worth over engineering this unless we can do away with allocating a string for every line.

Maybe take a peek and see how feasible it would be to change that, otherwise LGTM.

@rjmholt
Copy link
Contributor Author

rjmholt commented Oct 24, 2018

@SeeminglyScience this change should only affect ScriptFile creation right? Like reading in a file for the first time:

/// <summary>
/// Creates a new ScriptFile instance by reading file contents from
/// the given TextReader.
/// </summary>
/// <param name="filePath">The path at which the script file resides.</param>
/// <param name="clientFilePath">The path which the client uses to identify the file.</param>
/// <param name="textReader">The TextReader to use for reading the file's contents.</param>
/// <param name="powerShellVersion">The version of PowerShell for which the script is being parsed.</param>
public ScriptFile(
string filePath,
string clientFilePath,
TextReader textReader,
Version powerShellVersion)
{
this.FilePath = filePath;
this.ClientFilePath = clientFilePath;
this.IsAnalysisEnabled = true;
this.IsInMemory = Workspace.IsPathInMemory(filePath);
this.powerShellVersion = powerShellVersion;
this.SetFileContents(textReader.ReadToEnd());
}

The actual lines are managed by that ApplyChange() method, which applies on a line-by-line basis (so the expense of the edit in a normal file does not depend on the file size, just the edit size):

/// <summary>
/// Applies the provided FileChange to the file's contents
/// </summary>
/// <param name="fileChange">The FileChange to apply to the file's contents.</param>
public void ApplyChange(FileChange fileChange)
{
// Break up the change lines
string[] changeLines = fileChange.InsertString.Split('\n');
if (fileChange.IsReload)
{
this.FileLines.Clear();
foreach (var changeLine in changeLines)
{
this.FileLines.Add(changeLine);
}
}
else
{
// VSCode sometimes likes to give the change start line as (FileLines.Count + 1).
// This used to crash EditorServices, but we now treat it as an append.
// See https://github.com/PowerShell/vscode-powershell/issues/1283
if (fileChange.Line == this.FileLines.Count + 1)
{
foreach (string addedLine in changeLines)
{
string finalLine = addedLine.TrimEnd('\r');
this.FileLines.Add(finalLine);
}
}
// Similarly, when lines are deleted from the end of the file,
// VSCode likes to give the end line as (FileLines.Count + 1).
else if (fileChange.EndLine == this.FileLines.Count + 1 && String.Empty.Equals(fileChange.InsertString))
{
int lineIndex = fileChange.Line - 1;
this.FileLines.RemoveRange(lineIndex, this.FileLines.Count - lineIndex);
}
// Otherwise, the change needs to go between existing content
else
{
this.ValidatePosition(fileChange.Line, fileChange.Offset);
this.ValidatePosition(fileChange.EndLine, fileChange.EndOffset);
// Get the first fragment of the first line
string firstLineFragment =
this.FileLines[fileChange.Line - 1]
.Substring(0, fileChange.Offset - 1);
// Get the last fragment of the last line
string endLine = this.FileLines[fileChange.EndLine - 1];
string lastLineFragment =
endLine.Substring(
fileChange.EndOffset - 1,
(this.FileLines[fileChange.EndLine - 1].Length - fileChange.EndOffset) + 1);
// Remove the old lines
for (int i = 0; i <= fileChange.EndLine - fileChange.Line; i++)
{
this.FileLines.RemoveAt(fileChange.Line - 1);
}
// Build and insert the new lines
int currentLineNumber = fileChange.Line;
for (int changeIndex = 0; changeIndex < changeLines.Length; changeIndex++)
{
// Since we split the lines above using \n, make sure to
// trim the ending \r's off as well.
string finalLine = changeLines[changeIndex].TrimEnd('\r');
// Should we add first or last line fragments?
if (changeIndex == 0)
{
// Append the first line fragment
finalLine = firstLineFragment + finalLine;
}
if (changeIndex == changeLines.Length - 1)
{
// Append the last line fragment
finalLine = finalLine + lastLineFragment;
}
this.FileLines.Insert(currentLineNumber - 1, finalLine);
currentLineNumber++;
}
}
}
// Parse the script again to be up-to-date
this.ParseFileContents();
}

This list of lines data structure is similar to how vi handles text.

I've actually been thinking about exploring other text-editor data structures to possibly optimise the efficiency of storing and editing our version of files. A few that I've heard of and might be worth looking at are:

@SeeminglyScience
Copy link
Collaborator

@rjmholt Yeah nevermind... I forgot the change event only gives us partials. Definitely not worth messing with in this PR.

LGTM!

Copy link
Member

@TylerLeonhardt TylerLeonhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@rkeithhill rkeithhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rjmholt rjmholt merged commit 5afa456 into PowerShell:master Nov 6, 2018
@rjmholt rjmholt deleted the remove-stringreader-usage branch December 12, 2018 05:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants