Skip to content
This repository was archived by the owner on Apr 12, 2024. It is now read-only.

Why does Angular need its own angular.uppercase and lowercase methods? #11387

Closed
thorn0 opened this issue Mar 21, 2015 · 16 comments
Closed

Why does Angular need its own angular.uppercase and lowercase methods? #11387

thorn0 opened this issue Mar 21, 2015 · 16 comments

Comments

@thorn0
Copy link
Contributor

thorn0 commented Mar 21, 2015

What's wrong with the standard toUpperCase and toLowerCase methods?
Looks like there was some issue with the Turkish locale, but does it still exist in the browsers supported by Angular? I asked about it on StackOverflow, but didn't get an answer.

@realityking
Copy link
Contributor

The reason is documented in a source comment:

angular.js/src/Angular.js

Lines 161 to 163 in e5d1d65

// String#toLowerCase and String#toUpperCase don't produce correct results in browsers with Turkish
// locale, for this reason we need to detect this case and redefine lowercase/uppercase methods
// with correct but slower alternatives.

It's also documented on MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toLocaleLowerCase

Lastly, I think the StackOverflow answers were pretty accurate.

@thorn0
Copy link
Contributor Author

thorn0 commented Mar 21, 2015

Neither the comment nor the SO answers are informative.
The question is

In which exactly JS engines are toLowerCase() & toUpperCase() locale-sensitive?"

How do those answers answer it? How does the source comment answer it? Moreover, the comment and the answers contradict each other. In theory, toLowerCase() & toUpperCase() shouldn't be locale-sensitive. That's what one of the answers says. And your MDN link says the same thing. But the comment states something very opposite:

String#toLowerCase and String#toUpperCase don't produce correct results in browsers with Turkish locale

Again, in which exactly browsers does this happen?

@wesleycho
Copy link
Contributor

Searching on Google, this seems like a general problem due to the Turkish language itself - this article seems to explain it: http://www.i18nguy.com/unicode/turkish-i18n.html

If I had to guess, this is a problem with every browser.

@thorn0
Copy link
Contributor Author

thorn0 commented Mar 21, 2015

The general problem exists, that's true. But who said that it (still) exists also in JavaScript? Who faced it during say last 10 years?

@wesleycho
Copy link
Contributor

I doubt it was pulled from thin air - that is not how the Angular team functions.

@thorn0
Copy link
Contributor Author

thorn0 commented Mar 21, 2015

The hacks and workarounds for IE8 are being removed from the code of Angular now. But this thing looks like a workaround for a bug in some even older browser.

@pkozlowski-opensource
Copy link
Member

@thorn0 why not testing on your end (you seem to be from Turkey) and sending a PR to remove those work-arounds if no longer necessary? I'm pretty sure that everyone would be more than happy to remove unnecessary code.

@thorn0
Copy link
Contributor Author

thorn0 commented Mar 21, 2015

I wanted to know why this code is there. If the issue still exists (if ever existed) in some browser, it'd be nice to know about it. I'm going to check it in different browsers, but of course I don't have access to all the needed OS+device+browser combinations.

@pkozlowski-opensource
Copy link
Member

@thorn0 AFAIK Turkish locale was the only / main reason. So if you can confirm that this bug doesn't affect users of modern browsers and submit a PR to see if all the tests are passing on CI this could potentially be merging. But as you are saying, no one will be able to re-test this on all the possible devices in the wild, so we might potentially introduce a breaking change here....

@thorn0
Copy link
Contributor Author

thorn0 commented Mar 21, 2015

BTW, another library containing code like that is Google Closure Library. See these lines. Supposedly, it came to Angular from there.

@lgalfaso
Copy link
Contributor

I think this can be split into two issues

  • Why does Angular need an implementation of toUpperCase and toLowerCase
  • Why does Angular needs to have a public method that exposes this

The former point should be easy to decide if someone can verify that this is no longer an issue with the supported browsers
The later would be hard to remove, but I would be ok to deprecate it (even if it still works)

@ryanhart2
Copy link
Contributor

In case it helps, the code was added to the angular.js file by @mhevery in Oct 2010 as part of this commit "create HTML sanitizer to allow inclusion of untrusted HTML in safe manner".

Everything that I could find on the topic only mentioned this being a problem in Java. The ECMAScript standards (3 and 5) state that the toUpperCase and toLowerCase functions are not locale sensitive and the toLocaleUpperCase and toLocaleLowerCase functions are locale sensitive.

So it doesn't seem to be required, but I don't know how to test on multiple browsers in the Turkish locale.

@thorn0
Copy link
Contributor Author

thorn0 commented Oct 24, 2015

I think the right thing to do for now would be just undocumenting angular.uppercase and angular.lowercase without removing them.

@mhevery
Copy link
Contributor

mhevery commented Mar 21, 2017

The basic issue is that when malicious code writes SCRIPT we need to detect it and take it out. The way we do it is to do toLowerCase on it, but that will produce scrıpt (notice no dot on ı) in some locales and then 'scrıpt' === 'script' fails which means that malicious code gets by the sanitizer.

If you can prove that in JS "I".toLowerCase() === "i" true on all locals, than the code is safe to remove. This code was added on request of google security team.

@petebacondarwin
Copy link
Contributor

I notice that there are various places in the sanitizer that "don't" use our custom lowercase. E.g. https://github.com/angular/angular.js/blob/master/src/ngSanitize/sanitize.js#L378
Does this need to be fixed?

@thorn0
Copy link
Contributor Author

thorn0 commented Mar 21, 2017

As far as I can tell, AngularJS and the Closure Library are the only libraries that include this workaround, and nowhere else on the Internet is it mentioned that this problem ever existed in the JS world, only in Java.

mgol added a commit to mgol/angular.js that referenced this issue Apr 5, 2017
The `manualLowercase` & `manualUppercase` functions were inspired by Google Caja
code. Caja is written in Java, though, where problems with `toLowerCase`
working differently in Turkish locale are well known[1]. In JavaScript
`String#toLowerCase` is defined in the ECMAScript spec and all implementations
are required to lowercase I in the same way, regardless of the current locale.
Differences may (and do) happen only in `String#toLocaleLowerCase`.

Other libraries doing string normalization, like jQuery or DOMPurify don't
apply special lowercasing logic in a Turkish environment.

Therefore, the `manualLowercase` & `manualUppercase` logic is dead code in
AngularJS and can be removed.

Also, the `manualLowercase` & `manualUppercase` functions are incomplete; they
only lowercase ASCII characters which is different to native
`String#toLowerCase`. Since those functions are used in many places in the
library, they would break a lot of code. For example, the lowercase filter would
not lowercase Ω to ω but leave it as Ω.

[1] https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/

Ref angular#11387
mgol added a commit to mgol/angular.js that referenced this issue Apr 12, 2017
The `manualLowercase` & `manualUppercase` functions were inspired by Google Caja
code. Caja is written in Java, though, where problems with `toLowerCase`
working differently in Turkish locale are well known[1]. In JavaScript
`String#toLowerCase` is defined in the ECMAScript spec and all implementations
are required to lowercase I in the same way, regardless of the current locale.
Differences may (and do) happen only in `String#toLocaleLowerCase`.

Other libraries doing string normalization, like jQuery or DOMPurify don't
apply special lowercasing logic in a Turkish environment.

Therefore, the `manualLowercase` & `manualUppercase` logic is dead code in
AngularJS and can be removed.

Also, the `manualLowercase` & `manualUppercase` functions are incomplete; they
only lowercase ASCII characters which is different to native
`String#toLowerCase`. Since those functions are used in many places in the
library, they would break a lot of code. For example, the lowercase filter would
not lowercase Ω to ω but leave it as Ω.

[1] https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/

Ref angular#11387
mgol added a commit to mgol/angular.js that referenced this issue May 24, 2017
The `manualLowercase` & `manualUppercase` functions were inspired by Google Caja
code. Caja is written in Java, though, where problems with `toLowerCase`
working differently in Turkish locale are well known[1]. In JavaScript
`String#toLowerCase` is defined in the ECMAScript spec and all implementations
are required to lowercase I in the same way, regardless of the current locale.
Differences may (and do) happen only in `String#toLocaleLowerCase`.

Other libraries doing string normalization, like jQuery or DOMPurify don't
apply special lowercasing logic in a Turkish environment.

Therefore, the `manualLowercase` & `manualUppercase` logic is dead code in
AngularJS and can be removed.

Also, the `manualLowercase` & `manualUppercase` functions are incomplete; they
only lowercase ASCII characters which is different to native
`String#toLowerCase`. Since those functions are used in many places in the
library, they would break a lot of code. For example, the lowercase filter would
not lowercase Ω to ω but leave it as Ω.

[1] https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/

Ref angular#11387
mgol added a commit to mgol/angular.js that referenced this issue Jun 5, 2017
The `manualLowercase` & `manualUppercase` functions were inspired by Google Caja
code. Caja is written in Java, though, where problems with `toLowerCase`
working differently in Turkish locale are well known[1]. In JavaScript
`String#toLowerCase` is defined in the ECMAScript spec and all implementations
are required to lowercase I in the same way, regardless of the current locale.
Differences may (and do) happen only in `String#toLocaleLowerCase`.

Other libraries doing string normalization, like jQuery or DOMPurify don't
apply special lowercasing logic in a Turkish environment.

Therefore, the `manualLowercase` & `manualUppercase` logic is dead code in
AngularJS and can be removed.

Also, the `manualLowercase` & `manualUppercase` functions are incomplete; they
only lowercase ASCII characters which is different to native
`String#toLowerCase`. Since those functions are used in many places in the
library, they would break a lot of code. For example, the lowercase filter would
not lowercase Ω to ω but leave it as Ω.

[1] https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/

Ref angular#11387
mgol added a commit to mgol/angular.js that referenced this issue Nov 8, 2017
The `manualLowercase` & `manualUppercase` functions were inspired by Google Caja
code. Caja is written in Java, though, where problems with `toLowerCase`
working differently in Turkish locale are well known[1]. In JavaScript
`String#toLowerCase` is defined in the ECMAScript spec and all implementations
are required to lowercase I in the same way, regardless of the current locale.
Differences may (and do) happen only in `String#toLocaleLowerCase`.

Other libraries doing string normalization, like jQuery or DOMPurify don't
apply special lowercasing logic in a Turkish environment.

Therefore, the `manualLowercase` & `manualUppercase` logic is dead code in
AngularJS and can be removed.

Also, the `manualLowercase` & `manualUppercase` functions are incomplete; they
only lowercase ASCII characters which is different to native
`String#toLowerCase`. Since those functions are used in many places in the
library, they would break a lot of code. For example, the lowercase filter would
not lowercase Ω to ω but leave it as Ω.

[1] https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/

Ref angular#11387
mgol added a commit to mgol/angular.js that referenced this issue Apr 4, 2018
The `manualLowercase` & `manualUppercase` functions were inspired by Google Caja
code. Caja is written in Java, though, where problems with `toLowerCase`
working differently in Turkish locale are well known[1]. In JavaScript
`String#toLowerCase` is defined in the ECMAScript spec and all implementations
are required to lowercase I in the same way, regardless of the current locale.
Differences may (and do) happen only in `String#toLocaleLowerCase`.

Other libraries doing string normalization, like jQuery or DOMPurify don't
apply special lowercasing logic in a Turkish environment.

Therefore, the `manualLowercase` & `manualUppercase` logic is dead code in
AngularJS and can be removed.

Also, the `manualLowercase` & `manualUppercase` functions are incomplete; they
only lowercase ASCII characters which is different to native
`String#toLowerCase`. Since those functions are used in many places in the
library, they would break a lot of code. For example, the lowercase filter would
not lowercase Ω to ω but leave it as Ω.

[1] https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/

Ref angular#11387
mgol added a commit to mgol/angular.js that referenced this issue Apr 6, 2018
The `manualLowercase` & `manualUppercase` functions were inspired by Google Caja
code. Caja is written in Java, though, where problems with `toLowerCase`
working differently in Turkish locale are well known[1]. In JavaScript
`String#toLowerCase` is defined in the ECMAScript spec and all implementations
are required to lowercase I in the same way, regardless of the current locale.
Differences may (and do) happen only in `String#toLocaleLowerCase`.

Other libraries doing string normalization, like jQuery or DOMPurify don't
apply special lowercasing logic in a Turkish environment.

Therefore, the `manualLowercase` & `manualUppercase` logic is dead code in
AngularJS and can be removed.

Also, the `manualLowercase` & `manualUppercase` functions are incomplete; they
only lowercase ASCII characters which is different to native
`String#toLowerCase`. Since those functions are used in many places in the
library, they would break a lot of code. For example, the lowercase filter would
not lowercase Ω to ω but leave it as Ω.

[1] https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/

Ref angular#11387
mgol added a commit that referenced this issue Apr 13, 2018
The `manualLowercase` & `manualUppercase` functions were inspired by Google Caja
code which worked around Java issues where problems with `toLowerCase`
working differently in Turkish locale are well known[1]. In JavaScript
`String#toLowerCase` is defined in the ECMAScript spec and all implementations
are required to lowercase I in the same way, regardless of the current locale.
Differences may (and do) happen only in `String#toLocaleLowerCase`.

The mirroring of the Java workarounds in Caja was needed due to an old Rhino bug.
Rhino is a pre-Nashorn JavaScript interpreter written in Java and it used to
delegate `String.prototype.toLowerCase` to `java.lang.String.toLowerCase`. This
has since been long fixed.

Other libraries doing string normalization, like jQuery or DOMPurify don't
apply special lowercasing logic in a Turkish environment.

Therefore, the `manualLowercase` & `manualUppercase` logic is dead code in
AngularJS and can be removed.

Also, the `manualLowercase` & `manualUppercase` functions are incomplete; they
only lowercase ASCII characters which is different to native
`String#toLowerCase`. Since those functions are used in many places in the
library, they would break a lot of code. For example, the lowercase filter would
not lowercase Ω to ω but leave it as Ω.

[1] https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/

Closes #15890
Ref #11387
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants