
Encoding problems #229


Open
MarioSilv opened this issue Mar 22, 2019 · 2 comments


@MarioSilv

Greetings.
I have a simple string like this:
String blabla = "{\"id\": 8, \"name\": \"SANTARÉM\"}";
When I call JsonIterator.deserialize(blabla).get("name"), I get "SANTARɍ" instead of "SANTARÉM".
I have already checked whether JsonIterator has a configuration option for string encoding, but didn't find anything.

Kind Regards,

@miere

miere commented Apr 25, 2019

Hey @MarioSilv, I've faced the same issue in the past, but I'm not sure it's the same situation as yours. Your issue would be much easier to reproduce if you gave us more context.

The best-case scenario would be for you to provide a (really) small project containing a simple, straightforward unit test that reproduces this error.

@leocampos

leocampos commented Nov 14, 2019

How to reproduce the problem:
Set the JVM's default encoding to something other than UTF-8 (such as US-ASCII):
// Add a VM option: -Dfile.encoding=US-ASCII

import com.jsoniter.JsonIterator;
import com.jsoniter.any.Any;
import com.jsoniter.output.JsonStream;

public class EncodingRepro {
  public static void main(String[] args) {
    String jsonWithVeryCommonCharacterInGerman = "{\"name\":\"Thomas Müller\"}";
    Any anyFromThatJson = JsonIterator.deserialize(jsonWithVeryCommonCharacterInGerman);
    String backToText = JsonStream.serialize(anyFromThatJson);
    System.out.println(backToText);
  }
}

This prints {"name":"Thomas M?ller"}
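For context, the '?' is what String.getBytes produces for a character the target charset cannot represent: it substitutes the charset's replacement byte, which for US-ASCII is '?'. A minimal stdlib-only sketch follows, where passing StandardCharsets.US_ASCII explicitly stands in for the -Dfile.encoding=US-ASCII default; QuestionMarkDemo and encode are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encode with an explicit charset. Characters the charset cannot
    // represent are replaced with the charset's replacement bytes.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        // "Thomas Müller", escaped so the source file's encoding doesn't matter
        String name = "Thomas M\u00FCller";
        byte[] ascii = encode(name, StandardCharsets.US_ASCII); // stands in for the US-ASCII default
        byte[] utf8 = encode(name, StandardCharsets.UTF_8);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // Thomas M?ller
        System.out.println(ascii.length + " vs " + utf8.length);          // 13 vs 14
    }
}
```

In other words, the 'ü' is already lost on the way in, before any JSON parsing happens.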

What happens is this:

public static final Any deserialize(String input) {
    return deserialize(input.getBytes()); // <- getBytes() uses the platform default charset; there is no option to pass one
}
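This default-charset encoding would also explain the "SANTARɍ" output in the original report, under two assumptions of ours: that the reporter's default charset was a single-byte one such as ISO-8859-1 or windows-1252 (both encode É as the single byte 0xC9), and that the parser decodes a two-byte UTF-8 sequence without checking that the second byte is a valid continuation byte. 0xC9 looks like the lead byte of a two-byte sequence, so it gets merged with the following 'M' (0x4D), and the 'M' disappears. LeadByteMerge and naiveTwoByteDecode are hypothetical names; this is a sketch of the arithmetic, not jsoniter's actual code.

```java
import java.nio.charset.StandardCharsets;

class LeadByteMerge {
    // Naive two-byte UTF-8 decode: 5 payload bits from the lead byte (110xxxxx)
    // and 6 from the next byte, WITHOUT checking that the next byte is 10xxxxxx.
    static char naiveTwoByteDecode(byte lead, byte next) {
        return (char) (((lead & 0x1F) << 6) | (next & 0x3F));
    }

    public static void main(String[] args) {
        // Assumed single-byte default charset: "ÉM" becomes {0xC9, 0x4D}
        byte[] bytes = "\u00C9M".getBytes(StandardCharsets.ISO_8859_1);
        char merged = naiveTwoByteDecode(bytes[0], bytes[1]);
        // Print the code point, since System.out itself uses the default charset
        System.out.printf("U+%04X%n", (int) merged); // U+024D, i.e. 'ɍ'
    }
}
```

Worked by hand: (0xC9 & 0x1F) << 6 = 0x240, and 0x4D & 0x3F = 0x0D, giving 0x24D.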

The deserialize side is easy to work around by passing in a byte array that you have already encoded with an explicit charset:
JsonIterator.deserialize(toBeDeserialized.getBytes(StandardCharsets.UTF_8)); // the charset here is just an example; you have to know which encoding the parser expects
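A stdlib-only sketch of that workaround, assuming (as the "SANTARɍ" symptom suggests, but which we have not verified in jsoniter's source) that the parser expects UTF-8 input. The jsoniter call is left as a comment so the snippet runs without the library; ExplicitCharsetInput and jsonBytes are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharsetInput {
    // Encode the JSON text ourselves instead of relying on String.getBytes(),
    // whose result depends on -Dfile.encoding / the platform default.
    static byte[] jsonBytes(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"SANTAR\u00C9M\"}"; // "SANTARÉM", escaped
        byte[] input = jsonBytes(json);
        // With jsoniter on the classpath you would then call:
        //   Any any = JsonIterator.deserialize(input);
        // The bytes are the same regardless of -Dfile.encoding:
        System.out.println(new String(input, StandardCharsets.UTF_8).equals(json)); // true
    }
}
```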

The bigger problem is in the serialize method, JsonStream.serialize, which creates a new String without specifying the encoding:
var4 = new String(stream.buf, 0, stream.count);
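On the output side, the usual fix is to decode the buffer with an explicit charset instead of calling the default-charset String constructor. The sketch below uses a ByteArrayOutputStream as a stand-in for the serializer's internal buffer (stream.buf / stream.count above), and assumes the serializer writes UTF-8 bytes into it. If your jsoniter version offers a JsonStream.serialize(Object, OutputStream) overload (check before relying on it), serializing into a ByteArrayOutputStream and decoding it this way sidesteps the problem from user code. ExplicitCharsetOutput and decode are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetOutput {
    // Same shape as new String(buf, 0, count), but with the charset stated
    // explicitly instead of taken from the platform default.
    static String decode(byte[] buf, int count) {
        return new String(buf, 0, count, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String expected = "{\"name\":\"Thomas M\u00FCller\"}";
        // Stand-in for the serializer's buffer: the UTF-8 bytes of the output JSON
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] encoded = expected.getBytes(StandardCharsets.UTF_8);
        out.write(encoded, 0, encoded.length);

        byte[] buf = out.toByteArray();
        String json = decode(buf, buf.length);
        // Compare rather than print, since System.out itself uses the default charset
        System.out.println(json.equals(expected)); // true
    }
}
```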
