This repository was archived by the owner on Jun 18, 2024. It is now read-only.

Documentation should specify the required unicode character encoding #214

Closed
philipashlock opened this issue Dec 3, 2013 · 9 comments

Comments

@philipashlock
Contributor

The use of Unicode for character encoding is part of the JSON standard, with a strong emphasis on UTF-8.

"The character encoding of JSON text is always Unicode. UTF-8 is the only encoding that makes sense on the wire, but UTF-16 and UTF-32 are also permitted."
source: http://www.json.org/fatfree.html

This should be articulated in the documentation. Since this is part of the JSON standard, it could be described as part of the "Metadata File Format – JSON" section on schema.md, but it might also make sense to mention it in the JSON section or in its own section on catalog.md.

cc: @Raseman

@gbinal
Contributor

gbinal commented Dec 3, 2013

Makes sense - Are you game to suggest a quick edit to /catalog to that effect?

@JoshData
Contributor

JoshData commented Feb 9, 2014

+1, and, to be more specific, the required encoding should be UTF-8, not just any of the encodings mentioned by the JSON spec.
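As a rough illustration of what "UTF-8 specifically" could mean for a consumer, here is a minimal sketch that accepts a byte string only if it decodes strictly as UTF-8 and then parses as JSON. The function name `is_utf8_json` is hypothetical and not part of any Project Open Data tooling.

```python
import json


def is_utf8_json(raw: bytes) -> bool:
    """Return True if `raw` decodes as strict UTF-8 and parses as JSON.

    Hypothetical helper for illustration only.
    """
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8,
        # which includes the byte-order marks that Python's "utf-16" and
        # "utf-32" codecs write at the start of their output.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


sample = '{"title": "Café"}'
print(is_utf8_json(sample.encode("utf-8")))
print(is_utf8_json(sample.encode("utf-16")))
```

Under these assumptions the same document is accepted when encoded as UTF-8 and rejected when encoded as UTF-16 or UTF-32, even though the JSON spec quoted above permits all three.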

@waldoj

waldoj commented Feb 9, 2014

Interesting! I had thought this was an unnecessary specification (I seem to recall writing that in response to a ticket here, many months ago), because I thought that UTF-8 was the only acceptable JSON encoding. I had no idea that UTF-16 and UTF-32 were permitted.

@konklone
Contributor

konklone commented Feb 9, 2014

And I had no idea what the difference between UTF-8, UTF-16, and UTF-32 was until reading those Wikipedia articles just now. Now I understand. And now I would like to forget. UTF-8 forever.

@JoshData
Contributor

JoshData commented Feb 9, 2014

@waldoj That's what I thought too. Thanks @philipashlock for upping our games! :)

@waldoj

waldoj commented Feb 9, 2014

"And I had no idea what the difference between UTF-8, UTF-16, and UTF-32 was until reading those Wikipedia articles just now."

Oh, you sweet, innocent boy.

;)

@gbinal
Contributor

gbinal commented Feb 10, 2014

Just to be clear, then, this would take the form of a clarification in the Project Open Data documentation, not necessarily any change to the public data listing itself, right?

If so, does the following work? If not, would anyone be game to take a swing at it?

The JSON catalog files should only use UTF-8 character encoding.
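For publishers, following that sentence mostly comes down to naming the encoding explicitly when writing the file. Here is a minimal sketch using Python's standard `json` module. The function name `dump_catalog_utf8`, the file name `catalog.json`, and the sample record are all hypothetical.

```python
import json


def dump_catalog_utf8(catalog, path):
    """Write `catalog` to `path` as UTF-8 encoded JSON.

    Hypothetical helper for illustration only.
    """
    # Name the encoding explicitly instead of relying on the platform default.
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes non-ASCII characters as themselves
        # (UTF-8 bytes on disk) instead of as \uXXXX escapes.
        json.dump(catalog, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    dump_catalog_utf8({"title": "Café"}, "catalog.json")
```

Leaving `ensure_ascii` at its default of `True` would also produce a valid UTF-8 file, because pure ASCII is a subset of UTF-8. The choice here only affects how readable the raw file is.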

@gbinal
Contributor

gbinal commented May 5, 2014

I've proposed #304 for this.

@gbinal
Contributor

gbinal commented Aug 1, 2014

#304 resolved this.

@gbinal gbinal closed this as completed Aug 1, 2014