-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: break read_json, read_msgpack API, disallow string data input #5954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tell me again why this is a problem? |
Because it overloads too much. the filename typo yielding a json parse error and the 150k filename |
I just think its conveient... e.g.
i though you were going to have a kw that 'fixed' this, e.g. |
I don't remember suggesting that. We've discussed this...? |
no...was suggesting that (or maybe |
Yeah, I think the load/loads convention is pretty familiar, that'd be preferable and |
I'd vote for string= kwarg over anything else, just because for non-C |
I don't follow the non-C bit. Loking at it, How about we remove data string input from read_msgpack (being binary) If we move to load/loads, we'd still have to deal with Note again that neither |
related to #5957 actually json/msgpack/html are consistent NOW (csv is not though). why again is this a problem; it does the right thing with strings / files what are the edge cases you are referring? a mistyped filename gives a JSON error? |
I explained the rationale several times already, going over it again isn't likely to help. |
and read_html does not accept filenames except as url, so they are not consistent. |
Then that's a bug. Last time I checked, |
So at least the docstring suggests it's reasonable. Et tu, Brute? sigh |
Oh whoops I responded too hastily, I'm not sure if |
Thank you for that. To me it's obvious this should change, but considering there's so much |
In the interest of completeness, and at the risk of beating a dead horse,
|
Of course if it's a non existent path then it will be treated as a raw string...which is the issue. |
revisiting #5655 due to recent #5874, it's just not right.
most pandas read_* methods take a path/url of file like. the newish read_json/read_msgpack
also accept a (byte)string of data, and tries to guess whether it's data or a filepath.
That creates weird corner cases and is sufficently wrong IMO that it's worth
making a breaking change. Users will now have to wrap their data in BytesIO/StringIO.
not much of a problem, except perhaps the lost convenience of
pd.read_json(j)
. But since jsonusually comes from a file or url which are supported directly I'm hoping it won't be
that disruptive.
0.14.0 ofcourse.