BUG: is StataReader supposed to assign the index? #3641
Comments
@jseabold this is more your bailiwick
Doesn't to_csv and read_csv etc. roundtrip like this by default too? I don't much like it. I always have to do |
I guess maybe an option is needed as well (like
|
alright, will move this to 0.12, in case want to add the enhancement ( |
IIUC, Stata data files should have a 'sorted by' field somewhere, but I don't think it is used anywhere at present inside the StataReader/StataWriter machinery. Since sorting seems to be used infrequently in the Stata community and does not map directly into the index concept of Pandas, I would suggest not to fiddle with that attribute. Adding both of the suggested enhancements at the Python level would be a very good thing, though. Columns are always named in Stata, so the array of cases to think through would be much smaller than in read_csv. |
@hmgaudecker this doesn't have anything to do with Stata per se, more of an interface to pandas. In that the index comes out as a column; I am not sure if there is a way to record that the index should be set after recreation in pandas. (it is easy enough to do |
That was my point, maybe a bit too short: If Stata had the same concept of an index as Pandas, the natural way would be to use it in both directions (I thought that was what you meant by whether there was a way to record this). But it doesn't. The closest thing it has would be the 'sorted by' thing, but for datasets saved by the average Stata user, one wouldn't want to infer (by default) that this is the index. |
@hmgaudecker fair enough. To be consistent I actually think that the param |
I think the concept "saved with an index" just doesn't apply to Stata files. For the round trip, sure - but that is probably more of a use case for tests (important enough, but maybe not for the defaults) than anything in real life. I see two typical cases.
So I think the current "defaults" (i.e. writer with Leaves the question of what to do upon writing if the |
Your options look right. If there is no name, I believe it's just set as ' |
The above looks like it. But that behaviour could be confusing to the casual Pandas and regular Stata user. And probably throw a strange error if a column named |
Closed: this needs to be user-defined, as Stata is not a complete serialization format.
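For anyone landing here later, a minimal sketch of what "user defined" can look like in practice. It assumes a recent pandas in which `DataFrame.to_stata` accepts `write_index` and `read_stata` accepts `index_col`; the frame, the index name `id`, and the file names are made up for illustration and do not come from this thread.

```python
import os
import tempfile

import pandas as pd

# Illustrative data; the index name "id" is an assumption for this sketch.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]},
    index=pd.Index([10, 20, 30], name="id"),
)

with tempfile.TemporaryDirectory() as tmp:
    # Option 1: write the index as an ordinary column, then tell the
    # reader explicitly which column should become the index again.
    with_index = os.path.join(tmp, "with_index.dta")
    df.to_stata(with_index)
    restored = pd.read_stata(with_index, index_col="id")

    # Option 2: do not write the index at all; the reader then falls
    # back to a default integer index.
    without_index = os.path.join(tmp, "without_index.dta")
    df.to_stata(without_index, write_index=False)
    no_index = pd.read_stata(without_index)
```

Either way the choice is made by the caller on both sides, since the .dta file itself does not record which column was the index.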
This is the example in io.rst for Stata (in current master)
coming from PR #3270 and issue #1512
I am not sure if you have enough information saved to know that this needs a
df.set_index('index')
?
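A small sketch of the round trip being asked about, assuming the default behaviour of a recent pandas (the writer stores an unnamed index as a column called `index`, and the reader does not restore it). The frame and file name are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Frame with a default, unnamed integer index.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.dta")
    df.to_stata(path)               # index is written as a regular column
    roundtrip = pd.read_stata(path)  # index comes back as a column, not as the index

# The former index now shows up among the columns.
columns_after_read = list(roundtrip.columns)

# The manual step from the question: promote that column back to the index.
fixed = roundtrip.set_index("index")
```

After `set_index("index")` the data columns match the original frame, though the index now carries the name `index` and may have a narrower integer dtype than before.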