
Add support for a pandasrc #4907


Closed
jtratner opened this issue Sep 20, 2013 · 38 comments

@jtratner
Contributor

This would just use the existing configuration framework, but with a file format like the one matplotlib uses. See how they do it here: http://matplotlib.org/users/customizing.html.

Plus, we can document all the config options in a single file.

related #2452, #3046

@cpcloud
Member

cpcloud commented Sep 21, 2013

+1 for a matplotlibish format

@ghost

ghost commented Sep 21, 2013

Both IPython and Python itself have existing startup-file mechanisms in place.
I'd still like to hear a good argument explaining what a .pandasrc file provides
that those do not.

@jtratner
Contributor Author

@y-p It lets you distribute a project with settings in its directory so that it works the way you expect, and it lets you set something for one project without setting it for the others. You can't unilaterally overwrite or append to people's Python startup files. This is a way to get granular formatting without needing to alter those files or even know where they are. And it doesn't prevent you from overriding those settings in a startup file.

@ghost

ghost commented Sep 21, 2013

I see, so it's not about an addition to a user's dotfiles but about making a project self-contained.
In that case, how would such a file improve on just putting some plain ol' initialization code in
the source files?

@jtratner
Contributor Author

@y-p Initialization code requires imports or setting environment variables. It's also a pretty minor addition: what's below, plus a bit of searching for the path to the config file:

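import ast
import re

from pandas import set_option


def from_object(obj):
    # Accept a dict (anything with .items()) or any mapping-like of keys
    items = obj.items() if hasattr(obj, "items") else ((k, obj[k]) for k in obj)
    for key, value in items:
        set_option(key, value)


def _parse_value(text):
    # Values are read as text: try a Python literal (80, True, None, 'x'),
    # otherwise keep the raw string (e.g. openpyxl)
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return text


def from_file(path_or_buf):
    option_splitter = re.compile(r"\s*[:=]\s*")
    # Accept a path or an already-open text buffer
    f = open(path_or_buf) if isinstance(path_or_buf, str) else path_or_buf
    errors = []
    try:
        for i, line in enumerate(f, start=1):
            # allow for comments
            line = line.split("#")[0].strip()
            if not line:
                continue
            try:
                # split only on the first separator so values may contain ":"
                split = option_splitter.split(line, maxsplit=1)
                if len(split) != 2:
                    raise ValueError("Malformed option")
                option, value = split
                set_option(option, _parse_value(value))
            except (KeyError, ValueError) as e:
                # unknown options raise OptionError, a KeyError subclass
                errors.append("%d: %s" % (i, e))
    finally:
        if f is not path_or_buf:
            f.close()
    return errors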

@jtratner
Contributor Author

and clearly better errors, etc.

@jtratner
Contributor Author

Plus, this separates config options from actual code, which is a net gain.

@cpcloud
Member

cpcloud commented Sep 21, 2013

Personally, I'd rather set some config file than have a bunch of calls to pd.set_option() in a startup file. And like @jtratner says, supporting different configs for different projects by, e.g., reading a single file from the current directory is easier to keep track of than a bunch of calls to pd.set_option().

@ghost

ghost commented Sep 21, 2013

Nope, initialization code does not require either envars or imports.
Don't see how the errors would be clearly better. If they are, why not improve the existing
error messages?
If you want to separate config options from actual code, use another source file, hey presto.
Configuration is code, last time I checked, and that seems to be the view elsewhere as well.
I don't think I've worked on a project that does not have some sort of settings.py; that's
familiar to most people (certainly Django folk).

I don't follow what @cpcloud means by "keeping track of" at all. Configuration is
just code. I don't think that holds up.

Re verbosity of pd.set_option: yep. Big deal. How many LOC are we talking? Meh.

@cpcloud
Member

cpcloud commented Sep 21, 2013

Nope, initialization code does not require either envars or imports.

I don't see how that's true re imports.

I don't think I've worked on a project that does not have some sort of settings.py

Lucky you. I have, and it's not fun.

I don't follow what @cpcloud means by "keeping track of" at all.

I just meant that I'd rather have a single config file per project than have to copy-paste pd.set_option() calls and tweak them, but a settings.py-like file works just fine too. It's not a big deal; I actually use the defaults for almost everything, so this doesn't really bother me.

And that's about all I'll say on that.

@jtratner
Contributor Author

okay, that's fine. I'd like to add from_object (to allow you to create a dict and set a bunch of options at once) and make options support __getitem__ and __setitem__ just to clean things up.

In other words, allows you to do this (which is nearly equivalent to what the pandasrc would do in syntax, etc):

config.from_object({
'io.excel.xlsx.writer': 'openpyxl',
'display.max_rows': 80
})

and this

config.options['io.excel.xlsx.writer'] = 'openpyxl'
config.options['display.max_rows'] = 80

Is that okay? It feels cleaner to me than a series of function calls.
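The proposed config.from_object and item access on config.options are not existing pandas APIs. A minimal sketch of the same "set a bunch of options from a dict" idea using only the public pd.set_option, assuming a hypothetical helper name apply_options:

```python
import pandas as pd


def apply_options(options):
    """Hypothetical helper: set every option in a name -> value mapping."""
    for name, value in options.items():
        # pd.set_option validates each value and raises on unknown names
        pd.set_option(name, value)


apply_options({
    "display.max_rows": 80,
    "display.max_columns": 20,
})

# For a temporary override, pd.option_context takes name/value pairs
with pd.option_context("display.max_rows", 10):
    print(pd.get_option("display.max_rows"))  # 10
print(pd.get_option("display.max_rows"))  # 80
```

Looping over set_option keeps all validation in the existing options machinery, so a dict-based entry point adds no new rules of its own.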

@jreback
Contributor

jreback commented Sep 22, 2013

Looks OK by me. The only request I have is to make setting the docs for an option easier (not sure how, as you generally need/want a multi-line string, so you end up creating a variable to hold it).

@ghost

ghost commented Sep 22, 2013

Yeah, I like that too. The Python logging module supports configuration from a dict (logging.config.dictConfig),
which comes in handy in that situation. I find I need to change options so rarely that I don't miss
this in practice, but if you feel there's a need, then from_dict is a good way to do it.

Re supporting set/getitem: you can already get/set values directly, e.g. display.foo = 1.
"Cleaner" is partly a matter of taste, and that form isn't more concise than the
existing set_option mechanism, nor more convenient than the existing options.<name> attribute access,
which also provides tab-completion. Do you feel strongly that adding a 3rd way to do the exact
same thing is worth it?
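For reference, the two existing ways of setting an option that this comment contrasts with item access look like this (display.max_rows is just an example option):

```python
import pandas as pd

# 1. Function call with the option's full name
pd.set_option("display.max_rows", 60)
print(pd.get_option("display.max_rows"))  # 60

# 2. Attribute access on pd.options, which also tab-completes in IPython
pd.options.display.max_rows = 80
print(pd.options.display.max_rows)  # 80

# Both routes read and write the same underlying option
print(pd.get_option("display.max_rows"))  # 80
```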

@jtratner
Contributor Author

Didn't realize it supports setattr.


@jreback
Contributor

jreback commented Sep 22, 2013

heres a perfect case for this, #2612

@jreback
Contributor

jreback commented Oct 2, 2013

push to 0.14?

@jtratner
Contributor Author

jtratner commented Oct 2, 2013

Sure, but are we even doing this anymore?


@jreback
Contributor

jreback commented Oct 2, 2013

can certainly close? didn't you have a use case for it?

@jtratner
Contributor Author

jtratner commented Oct 2, 2013

it would be useful to me, but @y-p's point that it's not necessary seems reasonable too.

@jreback
Contributor

jreback commented Oct 2, 2013

OK, move to Someday or 0.14 for revisiting.

@jtratner
Contributor Author

jtratner commented Oct 2, 2013

whichever.


@jreback
Contributor

jreback commented Dec 19, 2013

@jtratner did you move this to 0.13?

@jtratner
Contributor Author

I've put this for 0.13, because the feedback from SettingWithCopy suggests that there are a non-trivial number of people who are going to want to be able to shut off the copy warnings automatically.

@jtratner
Contributor Author

shouldn't take too long to do either.

@jreback
Contributor

jreback commented Dec 19, 2013

And how is this different than just doing pd.set_option('chained_assignment', None)? So far I've seen exactly one comment on that, from bigbug, and IMHO he should keep it on. My 2c.

@jtratner
Contributor Author

it's not, it just means you can turn it off for old scripts. There have been one or two issues posted on pandas about this as well - e.g. #5597

@jreback
Contributor

jreback commented Dec 19, 2013

Users who want to turn off the warnings almost certainly already have an IPython startup script. I just think a pandasrc is pretty duplicative, IMHO. The point of the warning is mostly for new users in any event.

@jtratner
Contributor Author

okay, pushed to someday again

@ghost

ghost commented Dec 19, 2013

The Python/IPython startup files are less well known than we might think, and there's
only a brief mention in the FAQ. I'll open a doc issue for 0.14.

@cbrnr
Contributor

cbrnr commented Oct 18, 2017

I'd like to revive this issue. A pandasrc file would be very useful for people that do not want to import pandas in their IPython startup script.

@t-makaro

I'd like to have a configuration file for pandas. I hate having to always place:

pd.set_option('display.latex.repr', True)
pd.set_option('display.latex.longtable', True)

at the start of my notebook especially since I always have to look it up.

Using an ipython startup file is not a solution since I don't want pandas to always import (I also don't want to hide the import. I just want to hide the settings that I use to export to pdf).

@benpayne

I am looking into implementing this feature. I've reviewed the referenced matplotlibrc guidelines as a model for the implementation. Matplotlib searches a few locations for the rc file; these mostly make sense, with a few exceptions.

Steps to find the config file.

  1. Check the local directory: seems like a good idea, so a project can override settings.
  2. Check an environment variable (MATPLOTLIBRC) and, if it exists, look for a file at $MATPLOTLIBRC/matplotlibrc: looking for a file under that path seems redundant. Why not just have the env variable point to the file? That would allow you to have several rc files in the same directory and just change your environment to get different behavior.
  3. Look in the user's home directory for an rc file: no changes to this; it makes sense.
  4. Check an install location whose file is overwritten every time the package is installed: this seems like a bad idea. An example rc file in this location does make sense, as something people can copy and modify for themselves, with comments documenting all the options. But having a file there that must be parsed for every user who imports pandas seems excessive. Furthermore, if developers want to change the defaults of various options, they should change them in the code, not in a global config file. So I am planning to drop this step.
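The modified search order above can be sketched as follows. This is a sketch under stated assumptions: the file name .pandasrc and the environment variable PANDASRC are hypothetical (pandas defines neither), and the variable points directly at a file, as proposed in step 2.

```python
import os
from pathlib import Path

# Hypothetical names: pandas defines neither of these.
RC_NAME = ".pandasrc"
RC_ENV_VAR = "PANDASRC"


def find_rc_files(cwd=None, home=None, environ=None):
    """Return existing rc files, lowest priority first (home, env var, local)."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    environ = os.environ if environ is None else environ

    candidates = [home / RC_NAME]  # step 3: user-wide settings
    if environ.get(RC_ENV_VAR):
        # step 2: the env var points at the rc file itself, not a directory
        candidates.append(Path(environ[RC_ENV_VAR]))
    candidates.append(cwd / RC_NAME)  # step 1: project-local override
    # step 4 (a file in the install location) is intentionally dropped
    return [p for p in candidates if p.is_file()]
```

Returning every match in priority order leaves the "first file wins" vs. "stack them" decision to the caller: the former uses only the last element, the latter applies them all in order.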

Please let me know if I am missing anything in this analysis.

Another question that comes to mind is whether we should search for the first rc file and stop, or use a bottom-up stack approach. Basically, would a user like to have global settings in their home directory for all projects, and then have the ability for a local project to override some settings without replacing the global ones? Or would this get confusing? I'd implement that by parsing every rc file we find (3, 2, 1) and then calling pd.set_option for each setting in the files. That way, if the same setting appears in two files, the last file parsed determines the value we run with. I'd like to hear others' thoughts on this.
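The bottom-up stack approach amounts to merging the parsed files in order so that later files win per key. A minimal sketch, assuming the simple "key: value" line format with "#" comments; parse_rc and layered_settings are hypothetical names, and values are left as strings (type conversion before calling pd.set_option is left out):

```python
def parse_rc(text):
    """Parse hypothetical 'key: value' lines into a dict (values kept as strings)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        settings[key.strip()] = value.strip()
    return settings


def layered_settings(texts):
    """Merge rc files given lowest priority first; later files win per key."""
    merged = {}
    for text in texts:
        merged.update(parse_rc(text))
    return merged


home_rc = "display.max_rows: 100\ndisplay.precision: 3\n"
local_rc = "display.max_rows: 20  # this project wants shorter output\n"
print(layered_settings([home_rc, local_rc]))
# {'display.max_rows': '20', 'display.precision': '3'}
```

Merging first and then calling pd.set_option once per key gives the same final state as calling it file by file, but touches each option only once.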

@TomAugspurger
Contributor

TomAugspurger commented Mar 13, 2019 via email

@benpayne

I agree that it would be nice to leverage something existing. The proposed config file format (key: value) is not very standard. It is simple, which could be a weakness or a strength down the road. As for parsing it with standard packages, Python's configparser won't work (it expects section headers), shlex could work, but a simple line-by-line split on ":" would probably be the easiest way. Today the option system is designed to handle only a single value per option, so the "key -> value" paradigm is good for our use. Unless you are envisioning that this will change soon?

I'm a fan of using JSON for config files. Parsing is easy, it's very flexible down the road, and it's easy to understand no matter how experienced you are with it.
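A JSON variant could be as small as the sketch below, assuming the file is a single JSON object mapping option names to values (load_json_options is a hypothetical name). One practical upside: JSON numbers, booleans, and null arrive as Python int/float, bool, and None, so pandas' validators accept them without the string-to-type conversion a "key: value" text format needs.

```python
import json

import pandas as pd


def load_json_options(text):
    """Hypothetical loader: apply a JSON object of option name -> value."""
    options = json.loads(text)
    if not isinstance(options, dict):
        raise ValueError("expected a JSON object at the top level")
    for name, value in options.items():
        # values are already typed (int, bool, None), so validators accept them
        pd.set_option(name, value)
    return options


load_json_options('{"display.max_rows": 50, "display.precision": 4}')
print(pd.get_option("display.precision"))  # 4
```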

Is there some specific "prior art" you had in mind that I could look at to evaluate for our uses?

@TomAugspurger
Contributor

TomAugspurger commented Mar 15, 2019 via email

@t-makaro

The Jupyter Project developed Traitlets, which handles their configuration system. Their config files are regular Python files, which is super awesome because I can use one config file that adapts nicely to the multiple systems that I manage.

One of the problems that I noticed with IPython startup files is that the startup file ran on any IPython environment that I used including ones that didn't have pandas installed. Something like a Pandas startup file that only loads on importing Pandas could work.

@benpayne

I've created a tool to dump the currently supported options in the file format proposed by @jtratner; see the attachment. Looking over the options that exist today, most are bool, int, or string. However, one is a callback function (display.float_format). To actually support this in a config file, the file would probably have to be Python, as with Jupyter. That certainly creates a powerful system, but the fact that someone could put any code in that file could create some interesting side effects.
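To illustrate why display.float_format does not fit a "key: value" text file: its value is a callable, which is straightforward to build in Python but has no obvious plain-text spelling. A short example using the public options API:

```python
import pandas as pd

# The option takes a callable that formats each float for display
pd.set_option("display.float_format", "{:.2f}".format)
formatted = pd.Series([1.0, 2.5]).to_string()
print(formatted)

# A text line like "display.float_format: {:.2f}" would only yield a string,
# not a callable, so a loader would need special handling for this option.
pd.reset_option("display.float_format")
```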

My concern about something like Traitlets is that it overlaps with code that already exists in pandas to register options, provide callbacks when they change, and support validators. It would also require reworking the code around each of the nearly 50 options supported today. However, it would take these options out of a centralized place and put them in the code that uses them. Always good to eliminate centralization in code bases...

When starting this, I was envisioning leaving that infrastructure in place and simply building a layer on top that reads settings from a config file and invokes the current, relatively robust, system for setting options. In fact, the feature could be as simple as adding code at startup to look for a config file (in Python format) and load it. The file itself could be a series of set_option calls. This is probably 10 lines of code and has minimal overhead, especially if no file is found.
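The "roughly 10 lines" layer described above might look like the following sketch, which executes a Python rc file with the standard library's runpy. The function name load_python_rc and the file name used in the usage note are hypothetical; pd is pre-injected into the file's globals so the file can be a bare series of pd.set_option calls. Because the file is arbitrary Python, it runs with full interpreter access, which is exactly the side-effect concern raised earlier.

```python
import runpy
from pathlib import Path

import pandas as pd


def load_python_rc(path):
    """Hypothetical loader: execute a Python rc file if present; True if loaded."""
    path = Path(path)
    if not path.is_file():
        return False  # cheap no-op when there is no rc file
    # The file is ordinary Python; pd is provided so it need not import pandas.
    # It runs with full interpreter access, so only load files you trust.
    runpy.run_path(str(path), init_globals={"pd": pd}, run_name="__pandasrc__")
    return True
```

Usage: a file named, say, pandasrc.py containing `pd.set_option("display.max_rows", 42)` would be applied by `load_python_rc("pandasrc.py")`.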

I've worked with Django and Flask before. They both use Python files for configuration. I'm not sure whether, under the covers, they do something centralized like we do today or something more like Traitlets. IPython uses Traitlets, from what I've read. I briefly looked into Dask and learned that the file format proposed in this issue (key: value) is valid YAML, and there is a project that supports this format (PyYAML).

So it seems the design decisions come down to:

  • File Format: YAML vs Python
  • Internal Storage: Centralized vs Decentralized

If the consensus is to go with a decentralized approach like Traitlets, then this will be a much bigger change to the codebase. If that is the case, we might want to separate this into two issues: building out the config file as this issue describes, and then a larger task to rework the storage of options in a decentralized manner.

pandasrc.txt

@jbrockmendel
Member

Discussed this on today's dev call and the consensus was mostly negative. @MarcoGorelli had some points about inevitable feature creep into people wanting overrides in command-line and in pyproject.toml files. If a champion steps up to implement+maintain this we can reconsider, but for now I'm closing as no action.

9 participants