ENH/DOC: stability guide #5027
Comments
@dengemann Two points:
|
Ok, great!
Good question. The bad news: figuring out what works and what doesn't is tedious and time-consuming. Often, for example, I needed to convince people that pandas is a suitable backend for production code, simply because things just did not work the way expected across platforms / versions. Before we write polished docs, here are just a few blockers / solutions / the most pertinent points:
from distutils.version import LooseVersion

import pandas as pd


def check_pandas_version(min_version):
    """Check the minimum pandas version required.

    Parameters
    ----------
    min_version : str
        The version string. Anything that matches
        ``'(\\d+ | [a-z]+ | \\.)'``
    """
    # True when the installed pandas is at least ``min_version``
    return LooseVersion(pd.__version__) >= LooseVersion(min_version)
def check_line_index(lines):
    """Check whether lines are safe for parsing.

    Parameters
    ----------
    lines : list of str
        A list of strings as returned from a file object.

    Returns
    -------
    lines : list of str
        The edited list of strings in case the pandas version
        is not recent enough.
    """
    if check_pandas_version('0.8'):
        return lines
    else:  # 92mu -- fastest, please don't change
        # older pandas: prepend an explicit row index to every line
        return [str(x) + ' ' + y for x, y in enumerate(lines)]

Took me quite some time to figure out what's wrong. In one application this even led me to drop support for older pandas versions. But this is not good. In science, update cycles are slower ...
I know this may sound slightly accusatory, but my goal is to help people use pandas for production code + convince others this is a good idea (which I think is the case). |
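The version check in the comment above relies on `distutils.version.LooseVersion`, and `distutils` was removed from the standard library in Python 3.12. As a rough sketch only, a dependency-free comparison could look like the following. The helper names `version_tuple` and `meets_minimum` are hypothetical and not part of pandas, and the simplification that pre-release suffixes are ignored is an assumption of this sketch.

```python
import re


def version_tuple(version):
    """Return the leading numeric components of a version string.

    Parsing stops at the first dot-separated piece that does not start
    with a digit, so '0.13.0.dev' gives (0, 13, 0).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if ``installed`` is at least ``minimum``.

    Assumption of this sketch: suffixes such as 'rc1' are ignored, so a
    release candidate compares equal to the final release.
    """
    have = version_tuple(installed)
    need = version_tuple(minimum)
    # Pad with zeros so that '0.8' and '0.8.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


print(meets_minimum("0.7.3", "0.8"))
print(meets_minimum("0.10.1", "0.8"))
```

Comparing integer tuples rather than raw strings matters here because a plain string comparison would rank `'0.10.1'` below `'0.8'`. In real code the installed version would come from `pd.__version__`.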
Not sure what you mean by this. |
... A more systematic approach might be to establish a core set of API tests that are validated across versions (unit tests that pass for, let's say, the last five releases). Throwing thoughts ... |
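One very small building block for such a cross-version test suite could be a check that the public names a downstream project depends on still exist. The sketch below is illustrative only: `missing_api` is a hypothetical helper, it checks presence rather than behaviour, and it is demonstrated on the standard-library `json` module so that it runs without pandas installed.

```python
import importlib


def missing_api(module_name, required_names):
    """Return the names from ``required_names`` that the module lacks."""
    module = importlib.import_module(module_name)
    return [name for name in required_names if not hasattr(module, name)]


# Demonstration on a standard-library module; 'no_such_function' is a
# deliberately made-up name to show what a failure looks like.
print(missing_api("json", ["loads", "dumps", "no_such_function"]))
```

Run against each supported release, a call such as `missing_api("pandas", ["DataFrame", "read_table"])` would flag names that disappeared; behavioural tests would still be needed on top of this.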
@dengemann a couple of other questions:
|
@dengemann so are the issues in #368 actually resolved or not? |
I'm not sure we really need bugfix releases. It's more about making accessible what did not change across time ...
Sorry. I'm parsing discrete events from files, assembling lists of lines, and then using fast read_table parsing on StringIO objects. These functions serve as preprocessors to guarantee the functionality for users on older pandas versions. |
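To make the preprocessing step described above concrete, here is a minimal sketch of what the index-prefixing workaround produces before the text is handed to the parser. The helper names and the sample lines are made up for illustration; only the list comprehension mirrors the code posted earlier in the thread.

```python
from io import StringIO


def add_line_index(lines):
    """Prefix every line with its position, separated by a space."""
    return [str(number) + " " + line for number, line in enumerate(lines)]


def lines_to_buffer(lines):
    """Join lines (which keep their newlines) into a file-like object."""
    return StringIO("".join(lines))


# Hypothetical event lines, as a file object would return them.
raw_lines = ["12.5 stimulus\n", "13.1 response\n"]
buffer = lines_to_buffer(add_line_index(raw_lines))
print(buffer.getvalue())

# Assumption: with pandas installed, the buffer could then be parsed with
# something like pd.read_table(buffer, sep=' ', header=None, index_col=0).
```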
throwing in my 2c here. pandas changed quite substantially in 0.8, so supporting less than this is going to be quite nightmarish. In the scientific community support for HDF5 became much more integrated starting in 0.10.1. Can you elaborate on who is your target audience here? (for < say 0.10.1) |
@dengemann if you're saying you need just one function to do compatibility, that doesn't seem too bad. Again, it'd be nice if you could offer more examples. We could put together a gist with instructions for use. |
Sorry, I've been wrong two times in succession, there was a typo and a mis-read. I was referring to #835 -- this is fixed with pandas > 0.8 I think |
@jreback I witnessed this ;-) This is also my reasoning. But still many people run Debian stable or EPD 7.3, which ship pandas 0.7.3 IIRC/AFAIK.
Absolutely, this is good news. But I don't want other people to lose a day or two finding out ;-)
I'll keep you posted. This will be an ongoing issue.
Yes, this was my idea. |
@dengemann well, you should really talk to @yarikoptic to see what can be done to get newer versions of pandas into Debian stable and/or if there are any blockers. |
Our release notes are quite extensive. While this doesn't tell you what hasn't changed, it's a useful starting point. |
That was a rather dense description of accumulated tiny experiences. Let me try to unwrap my experience.
Definitely, we should include a related pointer in a forthcoming doc. |
would debian take things up faster if they were backported? We use new version numbers when we make major changes to the public API. |
@jtratner good question. --> ping @yarikoptic. |
@dengemann you bring up some good points, but the very fact that At the same time as we introduce new features, and squash bugs, I personally (and I know all of the other core dev people) go to great lengths to provide backward compat and notice of changes in the API. Here's another example, @jtratner and I worked extensively making sure that prior version pickles will work in 0.13, even though Another example, @cpcloud harasses the numexpr/PyTables people about a bug in their newly put out version (2.2.1 of numexpr) that completely broke PyTables 2.4! (which they acceded and are now issuing a new version). |
@jreback thanks, this is all great and I'm with you. I'm also aware of the efforts taken to maintain and stabilize functionality while promoting development. I think you all have done an awesome job with this, that's beyond question. But the task is highly non-trivial too, and naturally issues remain. My point was rather to start a discussion on how to add some documentation for long-term support use cases and share a few experiences. I'm happy to make a DOC PR once all points are settled. I'd especially like to generate some more concrete reports + recommendations on dealing with ix, an issue which I have not fully grasped at this point. There will be incoming news from my side since I'm currently developing a new project for which I chose pandas as backend. |
re Debian "taking it faster": I guess there is a bit of misunderstanding. Let me describe the life cycle of pandas in Debian:
But also, besides official Debian -- as soon as I upload to Debian unstable, I upload backport builds of pandas for EVERY compatible Debian and Ubuntu release to NeuroDebian: see http://neuro.debian.net/pkgs/python-pandas.html . So technically speaking -- a fresh pandas release is nearly always available only a few days later for every recent Debian and Ubuntu release, while official 'stable' releases of those distributions come with the 'matured' versions. To help with migration of pandas from unstable to testing, which is currently usually stalled by problems with building/testing on various less common architectures, I have set up a buildbot on my sparc box for pandas: http://nipy.bic.berkeley.edu/waterfall?category=pandas . As you might see, it still needs some attention, and I believe there are a few outstanding issues here in the tracker to address failing tests on sparc. |
fair enough |
@yarikoptic that's what I thought, thanks for making that explicit :) That said, sparc issues are definitely frustrating. |
What could actually help is if there were at least some releases of pandas that were maintained for critical bug fixes in addition to the bleeding-edge new-functionality releases. E.g. if there were a 0.12.x branch on top of the 0.12.0 release which absorbed all critical applicable fixes, leading to 0.12.1 etc. releases. Then, foreseeing an upcoming Debian stable release, I might have preferred to make sure it has this "stable" release over the more featureful 0.13.0. |
I guess the problem is that maintaining a separate stable branch requires quite a bit of time (as does trying to get things to work on sparc). |
I think this is relatively well handled by the new policies doc. https://github.com/pandas-dev/pandas/blob/master/doc/source/development/policies.rst If there is anything not sufficiently described in that document, a new issue can be opened to clarify those points. |
I've recently had the experience of implementing and maintaining different downstream applications supporting or building upon pandas. One thing I've learned is that it's quite painful to write code that, e.g., runs on pandas 0.7.3 up to current master. Although 0.7.3 seems rather old given the rapid dev cycle and the vibrant community, one should not forget that 0.7.3 is no older than 1.5 - 2 years and hence still counts as a stable version in quite a few distros (while user studies show that many people, let alone institutions, are often years behind recent versions ...).
Which guidelines to follow to make life easier when supporting such use cases does not seem too well documented. What would people think about adding such a -- maybe growing -- collection of hints + tips to the docs?