Skip to content

API: Make _libs.lib.no_default a singleton #40397

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomAugspurger opened this issue Mar 12, 2021 · 15 comments · Fixed by #40684
Closed

API: Make _libs.lib.no_default a singleton #40397

TomAugspurger opened this issue Mar 12, 2021 · 15 comments · Fixed by #40684
Labels
API Design Closing Candidate May be closeable, needs more eyeballs Enhancement Internals Related to non-user accessible pandas implementation
Milestone

Comments

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 12, 2021

See #40397 (comment)


This makes it easier for users and other libraries to inspect pandas' behavior without importing private modules (xref dask/dask#7375).

I'm not sure what public namespace is best.

@jorisvandenbossche
Copy link
Member

Exposing it sounds good to me. From the existing public namespaces, pandas.api.types might make the most sense?

@jbrockmendel
Copy link
Member

Another alternative would be to adopt np._NoValue

@gjoseph92
Copy link

Turns out it's already exposed, I just didn't look carefully enough:

# Note: no_default is exported to the public API in pandas.api.extensions

So you can probably close this.

@jbrockmendel
Copy link
Member

@gjoseph92 "exposed" in this context does not include pandas._libs.lib. There is no guarantee that functions/objects in that file won't move between releases.

@gjoseph92
Copy link

gjoseph92 commented Mar 13, 2021

I was referring to the comment on the line I linked: it is actually exposed in the public API as pd.api.extensions.no_default.

@gjoseph92
Copy link

Another alternative would be to adopt np._NoValue

I think this would be beneficial to dask at least, since as a plain object, no_default isn't consistently pickleable or tokenizeable. So dask will have to use a different value (like "__no_default__") in graphs, then convert that at compute time into pd.api.extensions.no_default before passing into pandas functions.

>>> import pandas as pd
>>> import pickle
>>> pickle.loads(pickle.dumps(pd.api.extensions.no_default)) is pd.api.extensions.no_default
False
>>> import numpy as np
>>> pickle.loads(pickle.dumps(np._NoValue)) is np._NoValue
True

@TomAugspurger
Copy link
Contributor Author

Thanks for finding that. Let's repurpose this issue for the pickle issue then. Since this is a singleton it should define __reduce__ to be the string representation:

def __reduce__(self):
return "NA"
(it'd be good to add a repr to it too).

@TomAugspurger TomAugspurger changed the title API: Publicly expose pandas._libs.lib.no_default API: Make _libs.lib.no_default a singleton Mar 13, 2021
@jbrockmendel
Copy link
Member

giving it a type other than object would be helpful for mypy (which doesnt handle singletons nicely)

@gjoseph92
Copy link

Dask's approach of just using the string "__no_default__" would address the dask tokenization, pickle*, repr, and mypy issues, but I imagine there are others reasons why using a magic string is undesirable.

* as long as we checked x == no_default vs x is no_default... which would cause an error when x is pd.NA.

@jorisvandenbossche
Copy link
Member

For reference, numpy's implementation is here: https://github.com/numpy/numpy/blob/2832caab3bea702bdabe944ec21487bc93e94d59/numpy/_globals.py#L57-L91 and basically is

class _NoValueType:

    __instance = None
    def __new__(cls):
        # ensure that only one instance exists
        if not cls.__instance:
            cls.__instance = super().__new__(cls)
        return cls.__instance

    # needed for python 2 to preserve identity through a pickle
    def __reduce__(self):
        return (self.__class__, ())

    def __repr__(self):
        return "<no value>"


_NoValue = _NoValueType()

My preference would be to keep no_default as an actual object/singleton (and not make it a string).

@jorisvandenbossche
Copy link
Member

@jbrockmendel can you customize the repr of an Enum field?

I don't know directly (maybe it's easy to do), but otherwise I am slightly -1 on using an enum for this. The repr now is "<NoDefault.no_default: 'NO_DEFAULT'>" (3x is a bit too much of "no default").

Would a custom class (as discussed above) not satisfy the needs of mypy?

@jbrockmendel
Copy link
Member

can you customize the repr of an Enum field?

Yes. What would you like it to say?

For the curious: I made this an enum bc it seemed like the dominant solution in this thread.

@jorisvandenbossche
Copy link
Member

Yes. What would you like it to say?

A single time "<no_default>" instead of 3 times?

@jbrockmendel
Copy link
Member

Is this closable?

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Jun 6, 2021
@simonjayhawkins
Copy link
Member

Is this closable?

did #40684 address the issue?

I'll close this since it's on the 1.3 milestone.

any subsequent changes should be a new issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Closing Candidate May be closeable, needs more eyeballs Enhancement Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants