ENH: Extend to_numeric to Convert Hexadecimal, Octal, and Binary Strings with Prefixes #59207

Open
1 of 3 tasks
beci opened this issue Jul 8, 2024 · 1 comment
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member

Comments

beci commented Jul 8, 2024

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Extend the pandas.to_numeric function to support the conversion of strings representing hexadecimal, octal, and binary numbers when they start with the corresponding prefixes (0x, 0o, 0b).

s = pd.Series(["1.0", "2", -3, "0x32"])
pd.to_numeric(s)  # , errors="coerce")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File lib.pyx:2391, in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "0x32"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[39], line 2
      1 s = pd.Series(["1.0", "2", -3, "0x32"])
----> 2 pd.to_numeric(s)  # , errors="coerce")

File oSDH5rfs-py3.11\Lib\site-packages\pandas\core\tools\numeric.py:232, in to_numeric(arg, errors, downcast, dtype_backend)
    230 coerce_numeric = errors not in ("ignore", "raise")
    231 try:
--> 232     values, new_mask = lib.maybe_convert_numeric(  # type: ignore[call-overload]
    233         values,
    234         set(),
    235         coerce_numeric=coerce_numeric,
    236         convert_to_masked_nullable=dtype_backend is not lib.no_default
    237         or isinstance(values_dtype, StringDtype)
    238         and not values_dtype.storage == "pyarrow_numpy",
    239     )
    240 except (ValueError, TypeError):
    241     if errors == "raise":

File lib.pyx:2433, in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "0x32" at position 3

Feature Description

pandas.to_numeric converts strings and mixed-type data to numeric values, but it currently does not parse strings that represent integers in other bases (hexadecimal, octal, and binary) using the standard 0x, 0o, and 0b prefixes. Supporting these prefixes would bring it in line with Python's built-in int(), which accepts them when called with base=0, and with Python's integer literal syntax.

Modify the pandas.to_numeric function to detect strings starting with 0x (hexadecimal), 0o (octal), and 0b (binary) and convert them to their corresponding integer values.

  • Hexadecimal Strings:
    • Prefix: 0x or 0X
    • Example: 0x1A should convert to 26.
  • Octal Strings:
    • Prefix: 0o or 0O
    • Example: 0o32 should convert to 26.
  • Binary Strings:
    • Prefix: 0b or 0B
    • Example: 0b11010 should convert to 26.
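
As a quick sanity check on the examples above, here is a minimal sketch using only Python's built-in int(). Passing base=0 asks int() to infer the base from the prefix. The names examples and parsed are illustrative only, and nothing here reflects how pandas would implement the feature.

```python
# Illustrative check, not pandas code: int() with base=0 infers the base
# from the 0x / 0o / 0b prefix, in either letter case.
examples = ["0x1A", "0X1A", "0o32", "0O32", "0b11010", "0B11010"]

# Map each prefixed string to the integer that int() parses it as.
parsed = {text: int(text, 0) for text in examples}

for text, value in parsed.items():
    print(f"{text!r} -> {value}")
```

Every entry parses to 26, which is the value the proposal expects to_numeric to return for these inputs.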

Alternative Solutions

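import pandas as pd

def extended_to_numeric(series):
    """Workaround: convert a Series, also accepting 0x/0o/0b-prefixed strings."""
    def convert_value(value):
        if isinstance(value, str):
            try:
                # int() accepts a matching prefix when the base is given explicitly
                if value.startswith(('0x', '0X')):
                    return int(value, 16)
                elif value.startswith(('0o', '0O')):
                    return int(value, 8)
                elif value.startswith(('0b', '0B')):
                    return int(value, 2)
            except ValueError:
                # Malformed digits (e.g. '0xZZ') become NaN, like errors='coerce' below
                return float('nan')
        # Everything else goes through the regular conversion; failures become NaN
        return pd.to_numeric(value, errors='coerce')

    return series.apply(convert_value)

# Example usage
data = pd.Series(['0x1A', '0o32', '0b11010', '42', 'invalid'])
numeric_data = extended_to_numeric(data)
print(numeric_data)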
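
The workaround above matches prefixes with startswith, so a signed value such as "-0x1A" or one with surrounding whitespace is not recognized as prefixed and falls through to pd.to_numeric. Below is a rough, stdlib-only sketch of a stricter scalar parser that could be used in place of the startswith checks. The names parse_prefixed_int and _PREFIXED are hypothetical and not part of pandas, and returning None for non-matching input is an assumption made for illustration.

```python
import re

# Optional sign, a 0x/0o/0b prefix in either case, then one or more
# digit-like characters. Per-base digit validity is left to int() below.
_PREFIXED = re.compile(r"[+-]?0[xXoObB][0-9a-fA-F_]+")


def parse_prefixed_int(text):
    """Return the integer for a prefixed literal, or None if `text` is not one.

    Hypothetical helper for illustration; not a pandas API.
    """
    candidate = text.strip()
    if not _PREFIXED.fullmatch(candidate):
        return None  # no recognized prefix: the caller decides what to do
    try:
        # base=0 makes int() pick base 16, 8, or 2 from the prefix
        return int(candidate, 0)
    except ValueError:
        # Digits invalid for the base, e.g. "0b12"
        return None


print(parse_prefixed_int("-0x1A"))
print(parse_prefixed_int(" 0o32 "))
print(parse_prefixed_int("0b12"))
print(parse_prefixed_int("42"))
```

Returning None rather than NaN keeps "not a prefixed literal" distinct from "parsed successfully", so a caller could still hand unprefixed values such as "42" to pd.to_numeric.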

Additional Context

No response

beci added the Enhancement and Needs Triage labels Jul 8, 2024
@AnaDenisa
Contributor

take
