ENH: Extend to_numeric to Convert Hexadecimal, Octal, and Binary Strings with Prefixes #59207

Open
@beci

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Extend the pandas.to_numeric function to convert strings that represent hexadecimal, octal, and binary numbers when they start with the corresponding prefix (0x, 0o, 0b). Currently such strings are not recognized, and pd.to_numeric raises a ValueError:

import pandas as pd

s = pd.Series(["1.0", "2", -3, "0x32"])
pd.to_numeric(s)  # , errors="coerce")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File lib.pyx:2391, in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "0x32"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[39], line 2
      1 s = pd.Series(["1.0", "2", -3, "0x32"])
----> 2 pd.to_numeric(s)  # , errors="coerce")

File oSDH5rfs-py3.11\Lib\site-packages\pandas\core\tools\numeric.py:232, in to_numeric(arg, errors, downcast, dtype_backend)
    230 coerce_numeric = errors not in ("ignore", "raise")
    231 try:
--> 232     values, new_mask = lib.maybe_convert_numeric(  # type: ignore[call-overload]
    233         values,
    234         set(),
    235         coerce_numeric=coerce_numeric,
    236         convert_to_masked_nullable=dtype_backend is not lib.no_default
    237         or isinstance(values_dtype, StringDtype)
    238         and not values_dtype.storage == "pyarrow_numpy",
    239     )
    240 except (ValueError, TypeError):
    241     if errors == "raise":
    [241](file:///oSDH5rfs-py3.11/Lib/site-packages/pandas/core/tools/numeric.py:241)     if errors == "raise":

File lib.pyx:2433, in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "0x32" at position 3
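For comparison, passing errors="coerce" (the option left commented out in the snippet above) avoids the exception, but the prefixed value is silently replaced with NaN instead of being converted. A minimal check, assuming a recent pandas 2.x release:

```python
import pandas as pd

s = pd.Series(["1.0", "2", -3, "0x32"])

# errors="coerce" turns every unparseable entry into NaN instead of raising
result = pd.to_numeric(s, errors="coerce")

# The first three values convert; "0x32" (decimal 50) is lost as NaN
print(result.tolist())
```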

Feature Description

pandas.to_numeric is a versatile function for converting various data types to numeric values. However, it does not currently convert strings that represent hexadecimal, octal, or binary numbers written with the standard prefixes. Supporting them would make the function more useful and consistent with core Python: the built-in int() accepts these prefixes when called with base=0, following the integer literal syntax specified in PEP 3127.
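For reference, the snippet below shows how the Python built-ins treat these prefixes: int() accepts them with base=0 (or with the matching explicit base), while float() does not. It also shows a caveat a pandas implementation would likely need to handle: with base=0, int() rejects decimal strings with leading zeros such as "010", so routing every string through int(value, 0) would not be a drop-in replacement. The rejects helper is a hypothetical name used only for this illustration.

```python
# int() with base=0 infers the base from the prefix (prefixes are case-insensitive)
parsed = {text: int(text, 0) for text in ["0x1A", "0O32", "0b11010", "-0x1A"]}
print(parsed)

# An explicit base also accepts its own matching prefix
print(int("0x1A", 16))


def rejects(func, *args):
    """Hypothetical helper: return True if func(*args) raises ValueError."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


# float() does not understand base prefixes
print(rejects(float, "0x1A"))
# With base=0, a decimal literal with a leading zero is rejected
print(rejects(int, "010", 0))
```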

Modify the pandas.to_numeric function to detect strings starting with 0x (hexadecimal), 0o (octal), and 0b (binary) and convert them to their corresponding integer values.

  • Hexadecimal Strings:
    • Prefix: 0x or 0X
    • Example: 0x1A should convert to 26.
  • Octal Strings:
    • Prefix: 0o or 0O
    • Example: 0o32 should convert to 26.
  • Binary Strings:
    • Prefix: 0b or 0B
    • Example: 0b11010 should convert to 26.

Alternative Solutions

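import numpy as np
import pandas as pd

def extended_to_numeric(series):
    def convert_value(value):
        if isinstance(value, str):
            # Choose the base from the prefix (either case)
            base = None
            if value.startswith(('0x', '0X')):
                base = 16
            elif value.startswith(('0o', '0O')):
                base = 8
            elif value.startswith(('0b', '0B')):
                base = 2
            if base is not None:
                try:
                    # int() accepts the matching prefix when given an explicit base
                    return int(value, base)
                except ValueError:
                    # Mirror errors='coerce': malformed literals such as '0xZZ' become NaN
                    return np.nan
        # Everything else goes through the regular coercing to_numeric
        return pd.to_numeric(value, errors='coerce')

    return series.apply(convert_value)

# Example usage
data = pd.Series(['0x1A', '0o32', '0b11010', '42', 'invalid'])
numeric_data = extended_to_numeric(data)
print(numeric_data)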
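A possible lighter-weight variant of the workaround above, sketched under the assumption that only strings with a 0x/0o/0b prefix (optionally signed, any case) need special handling. to_numeric_with_prefixes and parse_prefixed are hypothetical helper names, not pandas API. Prefixed strings are converted with int(value, 0); everything else, including prefixed strings that fail to parse, is left for a single pd.to_numeric call so that the usual errors semantics still apply.

```python
import re

import pandas as pd

# Hypothetical helpers for illustration only; not part of pandas.
# Matches optional whitespace, an optional sign, then a 0x/0o/0b prefix (any case).
_PREFIXED = re.compile(r"\s*[+-]?0[xob]", re.IGNORECASE)


def parse_prefixed(value):
    """Convert a prefixed string to int; return anything else unchanged."""
    if isinstance(value, str) and _PREFIXED.match(value):
        try:
            # base=0 lets int() pick the base from the prefix itself
            return int(value, 0)
        except ValueError:
            # Malformed literal such as "0xZZ": leave it for pd.to_numeric
            return value
    return value


def to_numeric_with_prefixes(series, errors="raise"):
    """pd.to_numeric, with 0x/0o/0b strings pre-converted to ints."""
    return pd.to_numeric(series.map(parse_prefixed), errors=errors)


data = pd.Series(["0x1A", "0o32", "0b11010", "42", "-0x1A", "invalid"])
result = to_numeric_with_prefixes(data, errors="coerce")
print(result)
```

Unlike the apply version, which calls pd.to_numeric once per element, this calls it once for the whole Series, and a malformed literal such as "0xZZ" raises or becomes NaN depending on errors, just like any other unparseable string.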

Additional Context

No response

Metadata

Assignees

Labels

Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests