
ENH: Add serialization options to to_csv #48478

Closed
@adrien-berchet

Description


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Sometimes we store Python objects (e.g. numpy.ndarray objects) in a column of a DataFrame and want to save the DataFrame to a CSV file. In this case, each array is converted to a string using its __str__ method: np.array([0, 1, 2]) becomes [0 1 2], with no separators between the values, and large arrays (more than 1000 elements by default) are abbreviated with "...". This format is hard to parse back later. I suggest adding an option similar to the cls parameter of json.dumps, which allows encoding specific types in a custom format.
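For reference, the json.dumps mechanism mentioned above works by subclassing json.JSONEncoder and overriding its default() method, which is only called for objects the encoder cannot serialize natively. A minimal sketch for NumPy arrays (the class name NumpyArrayEncoder is illustrative, not part of any library):

```python
import json

import numpy as np


class NumpyArrayEncoder(json.JSONEncoder):
    """Serialize numpy arrays as plain JSON lists."""

    def default(self, o):
        # Only called for objects that json cannot encode on its own
        if isinstance(o, np.ndarray):
            return o.tolist()
        # Defer to the base class, which raises TypeError for unsupported types
        return super().default(o)


encoded = json.dumps({"c": np.arange(3)}, cls=NumpyArrayEncoder)
print(encoded)
```

The proposal for to_csv follows the same idea: the caller supplies the type-specific conversion and the writer applies it only where needed.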

Feature Description

The user can define an encoder:

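import numpy as np

def csv_encoder(obj):
    # Store arrays as lists, which are written as "[0, 1, 2]" and are easy to parse back
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    # Leave every other value unchanged
    return obj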
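If several types need custom handling, one option (an illustrative alternative, not part of the proposal) is to build the encoder with functools.singledispatch from the standard library, registering one small function per type, somewhat like overriding JSONEncoder.default. The set handling below is only an example of a second registered type:

```python
import functools

import numpy as np


@functools.singledispatch
def csv_encoder(obj):
    # Fallback for unregistered types: leave the value unchanged
    return obj


@csv_encoder.register
def _(obj: np.ndarray):
    # Arrays become lists, whose str() is "[0, 1, 2]" rather than "[0 1 2]"
    return obj.tolist()


@csv_encoder.register
def _(obj: set):
    # Sets become sorted lists so the written output is deterministic
    return sorted(obj)
```

The resulting csv_encoder is still a plain callable taking one value, so it could be passed wherever a single encoder function is expected.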

This encoder would then be passed to to_csv, which would apply it to each element of the DataFrame:

df.to_csv("/tmp/file.csv", element_encoder=csv_encoder)

Internally, to_csv could simply call

df.applymap(element_encoder)

right before writing the file.
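Until such an option exists, the proposed behaviour can be approximated with a small wrapper. This is a minimal sketch, assuming the encoder is applied element-wise before writing; to_csv_with_encoder is a hypothetical helper, not a pandas API. It uses DataFrame.map when available (pandas 2.1+, where applymap was deprecated in its favour) and falls back to applymap on older versions:

```python
import numpy as np
import pandas as pd


def csv_encoder(obj):
    # Convert arrays to lists so they are written as "[0, 1, 2]"
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    return obj


def to_csv_with_encoder(df, path_or_buf=None, element_encoder=None, **kwargs):
    """Hypothetical helper: encode every cell, then delegate to DataFrame.to_csv."""
    if element_encoder is not None:
        # DataFrame.map exists from pandas 2.1; older versions only have applymap
        if hasattr(pd.DataFrame, "map"):
            df = df.map(element_encoder)
        else:
            df = df.applymap(element_encoder)
    # With path_or_buf=None, to_csv returns the CSV text as a string
    return df.to_csv(path_or_buf, **kwargs)


df = pd.DataFrame({"a": [1, 2], "c": [np.arange(0, 3), np.arange(1, 4)]})
csv_text = to_csv_with_encoder(df, element_encoder=csv_encoder, index=False)
print(csv_text)
```

Because the encoded lists contain commas, the CSV writer quotes those cells, so the column stays intact for a standard CSV reader.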

Alternative Solutions

Another solution is to format the affected columns manually before each call to to_csv():

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"], "c": [np.arange(0, 3), np.arange(1, 4), np.arange(2, 5)]})
# Convert each array to a list so it is written as "[0, 1, 2]" instead of "[0 1 2]"
formatted_df = df.copy()
formatted_df["c"] = formatted_df["c"].apply(lambda x: x.tolist())
formatted_df.to_csv("/tmp/file.csv")
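The benefit of the list format shows up when reading the file back: the string form of a list of numbers is valid JSON, so read_csv can decode the column through its converters argument. A sketch, assuming the column only holds numeric lists; it drops column "b" for brevity and writes to an io.StringIO buffer instead of /tmp/file.csv so it is self-contained:

```python
import io
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "c": [np.arange(0, 3), np.arange(1, 4), np.arange(2, 5)]})
formatted_df = df.copy()
formatted_df["c"] = formatted_df["c"].apply(lambda x: x.tolist())

# Write to an in-memory buffer, then rewind it for reading
buf = io.StringIO()
formatted_df.to_csv(buf, index=False)
buf.seek(0)

# Parse each "[0, 1, 2]" cell as JSON, then turn it back into an array
restored = pd.read_csv(buf, converters={"c": lambda s: np.array(json.loads(s))})
```

With the default __str__ output ([0 1 2]) json.loads would fail on these cells, which is the parsing problem the issue describes.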


Labels: Enhancement; IO CSV (read_csv, to_csv); Needs Triage (issue that has not been reviewed by a pandas team member)
