API/BUG: support for operations with custom objects / object dtype #64107

@jorisvandenbossche

Consider the following dummy example of a custom Python object that supports some arithmetic operations:

class MyObject:
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        if hasattr(other, "dtype"):
            return NotImplemented
        return MyObject(self.val + other)
    def __radd__(self, other):
        if hasattr(other, "dtype"):
            return NotImplemented
        return MyObject(other + self.val)
    def __repr__(self):
        return f"<MyObject({self.val})>"

Such objects generally work in pandas containers, i.e. we defer to the scalar operation and assemble the results:

import numpy as np
import pandas as pd

# operation of scalar/arr/series with numeric other
arr = np.array([MyObject(1), MyObject(2)])
ser = pd.Series([1, 2])

print(ser + arr[0])
print(ser + arr)
print(ser + pd.Series(arr))
print(arr[0] + ser)
print(arr + ser)
print(pd.Series(arr) + ser)
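
For reference, the "defer to the scalar operation and assemble the results" behavior can be seen with plain NumPy object arrays, independent of pandas. This is a minimal sketch, not the pandas code path; it repeats the `MyObject` class from above so it runs on its own, and `nums`, `objs`, `result` and `vals` are illustrative names:

```python
import numpy as np


class MyObject:
    # Same dummy class as in the issue description.
    def __init__(self, val):
        self.val = val

    def __add__(self, other):
        if hasattr(other, "dtype"):
            return NotImplemented
        return MyObject(self.val + other)

    def __radd__(self, other):
        if hasattr(other, "dtype"):
            return NotImplemented
        return MyObject(other + self.val)

    def __repr__(self):
        return f"<MyObject({self.val})>"


# With dtype=object, NumPy applies Python's `+` to each pair of elements,
# so `1 + MyObject(1)` ends up in MyObject.__radd__.
nums = np.array([1, 2], dtype=object)
objs = np.array([MyObject(1), MyObject(2)], dtype=object)

result = nums + objs                # object array holding MyObject instances
vals = [r.val for r in result]      # unwrap for inspection
```

The result stays an object-dtype array, which is the kind of container pandas can wrap without needing to understand the elements.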

When the other operands were strings, this worked in pandas 2.x as well:

# operation of scalar/arr/series with string (object) other
arr = np.array([MyObject("a"), MyObject("b")])
ser = pd.Series(["1", "2"])

print(ser + arr[0])
print(ser + arr)
print(ser + pd.Series(arr))
print(arr[0] + ser)
print(arr + ser)
print(pd.Series(arr) + ser)

However, this case no longer works in pandas 3.0 with the default str dtype.

We specifically fixed this for pathlib.Path objects for the scalar case (#61940 / #62229), but then got a report for such objects in a Series (#63832). In the end, while we can "fix" this again specifically for Path objects, it is a more general issue that affects any generic Python object.

I think in general pandas has always been quite flexible in supporting custom objects, and IMO we should continue to do that (yes, you can define ExtensionDtypes for full control over handling of custom objects, but that is often overkill). Once we detect something that cannot be inferred as a non-object dtype, we can use a slower element-wise code path. And IMO we should keep doing that for newer dtypes as well (such as the new str dtype).
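
To make the proposed fallback concrete, here is a rough sketch of the dispatch idea in plain Python/NumPy. It is not pandas internals: `add_with_fallback` is a hypothetical helper, the list comprehension stands in for a real vectorized string path, and the `MyObject` class is repeated so the snippet is self-contained.

```python
import operator

import numpy as np


class MyObject:
    # Same dummy class as in the issue description.
    def __init__(self, val):
        self.val = val

    def __add__(self, other):
        if hasattr(other, "dtype"):
            return NotImplemented
        return MyObject(self.val + other)

    def __radd__(self, other):
        if hasattr(other, "dtype"):
            return NotImplemented
        return MyObject(other + self.val)

    def __repr__(self):
        return f"<MyObject({self.val})>"


def add_with_fallback(strings, other):
    """Hypothetical helper: add `other` to a list of str values.

    `other` is either a scalar or a list of the same length.
    """
    others = other if isinstance(other, list) else [other] * len(strings)
    if len(others) != len(strings):
        raise ValueError("operands must have the same length")

    if all(isinstance(o, str) for o in others):
        # "Fast" path stand-in: everything is a string, result stays string-typed.
        return np.array([s + o for s, o in zip(strings, others)], dtype=str)

    # Fallback: the other operand cannot be treated as strings, so defer to
    # the scalar operation element by element and collect into object dtype.
    out = np.empty(len(strings), dtype=object)
    for i, (s, o) in enumerate(zip(strings, others)):
        out[i] = operator.add(s, o)
    return out


fast = add_with_fallback(["1", "2"], "x")
slow = add_with_fallback(["1", "2"], [MyObject("a"), MyObject("b")])
```

Here `fast` keeps a string dtype, while `slow` is an object array of `MyObject` results produced through `MyObject.__radd__`. The fallback costs a Python-level loop, but only operands that cannot be inferred as strings pay for it.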

Labels: Numeric Operations (Arithmetic, Comparison, and Logical operations)