Consider the following dummy example of a custom Python object that supports some arithmetic operations:
```python
class MyObject:
    def __init__(self, val):
        self.val = val

    def __add__(self, other):
        # let array-likes (anything with a .dtype) handle the operation instead
        if hasattr(other, "dtype"):
            return NotImplemented
        return MyObject(self.val + other)

    def __radd__(self, other):
        if hasattr(other, "dtype"):
            return NotImplemented
        return MyObject(other + self.val)

    def __repr__(self):
        return f"<MyObject({self.val})>"
```
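The `hasattr(other, "dtype")` checks rely on Python's binary-operator protocol. When `__add__` returns `NotImplemented`, Python tries the other operand's reflected `__radd__`, so an array-like container gets to handle the operation and can call the scalar `+` once per element. The sketch below shows that dispatch in plain Python. `FakeArray` is a hypothetical stand-in for a container with a `dtype` attribute. It is not how NumPy or pandas implement this internally.

```python
class MyObject:
    # Same dummy class as above.
    def __init__(self, val):
        self.val = val

    def __add__(self, other):
        if hasattr(other, "dtype"):
            return NotImplemented  # step aside for array-likes
        return MyObject(self.val + other)

    def __radd__(self, other):
        if hasattr(other, "dtype"):
            return NotImplemented
        return MyObject(other + self.val)

    def __repr__(self):
        return f"<MyObject({self.val})>"


class FakeArray:
    """Hypothetical array-like container (not a real library type)."""

    dtype = "object"  # the attribute MyObject checks for

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # container + scalar: apply the scalar operation element by element
        return FakeArray(v + other for v in self.values)

    def __radd__(self, other):
        # scalar + container: reached after MyObject.__add__ returned NotImplemented
        return FakeArray(other + v for v in self.values)

    def __repr__(self):
        return f"FakeArray({self.values!r})"


obj = MyObject(1)
print(obj + 1)                          # <MyObject(2)>
print(obj.__add__(FakeArray([1, 2])))   # NotImplemented
print(obj + FakeArray([1, 2]))          # FakeArray([<MyObject(2)>, <MyObject(3)>])
```

In the last line, `MyObject.__add__` declines, so `FakeArray.__radd__` runs. It then calls `MyObject(1) + 1` and `MyObject(1) + 2`, which succeed because plain ints have no `dtype`.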
Using such objects in pandas containers generally works, i.e. we defer to the scalar operation and assemble the results:
```python
import numpy as np
import pandas as pd

# operation of scalar/arr/series with numeric other
arr = np.array([MyObject(1), MyObject(2)])
ser = pd.Series([1, 2])
print(ser + arr[0])
print(ser + arr)
print(ser + pd.Series(arr))
print(arr[0] + ser)
print(arr + ser)
print(pd.Series(arr) + ser)
```
When the other operand consists of strings, this also worked in pandas 2.x:
```python
# operation of scalar/arr/series with string (object) other
arr = np.array([MyObject("a"), MyObject("b")])
ser = pd.Series(["1", "2"])
print(ser + arr[0])
print(ser + arr)
print(ser + pd.Series(arr))
print(arr[0] + ser)
print(arr + ser)
print(pd.Series(arr) + ser)
```
However, this case no longer works in pandas 3.0 with the default `str` dtype.

We specifically fixed this for `pathlib.Path` objects in the scalar case (#61940 / #62229), but then got a report for such objects in a Series (#63832). And while we could "fix" this again specifically for `Path` objects, this is a more general issue that affects any generic Python object.

I think in general pandas has always been quite flexible in supporting custom objects, and IMO we should continue to do that (yes, you can define `ExtensionDtype`s for full control over the handling of custom objects, but that is often overkill). Once we detect an operand that we cannot infer to a non-object dtype, we can fall back to a slower element-wise code path. And IMO we should keep doing that also for newer dtypes (such as the `str` dtype now).
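A plain-Python model of such a fallback might look like the sketch below. It is not pandas code. `add_str_values` is a hypothetical helper that takes the values of a string column (a list of `str`/`None`) and an operand. It keeps the native string path when the operand is a string or a list of strings. For any other operand, it drops to an element-wise loop that calls `+` per element and labels the result `"object"` instead of raising. The missing-value handling and the `"str"`/`"object"` labels standing in for result dtypes are assumptions for illustration.

```python
import operator
from typing import Any, List, Optional, Tuple


def _is_str_like(other: Any) -> bool:
    # Assumption: only str scalars, or lists of str/None, qualify for the string path.
    if isinstance(other, str):
        return True
    if isinstance(other, list):
        return all(x is None or isinstance(x, str) for x in other)
    return False


def add_str_values(
    values: List[Optional[str]], other: Any, reflected: bool = False
) -> Tuple[str, List[Any]]:
    """Hypothetical dispatcher for `values + other` (or `other + values` if reflected).

    Returns (result_kind, results); result_kind mimics the result dtype.
    """
    others = other if isinstance(other, list) else [other] * len(values)
    if len(others) != len(values):
        raise ValueError("operands must have the same length")

    if _is_str_like(other):
        # String path: both sides are str (or missing), so the result stays str.
        out = []
        for a, b in zip(values, others):
            if a is None or b is None:
                out.append(None)
            else:
                out.append(b + a if reflected else a + b)
        return "str", out

    # Fallback: unknown objects, so defer to each element's own operator
    # and keep whatever comes back as generic Python objects.
    out = []
    for a, b in zip(values, others):
        if a is None:
            out.append(None)  # assumption: missing string values stay missing
        else:
            out.append(operator.add(b, a) if reflected else operator.add(a, b))
    return "object", out


class MyObject:
    # Trimmed copy of the dummy class from the issue, for a self-contained demo.
    def __init__(self, val):
        self.val = val

    def __add__(self, other):
        return MyObject(self.val + other)

    def __radd__(self, other):
        return MyObject(other + self.val)

    def __repr__(self):
        return f"<MyObject({self.val})>"


print(add_str_values(["1", "2"], "x"))                            # ('str', ['1x', '2x'])
print(add_str_values(["1", None], MyObject("a")))                 # ('object', [<MyObject(1a)>, None])
print(add_str_values(["1", "2"], MyObject("a"), reflected=True))  # ('object', [<MyObject(a1)>, <MyObject(a2)>])
```

The type check only decides which path to take. The fallback needs no knowledge of `MyObject`, so it would cover arbitrary custom objects without special-casing individual types.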