
In pandas, if I use `Series.apply()` to apply a function that contains an inner function definition, for example:

```
import pandas as pd

def square_times_two(x):
    def square(y):
        return y ** 2
    return square(x) * 2

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame.from_dict(data)

df["col_3"] = df.col_1.apply(square_times_two)
```

Is the inner function redefined for each row? Would there be a performance impact to having many inner functions in a function applied to a large series?

Ethan
Alex
This seems better suited to Stack Overflow. But you're also almost at a timing script: just generate a much larger frame, probably using `numpy.random`, and use `timeit` or something similar. – Ben Reiniger Aug 29 '22 at 13:14
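Following that suggestion, a rough timing script might look like the sketch below. The frame size (`N_ROWS`), the random seed, and the `_nested`/`_flat` function names are illustrative assumptions, not taken from the question, and absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def square_times_two_nested(x):
    # The inner function object is rebuilt every time this outer function runs.
    def square(y):
        return y ** 2
    return square(x) * 2


def square(y):
    return y ** 2


def square_times_two_flat(x):
    # Calls a module-level helper, so no function object is built per call.
    return square(x) * 2


# Hypothetical benchmark size; scale it toward your real data as needed.
N_ROWS = 200_000
rng = np.random.default_rng(0)
df = pd.DataFrame({"col_1": rng.integers(0, 100, size=N_ROWS)})

for fn in (square_times_two_nested, square_times_two_flat):
    # Best of three single runs, to reduce noise from other processes.
    seconds = min(timeit.repeat(lambda: df.col_1.apply(fn), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds:.3f} s")
```

Comparing the two printed times gives a direct measure of what the nested definition costs on your own setup.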

1 Answer


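The inner function's body is compiled only once, but there is a small overhead: every call of the outer function executes the `def` statement again, which creates a new function object for the inner function. This should be negligible, though, since the inner function does not use any variables from the outer one (so no closure has to be built).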
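One way to check this is to have the outer function hand back the inner function it just built. The variant below (`square_times_two_inspect`) is a hypothetical modification made only for inspection; it shows that each call produces a distinct function object, while all of them share the same compiled code object.

```python
def square_times_two_inspect(x):
    # Hypothetical variant of square_times_two that also returns the
    # inner function object, purely so it can be inspected.
    def square(y):
        return y ** 2
    return square(x) * 2, square


_, first = square_times_two_inspect(3)
_, second = square_times_two_inspect(2)

print(first is second)                    # False: a new function object per call
print(first.__code__ is second.__code__)  # True: the compiled body is reused
print(first.__closure__)                  # None: no outer variables are captured
```

So the per-row cost is building a small function object, not recompiling the inner body.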

For the same reason, though, there seems to be no need to define the inner function there. You could just move it to the same level as the outer one:

```
import pandas as pd

def square(y):
    return y ** 2

def square_times_two(x):
    return square(x) * 2

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame.from_dict(data)

df["col_3"] = df.col_1.apply(square_times_two)
```
buddemat
I suspected as much. I have a coworker who claims to do it for scoping purposes, but I feel like a small bit of overhead multiplied over 16-18 million records becomes a noticeable difference. My instinct would have been to do as you suggested. – Alex Sep 03 '22 at 20:02
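At 16-18 million rows, the cost of making one Python function call per row with `apply` is likely to matter more than the cost of building the inner function. When the logic is plain arithmetic, as in this toy example, a vectorized column expression avoids per-row Python calls entirely. This is an alternative not raised in the thread, and it assumes the real function can be rewritten with column operations:

```python
import pandas as pd


def square(y):
    return y ** 2


def square_times_two(x):
    return square(x) * 2


data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame.from_dict(data)

# Row by row: one Python function call per element.
df["col_3"] = df.col_1.apply(square_times_two)

# Vectorized: the same arithmetic applied to the whole column at once.
df["col_4"] = df.col_1 ** 2 * 2

print(df)  # col_3 and col_4 both hold 18, 8, 2, 0
```

If the real per-row logic involves branching or string handling, it may not translate this directly, and `apply` with a top-level function remains the simpler choice.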