Fast Replace with Bottleneck

Looking at the Bottleneck library for ideas to speed up pandas Series/DataFrame.replace, I found a set of posted benchmarks showing Bottleneck's replace running roughly 4 times as fast as an implementation using numpy.putmask (with numpy.isnan creating an intermediate mask array). To verify, I ran my own benchmarks. Somewhat to my surprise, the speedup only occurs when the value being replaced is NaN.

Here's the setup: a 1,000,000-element floating-point array with a large share of its elements set to NaN or 0, which are the values we will replace:

import bottleneck as bn
import numpy as np
n = 1000000
arr = np.random.randn(n)
# sizes must be integers, and randint's upper bound is exclusive
arr[np.random.randint(0, n, 400000)] = np.nan
arr[np.random.randint(0, n, 400000)] = 0

Look for NaN and replace:

a = arr.copy()
%timeit np.putmask(a, np.isnan(a), -1)
>> 100 loops, best of 3: 3.31 ms per loop

a = arr.copy()
%timeit bn.replace(a, np.nan, -1)
>> 1000 loops, best of 3: 783 us per loop

Look for other values and replace:

a = arr.copy()
%timeit np.putmask(a, a == 0, -2)
>>  1000 loops, best of 3: 1.62 ms per loop

a = arr.copy()
%timeit bn.replace(a, 0, -2)
>>  100 loops, best of 3: 2.47 ms per loop

I thought numpy.isnan might account for the difference, but it doesn't: even with the mask computed ahead of time, putmask is still roughly three times slower than bn.replace:

a = arr.copy()
mask = np.isnan(a)
%timeit np.putmask(a, mask, -1)
>>  100 loops, best of 3: 2.29 ms per loop
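
One caveat when reproducing these numbers: both np.putmask and bn.replace modify the array in place, so after the first %timeit iteration there is nothing left in `a` to replace. The following is a minimal sketch, not the setup used for the timings above, that uses the standard library's timeit so every timed call starts from a fresh copy. The helper name `best_time` and the smaller array size are assumptions made for illustration.

```python
import timeit

import numpy as np

n = 100_000  # smaller than the post's array so the sketch runs quickly
rng = np.random.default_rng(0)
arr = rng.standard_normal(n)
arr[rng.integers(0, n, n // 3)] = np.nan


def best_time(stmt, repeat=20):
    """Hypothetical helper: best wall time of `stmt` over `repeat` runs."""
    # number=1 means the setup (a fresh copy of arr) runs before every timed call.
    times = timeit.repeat(
        stmt,
        setup="a = arr.copy()",
        repeat=repeat,
        number=1,
        globals=globals(),
    )
    return min(times)


t_putmask = best_time("np.putmask(a, np.isnan(a), -1)")
print(f"putmask + isnan: {t_putmask * 1e3:.3f} ms")
```

The same helper can time "bn.replace(a, np.nan, -1)" once bottleneck is imported in the same namespace. Single-call timings are noisier than %timeit's loops, so the minimum over several repeats is reported.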

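I'll have to look deeper at the Bottleneck/NumPy code, but as it stands, Bottleneck's replace looks useful for fillna (about 4 times faster here) but probably not for replacing arbitrary values, where it was slower than numpy.putmask (2.47 ms vs. 1.62 ms).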
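
Based only on the timings above, one way to act on this is to dispatch on the value being replaced. This is a rough sketch, not pandas' actual implementation: `replace_inplace` is a hypothetical helper name, it assumes a floating-point ndarray, and it falls back to NumPy when bottleneck is not installed.

```python
import numpy as np

try:
    import bottleneck as bn  # optional; the sketch falls back to NumPy without it
except ImportError:
    bn = None


def replace_inplace(a, old, new):
    """Hypothetical helper: replace `old` with `new` in float array `a`, in place."""
    if isinstance(old, float) and np.isnan(old):
        if bn is not None:
            # NaN case: the path where bottleneck was faster in the benchmarks above
            bn.replace(a, np.nan, new)
        else:
            np.putmask(a, np.isnan(a), new)
    else:
        # Arbitrary values: putmask was faster in the benchmarks above
        np.putmask(a, a == old, new)
    return a


demo = np.array([1.0, np.nan, 0.0, np.nan, 2.0])
replace_inplace(demo, np.nan, -1)
replace_inplace(demo, 0, -2)
print(demo)
```

Whether the dispatch pays off on other array sizes, dtypes, or library versions would need to be measured rather than assumed.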

About Chang She

Engineer @ Cloudera. Ex-cofounder/CTO @ DataPad. Builder of data tools. Recovering financial quant.