Pandas – multiple condition lookup speed

bhawya asked 5 months ago

I’m working with some historical baseball data and trying to get matchup information (batter/pitcher) for previous games.

Example data:

import pandas as pd

data = {'ID': ['A','A','A','A','A','A','B','B','B','B','B'],
        'Year': ['2017-05-01', '2017-06-03', '2017-08-02', '2018-05-30', '2018-07-23', '2018-09-14', '2017-06-01', '2017-08-03', '2018-05-15', '2018-07-23', '2017-05-01'],
        'ID2': [1,2,3,2,2,1,2,2,2,1,1],
        'Score 2': [1,4,5,7,5,5,6,1,4,5,6],
        'Score 3': [1,4,5,7,5,5,6,1,4,5,6],
        'Score 4': [1,4,5,7,5,5,6,1,4,5,6]}
df = pd.DataFrame(data)

lookup_data = {"First_Person" : ['A', 'B'],
             "Second_Person" : ['1', '2'],
             "Year" : ['2018', '2018']}

lookup_df = pd.DataFrame(lookup_data)

lookup_df holds the current matchups; df holds the historical data as well as the current matchups.

For example, for Person A against Person 2, I want the results of all of their matchups on any earlier date.

I can do this with:

history_list = []
def get_history(row, df, hist_list):
    # Filter df to matchups between both players before the matchup date.
    # ID2 holds integers but Second_Person holds strings, so cast before
    # comparing; otherwise the filter never matches.
    mask = ((df['ID'] == row['First_Person'])
            & (df['ID2'] == int(row['Second_Person']))
            & (df['Year'] < row['Year']))
    # Sum only the score columns over the matching history.
    history = df.loc[mask, ['Score 2', 'Score 3', 'Score 4']].sum()
    # Add the totals plus a key identifying the matchup to the results list.
    hist_list.append(list(history.values) + [row['Year'] + row['First_Person'] + row['Second_Person']])

and then execute with apply like so:

lookup_df.apply(get_history, df=df, hist_list = history_list, axis=1)

Expected results would be something like:

First_Person  Matchup date  Second_Person  Historical scores
A             2018-07-23    2              11 11 11
B             2018-05-15    2              7  7  7

But this is pretty slow: the filtering operation takes around 50 ms per lookup.

Is there a better way I can approach this problem? At that rate, running across 250k historical matchups would take over 3 hours (250,000 × 50 ms ≈ 3.5 hours).
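
One direction that might help, sketched here as an assumption rather than a confirmed fix: replace the per-row filter with a single merge on the two player columns, keep only the historical rows dated before each matchup, and sum per matchup with groupby. The lookup values below are assumptions chosen to match the expected results table (integer opponent IDs and full matchup dates), and the names hist, lookups, merged, earlier, result and score_cols are illustrative only.

```python
import pandas as pd

# Historical data, copied from the question.
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'Year': ['2017-05-01', '2017-06-03', '2017-08-02', '2018-05-30',
             '2018-07-23', '2018-09-14', '2017-06-01', '2017-08-03',
             '2018-05-15', '2018-07-23', '2017-05-01'],
    'ID2': [1, 2, 3, 2, 2, 1, 2, 2, 2, 1, 1],
    'Score 2': [1, 4, 5, 7, 5, 5, 6, 1, 4, 5, 6],
    'Score 3': [1, 4, 5, 7, 5, 5, 6, 1, 4, 5, 6],
    'Score 4': [1, 4, 5, 7, 5, 5, 6, 1, 4, 5, 6],
})

# Assumption: lookup values taken from the expected results table,
# with integer opponent IDs and full matchup dates.
lookup_df = pd.DataFrame({
    'First_Person': ['A', 'B'],
    'Second_Person': [2, 2],
    'Year': ['2018-07-23', '2018-05-15'],
})

score_cols = ['Score 2', 'Score 3', 'Score 4']

# Parse the date strings so the comparison is on real dates.
hist = df.assign(Year=pd.to_datetime(df['Year']))
lookups = lookup_df.assign(Year=pd.to_datetime(lookup_df['Year']))

# One merge pairs every lookup with all historical rows for the same
# two players; the historical date column becomes 'Year_hist'.
merged = lookups.merge(
    hist,
    left_on=['First_Person', 'Second_Person'],
    right_on=['ID', 'ID2'],
    how='left',
    suffixes=('', '_hist'),
)

# Keep only games played strictly before each matchup date.
earlier = merged[merged['Year_hist'] < merged['Year']]

# Sum the score columns per matchup.
result = (
    earlier
    .groupby(['First_Person', 'Second_Person', 'Year'])[score_cols]
    .sum()
    .reset_index()
)
print(result)
```

In this sketch a lookup with no earlier games simply drops out of result, and the merged frame holds one row per (lookup, historical game) pair, so memory use grows with how many games each pair has played.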

1 Answer
Best Answer
Matthias answered 5 months ago