Monday, March 14, 2016

How to efficiently rebuild a pandas HDFStore table when append fails


I am using the HDFStore in pandas to store data frames produced by an ongoing iterative process. At each iteration, I append to a table in the HDFStore. Here is a toy example:

import pandas as pd
from pandas import HDFStore
import numpy as np
from random import choice
from string import ascii_letters

# pool of characters used to build random strings of growing length
alphanum = np.array(list(ascii_letters) + [str(d) for d in range(10)])

def hdfstore_append(storefile, key, df, format="t", columns=None, data_columns=None):
    if df is None:
        return
    if key[0] != '/':
        key = '/' + key
    with HDFStore(storefile) as store:
        if key not in store.keys():
            # first write: create the table
            store.put(key, df, format=format, columns=columns,
                      data_columns=data_columns)
        else:
            try:
                store.append(key, df)
            except Exception as inst:
                # append failed: read the whole table back, concatenate the
                # new rows, and rewrite the table from scratch
                df = pd.concat([store.get(key), df])
                store.put(key, df, format=format, columns=columns,
                          data_columns=data_columns)

storefile = "db.h5"
for i in range(0, 100):
    df = pd.DataFrame([dict(n=np.random.randn(),
                            s=''.join(alphanum[np.random.randint(1, len(alphanum),
                                                                 np.random.randint(1, 2 * (i + 1)))]))],
                      index=[i])
    hdfstore_append(storefile, '/SO/df', df, columns=df.columns, data_columns=True)

The hdfstore_append function guards against the various exceptions HDFStore.append throws, and rebuilds the table when necessary. The issue with this approach is that it gets very slow as the table in the store grows, because every failed append reads the entire table back into memory and rewrites it.

Is there a more efficient way to do this?
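For what it's worth, in the toy example above the appends fail because the string column s keeps outgrowing the itemsize that was fixed the first time the table was written. When a reasonable upper bound on the string width is known in advance, passing min_itemsize on the writes pre-sizes the column so that later, longer strings still fit and the rebuild never happens. Below is a minimal sketch along those lines; the hdfstore_append_presized name, the db_presized.h5 file name, and the 200-character bound (chosen to cover the longest string the loop above can generate) are only illustrative.

import pandas as pd
from pandas import HDFStore
import numpy as np
from string import ascii_letters

alphanum = np.array(list(ascii_letters) + [str(d) for d in range(10)])

def hdfstore_append_presized(storefile, key, df, min_itemsize=None, data_columns=True):
    # append() creates the table on the first call; min_itemsize fixes the
    # on-disk width of the string column up front, so later, longer strings
    # still fit and no rebuild is ever needed.
    with HDFStore(storefile) as store:
        store.append(key, df, data_columns=data_columns, min_itemsize=min_itemsize)

storefile = "db_presized.h5"    # illustrative file name
for i in range(0, 100):
    df = pd.DataFrame([dict(n=np.random.randn(),
                            s=''.join(alphanum[np.random.randint(1, len(alphanum),
                                                                 np.random.randint(1, 2 * (i + 1)))]))],
                      index=[i])
    # 200 covers the longest string this loop can generate (2*100 - 1 characters)
    hdfstore_append_presized(storefile, '/SO/df', df, min_itemsize={'s': 200})

This only sidesteps the problem when such a bound exists; when the column width cannot be bounded up front, the question of how to rebuild efficiently still stands.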

