Thursday, March 9, 2017

Set value of first item in slice in python pandas


So I would like to make a slice of a dataframe and then set the value of the first item in that slice without copying the dataframe. For example:

df = pandas.DataFrame(numpy.random.rand(3,1))
df[df[0]>0][0] = 0

The slice here is irrelevant. It is just for the example and will return the whole data frame again. The point is that doing it as in the example gives a SettingWithCopyWarning (understandably). I have also tried slicing first and then using iloc/ix/loc, and using iloc twice, i.e. something like:

df.iloc[df[0]>0,:][0] = 0
df[df[0]>0,:].iloc[0] = 0

Neither of these works. Again, I don't want to make a copy of the dataframe, even if it is just the sliced version.

EDIT: It seems there are two ways: using a mask or idxmax. The idxmax method seems to work if your index is unique, and the mask method if it is not. In my case the index is not unique, which I forgot to mention in the original post.
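For reference, a minimal sketch of why the chained form in the question loses the write, and of one positional alternative that modifies the original frame in place even when the index is not unique. The frame and the 0.5 threshold are made-up illustration data, and the approach assumes the mask is a boolean Series aligned with df; it is not taken from any of the answers.

```python
import warnings

import numpy as np
import pandas as pd

# Illustration data with a non-unique index, as in the question's EDIT
df = pd.DataFrame({0: [0.2, 0.7, 0.9, 0.1]}, index=[1, 2, 2, 3])

# Chained indexing: df[mask] returns a new object, so this write never reaches df
with warnings.catch_warnings():
    # pandas warns here (SettingWithCopyWarning, or ChainedAssignmentError under copy-on-write)
    warnings.simplefilter("ignore")
    df[df[0] > 0.5][0] = 0
print(df[0].tolist())  # [0.2, 0.7, 0.9, 0.1]

# Positional alternative: locate the first True in the mask and write via iloc
positions = np.flatnonzero((df[0] > 0.5).to_numpy())
if positions.size:  # guard against a mask with no True values
    df.iloc[positions[0], df.columns.get_loc(0)] = 0
print(df[0].tolist())  # [0.2, 0.0, 0.9, 0.1]
```

Because iloc works on positions rather than labels, the duplicated label 2 does not matter: only the second physical row is changed.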

3 Answers

Answer 1

I think you can use idxmax to get the index of the first True value and then set the value with loc:

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)))
print (df)
   0
0  1
1  3
2  0
3  0
4  3

print ((df[0] == 0).idxmax())
2

df.loc[(df[0] == 0).idxmax(), 0] = 100
print (df)
     0
0    1
1    3
2  100
3    0
4    3

# starting again from the original df (row 2 is still 0 here)
df.loc[(df[0] == 3).idxmax(), 0] = 200
print (df)
     0
0    1
1  200
2    0
3    0
4    3
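One caveat with idxmax: on a mask with no True values it still returns the first label, so an unguarded assignment silently overwrites the first row. A minimal guard, assuming a unique index as in the examples above; set_first_match is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

df = pd.DataFrame({0: [1, 3, 0, 0, 3]})

# idxmax of an all-False mask still returns the first label (0 here),
# so df.loc[mask.idxmax(), 0] = ... would overwrite row 0 without any error
mask = df[0] == 5
print(mask.any(), mask.idxmax())  # False 0


def set_first_match(frame, mask, col, value):
    """Set `col` on the first row where `mask` is True; return True if a row was changed."""
    if not mask.any():
        return False
    frame.loc[mask.idxmax(), col] = value  # assumes a unique index
    return True


print(set_first_match(df, df[0] == 5, 0, 200))  # False
print(set_first_match(df, df[0] == 3, 0, 200))  # True
print(df[0].tolist())  # [1, 200, 0, 0, 3]
```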

EDIT:

Solution with a non-unique index:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
   0
1  1
2  3
2  0
3  0
4  3

df = df.reset_index()
df.loc[(df[0] == 3).idxmax(), 0] = 200
df = df.set_index('index')
df.index.name = None
print (df)
     0
1    1
2  200
2    0
3    0
4    3

EDIT1:

Solution with a MultiIndex:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
   0
1  1
2  3
2  0
3  0
4  3

df.index = [np.arange(len(df.index)), df.index]
print (df)
     0
0 1  1
1 2  3
2 2  0
3 3  0
4 4  3

df.loc[(df[0] == 3).idxmax(), 0] = 200
df = df.reset_index(level=0, drop=True)
print (df)
     0
1    1
2  200
2    0
3    0
4    3

EDIT2:

Solution with a double cumsum:

np.random.seed(1)
df = pd.DataFrame([4,0,4,7,4], index=[1,2,2,3,4])
print (df)
   0
1  4
2  0
2  4
3  7
4  4

mask = (df[0] == 0).cumsum().cumsum()
print (mask)
1    0
2    1
2    2
3    3
4    4
Name: 0, dtype: int32

df.loc[mask == 1, 0] = 200
print (df)
     0
1    4
2  200
2    4
3    7
4    4
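Why the double cumsum picks out only the first match: after the first cumsum every row from the first True onward is at least 1, and after the second cumsum only that first True row is exactly 1. The sketch below uses made-up data with two zeros and checks this against an equivalent single-cumsum mask, hit & (hit.cumsum() == 1), which is an alternative formulation rather than part of the answer:

```python
import pandas as pd

# Illustration data: two zeros, duplicated label 2 in the index
df = pd.DataFrame([4, 0, 4, 0, 7], index=[1, 2, 2, 3, 4])
hit = df[0] == 0

# The answer's double cumsum: equals 1 only at the first True row
double = hit.cumsum().cumsum() == 1

# Equivalent form: a True row whose running count of Trues is exactly 1
single = hit & (hit.cumsum() == 1)

print(double.tolist())  # [False, True, False, False, False]
print(single.equals(double))  # True

# A boolean mask with the same index as df works despite the duplicated label
df.loc[single, 0] = 200
print(df[0].tolist())  # [4, 200, 4, 0, 7]
```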

Answer 2

Consider the dataframe df

df = pd.DataFrame(dict(A=[1, 2, 3, 4, 5]))
print(df)
   A
0  1
1  2
2  3
3  4
4  5

Create an arbitrary slice, slc:

slc = df[df.A > 2]
print(slc)
   A
2  3
3  4
4  5

Access the first row of slc within df by using slc.index[0] with loc:

df.loc[slc.index[0]] = 0
print(df)
   A
0  1
1  2
2  0
3  4
4  5
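Note that this relies on a unique index: with a duplicated label, df.loc[label] writes to every row carrying that label, which matters for the non-unique case in the question's EDIT. A small sketch with a made-up index [0, 2, 2, 3, 4] compares the label-based write with a positional one; the .copy() calls exist only to show both results side by side:

```python
import numpy as np
import pandas as pd

# Same idea as the answer, but label 2 appears twice in the index
df = pd.DataFrame(dict(A=[1, 2, 3, 4, 5]), index=[0, 2, 2, 3, 4])
slc = df[df.A > 1]  # the first row of the slice carries label 2

df_label = df.copy()
df_label.loc[slc.index[0]] = 0  # label-based: hits BOTH rows labelled 2
print(df_label.A.tolist())  # [1, 0, 0, 4, 5]

df_pos = df.copy()
first_pos = np.flatnonzero((df.A > 1).to_numpy())[0]  # a position, not a label
df_pos.iloc[first_pos, df_pos.columns.get_loc('A')] = 0
print(df_pos.A.tolist())  # [1, 0, 3, 4, 5]
```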

Answer 3

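import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(6,1), index=[1,2,2,3,3,3])
df[1] = 0
df.columns = ['a','b']
# label rows where a >= 0.5 (a single .loc call avoids chained assignment)
df.loc[df['a'] >= 0.5, 'b'] = 1
# DataFrame.sort was removed from pandas; sort_values replaces it
df = df.sort_values(['b','a'], ascending=[False, True])
df.loc[df[df['b']==0].index.tolist()[0], 'a'] = 0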

This method introduces an extra column, which can be dropped after processing (note that the sorting step returns a new, sorted DataFrame). To change a different item instead of the first one, change the last line as follows:

df.loc[df[df['b']==0].index.tolist()[n],'a']=0 

to change the nth item of the slice (n counts from 0).

df

          a
1  0.111089
2  0.255633
2  0.332682
3  0.434527
3  0.730548
3  0.844724

df after labelling the rows in column b

          a  b
1  0.111089  0
2  0.255633  0
2  0.332682  0
3  0.434527  0
3  0.730548  1
3  0.844724  1

After sorting and changing the value of the first item in the slice (rows labelled 0) to 0

          a  b
3  0.730548  1
3  0.844724  1
1  0.000000  0
2  0.255633  0
2  0.332682  0
3  0.434527  0