Why does pandas change the index value in this example?

First we create a raw dataset with a MultiIndex -

In [166]: import numpy as np; import pandas as pd 

In [167]: data_raw = pd.DataFrame([ 
     ...: {'frame': 1, 'face': np.NaN, 'lmark': np.NaN, 'x': np.NaN, 'y': np.NaN}, 
     ...: {'frame': 197, 'face': 0, 'lmark': 1, 'x': 969, 'y': 737}, 
     ...: {'frame': 197, 'face': 0, 'lmark': 2, 'x': 969, 'y': 740}, 
     ...: {'frame': 197, 'face': 0, 'lmark': 3, 'x': 970, 'y': 744}, 
     ...: {'frame': 197, 'face': 0, 'lmark': 4, 'x': 972, 'y': 748}, 
     ...: {'frame': 197, 'face': 0, 'lmark': 5, 'x': 973, 'y': 752}, 
     ...: {'frame': 300, 'face': 0, 'lmark': 1, 'x': 745, 'y': 367},  
     ...: {'frame': 300, 'face': 0, 'lmark': 2, 'x': 753, 'y': 411},  
     ...: {'frame': 300, 'face': 0, 'lmark': 3, 'x': 759, 'y': 455}, 
     ...: {'frame': 301, 'face': 0, 'lmark': 1, 'x': 741, 'y': 364},   
     ...: {'frame': 301, 'face': 0, 'lmark': 2, 'x': 746, 'y': 408},   
     ...: {'frame': 301, 'face': 0, 'lmark': 3, 'x': 750, 'y': 452}]).set_index(['frame', 'face', 'lmark'])

Next we calculate the z-scores for each lmark -

In [168]: ((data_raw - data_raw.mean(level='lmark')).abs()) / data_raw.std(level='lmark')            
Out[168]: 
                         x         y
frame face lmark                    
1     NaN  NaN         NaN       NaN
197   0.0  1.0    1.154565  1.154672
           2.0    1.154260  1.154665
           3.0    1.153946  1.154654
           4.0         NaN       NaN
           5.0         NaN       NaN
300   0.0  1.0    0.561956  0.570343
           2.0    0.549523  0.569472
           3.0    0.540829  0.568384
301   0.0  1.0    0.592609  0.584329
           2.0    0.604738  0.585193
           3.0    0.613117  0.586270
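Note for newer pandas versions: as far as I know, the `level=` argument of `DataFrame.mean` and `DataFrame.std` was deprecated in pandas 1.3 and removed in 2.0, so the expression above may not run there. A minimal sketch of the same per-lmark z-score using `groupby(level=...).transform` follows. The small frame is a hypothetical subset of the data above (lmark 1 and 2 only, without the NaN row), and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical subset of the question's data: three frames, lmark 1 and 2.
df = pd.DataFrame(
    {
        "frame": [197, 300, 301, 197, 300, 301],
        "lmark": [1, 1, 1, 2, 2, 2],
        "x": [969.0, 745.0, 741.0, 969.0, 753.0, 746.0],
    }
).set_index(["frame", "lmark"])

# transform() returns results aligned to df's own index, so no
# level-based broadcasting is needed for the subtraction/division.
grouped = df.groupby(level="lmark")
z = (df - grouped.transform("mean")).abs() / grouped.transform("std")
print(z)
```

Because `transform` keeps the original row index, the result has exactly the same MultiIndex as the input.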

The index values don't change, as expected. Now we filter out records where lmark > 3 -

In [170]: data_filtered = data_raw.loc[(slice(None), slice(None), [np.NaN, slice(3)]),:]

In [171]: data_filtered                                                                          
Out[171]: 
                      x      y
frame face lmark              
1     NaN  NaN      NaN    NaN
197   0.0  1.0    969.0  737.0
           2.0    969.0  740.0
           3.0    970.0  744.0
300   0.0  1.0    745.0  367.0
           2.0    753.0  411.0
           3.0    759.0  455.0
301   0.0  1.0    741.0  364.0
           2.0    746.0  408.0
           3.0    750.0  452.0

and recalculate the z-scores -

In [172]: ((data_filtered - data_filtered.mean(level='lmark')).abs()) / data_filtered.std(level='lmark')                                                                                       
Out[172]: 
                         x         y
frame face lmark                    
1     NaN  1.0         NaN       NaN
197   0.0  1.0    1.154565  1.154672
           2.0    1.154260  1.154665
           3.0    1.153946  1.154654
300   0.0  1.0    0.561956  0.570343
           2.0    0.549523  0.569472
           3.0    0.540829  0.568384
301   0.0  1.0    0.592609  0.584329
           2.0    0.604738  0.585193
           3.0    0.613117  0.586270

Why has the value of the first record's lmark index changed from NaN to 1.0?

I think this is a bug.

A workaround is to use MultiIndex.remove_unused_levels:

data_filtered.index = data_filtered.index.remove_unused_levels()
a = ((data_filtered - data_filtered.mean(level='lmark')).abs()) / data_filtered.std(level='lmark')
print(a)
                         x         y
frame face lmark                    
1     NaN  NaN         NaN       NaN
197   0.0  1.0    1.154565  1.154672
           2.0    1.154260  1.154665
           3.0    1.153946  1.154654
300   0.0  1.0    0.561956  0.570343
           2.0    0.549523  0.569472
           3.0    0.540829  0.568384
301   0.0  1.0    0.592609  0.584329
           2.0    0.604738  0.585193
           3.0    0.613117  0.586270
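
Some background on why this workaround is plausible: filtering rows of a MultiIndex does not shrink its stored `levels`, so the filtered frame still carries level values that no longer occur. Whether that is the exact cause of the NaN turning into 1.0 is an assumption on my part, but the stale levels themselves are easy to see. A minimal sketch on a hypothetical toy index (not the data from the question):

```python
import pandas as pd

# Hypothetical toy data: two frames, five landmarks each.
idx = pd.MultiIndex.from_product(
    [[197, 300], [1, 2, 3, 4, 5]], names=["frame", "lmark"]
)
df = pd.DataFrame({"x": range(10)}, index=idx)

# Keep only rows with lmark <= 3.
filtered = df[df.index.get_level_values("lmark") <= 3]

# The level still lists 4 and 5 even though no row uses them.
print(list(filtered.index.levels[1]))  # [1, 2, 3, 4, 5]

# remove_unused_levels() rebuilds the levels from the values actually present.
cleaned = filtered.index.remove_unused_levels()
print(list(cleaned.levels[1]))  # [1, 2, 3]
```

The cleaned index compares equal to the filtered one row by row; only the internal level bookkeeping changes.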