python - Create pandas MultiIndex Dataframe without NaNs

I feel like I'm missing something fundamental here. I have a pandas DataFrame like this:

import pandas as pd

df = pd.DataFrame(list(range(3))).T
df.columns = ['a.first', 'a.second', 'b']

#    a.first  a.second  b
# 0        0         1  2

What I would like to create is a MultiIndex DataFrame where I can use df.a, df.a.first and df.b. What I have so far is the str.split method:

df.columns = df.columns.str.split('.', expand=True)
#        a            b
#    first  second  NaN
# 0      0       1    2

The NaN is the problem here: to access the value of b, one would need to call df.b[np.nan], which feels wrong.
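To make the placeholder visible, here is a minimal sketch that rebuilds the same one-row frame and inspects the column key for b. The variable name b_key is only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2]], columns=['a.first', 'a.second', 'b'])
df.columns = df.columns.str.split('.', expand=True)

# Each column key is now a tuple with one entry per level.
b_key = df.columns[2]

# The first level is the string 'b'; the second level is a missing value,
# not an empty string, which is why plain df.b access is awkward.
print(b_key[0], pd.isna(b_key[1]))
```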

From here, all the solutions that come to mind feel like workarounds, where I iterate over the columns and try to replace the NaNs with empty strings. I imagine there must be a much more straightforward way, since I guess this is a pretty common problem, no?

Edit: The least ugly solution that came to mind so far is the following:

def apply_multiindex(df, hier_sep='.'):
    # Number of hierarchy levels encoded in each column name
    depths = df.columns.str.split(hier_sep).map(len)
    # Separators to append so every name reaches the maximum depth
    add_hiers = max(depths) - depths
    df.columns = [column + hier_sep * add_hiers[c]
                  for c, column in enumerate(df.columns)]
    df.columns = df.columns.str.split(hier_sep, expand=True)

apply_multiindex(df)
#        a          b
#    first  second  
# 0      0       1  2

I'm still looking for a cleaner solution :)

For me, rename with the missing value works, because fillna is not implemented for MultiIndex:

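import numpy as np
import pandas as pd

df = pd.DataFrame([list(range(3))], columns=['a.first', 'a.second', 'b'])
df.columns = df.columns.str.split('.', expand=True)

# Replace the NaN label in the column MultiIndex with an empty string
df = df.rename(columns={np.nan: ''})
print(df)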
      a         b
  first second   
0     0      1  2
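As an alternative sketch (not part of the answer above), the padding can also be done before the MultiIndex is built, so that no NaN label is ever created. The helper name split_columns is hypothetical, and the sketch assumes the frame has at least one column.

```python
import pandas as pd

def split_columns(df, sep='.'):
    """Return a copy of df whose columns form a MultiIndex padded with ''."""
    # Split every column name into its hierarchy parts
    parts = [str(col).split(sep) for col in df.columns]
    depth = max(len(p) for p in parts)
    # Pad shorter names with empty strings so all tuples have equal length
    padded = [tuple(p + [''] * (depth - len(p))) for p in parts]
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(padded)
    return out

df = pd.DataFrame([[0, 1, 2]], columns=['a.first', 'a.second', 'b'])
result = split_columns(df)
print(result)
```

The original frame is left untouched, and the column for b can be selected with the full key ('b', '').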