12

I have a pandas dataframe df that looks like this

name    value1     value2
A       123         1
B       345         5
C       712         4
B       768         2
A       318         9
C       178         6
A       321         3

I want to convert this into a dictionary with name as a key and list of dictionaries (value1 key and value2 value) for all values that are in name

So, the output would look like this

{
 'A': [{'123':1}, {'318':9}, {'321':3}],
 'B': [{'345':5}, {'768':2}],
 'C': [{'712':4}, {'178':6}]
}

So, far I have managed to get a dictionary with name as key and list of only one of the values as a list by doing

df.set_index('name').transpose().to_dict(orient='list')

How do I get my desired output? Is there a way to aggregate all the values for the same name column and get them in the form I want?

sfactor
  • 223
  • 1
  • 2
  • 6
  • Any chance you could copy and paste the code which you used to create `df`? – ignoring_gravity May 29 '18 at 16:20
  • @Lupacante The dict is created from a file. so this is as it is read from it. The data needs to be in a format like that for some further processing which is sadly on in my control. – sfactor May 29 '18 at 16:24

4 Answers4

10

Does this do what you want it to?

from pandas import DataFrame

df = DataFrame([['A', 123, 1], ['B', 345, 5], ['C', 712, 4], ['B', 768, 2], ['A', 318, 9], ['C', 178, 6], ['A', 321, 3]], columns=['name', 'value1', 'value2'])

d = {}
for i in df['name'].unique():
    d[i] = [{df['value1'][j]: df['value2'][j]} for j in df[df['name']==i].index]

This returns

  Out[89]: 
{'A': [{123: 1}, {318: 9}, {321: 3}],
 'B': [{345: 5}, {768: 2}],
 'C': [{712: 4}, {178: 6}]}
ignoring_gravity
  • 763
  • 4
  • 14
4

The to_dict() method sets the column names as dictionary keys so you'll need to reshape your DataFrame slightly. Setting the 'ID' column as the index and then transposing the DataFrame is one way to achieve this.

The same can be done with the following line:

>>> df.set_index('ID').T.to_dict('list')
{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}

Better to use the groupby,

df.groupby('name')[['value1','value2']].apply(lambda g: g.values.tolist()).to_dict()
Aditya
  • 2,440
  • 2
  • 15
  • 34
3
df.groupby('name')[['value1','value2']].apply(lambda g: g.values.tolist()).to_dict()

if you need a list of tuples explicitly:

df.groupby('name')[['value1','value2']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
Stephen Rauch
  • 1,783
  • 11
  • 21
  • 34
nemo
  • 31
  • 1
0

Building on @nemo's answer (above) which will be faster than the accepted solution, this will give the same output that you want:

def formatRecords(g):
    keys = ['value1', 'value2']
    result = []
    for item in g.values.tolist():
        item = dict(zip(keys, item))
        result.append(item)
    return result

df_dict = df.groupby('name').apply(lambda g: formatRecords(g)).to_dict()
cyram
  • 101
  • 1