I have a pandas dataframe as follows, and I want to convert it to a dictionary keyed by two values, as shown:

    id      name                                     energy   fibre
0   11005   4-Grain Flakes                           1404     11.5
1   35146   4-Grain Flakes, Gluten Free              1569     6.1
2   32570   4-Grain Flakes, Riihikosken Vehnämylly   1443     11.2

I am expecting the result to be:

 nutritionValues = {
  ('4-Grain Flakes', 'id'): 11005,
  ('4-Grain Flakes', 'energy'): 1404,
  ('4-Grain Flakes', 'fibre'):  11.5,
  ('4-Grain Flakes, Gluten Free', 'id'): 35146,
  ('4-Grain Flakes, Gluten Free', 'energy'): 1569,
  ('4-Grain Flakes, Gluten Free', 'fibre'):  6.1,
  ('4-Grain Flakes, Riihikosken Vehnämylly', 'id'): 32570,
  ('4-Grain Flakes, Riihikosken Vehnämylly', 'energy'): 1443,
  ('4-Grain Flakes, Riihikosken Vehnämylly', 'fibre'):  11.2}

foods, fiber = multidict({
  '4-Grain Flakes': 11.5,
  '4-Grain Flakes, Gluten Free':   6.1,
  '4-Grain Flakes, Riihikosken Vehnämylly':   11.2})

How can I achieve this?

n1k31t4
KHAN irfan
  • Have you tried the DataFrame's `to_dict` method? – bradS Apr 18 '19 at 11:10
  • Yes, but I could not get the multi-key dictionary I need, as described above. `df = df.set_index(["item1", "item2"])  # Columns for dict keys` followed by `df_dict = df.to_dict("index")  # Turn into dict` gives an error. – KHAN irfan Apr 18 '19 at 11:12
  • @bradS can you create the dictionary? – KHAN irfan Apr 18 '19 at 12:10
  • Does this help: https://stackoverflow.com/questions/52192177/convert-pandas-dataframe-to-dictionary-with-multiple-keys ? – bradS Apr 18 '19 at 13:48
  • Why do you want to do this? There might be a nicer data structure to use. The original dataframe is already very useful as it is, in my opinion. If you want to check combinations of `name` and other variables, you could e.g. use the [`DataFrame.groupby()`](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) method to then do something for each value of `name`. – n1k31t4 Apr 19 '19 at 00:12
  • @n1k31t4 I want to use Gurobi optimization as in https://www.gurobi.com/documentation/8.1/examples/diet_py.html and need to convert the data into dictionary format. – KHAN irfan Apr 19 '19 at 01:22
  • @KHANirfan - ok, I understand. Please see my answer for a way to do it :) – n1k31t4 Apr 19 '19 at 14:50
  • In general this is unnecessary, probably a bad idea, and Gurobi doesn't strictly need it anyway: `nutritionValues = { ('hamburger', 'calories'): 410, ...`. Why not simply use a nested defaultdict-of-dict, that's what people generally do? But yeah you could use pandas multi-index. Whatever data structure you create may be a pain to use/export/import/pickle outside pandas. – smci Dec 09 '19 at 01:26

1 Answer

In order to be able to create a dictionary from your dataframe, such that the keys are tuples of combinations (according to your example output), my idea would be to use a Pandas MultiIndex. This will then generate a dictionary of the form you want.

First I just recreate your example dataframe (would be nice if you provide this code in the future!):

import pandas as pd

# Create the example dataframe
df = pd.DataFrame(["4-Grain Flakes", "4-Grain Flakes, Gluten Free", "4-Grain Flakes, Riihikosken Vehnämylly"])
df["id"] = [11005, 35146, 32570]
df["energy"] = [1404, 1569, 1443]
df["fibre"] = [11.5, 6.1, 11.2]
df.columns = ["name"] + list(df.columns[1:])

print(df)
                                     name     id  energy  fibre
0                          4-Grain Flakes  11005    1404   11.5
1             4-Grain Flakes, Gluten Free  35146    1569    6.1
2  4-Grain Flakes, Riihikosken Vehnämylly  32570    1443   11.2

Now we can create the combinations of each value in "name" with each of the other column names. A list comprehension pairs every name with every remaining column, so we end with a list of tuples:

names = df.name.tolist()
others = list(df.columns)
others.remove("name")         # We don't want "name" to be included

index_tuples = [(name, other) for name in names for other in others]

We can create the MultiIndex from this list of tuples as follows:

multi_ix = pd.MultiIndex.from_tuples(index_tuples)
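
As a side note, the list of tuples is a full Cartesian product of names and column labels, so the same index can probably be built in one step with `pd.MultiIndex.from_product`. The sketch below is self-contained and hard-codes the names and column labels from the example; the variable names `multi_ix_tuples` and `multi_ix_product` are only illustrative.

```python
import pandas as pd

# Values copied from the example dataframe
names = [
    "4-Grain Flakes",
    "4-Grain Flakes, Gluten Free",
    "4-Grain Flakes, Riihikosken Vehnämylly",
]
others = ["id", "energy", "fibre"]

# Approach used in this answer: explicit list of (name, column) tuples
multi_ix_tuples = pd.MultiIndex.from_tuples(
    [(name, other) for name in names for other in others]
)

# Alternative: let pandas form the Cartesian product of the two lists
multi_ix_product = pd.MultiIndex.from_product([names, others])

# Both indexes list the same (name, column) pairs in the same order
print(list(multi_ix_product) == list(multi_ix_tuples))
```

Either form works for the next step; `from_product` just avoids writing the comprehension by hand.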

Now we can create a new dataframe using our multi_ix. To populate this dataframe, notice that we simply need the row-wise values from the columns ["id", "energy", "fibre"]. We can get them by extracting an n * 3 NumPy array (using the values attribute of the dataframe) and then flattening that matrix with NumPy's ravel method:

df1 = pd.DataFrame(df[others].values.ravel(), index=multi_ix, columns=["data"])

print(df1)

                                                  data
4-Grain Flakes                         id      11005.0
                                       energy   1404.0
                                       fibre      11.5
4-Grain Flakes, Gluten Free            id      35146.0
                                       energy   1569.0
                                       fibre       6.1
4-Grain Flakes, Riihikosken Vehnämylly id      32570.0
                                       energy   1443.0
                                       fibre      11.2
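
For comparison, a similar long-format result can be sketched without building the index by hand, by moving "name" into the index and calling `stack()`. This is a minimal sketch that rebuilds the example dataframe so it runs on its own; `stacked` is an illustrative name, and it assumes the values in "name" are unique.

```python
import pandas as pd

# Rebuild the example dataframe
df = pd.DataFrame({
    "name": [
        "4-Grain Flakes",
        "4-Grain Flakes, Gluten Free",
        "4-Grain Flakes, Riihikosken Vehnämylly",
    ],
    "id": [11005, 35146, 32570],
    "energy": [1404, 1569, 1443],
    "fibre": [11.5, 6.1, 11.2],
})

# set_index moves "name" into the index; stack() then turns the remaining
# column labels into a second index level, one value per (name, column) pair
stacked = df.set_index("name").stack()

# A Series with a two-level index converts to a dict keyed by tuples
nutrition_values = stacked.to_dict()
print(nutrition_values[("4-Grain Flakes", "fibre")])
```

As with the `ravel` approach, mixing integer and float columns means the stacked values come out as floats.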

Now we can simply use the to_dict() method of the dataframe to create the dictionary you are looking for:

nutritionValues = df1.to_dict()["data"]

print(nutritionValues)

{('4-Grain Flakes', 'energy'): 1404.0,
 ('4-Grain Flakes', 'fibre'): 11.5,
 ('4-Grain Flakes', 'id'): 11005.0,
 ('4-Grain Flakes, Gluten Free', 'energy'): 1569.0,
 ('4-Grain Flakes, Gluten Free', 'fibre'): 6.1,
 ('4-Grain Flakes, Gluten Free', 'id'): 35146.0,
 ('4-Grain Flakes, Riihikosken Vehnämylly', 'energy'): 1443.0,
 ('4-Grain Flakes, Riihikosken Vehnämylly', 'fibre'): 11.2,
 ('4-Grain Flakes, Riihikosken Vehnämylly', 'id'): 32570.0}
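
Note that the integer columns come back as floats (`11005.0`), because `.values` on mixed integer and float columns produces a single float array. If the integer ids matter, one possible alternative is to skip the MultiIndex and build the dictionary column by column. This is a minimal sketch, assuming the example dataframe from this answer and unique names; `value_columns` is an illustrative name.

```python
import pandas as pd

# Rebuild the example dataframe
df = pd.DataFrame({
    "name": [
        "4-Grain Flakes",
        "4-Grain Flakes, Gluten Free",
        "4-Grain Flakes, Riihikosken Vehnämylly",
    ],
    "id": [11005, 35146, 32570],
    "energy": [1404, 1569, 1443],
    "fibre": [11.5, 6.1, 11.2],
})

# Every column except "name" supplies the second part of the key
value_columns = [col for col in df.columns if col != "name"]

# Pair each name with the value in each column; every column is read
# separately, so integer columns are not upcast to float
nutrition_values = {
    (name, col): value
    for col in value_columns
    for name, value in zip(df["name"], df[col])
}

print(nutrition_values[("4-Grain Flakes", "id")])
```

If two rows shared the same name, later rows would silently overwrite earlier ones, so it is worth checking for duplicates first.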

It is also possible to get your final example, the dictionary passed to multidict, directly from the multi-indexed dataframe df1. You just need to use multi-index slicing:

fibre_df = df1.loc[(slice(None), ["fibre"]), :]
print(fibre_df)

                                              data
4-Grain Flakes                         fibre  11.5
4-Grain Flakes, Gluten Free            fibre   6.1
4-Grain Flakes, Riihikosken Vehnämylly fibre  11.2

You can then generate a dictionary as before:

d = fibre_df.to_dict()["data"]
print(d)

{('4-Grain Flakes', 'fibre'): 11.5,
 ('4-Grain Flakes, Gluten Free', 'fibre'): 6.1,
 ('4-Grain Flakes, Riihikosken Vehnämylly', 'fibre'): 11.2}

And you can drop the "fibre" value from the tuple-keys with a simple dictionary comprehension:

final_dict = {k[0]: v for k, v in d.items()}
print(final_dict)

{'4-Grain Flakes': 11.5,
 '4-Grain Flakes, Gluten Free': 6.1,
 '4-Grain Flakes, Riihikosken Vehnämylly': 11.2}
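
If only this single-column lookup is needed, it can probably be built from the original dataframe without the MultiIndex at all. A minimal sketch, assuming the example dataframe and unique names; `fibre_by_name` is an illustrative name.

```python
import pandas as pd

# Rebuild the example dataframe (only the two columns needed here)
df = pd.DataFrame({
    "name": [
        "4-Grain Flakes",
        "4-Grain Flakes, Gluten Free",
        "4-Grain Flakes, Riihikosken Vehnämylly",
    ],
    "fibre": [11.5, 6.1, 11.2],
})

# Pair each name with its fibre value and collect the pairs into a dict
fibre_by_name = dict(zip(df["name"], df["fibre"]))
print(fibre_by_name)
```

The result is a plain dict of the same shape as the one the question passes to `multidict`.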
n1k31t4