1

The Wolfram Language has a Query function that can traverse data structures and apply functions at different levels of the structure. I am working with multi-level JSON structures and need a function that has similar functionality as that of Query in the Wolfram Language.

Which Python package and function(s) best replicates this?

For a minimal working example, say I have the following JSON structure. (String escapes omitted for simplicity)

x = {
    "Dims1":[
        {
            "Apple":{
                "Baking":[
                    "Pie",
                    "Tart"
                ],
                "Plant":"Tree",
                "Tons":{
                    "2017":1.23e1,
                    "2018":1.12e1
                }
            }
        },
        {
            "Tomato":{
                "Cooking":[
                    "Stew",
                    "Sauce"
                ],
                "Plant":"Vine",
                "Tons":{
                    "2017":8.1,
                    "2018":8.3
                }
            }
        },
        {
            "Banana":{
                "Name":"Banana",
                "Baking":[
                    "Bread"
                ],
                "Cooking":[
                    "Fried"
                ],
                "Plant":"Arborescent",
                "Tons":{
                    "2017":0.8,
                    "2018":0.5
                }
            }
        }
    ],
    "Dims2":[
        {
            "Apple":{
                "Name":"Apple",
                "Baking":[
                    "Pie",
                    "Tart"
                ],
                "Plant":"Tree",
                "Tons":{
                    "2017":1.31e1,
                    "2018":1.01e1
                }
            }
        },
        {
            "Sweet Potato":{
                "Cooking":[
                    "Fried",
                    "Steamed"
                ],
                "Baking":[
                    "Pie"
                ],
                "Plant":"Vine",
                "Tons":{
                    "2017":1.11e1,
                    "2018":1.91e1
                }
            }
        }
    ]
}

In Wolfram Language I can

a = GeneralUtilities`ToAssociations@ImportString[x, "JSON"]
<|
 "Dims1" ->
  {
   <|"Apple" ->
     <|"Baking" -> {"Pie", "Tart"}, "Plant" -> "Tree",
      "Tons" -> <|"2017" -> 12.3, "2018" -> 11.2|>|>
    |>,
   <|"Tomato" -> 
     <|"Cooking" -> {"Stew", "Sauce"}, "Plant" -> "Vine",
      "Tons" -> <|"2017" -> 8.1, "2018" -> 8.3|>|>
    |>,
   <|"Banana" ->
     <|"Name" -> "Banana", "Baking" -> {"Bread"}, 
      "Cooking" -> {"Fried"}, "Plant" -> "Arborescent",
      "Tons" -> <|"2017" -> 0.8, "2018" -> 0.5|>|>
    |>
   },
 "Dims2" ->
  {
   <|"Apple" ->
     <|"Name" -> "Apple", "Baking" -> {"Pie", "Tart"}, 
      "Plant" -> "Tree",
      "Tons" -> <|"2017" -> 13.1, "2018" -> 10.1|>|>
    |>,
   <|"Sweet Potato" ->
     <|"Cooking" -> {"Fried", "Steamed"}, "Baking" -> {"Pie"}, 
      "Plant" -> "Vine",
      "Tons" -> <|"2017" -> 11.1, "2018" -> 19.1|>|>
    |>
   }
 |>

and then with Query

Query[All, All, All, {"Baking"}]@a
<|"Dims1" -> 
   {<|"Apple" -> <|"Baking" -> {"Pie", "Tart"}|>|>, 
    <|"Tomato" -> <|"Baking" -> Missing["KeyAbsent", "Baking"]|>|>, 
    <|"Banana" -> <|"Baking" -> {"Bread"}|>|>}, 
  "Dims2" -> 
   {<|"Apple" -> <|"Baking" -> {"Pie", "Tart"}|>|>, 
    <|"Sweet Potato" -> <|"Baking" -> {"Pie"}|>|>}
|>

and include functions such as

Query[All, Join /* Flatten /* DeleteDuplicates, Values, "Baking" /* DeleteMissing]@a
<|"Dims1" -> {"Pie", "Tart", "Bread"}, "Dims2" -> {"Pie", "Tart"}|>

and

Query[All, Merge[Total] /* DateListPlot, All, "Tons", 
  KeyMap[DateObject[{FromDigits@#}, "Year"] &]]@a

enter image description here

How is this done with JSON in Python?

Edmund
  • 695
  • 5
  • 15
  • There is no such a thing in Python out of the box. – Istvan Jan 28 '19 at 14:42
  • @Istvan Is there a package that enables such functionality? Perhaps loading the JSON into a hierarchical `pandas.DataFrame`; if such a thing exists. – Edmund Jan 29 '19 at 10:28
  • @Istvan I found this post ([15306448](https://stackoverflow.com/questions/15306448)) on the [pynq](https://github.com/heynemann/pynq/wiki) package. However, this seems limited to selection only and not able to apply or map function chains at specific levels while selecting. Is there something that can work with this package to add mapping/applying function chains along the levels while selecting? – Edmund Feb 04 '19 at 16:28
  • I am not familiar anything in Python that resembles what you need. I need that Clojure Spectre does something similar but no Python. – Istvan Feb 08 '19 at 10:05
  • @Istvan So how does one navigate, operation on, and visualise hierarchical datasets and subsets with Python? – Edmund Feb 08 '19 at 11:52
  • usually you do not use complex data structures and just use simple hashmaps which are called dictionaries in Python (or use other flat data structures) In case you need nested processing power you can do many things like writing a handler for yourself or flatten out the nested structure like this https://towardsdatascience.com/how-to-flatten-deeply-nested-json-objects-in-non-recursive-elegant-python-55f96533103d – Istvan Feb 13 '19 at 21:23
  • @Istvan That is severely less than ideal. I would lose the ability to do basic hierarchical query & summary operations at different levels; not to mention the all the string parsing needed to perform the most basic queries. The structure above is a minimal example; it would be a nightmare to explore and operate on the real JSON with such a flattening. – Edmund Feb 13 '19 at 21:32

1 Answers1

0

ObjectPath is a query language for semi-structured data, including JSON, and has a Python API.

Brian Spiering
  • 20,142
  • 2
  • 25
  • 102
  • Thanks, for your answer. This looks somewhat promising except that it appears the the `ObjectPath` is in `Global`. I have multiple hierarchical JSON sources that need to be loaded at the same time. Also, it only appears to work with the specific functions that have been coded for it. I need something as flexible as [`Query`](http://reference.wolfram.com/language/ref/Query.html) which works with any built-in/package/user defined function available. `ObjectPath` does not seem able to allow for these two requirements – Edmund Feb 04 '19 at 17:25
  • In Python, you can have multiple, independent json objects in memory at the same time. See document https://pypi.org/project/objectpath/ – Brian Spiering Feb 04 '19 at 19:18
  • Ah, the `Tree` function. I was going from the [reference page](http://objectpath.org/reference.html) where that function is not presented and it appears that everything is running from a Global `$` operator. The arbitrary function chain applied at different levels of the hierarchy requirement is all that remains. Thanks. – Edmund Feb 04 '19 at 20:43