
June 22, 2019 | 6 minutes to read

Parsing Nested JSON Records in Python

JSON is the typical format used by web services for message passing, and it is also relatively human-readable. Even so, JSON objects can be quite complex. For analyzing complex JSON data in Python, there is no clear, general method for extracting information (see here for a tutorial on working with JSON data in Python). This post provides a solution for the case where you know the path through the nested JSON to the desired information.

Motivating Example

Suppose you have the following JSON record:

{
  "employees":
    [
      {
        "name": "Alice",
        "role": "dev",
        "nbr": 1
      },
      {
        "name": "Bob",
        "role": "dev",
        "nbr": 2
      }
    ],
  "firm":
    {
      "name": "Charlie's Waffle Emporium",
      "location": "CA"
    }
}

This record has two keys at the top level: employees and firm. The value for the employees key is a list of two objects of the same schema; each object has the keys name, role, and nbr. The value for the firm key is an object with the keys name and location.

Suppose you want to extract the names of the employees. Approaches that simply search for a key name anywhere in the record run into trouble here, because they return the name of the firm as well.
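To make the problem concrete, here is a minimal sketch of such a key-name search. The helper find_all_values is hypothetical, written only for this illustration; it is not part of any library.

```python
def find_all_values(obj, key):
    """Naive search: collect every value stored under `key`, at any depth."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            # Keep searching inside the value, whatever its key was.
            found.extend(find_all_values(v, key))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_all_values(item, key))
    return found


record = {
    "employees": [
        {"name": "Alice", "role": "dev", "nbr": 1},
        {"name": "Bob", "role": "dev", "nbr": 2},
    ],
    "firm": {"name": "Charlie's Waffle Emporium", "location": "CA"},
}

print(find_all_values(record, "name"))
# ['Alice', 'Bob', "Charlie's Waffle Emporium"]
```

The firm's name comes back alongside the employees' names because the search has no notion of where in the record a key sits.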

Solution

Calling the extract_element_from_json function on the above record delivers the desired result:

data = {"employees":[{"name": "Alice", "role": "dev", "nbr": 1}, {"name": "Bob", "role": "dev", "nbr": 2}], "firm":{"name": "Charlie's Waffle Emporium", "location": "CA"}}

extract_element_from_json(data, ["employees", "name"])
>> ['Alice', 'Bob']

Under the Hood

This function descends into the record(s) in obj, following the keys specified in path, to retrieve the desired information. When the value of a key in path is a list, the function splits and continues descending into each element of that list in a depth-first manner. This is how both ‘Alice’ and ‘Bob’ are returned: the value of employees is a list, so the descent splits across its two elements, and each element’s value for name is appended to the output list.
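The splitting can be seen by tracing the order in which values are reached. The generator below is a simplified sketch, not the function from this post: trace_path and its where argument are hypothetical names, it assumes a non-empty path, and it silently skips missing keys to keep the trace short.

```python
def trace_path(obj, path, ind=0, where="record"):
    """Yield (location, value) pairs in the order a depth-first walk reaches them."""
    if isinstance(obj, list):
        # A list splits the walk: each element continues with the same key.
        for i, item in enumerate(obj):
            yield from trace_path(item, path, ind, f"{where}[{i}]")
    elif isinstance(obj, dict) and path[ind] in obj:
        value = obj[path[ind]]
        here = f"{where}.{path[ind]}"
        if ind + 1 == len(path):
            # End of the path: report the value found here.
            yield here, value
        else:
            # More keys to follow: descend one level.
            yield from trace_path(value, path, ind + 1, here)


record = {
    "employees": [{"name": "Alice"}, {"name": "Bob"}],
    "firm": {"name": "Charlie's Waffle Emporium"},
}

for location, value in trace_path(record, ["employees", "name"]):
    print(location, "->", value)
# record.employees[0].name -> Alice
# record.employees[1].name -> Bob
```

The firm branch is never visited, because the first key in the path sends the walk into employees only.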

If obj is a single dictionary/JSON record, then this function returns a list containing the desired information, and if obj is a list of dictionaries/JSON records, then this function returns a list of lists containing the desired information.

If any element of path is missing from the corresponding level of the nested dictionary/JSON, then this function does not raise an error; instead, None is placed in the output list at that position.
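The return shapes are easier to see with concrete calls. The block below uses a condensed stand-in, extract_sketch, so that it runs on its own; the name is hypothetical, and the code is an approximation of the behavior described above rather than the full function given below.

```python
def extract_sketch(obj, path):
    """Condensed stand-in for the behavior described above (illustration only)."""
    def walk(node, ind, out):
        if isinstance(node, list):
            if not node:
                out.append(None)      # empty list: nothing to descend into
            for item in node:
                walk(item, ind, out)  # split across the list's elements
        elif isinstance(node, dict) and path[ind] in node:
            if ind + 1 == len(path):
                out.append(node[path[ind]])
            else:
                walk(node[path[ind]], ind + 1, out)
        else:
            out.append(None)          # missing key or non-container value
        return out

    if isinstance(obj, list):
        # A list of records gives one inner list per record.
        return [walk(item, 0, []) for item in obj]
    return walk(obj, 0, [])


record = {
    "employees": [
        {"name": "Alice", "role": "dev", "nbr": 1},
        {"name": "Bob", "role": "dev", "nbr": 2},
    ],
    "firm": {"name": "Charlie's Waffle Emporium", "location": "CA"},
}

print(extract_sketch(record, ["firm", "location"]))            # ['CA']
print(extract_sketch([record, record], ["firm", "location"]))  # [['CA'], ['CA']]
print(extract_sketch(record, ["employees", "salary"]))         # [None, None]
print(extract_sketch(record, ["owner", "name"]))               # [None]
```

A key missing inside the employees list produces one None per employee, while a key missing at the top level produces a single None.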

Below is the full function (inspired by what’s discussed here):

def extract_element_from_json(obj, path):
    '''
    Extracts an element from a nested dictionary or
    a list of nested dictionaries along a specified path.
    If the input is a dictionary, a list is returned.
    If the input is a list of dictionaries, a list of lists is returned.
    obj - list or dict - input dictionary or list of dictionaries
    path - list - list of strings that form the path to the desired element
    '''
    def extract(obj, path, ind, arr):
        '''
            Extracts an element from a nested dictionary
            along a specified path and returns a list.
            obj - dict - input dictionary
            path - list - list of strings that form the JSON path
            ind - int - starting index
            arr - list - output list
        '''
        key = path[ind]
        if ind + 1 < len(path):
            if isinstance(obj, dict):
                if key in obj:
                    extract(obj.get(key), path, ind + 1, arr)
                else:
                    arr.append(None)
            elif isinstance(obj, list):
                if not obj:
                    arr.append(None)
                else:
                    for item in obj:
                        extract(item, path, ind, arr)
            else:
                arr.append(None)
        if ind + 1 == len(path):
            if isinstance(obj, list):
                if not obj:
                    arr.append(None)
                else:
                    for item in obj:
                        # Non-dict items have no keys, so record None for them.
                        arr.append(item.get(key) if isinstance(item, dict) else None)
            elif isinstance(obj, dict):
                arr.append(obj.get(key, None))
            else:
                arr.append(None)
        return arr
    if isinstance(obj, dict):
        return extract(obj, path, 0, [])
    elif isinstance(obj, list):
        outer_arr = []
        for item in obj:
            outer_arr.append(extract(item, path, 0, []))
        return outer_arr

Update

This post is featured in Issue #374 of PyCoder’s Weekly.

Topics: Helpful
Written on June 22, 2019