D3 Hierarchical Data

Intro

In our Sunburst Chart post, we learned how to put together a Sunburst chart, a type of chart that displays hierarchical data.

These charts are wonderful, but our data usually needs some massaging before it is in the format they expect.

Let's explore some methods for getting data into the proper hierarchical format.

Target Format

Our goal is to get our data into a hierarchical format like the sample below, which is taken from the sample data in the Sunburst chart post.

const data = {
  name: "flare",
  children: [
    {
      name: "analytics",
      children: [
        {
          name: "cluster",
          children: [
            { name: "AgglomerativeCluster", size: 3938 },
            { name: "CommunityStructure", size: 3812 },
            { name: "HierarchicalCluster", size: 6714 },
            { name: "MergeEdge", size: 743 },
          ],
        },
        {
          name: "graph",
          children: [
            { name: "BetweennessCentrality", size: 3534 },
            { name: "LinkDistance", size: 5731 },
            { name: "MaxFlowMinCut", size: 7840 },
            { name: "ShortestPaths", size: 5914 },
            { name: "SpanningTree", size: 3416 },
          ],
        },
        {
          name: "optimization",
          children: [{ name: "AspectRatioBanker", size: 7074 }],
        },
      ],
    },
    // ...
  ],
};
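To make the shape concrete, here is a small sketch in plain JavaScript (no D3) that walks the nested format recursively. Only leaf nodes carry a size, and a parent's total is the sum of its leaves. This is conceptually what D3's node.sum(d => d.size) computes for each node, but it is not D3's implementation; totalSize and sample are hypothetical names for illustration.

```javascript
// Recursively total the `size` of every leaf under a { name, children } node.
// Parent nodes have no `size` of their own, only children.
function totalSize(node) {
  if (!node.children) return node.size ?? 0; // leaf node
  return node.children.reduce((sum, child) => sum + totalSize(child), 0);
}

// A trimmed copy of two branches from the sample data above.
const sample = {
  name: "analytics",
  children: [
    { name: "optimization", children: [{ name: "AspectRatioBanker", size: 7074 }] },
    {
      name: "graph",
      children: [
        { name: "LinkDistance", size: 5731 },
        { name: "SpanningTree", size: 3416 },
      ],
    },
  ],
};

console.log(totalSize(sample)); // 16221
```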

Raw Data

Our raw data comes as three separate arrays of objects, one per layer. The objects reference their parents by id number: a layer2 object's family matches a layer1 object, and a layer3 object's species matches a layer2 object. Here are simplified examples. There is currently no quantitative data in the layers.

const layer1 = [
  {
    family: 111,
    name: "mammals",
  },
  {
    family: 222,
    name: "fish",
  },
  {
    family: 333,
    name: "birds",
  },
];

const layer2 = [
  {
    family: 111,
    species: 1111,
    name: "dogs",
  },
  {
    family: 111,
    species: 1112,
    name: "cats",
  },
  {
    family: 222,
    species: 1113,
    name: "shark",
  },
  {
    family: 222,
    species: 1114,
    name: "tuna",
  },
  {
    family: 333,
    species: 1115,
    name: "eagle",
  },
];

const layer3 = [
  {
    family: 111,
    species: 1111,
    breed: 11111,
    name: "labrador",
  },
  {
    family: 111,
    species: 1111,
    breed: 11112,
    name: "pitbull",
  },
  {
    family: 111,
    species: 1112,
    breed: 11113,
    name: "tabby",
  },
  {
    family: 111,
    species: 1112,
    breed: 11114,
    name: "jaguar",
  },
  {
    family: 111,
    species: 1112,
    breed: 11115,
    name: "tiger",
  },
  {
    family: 222,
    species: 1113,
    breed: 11116,
    name: "great white",
  },
  {
    family: 222,
    species: 1114,
    breed: 11117,
    name: "blue fin",
  },
  {
    family: 222,
    species: 1114,
    breed: 11118,
    name: "yellow fin",
  },
  {
    family: 333,
    species: 1115,
    breed: 11119,
    name: "bald",
  },
];
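Because the layers are linked only by id numbers, a single wrong id silently puts a row under the wrong parent. Here is a minimal sketch of a consistency check, assuming the layer shapes shown above: it flags any layer3 row whose species is missing from layer2, or whose family disagrees with the family of that species. findMismatchedBreeds and the sample rows are hypothetical.

```javascript
// Flag layer-3 rows that point at a missing species, or at a species
// that belongs to a different family than the row claims.
function findMismatchedBreeds(layer2, layer3) {
  // Index layer-2 rows by species id for quick lookup.
  const speciesById = new Map(layer2.map((s) => [s.species, s]));
  return layer3.filter((breed) => {
    const parent = speciesById.get(breed.species);
    return !parent || parent.family !== breed.family;
  });
}

const speciesRows = [
  { family: 111, species: 1112, name: "cats" },
  { family: 222, species: 1114, name: "tuna" },
];
const breedRows = [
  { family: 111, species: 1112, breed: 11113, name: "tabby" },
  { family: 111, species: 1114, breed: 99999, name: "mislabeled" }, // 1114 is a fish
];

console.log(findMismatchedBreeds(speciesRows, breedRows).map((b) => b.name));
// → ["mislabeled"]
```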

d3.stratify()

The key to accomplishing this task is the stratify method provided by D3.

📘 D3: Stratify

Based on the documentation for that method, we want to combine all of our data into one array and then run stratify on it. To do that, stratify needs to know which key holds each node's id and which holds its parentId (by default it reads d.id and d.parentId).

However, we can't simply name one key as the parentId for every dataset. The species key is the parentId only in the third dataset; in the second dataset, species is the id and the family key is the parentId.

We can solve this by running a mapping function over each dataset that sets the id and parentId explicitly for that layer.
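For comparison, D3's stratify also accepts custom accessors through its id and parentId methods, so another option is to work out the id and parent from whichever keys a row has. Here is a sketch of such accessors in plain JavaScript, assuming breed, species and family ids never collide (true for the sample data, where they are 5-, 4- and 3-digit numbers). idOf and parentIdOf are hypothetical names.

```javascript
// The most specific key a row has is its id.
const idOf = (d) => d.breed ?? d.species ?? d.family;

// The next key up is its parent id.
function parentIdOf(d) {
  if (d.breed !== undefined) return d.species; // layer 3 -> species
  if (d.species !== undefined) return d.family; // layer 2 -> family
  return null; // layer 1 -> no parent
}

const rows = [
  { family: 111, name: "mammals" },
  { family: 111, species: 1111, name: "dogs" },
  { family: 111, species: 1111, breed: 11111, name: "labrador" },
];

console.log(rows.map((d) => [idOf(d), parentIdOf(d)]));
// → [[111, null], [1111, 111], [11111, 1111]]
```

These could be passed as d3.stratify().id(idOf).parentId(parentIdOf). The explicit mapping approach is arguably easier to read, since each layer's rule is written down once, so that is what we use here.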

Mapping Functions

Let’s create a set of new arrays using the map method to identify the id and parentId for each dataset.

// data must be mapped to identify node id and parent id for stratification
const layer1Map = layer1.map(({ family, name }) => ({
  parentId: null,
  id: family,
  name,
}));

const layer2Map = layer2.map(({ family, species, name }) => ({
  parentId: family,
  id: species,
  name,
}));

const layer3Map = layer3.map(({ family, species, breed, name }) => ({
  parentId: species,
  id: breed,
  name,
}));
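The three maps differ only in which keys they read, so one option is a small generic helper. This is a sketch, not something D3 provides: toNodes is a hypothetical name, idKey names the raw field holding the node id, and parentOf is a function returning the parent id for a row.

```javascript
// Convert one raw layer into { parentId, id, name } nodes for stratify.
function toNodes(layer, idKey, parentOf) {
  return layer.map((row) => ({
    parentId: parentOf(row),
    id: row[idKey],
    name: row.name,
  }));
}

const sampleLayer2 = [
  { family: 111, species: 1111, name: "dogs" },
  { family: 222, species: 1113, name: "shark" },
];

const sampleLayer2Map = toNodes(sampleLayer2, "species", (row) => row.family);
console.log(sampleLayer2Map);
// → [{ parentId: 111, id: 1111, name: "dogs" }, { parentId: 222, id: 1113, name: "shark" }]
```

With this helper, the other layers become toNodes(layer1, "family", () => null) and toNodes(layer3, "breed", (row) => row.species).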

If we then combine those arrays into one and run it through stratify, we hit a problem.

const maps = [...layer1Map, ...layer2Map, ...layer3Map];

d3.stratify()(maps);

// Error: multiple roots

We have created a structure that has multiple roots!
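A quick way to see where the extra roots come from is to list them before calling stratify. This minimal sketch treats a node as a root when its parentId is null, undefined or an empty string, which as far as I know matches how d3.stratify decides; findRoots is a hypothetical helper.

```javascript
// List the nodes that would be treated as roots (no usable parentId).
function findRoots(nodes) {
  return nodes.filter((n) => n.parentId == null || String(n.parentId) === "");
}

// The three mapped layer1 nodes from above.
const layer1Nodes = [
  { parentId: null, id: 111, name: "mammals" },
  { parentId: null, id: 222, name: "fish" },
  { parentId: null, id: 333, name: "birds" },
];

const roots = findRoots(layer1Nodes);
if (roots.length > 1) {
  console.log(`found ${roots.length} roots: ${roots.map((n) => n.name).join(", ")}`);
}
// prints "found 3 roots: mammals, fish, birds"
```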

Creating a Base Node

In D3, a hierarchy can have only one root node. In this case we started with three (mammals, fish and birds), because every layer1 node has a null parentId.

We need to create a base node from scratch to sit above them.

const layer0 = [
  {
    parentId: null,
    id: "base",
    name: "animals"
  }
]

Then we update our layer1 map to identify “base” as the parentId for each layer1 node.

const layer1Map = layer1.map(({ family, name }) => ({
  parentId: "base",
  id: family,
  name,
}));

Stratify and Hierarchy

Now we have a proper dataset that we can run through stratification.

const maps = [
  ...layer0,
  ...layer1Map,
  ...layer2Map,
  ...layer3Map,
];

const stratify = d3.stratify();
const stratified = stratify(maps);

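Once the data has been run through stratify, it is already in hierarchical format: stratify returns a root node of the same kind that d3.hierarchy produces, so we do not need to run d3.hierarchy on it as well. Methods such as sum and sort can be called on it directly.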
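To show what the linking step does, here is a minimal sketch in plain JavaScript. It is not D3's implementation: it indexes every node by id (as a string, since D3 also compares ids as strings), attaches each node to its parent, and returns plain nested { name, children } objects in the Target Format shape, whereas d3.stratify returns D3 node objects. It mirrors the "multiple roots", "missing" and "no root" errors but skips the checks D3 makes for duplicate ids and cycles. nest is a hypothetical name.

```javascript
// Link flat { id, parentId, name } nodes into one nested tree.
function nest(nodes) {
  // Index a fresh output object for every node by its id.
  const byId = new Map();
  for (const n of nodes) byId.set(String(n.id), { name: n.name });

  let root = null;
  for (const n of nodes) {
    const node = byId.get(String(n.id));
    if (n.parentId == null || n.parentId === "") {
      if (root) throw new Error("multiple roots");
      root = node;
      continue;
    }
    const parent = byId.get(String(n.parentId));
    if (!parent) throw new Error("missing: " + n.parentId);
    if (!parent.children) parent.children = [];
    parent.children.push(node);
  }
  if (!root) throw new Error("no root");
  return root;
}

const sampleMaps = [
  { parentId: null, id: "base", name: "animals" },
  { parentId: "base", id: 111, name: "mammals" },
  { parentId: "base", id: 222, name: "fish" },
  { parentId: 111, id: 1111, name: "dogs" },
  { parentId: 1111, id: 11111, name: "labrador" },
];

console.log(JSON.stringify(nest(sampleMaps)));
// {"name":"animals","children":[{"name":"mammals","children":[{"name":"dogs","children":[{"name":"labrador"}]}]},{"name":"fish"}]}
```

In practice d3.stratify does all of this for you; the sketch is only meant to make the id/parentId linking visible.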

Handling Orphan Nodes

Your dataset may contain orphan nodes: nodes that do not have a parentId defined. Left as-is, each one would become an extra root. In that case, you can create an orphanage node, then loop over your mapped nodes and set the parentId of each orphan to the orphanage node.

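const orphanageNode = {
  id: "orphanage",
  parentId: "base",
  name: "orphanage",
};
// add the orphanage to the mapped layer1 nodes (not the raw layer1 data)
layer1Map.push(orphanageNode);

// run this before combining the layers into maps and calling stratify
[...layer2Map, ...layer3Map].forEach((node) => {
  if (!node.parentId) {
    node.parentId = "orphanage";
  }
});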
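The loop above only catches nodes with no parentId. A node whose parentId points at an id that does not exist is also an orphan, and d3.stratify throws a "missing" error for it. Here is a sketch that handles both cases on the combined, already-mapped nodes, assuming a single real root id ("base" in this post). adoptOrphans is a hypothetical helper; it copies nodes instead of mutating them and only adds the orphanage node when something needs adopting.

```javascript
// Reparent nodes with an empty or unknown parentId under an orphanage node.
function adoptOrphans(nodes, rootId, orphanageId = "orphanage") {
  const ids = new Set(nodes.map((n) => String(n.id)));
  let adopted = 0;
  const result = nodes.map((n) => {
    if (n.id === rootId) return n; // leave the real root alone
    const p = n.parentId;
    if (p != null && p !== "" && ids.has(String(p))) return n; // parent exists
    adopted += 1;
    return { ...n, parentId: orphanageId }; // copy, don't mutate the input
  });
  if (adopted > 0) result.push({ id: orphanageId, parentId: rootId, name: "orphanage" });
  return result;
}

const nodes = [
  { id: "base", parentId: null, name: "animals" },
  { id: 111, parentId: "base", name: "mammals" },
  { id: 1111, parentId: 111, name: "dogs" },
  { id: 1116, parentId: null, name: "frogs" }, // no parent at all
  { id: 1117, parentId: 999, name: "newts" }, // parent 999 does not exist
];

const fixed = adoptOrphans(nodes, "base");
console.log(fixed.filter((n) => n.parentId === "orphanage").map((n) => n.name));
// → ["frogs", "newts"]
```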