Over 90% of Hierarchy Data Problems Fall Into These Categories

Posted by Tony Brownlee on Sep 8, 2016

You're at a party, striking up a conversation with your friends and colleagues, and what do you talk about?  Sports. Politics. Business. Hierarchy data?  While hierarchy data may not always be the first topic discussed, I've been to a few events with chief data officers where it does come up.  If it comes up at your next cocktail party, I want you to be ready to contribute to the conversation. And if I’m in attendance, I’ll join you in the conversation.

Joking aside, for data professionals, hierarchy data is growing in importance. Sometimes referred to as relationship data, family tree data, or legal or corporate hierarchy, this data topic is about the relationships between legal entities that indicate ownership, control, or influence of one entity over another. Hierarchy data problems fall into four categories.

My passion for hierarchy data started around 2003, when I was solving global hierarchy data problems related to issuers of securities across 140 countries for public accounting firms. As 2008 rolled around and issues in the financial markets hit, many banking and capital markets institutions and insurance companies started to realize the importance of hierarchy data for risk purposes. Then, as regulations emerged, relationship data became a must-have for regulatory reporting, risk aggregation, capital adequacy, and many other use cases. Now, we're seeing many global companies look at the importance of hierarchies for understanding supplier business relationships, analyzing revenue and pricing strategies, and assessing cross-border client relationships.

In the utility space, particularly with the Legal Entity Identifier (LEI), we'll see level 2 relationship data (AKA hierarchies) show up as early as 2017 from local operating units such as DTCC and others. As this data in the global LEI system grows, firms will have more access to hierarchy data around the world, but will still face the age-old question of how to use it within legacy systems and processes.

Two things are common throughout the industry when it comes to hierarchies: the data is usually filled with inaccuracies, and hierarchies can be very expensive to manage. From our years of experience with hundreds of millions of records, we've seen it all. In our assessment, the majority of the issues related to hierarchy data (more than 90%) fall into the following four categories. Read through them and be ready to join the conversation at your next social gathering. (You're welcome.)

  1. "Broken Trees": This involves entities that are missing immediate or ultimate parents, were set up quickly without any parent information, or haven't been maintained with the correct parent information. Often called stand-alone or orphaned entities, these records are easy for users to set up, but difficult to identify. This category of hierarchy data problem is the most common.
  2. "Inconsistent Rules": Everyone wants a golden copy or single version of the truth, but that's easier said than done when it comes to relationship data. It's easy to create an immediate parent field, but over years and years of data accumulation, we see a significant number of inconsistencies in the rules used to populate these fields. The biggest hitter here is often entities related to funds or special purpose vehicles, but we also see problems with ownership of less than 50% and with what to do when an entity has multiple owners but only one field for the data. Additionally, it's common to see a need for a sales hierarchy vs. a risk hierarchy, which leads to inconsistencies in the data as enterprises try to move toward those golden copy data strategies.
  3. "Ultimate Parents": It should be pretty easy to roll up entities to an ultimate parent, but challenges start to emerge as the data changes. As duplicate entities enter a system and are de-duped, the surviving ultimate parent relationships are often not analyzed as part of the de-duplication process. Additionally, as mergers or acquisitions occur, if the ultimate parent field is stored independently from a full hierarchy lineage (very common in legacy systems), merging two or more large hierarchies often leads to many ultimate parent contradictions or inconsistencies. The result? The ultimate parents, seemingly the easiest entities to get right, simply don't make sense in the context of the whole universe of legal entity data.
  4. "Circular Relationships": The least common but most impactful of this list, a circular relationship exists, quite simply, when entity A is listed as the parent of entity B and entity B as the parent of entity A. While this should be easy to identify, it becomes more complicated as multiple levels of relationships are inserted between two entities, as entities are named similarly, or as duplication exists. Controls can help in these situations, but when considering the other categories of issues listed above, these circular relationships become a bit more challenging than you might expect.
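
For readers who like to see these categories in data terms, here is a minimal Python sketch of how three of them (broken trees, ultimate parents, and circular relationships) might be flagged. It assumes a deliberately simplified model that is not from any particular system: each entity has a single immediate parent field, held in a hypothetical `parent_of` mapping, plus a separately stored ultimate parent field in a hypothetical `stored_ultimate` mapping. All entity names are made up. The "inconsistent rules" category is left out because it depends on business rules rather than structure.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model: entity id -> immediate parent id (None = no parent recorded).
ParentMap = Dict[str, Optional[str]]


def standalone_entities(parent_of: ParentMap) -> List[str]:
    """Entities with no parent and no children: candidates for 'broken tree' review."""
    has_children = {p for p in parent_of.values() if p is not None}
    return sorted(e for e, p in parent_of.items() if p is None and e not in has_children)


def dangling_parents(parent_of: ParentMap) -> List[str]:
    """Entities whose recorded parent is not itself in the data set."""
    return sorted(e for e, p in parent_of.items() if p is not None and p not in parent_of)


def circular_entities(parent_of: ParentMap) -> List[str]:
    """Entities whose parent chain runs into a cycle, however many levels deep."""
    flagged = []
    for entity in parent_of:
        seen: Set[str] = set()
        current: Optional[str] = entity
        while current is not None and current in parent_of and current not in seen:
            seen.add(current)
            current = parent_of[current]
        if current is not None and current in seen:
            flagged.append(entity)
    return sorted(flagged)


def ultimate_parent(parent_of: ParentMap, entity: str) -> Optional[str]:
    """Walk up the immediate-parent chain; return None if the chain loops or breaks."""
    seen: Set[str] = set()
    current = entity
    while current in parent_of and current not in seen:
        seen.add(current)
        parent = parent_of[current]
        if parent is None:
            return current
        current = parent
    return None


def ultimate_parent_mismatches(
    parent_of: ParentMap, stored_ultimate: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Compare a separately stored ultimate parent field with the derived one."""
    mismatches = {}
    for entity, stored in stored_ultimate.items():
        derived = ultimate_parent(parent_of, entity)
        if derived != stored:
            mismatches[entity] = (stored, derived)
    return mismatches


# Made-up sample data covering each case.
parent_of: ParentMap = {
    "HoldCo": None,
    "OpCo": "HoldCo",
    "Branch": "OpCo",
    "Loner": None,         # stand-alone: no parent, no children
    "Sub": "Missing Ltd",  # parent not present in the data
    "A": "B",              # A and B point at each other
    "B": "A",
}
stored_ultimate = {"Branch": "HoldCo", "OpCo": "OldCo"}

print(standalone_entities(parent_of))
print(dangling_parents(parent_of))
print(circular_entities(parent_of))
print(ultimate_parent_mismatches(parent_of, stored_ultimate))
```

Note that a stand-alone record is only a candidate for review: code alone cannot tell a genuinely independent entity from one whose parent was never entered, which is part of why this category is so hard to identify.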

So what can you do? The simple answer points to the basics of any good data governance program. You need to assess the situation and quantify where you stand. Then you need consistent rules, consistent ownership, and technical controls to guide your way forward. And last but not least, you need data maintenance to ensure that the hierarchy data is kept up to date. It seems simple, but the inherent complexity of hierarchy data makes this one of the top challenges for the coming years.
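
As one illustration of what a technical control could look like, here is a small Python sketch that refuses a parent assignment if it would create a circular relationship. It assumes the same simplified single-parent model, with a hypothetical `parent_of` mapping and made-up function and entity names; a real system would need to handle multiple owners and whatever rules your governance program sets.

```python
from typing import Dict, Optional, Set

# Hypothetical model: entity id -> immediate parent id (None = no parent recorded).
ParentMap = Dict[str, Optional[str]]


def would_create_cycle(parent_of: ParentMap, entity: str, new_parent: str) -> bool:
    """True if making new_parent the parent of entity would close a loop."""
    seen: Set[str] = set()
    current: Optional[str] = new_parent
    # Walk up from the proposed parent; reaching the entity itself means a loop.
    while current is not None and current not in seen:
        if current == entity:
            return True
        seen.add(current)
        current = parent_of.get(current)
    return False


def set_parent(parent_of: ParentMap, entity: str, new_parent: str) -> None:
    """Apply the change only if it keeps the hierarchy free of cycles."""
    if would_create_cycle(parent_of, entity, new_parent):
        raise ValueError(f"{new_parent!r} cannot be the parent of {entity!r}: circular relationship")
    parent_of[entity] = new_parent


# Made-up sample hierarchy: C -> B -> A.
sample: ParentMap = {"A": None, "B": "A", "C": "B"}

set_parent(sample, "C", "A")      # allowed: C now reports directly to A
try:
    set_parent(sample, "A", "B")  # rejected: B already rolls up to A
except ValueError as err:
    print(err)
```

A check like this only guards new changes. Circular relationships already in the data still need the assessment step first.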

With the holiday season just around the corner, I hope you're ready to engage in those interesting conversations and show off your data know-how.  And if you need a good hierarchy data joke...give me a call. Let's talk.  

 

Topics: Legal Entity Data, Data Quality, Data Governance, Big Data, Hierarchy data