Not the snappiest title ever, but any reference to the Princess Bride has to be cut a certain amount of slack! As must I, given that my previous blog post was published some eighteen months ago. In my defence, the last couple of years have been crazy busy, which has also generated a whole load of content I’m keen to share.

We’re starting with an oldie but a goodie, chosen only because I have finally fixed the Business Glossary template linked to in this article. If you’ve not read it, it offers lots of good advice on the how and why of generating, maintaining and using a glossary as part of your data governance (DG) efforts.

Today, we’re going to focus a bit more on the generation phase, specifically on common mistakes and a little best practice. The picture at the head of this post illustrates the problem. Language is messy: it has evolved over many generations, and while there are agreed rules and taxonomies, many words and phrases fall outside agreed norms. I’m no expert on language evolution, but from a DG point of view, two common themes emerge:

  1. Synonyms: ‘We all know what we mean by a student, don’t we?’ That’ll be a no, then! There’s a danger in assuming that what we call something is universally adopted, or even understood. The reasons are legion: different experiences, cultures, specialities, nationalities and even age. From our picture we can see Villa has four valid synonyms for a roofed structure people mingle in*. The precise definition of each one, however, is very different. This is the classic ‘describe a fork’ problem, where someone answers ‘a piece of cutlery to eat with’, leaving us to wonder whether they mean a spoon, fork, knife, spork, etc.
  2. Cohorts: “How many students do we have?” Never have the words ‘it depends’ done quite so much heavy lifting! There is a lot of complexity in how we count things, and that has implications for a trusted glossary. We need agreed definitions to help us determine cohorts: ‘part time, full time, funded, domicile, learning path, withdrawn, deferred, etc, etc, etc’. But it’s easy to fall into the trap of subdivision by edge case. FTE (full-time equivalent) is another great example. We need enough definitions to get us through the day, but not so many that they become unusable.

This is not a definitive list! Antonyms are far less common but no less problematic. Reverse engineering terms from data dictionaries raises many issues when trying to consolidate them into a single definition. Attempting to manage external and internal definitions of ‘the same thing’ is not trivial. Even knowing who the right people are to craft a definition isn’t easy.

Some best practice can help here.

  1. Prioritise. We’ll return to the importance, and the process, of identifying material/critical data in a future post. Right now, it’s enough to know we need to pick the terms that will help us with problems we already have, or projects we’re embarking on. Data Futures is a great example of both! It’s also a useful exercise in understanding when to use an external or an internal definition. We advocate using the external definition UNLESS there is a good reason not to. That reason is often a cohort/counting issue, but not always. It’s important to reference all synonyms and versions of a term, and where each is used.
  2. Write great definitions. It sounds obvious, but it isn’t easy. There is a lot of advice online, from dodgy YouTube videos to full-on paid-for courses. Our advice is to understand all the places a term is used, get the people from those areas in a room and generate something that works for you. There are rules and good practice (e.g. do not use system terms, avoid statuses, be unambiguous), but mostly it’s about getting something that’s good enough.
  3. Consult widely but set a deadline. Business glossaries fail for many reasons. Two of the most common are a lack of prioritisation and death by edge case. Not every use case can be accommodated, so make that clear in the definitions and move on. We try to publish iteratively on, say, a monthly cycle, prioritising as we go. Sign-off should rest with the Data Owner, and definitions should be the purview of the Stewards. Trying to create a glossary without recognised DG roles (e.g. one built out of technology groups) is rarely successful.
  4. Publish and be damned. Or ‘Sunlight is the best disinfectant’. There are two key points here. First, publish with a robust and transparent challenge process: it is far easier to manage disagreement than apathy. Second, do not abandon all your good work to a spreadsheet. There are so many ‘free’ tools we now have access to that offer far better interfaces and access: SharePoint, wikis, whatever is new in Teams this week. The tool isn’t important, but its prominence and usability are.

This is a big topic and we’ve barely scratched the surface here. Hopefully, though, it provides the start of a framework for dealing with the inevitable issues of creating a university-wide asset whose stakeholders hold many different views. Trust us: it is worth the effort!

A respected business glossary is one of the four pillars of unleashing data’s superpower of utility: create once, use many. The other three are quality, literacy and culture. We’ll return to all of those later this year!

*Except for Villa from Blake’s 7. That’s just me being geeky!