5 Basic Rules for a Good Metadata Scheme

Have you ever looked something up – let’s say a work instruction – but you couldn’t decide whether the information you found was still up to date? Maybe you were lucky and there was metadata linked to the document, telling you the period of validity or the like. Metadata is really helpful, isn’t it? And it is even more helpful if it has the same structure for each document, as this allows the search engine to present only valid documents to you; or you could transfer all outdated documents to the archive with a single click.

Batch processing like this is only possible if there is a clearly defined structure for the metadata – a metadata scheme. This article will tell you why this is of such importance and what you should consider when designing your own scheme.
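
To make the idea of batch processing concrete, here is a minimal sketch in Python. It assumes every document carries the same two validity fields; the names `DocumentMeta`, `valid_from` and `valid_until` are made up for this illustration and are not part of any particular standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DocumentMeta:
    # Hypothetical fields, chosen only for this illustration.
    title: str
    valid_from: date
    valid_until: date


def is_valid(meta: DocumentMeta, today: date) -> bool:
    """A document is valid if today lies within its validity period."""
    return meta.valid_from <= today <= meta.valid_until


def split_by_validity(docs, today):
    """Return (valid, outdated); documents not yet valid are in neither list."""
    valid = [d for d in docs if is_valid(d, today)]
    outdated = [d for d in docs if d.valid_until < today]
    return valid, outdated


docs = [
    DocumentMeta("Work instruction A", date(2020, 1, 1), date(2021, 12, 31)),
    DocumentMeta("Work instruction B", date(2023, 1, 1), date(2030, 12, 31)),
]
valid, outdated = split_by_validity(docs, today=date(2024, 6, 1))
print([d.title for d in valid])     # ['Work instruction B']
print([d.title for d in outdated])  # ['Work instruction A']
```

Because every record has the same structure, one function can handle the whole collection: the search only shows `valid`, and `outdated` could be moved to the archive in one go.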

Why Do I Need a Metadata Scheme?

You don’t want to do the boring, repetitive stuff on your own, do you? These are things that machines can handle for you. But machines aren’t that capable of processing unstructured data. You need to provide a structure from which they can gather the data’s context and meaning. The more precise the definition of the structure, the easier the job becomes for the computer, and the better the results will be. Basically, a metadata scheme is nothing more than a definition of context and meaning.

Of course, the scheme doesn’t just help the machines. If there are rules in place for which data is to be stored where and in what form, typing errors can be detected during input. Given a good scheme, much of the metadata can even be captured or generated automatically.

To sum it up: A metadata scheme…

enables effective automated data processing and management;
enhances the metadata’s quality and thus its value;
reduces the effort in capturing metadata.

What Makes a Good Metadata Scheme?

The best scheme is the one that best supports and simplifies data capturing and processing. By following some basic rules, you can develop a scheme perfectly suited for your data and its use.


1. Determine the Scope

The scheme must match all data processed together – and only this data.

A scheme matching all data at hand makes it possible to process all of it using the same automation. On the other hand, very different kinds of data usually share few attributes. Thus, a scheme that is too general won’t help you much. So, think about which data will be processed (managed, searched for) together. These should share a scheme. You don’t need to take other data into account. Of course, you can reuse parts of your scheme for the next set of data.

2. Select the Right Fields

Create only the fields you will be using and divide the information into as many fields as possible.

A metadata scheme consists of fields, each of which contains one specified piece of information. It is worth investing some time in the concept. The key question is: What will the metadata be used for? Defining a field that is not needed at all or that will be empty in most cases due to difficulties in compiling the data is a waste of time.

Break the information down into the smallest parts possible, as combining two fields is easier and causes less trouble than splitting up the content of a single field. Thus, for every field, make sure there is no combination of multiple independent pieces of information. If there are combinations you need all the time, you may save the combined value in an additional field – but make sure this field is generated automatically in order to prevent contradictions.
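
A minimal sketch of this rule in Python, using a person's name as the example. The class and field names are assumptions for illustration only. Here the combined value is computed on access instead of being stored, which is one simple way to guarantee that it is generated automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonMeta:
    # Atomic fields: each one holds exactly one piece of information.
    first_name: str
    last_name: str

    @property
    def full_name(self) -> str:
        # Derived from the atomic fields on every access,
        # so it can never contradict them.
        return f"{self.first_name} {self.last_name}"


author = PersonMeta(first_name="Ada", last_name="Lovelace")
print(author.full_name)  # Ada Lovelace
```

Sorting by last name or searching by first name works directly on the atomic fields, while the combined form is still available wherever it is needed.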

3. Do Not Reinvent the Wheel

Prefer a standard scheme that fits well over a custom-built one that fits perfectly.

Many industries have been using metadata for a long time now. The odds are good that there is already a robust scheme or exchange format for your line of business. Using a standard scheme comes with a lot of advantages. You can use data you get from others without adapting it if the same standard scheme is used. Widespread schemes come with tools and interfaces, simplifying maintenance even more. And last but not least, you won’t need to invest the effort in defining your own scheme. So, if iiRDS, Dublin Core or MODS provides all you need, you will benefit from using one of these instead of your own carefully fitted scheme.
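
If you already have in-house field names, adopting a standard can start with a simple renaming. The following Python sketch maps some hypothetical in-house fields onto elements of the Dublin Core Metadata Element Set. The in-house names and the sample record are invented for this example; only the fifteen element names come from the standard.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Hypothetical in-house field names mapped to Dublin Core elements.
FIELD_MAPPING = {
    "doc_title": "title",
    "author": "creator",
    "last_changed": "date",
    "file_type": "format",
    "doc_id": "identifier",
}


def to_dublin_core(record: dict) -> dict:
    """Rename in-house fields to Dublin Core elements; reject unmapped ones."""
    result = {}
    for field, value in record.items():
        if field not in FIELD_MAPPING:
            raise KeyError(f"No Dublin Core mapping for field {field!r}")
        result[FIELD_MAPPING[field]] = value
    return result


# Sanity check: every mapping target must be a Dublin Core element.
assert set(FIELD_MAPPING.values()) <= DUBLIN_CORE_ELEMENTS

record = {
    "doc_title": "Work instruction B",
    "author": "J. Doe",
    "file_type": "application/pdf",
}
dc_record = to_dublin_core(record)
print(dc_record)
# {'title': 'Work instruction B', 'creator': 'J. Doe', 'format': 'application/pdf'}
```

A field that cannot be mapped raises an error here. In practice, such a field is a hint that the standard does not cover everything you need, or that the field is not needed at all.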

4. As Precise as Possible

Prevent all unnecessary variance in the data.

The fewer alternatives there are in the scheme, the better. Each alternative is an opportunity to choose the wrong option. Define exactly which information will be stored in which field and in which form. Data types, selection lists and regular expressions (a language for describing string patterns) are a great help in accomplishing this. They prevent typing errors and ensure that the same information will always be stored in the same way. But simpler measures will help you a lot as well. For example, for the field “grade”, allow only the numbers 1 to 6 (or the letters A to F); for numerical values, define whether a dot or a comma is to be used as the decimal separator. Even a short explanation of which information should be entered in a given field will help.
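
The following Python sketch shows how a data type check, a selection list and a regular expression can work together at input time. The field names `format` and `price`, the list of allowed formats and the choice of the dot as decimal separator are assumptions made for this example.

```python
import re

# Selection list for a hypothetical "format" field.
ALLOWED_FORMATS = ("docx", "html", "pdf")
# Digits with an optional fraction, using a dot as the decimal separator:
# matches "12" and "12.50", but not "12,50".
DECIMAL_PATTERN = re.compile(r"\d+(\.\d+)?")


def validate(record: dict) -> list:
    """Return a list of error messages; an empty list means no errors."""
    errors = []
    grade = record.get("grade")
    if not (isinstance(grade, int) and 1 <= grade <= 6):
        errors.append("grade must be an integer from 1 to 6")
    if record.get("format") not in ALLOWED_FORMATS:
        errors.append("format must be one of: " + ", ".join(ALLOWED_FORMATS))
    if not DECIMAL_PATTERN.fullmatch(str(record.get("price", ""))):
        errors.append("price must use a dot as the decimal separator")
    return errors


ok = validate({"grade": 2, "format": "pdf", "price": "12.50"})
bad = validate({"grade": 7, "format": "pdf", "price": "12,50"})
print(ok)   # []
print(bad)  # ['grade must be an integer from 1 to 6', 'price must use a dot as the decimal separator']
```

Returning all errors at once, rather than stopping at the first one, lets the person entering the data fix everything in a single pass.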

5. Optional vs. Mandatory

All fields should be mandatory – unless there is a serious reason to make one optional.

If your metadata will be captured automatically or by specialists, it should be mandatory to fill in every field that applies to every instance. Every person has a name, every file a format, every digital text its encoding. Empty fields will make the dataset inconsistent and thus harder to process.

However, if the data is entered by people who do not manage the data, it may be a good idea to set only the most important fields as mandatory. Entering too much information is time-consuming, leading to inattention and thus to thoughtless, erroneous or even random entries. In such cases, you must strike a balance between reasonable effort and required data quality.

Of course, optional fields aren’t worthless even when using automated data capture – as long as the empty field constitutes information in and of itself. For example, an empty “last renovation” field in a house’s metadata would mean that the house has never been renovated.
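
The house example can be sketched in Python as follows. The fields `address` and `year_built` are hypothetical additions to make the record complete; only “last renovation” comes from the example above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class HouseMeta:
    # Mandatory fields: they have no default, so they must be supplied.
    address: str
    year_built: int
    # Optional field: an empty value (None) is itself information.
    last_renovation: Optional[date] = None


def renovation_status(house: HouseMeta) -> str:
    if house.last_renovation is None:
        return "never renovated"
    return f"last renovated on {house.last_renovation.isoformat()}"


original = HouseMeta("1 Example Street", 1950)
renovated = HouseMeta("2 Example Street", 1950, date(2015, 5, 1))
print(renovation_status(original))   # never renovated
print(renovation_status(renovated))  # last renovated on 2015-05-01
```

Leaving out a mandatory field, as in `HouseMeta(address="3 Example Street")`, raises a `TypeError` when the record is created, so incomplete records are caught before they reach the dataset.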

Besides these 5 rules, there is also the rule of practicability, of course. If implementing the optimal scheme takes too long, compromises on precision may be unavoidable. Have you already experienced this problem? Or do you see the main issue somewhere else? Let us know in the comments!

Done creating the metadata scheme? Time for the next step: Capturing! Or rather, creating?


Isabell Bachmann

IT Information Manager at avato consulting ag

Isabell Bachmann studied Digital Humanities and Philosophy at the University of Würzburg and has been working as an IT Information Manager at avato consulting ag since 2018. Her focus is on data and information modeling as well as AI-based text analysis and processing. In addition, she deals with the creation and administration of IT documentation, terminology maintenance and gamification.
