How to clean up your shared drives, Frank’s approach

by Frank 22. August 2014 06:00

In my time in this business (enterprise content management, records management, document management, etc.) I have been asked to help with a ‘shared drive problem’ more times than I can remember. This particular issue is analogous to the paperless office problem. Thirty years ago, when I started my company, I naively thought that both problems would be long gone by now, but they are not.

I still get requests for purely physical records management solutions and I still get requests to assist customers in sorting out their shared drives problems.

The tools and procedures to solve both problems have been around for a long time but for whatever reason (I suspect lack of management focus) the problems still persist and could be described as systemic across most industry segments.

Yes, I know that you can implement an electronic document and records management system (we have one called RecFind 6) and take away the need for shared drives and physical records management systems completely but most organizations don’t and most organizations still struggle with shared drives and physical records. This post addresses the reality.

Unfortunately, the most important ingredient in any solution is ‘ownership’ and that is as hard to find as it ever was. Someone with authority, or someone who is prepared to assume authority, needs to take ownership of the problem in a benevolent dictator way and just steam-roll a solution through the enterprise. It isn’t solvable by committees and it requires a committed, driven person to make it happen. These kinds of people are in short supply, so if you don’t have one, bring one in.

In a nutshell there are three basic problems apart from ownership of the problem.

1.     How to delete all redundant information;

2.     How to structure the ‘new’ shared drives; and

3.     How to make the new system work to most people’s satisfaction.

Deleting redundant Information

Rule number one is don’t ever ask staff to delete the information they regard as redundant. It will never happen. Instead, tell staff that you will delete all documents in your shared drives with a created or last updated date earlier than a nominated date (say, one year in the past) unless they tell you specifically which ‘older’ documents they need to retain. Just saying “all of them” is not an acceptable response. Give staff a month’s advance notice and then delete everything that has not been nominated as important enough to retain. Of course, take a backup of everything before you delete, just in case. This is tough love, not stupidity.
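The cut-off rule is easy to script. The following is a minimal sketch only, assuming the last-modified timestamp is a good enough proxy for ‘last updated’ and that staff nominations arrive as a plain list of paths; the function name `deletion_candidates` and its parameters are hypothetical, not part of any product. It only lists candidates and deletes nothing.

```python
import time
from pathlib import Path


def deletion_candidates(root, cutoff_days=365, retain=(), now=None):
    """List files under root last modified before the cut-off date,
    skipping any file that staff have nominated for retention."""
    now = time.time() if now is None else now
    cutoff = now - cutoff_days * 86400  # cut-off as seconds since the epoch
    keep = {Path(p).resolve() for p in retain}  # nominated 'older' documents
    candidates = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.resolve() in keep:
            continue  # nominated as important enough to retain
        if path.stat().st_mtime < cutoff:
            candidates.append(path)
    return sorted(candidates)
```

Review the list, take the backup, and only then delete. Keeping the listing step separate from the deletion step is the ‘just in case’ part.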

Structuring the new shared drives

If your records manager insists on using your already overly complex, hierarchical corporate classification scheme or taxonomy as the model for the new shared drive structure, politely ask them to look for another job. Do you want this to work or not?

Records managers and archivists and librarians (and scientists) understand and love complex classification systems. However, end users don’t understand them, don’t like them and won’t use them. End users have no wish to become part-time records managers; they have their own work to do, thank you.

By all means make the new structure a subset of the classification system, major headings only and no more than two levels if possible. If it takes longer than a few seconds to decide where to save something or to find something then it is too complex. If three people save the same document in three different places then it is too complex. If a senior manager can’t find something instantly then it is too complex. The staff aren’t to blame, you are.
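The ‘no more than two levels’ rule can be checked mechanically. This is a small sketch, assuming folder paths are given relative to the top of the shared drive and use forward slashes; the function name `too_deep` is made up for illustration.

```python
from pathlib import PurePosixPath


def too_deep(folders, max_levels=2):
    """Return the folders nested deeper than max_levels below the drive root."""
    return [f for f in folders if len(PurePosixPath(f).parts) > max_levels]
```

For example, `too_deep(["Finance", "Finance/Invoices", "Finance/Invoices/2014"])` flags only the third folder, because it sits three levels down.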

I have written about this issue previously and you can reference a white paper at this link, “Do you really need a Taxonomy?”

The shared drives aren’t where we classify documents; they are where we make it as easy and as fast as possible to save, retrieve and work on documents; no more, no less. Proper classification (if I can use that term) happens later when you use intelligent software to automatically capture, analyse and store documents in your document management system.

Please note, shared drives are not a document management system and a document management system should never just be a copy of your shared drives. They have different jobs to do.

Making the new system work

Let’s fall back on one of the oldest acronyms in business, KISS, “Keep It Simple Stupid!” Simple is good and elegant, complex is bad and unfathomable.

Testing is a good example of where the KISS principle must be applied. Asking all staff to participate in the testing process may be diplomatic but it is also suicidal. You need to select your testers. You need to pick a small number of smart people from all levels of your organization. Don’t ask for volunteers; you will get the wrong people applying. Do you want participants who are committed to the system working, or those who are committed to it failing? Do you want this to succeed or not?

If I am pressed for time I use what I call the straight-line method. Imagine all staff in a straight line from the most junior to the most senior. Select from both ends, the most junior and the most senior. Chances are that if the system works for this subset, it will also work for all the staff in between.
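As a sketch, the straight-line method amounts to taking a few people from each end of a list ordered by seniority. The function name `straight_line_testers` and the `per_end` parameter are assumptions made for illustration.

```python
def straight_line_testers(staff_by_seniority, per_end=2):
    """Pick testers from both ends of a list ordered most junior to most senior."""
    staff = list(staff_by_seniority)
    if per_end <= 0:
        return []
    if len(staff) <= 2 * per_end:
        return staff  # too few staff to leave anyone in the middle
    return staff[:per_end] + staff[-per_end:]
```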

Make it clear to all that the shared drives are not your document management system. The shared drives are there for ease of access and to work on documents. The document management system has business rules to ensure that you have inviolate copies of important documents plus all relevant contextual information. The document management system is where you apply business rules and workflow. The document management system is all about business process management and compliance. The shared drives and the document management system are related and integrated but they have different jobs to do.

We have shared drives so staff don’t work on documents on ‘private’ drives, inaccessible and invisible to others. We provide a shared drive resource so staff can collaborate and share information and easily work on documents. We have shared drives so that when someone leaves we still have all their documents and work-in-process.

Please do all the complex processes required in your document management system using intelligent software, and automate as much as possible. Productivity gains come about when you take work off staff, not when you load them up with more work. Give your staff as much time as possible so they can use their expertise to do the core job they were hired for.

If you don’t force extra work on your staff and if you make it as easy and as fast as possible to use the shared drives then your system will work. Do the opposite and I guarantee it will not work.

Do you really need a Taxonomy/Classification Scheme with a Records Management System?

by Frank 26. October 2013 06:00

Background

Classification schemes are a way to group or order data; the objective being to group ‘like’ objects together. Classification schemes have been in use for tens of thousands of years, probably beginning when man first realized that there were different types of animals and plants.

We use classification schemes both to make things easier to find and to add value to a group of objects. By adding value I mean that a classification (describing a group) may provide more information about the members of that group than is obvious from an analysis of a single member; this could be referred to as semantics.

Classification schemes are used in all walks of life, for example, in business, in science, in academia and in politics. Are you a liberal or a conservative? Is it a mammal? If it is, is it a marsupial or a monotreme or a placental mammal? This last example illustrates the usual hierarchical arrangement of classification schemes.

In business, we have long used classification schemes to order business documents, that is, records of business transactions. We are all familiar with file folders and filing cabinets; these things are tools of a classification scheme. They make implementing a classification scheme easier as do numbering systems, colors, barcodes and Lektrievers.

With the first commercial availability of mainframe computers in the early 1960s came our first attempts to computerize filing systems. It was also in the 1960s that we saw the first text indexing systems and the first sophisticated search algorithms.

The advent of text indexing and search algorithms allowed us to do a much better job of classifying data but more importantly, they allowed us to do a much better job of finding data.

Let’s not get in a debate about terminology and acronyms

Our industry (information management, to use an all-encompassing term) is often its own worst enemy. It creates terms and acronyms at will with both confusing and overlapping definitions. Then it wonders why normal end-users exhibit first bewilderment and then disinterest. Let’s look at a few examples, e.g., RIMS, RMS, DMS, EDRMS, IAMS, CMS, ECM and KMS.

Do you realize that the process of records management is part of each of the preceding acronyms?

For my part I will stick with my old friend the world records management standard, ISO 15489. It tells us that records are evidence of a business transaction and that records are in any form including paper, electronic documents and emails (I know emails are electronic documents but the world generally differentiates them because emails are ‘different’).

So as far as I am concerned, the term Records Management System or RMS includes everything we do and is easily recognized and understood, so this is the term and acronym I will use in this paper.

Browsing versus searching

Classification systems are very good at making it easier for us to find information by browsing but not very helpful when we are searching.

Most classification systems require you to first ‘browse’ before finding the exact information you want; you usually have to examine multiple objects before you find the one you want. But this is what classification systems are very good at; because they organize data in a logical (to a human being) way, we usually know where to begin looking. This is why a classification scheme works so well with a manual filing system (multiple cabinets or multiple shelves of file folders).

Classification schemes are great for physical data and, I would say, absolutely necessary for physical data; how else would you organize fifty-thousand file folders (containing seven and a half million pages) in a huge filing room with hundreds of shelves?

However, with computers I don’t need to browse through multiple objects to find the one I want. By using techniques more appropriate to the computer than the filing room, I can search for and find exactly what I want almost instantly. I do not need to leaf through the file folder, I can go directly to the page or directly to the word. I can use the power of the computer.

The following statement will probably be seen as heresy by most practicing records managers, but we actually don’t need a classification system (Taxonomy) when computerizing records. We just need a way to index and then search for information.

We need to organize our data so an ordinary end-user can easily find what they need without having to be a trained, professional records manager.

Indexing versus classifying

Now I know my interpretation of these two terms will not thrill everyone but the differentiation is an important part of my hypothesis.

Let’s start by looking at two kinds of books, a reference book and a work of fiction. Both have tables of contents (a classification system usually called a TOC) but usually only one (the reference book) has an index.

The TOC for the reference book is both useful and often used. The TOC for the work of fiction is neither useful nor often used (readers rarely need more than a bookmark).

The TOC for the reference book is a way to organize information into a logical form, grouping ‘like’ information together in chapters and sections. A TOC for the work of fiction is just a list of chapters; it serves little or no purpose for the typical ‘end-user’, the reader.

All the reader of a fiction book really needs is two things: a bookmark and a ‘memory’ of the author, title and cover combination so he/she doesn’t accidentally buy it again at the airport bookshop before that dreaded long and boring flight.

The reader of the reference book actually needs both the TOC and the index for browsing (the TOC) and searching (the index).

A work of fiction doesn’t usually have or need an index because the end-user doesn’t require it. A reference book usually has an index and it is often used to go direct to a page (or pages) and locate something very specific.

Drawing parallels with our broader topic, some information needs both a classification system and an index, some information needs just an index and some doesn’t require either (e.g., works of fiction).

Generally speaking, scientific collections require a classification system (a scientific taxonomy); for example, the study of plant species and the study of animal species (e.g., using a phylogenetic classification system). Scientists simply could not communicate with each other without having a detailed and exact classification system in place. But most end-users are not scientists; they are just people trying to find the best place to store something and want to find it again with the least amount of effort and pain.

My contention is that we can solve all ‘content management’ and records management needs with a solution based on the application of a sensible, simple and self-evident (read that as easy to use or human-oriented) indexing system plus the required searching capabilities (i.e., covering both Metadata and full text). There is a better way.

What indexing system?

Whenever I consult with customers who are contemplating the capture and organization of data (hopefully into information) I always give the same advice. That is, “When you are thinking about how to index data first think about how you will find it later.” Ask this key question of your end-users, “When you are about to search for information what do you usually know about it?” For example:

  • Do you know the last name?
  • Do you know the first name?
  • Do you know the date of birth?

A good indexing scheme reflects real life usage of the system; it reflects how ordinary humans work and ‘see’ information. Put simply, it indexes the information people will later need to search on. It indexes the information people understand and are comfortable with because it is self-evident.
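To make the idea concrete, here is a minimal in-memory sketch of an index built only on the fields users say they know at search time. The class name `FieldIndex` and the field names are hypothetical; a real system would sit on a database.

```python
from collections import defaultdict


class FieldIndex:
    """Index records on the fields end-users say they know when searching."""

    def __init__(self, fields):
        self.fields = tuple(fields)
        self._index = {f: defaultdict(set) for f in self.fields}
        self._records = {}

    def add(self, record_id, record):
        self._records[record_id] = record
        for f in self.fields:
            value = record.get(f)
            if value is not None:
                self._index[f][str(value).lower()].add(record_id)

    def find(self, **known):
        """Return the ids of records matching every field the user knows."""
        result = None
        for f, value in known.items():
            ids = self._index[f].get(str(value).lower(), set())
            result = ids if result is None else result & ids
        return sorted(result or [])
```

A user who knows only the last name searches with `find(last_name="Smith")`; one who also knows the first name narrows it with `find(last_name="Smith", first_name="Bob")`.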

Indexing Emails

An email is usually described as an unstructured document (the same way a Word or Excel document is described as being ‘unstructured’) but in fact it does have structure. Even better, everyone is familiar with an email’s structure so we have very little to teach end-users; that is, we have a simple and self-evident ‘natural’ set of Metadata items to index.

  1. Date of email
  2. Sender
  3. Recipient
  4. CC
  5. BCC
  6. Subject
  7. Text of the body of the email
  8. Text of any attachments

For any normal end-user trying to find an email this is how they would envision an appropriate search. They wouldn’t care that the email has been classified down to six levels of hierarchy using the world’s most sophisticated Business Classification Scheme (BCS).

Understanding what end-users typically ‘know’ before they do a search determines what elements you have to index. This is the key to implementing a successful indexing system.

The above 8 elements of an email are self-evident insomuch as, “Of course I need to be able to search on the sender or recipient or subject….”
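Most of these elements can be pulled straight out of a message with a standard library. The sketch below uses Python’s `email` package; the function name `index_email` and the dictionary keys are assumptions for illustration. Two caveats: BCC is normally absent from a received message, and extracting the text of attachments needs format-specific tools, so only attachment filenames are recorded here.

```python
from email import message_from_string
from email.policy import default
from email.utils import getaddresses, parsedate_to_datetime


def index_email(raw):
    """Extract the self-evident Metadata items from a raw RFC 5322 message."""
    msg = message_from_string(raw, policy=default)

    def addresses(header):
        values = [str(v) for v in msg.get_all(header, [])]
        return [addr for _, addr in getaddresses(values)]

    body = msg.get_body(preferencelist=("plain",))
    return {
        "date": parsedate_to_datetime(str(msg["Date"])) if msg["Date"] else None,
        "sender": addresses("From"),
        "recipients": addresses("To"),
        "cc": addresses("Cc"),
        "bcc": addresses("Bcc"),  # usually empty on received mail
        "subject": str(msg["Subject"] or ""),
        "body_text": body.get_content() if body else "",
        # Attachment text is not extracted in this sketch, only filenames.
        "attachment_names": [p.get_filename() for p in msg.iter_attachments()],
    }
```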

Indexing Electronic Documents

Now let’s look at ordinary electronic documents (i.e., not emails) because they are much less structured. We all know there are ways to add a common structure using features of MS Office like the information dialog box (asking for keywords etc) and templates and smart tags but these things are rarely and inconsistently used.

With shared drives we usually find some form of ‘evolved’ classification system because managing electronic documents in shared drives is akin to managing millions of pieces of paper in tens of thousands of file folders in hundreds of filing cabinets. Unfortunately, the good intentions and purity of design of the original architects of the shared drives folder/sub folder naming conventions (a classification system) are soon corrupted as users make uncoordinated changes and the structure soon becomes unwieldy and incomprehensible.

In my opinion shared drives are OK for the creation of documents (i.e., a work area) but not OK for the management of documents. In fact I would say shared drives are absolutely hopeless for the management of documents as history and practice will attest.

Once again we need an appropriate indexing system and once again we need to ask, “What do people know at the time of the search?” For example:

  1. Original filename
  2. Original path/filename
  3. Type/suffix – e.g., .DOC, .XLS, .PDF, etc
  4. Author
  5. Subject
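The first three items in the list above come for free from the original path. A minimal sketch, assuming Windows-style paths; author and subject cannot be derived from a path, so they are passed in. The function name `index_document` is hypothetical.

```python
from pathlib import PureWindowsPath


def index_document(original_path, author=None, subject=None):
    """Build an index entry from what is known about an electronic document."""
    p = PureWindowsPath(original_path)
    return {
        "original_filename": p.name,
        "original_path": str(p),
        "type": p.suffix.lstrip(".").upper(),  # e.g. DOC, XLS, PDF
        "author": author,    # supplied by the user or the capturing application
        "subject": subject,  # supplied by the user or the capturing application
    }
```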

Metadata and the Dublin Core

Let me quote from the Dublin Core website:

http://dublincore.org/

“The Dublin Core Metadata Element Set is a vocabulary of fifteen properties for use in resource description. The name "Dublin" is due to its origin at a 1995 invitational workshop in Dublin, Ohio; "core" because its elements are broad and generic, usable for describing a wide range of resources.”

To quote Wikipedia:

http://en.wikipedia.org/wiki/Dublin_Core

“It provides a simple and standardized set of conventions for describing things online in ways that make them easier to find. Dublin Core is widely used to describe digital materials such as video, sound, image, text, and composite media like web pages.”

The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 elements.

  1. Title
  2. Creator
  3. Subject
  4. Description
  5. Publisher
  6. Contributor
  7. Date
  8. Type
  9. Format
  10. Identifier
  11. Source
  12. Language
  13. Relation
  14. Coverage
  15. Rights
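As an illustration of how simple the element set is to work with, the sketch below writes a record’s description as XML using the Dublin Core element namespace. The wrapping `metadata` root element and the function name `dublin_core_xml` are assumptions made for this example, not something the standard prescribes.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_xml(values):
    """Serialize a dict of Dublin Core element values to an XML string."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")
```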

To my mind the Dublin Core is an excellent set of elements for describing almost any ‘record’ because it is both simple and appropriate to both computers and ‘normal’ end-users. As a professional, I like the elegance of the Dublin Core.

I also like the basic principle because it fits in with my hypothesis. That is, there is a better way to store, index and find records than a complex and unwieldy Taxonomy.

The Full Solution?

  • We need an application that stores documents of all types, i.e., all types of content.
  • We need an application that indexes both Metadata and full text.
  • We need an application with a customer configurable Metadata model.
  • We need an application that allows you to search on both Metadata and full text in a single search.
  • We need a search that combines Boolean and numeric operators, e.g., AND, OR, NOT, =, < and >.
  • We need a ‘standard’ Metadata definition (a Class if you will) that includes a simple (not more than 20 in my estimation) set of data elements that includes all of the elements necessary to index all of the types of documents (including file folders and paper) that you manage.
  • We need an application that includes all types of data capture, e.g., from the file system, from the native application, from a scanner, etc.
  • We need an application with a comprehensive security system.
  • We need an application with all reporting options, e.g., both standard reports and ad hoc reports.
  • We need an application with a configurable audit trail.
  • We need an application with comprehensive import and export capabilities.
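The ‘single search’ requirement above can be sketched in a few lines. This is a deliberately naive illustration, assuming each record is a dictionary with a `text` field plus Metadata fields; the function name `search` and its parameters are hypothetical, and a real product would use a proper full-text index rather than scanning every record.

```python
import operator
import re

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def search(records, all_words=(), any_words=(), not_words=(), where=()):
    """Combine full-text terms (AND / OR / NOT) with Metadata comparisons.

    where is a sequence of (field, operator, value) tuples, e.g. ("year", "<", 2014).
    """
    hits = []
    for rec in records:
        words = set(re.findall(r"\w+", rec.get("text", "").lower()))
        if not all(w.lower() in words for w in all_words):      # AND
            continue
        if any_words and not any(w.lower() in words for w in any_words):  # OR
            continue
        if any(w.lower() in words for w in not_words):           # NOT
            continue
        if all(f in rec and OPS[op](rec[f], v) for f, op, v in where):
            hits.append(rec)
    return hits
```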

 

The standard Metadata definition (Master Metadata Class)

I have come up with a limited set of elements that I believe can be used to index and find any type of record, paper or electronic. I have borrowed heavily from the Dublin Core because it makes good sense to do so; there is no need to reinvent the wheel.

| # | Element | Explanation |
|---|---|---|
| 1 | Title | A name given to the record. Typically, a Title will be a name by which the record is formally known. Text, e.g., "Business Plan for 2010" |
| 2 | Author(s) | The sender or author, e.g., Mark Twain or f.mckenna@k1corp.com |
| 3 | Dated | The original date of the document or published date |
| 4 | Date Received | Date received by the recipient or recipient's organization, whichever is the earlier |
| 5 | Original Name | e.g., filename or file\pathname for electronic documents - C:\franks stuff\sample.xls |
| 6 | Primary Identifier | An unambiguous reference to the record within a given context, e.g., the file number |
| 7 | Secondary Identifier | An unambiguous reference to the record within a given secondary context, e.g., the case number or contract number or employee number |
| 8 | Barcode | Barcode number or RFID tag |
| 9 | Subject | The topic of the record. Typically, the subject will be represented using keywords or key phrases. Recommended best practice is to use a controlled vocabulary. |
| 10 | Description | An account of the record. Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the record. |
| 11 | Content | Words or phrases from the text content of the main document and attached documents |
| 12 | Contents | Description of contents if the document is a container, e.g., an archive box |
| 13 | Recipient(s) | Addressed to, sent to etc. People or organizations. |
| 14 | CC recipient(s) | CC and BCC recipients |
| 15 | Publisher | An entity responsible for making the record available. Company or organization that either published the document or that employs the author |
| 16 | Type | The nature or genre of the record, usually from a controlled list, e.g., complaint, quotation, submission, application, etc. |
| 17 | Format | The file format, physical medium, or dimensions of the record, e.g., Word, Excel, PDF, etc |
| 18 | Language | e.g., English, French, Spanish |
| 19 | Retention | The retention code determining the record’s lifecycle |
| 20 | Security | Access rights, security code, etc |

My contention is that by using an ‘index set’ like the above 20 Metadata elements you can index, manage and retrieve any ‘record’ regardless of form and content.
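One way to picture the Master Metadata Class is as a single record type with twenty fields, most of them optional. The sketch below is an assumption about how it might look in code; the class and field names are illustrative, and `record_type` and `file_format` are used instead of Type and Format only to avoid clashing with Python built-ins.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional


@dataclass
class MasterMetadata:
    """One index entry for any record, paper or electronic."""
    title: str
    authors: List[str] = field(default_factory=list)
    dated: Optional[date] = None
    date_received: Optional[date] = None
    original_name: Optional[str] = None
    primary_identifier: Optional[str] = None
    secondary_identifier: Optional[str] = None
    barcode: Optional[str] = None
    subject: Optional[str] = None
    description: Optional[str] = None
    content: Optional[str] = None   # text of the document and attachments
    contents: Optional[str] = None  # what a container holds, e.g. an archive box
    recipients: List[str] = field(default_factory=list)
    cc_recipients: List[str] = field(default_factory=list)
    publisher: Optional[str] = None
    record_type: Optional[str] = None
    file_format: Optional[str] = None
    language: Optional[str] = None
    retention: Optional[str] = None
    security: Optional[str] = None
```

The same class describes an email (`MasterMetadata(title="Lease renewal", authors=["anne@example.com"], recipients=["bob@example.com"])`) and an archive box (`MasterMetadata(title="Box 42", barcode="0004211", contents="Invoices 2009")`); unused fields are simply left empty.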

What about all the standards ‘out there’?

There is a plethora of local, state, federal, industry and international standards pertaining to the management of records. Examples are DoD 5015, MoReq2, Dublin Core, ISO 15489 and VERS, and there are literally thousands of standards for Metadata.

The problem with most of these standards is that they are extraordinarily difficult to read and understand (even the Dublin Core documentation can be heavy going). I would draw a parallel back to the times when the Bible was in Latin but Christians were supposed to order their lives by its teachings. The problem being that only about 0.025% of Christians spoke Latin. Ergo, how do you order your life by a book you can’t read?

My assertion is that most records managers do not fully understand the standards they are charged with enforcing.

The problem isn’t with the records managers; it is with the people who write the standards. The standards are not written for records managers, they are written for academics and technical people (i.e., systems engineers who are experts in XML).  Just like the Latin Bible, they are not written in the language of the intended user.

And even when you do think you have a grasp of the fundamentals there are always multiple points to be clarified (as to the exact meaning) with the standards authority.

What about Retention/Disposal schedules?

This should probably be the subject of another paper because retention schedules have also become way too complex, unwieldy and difficult to understand and apply.

The question will be, “How can I do away with my classification system when my retention codes are linked to it?”

I have looked at hundreds of retention schedules and every single one has been way too complicated for the organization trying to use it. Another problem is that very few of the authorities that compile retention schedules do so with computers in mind. This means that we end up with lots of very vague conditional statements that are almost impossible to computerize.

Most retention schedules are written for archivists to read, not for computers to process. This is the heritage of retention schedules; they assumed an appraisal process by a trained and expert archivist.

The Continuum model or ‘Whole of Life’ model or File Plan model all assume we will allocate a retention code at the time the record is created, not during a later appraisal process. This made much more sense and allowed us to better manage the record throughout its life cycle. However, many such schemes also linked the retention code to a classification term or embedded the retention codes within the classification system. This of course made the classification system even more complex and difficult to understand and apply.

To my mind no organization needs more than ten retention codes (shortest period, longest period and eight in between) and three life cycles (e.g., active, inactive, destroyed). This is also probably heresy to a lot of the records management profession, but I would ask them to think about the proposition that something that was entirely appropriate to the manual world is not necessarily entirely appropriate to the computerized world. There is an easier and simpler way to manage retention and there is no need to embed retention codes into the classification system, just as there is no need for a classification system in any modern, computerized records management system.
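A retention scheme this small is trivial to computerize. The sketch below is illustrative only: the codes `R01`, `R07` and `R99` and their periods are invented, only three of the ten codes are shown, and ‘destroyed’ here means ‘due for destruction’. The retention code is a plain attribute of the record and is not tied to any classification term.

```python
from datetime import date
from enum import Enum


class LifeCycle(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    DESTROYED = "destroyed"


# Hypothetical codes: (years active, total years retained); None = keep permanently
RETENTION_CODES = {
    "R01": (1, 2),
    "R07": (2, 7),
    "R99": (5, None),
}


def life_cycle(code, dated, today):
    """Work out a record's life cycle stage from its retention code and date."""
    active_years, total_years = RETENTION_CODES[code]
    age_years = (today - dated).days / 365.25
    if age_years < active_years:
        return LifeCycle.ACTIVE
    if total_years is None or age_years < total_years:
        return LifeCycle.INACTIVE
    return LifeCycle.DESTROYED
```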

What about File Folders and Archive Boxes?

This is the classic stumbling block. This is when the records manager tells you that all the standards require you to use the same taxonomy for emails and electronic documents that he/she uses for traditional file folders and archive boxes.

You need to explain that the classification from the manual paper handling world is inappropriate to the computerized world, that it is an anachronism. You need to explain that all it will add is complexity, massive cost, confusion and a seriously negative attitude among end-users. You should say it is time to discard techniques and tools from the eighteenth century and adopt techniques from the twenty-first century. You should say you have a much better way. Then you should probably duck and run. Failing all else, blame me and give them my email address.

 

 
