Using Genealogy to Solve a 40-Year-Old Document Problem

2022-11-16

The following is a guest post written by Dean Lester, CEO of FoundFusion.

Since the invention of the first word processors 40 years ago, information workers have created and shared trillions of documents. Documents are the lifeblood for all of us, so why is it still so hard to find and make sense of them? We at FoundFusion are solving that problem by using techniques familiar to genealogists who trace your ancestry to create your lineage, only we create a family tree of your documents.

What is a document?

We define a document as a piece of written, printed, or electronic matter that provides information or evidence, or that serves as an official record.

Documents are one of the primary work products for the over 400 million information workers (sometimes referred to as knowledge or learning workers) in the world today. Documents include contracts, financial plans, white papers, blog posts, and even simple emails.

The value in these documents comes is from professional expertise, experience, and training and the conversations you have along the way with your colleagues, team members, and clients. Whether you are a lawyer, accountant, or real estate agent, people pay you for your services because you have a specialized knowledge they need, a knowledge that is often passed along in the form of some document.

How is a document created?

In the old days before computers, documents would be a physical paper that was handwritten or typed. But in the modern era (beginning in 1979 with WordStar, one of the first successful word processors for microcomputers) documents are now almost exclusively created, edited, and shared electronically.

Microsoft Office and Google Workspace (formerly G Suite) are currently the most popular tools for general document creation. The number of document types in daily use worldwide is enormous, including text, spreadsheets, presentations, technical drawings, and more (Adobe estimates there are 2.5 trillion PDFs in existence alone!). And, chances are, you’ve created and lost your fair share of that number.

How is a document reused?

Given that documents are such an essential component for almost half a billion professionals around the world, why is it still so hard to find and make sense of the various versions? IDC estimates each knowledge worker spends an average of 8 hours a week searching for and organizing documents (IDC’s Information Worker Survey, results explained here). Many of us are frustrated by just how much time we spend on these activities.

Why is it so hard to organize and find our documents? While documents can be created from scratch, more often they start from other existing work that we recycle. Document automation is often used. We know that sections of other documents can be reused to create our new documents: things like boilerplate, context setting, particular contractual clauses, and even complete templates. The bulk of these documents are standardized and the only elements changed might be names, addresses and dates. We copy and paste info from one document to another, or we simply create a copy of the entire document, with some new filename that we hope will remind us of the relationship to the parent document.

Some of those 8 hours are spent simply finding the one clause from that one document we remember creating and naming three weeks ago.

A document is a conversation

After a document is initially drafted, we often collaborate with others to get their feedback—a colleague whose opinion and input we respect, or a party on the other side of a negotiation who needs to change elements of an agreement.

A document then is also a conversation comprised of multiple iterations of a collection of documents that combine your expertise, conversations and disagreements you have had with clients, edits you have received from colleagues and more. Even for a single writer, multiple drafts of a document reflect the evolution of their understanding of that document.

All the versions of a single document can therefore represent points in time during the conversation of all the collaborators. As the conversation continues, the document evolves.

The document lifecycle

We call this conversation—create, review, edit, negotiate, edit, approve—the document life cycle.

Document lifecycle graphic

(Document life cycle as defined by FoundFusion)

These changes and edits that are part of the document life cycle are what lead to multiple versions of a single document, multiple versions that must be named, organized, managed, shared and later found all over again.

How do we share documents?

When we collaborate we must somehow share our document. Sharing a document can be challenging because there are so many ways to share a document today and so many people—both internal and external—that we share with.

We might share a document as an email attachment, or we might use a file-sharing system such as Dropbox or OneDrive. We might use a document management system, such as NetDocuments, an instant messaging platform such as Slack, or a basic network folder that is shared among users. And as crazy as it may sound, there are still organizations that use fax machines to share! Often, the method we use to share a document is determined by the person we want to share it with: who has permission to use which tool?

While knowledge workers like to think about their workflow as standardized on one tool, the truth is we are rarely 100% monogamous when it comes to our documents. Third parties share documents with us through their own “standard system”, which might not be the same as our “standard system.”

It is often easier just to use the lowest common denominator: email. We are willing to bet your inbox still has plenty of emails with paperclips even though you might also use a document management system.

Why is it so hard to find and organize our documents?

Think about how many possible document locations there are—from your own hard drive with hundreds of folders, to email (perhaps multiple accounts), Dropbox, OneDrive, SharePoint, Box, Slack, Teams, Google Drive, Zoom—in addition to any dedicated document management systems your company might use plus real-time collaboration systems, such as Google Workspace. 

Between the multitude of locations where documents can reside and the various names they have, you can see the genesis of the problem: There are too many places to search and too many filenames to search for, even if you have a document management system.

We often find ourselves having to repeat searches in different systems. After looking in our document management system we look in Outlook for the attachment. Then we might have to try OneDrive, and perhaps after that, we look in our file system to see if there was a copy on our desktop or folders. Even after finding a few matches, we might also have to wrack our brains to remember what we called the exact version we need (“Final” or “Version 4” or “edits from Dean”).

The invention of new sharing systems actually makes this harder because we have ever more places to search: Slack and Microsoft Teams provide tools that are helpful for teams, but they also create one more place to search for a critical document. The problem is getting worse, not better.

What do documents have to do with genealogy?

FoundFusion thinks of all these new document versions that are derived from some previous, original document as a family tree or a lineage.

The original version of a document is the parent. The edited version that was created from the parent is a child and the second generation of the parent’s lineage. The version that was produced from that child is a grandchild, the third generation, and so on.

And just like with human ancestry, when new children are created, the name is often changed to help differentiate them from the parent: “version 2,” “Keith’s edits,” “final,” and, of course, “final final.”

What if we approached the “locating a document” problem by leveraging genealogy.

When genealogists research your family to create your family tree or lineage, they might start with DNA matches or with your parents’ full names and places of birth. Using those clues, genealogists can show a relationship between two people.

DNA for humans; DNA for documents

DNA matches are a quick start for genealogists because they can’t be lost in a disaster like paper records, or confused by name changes due to adoption, marriage, or immigration.

Services like 23andMe and Ancestry.com take your DNA sample and compare it to other samples in their databases. These sites return suggested relationships based on the closeness of the match. A parent will have more DNA that matches yours than a fourth cousin.

What if we could devise a system that could analyze all our documents from the perspective of their similarity, just like using similar DNA to determine familial relationships? If we could recognize documents that were related to one another at a DNA-level (i.e., the document contents) then we would not have to worry about deciphering the document names. And neither would you.

23andMe knows who my parents are no matter if our family names changed. What if we could determine that Proposed Contract.docx and Document Lineage Contract—Steve’s edits.docx and Final Contract-Final.docx were all related, even though the filenames were very different

Even better, what if, just like 23andMe, we could bring all the related documents together in one place in a simple timeline to help you find the right version exactly when and where you need it?

Document DNA tells the story

Remember that idea that a document is a conversation? Often, we need to be able to easily assemble and organize all the versions of a document so that we can more easily see the whole of the conversation and how it evolved, especially when we need to resolve some problem with the document or simply to check we are working on the latest version. Think of it kind of like “document forensics”.

As an information worker myself, I ran into this problem repeatedly, but no one was addressing it effectively, even while I worked as a general manager in Microsoft’s Office engineering organization in Redmond.

The thinking I had all along, even before joining Microsoft, was if you could look at a (very!) long list of all your thousands of documents, you could open each one and probably identify the documents that were versions of one another by reading their contents. The only problem is this would take days or even weeks to do.

The solution would need to embrace the reality that even with diligence and organizational effort (and document management systems), our work lives can still be untidy with documents scattered across multiple workspaces and differently named versions orphaned from one another.

I imagined a theoretical document management system that applied this genealogical approach along with all the associated benefits: 

  1. It would unify all the document locations in use to provide an instant all-encompassing search — eliminate the need to repeat the same search multiple times in different places
  2. It would understand the history of any document without requiring any special tags or naming — no need to change how someone prefers to work today.
  3. It would present the lineage timeline for any document exactly when it was needed, including what was changed, when it was changed, and by who.
  4. It would be able to determine related documents even if the format had changed (Word docx exported to PDF, for example) since the words would still be related.
  5. It would automatically find previous occasions someone had used similar wording and suggest those clauses — why re-invent (or re-type) the wheel?
  6. It would reduce the time wasted filing documents in different folders — the system would unify the related documents in their families no matter where each individual version resides.
  7. It would make archiving and sharing easier and faster since it already understood the complete document family and could copy them as a unit in one step.
  8. It would notify someone if a specific clause had changed since the Document DNA would have been altered between different versions.
  9. It would reduce risk by recommending better versions of wording since it could offer pre-approved versions of clauses in use in a new document.
  10. It would help with Data Loss Management since it could recognize pieces of Document DNA.
Document lineage example

So rather than yet another search engine, what knowledge workers really need is a “results engine” that makes sense of the documents rather than just providing a long list of scattershot search hits and leaving them to work through them one at a time.

We all have ideas from time to time that we think could make things better, but we don’t often get the opportunity to make them real. Rather than just philosophize about how this approach could save time and frustration, the real test was to work on delivering it as a system people could use every day.

FoundFusion’s Document Lineage System compares your documents, determines their relationships, and creates a visual timeline of a document’s lineage to help you make sense of the document you are viewing. It's even able to understand and correctly find a related document whose format has changed—such as a Word document exported as a PDF, or a PDF that contains scanned text as a picture (a signature page for example).

This process is all totally automatic, quietly working in the background as new documents arrive, be it a new email with an attachment (even if you haven’t opened the email yet!), a new document shared in a Dropbox, a new Word document you just saved to your desktop.

With your documents easily identifiable and accessible, your time will be well protected and you'll have one less thing to worry about. Try the system for free and get those 8 hours back each week.

Tags:
Business Education
Written by Dean Lester

Dean Lester is CEO and a co-founder of FoundFusion. He previously spent 15 years at Microsoft as a General Manager for Microsoft Office and Microsoft Windows.

Take the faster path to growth.
Get Smith.ai today.

Affordable plans for every budget.

Take the faster path to growth.
Get Smith.ai today.

Affordable plans for every budget.