Data Rivers: Looking for gold in an ocean of silver

How your compliance program can be optimized with data science

Editor’s Note: Fresh Perspectives is an exclusive series of The Compliance Report that features expertise from across Convercent. Each Friday we will feature a different Convercent expert, capturing their opinion and unique voice.

When data comes up in conversations about compliance, many people go into panic mode. Experts in the field, such as Convercent’s VP of Innovation & Research Greg Ebert, are constantly working to take the massive amounts of data living and breathing in the world and turn them into something compliance executives can use with confidence.

Think of a data river the way you think of the Internet: an intangible place that makes searching for topics much easier than, say, heading to the library. The Internet categorizes topics and resources by keyword, using algorithms and network connections to bring you only the information you’re seeking. It simplifies the process of going to the library and searching the card catalog for a resource. More often than not, the resource you find there isn’t exactly what you need, and you have to start the process over. Finding the resources takes more time than analyzing the information inside them and putting it to use.

Data lakes and rivers automate a laborious, time-consuming process. This frees up time in the day to be productive and to give attention to your compliance program, just as the Internet does when you conduct in-depth research.

In short, data rivers improve the search for the right kinds of data for your compliance program the way the Internet improved on the library’s card catalog system.


SOURCE: PwC, Data lakes and the promise of unsiloed data, 2014.

You may have already heard of data lakes and the value and diversity of data they can store. In case you’re just catching up on the topic or hearing about it for the first time, a data lake is very different from a conventional data warehouse. A warehouse requires data to fit a predefined schema before it is loaded. A data lake, by contrast, stores data in its raw, native format and applies structure only when the data is read. Organizations around the world increasingly depend on large volumes of data, which is the application of big data. Data lakes remove the manual work of accessing and integrating all of that incoming information. What is not typically discussed is the enormous amount of unstructured text in that data, along with its contextual relevance to the subject at hand.
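
The lake/warehouse contrast is often summarized as "schema-on-read" versus "schema-on-write." The following minimal Python sketch illustrates it. Raw records of any shape are stored exactly as they arrive, and a structure is applied only at query time. The record fields (`source`, `title`, `text`, `feed`, `headline`) and the `read_with_schema` helper are hypothetical illustrations. They are not part of any product.

```python
import json
from typing import Any

# A toy "lake": hypothetical raw JSON strings stored exactly as they arrived.
# No schema is enforced at write time, so records can differ in shape.
lake: list[str] = [
    '{"source": "news", "title": "Regulator issues guidance", "text": "New rules on data transfers."}',
    '{"source": "blog", "text": "Thoughts on whistleblower programs."}',
    '{"feed": "rss", "headline": "Sanctions update"}',
]

def read_with_schema(raw_records: list[str], fields: list[str]) -> list[dict[str, Any]]:
    """Apply a schema at read time: keep only the requested fields,
    filling in None wherever a raw record lacks one of them."""
    rows = []
    for raw in raw_records:
        record = json.loads(raw)
        rows.append({field: record.get(field) for field in fields})
    return rows

# Different "views" can be read from the same untouched raw data.
for row in read_with_schema(lake, ["source", "text"]):
    print(row)
```

A schema-on-write warehouse would instead reject or transform the third record at load time, because it does not match a predefined table.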

Data repositories like this have great value when you know how to get data into them, understand them properly and use them correctly. Here’s how to do all of that and start using data rivers in your compliance program.

If you have ever tried to pan for gold in a body of water, you know that there are a lot of useless bits and pieces of matter to sift through before you find that one shiny piece of gold.

Sifting through volumes of text is a similar process. Much of it doesn’t matter, but the small percentage that does can have a great impact.

As Convercent’s head of innovation and research, I find myself nose-deep in interesting findings centered on panning the information flows for those shiny nuggets of data. In this space, we handle terabytes of data on a daily basis for clients. That leads me to ask: how can I apply what I do to what a compliance professional does every day? How can this data help them?

Think of the concept metaphorically. A data river represents the constant flow of time-sensitive, temporally relevant data and information in the open digital universe. These concurrently flowing “rivers” can be tapped to extract freely flowing information. The information moves as content that is actively created and syndicated at an extraordinarily fast pace. This content is timely and newsworthy, and it often takes the form of current events.

For example, think of the constant flow of information coming out of news organizations like the AP, Bloomberg or CNN, 24 hours a day, seven days a week, 365 days a year. Or consider blogs and newsletters that provide new information on compliance and regulatory changes, some of it relevant only in certain jurisdictions. One example is the Wall Street Journal’s “Compliance & Risk Report,” which is delivered to your inbox each morning.

This information changes constantly and travels at high velocity, which greatly complicates measuring compliance risk. A risk program that can’t account for this flow cannot adapt to environmental influencers. Consider the various types of information flowing through these rivers and the direct or indirect influence they have on your company’s day-to-day operations. With that in mind, you may take a deeper look at what you sift out of them.

In each case, subject matter and influencers are interwoven into unstructured content, such as:

  • Laws and Regulations
  • Geopolitical Environments
  • Cultural and Socioeconomic Climates
  • Broad and Specific Economic Conditions
  • Industry-Specific Terminology

In the larger scheme of operational visibility, a great deal of valuable information that can inform decisions moves within those terabytes of content. That’s a lot of data for one person, or even a team, to keep up with.

The information flowing in these rivers is created, curated and syndicated by various organizations and individuals. It can be rich in industry detail or more mainstream. To make things more interesting, it can also be biased toward an industry, profession or demographic. Multiple sets of content from RSS feeds and blogs can speak to the same subject in very different ways. The collective intelligence of those conversations may hold more or less value depending on the person receiving it.
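
Before any sifting can happen, content from different feeds has to land in one common shape. The minimal sketch below uses Python's standard `xml.etree.ElementTree` to flatten RSS 2.0 feeds into uniform records. The feed contents, the example.com/example.org URLs and the `parse_feed` helper are hypothetical. A production reader would also need to handle Atom feeds, XML namespaces and network fetching, none of which are covered here.

```python
import xml.etree.ElementTree as ET

# Two hypothetical RSS 2.0 feeds covering the same subject from different angles.
FEED_A = """<rss version="2.0"><channel><title>Legal Wire</title>
<item><title>Court reviews Safe Harbor</title><link>https://example.com/a1</link>
<description>Judges weigh EU-US data transfers.</description></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Privacy Blog</title>
<item><title>What Safe Harbor means for HR data</title><link>https://example.org/b1</link>
<description>A practitioner's view.</description></item>
<item><title>Weekly roundup</title><link>https://example.org/b2</link></item>
</channel></rss>"""

def parse_feed(xml_text: str) -> list[dict[str, str]]:
    """Flatten one RSS 2.0 document into uniform item records."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    # The channel's own <title> names the source of every item in it.
    source = channel.findtext("title", default="unknown")
    items = []
    for item in channel.findall("item"):
        items.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # <description> is optional on RSS 2.0 items, so default to "".
            "description": item.findtext("description", default=""),
        })
    return items

# Merge several feeds into one stream of same-shaped records.
stream = parse_feed(FEED_A) + parse_feed(FEED_B)
for entry in stream:
    print(entry["source"], "|", entry["title"])
```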

So how can you find those valuable pieces of information in this constant, ever-changing flow?

Unstructured information is organic by the very nature of how it is created. One subtle but important aspect of organic information is that it is not uniform. Its content may be inconsistent, and it may reflect a narrow perspective shaped by the author’s style of writing.

Many streams of information may be on the same subject matter, but come from different sources. The subject matter is the common denominator.

When you search for information, you want to get value from it without spending all day reading. Ultimately, you want to apply that information directly and connect it to matters of interest, which in turn makes it simpler to overlay the information onto your policies or risk management programs.

In other words, you capture what you care about from the data rivers and then use it for analysis in real time.

If only we were computers that could do this one task all day, but we are not. Modern software, however, allows a platform to process high volumes of content at high speed and without rest. You don’t need to read through the minutiae and hope to find the information you’re looking for in the fifth paragraph of a syndicated blog post.

Software algorithms can look for the compliance keywords, ideas or topics you care about and find the subject matter that is important and most impactful to your compliance program.
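
At its simplest, that search can be sketched as keyword matching over a stream of text. The minimal Python example below assumes a hand-built watch list; the topics and keywords in `WATCH_LIST` are hypothetical. It uses case-insensitive regular expressions with word boundaries. More sophisticated systems may go further, for example by recognizing entities or modeling topics, but this sketch does not attempt that.

```python
import re
from collections.abc import Iterable, Iterator

# Hypothetical watch list: topic -> keywords a compliance team cares about.
WATCH_LIST = {
    "data privacy": ["personal data", "data transfer", "privacy"],
    "anti-corruption": ["bribery", "fcpa", "kickback"],
    "sanctions": ["sanctions", "export control"],
}

# Pre-compile one case-insensitive, word-bounded pattern per keyword.
PATTERNS = {
    topic: [re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE) for kw in kws]
    for topic, kws in WATCH_LIST.items()
}

def score(text: str) -> dict[str, int]:
    """Count keyword hits per topic; topics with zero hits are omitted."""
    hits = {}
    for topic, patterns in PATTERNS.items():
        count = sum(len(p.findall(text)) for p in patterns)
        if count:
            hits[topic] = count
    return hits

def sift(stream: Iterable[str], min_hits: int = 1) -> Iterator[tuple[str, dict[str, int]]]:
    """Lazily yield only items with at least min_hits total keyword matches,
    so an unbounded stream can be filtered one item at a time."""
    for text in stream:
        hits = score(text)
        if sum(hits.values()) >= min_hits:
            yield text, hits

# Usage with a few hypothetical headlines.
incoming = [
    "Regulators question personal data transfer rules after court ruling.",
    "Local team wins championship.",
    "Company settles FCPA bribery probe.",
]
for text, hits in sift(incoming):
    print(hits, "<-", text)
```

Because `sift` is a generator, it processes each item as it arrives instead of waiting for the whole stream. That is what lets this kind of filter run continuously against a feed that never ends.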

A real-world example of this idea is the EU-US Privacy Shield, the new framework for transatlantic data flows that has recently affected data transfers between the United States and the European Union. (Read the recent announcement made by the European Commission to learn more.)

The ripples of that potential impact started before the Court of Justice of the European Union (CJEU) declared the EU Safe Harbor agreement invalid in the fall of last year. The judgment directly affected some 4,000 companies that relied on the agreement to transfer data between the EU and the US. Consequently, the laws and regulations created in the wake of the ruling have both strategic and tactical impact on the way the river flows and on how organizations mine data.

News sources, the blogosphere and corporate legal channels began syndicating the first regulatory ripples long before the CJEU invalidated the EU Safe Harbor agreement. Each of those sources shared a common subject, or common denominator: the EU Safe Harbor agreement, even when the jargon surrounding it appeared in very different contexts.
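
One way to find that common denominator when sources use different jargon is to map each variant phrase to a single canonical subject. The minimal Python sketch below does this with a hand-maintained alias table, which is the simplest possible approach. The alias list and the source snippets are hypothetical.

```python
import re

# Hypothetical alias table: different jargon that points at the same subject.
ALIASES = {
    "EU Safe Harbor": ["safe harbor", "safe harbour", "schrems", "transatlantic data transfers"],
}

def canonical_subjects(text: str) -> set[str]:
    """Return the canonical subjects whose aliases appear as whole phrases in text."""
    lowered = text.lower()
    found = set()
    for subject, aliases in ALIASES.items():
        if any(re.search(r"\b" + re.escape(a) + r"\b", lowered) for a in aliases):
            found.add(subject)
    return found

# Hypothetical snippets from different channels.
snippets = {
    "news": "Court hears Schrems challenge to data flows",
    "blog": "Is Safe Harbour on borrowed time?",
    "legal": "Client alert: transatlantic data transfers under review",
    "sports": "Harbor City wins regatta",  # shares a word, but not the subject
}

# Group sources under the common subject they share.
by_subject: dict[str, list[str]] = {}
for source, text in snippets.items():
    for subject in canonical_subjects(text):
        by_subject.setdefault(subject, []).append(source)

print(by_subject)
```

The three relevant sources end up grouped under one subject even though none of them uses the same wording. Matching on whole phrases keeps the unrelated "Harbor" headline out of the group.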

If you’re running a compliance program, the relevant content in that early, unstructured information flow can help you identify data privacy risks and raise your estimate of their likelihood in your organization. The best part: this can happen while you are traveling, prepping for a board meeting or focusing on other aspects of your compliance program.
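
A simple way to illustrate raising a risk estimate from the information flow is to watch how often a subject is mentioned and flag days when coverage jumps well above its recent average. The minimal Python sketch below does this; the function name, the 7-day window, the 2x factor and the counts are all hypothetical choices. A spike in mentions is a signal worth a person's attention, not a measured probability.

```python
from statistics import mean

def flag_spikes(daily_counts: list[int], window: int = 7, factor: float = 2.0) -> list[int]:
    """Return indices of days whose mention count is at least `factor` times
    the average of the preceding `window` days (a crude "rising risk" signal)."""
    flagged = []
    for day in range(window, len(daily_counts)):
        baseline = mean(daily_counts[day - window:day])
        # max(..., 1) avoids flagging tiny counts when the baseline is near zero.
        if daily_counts[day] >= factor * max(baseline, 1):
            flagged.append(day)
    return flagged

# Hypothetical daily counts of items mentioning one subject.
mentions = [2, 1, 3, 2, 2, 1, 3, 2, 9, 14, 3]
print(flag_spikes(mentions))  # -> [8, 9]
```

Days 8 and 9 are flagged because 9 and 14 mentions are more than double the trailing averages of 2 and about 3.1. Day 10 is not flagged because the spike itself has raised the baseline. A fixed trailing window keeps the logic easy to audit, at the cost of reacting slowly to gradual trends.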

We no longer have to sit around and fantasize about the day we can ask a computer to find new information on demand and connect it to our compliance programs. Today is the day to get your feet wet and your swimming arms strong: the data river is the next best thing to happen to your compliance program.