All posts by defdel

Index Engines Announces Program to Jumpstart Readiness for the GDPR

New Three-Month Program Helps Organizations Classify Unstructured Data in Support of Governance and Privacy Compliance Initiatives

HOLMDEL, NJ – Information management software company Index Engines announced the Governance Readiness Starter Bundle Wednesday, a rapidly deployable three-month software and services solution that helps clients get started classifying unstructured user data in support of industry regulations and compliance requirements, including the upcoming GDPR.

The Governance Readiness Starter Bundle includes metadata classification of unstructured files, organizing data by value and tagging redundant, obsolete and trivial (ROT) content that no longer has business value. Following this three-month jumpstart, additional software and services can be deployed to provide more comprehensive solutions, including data migration and minimization as well as full-content search and archiving.

“This bundle is designed to help organizations understand their data assets, so they can clean and classify them to prepare for the GDPR and other governance initiatives,” Index Engines Vice President Jim McGann said. “With the GDPR just months away and many organizations unsure of where to start, this program is that starting point and can easily be expanded to satisfy the challenges faced with respect to unstructured data management.”

Index Engines’ engineers remotely manage the project and work with the organization throughout the 90-day process.

This three-month term license provides all the software and services needed to get started and manage a 50TB ($18,150), 100TB ($27,200) or 250TB ($49,300) project within the data center. Following this term, clients can transition to additional capacity and licensing terms.

This program is divided into six easy-to-implement steps:

  • Project Prep/Kickoff: Plan engagement with customer project team
  • Install/Training: Install, validate and train on Index Engines software
  • Indexing Setup: Configure indexing jobs and schedules
  • Index Monitoring: Monitor and tune Index Engines processing jobs
  • Report/Consultation: Develop data classification reports and review with client
  • Project Wrap-up: Review project and develop a plan for next steps

“When we talk to organizations that want to start a governance initiative, they never know where to start. The Governance Readiness Starter Bundle empowers organizations to look at high-risk file servers, like the finance server or a file share server, understand what data exists and start the cleanup process,” McGann said. “Our software, along with the bundled professional services, makes compliance possible at a reasonable price without taxing company personnel.”

Learn more about Index Engines’ Governance Readiness solutions by email, on our website or by registering for our upcoming webinar.

Deletion has to be Defensible, even for the IRS

The painful lesson learned when ignoring backup tapes as part of your defensible deletion and data governance policies


By Jim McGann

Lois Lerner’s emails are gone. We know this, but beyond a server issue or hard drive crash, the backup tapes that archived the complete, untampered-with records of those emails were destroyed.

Now, it could cost IRS Commissioner John Koskinen his job. Eighteen US Congressmen are seeking to impeach Koskinen on the grounds of his “failure to check Lerner’s cell phone and backup tapes that contained missing emails related to the scandal.”

According to a Wall Street Journal article, Koskinen stands accused on a few points, all of which could have been avoided with a proper data governance policy and documentation of that policy.

  1. “In February 2014 Congress instructed Koskinen to supply all emails related to Lerner… A few weeks after the subpoena, IRS employees in West Virginia erased 422 backup tapes, destroying up to 24,000 Lerner emails.”

Tapes need to be incorporated into governance policies. Had these tapes been part of a defensible deletion or information governance policy, they likely would have been managed properly and treated as records or defensibly deleted as a part of the normal IT process.

  2. “The second charge cites “a pattern of deception” and three “materially false” statements Koskinen has made to Congress, under oath, including his assurances that no Lerner emails had been lost. In fact Lerner’s hard drive had crashed and employees erased tapes.”

After disaster recovery, tapes can become a de facto archive. Once a tape is no longer useful for disaster recovery, it is nothing more than a snapshot of data. Despite any legal claim stating otherwise, it serves no purpose other than as a de facto archive and should be treated as such. Financial burden and inaccessibility arguments are also becoming null and void.

  3. “A final charge accuses Koskinen of incompetence, noting how despite his insistence that his agency had gone to “great lengths” to retrieve lost Lerner emails, the IRS failed to search disaster backup tapes, a Lerner BlackBerry and laptop, the email server and its backup tapes. When the Treasury Inspector General did his own search, he found 1,000 new Lerner emails in 14 days.”

Data – email included – never dies (easily). When creating policy, it’s important to understand where the data goes: desktop, secondary hard drive, server, backup tapes, disk, archive. By understanding this and creating (and auditing) policy restricting portable devices, PSTs and other places data can go, an organization can more effectively create an enforceable policy and manage risk and liability.

Data, including what is archived on backup tapes, must be properly audited and managed. When data is deleted without an understanding of why, how and when, problems inherently arise, especially if this data is at the heart of high profile litigation. All data – especially data on backup tapes – should have a governance policy surrounding it to make it defensible and avoid the pitfalls of the IRS.

Exercise defensible deletion with Index Engines’ Catalyst software

The hardest part of creating and enforcing defensible deletion policies around data retention within your data center is understanding what data exists and where. PSTs, former employee files and duplicate data get lost over time and become nearly impossible to gauge.

Data profiling creates the map so you can construct information management, defensible deletion or other retention policies. Built-in capabilities within the Catalyst engine make the process simple, defensible and automated.

When organizations use data profiling to determine disposition, significant actions can be taken that result in cost and resource savings. Many organizations find that as much as 22 percent of their data is abandoned, 14 percent is aged with no access in more than three years, 24 percent is duplicate content, 6 percent is personal multimedia (iTunes, video, vacation pictures, etc.) with no business value, and 18 percent is risk-based data subject to legal and compliance requirements.
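To put those survey percentages in context, a quick back-of-the-envelope calculation (illustrative figures only, applied to a hypothetical share size) shows how much capacity each category would consume:

```python
# Illustrative only: apply the survey percentages quoted above
# to a hypothetical file share of a given capacity.
CATEGORY_PCT = {
    "abandoned": 22,
    "aged (no access in 3+ years)": 14,
    "duplicate": 24,
    "personal multimedia": 6,
    "legal/compliance risk": 18,
}

def breakdown(capacity_tb):
    """Return TB per category, plus what falls outside the named buckets."""
    per_category = {k: capacity_tb * pct / 100 for k, pct in CATEGORY_PCT.items()}
    # The named buckets cover 84% of the share; the remainder is "other".
    per_category["other"] = capacity_tb - sum(per_category.values())
    return per_category
```

On a 100TB share, for example, `breakdown(100)` attributes 84TB to the named buckets (24TB of it duplicate content alone), leaving 16TB unclassified.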

Using Catalyst’s data profiling module, a comprehensive understanding of what exists on your network becomes available. Working with business users, a plan can be defined to determine the disposition of the content.

Built-in disposition capabilities include:

  • Deletion with validation – Manage the defensible deletion of unstructured data, using validation to ensure the content has not changed since it was profiled. Validation checks the modified date or, optionally, the signature of the document.
  • Copy – Migrate or tier data to any network share. Data is copied and stored in a less expensive or more appropriate location.
  • Move – Copying data to a new location, validating that it was migrated correctly, then deleting it from the original location results in a data move. This process ensures that content is migrated accurately and reliably.
  • Defensible audit logs – As disposition of the data is performed, including deletion, logs are maintained that detail the date and disposition of the document, including the user who executed the disposition.
  • Output listings – Full path and filename listings are available, allowing the use of third-party tools and utilities to manage disposition, including options to encrypt data.
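The copy-validate-delete sequence behind a defensible move can be sketched as below. This is a minimal illustration of the general technique, not Catalyst’s actual implementation; the function names and the choice of SHA-256 as the content signature are assumptions:

```python
import hashlib
import logging
import os
import shutil

logging.basicConfig(level=logging.INFO)

def file_signature(path, chunk_size=1 << 20):
    """Compute a content signature so the copy can be validated."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def move_with_validation(src, dst):
    """Copy src to dst, validate the copy, then delete the original.

    Mirrors the copy -> validate -> delete sequence described above;
    an audit log line records what was moved and its signature.
    """
    before = file_signature(src)
    shutil.copy2(src, dst)              # copy, preserving metadata
    if file_signature(dst) != before:   # validate the migration
        os.remove(dst)
        raise RuntimeError(f"validation failed for {src}")
    os.remove(src)                      # delete original: net effect is a move
    logging.info("moved %s -> %s (sha256=%s)", src, dst, before)
```

Validating against a content hash rather than just a modified date is the stricter of the two checks the text mentions: it proves byte-for-byte that the migrated copy matches before the original is deleted.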

Defensible deletion of backup tapes in the cloud

Backup tapes hold a snapshot of your data center in case of emergency, but they can also be the key to gaining a better understanding of your data environment. The latest tape set holds the key to where duplicate records, abandoned data, PSTs, personally identifiable information and more are hiding on your server.

Through the Index Engines cloud lab (or one of our technology providers) organizations can quickly and easily take the latest backup of the storage environment, index the backups offline and create a data profile of the server(s). This could be the first step in a defensible deletion plan.

Once the indexing is performed, a set of reports is generated to provide a profile of the servers, and a secure hosted session can be established so the customer can produce custom reports and perform more in-depth analysis:
• Determine what data can be cleaned up to reclaim capacity (e.g. duplicates, by access time, non-business data like iTunes, movies…)
• Audit existing IT, Security or Information Governance policies (e.g. PII data, personal PST files…)
• Manage departmental charge backs, and more.

Through this, organizations gain an efficient, low-cost, non-intrusive look into their servers and can see where waste has accumulated, where security risks could be hiding, what data should be on legal hold and what data can be moved to secondary/cloud storage.

Profiling your backup tapes in the cloud allows organizations to:
• Eliminate on-site visits through the use of backup tapes,
• Get a metadata or full-text view into server data,
• Streamline access to their data, and
• Produce full custom reports to understand their data.


Why defensible deletion matters: PII still found in Enron Data Set


Enron’s republished PST data set still contains numerous personally identifiable information violations despite Nuix’s ‘efforts,’ Index Engines finds

The Enron PST data set has been a point of controversy for the legal community, and the latest self-touting of this data set being cleansed by information management company Nuix has rekindled the discussion – why facilitate and publish a data breach?

The Nuix-cleansed and republished data set is still littered with Social Security numbers, legal documents and other information that should not be made public, as a simple review by Index Engines found.

Index Engines indexed the cleansed data set through its Catalyst Unstructured Data Profiling Engine and ran one general PII search, which looks for number patterns and different variations of the words “social security number.”

After a cursory review of the responsive hits, it was easy to find many violations. Understanding that some could be false positives, a review of the first 100 records found dozens of confirmed data breaches. These breaches were buried deep in email attachments, sent folders and Outlook notes.
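A search of the kind described, matching SSN-style number patterns plus phrase variations, can be approximated with a regular expression scan. This is a generic sketch, not Index Engines’ actual search logic, and as the text notes, every hit is only a candidate until a reviewer confirms it:

```python
import re

# Nine-digit runs formatted like SSNs (e.g. 123-45-6789 or 123 45 6789).
SSN_LIKE = re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")

# Variations of the phrase "social security number", or the abbreviation SSN.
PHRASE = re.compile(r"\bsocial\s+security\s+number\b|\bssn\b", re.IGNORECASE)

def find_pii_hits(text):
    """Return candidate PII hits found in a block of text.

    Results are candidates only: a pattern match such as 555-12-3456
    may be a false positive, so human review is still required.
    """
    return {
        "ssn_like": SSN_LIKE.findall(text),
        "phrase_mentions": PHRASE.findall(text),
    }
```

For example, `find_pii_hits("Per her social security number 123-45-6789")` flags both the number pattern and the phrase mention.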

Examples of the missed breaches are below – but we took the liberty of blacking out PII. You don’t serve dinner on partially cleaned plates because people can get sick. You don’t release a partially cleaned data set because people’s identity can be stolen.

The most troubling part of the PII Index Engines still found is the risk of identity theft these people now face from having their information published. With their name, former employer and Social Security number already exposed, a quick search of social media can reveal their marital status, town, college, friends and current employer, making them easy targets for identity theft. If I were one of those people, I’d call a lawyer.

Then there’s the troubling legal question: even when you think your data is clean, is it? In this case it wasn’t, and that should make companies, law firms and service providers question the tools they use for eDiscovery and litigation readiness.

In case you missed it, according to Nuix’s press release, they, along with EDRM, took the well-known Amazon Web Services Public Data Set and used a series of investigative workflows to uncover and remove PII. The findings returned 60 credit card numbers, 572 Social Security or other national identity numbers and 292 birth dates, the release said; the uncovered items were then removed and a cleansed data set was republished.

It’s truly a scary thought when technology is supposed to do a job and can’t.

Leveraging Data Profiling For Achievable Projects

More than ever before, organizations want to know what kinds of information are stored within their IT infrastructure. Why? Because this information bogs down critical production systems like email and collaborative document management, costs money to store, and presents massive risk if not managed correctly. Few organizations truly understand the makeup of their digital landfills, but that is soon to change. According to a recent eDiscoveryJournal survey, more than 50% of organizations plan shared drive migration and clean-up projects – more than any other named information governance project.

These projects aim to defensibly delete unnecessary, outdated, or duplicative information while keeping valuable knowledge and content that is on Legal Hold. This is not just a nice corporate “housekeeping” idea; it is now a necessity due to the high growth rates forecast by business analysts. McKinsey Global Institute, for example, projects data to grow at 40% per year, making it virtually impossible to effectively and economically store and manage organizational information without some form of culling.

In order to do this, organizations need to efficiently profile data. This insight into information can help get past analysis paralysis and clear digital landfills. In this webinar, eDJ Analyst Greg Buckles and XYZ will examine practical approaches to data profiling and how to set organizational goals. We will examine approaches to information governance, such as managing data in place, dealing with Legal Holds, selecting targets for profiling, and information classification. We will examine case studies focused on PST audits and profiling for disposition. Finally, this webinar will offer pragmatic advice on how to use data profiling to achieve immediate results today while building out a larger information governance strategy and plan.

Space is limited. Register now.


7 Signs a Data Breach Could Be Looming

Data breaches have made the headlines much too often lately and left many IT, legal and compliance departments to wonder how they would react to a breach.

But instead of reacting, you can proactively assess your risk of a data breach and work to solve any vulnerable areas during a self audit. Look to see if any of these red flags live in your data environment.

  1. Mystery data. Do you know the type of data located on every server, every backup tape and even in hidden email files such as PSTs? Different custodians within the organization create and maintain different types of data at different levels of sensitivity. Not knowing who created what, and where it resides, leaves the door open for files to get lost and fall into the wrong hands.
  2. Poor archiving. Do you practice value-based archiving or an archive-everything strategy? The latter leaves your important, sensitive data lost among a network of junk. Data gets lost and forgotten about until it is misplaced.
  3. Duplicates. How do you manage your duplicate data and do you know where your duplicates are? It doesn’t make much sense to protect one document when hundreds of copies of it exist in the enterprise. Understand and manage duplicate data.
  4. Personally Identifiable Information. Does your sales or service team routinely handle credit cards, Social Security numbers or other PII? Could any of that information have been sent over email by someone who does not understand the risks? Audit your system for PII.
  5. Un-interpretable data. Data that belonged to an ex-employee and was created a number of years ago likely has little business value, but it is a compliance risk: it can no longer be properly interpreted in its original context. Jokes can be crimes. Misunderstandings can become lawsuits. How much turnover does your business have?
  6. PSTs. These sensitive little email files don’t live with the rest of the emails, often creating copies or mini archives that go unmanaged. Where do they live, who owns them and when were they last accessed?
  7. Executive data. How the former CEO’s email is handled and how last summer’s intern’s email is handled should be dramatically different. Are they held in an archive under retention policies with set expiration dates, or still on the computers they used?

You likely recognized at least one flag that exists in your data center, and if you found four or five, you’re with the majority of large companies. There’s help out there. Email us for more information or visit our website.

Defensible deletion reason #372: Unmanaged, unstructured emails are a fire waiting to start

Over time, email piles up on massive servers, in archives and even on users’ desktops, and it becomes like a matchbook underneath a child’s bed. Alone, it poses no threat and just sits there, waiting. It can go years, even a lifetime, without ever causing a problem.

While no one would leave a matchbook underneath a child’s bed, as it’s completely unfathomable, few think twice about their email servers.

But why such a visceral reaction to leaving a matchbook in a kid’s room? The matches are not going to burst into flames, they won’t just spark old comic books and baseball cards, and matches are not the easiest thing to strike – even as an adult. We take precautions because of what could happen if those matches got into the wrong little hands.

So why do we hoard email on servers, desktops and even legacy backup tapes when there are harmful matches among them? Within the millions of emails are Social Security numbers, contracts, legal documents, regulatory compliance papers and emails that can no longer be properly interpreted. Like the matchbook, this dark data just sits there. It doesn’t expose itself, it doesn’t jump through firewalls and it isn’t going to send itself.

Yet all it takes is one set of wrong hands and a fire can quickly develop. Thieves search for personally identifiable information, and its theft can cause loss of customers, FTC involvement and identity theft. Legal and regulatory documents can’t be found or end up in the wrong hands, causing fines and penalties. Plus, don’t forget all the money needed to repair and upgrade firewalls and pay the legal fees associated with breaches.

Just like a parent sets the rules, compliance, legal, IT, records managers or another guardian needs to set policies surrounding emails. Retention policies, containing both archiving and deletion policies, should be in place to govern data. One leading analyst group recently estimated that less than one percent of companies actively have and enforce an information governance policy.

Much of this goes back to the tools – how do you set policy around data when you don’t know what exists or where? It’s nearly impossible to understand unstructured data and uncover all those pesky, hidden PST files. But now the technology exists in the form of unstructured data profiling.

Data profiling, sometimes called file analysis, is a process where all forms of unstructured files and email are analyzed and the user is provided a searchable ‘map’ and comprehensive summary reports of the metadata, including what types of information exist, where the information is located, who owns it, whether it’s redundant, and when it was last accessed.

Optionally, data profiling can look beyond metadata and go deep within documents and email for content, supporting eDiscovery keyword searches or even personally identifiable information (PII) audits for sensitive content such as Social Security or credit card numbers.
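A toy version of the metadata ‘map’ described above can be sketched as follows: walk a share, record each file’s location, size, age and last access, and flag redundancy via content hashes. This is a generic illustration of the technique, not Index Engines’ product, and the function name is an assumption:

```python
import hashlib
import os
import time
from collections import defaultdict

def profile_share(root):
    """Walk a file share and build a simple metadata 'map'.

    For each file, record location, size, last-access time, age, and a
    content hash so redundant (duplicate) files can be flagged -- a toy
    version of the metadata profiling described in the text.
    """
    by_hash = defaultdict(list)
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            by_hash[digest].append(path)
            records.append({
                "path": path,
                "size": st.st_size,
                "last_access": st.st_atime,
                "age_days": (time.time() - st.st_mtime) / 86400,
                "sha256": digest,
            })
    # Any file whose content hash appears more than once is redundant.
    for rec in records:
        rec["redundant"] = len(by_hash[rec["sha256"]]) > 1
    return records
```

Summary reports of the kind the text describes (by owner, by age, by redundancy) would then be simple aggregations over these records; a production tool would hash incrementally and index content as well.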

Not only does the technology exist, but it exists at a price point that makes it affordable to deploy, leaving no excuse for keeping matches in the email server and hoping the wrong pair of hands doesn’t find them. Even for those who don’t want to throw out or move the matches, it’s imperative that you at least know the matches are there so they aren’t left next to the comic books.

Unfortunately, many won’t find the motivation to find, expose and isolate the matches until after a breach. For those that see the proactive importance of simply knowing what data is being stored, visit our website or contact us.

Data profiling: Bridging the gap between Legal and IT

One of the key challenges, as you know, is getting legal and IT to communicate. They have not had a common language – a language that allows them to understand each other and build policies. This language is based on knowledge: knowledge of data assets. Without this knowledge they have nothing to discuss. Data profiling is the knowledge, or language, that allows IT and legal to communicate and build sound policies.

Check out this column on Bridging the Gap Between Legal and IT in Legal IT Professional.

Defensible Deletion Webinar: Tame Risk Hidden in Legacy Archives

Join Index Engines and Vedder Price Thursday, March 14 for a look into defensible deletion

Is old data jeopardizing your organization?

Massive volumes of content are created every day and as this content ages it fades into the background and becomes a challenge and a risk to manage.

Backup tapes represent a major aspect of this legacy data. They hold archives of user content, from sensitive email communications to critical contracts and agreements.

But there are innovative, automated ways to enforce defensible deletion policies, manage data risks and control costs. Find out more during an exclusive webinar presented by Index Engines and Vedder Price on Thursday, March 14 at 1 pm ET. Register now.

It’s a real problem that’s costing organizations millions in litigation and eDiscovery costs. Stockpiles of hidden data contain unknown risks and liabilities.

Managing this data and ensuring it complies with current information governance policies is an ongoing, complex challenge.

This webinar will offer new, real-world approaches to reducing legacy data, controlling its cost and managing its content that can be enacted immediately. Register now.

Bruce Radke, Shareholder, Vedder Price
Jim McGann, Vice President Marketing, Index Engines

A blog by Index Engines