Email is perhaps the most common document type involved in a large-scale disclosure exercise.

A single email server or backup tape may contain millions of separate emails. Duplicates of the same email are very likely to exist, so it is essential to de-duplicate them. This ensures that only one copy of each email is reviewed.
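As an illustration of the de-duplication step, the following Python sketch keeps the first copy of each email and drops later copies. It assumes that two emails count as duplicates when their sender, recipients, sent time, subject and body all match; that rule, the dictionary field names and the function names are hypothetical and are not taken from any particular tool.

```python
import hashlib


def email_fingerprint(email: dict) -> str:
    """Hash the fields that, for this sketch, define 'the same email'."""
    parts = [
        email["sender"].strip().lower(),
        # Sort recipients so their order does not affect the fingerprint.
        ",".join(sorted(r.strip().lower() for r in email["recipients"])),
        email["sent"],
        email["subject"].strip(),
        email["body"].strip(),
    ]
    # Join with a separator that is unlikely to occur inside a field.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def deduplicate(emails: list) -> list:
    """Return the emails with later duplicates removed, keeping first copies."""
    seen = set()
    unique = []
    for email in emails:
        fingerprint = email_fingerprint(email)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(email)
    return unique
```

Hashing a fixed set of fields keeps the comparison cheap even across millions of emails, because each email is read once and compared only by its fingerprint.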

Metadata and content can also be used to sort and filter the emails for potential relevance, using the following criteria:

  • Named individuals who had access to emails, whether as sender, recipient or copied party;
  • Domains of named individuals, or ‘generic’ junk-mail and personal web-mail domains;
  • Date range, for example between 1 July 2003 and 31 December 2003;
  • Key words or phrases in the subject line;
  • Key words or phrases in the body of the email text or in email attachments.

This type of filter can dramatically reduce the number of emails that require manual review.
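The criteria listed above can be combined into a single test, sketched here in Python. The `Email` and `Criteria` classes, their field names and the `needs_review` function are hypothetical, and the sketch makes several assumptions: named individuals are given as lowercase addresses, junk-mail domains are excluded by sender, key words are matched as plain substrings, and attachments are not searched.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Email:
    sender: str
    recipients: list  # everyone in To, Cc and Bcc
    sent: date
    subject: str
    body: str


@dataclass
class Criteria:
    people: set = field(default_factory=set)            # lowercase addresses
    excluded_domains: set = field(default_factory=set)  # e.g. junk-mail domains
    start: date = date.min
    end: date = date.max
    keywords: set = field(default_factory=set)


def domain(address: str) -> str:
    """Return the lowercase part of an address after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()


def needs_review(email: Email, criteria: Criteria) -> bool:
    """Return True if the email passes every filter and should be reviewed."""
    addresses = {a.lower() for a in [email.sender, *email.recipients]}
    if domain(email.sender) in criteria.excluded_domains:
        return False
    if criteria.people and not (addresses & criteria.people):
        return False
    if not (criteria.start <= email.sent <= criteria.end):
        return False
    text = f"{email.subject}\n{email.body}".lower()
    # With no key words configured, the key word filter is skipped.
    return not criteria.keywords or any(k.lower() in text for k in criteria.keywords)
```

The cheap metadata checks run before the text search, so most emails are rejected without their body being scanned.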

PLT also uses concept technology or fuzzy logic to return emails that are likely to be relevant even though they do not contain the exact words in the relevancy search list. A search for the word ‘pet’, for example, may highlight the word ‘dog’.
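The ‘pet’ and ‘dog’ example can be imitated with simple query expansion, sketched below in Python. This is a stand-in for illustration only and is not PLT's concept technology, whose workings the text does not describe. The hand-built `RELATED_TERMS` table and the function names are hypothetical.

```python
import re

# Hypothetical table of related terms; a real concept engine would derive
# such relationships from the documents rather than from a fixed list.
RELATED_TERMS = {
    "pet": {"dog", "cat", "hamster"},
}


def expand(term: str) -> set:
    """Return the search term together with any related terms."""
    term = term.lower()
    return {term} | RELATED_TERMS.get(term, set())


def concept_hits(text: str, term: str) -> list:
    """Return the words in the text that match the term or a related term."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(words & expand(term))
```

For example, `concept_hits("The dog was left at the office.", "pet")` returns `["dog"]`, although the word ‘pet’ never appears in the text.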