Friday, November 20, 2009

Need a ton of email data (tens of gigs)? Need it in PST form? Need it to be public data? Want to look behind the curtain into Enron? The EDRM Data Set Project is for you…

The Electronic Discovery Reference Model (EDRM) Data Set Project

Mission:

The mission of the EDRM Data Set Project is to compile a 100 gigabyte data set that can be used to test various aspects of electronic discovery software and services.

Enron Downloads:

Posted here are the EDRM Enron PST files. They are organized in 32 zipped files, each less than 700 MB in size. Also posted is a spreadsheet listing the zipped files and 168 .pst files contained in the zipped files.

The total size of the compressed files is approximately 19 GB. The total size of the uncompressed files is approximately 43 GB.

Most files may be downloaded by anyone. Except where otherwise noted, use of files is subject to a Creative Commons Attribution 3.0 United States License. To provide attribution, please cite to “EDRM (edrm.net).”  …

image …”

I know that most of you won’t give a rat’s as… um… butt about this, but I thought it was cool and since it’s my blog… :p

I also have some “history” related to Enron, and this directly helps me in my day job, so… yeah. The data is not a perfect conversion (it does appear to be a conversion, and those are rarely perfect; for example, it doesn’t appear to contain rich text), and every email has an “attribution” footer in it. Still, attachments are hooked up (unlike some past Enron EML downloads I’ve seen) and the folder structure seems to be pretty much preserved.

Here’s a snap of one of the PSTs:

image
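
If you want to sanity-check your download against the numbers quoted above (32 zips, 168 .pst files, roughly 19 GB compressed and 43 GB uncompressed), something like this little Python sketch might do it. It assumes all the zips are sitting in one folder; the function name and the folder name in the usage note are made up by me, not anything EDRM provides. It only reads the zip directories, so nothing gets extracted.

```python
import zipfile
from pathlib import Path


def summarize_zips(folder):
    """Tally the .pst members found in every .zip file directly under `folder`.

    Hypothetical helper for illustration only; not part of any EDRM tooling.
    """
    zip_count = 0
    pst_count = 0
    compressed = 0
    uncompressed = 0
    for zip_path in sorted(Path(folder).glob("*.zip")):
        zip_count += 1
        with zipfile.ZipFile(zip_path) as archive:
            # infolist() reads the archive's directory without extracting data.
            for info in archive.infolist():
                if info.filename.lower().endswith(".pst"):
                    pst_count += 1
                    compressed += info.compress_size
                    uncompressed += info.file_size
    return {
        "zips": zip_count,
        "psts": pst_count,
        "compressed_bytes": compressed,
        "uncompressed_bytes": uncompressed,
    }
```

Calling `summarize_zips("enron_zips")` (with whatever your folder is really called) returns the counts and byte totals as a dict. The compressed total covers only the .pst members themselves, so expect it to come out a bit under the size of the zip files on disk.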

1 comment:

Unknown said...

We're building a new kind of email client at www.inbox2.com.

We're always on the lookout for test email data. So far the Exchange 2007 trial VHD has been our primary testbed, but this also seems like something very useful.

Thanks for the tip!