You Can Write, But You Can't Hide: Big Data Knows Your Writing Quirks
As I wrote recently, data scientists have been able to decode unstructured data to accurately predict where violence will occur in Afghanistan. Now, they can also mine unstructured data to determine the identity of a document’s writer. All of us, it seems, have a “write-print” as unique as our fingerprint.
Forensic linguists, the experts who investigate who wrote a text, say that given samples of an individual’s known writing, they can identify that person as the author of any other document with up to 95% accuracy. Such experts have been called as witnesses in the high-profile lawsuit brought by Paul Ceglia, who has sued Mark Zuckerberg, claiming to own half of Facebook. They’ve also served as expert witnesses in murder trials.
While the field of forensic linguistics predates the advent of big data, the sheer volume of data being generated on the Internet is opening new business opportunities for automating the analysis. A company pursuing these opportunities claims it can pinpoint a document’s author and determine everything from the gender, age, and education of a writer to the veracity of the document’s content.