Posted by: swhee1er | April 10, 2014

Ethical questions loom large for Big Data

The unique ethical challenges posed by the internet become especially knotty when applied to the analysis of large data sets, or Big Data. Take the issue of informed consent. Instead of a “mailing list with 100 or 1000 subscribers” (Eysenbach & Till 2001, p. 2), Big Data researchers deal with subject populations many times that size. Since asking researchers to obtain consent from every user whose data make up these sets appears unrealistic, does that make any research conducted using Big Data ethically questionable?

Similar questions arise if one turns to the issue of harm. Given the size and diversity of large data pools, accurately diagnosing the risks of researching them remains difficult at best. Users’ comfort levels with having their online activities analyzed are bound to differ, and determining which users find it acceptable and which do not may prove no more viable than obtaining consent from every user involved. Even if researchers manage to resolve (or sidestep) this hurdle and anonymize the data, it has been demonstrated that it is not only possible but also relatively easy to “de-anonymize” that same data.[1] Considering the potential for injury, this last point seems especially damning, not just of studies using Big Data specifically, but also of internet studies in general.

One potential solution would be to scrub the data of all possible identifiers, which would presumably protect every subject but would also limit the data’s utility. Is this an acceptable trade-off, or does a better solution exist?

[1] For a fascinating (and troubling) look at how easily data can be “de-anonymized”, see
