Recently, the University of Amsterdam organized a seminar on big data, privacy and ethics during the World Championship Econometrics. At first sight, these three notions seem to be at odds with one another: how do you combine privacy and ethics with the massive collection of personal data?
How do you reconcile the fact that big data analytics is often used to find new correlations across different datasets, and may therefore lead to unexpected uses of the data, with privacy laws that oblige you to be clear from the start about the purposes for which you want to use certain data? And how do you square the legal privacy principle of data minimization (do not collect more than you need) with the reality that the true value of big data analytics lies in collecting a huge amount of data in which to search for small correlations? Inevitably, not all of the collected data will turn out to be relevant…
Compatible or not, it is clear that big data is here to stay: in our data-driven society, analytics helps us innovate and deliver new, more tailored products and services. Timothy Prescott, previously involved in the Obama campaigns of 2008 and 2012 and a panelist in Amsterdam, provided great insight into how elections can be influenced by big data analytics. As soon as personal data are involved, however, the question is whether the price to be paid for innovation isn't too high.
During the discussion, the audience kept asking for more laws. We do, however, already have strict data protection laws in Europe. These laws will become even stricter with the upcoming entry into force of the General Data Protection Regulation (GDPR), which will apply to all companies operating in the EU and is therefore expected to have a spill-over effect in other jurisdictions.
The European Commission recently published a factsheet on big data and the GDPR, in which the GDPR is presented as an enabler for big data services in Europe: in the Commission's view, enhanced legal certainty and a high level of data protection will create consumer trust and thereby economic growth. What is missing, however, is a discussion of how to deal with essentials such as data minimization and purpose limitation.
One of the solutions often mentioned is to anonymize the data: once the data are no longer considered personal data, the restrictions of privacy legislation no longer apply. However, true anonymization is difficult. Agreement should therefore be reached on the level of anonymization that is acceptable to 'escape' these restrictions while still respecting the fundamental rights of the persons from whom the data are collected.
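To make the difficulty concrete, one yardstick practitioners sometimes use (not referenced at the seminar or in the factsheet, and offered here purely as an illustration) is k-anonymity: every combination of quasi-identifying attributes, such as postcode and year of birth, must be shared by at least k records. A minimal sketch in Python, with hypothetical data and column names:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k=5):
    """Return True if every combination of quasi-identifier values
    occurs in at least k of the records (each record is a dict)."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())

# Hypothetical records: postcode and birth year alone can single someone out.
records = [
    {"zip": "1011", "birth_year": 1980, "diagnosis": "A"},
    {"zip": "1011", "birth_year": 1980, "diagnosis": "B"},
    {"zip": "1097", "birth_year": 1975, "diagnosis": "C"},
]
print(is_k_anonymous(records, ["zip", "birth_year"], k=2))  # False: the last record is unique
```

Even data that passes such a check can often be re-identified by linking it to other datasets, which is exactly why 'true' anonymization is so hard to achieve.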
Nevertheless, privacy is not the only concern. There is also a risk of discrimination, profiling, exclusion and loss of control, risks that were also addressed in two recent White House reports. It should be kept in mind that the algorithmic systems used for big data analytics are not infallible: if you feed in incorrect or non-objective information, you cannot expect correct, objective results to come out. On the other hand, analytics can also be used to detect bias and prevent discrimination.
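As a sketch of that last point (the metric and the data below are assumptions of mine, not something taken from the White House reports): one simple check is to compare positive-outcome rates across groups, a criterion often called demographic parity.

```python
def selection_rates(decisions):
    """decisions: list of (group, outcome) pairs, where outcome is 0 or 1.
    Returns the positive-outcome rate per group."""
    totals, positives = {}, {}
    for group, outcome in decisions:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + outcome
    return {g: positives[g] / totals[g] for g in totals}

# Hypothetical loan decisions: a large gap between groups is a red flag
# worth investigating, even if it is not by itself proof of discrimination.
decisions = [("A", 1), ("A", 1), ("A", 0), ("B", 0), ("B", 0), ("B", 1)]
print(selection_rates(decisions))  # {'A': 0.666..., 'B': 0.333...}
```

Such a check does not settle the legal or ethical question, but it does show that the same analytics raising the concerns can also help surface them.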
It will therefore be important to start a discussion on how we can use big data analytics in a way that respects civil rights without preventing innovation. This discussion should not be limited to the legal domain: what is legally allowed may not be perceived as ethically or socially desirable. The guidance and reports from the European Commission and the White House offer a great starting point for a broader discussion of the risks, but certainly also the opportunities, of big data analytics. The best is therefore yet to come!