Data protection: Proving discrimination

Machine learning, which often underpins algorithms and AI, poses a particular problem: algorithms can "learn" discrimination from the tainted data on which they are trained.
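By way of a purely illustrative sketch (synthetic data and hypothetical variable names, not any real system), the mechanism can be shown in a few lines of Python: a model trained on historical hiring decisions that penalised one group will reproduce that penalty for equally qualified candidates, even though nobody programmed it to discriminate.

    # Purely illustrative: synthetic data showing how a model trained on
    # "tainted" historical decisions reproduces the discrimination it was shown.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 10_000
    group = rng.integers(0, 2, n)   # 0 = group A, 1 = group B (hypothetical)
    skill = rng.normal(0, 1, n)     # qualification, identically distributed in both groups

    # Tainted historical labels: past decision-makers penalised group B.
    past_hired = (skill - 1.0 * group + rng.normal(0, 0.5, n)) > 0

    model = LogisticRegression().fit(np.column_stack([skill, group]), past_hired)

    # Two equally qualified applicants (skill = 0), one from each group:
    print(model.predict_proba([[0.0, 0], [0.0, 1]])[:, 1])
    # The predicted probability of being hired is markedly lower for group B.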


Transparency, algorithms and machine learning

A worrying example of discrimination being "learnt" in this way is facial recognition technology, where academic research has concluded that darker-skinned females are less likely to be accurately identified. Insofar as facial recognition technology is deployed by the police for the prevention of crime, Part 3 of the DPA 2018 would likely apply.
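By way of illustration only, the analytical step behind that kind of research is conceptually simple: rather than quoting a single headline accuracy figure, the system's performance is broken down by demographic subgroup. The Python sketch below assumes a hypothetical labelled benchmark; it is not the researchers' actual methodology.

    # Illustrative sketch (hypothetical benchmark data): disaggregate a facial
    # recognition system's accuracy by demographic subgroup rather than
    # reporting a single overall figure.
    from collections import defaultdict

    # Each tuple: (subgroup of the benchmark image, was the match correct?)
    benchmark_results = [
        ("lighter-skinned male", True), ("lighter-skinned male", True),
        ("darker-skinned female", False), ("darker-skinned female", True),
        # ... in practice, many hundreds of labelled images per subgroup
    ]

    totals, correct = defaultdict(int), defaultdict(int)
    for subgroup, ok in benchmark_results:
        totals[subgroup] += 1
        correct[subgroup] += ok

    for subgroup, count in totals.items():
        print(f"{subgroup}: {correct[subgroup] / count:.0%} correctly identified")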

There is similar research in the field of online targeted advertising, which revealed that a user searching for a "black-identifying name" was more likely to be shown personalised ads falsely suggesting that the person might have been arrested than a user searching for a "white-identifying name". The type of algorithmic processing at the heart of this kind of discrimination would be regulated by the GDPR and the DPA 2018.

Part of the problem with discriminatory machine learning and tainted data sets is the thorny issue of transparency. As a report of the House of Commons Science and Technology Committee identified, human controllers may not be able to "see", let alone understand, the basis upon which a machine learning algorithm is making decisions:

Transparency would be more of a challenge, however, where the algorithm is driven by machine learning rather than fixed computer coding. Dr Pavel Klimov of the Law Society’s Technology and the Law Group explained that, in a machine learning environment, the problem with such algorithms is that “humans may no longer be in control of what decision is taken, and may not even know or understand why a wrong decision has been taken, because we are losing sight of the transparency of the process from the beginning to the end”. Rebecca MacKinnon from think-tank New America has warned that “algorithms driven by machine learning quickly become opaque even to their creators, who no longer understand the logic being followed”. Transparency is important, but particularly so when critical consequences are at stake. As the Upturn and Omidyar Network have put it, where “governments use algorithms to screen immigrants and allocate social services, it is vital that we know how to interrogate and hold these systems accountable”. Liberty stressed the importance of transparency for those algorithmic decisions which “engage the rights and liberties of individuals” (footnotes removed)

https://publications.parliament.uk/pa/cm201719/cmselect/cmsctech/351/35106.htm

Whilst there are many documented examples of discriminatory technology, a good number of these incidents have been exposed only through painstaking and no doubt expensive research. By way of example, journalists at ProPublica had to analyse 7,000 "risk scores" in the US to establish that a machine learning tool deployed in some states was nearly twice as likely to falsely flag black defendants as future criminals as it was to falsely flag white defendants. Most claimants will not have access to this level of resource. This is deeply problematic because transparency is ordinarily an important step towards understanding whether a system, technologically based or otherwise, is discriminatory. Indeed, the prohibition on discrimination in fully automated decision-making arising from a law enforcement purpose (s.49 DPA 2018) is all but meaningless unless transparency is guaranteed.
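In outline (and only as a sketch, with a hypothetical file and hypothetical field names), that kind of analysis compares a tool's false positive rate across groups, i.e. the proportion of people flagged as high risk who did not in fact go on to reoffend:

    # Sketch of a disparate false-positive-rate analysis of the kind ProPublica
    # carried out. The CSV file, its field names and its values are hypothetical.
    import csv

    def false_positive_rate(rows):
        """Share of people who did not reoffend but were nonetheless flagged high risk."""
        negatives = [r for r in rows if r["reoffended"] == "0"]
        flagged = [r for r in negatives if r["high_risk"] == "1"]
        return len(flagged) / len(negatives) if negatives else float("nan")

    with open("risk_scores.csv", newline="") as f:   # hypothetical export of ~7,000 scores
        rows = list(csv.DictReader(f))

    rates = {race: false_positive_rate([r for r in rows if r["race"] == race])
             for race in {r["race"] for r in rows}}
    print(rates)   # a large gap between groups indicates disparate error rates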


Transparency in relation to non-law enforcement

The GDPR contains a principle of transparency as follows:

  • Personal data shall be … processed lawfully, fairly and in a transparent manner in relation to the Data Subject (‘lawfulness, fairness and transparency’) (Article 5 (1)(a)).
  • When personal data is collected, there is a duty to inform the Data Subject “in a concise, transparent, intelligible and easily accessible form, using clear and plain language, in particular for any information addressed specifically to a child” (Article 12 (1)).
  • A Data Controller shall, at the time when personal data are obtained, provide the Data Subject with the following further information necessary to ensure fair and transparent processing …. “the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject” (Article 13 (2)(f) / Article 14 (2)(g)).

At first blush, invoking the principle of transparency within the GDPR and DPA 2018 in relation to algorithms and machine learning looks promising.

However, the GDPR does not go so far as to require that algorithms, or the basis on which a machine learning system operates, be disclosed. This is confirmed by the Article 29 Data Protection Working Party document entitled “Guidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679”. Indeed, the ICO guidance entitled “Automated decision-making and profiling” suggests that the principle of transparency is fairly weak when it comes to algorithms. It provides the following commentary:


How can we explain complicated processes in a way that people will understand?

Providing ‘meaningful information about the logic’ and ‘the significance and envisaged consequences’ of a process doesn’t mean you have to confuse people with over-complex explanations of algorithms. You should focus on describing:

– the type of information you collect or use in creating the profile or making the automated decision;
– why this information is relevant; and
– what the likely impact is going to be/how it’s likely to affect them.

Example
An on-line retailer uses automated processes to decide whether or not to offer credit terms for purchases. These processes use information about previous purchase history with the same retailer and information held by the credit reference agencies, to provide a credit score for an online buyer.

The retailer explains that the buyer’s past behaviour and account transaction history indicates the most appropriate payment mechanism for the individual and the retailer.

Depending upon the score customers may be offered credit terms or have to pay upfront for their purchases.

https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/automated-decision-making-and-profiling/what-else-do-we-need-to-consider-if-article-22-applies/#id3

If the principle of transparency enshrined within the GDPR means simply that organisations are under an obligation to provide a rather superficial, “high level” explanation of their use of algorithms, it is highly unlikely to give rise to meaningful scrutiny. It certainly seems unlikely that an organisation would provide sufficient information to allow a potential claimant to demonstrate that a particular algorithm was discriminatory. This is an area where litigation is urgently needed to clarify the true scope of the principle of transparency.


Transparency in relation to law enforcement

There are various rights afforded to Data Subjects under the DPA 2018 in relation to law enforcement data processing. One of these is a right to information about the processing, as follows:

  • The purpose of the processing must be made available either generally to the public or in some other way (s.44 (1)(c)).
  • The Controller must give the Data Subject information so as to enable him or her to exercise their rights under the DPA 2018 (s.44 (2)-(3)).

Information must be provided in a suitable format including one that is readily intelligible (s.52 DPA 2018).

However, this right to information is heavily qualified.

The right to information does not apply where the processing is in the course of a criminal investigation or criminal proceedings, including proceedings for the purpose of executing a criminal penalty (s.43 (3)).

It follows that transparency is only available in relation to the following law enforcement activities: prevention and detection (s.31 DPA 2018).

The right to information also does not apply in relation to personal data contained in a judicial decision or in other documents relating to the investigation or proceedings which are created by or on behalf of a court or other judicial authority.

The Controller may also restrict, wholly or partly, the provision of information to the Data Subject where, having regard to the fundamental rights and legitimate interests of the Data Subject, it is necessary and proportionate to do so in order to:

  • Avoid obstructing an official or legal inquiry, investigation or procedure (s.44 (4)(a)).
  • Avoid prejudicing the prevention, detection, investigation or prosecution of criminal offences or the execution of criminal penalties (s.44 (4)(b)).
  • Protect public security (s.44 (4)(c)).
  • Protect national security (s.44 (4)(d)).
  • Protect the rights and freedoms of others (s.44 (4)(e)).

The Data Subject must be informed if his or her right has been restricted (s.44 (5) DPA 2018) and records must be made (s.44 (7) DPA 2018).

Finally, there is no express entitlement to be informed of how any algorithm or machine learning process is being applied to the Data Subject’s personal data.

However, there is one potential silver lining: as explained above, the Controller must give the Data Subject information so as to enable him or her to exercise their rights under the DPA 2018 (s.44 (2)-(3)). Bearing in mind that s.49 DPA 2018 prohibits discriminatory fully automated decision-making arising from a law enforcement purpose, one possible area for future litigation is whether a Controller could be compelled to disclose the details of an algorithm in order to demonstrate compliance with that provision.


Shifting the burden of proof through a lack of transparency

Of course, if there is a lack of meaningful transparency, this may take centre stage when it comes to challenging discriminatory technology. Discrimination lawyers will be very familiar with the line of European authorities, such as C-109/88 Danfoss, which establish that a lack of transparency in a pay system can give rise to an inference of discrimination. That principle translates equally to challenges to discriminatory technology. If it is not possible to explain how an algorithm is operating, there is a real risk of a successful discrimination claim, as the user of the technology will not be able to provide a non-discriminatory explanation for the treatment (see “Beginner’s Guide to key AI terms and concepts”).


Exposing tainted data

Alternatively, the DPA 2018 and the GDPR might be used to gain access to the data used by algorithms and as part of machine learning, in the hope that this would at least indicate whether discrimination might be occurring.

In relation to processing unrelated to law enforcement, the Data Subject has a right under Article 15 of the GDPR to be told whether personal data concerning him or her is being processed and, if so, to have access to that data and to the categories of personal data concerned.

The precise wording is as follows:

(1) The data subject shall have the right to obtain from the controller confirmation as to whether or not personal data concerning him or her are being processed, and, where that is the case, access to the personal data …

(3) The controller shall provide a copy of the personal data undergoing processing. For any further copies requested by the data subject, the controller may charge a reasonable fee based on administrative costs. Where the data subject makes the request by electronic means, and unless otherwise requested by the data subject, the information shall be provided in a commonly used electronic form.

https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:02016R0679-20160504&from=EN

In relation to processing for a law enforcement purpose, the DPA 2018 also contains a right of access, as follows:

  • Confirmation as to whether or not personal data is being processed (s.45 (1)(a)).
  • Access to the personal data (s.45 (1)(b)) and information, which includes:
  1. Categories of personal data (s.45 (2)(b)).
  2. Communication of the personal data undergoing processing and of any information as to its origin (s.45 (2)(g)).

The Controller may also restrict, wholly or partly, the provision of access to the Data Subject where, having regard to the fundamental rights and legitimate interests of the Data Subject, it is necessary and proportionate to do so in order to:

  • Avoid obstructing an official or legal inquiry, investigation or procedure (s.45 (4)(a)).
  • Avoid prejudicing the prevention, detection, investigation or prosecution of criminal offences or the execution of criminal penalties (s.45 (4)(b)).
  • Protect public security (s.45 (4)(c)).
  • Protect national security (s.45 (4)(d)).
  • Protect the rights and freedoms of others (s.45 (4)(e)).

The Data Subject must be informed if his or her right has been restricted (s.45 (5) DPA 2018) and records must be made (s.45 (7) DPA 2018).

Article 15 of the GDPR and s.45 of the DPA 2018 may allow potential claimants to understand whether information concerning protected characteristics, for example race or gender, is being used by an algorithm or as part of machine learning.

Inevitably, group litigation where a number of claimants have pooled their resources and shared personal data might well be even more effective at demonstrating that data sets are discriminatory. It follows that the GDPR may assist claimants, to a limited extent, in understanding whether discrimination is occurring.
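As a closing illustration (a sketch only, assuming hypothetical field names in the disclosures obtained under Article 15 / s.45), pooled access-request data might be reviewed for protected characteristics, for obvious proxies for them, and for crude differences in outcomes between groups:

    # Sketch only: reviewing pooled subject-access disclosures for protected
    # characteristics, possible proxies, and a crude disparity in outcomes.
    # Field names ("ethnicity", "postcode", "outcome", ...) are hypothetical.
    from collections import Counter, defaultdict

    PROTECTED = {"race", "ethnicity", "gender", "sex", "religion", "disability"}
    PROXIES = {"postcode", "forename", "country_of_birth"}   # illustrative proxy fields

    def review_pooled_records(records):
        """records: one dict per claimant, built from their Article 15 / s.45 disclosure."""
        fields = Counter(field for r in records for field in r)
        print("Protected characteristics disclosed:", PROTECTED & fields.keys())
        print("Possible proxies disclosed:", PROXIES & fields.keys())

        # Crude disparity check: adverse-outcome rate by self-declared group.
        refusals = defaultdict(list)
        for r in records:
            if "ethnicity" in r and "outcome" in r:
                refusals[r["ethnicity"]].append(r["outcome"] == "refused")
        for grp, outcomes in refusals.items():
            print(f"{grp}: {sum(outcomes) / len(outcomes):.0%} refused")

    review_pooled_records([
        {"ethnicity": "white", "postcode": "AB1", "outcome": "granted"},
        {"ethnicity": "black", "postcode": "CD2", "outcome": "refused"},
    ])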