Friday, 24 May 2019

TOEIC English Language Test and Home Office flaky data




The National Audit Office has just published its Investigation into The Response to Cheating in English Language Tests. The report criticises the Home Office for failing to protect students wrongly accused of cheating in an English language test that they had to sit as part of a visa application process.

Approximately 2,500 students have been forcibly removed from the United Kingdom after being accused of cheating and another 7,200 left the country after being warned that they faced detention and removal if they remained. So far 12,500 appeals have been heard in the courts of which 3,600 have been won.

The test was known as TOEIC (Test of English for International Communication) and consisted of written and verbal elements. The aim was to ensure that overseas students applying to study in the UK had the required level of fluency in English. The Home Office contracted the task to a US operation called ETS. ETS in turn contracted with a number of UK language schools to act as test centres. The tests themselves were mediated by computer and the results were assessed back in the United States.
The Home Office received a stream of data from ETS which it then converted into a “look up tool” and officials used this in determining whether tests had been passed and if there had been cheating.

There is no doubt that cheating occurred, indeed a BBC Panorama programme in February 2014 showed it taking place as students were prompted to input “correct” answers. Large-scale frauds were being carried out by some of the test centres, presumably in return for covert fees. But large numbers of students taking the test were wholly capable of passing it unaided. Everything depended, therefore, on the reliability of the data that ETS supplied to the Home Office.

The full NAO report describes the many obstructions that were put in the way of those who wished to appeal and how many applicants were detained or removed without having any significant opportunity to appeal.

In fact by 2016, if not before, it was clear that the ETS data was insufficiently reliable for the Home Office to be taking the actions that it did.

One element of the testing concerned verbal skills, and it was being suggested that large numbers of the voice files thereby collected were not of the actual applicant but of some proxy. Specialists in forensic voice analysis said that the tests ETS used to link a voice file to a real person could not function reliably at the quantity of files that had to be examined. The point was not that the tests were unreliable in absolute terms but that they were insufficient to be relied on solely for important decision-making.

But there was a greater problem with the ETS data, and this is where I became involved. I was asked by solicitors to look at the entire testing procedure in the light of the number of obvious anomalies in the results. How was it possible that large numbers of people who could obviously converse fluently in English were being failed? Moreover, what was the explanation where individuals were recorded as having taken a test at a particular time despite clear alibi evidence that they were elsewhere?

By the time I was instructed the test centres which were suspected of acting fraudulently had been closed down and their records and computer systems had more or less vanished. Some of the principals were facing criminal charges.

What I wanted to do was to understand the procedures by which students were registered for tests, what happened when they attended for tests, what sort of computer records were created during the tests, and how that data was sent to ETS. After considerable effort by solicitors it was possible to get some of the operational manuals. Few of these documents had any dates associated with them and we came to understand that some of the processes changed over time. I am of course no expert in English language testing – what I was interested in was the step-by-step processes and the controls against cheating. ETS were concerned about their own reputation and in any event it turned out that some of their computer processes were out-sourced to a third party.

The ETS arrangements anticipated that individual students might cheat but had not really thought through the possibility that much of the cheating would be carried out by the test centres. In my detailed report I looked at a variety of means by which such cheating could be enabled. It was theoretically possible that data files could be directly manipulated but one also had to acknowledge that this would require relatively high levels of computer skill. It had been seen in other investigations of test centres that use was made of remote-control software, so that while a student sat at a computer terminal, that terminal was in fact controlled by someone else using specialist software such as TeamViewer.

One very likely method of cheating was at the stages where students initially registered and then later presented themselves for a test. A further strong possibility was the test centre delaying the sending of test results to ETS in the US and using the delay to substitute faked results. This second method had been highlighted by one of the ETS staff but apparently no follow-up action occurred.

Alas the difficulty of getting hold of accurate and complete records either from the test centres or from ETS meant that one could not identify a definitive fraudulent method. What was also interesting was the difficulty of carrying out any cross-checks on the data – were the test results correctly matched up to an authentic, properly identified, applicant? Applicants had a registration number but each individual test (each student took several) had its own identifying number as well – probably part of an arrangement so that ETS testers would be “blind” to the individuals. Everything then depended on the correct matching up of the tests to the applicants. Simple low-cost antifraud measures, such as the use of webcams to capture the presence of individuals sitting at a particular terminal, had not been used.

What one could conclude was that ETS data records, as with their methods of identifying students by voice pattern, were insufficiently reliable for the Home Office to be making the decisions that it did. As early as 2014 it was clear that a number of test centres were providing the means of cheating by the use of proxies.  But why would those aiding cheating limit themselves to just the one method when there were other loopholes ready to be exploited?

Once the presence of significant quantities of anomalies had been shown the Home Office should not have continued to use the ETS files alone as the basis of decision-making. The most obvious next step was surely to permit those who wished to do so to take a fresh test with stricter controls over the circumstances. The costs, actual and political, would have been low.

The National Audit Office, as the UK’s official spending watchdog, makes significant criticisms of the Home Office. I have seen through my own experience the considerable sums of money that were spent in various appeal proceedings. At one stage Home Office lawyers were seeking to prevent my giving evidence to a tribunal on the basis that this was a judicial review, which would not normally hear “new” evidence. Their lawyers were more interested in legal procedure than in getting a just solution. Public money was spent not only directly on lawyers and officials employed by the Home Office but also in legal aid where applicants were able to secure it.

The National Audit Office does not involve itself in direct criticism of Home Office politicians and officials though no doubt others will.

An All-Party Parliamentary Group (APPG), chaired by Stephen Timms MP, is now in existence: https://bit.ly/2wmnJRB. I have been asked to give evidence to it on 11 June.

The NAO report is at: https://bit.ly/30ERKKe
My own detailed report, plus other statements and relevant law reports are at: https://bit.ly/2IZfN0f
Detailed press coverage in the Financial Times is at: https://on.ft.com/2DihTVK


Tuesday, 7 May 2019

Digital Forensics and Privacy: Conflicted?

If we are to use evidence from digital devices such as smartphones, PCs, corporate machines and the cloud we need it to be reliable.  But some of the methods used to achieve this can violate privacy as a digital forensic examiner may  initially have unlimited access and not know in advance what is relevant and what is not.  Digital examinations can be highly intrusive.  
This is a review of the law and practices.   Focused reform is needed but demands to "stop digital strip searches" are over-simplistic.  A way forward is to distinguish the situation where the entire content of a device is downloaded for the purpose of data preservation - freezing the scene at a definite point in time -  and the later situation when reasons and justifications for detailed examination become clearer.  I make some recommendations. 


The conflict between providing reliable evidence from computer devices and supporting the privacy and human rights of their owners - complainants, accuseds and wholly innocent third parties - has been an issue of important but limited debate – until now. The National Police Chiefs’ Council (NPCC) and the Crown Prosecution Service (CPS) announced they had agreed on a Victim’s Consent to Access form[1]. It resulted in a noisy campaign centred on the particular problems of investigations of rape, with protagonists speaking of “digital strip search” and the Association of Police and Crime Commissioners (APCC) asking for the form to be withdrawn.

I have been examining computers and computer-like devices and giving related expert evidence since the mid-1990s. I have written about the conflicts with privacy before,  most recently in the wake of the 2018 inquiry into Disclosure of Evidence in Criminal Proceedings by the House of Commons Justice Select Committee[2] but it seems a good time to update and expand.  Reforms in law and procedure are needed, but they are not the ones most commonly being demanded.

Importance of Digital Evidence
The importance of digital forensics parallels the importance of computers and digital devices in our lives. Police say that the average UK home contains 7.4 digital devices, most of which contain stored files but also software, configuration and system data which can be interpreted. The smartphone in particular has a very intimate relationship with its owner – 60/60/24/7/365 – and is recording activities second by second. Others put the figure even higher: the Internet Advertising Bureau UK cites 8.3 devices per home and Gartner suggest that by 2020 each of us may have 5.1 connected devices on our person. Digital evidence in one form or another appears in not less than 70% and up to 90% of Crown Court cases.

 Evidential Reliability
If you are going to use digital evidence it has to be reliable if miscarriages are to be avoided.  A key feature, as with other forms of evidence, is that it is capable of being tested.   As with other forms of physical evidence the way in which it is processed is critical to its reliability.  Police, lawyers and forensic practitioners refer to particular stages: Identification of sources likely to be helpful, Acquisition, Preservation, Analysis, Continuity, Disclosure to the Defence,  Presentation in Court.

- Acquisition: to ensure that the process of seizure does not alter or corrupt original material
- Preservation: to take care that once in police custody potential evidence does not become altered or corrupted by accident
- Analysis: examination and possible conclusions by a forensic practitioner together with a report that will go before the court and be available for testing by the defence
- Continuity: also known as chain of custody, so that when an item is presented in court there is a complete explanation and set of records to show how it has been handled since acquisition
- Disclosure to the Defence: since the Criminal Procedure and Investigations Act 1996 (CPIA) prosecutors have had a duty to “disclose” to the defence – to make them aware of anything uncovered during an investigation which might strengthen the defence case or weaken the prosecution case. Another feature of CPIA is that defendants are penalised if they fail to produce a defence case statement setting out their general approach. The prosecutor’s duty to disclose is continuous and that includes making further disclosure dependent on the detail provided in the defence case statement. Police and forensic examiners are under a duty to “reveal” their activities to the prosecutor to enable disclosure to be made
- Presentation in Court: court appearance of witnesses supporting their exhibits and findings from their examination of seized items, and to be available for cross-examination

Digital evidence has particular features. In addition to the sheer volume of data on even modest devices there is its very high level of volatility. Simply viewing the contents of a file, or even asking for a directory listing, will cause alterations. These may occur within the file, or in its associated metadata such as time and date stamps, or in log and configuration files. (For Windows geeks: in the Registry, among other places.)
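
To make the volatility point concrete, here is a minimal Python sketch; the file path is hypothetical, and whether the access timestamp actually changes depends on the operating system and, on Linux, on mount options such as relatime:

    import os
    import time

    path = "evidence/report.docx"   # hypothetical file, for illustration only

    before = os.stat(path)          # record metadata before "viewing"
    with open(path, "rb") as f:     # merely reading the contents...
        f.read()
    after = os.stat(path)           # ...may already have altered metadata

    print("access time before:", time.ctime(before.st_atime))
    print("access time after: ", time.ctime(after.st_atime))
    if after.st_atime != before.st_atime:
        print("metadata changed simply by reading the file")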


Forensic Image Copies
For a quarter of a century it has been the practice when dealing with evidence from digital devices that a “forensic image copy” is made of the device at as early an opportunity as possible. (The procedures have been updated to deal with smart phones and acquisition from cloud-based services.) This is done for several reasons. First, executed properly and using specialist hardware, software and a standardised procedure, data does not become altered in the course of acquisition. Second, direct examination of a device is highly undesirable because in the course of it data will inevitably get altered; all examinations take place on the copy, not the original. The original is available if necessary for reversion and checking. Third, in continuity terms it provides an explicit physical link between a device and the person responsible for it, so that there can be attribution of its contents. Fourth, it is all too easy for individual emails, social media postings, webpages, photographs, et cetera to be subject to forgery. But it is extremely difficult to forge an entire hard disk or the memory in a phone, because indications of the existence of a file can be found in several different locations on a device, all of which would need to be known about and then adjusted. Fifth, the forensic image will capture aspects of a disk or memory card which may not be immediately visible but which optimise the opportunities to recover deleted data (files may have been deleted deliberately, but deletion may also be a function of normal use and over-writing). Lastly, and this relates to subsequent examination and disclosure: if there are accusations that the prosecution have cherry-picked or suffered from “confirmation bias” – only seeing what their expectations have persuaded them to see – the existence of the forensic image means that such mistakes can be corrected.

Smart phones are even more vulnerable to volatility problems than regular computers. When powered up they are constantly awaiting the arrival of regular phone calls and SMS messages; they are also in contact with Internet based data sources either via the phone network or local Wi-Fi services. The data sources include social media and other messaging services, notifications of all kinds, the ability to conduct worldwide web browsing and also to receive numerous updates associated with various apps.  Many phones also can connect to the outside world and acquire data via Bluetooth.  The existence of a GPS signal may also cause data content alterations. Most smartphones have cameras and microphones. 

The forensic image provides essential provenance, authentication and continuity.  It freezes the scene at a particular point in time and forms the basis of any evidence that is produced - and contested - at trial.   The procedures were first written up in the ACPO Good Practice Guide to Computer-based Evidence in the late 1990s and have been updated many times since.[3]



Evidence Seizure
Because of the volatility issue it is important that the forensic image is captured at the earliest possible moment. This applies to accuseds, complainants and any third parties (which might include businesses and other organisations as well as individuals).  

The accused will have had little choice but to give up their digital devices.   PACE (Police and Criminal Evidence Act, 1984) section 1 covers powers to stop and search, section 8 empowers a “justice of the peace” to authorise entry and search of premises, sections 19-22  cover powers of seizure and how they apply to computerised information. 

Complainants and third parties will usually have to give their consent unless they too have been the subject of an explicit warrant.  Police requests have to be compliant with the General Data Protection Regulation (GDPR) and the UK implementation in the Data Protection Act 2018. Part 3 covers law enforcement processing and we will look at some of the protections shortly.


Evidence Preservation
But it is important to stress that the forensic image copy is about evidence preservation and can be separated from the tasks of examination and analysis.  This is an area which could be made much clearer when it comes to dealing with privacy issues.

In practical terms preservation is achieved by the use of a digital fingerprint of the forensic copy image.  This involves the use of a small item of widely-available software – it creates a fingerprint consisting of a string of apparently random characters (a cryptographic hash); even the slightest alteration in the copy image (or any other file subject to the process) will result in a different fingerprint being created[4].  A subsequent user of the file simply uses the same digital fingerprint software to check that nothing has changed.  The standard software used to generate forensic copy images also creates a fingerprint at the end of the copy-generating process.
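
As an illustration of the principle, here is a minimal Python sketch using SHA-256 (MD5 and the other members of the SHA family mentioned in the footnote work the same way); the image filename and the recorded fingerprint are hypothetical:

    import hashlib

    def fingerprint(path, algorithm="sha256"):
        # Read in 1 MB chunks so multi-gigabyte images do not exhaust memory.
        h = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                h.update(chunk)
        return h.hexdigest()

    # Hypothetical values, for illustration only.
    recorded = "9f2c..."                        # digest noted at acquisition
    current = fingerprint("suspect_phone.img")  # digest computed today

    # Any alteration to the image, however slight, yields a different digest.
    print("verified" if current == recorded else "IMAGE HAS CHANGED")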

It is also worth pointing out that the product of a forensic image copying exercise is not immediately readable without access to specialist software.  The copy is simply a series of initially opaque files.  It is possible, in the case of computer hard disks and memory cards, to use the forensic image copy to create a clone of the original onto similarly-sized hard-disks or memory cards but the more usual practice is to use the practitioners’ specialist tools such as EnCase,  FTK, X-Ways and others to carry out detailed examination.


Triage
At this early stage no one will be able to determine what is going to be relevant. Will the potential evidence that eventually proves to be important be in email, a regular SMS message, the fact that phone calls had been exchanged, records of web browsing, photos, or the use of any of a number of messaging and social media services?

Indeed one of the practical problems faced by the police is that the sheer quantity of devices which may need to be examined can be overwhelming; as a result they have attempted to develop triage systems to prioritise which devices should be seized and which can be safely discarded.

There are significant implications here for the obligation to disclose. The last thing a police investigator or prosecutor wants is that a defendant says that important potential evidence has never been captured or not captured in a timely fashion and is now lost. Defence counsel will then ask the judge to rule that a fair trial cannot now take place.  We will consider practical disclosure to the defence in a little while.



Examination and Analysis 
It is only when actual examination commences that the privacy of the owner of a specific device, and of others, becomes an issue. In the first instance the examiner of a forensic image copy will have complete access to everything on that device. In practice, partly because of the operation of the law (which we will discuss shortly) but mostly because of the sheer volume of material, the examiner will be guided by the steer given by whoever is the officer in the case (OIC).  The OIC will have initial expectations and suspicions and will communicate them to the examiner. The examiner will then use experience to determine which parts of the original device will be subject to scrutiny.

The most commonly used specialised digital forensics analysis tools offer the examiner an integrated environment in which to view files of all kinds including their metadata and to carry out complex searches. Typically the contents of an entire device are indexed so that the results of search requests are obtained almost instantly; often, where several devices have been seized, a master index – and hence a search capability – can operate over all the devices simultaneously. Some of the programs even allow examiners to devise their own specialised search procedures – to identify credit card numbers or particular glossaries of terms as used by particular communities of criminals. Use is also made of databases of hashes (digital fingerprints) of known “bad” files such as those associated with the sexual exploitation of children and terrorism.
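
The hash-database check is straightforward to sketch. The following Python fragment, with hypothetical file and directory names, walks the files recovered from an image and flags any whose digest appears in a set of known “bad” hashes; real deployments use curated databases and the commercial suites rather than ad hoc scripts:

    import hashlib
    import os

    def sha1_of(path):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                h.update(chunk)
        return h.hexdigest()

    # Hypothetical hash set: one known-bad SHA-1 digest per line.
    with open("known_bad_hashes.txt") as f:
        known_bad = {line.strip().lower() for line in f if line.strip()}

    # Walk every file recovered from the forensic image and flag matches.
    for root, _dirs, files in os.walk("extracted_files"):
        for name in files:
            full = os.path.join(root, name)
            if sha1_of(full) in known_bad:
                print("known bad file:", full)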

There is no point in denying that accidental viewing of material that most people would consider private and which turns out to be wholly irrelevant to an investigation can and does take place. I can recall a computer seized from a suspected dealer in firearms; photos were scrutinised on the basis that there might be pictures of the accused with guns – what I found were pictures of him having sex (of an entirely normal kind) with a woman who was either his wife or regular partner. I can remember another case in which someone was accused as a co-conspirator in a gold bullion hijacking; he was found not guilty but his computer showed that he was a heavy user of escort services, which he might have wished to have concealed from his wife. In both these and other cases no reference was made to these “private” activities and in each instance I suspect that only I and one other person, another examiner, were ever aware of their existence.

Perhaps I should also say that digital forensic examiners are highly unlikely to know the owners of the devices they are asked to investigate, or any of their acquaintances. One aspect of “privacy” must be concern that private and confidential information becomes known to the world at large or at least to acquaintances.

In practice actual examination in a non-urgent case may not take place for some considerable time. This is a function of the lack of funding and resource available for digital forensics. "Urgent" cases will include situations where there is a threat to life, where there is an ongoing investigation and where co-conspirators may need to be identified and future events forecast and perhaps forestalled. Another “urgent” element may be that legislation requires that an accused is charged or brought before court within a particular timeframe. Almost everything else is “non-urgent”; that will include situations involving rape or collections of pictures of the sexual exploitation of children and where no further harm is expected. Delays of six or more months are not uncommon.  This can cause considerable distress.  Digital forensic examiners in the publicly funded criminal justice system are poorly paid compared with their colleagues who specialise in civil disputes or have transferred their skills into more general aspects of cyber security.  Even the leading digital forensic units, those that deal with the most interesting cases, have an unfortunately high turnover of staff.

Another limitation on privacy violation is that the rooms in which digital examinations take place are normally subject to strict physical access controls.  Non-specialist staff are present only by invitation.  The main reason for this is that a frequent task in digital forensics involves indecent images of children.  There is a strict liability offence of possession – s 160 Criminal Justice Act 1988 – where the onus is on an accused to show that they can benefit from a very limited range of defences.  There are also offences of “making” and “distributing” which can also be said to take place during an investigation.  Examiners have to rely on the protections available under s 46 Sexual Offences Act 2003[5].  The response is to make examination rooms secure against unauthorised viewing.  It may be some comfort to those whose digital devices are being examined that free-for-all passing round of private information in a police canteen is very strongly discouraged by law.


Legal constraints on examination
The contents of a personally-owned computer or smartphone or a personally-run cloud service count as personal data for the purposes of Data Protection legislation and although there are exceptions and provisions for law enforcement there are also significant limits.  Almost certainly one of the drivers for the production of the Consent Form referred to at the beginning was the need to comply with the Data Protection Act 2018 (DPA 2018).  This is the UK implementation of the General Data Protection Regulation (GDPR).  Police count as data controllers and data processors.

Detailed coverage of the law is beyond the scope of this blog, but Part 3 of DPA 2018 covers “law enforcement processing”.  Section 31 describes “law enforcement purposes” as “the prevention, investigation, detection or prosecution of criminal offences or the execution of criminal penalties, including the safeguarding against and the prevention of threats to public security.” Part 3 Chapter 2 sets out six principles, the first of which states that processing must be strictly necessary for a specified law enforcement purpose and that an appropriate policy document must be in place.  Other principles include that data processed “must be adequate, relevant and not excessive in relation to the purpose for which it is processed” (third principle).  The fifth principle says that such data “must be kept for no longer than is necessary for the purpose for which it is processed.”  The sixth principle says: “that personal data processed for any of the law enforcement purposes must be so processed in a manner that ensures appropriate security of the personal data, using appropriate technical or organisational measures (and, in this principle, “appropriate security” includes protection against unauthorised or unlawful processing and against accidental loss, destruction or damage).”

Section 45 within Part 3 of DPA 2018 refers to the circumstances of the right of access by the “data subject” – in this instance the owner of a digital device, including a smartphone.  Other sections include the right to rectification.

Part 6 of the Act covers the powers of the Information Commissioner to enforce the law including powers of inspection and the issuing of penalties.  There are also criminal offences, detailed in sections 170-173 and sections 196 and 197.  Section 200 requires the preparation and alteration of Codes of Practice issued under the Police and Criminal Evidence Act 1984.

Nearly all of these same limitations apply to those forensic examiners instructed by defence lawyers.

One area of ambiguity in the current law is the situation where a device is being examined for one specific purpose but there are indications of other, separate and unexpected crimes; an example would be where, during an examination of a device on suspicion of fraud, there are also photos of children being sexually exploited.  One would expect the police to investigate with a view to prosecution in these circumstances, but surely examiners and police should be declaring a separate investigation and requesting additional authority to pursue it.


Kiosks
The use of mobile phone extraction kiosks is a potential weakness as far as privacy is concerned.  They were developed to meet the twin challenges of ever-increasing quantities of devices and data to be examined and the shortage of highly skilled human examiners. They offer a first-stage downloading and examination facility which can be used after relatively modest levels of training. Police officers have to select from a limited number of physical connectors and thereafter the kiosk is able to detect the type of phone and provide opportunities for downloading standard types of data such as phone call records, SMSs, photos, substantive files, and the popular messaging and social media services.  The typical kiosk is plug-and-play.

As is implied by the name, everything is encapsulated in a simple box/kiosk. Most of the major suppliers of professional digital forensic analysis software also provide kiosks, for example: https://www.cellebrite.com/en/platforms/#kiosk; https://www.msab.com/2015/07/06/introducing-new-msab-kiosk-mk2/; https://www.magnetforensics.com/products/magnet-axiom/.

One of the key drivers for kiosk deployment has been the challenges of decision-making in triage – how does a police officer during a search decide which of many digital devices should be seized and how many can safely be left behind?

There are several concerns. Although kiosks are designed for first-stage activities, lack of police resource and training can mean that they are the only form of digital forensic analysis, with no second stage carried out by a trained and experienced analyst.  (Increasingly digital forensic processes are supposed to be accredited and conform to international standards, but the speed of change in the technologies and applications used in mobile phones means that the software in the kiosks needs constant updating; as a result appropriate levels of testing for reliability are likely to be bypassed in order to meet operational needs.) A related problem is that important evidence may simply not be found.

But the greatest worry is the impact on privacy. The procedural and legal constraints referred to above as they apply to more traditional forms of digital forensic examination may simply be being bypassed in the interest of operational convenience. The important separation of data preservation from data examination can easily be lost. At the same time there seems to be only a limited opportunity for police officers to follow the data protection obligations of limiting their examination to material which is strictly relevant to a specific purpose. In effect users of such kiosks have unlimited access to very large aspects of the contents of a smart phone.

It is currently unclear to me what remedial and restricting steps are being taken by police forces to develop privacy-aware protocols for the use of kiosks.



Disclosure to Defence
As we have seen, CPIA 1996 imposes an obligation on the prosecution to disclose material collected in the course of an investigation which might undermine prosecution arguments or assist the defence. This is material which is referred to as “unused”, as opposed to material which is formally served on the defence and the court as part of the prosecution case.   Among other things section 23 of the Act requires that there should be a detailed Code of Practice[6].

What is disclosed will depend heavily on the defence case statement.  Section 6A (introduced in 2003) says:

(1) For the purposes of this Part a defence statement is a written statement—
(a) setting out the nature of the accused’s defence, including any particular defences on which he intends to rely,
(b) indicating the matters of fact on which he takes issue with the prosecution,
(c) setting out, in the case of each such matter, why he takes issue with the prosecution,
(ca) setting out particulars of the matters of fact on which he intends to rely for the purposes of his defence,
(d) indicating any point of law (including any point as to the admissibility of evidence or an abuse of process) which he wishes to take, and any authority on which he intends to rely for that purpose.

In terms of digital evidence there may be a case for producing only a selection of files, and in so doing supporting privacy by limiting material to what is “relevant”. However, if a defence case statement suggests that there may be files and other material which the prosecution have failed to consider, or if there is a concern that there has been tampering with evidence or that the actual process of data preservation has been faulty, then it is highly likely that the defence will demand access to the entire forensic image copy.  A forensic examiner may also claim that full analysis is only possible using the same or similar tools to those deployed by the prosecution, and that access to the full image copy is therefore the best way in which the quality and reliability of the prosecution’s work can be tested.   It is not practically possible to redact parts of a forensic image copy.  The actual details of the extent of disclosure may be the subject of negotiation between prosecution and defence and, where they are unable to agree, they may need to take their arguments before the court.

Once in the hands of the defence a number of legal restrictions apply. Sections 17 and 18 of CPIA 1996 impose a duty of confidentiality,  breach of which is a contempt of court.  In effect material can only be referred to as it applies to specific charges that a defendant faces or if there is a likelihood of further charges.  Thus although a defence lawyer and people appointed by him to examine a forensic image copy may be able to see private but irrelevant material they cannot use it or refer to it.

Defence lawyers  are data controllers for the purpose of the Data Protection Act 2018 and examiners / experts they employ are data processors and possibly also data controllers.  As a result all the restraints referred to above as they apply to police investigations of digital devices apply to them also. 


Rape cases
Further legal restrictions apply in rape cases. The main protections are in the CPS Manual:  https://www.cps.gov.uk/legal-guidance/rape-and-sexual-offences-chapter-4-section-41-youth-justice-and-criminal-evidence

The specific safeguards are in the Youth Justice and Criminal Evidence Act 1999.  This is the Act which among other things provides for “special measures directions in case of vulnerable and intimidated witnesses” (Part 2 Chapter 1). Section 41 provides for “restriction on evidence or questions about complainant’s sexual history”.   An accused – and their lawyer – can only introduce such evidence with the permission of the court.  The court will hear such arguments before trial, often in a preliminary hearing, and a judge will limit the evidence to specific circumstances.  Section 41(5)(b) says that if used the sexual history must go “no further than is necessary to enable the evidence adduced by the prosecution to be rebutted or explained by or on behalf of the accused.”

A particular feature of rape trials is that in the absence of third-party witnesses or indications of adverse physical impact on bodies or clothing, the court is faced with one person's word against another. This,  in addition to possible lack of sympathy on the part of investigating police officers, is why so few complaints mature into convictions.  A jury has to be "sure" - beyond a reasonable doubt - to find guilt.  Smartphone and other digital evidence has the potential to provide additional elements for a jury to take into consideration. 

What happens in practice
The law is clear enough and, with some exceptions, so are the procedures. However, as the reports of the Commons Justice Select Committee Disclosure of evidence in criminal cases (HC 859) in 2018 and the Lords Science and Technology Select Committee Forensic science and the criminal justice system: a blueprint for change (HL Paper 333) in 2019 make clear, resources for digital examination are under severe strain.

Indeed the entire criminal justice system is under severe strain. Overall expenditure on forensic science in general was £120m in 2008 but only £50m in 2019, of which the police budget was £12.3m.  The National Audit Office estimates that between 2010/11 and 2018/19 overall police funding fell by 19%[7].  The Crown Prosecution Service staff budget fell from £738m in 2010-11 to £291m in 2015-16, a cut of more than 60%.  Cuts to legal aid funding (via the Legal Aid Agency) were introduced in the Legal Aid, Sentencing and Punishment of Offenders Act 2012 (LASPO).  One impact is that defendants must now pass a means test before legal aid is available.   Both barristers and solicitors have suffered cuts in fees, with the result that many lawyers are leaving the publicly-funded sector. The Commons Justice Select Committee made this the subject of another report:  Criminal Legal Aid (HC 1069).

The best-selling book The Secret Barrister: Stories of the Law and How It's Broken turns these statistics into a lively but dispiriting description of what happens in practice.

My own experience and, apparently that of witnesses to the various select committee reports,  is that the most important cases are dealt with properly by police,  CPS and prosecutors.  They are high profile,  likely to be reported in the media and will attract the attention of the best legal professionals.  The real difficulty is in “routine” Crown Court trials.  None of these cases are of course routine to complainants or, for that matter, to defendants.  But they are to the police, the Crown Prosecution Service,  prosecution and defence lawyers and forensic examiners.

It is here that the pressures of inadequate funding are most likely to produce deficiencies in the handling of digital evidence and unnecessary violations of privacy, and where remedy may be most difficult.


Proposals
These are some areas that merit attention:
  1. Separate out data preservation from data examination.  Data preservation can be justified to ensure that the "scene" on a digital device at a particular point in time is reliably frozen.  Individuals who are anxious about a "digital strip search" can be re-assured that data examination will only take place later against defined purposes.  This may require changes to the substantive law and will require revised Codes of Practice and procedures under the Police and Criminal Evidence Act, 1984 and elsewhere.
  2. Where digital devices have been seized in the course of a legitimate search of premises or a street stop-and-search, an additional authorisation should be required before examination of the devices takes place, though without prejudice to the need to preserve data (see above).  In emergency situations authorisations could be obtained after the event.  Where an examination reveals the possibility of separate and unexpected wrong-doing, further additional authorisation, accompanied by explanations and justifications, should be required.
  3. The precise mechanism for authorisation to examine is up for discussion.  One route would be to require that a senior police officer of appropriate rank and not otherwise involved in an investigation authorises after evaluating explanations why a data examination is necessary.  This in part echoes the procedures that were until recently used for law enforcement to obtain retained communications data (who called who, when, for how long and in the case of cellphones,  approximate location but not the content of a call) from telephone companies and Internet Service Providers. (Under Parts 3 and 4 of the Investigatory Powers Act, 2016).  Alternatively there could be a separate body to authorise,  similar to the Office for Communications Data Authorisations (OCDA) which is a sister organisation to the Investigatory Powers Commissioner's Office (IPCO).
  4. Review the current Consent Form covering voluntary provision of digital devices.
  5. Set a clear policy that data from digital devices will only be retained for so long as there is a proven need.  Make inspections for compliance a function of the Information Commissioner's Office (ICO). 
  6. Review the circumstances in which digital forensic kiosks are used.
  7. Develop further awareness training for police, CPS staff and lawyers.
  8. Examine the budgets available for access to digital forensic expertise, both those employed in the public criminal justice system and those in private practice. 
  9. Review the advice published to assist victims of rape.  At the moment victims are told to preserve evidence on clothing, to refrain from washing and what to expect from a medical inspection. The advice also ought to include references to smartphones and other digital evidence.







[3] ACPO,  Association of Chief Police Officers,  is the predecessor of NPCC.
[4] Examples of hashing software include MD5 and the SHA family.
[7] HC1501, September 2018

Thursday, 14 June 2018

Can artificial intelligence solve the criminal disclosure problem?



Here is the problem: digital evidence is of increasing importance in a very wide range of criminal investigations because so much of our lives is being recorded on smart phones, tablets, personal computers and the large systems owned by financial institutions, transport companies and the like. Digital evidence can indicate our location (whether we were at a specific place at a specific time, or whether we were not), our Internet activity, photographs we have taken or had taken of us, who our friends are and how often we contact them, our financial transactions, even our thoughts and attitudes.

That’s why law enforcement officers are keen to seize digital devices from perpetrators, victims and third parties. In order for there to be a fair trial most countries have rules about disclosure, also referred to as discovery. The principle is that a defendant should have an opportunity to review not only the evidence that is adduced against him (or her) but anything else that might have been collected during the investigative process and which might influence the outcome of a trial. In most countries the test is “relevancy” and if necessary defence lawyers will apply to the court for appropriate orders. In the UK the position is rather different: the prosecution has a duty to review any material collected during an investigation and to disclose it to the defence if it undermines the prosecution case or might assist the defence case. The law dates from 1996 – the Criminal Procedure and Investigations Act (CPIA).

The law was introduced because there had been a number of trials in which crucial material was withheld and miscarriages of justice had occurred. But the law is still not working perfectly and a select committee of the House of Commons is currently reviewing it. (https://www.parliament.uk/business/committees/committees-a-z/commons-select/justice-committee/inquiries/parliament-2017/disclosure-criminal-cases-17-19/) This blog is stimulated by some of the things that are being said in front of that committee.

As soon as anyone starts to examine digital evidence from modern devices they will discover the vast number of files, messages, photos and so on that exist even on the most modestly used smart phone or personal computer – tens or hundreds of thousands. In a typical home there may be seven or eight digital devices that are likely to hold material which ought to be examined. It is difficult enough for a law enforcement investigator to go through all these devices simply to find evidence to support a complaint or suspicion. But the current law of disclosure requires them additionally to look for material which might undermine their case or support a defendant’s.

Some people hope that “artificial intelligence” will either solve the problem or at least address it. See, for example, the 2017 “State of Policing” report by Her Majesty’s Chief Inspector of Constabulary. How far are these expectations likely to be fulfilled?

Digital investigators certainly use significant computer aids but very few of these can really be labelled “artificial intelligence”. The analysis suites they use are typically able to: make a safe forensic copy of the contents of a computer or smartphone; and extract obvious potential sources of evidence such as emails, text messages, social media postings, histories of Internet browsing, lists of file downloads and substantive files. Graphics, photo and video files can be viewed in a gallery. The entire contents can be indexed – not only the substantive files but also associated time and date stamps and other metadata (additional embedded data associated with Microsoft Office and photo files, for example). Once indexed, the investigator can then search for files by combinations of keywords and time and date.  The keywords may be specific to a particular case or may be generic to types of cases – for example in child sex cases words such as “Lolita”, “teen”, “7yo” and their variants and “asparagus”.  More advanced software allows the investigator to examine files at the bits-and-bytes level, to analyse hidden operating system features such as the Windows registry and also to interrogate a hard disk directly – these procedures may be necessary when some new product hits the IT market and becomes widely used. The most advanced software even allows the well-trained investigator to create their own procedures, for example to look for things which might be bank account details, credit card credentials, username and password combinations and so on. Increasingly too the software allows examinations to span several different digital devices so that an integrated view of the actions of a person of interest can be examined even if conversations took place using, for example, email, text messages and social media postings. Separate software can be used to scan an entire hard disk or storage medium for files which have previously been identified as “bad” – child pornography, terrorist material, pirated intellectual property and so on. It does this by using file hashes, aka digital fingerprints – there are databases of file hashes, and every time a file is encountered on a hard disk a file hash is created and compared against the database.
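
For readers who want a feel for the indexing step, here is a toy Python sketch of an inverted index; the directory and keywords are hypothetical, and production suites index metadata as well as content and search across multiple devices at once:

    import os
    import re
    from collections import defaultdict

    index = defaultdict(set)   # word -> set of files containing it

    # "case_files" is a hypothetical directory of text exported from a device.
    for root, _dirs, files in os.walk("case_files"):
        for name in files:
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read()
            for word in set(re.findall(r"[a-z0-9]+", text.lower())):
                index[word].add(path)

    # Once built, keyword lookups are near-instantaneous.
    for keyword in ["alibi", "transfer"]:   # hypothetical case-specific terms
        print(keyword, "->", sorted(index[keyword]))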

But none of this involves artificial intelligence, although this phrase is rather vague and covers a number of different techniques. More properly we are talking about “machine learning”. In machine learning a quantity of unsorted data – files, statistics, graphics – is offered to a program which is capable of deriving rules about that data. Once the rules have been discovered, a feat which may be beyond most humans, they can be applied to further similar unsorted data in order to make predictions or find conclusions. In the health field, given enough medical data, it may be possible to identify commonalities in diagnosis or treatment.  In one form of predictive policing data can be collected about callouts for police vehicles to respond to incidents. A machine learning program can find rules which in turn can be used to point to situations where and when incidents are more likely to happen so that response teams can get to them more quickly. A travel company with aircraft can monitor over the period of a year the types of meal passengers ask for and thereafter be able to predict with greater accuracy how many meals of each type should be loaded onto each flight, so that every passenger gets what they want – meat, fish, vegetarian – and there is less wastage.

There are, however, weaknesses which should not be underestimated. The first of these is the quality and quantity of the training material offered to the program. If the training material is not representative of what you hope to predict, results will be poorer. The larger the quantity of material, the greater the chance that accurate rules will be derived. Secondly, some material is more difficult to parse than others – in the example above of police deployments the data will be unambiguous and clear, but reliance on informal conversations will be quite another matter. Another form of predictive policing – trying to spot which individuals will turn "bad" – will depend on the observations and judgements of observers, which will inevitably have inconsistencies.  Third, anyone wishing to deploy machine learning has to look to the possibility of bad outcomes – false positives and false negatives – where a prediction from machine learning gives a misleading result. A bad outcome in terms of an airline not having the right food on board is one thing, but the arrest of a person who turns out to be innocent is quite another.
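
To see why false positives matter at scale, consider a quick calculation with made-up numbers: even a classifier that is right 95% of the time will mislabel mostly innocent cases when the thing being predicted is rare.

    # Hypothetical figures, for illustration only.
    population = 100_000        # cases screened
    prevalence = 0.01           # 1% are genuinely "bad"
    sensitivity = 0.95          # proportion of genuine cases detected
    false_positive_rate = 0.05  # innocent cases wrongly flagged

    true_pos = population * prevalence * sensitivity                  # 950
    false_pos = population * (1 - prevalence) * false_positive_rate  # 4,950

    # Most "hits" are innocent: precision is only about 16%.
    precision = true_pos / (true_pos + false_pos)
    print(f"flagged: {true_pos + false_pos:.0f}, genuine: {true_pos:.0f}")
    print(f"precision: {precision:.0%}")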

The main relevant instance of machine learning in disclosure occurs in civil, as opposed to criminal, disclosure. In the civil procedure claimants and defendants are expected to disclose to each other material which might undermine their own case or support that of their opponent. (Civil Procedure Rules Part 31). This is the same test as is applied in the criminal procedure but of course the circumstances are different; in a civil case a dispute exists between parties of roughly equal status (at least in theory) whereas in a criminal case it is the state which charges an accused with a criminal offence and with the possible outcome of loss of liberty and reputation.

In a typical civil case between companies the amount of material that needs to be considered for disclosure can often be enormous – all the emails and substantive documents created by several key individuals over a lengthy period, for example.  Originally the assumption was that lawyers on both sides would carry out a disclosure review manually. But much material will of course be in electronic format and over the years a standard questionnaire has evolved – the ESI Questionnaire. It comes in Practice Direction 31B which is part of Civil Procedure Rule 31. Overall it covers such topics as “reasonable search”, agreements on the format in which files are to be delivered and keyword and other automated searches. The courts may force the parties into an agreement – on the basis that they both have a duty to control costs. But even this type of ESI questionnaire has proved insufficient for the larger cases and resort is now made to the form of artificial intelligence known as machine learning.

Adapting this to disclosure/discovery, the parties to a civil dispute agree to provide a selection of types of document which they believe are likely to meet a disclosure requirement. The machine learning program produces rules defining those documents and the rules are then applied to the much larger archives of documents the parties hold. The parties agree that they will accept the outcome of this machine learning-enabled activity. They do this because any more exhaustive form of review is likely to incur crippling expense. Lawyers refer to this as Technology Assisted Review (TAR) or predictive coding.  More detail on how this should work and the judgments a court might make appear in Triumph Controls UK Ltd & others v Primus International Holding Co & another [2018] EWHC 176 (TCC).  A number of companies offer supporting products. The important thing to recognise is that the parties consent to the process.
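
For the technically curious, the predictive-coding pattern can be sketched in a few lines of Python using scikit-learn. The seed documents, labels and archive below are hypothetical stand-ins; a real TAR exercise involves iterative review rounds and validation sampling rather than a single pass:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Hypothetical seed set: documents the parties have reviewed and labelled.
    seed_docs = [
        "please find attached the payment schedule we discussed",
        "minutes of the canteen refurbishment committee",
        "invoice query regarding the disputed shipment",
    ]
    seed_labels = [1, 0, 1]   # 1 = likely disclosable, 0 = not

    # Derive the "rules" (a statistical model) from the seed set.
    vectoriser = TfidfVectorizer()
    model = LogisticRegression(max_iter=1000)
    model.fit(vectoriser.fit_transform(seed_docs), seed_labels)

    # Apply the model to the much larger unreviewed archive (stand-in here).
    archive = [
        "further correspondence about the disputed shipment",
        "staff rota for the summer party",
    ]
    scores = model.predict_proba(vectoriser.transform(archive))[:, 1]

    # Rank documents for human review rather than trusting the model outright.
    for score, doc in sorted(zip(scores, archive), reverse=True):
        print(f"{score:.2f}  {doc}")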

But will this work for criminal discovery? The first thing to note is that there is no court-mandated requirement to keep costs down. It is up to the prosecution to decide how much to invest in order to support the charges they wish to bring. Secondly, as we saw above, the situation is not dispute resolution but an accused’s potential loss of liberty. There is no mutual consent. Thirdly we need to consider how machine learning-supported criminal disclosure might work in practice.  Who is to provide the documents from which the AI programme is to learn, or be trained?  At the moment a defendant is required to produce a Defence Case Statement under ss 5 and 6 CPIA 1996, but all that is required is to set out the general nature of the defence, matters of fact on which there is an issue, the identity of any witness who might be able to provide an alibi and any information in an accused’s possession which might be of material assistance in identifying further witnesses. But they don’t have to produce sample documents, and also, given the disparity in resources between the police/CPS and most defence solicitors, it is not at all clear how easily most criminal defence solicitors would be able to facilitate the process. The solicitor may indeed require the support of an expert, but it is also not clear whether legal aid for this activity would be forthcoming.

Or is it the hope that one can produce a generic set of rules to cover a wide range of disclosure situations? That seems perilously close to the joke widely shared by digital forensic technicians when confronted with an item of analytic software – where is the “find evidence” button? (One vendor went as far as producing a stick-on key for a keyboard imprinted with the words “find evidence”).
One can have nothing but sympathy for police and prosecutors in seeking aids to reduce the burden of criminal disclosure. But a combination of desperation to reduce costs and the exaggerated claims of software salesmen can lead to wasted money and disappointed expectations. We have seen this with image recognition – it may work well in the limited circumstances of providing access control to a smartphone or entry to corporate premises, but it produces poor results when used in the challenging environments of carnivals and other public order situations.

Almost certainly the remedy to criminal disclosure of digital material is the provision at an early stage of entire forensic images to defence solicitors who wish to employ their own experts. Defence experts, informed by defendants, can then use keyword search and similar software both to verify the work of prosecution experts and to produce, always supposing that it is there to be found, exculpatory material. I have explored this approach both in my evidence to the recent enquiry by the House of Commons Justice Select Committee (https://goo.gl/Qkhxf3) and in another blog post (https://goo.gl/rDMwK5 – you may need to scroll down).

Saturday, 19 May 2018

Disclosure of Digital Evidence in Rape Trials 



This note arises from a hearing by the Commons Justice Select Committee on Disclosure of Evidence in Criminal Trials on 15 May 2018. A transcript is available at: http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/justice-committee/disclosure-of-evidence-in-criminal-cases/oral/83096.pdf and a video at https://www.parliamentlive.tv/Event/Index/13d15d6a-8aa9-40ce-bdf2-3d19777b3af8

 Digital forensics practice requires that the entire contents of a personal computer or smart phone be forensically copied and then analysed; the concern is that if all of this material is provided to the defence it will be used for aggressive cross examination about a complainant’s previous sexual history. 

For a quarter of a century it has been the practice when dealing with evidence from digital devices such as personal computers that a “forensic copy” is made of the device at as early an opportunity as possible. (The procedures have been updated to deal with smart phones and acquisition from cloud-based services.) This is done for several reasons. First, it provides an explicit physical link between a device and the person responsible for it so that there can be attribution of its contents. Second, direct examination of a device is highly undesirable because in the course of it data will get altered; the procedures for making a forensic copy avoid this and in fact all examinations take place on the copy, not the original. Third, it is all too easy for individual emails, social media postings, webpages, photographs, et cetera to be subject to forgery. But it is extremely difficult to forge an entire hard disk or the memory of a phone. The operating systems create file date and time stamps and many other alterations all the time, and it is easy to spot tampering. The forensic image thus provides essential provenance, authentication and continuity.

This procedure is for the benefit of all types of evidence that might be adduced from these sources and for the benefit of both prosecution and defence. In a rape trial, as in any other case, the prosecution may wish to rely on digital evidence as well. In case you are asking yourself – can’t they redact the forensic image? The answer is: not really, given the technical complexity of the task (the existence of temporary back-up files, caches, registry entries etc). The issue was examined extensively in the context of legal professional privilege. There the solution is that an independent lawyer is appointed to identify material which should be redacted.

Turning now to the defence position, the availability of a digital image makes it very difficult for the prosecution to cherry pick evidence. The cherry picking may be deliberate, the result of poor training, or simply “confirmation bias”. The role of the defence is to see if this has taken place. It was these concerns that triggered the current enquiry. The enquiry by the Justice Select Committee is about, among other things, the mechanics of disclosure. Because of the quantity of data to be examined it is unrealistic to expect a prosecution expert or technician to carry out an exhaustive examination of all the devices that might have been seized. This plainly creates a problem for the disclosure regime as it is normally understood and where there is a responsibility to identify material which may undermine the prosecution case or support the defence case. In my evidence to the committee I said the solution is to make available to the defence copies of all the forensic images that have been created by the prosecution. It is then open to a defence expert to use tools very similar or identical to those used by the prosecution to carry out the instructions of a defence lawyer. This surely satisfies the aims of disclosure in every practical respect.

There are protections against abuse of disclosed material, specifically sections 17 and 18 of the Criminal Procedure and Investigations Act 1996. There is a criminal offence involved and even if there were not there is still the possibility of contempt of court. (Yes, in the course of examining digital devices I do see information which the owners would regard as private and highly personal but which is also wholly irrelevant to the subject matter of charges. I don’t even share these with instructing lawyers).

Let us now look at the position of what happens in rape trials, an issue extensively canvassed by subsequent witnesses. The main protection is discussed in the CPS Manual: https://www.cps.gov.uk/legal-guidance/rape-and-sexual-offences-chapter-4-section-41-youth-justice-and-criminal-evidence. Reference is also made to Criminal Procedure Rule 22 (https://www.justice.gov.uk/courts/procedure-rules/criminal/docs/2015/crim-proc-rules-2015-part-22.pdf).  (I am fully aware of and sympathetic to concerns that defence lawyers from time to time abuse rape victims in the witness box by asking aggressively about previous sexual history. But it seems to me that if the procedures laid down under s 41 Youth Justice and Criminal Evidence Act 1999 and CPR 22 are inadequate, the remedy is to reform that part of the law and the linked judicial guidance rather than to take steps which would make digital evidence significantly less reliable. It may also be the case that inadequate funding for the police and CPS means that the right applications are not made to the court in a timely fashion.)