Thursday 14 June 2018

Can artificial intelligence solve the criminal disclosure problem?



Here is the problem: digital evidence is of increasing importance in a very wide range of criminal investigations because so much of our lives is being recorded on smart phones, tablets, personal computers and the large systems owned by financial institutions, transport companies and the like. Digital evidence can indicate our location (whether we were at a specific place at a specific time or whether we were not), our Internet activity, photographs we have taken or have had taken of us, who our friends are and how often we contact them, our financial transactions, even our thoughts and attitudes.

That’s why law enforcement officers are keen to seize digital devices from perpetrators, victims and third parties. In order for there to be a fair trial most countries have rules about disclosure, also referred to as discovery. The principle is that a defendant should have an opportunity to review not only the evidence that is adduced against him (or her) but anything else that might have been collected during the investigative process and which might influence the outcome of a trial. In most countries the test is “relevancy” and if necessary defence lawyers will apply to the court for appropriate orders. In the UK the position is rather different: the prosecution has a duty to review any material gathered during an investigation and to disclose to the defence anything which undermines the prosecution case or might assist the defence case. The law dates from 1996 – the Criminal Procedure and Investigations Act (CPIA).

The law was introduced because there had been a number of trials in which crucial material was withheld and miscarriages of justice had occurred. But the law is still not working perfectly and a select committee of the House of Commons is currently reviewing it. (https://www.parliament.uk/business/committees/committees-a-z/commons-select/justice-committee/inquiries/parliament-2017/disclosure-criminal-cases-17-19/) This blog is stimulated by some of the things that are being said in front of that committee.

As soon as anyone starts to examine digital evidence from modern devices they will discover the vast number of files, messages, photos and so on that exist even on the most modestly used smart phone or personal computer – tens or hundreds of thousands of them. In a typical home there may be seven or eight digital devices likely to hold material which ought to be examined. It is difficult enough for a law enforcement investigator to go through all these devices simply to find evidence to support a complaint or suspicion. But the current law of disclosure requires them additionally to look for material which might undermine their case or support a defendant’s.

Some people hope that “artificial intelligence” will either solve the problem or at least ease it. See, for example, the 2017 “State of Policing” report by Her Majesty’s Chief Inspector of Constabulary. How far are these expectations likely to be fulfilled?

Digital investigators certainly use significant computer aids, but very few of these can really be labelled “artificial intelligence”. The analysis suites they use are typically able to make a safe forensic copy of the contents of a computer or smart phone and to extract obvious potential sources of evidence such as emails, text messages, social media postings, histories of Internet browsing, lists of file downloads and substantive files. Graphics, photo and video files can be viewed in a gallery. The entire contents can be indexed – not only the substantive files but also the associated time and date stamps and other metadata (additional embedded data associated with Microsoft Office and photo files, for example). Once indexed, the investigator can search for files by combinations of keywords and time and date. The keywords may be specific to a particular case or may be generic to types of cases – for example, in child sex cases, words such as “Lolita”, “teen”, “7yo” and its variants, and “asparagus”.

More advanced software allows the investigator to examine files at the bits-and-bytes level, to analyse hidden operating system features such as the Windows registry and to interrogate a hard disk directly – these procedures may be necessary when some new product hits the IT market and becomes widely used. The most advanced software even allows the well-trained investigator to create their own procedures, for example to look for things which might be bank account details, credit card credentials, username and password combinations and so on. Increasingly, too, the software allows examinations to span several different digital devices so that an integrated view of the actions of a person of interest can be built up even if conversations took place via, for example, email, text messages and social media postings. Separate software can be used to scan an entire hard disk or storage medium for files which have previously been identified as “bad” – child pornography, terrorist material, pirated intellectual property and so on. It does this by using file hashes, aka digital fingerprints: there are databases of hashes of known “bad” files, and every time a file is encountered on a hard disk a hash is created and compared against the database.
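
To make the keyword-and-date idea concrete, here is a minimal sketch in Python of filtering indexed artefacts by keyword and date range; the artefact records, keywords and dates are invented for illustration and do not describe any particular analysis suite.

    # A minimal sketch of keyword-and-date filtering over indexed artefacts.
    # The artefact records, keywords and date range are illustrative only.
    from datetime import datetime

    artefacts = [
        {"path": "chat/msg_0412.txt", "text": "meet at the usual place", "modified": "2018-03-02T21:14:00"},
        {"path": "docs/invoice_77.docx", "text": "payment for consultancy", "modified": "2018-04-18T09:30:00"},
    ]

    keywords = {"payment", "transfer", "account"}
    start = datetime(2018, 4, 1)
    end = datetime(2018, 5, 1)

    def matches(record):
        when = datetime.fromisoformat(record["modified"])
        words = set(record["text"].lower().split())
        return start <= when <= end and words & keywords

    for record in filter(matches, artefacts):
        print(record["path"], record["modified"])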

But none of this involves artificial intelligence, although that phrase is rather vague and covers a number of different techniques. More properly we are talking about “machine learning”. In machine learning a quantity of unsorted data – files, statistics, graphics – is offered to a program which is capable of deriving rules about that data. Once the rules have been discovered, a feat which may be beyond most humans, they can be applied to further similar unsorted data in order to make predictions or draw conclusions. In the health field, given enough medical data, it may be possible to identify commonalities in diagnosis or treatment. In one form of predictive policing, data can be collected about call-outs of police vehicles to respond to incidents. A machine learning program can find rules which in turn can be used to point to where and when incidents are more likely to happen, so that response teams can get to them more quickly. An airline can monitor over the course of a year the types of meal passengers ask for and thereafter predict with greater accuracy how many meals of each type should be loaded onto each flight, so that every passenger gets what they want – meat, fish, vegetarian – and there is less wastage.
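
As a toy illustration of this kind of rule-finding, the sketch below trains a small decision tree on invented call-out data using the scikit-learn library and prints the rules it derives; the features, labels and choice of library are my own assumptions, not a description of any real predictive policing system.

    # A sketch of "rule finding" with machine learning, using scikit-learn.
    # The call-out data is invented purely for illustration.
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Features: [hour of day, day of week (0=Mon), area code]; label: 1 = incident occurred
    X = [[23, 5, 2], [22, 6, 2], [10, 2, 1], [9, 3, 1], [23, 6, 2], [11, 1, 3]]
    y = [1, 1, 0, 0, 1, 0]

    model = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(model, feature_names=["hour", "weekday", "area"]))

    # Apply the learned rules to a new situation: late Saturday night in area 2
    print(model.predict([[23, 5, 2]]))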

There are, however, weaknesses which should not be underestimated. The first of these is the quality and quantity of the training material offered to the program. If the training material is not representative of what you hope to predict, results will be poorer; the larger the quantity of material, the greater the chance that accurate rules will be derived. Secondly, some material is more difficult to parse than other material – in the example above of police deployments the data will be unambiguous and clear, but reliance on informal conversations is quite another matter. Another form of predictive policing – trying to spot which individuals will turn “bad” – will depend on the observations and judgements of observers, which will inevitably contain inconsistencies. Third, anyone wishing to deploy machine learning has to allow for the possibility of bad outcomes – false positives and false negatives – where a prediction from machine learning gives a misleading result. A bad outcome in terms of an airline not having the right food on board is one thing; the arrest of a person who turns out to be innocent is quite another.

The main relevant instance of machine learning in disclosure occurs in civil, as opposed to criminal, proceedings. In civil procedure claimants and defendants are expected to disclose to each other material which might undermine their own case or support that of their opponent (Civil Procedure Rules Part 31). This is the same test as is applied in criminal procedure but of course the circumstances are different: in a civil case a dispute exists between parties of roughly equal status (at least in theory), whereas in a criminal case it is the state which charges an accused with a criminal offence, with the possible outcome of loss of liberty and reputation.

In a typical civil case between companies the amount of material that needs to be considered for disclosure can often be enormous – all the emails and substantive documents created by several key individuals over a lengthy period, for example.  Originally the assumption was that lawyers on both sides would carry out a disclosure review manually. But much material will of course be in electronic format and over the years a standard questionnaire has evolved – the ESI Questionnaire. It comes in Practice Direction 31B which is part of Civil Procedure Rule 31. Overall it covers such topics as “reasonable search”, agreements on the format in which files are to be delivered and keyword and other automated searches. The courts may force the parties into an agreement – on the basis that they both have a duty to control costs. But even this type of ESI questionnaire has proved insufficient for the larger cases and resort is now made to the form of artificial intelligence known as machine learning.

Adapting this to disclosure/discovery, the parties to a civil dispute agree to provide a selection of the types of document which they believe are likely to meet a disclosure requirement. The machine learning program produces rules defining those documents and the rules are then applied to the much larger archives of documents the parties hold. The parties agree that they will accept the outcome of this machine-learning-enabled activity; they do this because any more exhaustive form of review is likely to incur crippling expense. Lawyers refer to this as Technology Assisted Review or predictive coding. More detail on how this should work, and the judgments a court might make, appears in Triumph Controls UK Ltd & others v Primus International Holding Co & another [2018] EWHC 176 (TCC). A number of companies offer supporting products. The important thing to recognise is that the parties consent to the process.
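
A rough sketch of the underlying mechanics of predictive coding might look like the following: a classifier is trained on a small set of documents the parties have labelled and is then used to score the wider archive. The documents, labels and choice of the scikit-learn library are illustrative assumptions only, not a description of any commercial product.

    # A minimal sketch of predictive coding: lawyers label a sample of documents,
    # a classifier derives "rules" from them, and the rules are applied to the
    # wider archive. Documents and labels here are invented for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    sample_docs = [
        "board approved the disputed supply contract",
        "minutes discussing the delayed delivery and penalties",
        "office party arrangements for december",
        "canteen menu for next week",
    ]
    sample_labels = [1, 1, 0, 0]   # 1 = disclosable, 0 = not, as agreed by the parties

    vectoriser = TfidfVectorizer()
    classifier = LogisticRegression().fit(vectoriser.fit_transform(sample_docs), sample_labels)

    archive = ["email chain about penalty clauses in the supply contract",
               "invitation to the summer barbecue"]
    scores = classifier.predict_proba(vectoriser.transform(archive))[:, 1]
    for doc, score in zip(archive, scores):
        print(f"{score:.2f}  {doc}")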

But will this work for criminal discovery? The first thing to note is that there is no court-mandated requirement to keep costs down: it is up to the prosecution to decide how much to invest in order to support the charges they wish to bring. Secondly, as we saw above, the situation is not dispute resolution but an accused’s potential loss of liberty; there is no mutual consent. Thirdly, we need to consider how machine-learning-supported criminal disclosure might work in practice. Who is to provide the documents from which the AI program is to learn, or take its training? At the moment a defendant is required to produce a Defence Case Statement under ss 5 and 6 CPIA 1996, but all that is required is to set out the general nature of the defence, the matters of fact on which issue is taken, the identity of any witness who might be able to provide an alibi and any information in an accused’s possession which might be of material assistance in identifying further witnesses. Defendants do not have to produce sample documents. Moreover, given the disparity in resources between the police/CPS and most defence solicitors, it is not at all clear how easily most criminal defence solicitors would be able to facilitate the process. The solicitor may indeed require the support of an expert, but it is also not clear whether legal aid for this activity would be forthcoming.

Or is it the hope that one can produce a generic set of rules to cover a wide range of disclosure situations? That seems perilously close to the joke widely shared by digital forensic technicians when confronted with an item of analytic software – where is the “find evidence” button? (One vendor went as far as producing a stick-on key for a keyboard imprinted with the words “find evidence”).
One can have nothing but sympathy for police and prosecutors in seeking aids to reduce the burden of criminal disclosure. But a combination of desperation to reduce costs and the exaggerated claims of software salesmen can lead to wasted money and disappointed expectations. We have seen this with image recognition, which may work well in the limited circumstances of providing access control to a smartphone or entry to corporate premises but produces poor results when used in the challenging environments of carnivals and other public order situations.

Almost certainly the remedy to criminal disclosure of digital material is the provision at an early stage of entire forensic images to defence solicitors who wish to employ their own experts. Defence experts, informed by defendants, can then use keyword search and similar software both to verify the work of prosecution experts and to produce, always supposing that it is there to be found, exculpatory material. I have explored this approach both in my evidence to the recent enquiry by the House of Commons Justice Select Committee (Https://Goo.Gl/Qkhxf3) and in another blog (https://goo.gl/rDMwK5 - you may need to scroll down). 

Saturday 19 May 2018

Disclosure of Digital Evidence in Rape Trials 



This note arises from a hearing by the Commons Justice Select Committee on Disclosure of Evidence in Criminal Trials on 15 May 2018. A transcript is available at: http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/justice-committee/disclosure-of-evidence-in-criminal-cases/oral/83096.pdf and a video at https://www.parliamentlive.tv/Event/Index/13d15d6a-8aa9-40ce-bdf2-3d19777b3af8

 Digital forensics practice requires that the entire contents of a personal computer or smart phone be forensically copied and then analysed; the concern is that if all of this material is provided to the defence it will be used for aggressive cross examination about a complainant’s previous sexual history. 

For a quarter of a century it has been the practice when dealing with evidence from digital devices such as personal computers that a “forensic copy” is made of the device at as early an opportunity as possible. (The procedures have been updated to deal with smart phones and acquisition from cloud-based services). This is done for several reasons. First, it provides an explicit physical link between a device and the person responsible for it, so that there can be attribution of its contents. Second, direct examination of a device is highly undesirable because in the course of it data will get altered; the procedures for making a forensic copy avoid this and in fact all examinations take place on the copy, not the original. Third, it is all too easy for individual emails, social media postings, webpages, photographs, et cetera to be subject to forgery, but it is extremely difficult to forge an entire hard disk or the memory of a phone: the operating system creates file date and time stamps and many other alterations all the time, and this makes tampering easy to spot. The forensic image thus provides essential provenance, authentication and continuity.
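
As an illustration of how the integrity of a forensic copy can later be demonstrated, here is a minimal sketch that recomputes a copy’s SHA-256 hash and compares it with the hash recorded at the time of acquisition; the file name and the recorded hash are hypothetical.

    # A sketch of verifying a forensic image against the hash recorded at
    # acquisition. The file name and the recorded hash are hypothetical.
    import hashlib

    RECORDED_SHA256 = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"

    def sha256_of(path, chunk_size=1 << 20):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    if sha256_of("suspect_laptop.E01") == RECORDED_SHA256:
        print("Image verifies: contents unchanged since acquisition")
    else:
        print("Hash mismatch: the copy cannot be relied upon")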

This procedure is for the benefit of all types of evidence that might be adduced from these sources and for the benefit of both prosecution and defence. In a rape trial, as in any other case, the prosecution may wish to rely on digital evidence as well. In case you are asking yourself – can’t they redact the forensic image? The answer is not really, given the technical complexity of the task (the existence of temporary back-up files, caches, registry entries and so on). The issue was examined extensively in the context of legal professional privilege; there the solution is that an independent lawyer is appointed to identify material which should be redacted.

Turning now to the defence position, the availability of a digital image makes it very difficult for the prosecution to cherry-pick evidence. The cherry-picking may be deliberate, the result of poor training, or simply “confirmation bias”. The role of the defence is to see if this has taken place. It was these concerns that triggered the current enquiry. The enquiry by the Justice Select Committee is about, among other things, the mechanics of disclosure. Because of the quantity of data to be examined it is unrealistic to expect a prosecution expert or technician to carry out an exhaustive examination of all the devices that might have been seized. This plainly creates a problem for the disclosure regime as it is normally understood, under which there is a responsibility to identify material which may undermine the prosecution case or support the defence case. In my evidence to the committee I said the solution is to make available to the defence copies of all the forensic images that have been created by the prosecution. It is then open to a defence expert to use tools very similar or identical to those used by the prosecution to carry out the instructions of a defence lawyer. This surely satisfies the aims of disclosure in every practical respect.

There are protections against abuse of disclosed material, specifically sections 17 and 18 of the Criminal Procedure and Investigations Act 1996. There is a criminal offence involved and even if there were not there is still the possibility of contempt of court. (Yes, in the course of examining digital devices I do see information which the owners would regard as private and highly personal but which is also wholly irrelevant to the subject matter of charges. I don’t even share these with instructing lawyers).

Let us now look at what happens in rape trials, an issue extensively canvassed by subsequent witnesses. The main protection is discussed in the CPS manual: https://www.cps.gov.uk/legal-guidance/rape-and-sexual-offences-chapter-4-section-41-youth-justice-and-criminal-evidence. Reference is also made to Criminal Procedure Rule 22 (https://www.justice.gov.uk/courts/procedure-rules/criminal/docs/2015/crim-proc-rules-2015-part-22.pdf). I am fully aware of, and sympathetic to, concerns that defence lawyers from time to time abuse rape victims in the witness box by asking aggressively about previous sexual history. But it seems to me that if the procedures laid down under s 41 Youth Justice and Criminal Evidence Act 1999 and CPR 22 are inadequate, the remedy is to reform that part of the law and the linked judicial guidance rather than to take steps which would make digital evidence significantly less reliable. It may also be the case that inadequate funding for the police and CPS means that the right applications are not made to the court in a timely fashion.

Saturday 3 February 2018

How to manage Internet Content Blocking: some practicalities






“The Internet contains some deeply troublesome and harmful material. The main commercial players are both immensely rich and immensely clever – they must be able to do more to find solutions. If they don’t we politicians will prosecute/fine/tax them until they behave responsibly.” So goes the refrain but what can we reasonably expect of the available technologies? Here is a guide for campaigners.

Issue 1: what criteria are you applying for blocking “undesirable” material?
To those who haven’t thought about the issue it seems obvious what needs to be blocked. Almost anyone other than the most extreme libertarian will point to material which they find distressing or harmful – and be able to produce justifying arguments. But if you are asking a computer program or a human being to make decisions there has to be greater clarity. In almost all circumstances there will be countervailing arguments about freedom of speech,  freedom of expression and censorship.  
The easiest policy to implement is where one can point to existing legislation defining specifically illegal content. For example, in the United Kingdom possession of indecent images of children is a strict liability offence[i]. Published guidelines from the Sentencing Council describe in detail three levels of offence in terms of age and specific activities[ii]. Similarly “extreme pornography” is clearly defined – essentially involving animals, corpses or an absence of consent[iii]. But outside that particular context there is no definition of “extremism”, still less of “harmful”.[iv] Successive would-be legislators have struggled because so often the significance of a particular document or file depends not only on its content but on its context.
A simple example: let’s take two statements: “the state of Israel is a theft from Palestinians” and “the state of Israel is entitled to occupy all the territories mentioned in the Bible”. Are these statements, which many people would label “extreme”, simply expressions of history and religious belief? Do we have a different view of them if they are accompanied by a call to action – push all the Jews out, push out all the Arabs? The boundaries are unclear, and it seems unreasonable that, if legislators are unwilling to provide assistance, Internet companies should somehow be forced to make those decisions. There is a separate further issue for the biggest of the global companies in that judgements about extremism and harmfulness vary across jurisdictions and cultures.
It gets more difficult with “grooming”, whether for a sexual purpose or to incite terrorist acts. The whole point of grooming is that it starts low-key and then builds. It is easy enough to identify grooming after a successful exercise[v], but how do you distinguish the early stages from ordinary conversation? And how do you do so via a computer program or a human monitor?
Finally, it is even more difficult to think what the evidence would look like where the enforceable law simply says: social media sites should keep children safe.


Issue 2: what is the legal framework within which material gets uploaded?
Material gets uploaded to the Internet via a variety of legal frameworks and this has an impact on where potential legal enforcement can be directed. An individual might buy web space from an Internet service provider and create their own website. That same individual may provide facilities for third parties to post comments which will then be seen automatically and instantly by all visitors. A social media service will almost certainly require a specific sign-up from its subscribers/members and at that time inform them of an “acceptable use” or “community standards” policy, but will thereafter allow postings without prior approval or initial restraint.
The position currently taken by most Internet service companies, bolstered by various directives and laws, is that they are not publishers in the same sense as traditional media such as newspapers, magazines and broadcast television stations. They say that they are providing facilities but are not editors, or that they are “data processors” as opposed to “data controllers”[vi]. The claim is that they are “intermediaries” for the purposes of the E-Commerce Directive and Regulations. These arguments are currently being hotly debated. But even on their interpretation there is a significant impact on what one can reasonably expect them to do in terms of attempting to block before publication.
The main business of Google is to index world wide web content which has been originated by others with whom it has no contractual relationship. It has a series of “crawler” programs which scavenge the open part of the World Wide Web; the findings are then indexed and that is what visitors to Google’s main pages see. The contractual relationship that is most important in the basic Google framework is with those who use the indexes – essentially the service is paid for by allowing Google to harvest information about individuals which can be turned into targeted advertising. But Google is not under any compulsion or contractual obligation to index anything;  it can block at will.   The main policy reason for refusing to block is that it has decided that it favours completeness and freedom of speech and expression; it blocks only when there is an overwhelming reason to do so. 
By contrast, for Facebook, Twitter and many similar services the contractual relationship is with their customers/subscribers/members. It consists of saying “we will let you see what others have posted and we will let you post, provided you allow us to harvest information about you and send you targeted advertisements”. As part of the contract there is usually an Acceptable Use or Community Standards provision which is the basis for blocking. But here again, as companies headquartered in the United States, they are concerned about observing First Amendment rights[vii].
There are important differences in terms of what one can expect if some of this material is to be blocked. In the case of Google they have no opportunity to prevent material from being uploaded; the earliest point at which they could intervene is when their crawler comes across material which has already been published, and their choice is to refuse to index. But for the social media sites, where the acceptable use policy is part of the customer agreement, the earliest opportunity for blocking is when the customer uploads material.

Issue 3: technical means for blocking material (a) that has already been identified as “undesirable”.
We must now look at the various blocking technologies and see how far they are practical to implement. There is a significant difference between situations where material has already been identified by some method or other as requiring blocking and material which no one has so far seen and passed judgement on.
Blocking of known “undesirable” material (I am using the word “undesirable” to avoid the problems raised in Issue 1 above) is relatively straightforward, though there are questions about how to do it at the speed and volume of uploads. For example, on Facebook it is said that every 60 seconds 510,000 comments are posted, 293,000 statuses are updated and 136,000 photos are uploaded[viii].

It is trivially easy to block an entire website. The block is on the URL – www.nastysite.com – and this is the method traditionally used by such bodies as the Internet Watch Foundation and the National Center for Missing and Exploited Children. It is also possible, again by URL, to block part of a website – www.harmlesssite.com/nastymaterial – though here the blocking will fail if the folder containing the undesirable material is given a different name or location in the file structure of the website as a whole. One can extend this method to specific pages and pictures on the website – www.harmlesssite.com/harmless/nastyfile.jpg – but here too simple name changes will render the blocking ineffective.
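A sketch of how URL-based blocking might be implemented at these three levels, and of how a simple rename defeats the folder-level block, follows; the blocklist entries are the invented examples used above.

    # A sketch of URL-based blocking at the three levels described above.
    # The blocklist entries are invented examples.
    from urllib.parse import urlparse

    BLOCKED_SITES = {"www.nastysite.com"}                            # whole site
    BLOCKED_PREFIXES = {"www.harmlesssite.com/nastymaterial"}        # folder within a site
    BLOCKED_PAGES = {"www.harmlesssite.com/harmless/nastyfile.jpg"}  # single file

    def is_blocked(url):
        parsed = urlparse(url)
        host_and_path = parsed.netloc + parsed.path
        if parsed.netloc in BLOCKED_SITES:
            return True
        if host_and_path in BLOCKED_PAGES:
            return True
        return any(host_and_path.startswith(prefix) for prefix in BLOCKED_PREFIXES)

    print(is_blocked("http://www.harmlesssite.com/nastymaterial/page1.html"))  # True
    print(is_blocked("http://www.harmlesssite.com/renamedfolder/page1.html"))  # False: renaming defeats the block
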
Blocking on the basis of keyword is impossibly crude.  “Sex” eliminates the counties of Sussex, Essex, Middlesex etc as well as much useful material on health, education, law enforcement and more. 
In order to overcome these problems one must turn to a different technology – file hashing. A file hash, or fingerprint, is created using a simple program[ix] which is applied to the totality of a file – photo, picture, document, software program – to produce a unique short sequence of numbers and letters. The program is clever enough that for most purposes no two dissimilar files will ever produce the same hash or signature. A database of these hashes is built up and when a file is presented for examination a hash is created and compared with the database. If there is a match the newly uploaded file is blocked. File hashing is used elsewhere throughout computing, for example to demonstrate whether or not a file has been altered.
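A minimal sketch of hash-based blocking, using Python’s standard hashlib module and an invented in-memory database, illustrates both the mechanism and its brittleness:

    # A sketch of hash-based blocking: compute a file's hash and compare it
    # against a database of known "undesirable" hashes. The database here is
    # a simple in-memory set with illustrative values.
    import hashlib

    KNOWN_BAD_HASHES = {
        "5d41402abc4b2a76b9719d911017c592",   # illustrative entries only
        "098f6bcd4621d373cade4e832627b4f6",
    }

    def md5_of(data: bytes) -> str:
        return hashlib.md5(data).hexdigest()

    def should_block(uploaded_bytes: bytes) -> bool:
        return md5_of(uploaded_bytes) in KNOWN_BAD_HASHES

    print(should_block(b"hello"))   # True: md5("hello") happens to be in the set above
    print(should_block(b"hello!"))  # False: one changed byte gives a completely different hash
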
This method only works to identify absolutely identical files, so that if an “undesirable” file has been slightly altered it will have a different hash and blocking will not take place. To a limited extent there is a further technology which deals with slightly dissimilar files. For photo images the most popular of these is called PhotoDNA[x], which is promoted by Microsoft and given away to Internet service providers, social media services and law enforcement. There are two typical situations where it is effective – when a file has been subject to a degree of compression to reduce its size and where there is a series of adjacent frames taken from a video.
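PhotoDNA itself is proprietary, so the following sketch only illustrates the general idea of perceptual hashing, using the open-source Pillow and imagehash libraries; the file names and the distance threshold are assumptions for illustration, not the PhotoDNA algorithm.

    # PhotoDNA is proprietary; this sketch illustrates the general idea of
    # perceptual hashing with the open-source Pillow and imagehash libraries.
    # File names are hypothetical.
    from PIL import Image
    import imagehash

    original = imagehash.phash(Image.open("known_bad.jpg"))
    candidate = imagehash.phash(Image.open("recompressed_upload.jpg"))

    # Perceptual hashes of near-identical images differ by only a few bits,
    # so a small Hamming distance suggests a match despite recompression.
    distance = original - candidate
    print("probable match" if distance <= 8 else "no match")
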

Issue 4: technical means for blocking material (b) that is new and hasn’t been seen before.
This leaves the situation where wholly new material, never seen before, is uploaded, or where previously seen material has been substantially altered, for example by cropping or selection. Here many claims are made for “artificial intelligence” techniques.
But most computer scientists, as opposed to marketing droids, no longer use the phrase “artificial intelligence” or its contraction “AI”, because concepts of what it is keep changing in the light of developments in computer science and investigations by biological scientists into how the human brain actually works. Moreover AI consists of a number of separate techniques, each with its own value but also its own limitations. It can include pattern recognition in images, the identification of rules in what initially appears to be random data, data mining, neural networks, and machine learning, in which a program follows the behaviour of an individual or event and identifies patterns and linkages. There are more besides, and there are many overlaps in definitions and concepts.
Much depends on what sort of results are hoped for. A scientist, operating in either the physical or the social sciences and possessed of large volumes of data, may wish to have drawn to their attention possible patterns from which rules can be derived; they may want to extend this into making predictions. A social media company or retailer may wish to scan the activity of a customer in order to make suggestions for future purchases – but here high levels of accuracy are not particularly required. If an intelligence agency or law enforcement agency uses similar techniques to scan the activities of an individual, any inaccuracy may have unfortunate consequences – a decision to prevent that person from boarding an aeroplane, or whether they secure future employment, or whether they are arrested.
If one is scrutinising uploaded files, limitations become apparent. In the first place, the context in which a file is being uploaded may be critical. Field manuals from the United States Army[xi] were produced as part of the training mechanism for that organisation, but they are also found on the computers of people suspected of terrorism. Terrorist manuals may be reproduced on research and academic websites on the basis that experts need to be able to refer to and analyse them. The same photo may appear on a site promoted by a terrorist group and by a news organisation. Some sexually explicit photos may be justified in the context of medical and educational research – or law enforcement.
Beyond that, as we have already discussed, telling the difference between a document which merely advances an argument and one which incites may be beyond what is currently possible via AI. My favourite example of linguistic ambiguity is “I could murder an Indian”, which might mean no more than one person inviting another to a meal in an Indian restaurant. In terms of photos, how does one tell the difference between the depiction of a murderous terrorist act and a clip from a movie or computer game? AI can readily identify a swastika in an image – but is the photo historic, of Germany in the 1930s and during World War II, or a still from a more modern war movie, or is it on a website devoted to neo-Nazi antisemitism? How do you reliably distinguish a 16-year-old from an 18-year-old, and for all ethnicities? How does an AI system distinguish the artistic from the exploitative, or recognise when in a sexual situation there is an absence of consent? What exactly is “fake news” and where are the generally accepted guidelines to recognise it?
The role of AI techniques, therefore, is less that they can make fully automated decisions of their own and more that they can provide alerts on which human monitors will make a final arbitration. Even here there is a problem because, as with most alert systems, a threshold has to be set before something is brought to attention. A balance has to be struck between too many false positives – alerts which identify harmless events – and false negatives – failures to identify harmful activity.
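The trade-off can be illustrated with a tiny sketch: the same set of (invented) classifier scores produces more false positives at a low threshold and more false negatives at a high one.

    # A sketch of the threshold trade-off. Scores and labels are invented
    # for illustration; 1 means the item is genuinely harmful.
    scores = [0.10, 0.35, 0.55, 0.62, 0.80, 0.95]   # model's "harmfulness" scores
    labels = [0,    0,    1,    0,    1,    1]

    def count_errors(threshold):
        false_positives = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        false_negatives = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
        return false_positives, false_negatives

    for threshold in (0.3, 0.6, 0.9):
        fp, fn = count_errors(threshold)
        print(f"threshold {threshold}: {fp} false positives, {fn} false negatives")
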

Issue 5: the role and training of human monitors.
This takes us back to Issue 1. A human monitor has to make judgements based on criteria laid down by the organisation exercising blocking. That human monitor needs clear and consistent instructions and, associated with them, appropriate training. Among other things the blocking organisation will want to be able to demonstrate consistency in decisions. As we have seen, monitoring for illegality is easier than making judgements about “extremism” and “harm”. But even here the structure of many laws is that it is for a court to determine whether a crime has been committed. Where the test is purely of a factual nature – for example the age of a person in a sexual situation – the decision might be relatively simple. But where somebody is to be convicted for disseminating terrorist material, context may be critical – the academic researcher versus someone against whom there is also evidence of fundraising or of beginning to accumulate the material necessary to build a bomb.
As a result the human monitor can probably only block where they are absolutely sure that a court would convict – leaving a number of potential situations in which a court might possibly convict but the monitor decides that there is insufficient reason to block. At the Internet Watch Foundation, which operates on a relatively limited remit confined to illegal sexual material, decisions about marginal photos and files are usually taken by more than one person and may be referred upwards for special review.
One policy problem in the counter-terrorism domain is that material which by itself is not illegal may nevertheless play a part in the radicalisation of an individual.  A striking recent example was a BBC drama based on events involving child abuse in the northern town of Rochdale which was said to have inspired a man to  murder a Muslim man and attack others in Finsbury Park, London.
Where are we to obtain appropriate human monitors? Facebook and similar organisations have announced that they plan to recruit 10,000 or more such persons. But there is no obvious source – this is not a role which exists in employment exchanges or in the universities. Almost inevitably a monitor will spend most of their day looking at deeply unpleasant and distressing material – even if you can persuade people to assume such a role, it is plainly important to establish that they have the intellectual ability and psychological make-up to be able to cope and perform. Current indications are that monitors are recruited in countries that possess a population of graduates but where regular employment for them is very limited and hourly rates are low. It also looks as though the monitors are not directly employed by the social media sites but by third-party outsourcing companies such as Accenture.[xii] If true, this could be aimed at limiting the liability of the major social media sites. Moreover, and again one looks at the experience of the Internet Watch Foundation, employers have a duty of care, as damage to the monitor, as well as to their effectiveness, may develop over time. One must also ask what sort of career progression such a monitor can expect.

Observations
Too often those who dislike what they see “on the Internet” spend all their energy in drawing attention to the various harms and neglect to consider in sufficient detail which remedies might have a practical impact. 
As this article has tried to show, criteria for blocking have to be clear and unambiguous whether the blocking is carried out by human monitors, computer programs or a combination of the two. There will always be a substantial territory at the margins where there are disputes.
Fully automated computer-mediated blocking is high risk because AI is nowhere near sufficiently sophisticated to achieve results which most people will accept. There is a useful mantra: Blocking is good and censorship is bad.

So given that obvious harms exist on the Internet:  what practical routes are available now?
One of them,  popular with campaigners, is to emulate Germany and its Netzwerkdurchsetzungsgesetz - NetzDG for short. This requires the biggest social networks - those with more than two million German users - to take down "blatantly illegal" material within 24 hours of it being reported. For less obvious material, seven days’ consideration is allowed. Fines for violation could be up to 50 million euros.  At the time of writing there have been no cases.  But this law seems to be limited to situations where there is existing law describing illegality, not to further instances of extremism and harm.

There are a number of existing UK laws which address situations which are less than full-on sexual and terrorism offences, for example the sending by an adult of a sexually explicit picture to a child and the various preparatory terrorist activities in the Terrorism Act 2006 – “encouragement”, dissemination of materials, raising funds, arranging and attending training events.

The NSPCC proposes a Code of Practice which it says should be mandatory[xiii], but many of its detailed proposals lack the specificity which is required if there is to be legal enforcement – “safeguarding children effectively – including preventative measures to protect children from abuse” is simply the articulation of a desirable policy aim. However there is much to be said for campaigning for a voluntary code, violation of which would be an opportunity for public shaming.

This takes us to a proposal which is in some respects contentious but which merits further examination: much higher personal identity verification standards before admitting people to accounts on social media. This would involve processes similar to those required in opening an online bank account – birth certificates, passports, possibly signatures from trusted individuals to vouch for someone’s identity. Such an approach would do much to prevent under-age individuals from joining unsuitable services and stop others from seeking to post anonymously or via a fake identity. Just as gun laws do not wholly stop the circulation of illegal firearms, such measures would reduce though not eliminate grooming, hate speech and fake news. At the least, higher personal identity verification standards would make it much easier to identify fake identities and accounts which are bots as opposed to real people. But there will be opposition from privacy advocates, who will argue that in some countries dissent is difficult to publish unless there is anonymity.

But higher personal identity verification standards would have to be imposed globally and not just in the UK in order to close off obvious evasion routes – and both the public and the major social media sites would need to be persuaded that the advantages outweigh the loss of convenience and privacy. 





[i] S 160 Criminal Justice Act 1988
[ii] https://www.sentencingcouncil.org.uk/offences/item/possession-of-indecent-photograph-of-child-indecent-photographs-of-children/
[iii] sections 63-67 of the Criminal Justice and Immigration Act 2008
[iv] https://www.theguardian.com/uk-news/2017/sep/17/paralysis-at-the-heart-of-uk-counter-extremism-policy
[v] Indeed under s 67 Serious Crime Act 2015 it is an offence for an adult to send a sexually explicit message to a child
[vi] See for example:  https://inforrm.org/2017/11/12/cjeu-advocate-general-opines-on-the-definition-of-a-data-controller-applicable-national-law-and-jurisdiction-under-data-protection-law-henry-pearce/
[vii] http://constitutionus.com/; https://www.law.cornell.edu/constitution/first_amendment
[viii] Cited by https://zephoria.com/top-15-valuable-facebook-statistics/ though there are other statistics and it is difficult to know which to credit.
[ix] Such as MD5 or from the SHA family
[x] https://www.microsoft.com/en-us/photodna;  https://en.wikipedia.org/wiki/PhotoDNA
[xi] https://www.loc.gov/rr/frd/Military_Law/pamphlets_manuals.html
[xii] https://www.thetimes.co.uk/article/facebook-fails-to-delete-hate-speech-and-racism-hwrzw0qzn; https://www.thetimes.co.uk/article/meet-the-internet-moderators-b86t2lrlv; https://www.washingtonpost.com/news/the-intersect/wp/2017/05/04/the-work-of-monitoring-violence-online-can-cause-real-trauma-and-facebook-is-hiring/?utm_term=.4d0a47b56d12; https://www.wsj.com/articles/the-worst-job-in-technology-staring-at-human-depravity-to-keep-it-off-facebook-1514398398; http://www.dailymail.co.uk/news/article-4548898/Facebook-young-Filipino-terror-related-material-Manchester.html
[xiii] https://www.nspcc.org.uk/what-we-do/news-opinion/more-than-1300-cases-sexual-communication-with-child-recorded-after-change-law/