“The Internet contains some deeply
troublesome and harmful material. The main commercial players are both
immensely rich and immensely clever – they must
be able to do more to find solutions.
If they don’t, we politicians will prosecute/fine/tax them until they behave
responsibly.” So goes the refrain, but what can we reasonably expect of the
available technologies? Here is a guide for campaigners.
Issue 1: what criteria are you applying for blocking “undesirable” material?
To those who haven’t thought about the
issue it seems obvious what needs to be blocked. Almost anyone other than the
most extreme libertarian will point to material which they find distressing or
harmful – and be able to produce justifying arguments. But if you are asking a
computer program or a human being to make decisions, there has to be greater
clarity. In almost all circumstances there will be countervailing arguments
about freedom of speech, freedom of
expression and censorship.
The easiest policy to implement is where
one can point to existing legislation defining specifically illegal content.
For example in the United Kingdom possession of indecent images of children is
a strict liability offence[i].
Published guidelines from the Sentencing Council describe in detail three
levels of offence in terms of age and specific activities[ii].
Similarly “extreme pornography” is clearly defined – essentially animals, dead people and absence of consent[iii].
But outside that particular context there is no definition of “extremism”, still
less of “harmful”.[iv]
Successive would-be legislators have struggled because so often the significance
of a particular document or file depends not only on its content but on its
context.
A simple example: let’s take two
statements: “the state of Israel is a theft from Palestinians” and “the state
of Israel is entitled to occupy all the territories mentioned in the Bible”.
Are these statements, which many people would label “extreme”, simply expressions of history and religious
belief? Do we have a different view of them if they are accompanied by a call
to action – push all the Jews out, push out all the Arabs? The boundaries are
unclear, and if legislators are unwilling to provide assistance it seems
unreasonable that Internet companies should somehow be forced to make those
decisions. There is a further issue for the biggest of the global companies:
judgements about extremism and harmfulness vary across jurisdictions and
cultures.
It gets more difficult with “grooming”
whether for a sexual purpose or to incite terrorist acts. The whole point of
grooming is that it starts low key and then builds. It is easy
enough to identify grooming after a successful exercise[v]
but how do you distinguish the early stages from ordinary conversation? And how do you do so via a computer program
or a human monitor?
Finally, it is even more difficult to think
what the evidence would look like where the enforceable law simply says: social
media sites should keep children safe.
Issue 2: what is the legal framework within which material gets uploaded?
Material gets uploaded to the Internet via
a variety of legal frameworks and this has an impact on where potential legal
enforcement can be directed. An
individual might buy web space from an Internet service provider and create
their own website. That same individual may provide facilities for third
parties to post comments which will then be seen automatically and instantly by
all visitors. A social media service will almost certainly require a specific
sign-up from its subscribers/members, at that point informing them of an
“acceptable use” or “community standards” policy, but will thereafter allow
postings without prior approval or initial restraint.
The position currently taken by most
Internet service companies, bolstered by various directives and laws, is that
they are not publishers in the same sense as traditional media such as
newspapers, magazines and broadcast television stations. They say that they are
providing facilities but are not editors. Or that they are “data processors” as
opposed to “data controllers”[vi]. The claim is that they are “intermediaries”
for the purpose of the E-Commerce Directive and Regulations. These arguments
are currently being hotly debated. But even under their interpretation there is
a significant impact on what one can reasonably expect them to do in terms of
attempting to block before publication.
The main business of Google is to index
world wide web content which has been originated by others with whom it has no
contractual relationship. It has a series of “crawler” programs which scavenge
the open part of the World Wide Web; the findings are then indexed and that is
what visitors to Google’s main pages see. The contractual relationship that is
most important in the basic Google framework is with those who use the indexes
– essentially the service is paid for by allowing Google to harvest information
about individuals which can be turned into targeted advertising. But Google is
not under any compulsion or contractual obligation to index anything; it can block at will. The main policy reason for refusing to block
is that it has decided that it favours completeness and freedom of speech and
expression; it blocks only when there is an overwhelming reason to do so.
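For readers who want a concrete picture, here is a minimal sketch of the crawl-and-index cycle just described, written in Python using only the standard library. The seed URL, the page limit and the in-memory word index are illustrative assumptions; they convey the general mechanism, not how Google’s own crawlers and indexes actually work.

```python
# Minimal, illustrative crawl-and-index sketch (not a real search engine).
import re
import urllib.request
from collections import defaultdict
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects absolute hyperlinks so the crawler can follow them."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("http"):
                    self.links.append(value)

def crawl_and_index(seed_urls, max_pages=10):
    index = defaultdict(set)            # word -> set of URLs containing it
    queue, seen = list(seed_urls), set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                html = resp.read().decode("utf-8", errors="ignore")
        except Exception:
            continue                    # unreachable pages are simply skipped
        for word in re.findall(r"[a-z]{3,}", html.lower()):
            index[word].add(url)        # record which pages mention which words
        parser = LinkExtractor()
        parser.feed(html)
        queue.extend(parser.links)      # follow outbound links, breadth-first
    return index

# Example (hypothetical seed): index = crawl_and_index(["https://example.com/"])
```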
By contrast for Facebook, Twitter, and many
similar services the contractual relationship is with their
customers/subscribers/members. It consists of saying “we will let you see
what others have posted and we will let you post provided you allow us to
harvest information about you and send you targeted advertisements”. As part of
the contract there is usually an Acceptable Use or Community Standards
provision which is the basis for blocking. But here again, as companies
headquartered in the United States they
are concerned about observing First Amendment rights[vii].
There are important differences in terms of
what one can expect if some of this material is to be blocked. In the case of
Google they have no opportunity to prevent material from being uploaded; the
earliest point at which they could intervene is when their crawler comes across
material which has already been published. Their choice is to refuse to index.
But for the social media sites, where the acceptable use policy is part of the
customer agreement, the earliest opportunity for blocking is when the customer
uploads material.
Issue 3: technical means for blocking material (a) that has already been identified as “undesirable”.
We must now look at the various blocking
technologies and see how far they are practical to implement. There is a
significant difference between situations where material has already been
identified by some method or other as requiring blocking and material which no
one has so far seen and passed judgement on.
Blocking of known “undesirable” material (I
am using the word “undesirable” to avoid the problems raised in Issue 1 above) is
relatively straightforward though there are questions of how to do so at the
speed and quantity of uploads. For example, on Facebook it is said that every
60 seconds 510,000 comments are posted, 293,000 statuses are updated and
136,000 photos are uploaded[viii].
It is trivially easy to block an entire website. The block is on the URL –
www.nastysite.com – and this is the method traditionally used by such bodies as
the Internet Watch Foundation and the National Center for Missing and Exploited
Children. It is also possible, again by URL, to block part of the website –
www.harmlesssite.com/nastymaterial – though here the blocking will fail if the
folder containing the undesirable material is given a different name or
location in the file structure of the website as a whole. One can extend this
method to specific pages and pictures on the website –
www.harmlesssite.com/harmless/nastyfile.jpg – but here too simple name changes
will render the blocking ineffective.
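For illustration, here is a minimal sketch in Python of how URL-based blocking of this kind works, using a hypothetical blocklist; the final example shows how simply renaming the folder defeats the block.

```python
from urllib.parse import urlparse

# Hypothetical blocklist: whole hosts, and specific paths within a host.
BLOCKED_HOSTS = {"www.nastysite.com"}
BLOCKED_PREFIXES = {("www.harmlesssite.com", "/nastymaterial")}

def is_blocked(url):
    parts = urlparse(url)
    if parts.hostname in BLOCKED_HOSTS:
        return True                                    # whole-site block
    for host, prefix in BLOCKED_PREFIXES:
        if parts.hostname == host and parts.path.startswith(prefix):
            return True                                # folder or file block
    return False

print(is_blocked("http://www.nastysite.com/anything"))                # True
print(is_blocked("http://www.harmlesssite.com/nastymaterial/a.jpg"))  # True
print(is_blocked("http://www.harmlesssite.com/renamed/a.jpg"))        # False: evaded by renaming
```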
Blocking on the basis of keyword is
impossibly crude. “Sex” eliminates the
counties of Sussex, Essex, Middlesex etc as well as much useful material on
health, education, law enforcement and more.
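A toy Python example makes the point; the keyword list and the sample texts are invented purely for illustration.

```python
# Toy illustration of keyword over-blocking: a bare substring match on "sex"
# flags innocent county names and health advice alongside anything explicit.
KEYWORDS = ["sex"]

def keyword_blocked(text):
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)

for sample in ["Holiday cottages in Sussex and Essex",
               "Middlesex County Cricket Club fixtures",
               "NHS advice on sexual health"]:
    print(sample, "->", keyword_blocked(sample))   # every one is a false positive
```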
In order to overcome these problems one must turn to a different technology –
file hashing. A file hash or fingerprint is created using a simple program[ix]
which is applied to the totality of a file – photo, document, software program
– to produce a short sequence of numbers and letters. The program is designed
so that for most practical purposes no two dissimilar files will ever produce
the same hash or signature. A database of these hashes is built up and when a
file is presented for examination its hash is created and compared with the
database. If there is a match the newly uploaded file is blocked. File hashing
is used throughout computing in order, for example, to demonstrate whether or
not a file has been altered.
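As a rough sketch, hash-based matching can be shown in a few lines of Python using SHA-256 from the standard library; the “known hashes” set below is a placeholder, not any real database.

```python
import hashlib

KNOWN_BAD_HASHES = {
    # Placeholder entry – in practice this would be a large curated database.
    "0000000000000000000000000000000000000000000000000000000000000000",
}

def sha256_of_file(path, chunk_size=65536):
    """Hash the file's entire contents in chunks so large uploads fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def should_block(path):
    """Block only if the upload is byte-for-byte identical to a known file."""
    return sha256_of_file(path) in KNOWN_BAD_HASHES

# Changing even one byte of a file produces a completely different hash,
# which is exactly why this method misses slightly altered copies.
```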
This method only works to identify absolutely identical files, so that if an
“undesirable” file has been even slightly altered there will be a different
hash and blocking will not take place. To a limited extent there is a further
technology which deals with slightly dissimilar files. For photo images the
best known is PhotoDNA[x], which is promoted by Microsoft and given away to
Internet service providers, social media services and law enforcement. There
are two typical situations where it is effective – when a file has been subject
to a degree of compression to reduce its size and where there are a series of
adjacent clips taken from a video.
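PhotoDNA itself is proprietary, so as an illustration of the general principle here is a sketch of a much cruder “average hash” technique, using the Pillow imaging library: visually similar images – a recompressed copy, for instance – produce fingerprints that differ in only a few bits, so matching is done on distance rather than equality. This is an assumed stand-in to convey the idea, not Microsoft’s algorithm.

```python
from PIL import Image  # Pillow; an illustrative stand-in, NOT PhotoDNA itself

def average_hash(path, size=8):
    """Shrink to 8x8 greyscale and record which pixels are brighter than average."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    avg = sum(pixels) / len(pixels)
    return [1 if p > avg else 0 for p in pixels]       # 64-bit perceptual fingerprint

def hamming_distance(h1, h2):
    """Count the bits in which two fingerprints differ."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

def looks_like_known_image(candidate_path, known_hashes, threshold=8):
    """A small Hamming distance suggests the same picture, recompressed or resized."""
    h = average_hash(candidate_path)
    return any(hamming_distance(h, known) <= threshold for known in known_hashes)
```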
Issue 4: technical means for blocking material (b) that is new and hasn’t been seen before.
This leaves the situation where wholly new material, never seen before, is
uploaded or where previously seen material has been substantially altered, for
example by cropping or selection. Here many claims are made for “artificial
intelligence” techniques.
But most computer scientists, as opposed to marketing droids, no longer use the
phrase “artificial intelligence” or its contraction “AI” because concepts of
what it is keep on changing in the light of developments in computer science
and investigations by biological scientists into how the human brain actually
works. Moreover AI consists of a number of separate techniques, each with its
own value but also its own limitations. It can include pattern recognition in
images, the identification of rules in what initially appears to be random
data, data mining, neural networks, and machine learning, in which a program
follows the behaviour of an individual or event and identifies patterns and
linkages. There are more besides, and many overlaps in definitions and
concepts.
Much depends on what sort of results are
hoped for. A scientist operating in either the physical or social sciences and
possessed of large volumes of data may wish to have drawn to their attention
possible patterns from which rules can be derived. They may want to extend this
into making predictions. A social media company or retailer may wish to scan
the activity of a customer in order to make suggestions for future purchases –
but here high levels of accuracy are not particularly required. If an
intelligence agency or law enforcement agency uses similar techniques to scan
the activities of an individual, the level of inaccuracy may have unfortunate
consequences – a decision to prevent that person from boarding an aeroplane, to
deny them future employment, or to arrest them.
If one is scrutinising uploaded files,
limitations become apparent. In the first place the context in which a file is
being uploaded may be critical. Field Manuals from the United States Army[xi]
were produced as part of the training mechanism for that organisation but they
are also found on the computers of people suspected of terrorism. Terrorist
manuals may be reproduced on research and academic websites on the basis that
experts need to be able to refer to and analyse them. The same photo may appear on
a site promoted by a terrorist group and by a news organisation. Some sexually explicit photos may be
justified in the context of medical and educational research – or law
enforcement.
Beyond that, as we have already discussed,
telling the difference between a document which merely advances an argument and
one which incites may be beyond what is currently possible via AI. My favourite
example of linguistic ambiguity is “I could murder an Indian”, which might mean
no more than that one person is inviting another to a meal in an Indian
restaurant. In terms of photos, how does one tell the difference between the
depiction of a murderous terrorist act and a clip from a movie or computer
game? AI can readily identify a swastika in an image – but is the photo
historic, of Germany in the 1930s and during World War II, or a still from a
more modern war movie, or is it on a website devoted to neo-Nazi anti-semitism?
How do you reliably distinguish a
16-year-old from an 18-year-old, and for all ethnicities? How does an AI system distinguish the
artistic from the exploitative or when in a sexual situation there is an
absence of consent? What exactly is "fake news" and where are the generally-accepted guidelines to recognise it?
The role of AI techniques therefore is less
that they can make fully automated decisions of their own and more that they
can provide alerts on which human monitors will make a final arbitration. Even
here there is a problem because, as with most alert systems, a threshold has to
be set before something is brought to attention. A balance has to be struck
between too many false positives – alerts which identify harmless events – and
too many false negatives – failures to identify harmful activity.
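The trade-off can be made concrete with a toy example. The scores and labels below are invented, but they show how moving the alert threshold exchanges one kind of error for the other.

```python
# Invented classifier scores (probability that an upload is "undesirable")
# paired with the label a human reviewer would eventually assign.
scored_uploads = [(0.95, True), (0.80, True), (0.65, False), (0.55, True),
                  (0.40, False), (0.35, False), (0.20, True), (0.05, False)]

def alert_counts(threshold):
    """Count harmless items flagged (false positives) and harmful items missed (false negatives)."""
    false_pos = sum(1 for score, harmful in scored_uploads
                    if score >= threshold and not harmful)
    false_neg = sum(1 for score, harmful in scored_uploads
                    if score < threshold and harmful)
    return false_pos, false_neg

for t in (0.3, 0.5, 0.7, 0.9):
    fp, fn = alert_counts(t)
    print(f"threshold {t:.1f}: {fp} harmless items flagged, {fn} harmful items missed")
```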
Issue 5: the role and training of human monitors.
This takes us back to Issue 1. A human
monitor has to make judgements based on criteria laid down by the organisation
exercising blocking. That human monitor needs clear and consistent instructions
and, to go with them, appropriate training. Among other things the blocking
organisation will want to be able to demonstrate consistency in decisions. As
we have seen, monitoring for illegality is easier than making judgements about
“extremism” and “harm”. But even here the structure of many laws is that it is
for a court to determine whether a crime has been committed. Where the test is
purely factual – for example the age of a person in a sexual situation – the
decision might be relatively simple. But where somebody is to be convicted for
disseminating terrorist material, context may be critical – the academic
researcher versus someone against whom there is also evidence of having sent
funds or of having begun to accumulate the material necessary to build a bomb.
As a result the human monitor can probably only block where they are absolutely sure that a court would convict – leaving a number of potential situations in which a court might possibly convict but the monitor decides that there is insufficient reason to block. At the Internet Watch Foundation which operates on a relatively limited remit confined to illegal sexual material, decisions about marginal photos and files are usually taken by more than one person and may be referred upwards for special review.
One policy problem in the counter-terrorism
domain is that material which by itself is not illegal may nevertheless play a
part in the radicalisation of an individual.
A striking recent example was a BBC drama based on events involving
child abuse in the northern town of Rochdale, which was said to have inspired a
man to murder a Muslim man and attack others in Finsbury Park, London.
Where are we to obtain appropriate human
monitors? Facebook and similar organisations have announced that they plan to
recruit 10,000 or more such persons. But there is no obvious source – this is
not a role which exists in employment exchanges or in the universities. Almost
inevitably a monitor will spend most of their day looking at deeply unpleasant
and distressing material – even if you can persuade people to assume such a
role, it is plainly important to establish that they have the intellectual
ability and psychological make-up to be able to cope and perform. Current
indications are that monitors are
recruited in countries that possess a population of graduates but where regular
employment for them is very limited and hourly rates are low. It also looks as though the monitors are not
directly employed by the social media sites but by third-party out-sourcing
companies such as Accenture.[xii] If true, this could be aimed at limiting the
liability of the major social media sites. Moreover, and here again one can
look at the experience of the Internet Watch Foundation, employers have a duty
of care: damage to the monitor, and with it a decline in their effectiveness,
may develop over time. One must also ask what sort of career progression such a
monitor can expect.
Observations
Too often those who dislike what they see
“on the Internet” spend all their energy in drawing attention to the various
harms and neglect to consider in sufficient detail which remedies might have a
practical impact.
As this article has tried to show, criteria
for blocking have to be clear and unambiguous whether the blocking is carried
out by human monitors, computer programs or a combination thereof. There will
always be a substantial territory at the margins where there are disputes.
Fully automated computer-mediated blocking
is high risk because AI is nowhere near sufficiently sophisticated to achieve
results which most people will accept. There is a useful mantra: Blocking is
good and censorship is bad.
So, given that obvious harms exist on the Internet, what practical routes are available now?
One of them,
popular with campaigners, is to emulate Germany and its Netzwerkdurchsetzungsgesetz - NetzDG for short.
This requires the biggest social networks - those with more than two million
German users - to take down "blatantly illegal" material within 24
hours of it being reported. For less obvious material, seven days’
consideration is allowed. Fines for violation could be up to 50 million
euros. At the time of writing there have been no cases. But this law seems to
be limited to situations where existing law describes illegality; it does not
extend to broader instances of extremism and harm.
There are a number of
existing UK laws which address situations which are less than full-on sexual
and terrorism offences, for example the sending by an adult of a sexually
explicit picture to a child and the various preparatory terrorist activities in
the Terrorism Act 2006 – “encouragement”, dissemination of materials, raising
funds, and arranging and attending training events.
The NSPCC proposes a
Code of Practice which it says should be mandatory[xiii]
but many of its detailed proposals
lack the specificity which is required if there is to be legal enforcement –
“safeguarding children effectively – including preventative measures to protect
children from abuse” is simply the articulation of a desirable policy aim.
However, there is much to be said for campaigning for a voluntary code,
violation of which would be an opportunity for public shaming.
This takes us to a
proposal which is in some respects contentious but which merits further
examination: much higher personal identity verification standards before admitting people to
accounts on social media. This would
involve processes similar to those required in opening an online bank account –
birth certificates, passports, possibly signatures from trusted individuals to
sign off on someone’s identity. Such an approach would do much to prevent
under-age individuals from joining unsuitable services and stop others from
seeking to post anonymously or via a fake identity. Just as gun laws do not
wholly stop the circulation of illegal firearms, such measures would reduce,
though not eliminate, grooming, hate speech and fake news. At the least, higher
personal identity verification standards would make it much easier to identify
fake identities and accounts which are bots as opposed to real people. But
there will be opposition from privacy advocates, who will argue that in some
countries dissent is difficult to publish unless there is anonymity.
Such standards, however, would have to be imposed globally and not just in the
UK in order to close off obvious evasion routes – and both the public and the
major social media sites would need to be persuaded that the advantages
outweigh the loss of convenience and privacy.
[i] S 160 Criminal Justice Act 1988
[ii] https://www.sentencingcouncil.org.uk/offences/item/possession-of-indecent-photograph-of-child-indecent-photographs-of-children/
[iv]
https://www.theguardian.com/uk-news/2017/sep/17/paralysis-at-the-heart-of-uk-counter-extremism-policy
[v] Indeed under s 67 Serious Crime Act 2015 it is an offence for an
adult to send a sexually explicit message to a child
[vi] See for example:
https://inforrm.org/2017/11/12/cjeu-advocate-general-opines-on-the-definition-of-a-data-controller-applicable-national-law-and-jurisdiction-under-data-protection-law-henry-pearce/
[vii] http://constitutionus.com/;
https://www.law.cornell.edu/constitution/first_amendment
[viii] Cited by https://zephoria.com/top-15-valuable-facebook-statistics/
though there are other statistics and it is difficult to know which to credit.
[ix] Such as MD5 or from the SHA family
[x] https://www.microsoft.com/en-us/photodna; https://en.wikipedia.org/wiki/PhotoDNA
[xi] https://www.loc.gov/rr/frd/Military_Law/pamphlets_manuals.html
[xii] https://www.thetimes.co.uk/article/facebook-fails-to-delete-hate-speech-and-racism-hwrzw0qzn;
https://www.thetimes.co.uk/article/meet-the-internet-moderators-b86t2lrlv;
https://www.washingtonpost.com/news/the-intersect/wp/2017/05/04/the-work-of-monitoring-violence-online-can-cause-real-trauma-and-facebook-is-hiring/?utm_term=.4d0a47b56d12;
https://www.wsj.com/articles/the-worst-job-in-technology-staring-at-human-depravity-to-keep-it-off-facebook-1514398398;
http://www.dailymail.co.uk/news/article-4548898/Facebook-young-Filipino-terror-related-material-Manchester.html
[xiii]
https://www.nspcc.org.uk/what-we-do/news-opinion/more-than-1300-cases-sexual-communication-with-child-recorded-after-change-law/