How we built a tool that detects the strength of Islamophobic hate speech on Twitter

Finding and measuring Islamophobic hate speech on social media. Credit: John Gomez/Shutterstock

In a landmark move, a group of MPs recently published a working definition of the term Islamophobia. They defined it as "rooted in racism", and as "a type of racism that targets expressions of Muslimness or perceived Muslimness".

In our latest working paper, we wanted to better understand the prevalence and severity of such Islamophobic hate speech on Twitter. Such speech harms targeted victims, creates a sense of fear among Muslim communities, and contravenes fundamental principles of fairness. But we faced a key challenge: while extremely harmful, Islamophobic hate speech is actually quite rare.

Billions of posts are sent on social media every day, and only a very small number of them contain any sort of hate. So we set about creating a machine learning classification tool that automatically detects whether or not tweets contain Islamophobia.

Detecting Islamophobic hate speech

Huge strides have been made in using machine learning to classify more general hate speech robustly, at scale and in a timely manner. In particular, a lot of progress has been made in categorising content according to whether or not it is hateful.

But Islamophobic hate speech is much more nuanced and complex than this. It runs the gamut from verbally attacking, abusing and insulting Muslims to ignoring them; from highlighting how they are perceived to be "different" to suggesting they are not legitimate members of society; from aggression to dismissal. We wanted our tool to capture this nuance, so that it could classify not only whether content is Islamophobic but also whether that Islamophobia is strong or weak.

We defined Islamophobic hate speech as "any content which is produced or shared which expresses indiscriminate negativity against Islam or Muslims". This differs from, but is well aligned with, the MPs' working definition of Islamophobia outlined above. Under our definitions, strong Islamophobia includes statements such as "all Muslims are barbarians", while weak Islamophobia includes more subtle expressions, such as "Muslims eat such strange food".

Being able to distinguish between weak and strong Islamophobia will help us not only to better detect and remove hate, but also to understand the dynamics of Islamophobia, to investigate radicalisation processes in which a person becomes progressively more Islamophobic, and to provide better support to victims.

Credit: Vidgen and Yasseri

Setting the parameters

The tool we created is a supervised machine learning classifier. The first step in creating one is to build a labelled dataset for training and testing: this is how the tool learns to assign tweets to each of the three classes (weak Islamophobia, strong Islamophobia and no Islamophobia). Creating this dataset is a difficult and time-consuming process, as each tweet has to be labelled manually so that the machine has a foundation to learn from. A further problem is that detecting hate speech is inherently subjective: what I consider strongly Islamophobic, you might think is weak, and vice versa.

We did two things to mitigate this. First, we spent a lot of time creating guidelines for labelling the tweets. Second, we had three experts label each tweet, and used statistical tests to check how much they agreed. We started with 4,000 tweets, sampled from a dataset of 140m tweets that we collected from March 2016 to August 2018. Most of the 4,000 tweets didn't express any Islamophobia, so we removed many of them to create a more balanced dataset of 1,341 tweets: 410 strong, 484 weak and 447 with no Islamophobia.
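The article does not say which statistical test was used to measure agreement between the three coders. One common choice when every item is labelled by the same number of raters is Fleiss' kappa, which compares the observed agreement with the agreement expected by chance. The minimal Python sketch below assumes that choice; the function name `fleiss_kappa` and the four toy tweets' labels are hypothetical, not the authors' data or code.

```python
from collections import Counter
from typing import Sequence

CLASSES = ("none", "weak", "strong")


def fleiss_kappa(ratings: Sequence[Sequence[str]], classes: Sequence[str] = CLASSES) -> float:
    """Fleiss' kappa for items that were each labelled by the same number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][c]: how many raters put item i into class c (missing classes count as 0)
    counts = [Counter(r) for r in ratings]

    # Observed agreement: share of agreeing rater pairs per item, averaged over items
    p_bar = sum(
        (sum(c[k] ** 2 for k in classes) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Chance agreement: squared overall proportion of each class, summed over classes
    p_e = sum(
        (sum(c[k] for c in counts) / (n_items * n_raters)) ** 2 for k in classes
    )
    return (p_bar - p_e) / (1 - p_e)


# Toy labels from three hypothetical coders for four tweets
toy = [
    ("strong", "strong", "strong"),
    ("weak", "weak", "none"),
    ("none", "none", "none"),
    ("weak", "strong", "weak"),
]
print(round(fleiss_kappa(toy), 3))  # prints 0.5
```

A kappa near 1 means the coders agree far more often than chance would predict, while a value near 0 means their agreement is no better than chance. The statistic is undefined when every label falls into a single class, because the chance agreement is then 1.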

The second step was to build and tune the classifier by engineering features and selecting an algorithm. Features are the information the classifier uses to assign each tweet to a class. Our main feature came from a word embeddings model, a deep learning model that represents each word as a vector of numbers, which can then be used to study word similarity and word usage. We also extracted other features from the tweets, such as the grammatical unit, sentiment and the number of mentions of mosques.
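To make the idea of word embeddings as features more concrete, here is a minimal Python sketch. It assumes that a tweet-level vector is formed by averaging the vectors of the tweet's words (a common approach; the article does not say how the authors combined them) and appends one hand-crafted feature, a count of mosque mentions. The tiny `EMBEDDINGS` table and every function name are hypothetical: a real model would learn vectors with far more dimensions from a large corpus, and the sentiment and grammatical features are left out.

```python
import math
import re
from typing import Dict, List

# Hypothetical 3-dimensional word vectors, for illustration only
EMBEDDINGS: Dict[str, List[float]] = {
    "mosque": [0.9, 0.1, 0.0],
    "mosques": [0.85, 0.15, 0.05],
    "food": [0.0, 0.8, 0.2],
    "strange": [0.1, 0.3, 0.9],
}
DIM = 3
MOSQUE_WORDS = {"mosque", "mosques"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tweet_vector(tokens: List[str]) -> List[float]:
    """Average the vectors of known words; all zeros if no word is known."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(column) / len(known) for column in zip(*known)]


def features(text: str) -> List[float]:
    """Embedding average followed by a count of mosque mentions."""
    tokens = tokenize(text)
    mosque_mentions = sum(t in MOSQUE_WORDS for t in tokens)
    return tweet_vector(tokens) + [float(mosque_mentions)]


print(features("Another mosque, more mosques"))
mosque = EMBEDDINGS["mosque"]
print(cosine(mosque, EMBEDDINGS["mosques"]) > cosine(mosque, EMBEDDINGS["food"]))
```

Cosine similarity is the usual way to compare embedding vectors: words used in similar contexts end up pointing in similar directions, so in this toy table `mosque` is much closer to `mosques` than to `food`.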

Once we'd built our classifier, the final step was to evaluate it, which we did by applying it to a new dataset of completely unseen tweets. We selected 100 tweets that the classifier had assigned to each of the three classes, 300 in total, and had our three expert coders relabel them. This let us evaluate the classifier's performance by comparing the labels it assigned with the experts' labels.
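The evaluation sample described above, a fixed number of tweets drawn from each class the classifier predicted, can be sketched in a few lines of Python. This is an illustration rather than the authors' code: `sample_per_predicted_class`, the tweet IDs and the seed are hypothetical, and sorting the IDs before sampling is just one way to make the draw reproducible.

```python
import random
from collections import defaultdict
from typing import Dict, List, Mapping


def sample_per_predicted_class(
    predictions: Mapping[str, str], per_class: int = 100, seed: int = 0
) -> Dict[str, List[str]]:
    """Draw up to `per_class` tweet IDs, without replacement, from each predicted class."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    by_class: Dict[str, List[str]] = defaultdict(list)
    for tweet_id, label in predictions.items():
        by_class[label].append(tweet_id)
    return {
        label: rng.sample(sorted(ids), min(per_class, len(ids)))
        for label, ids in sorted(by_class.items())
    }


# Hypothetical predictions for 1,000 unseen tweets
labels = ("none", "weak", "strong")
predictions = {f"tweet{i}": labels[i % 3] for i in range(1000)}
sample = sample_per_predicted_class(predictions)
print({label: len(ids) for label, ids in sample.items()})
# prints {'none': 100, 'strong': 100, 'weak': 100}
```

Sampling by predicted class guarantees enough examples of the rare Islamophobic classes for the experts to check; a purely random sample of unseen tweets would consist mostly of tweets with no Islamophobia.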

The classifier's main limitation was that it struggled to identify weakly Islamophobic tweets, as these often overlapped with both strongly Islamophobic and non-Islamophobic ones. That said, its overall performance was strong: accuracy (the proportion of tweets classified correctly) was 77% and precision was 78%. Because of our rigorous design and testing process, we can trust that the classifier is likely to perform similarly when it is used at scale "in the wild" on unseen Twitter data.
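Accuracy and precision are both computed by comparing the classifier's labels with the experts' labels. The article does not say how precision was averaged across the three classes, so the minimal Python sketch below assumes macro-averaging (the unweighted mean of per-class precision). The function names and the six toy labels are hypothetical.

```python
from typing import Dict, Sequence

CLASSES = ("none", "weak", "strong")


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Proportion of tweets whose predicted label matches the experts' label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_precision(
    gold: Sequence[str], pred: Sequence[str], classes: Sequence[str] = CLASSES
) -> Dict[str, float]:
    """For each class, the share of tweets predicted as that class that the experts agree with."""
    result = {}
    for c in classes:
        gold_for_predicted_c = [g for g, p in zip(gold, pred) if p == c]
        # Convention used here: a class that is never predicted scores 0.0
        result[c] = (
            sum(g == c for g in gold_for_predicted_c) / len(gold_for_predicted_c)
            if gold_for_predicted_c
            else 0.0
        )
    return result


def macro_precision(
    gold: Sequence[str], pred: Sequence[str], classes: Sequence[str] = CLASSES
) -> float:
    """Unweighted mean of the per-class precision values."""
    scores = per_class_precision(gold, pred, classes)
    return sum(scores.values()) / len(classes)


# Toy example: experts' labels (gold) versus the classifier's labels (pred)
gold = ["strong", "weak", "none", "weak", "none", "strong"]
pred = ["strong", "none", "none", "weak", "weak", "strong"]
print(per_class_precision(gold, pred))
# prints {'none': 0.5, 'weak': 0.5, 'strong': 1.0}
print(round(accuracy(gold, pred), 3), round(macro_precision(gold, pred), 3))
# prints 0.667 0.667
```

In the toy data, the "weak" class has the lowest precision because one tweet the experts judged non-Islamophobic was predicted as weak, a small-scale version of the overlap problem described above.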

Using our classifier

We applied the classifier to a dataset of 109,488 tweets produced by 45 far-right accounts during 2017. These were identified by the charity Hope Not Hate in their 2015 and 2017 State of Hate reports. The graph below shows the results.

While most of the tweets – 52.6% – were not Islamophobic, weak Islamophobia was considerably more prevalent (33.8%) than strong Islamophobia (13.6%). This suggests that most of the Islamophobia in these far-right accounts is subtle and indirect, rather than aggressive or overt.
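To put these percentages in absolute terms, the short Python calculation below converts them into approximate tweet counts for the 109,488 tweets. The counts are estimates derived here from the rounded percentages, not figures reported in the study.

```python
TOTAL_TWEETS = 109_488
# Percentage of the far-right accounts' tweets in each class, as reported above
SHARES = {"none": 52.6, "weak": 33.8, "strong": 13.6}

approx_counts = {label: round(TOTAL_TWEETS * pct / 100) for label, pct in SHARES.items()}
weak_to_strong = SHARES["weak"] / SHARES["strong"]

for label, count in approx_counts.items():
    print(f"{label:>6}: about {count:,} tweets")
print(f"weak Islamophobia is about {weak_to_strong:.1f} times as common as strong")
```

The estimates come to roughly 57,600 tweets with no Islamophobia, 37,000 weakly Islamophobic and 14,900 strongly Islamophobic ones: about two and a half weak tweets for every strong one.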

Detecting Islamophobic hate speech is a real and pressing challenge for governments, tech companies and academics. Sadly, this is a problem that will not go away – and there are no simple solutions. But if we are serious about removing hate speech and extremism from online spaces, and making social media platforms safe for all who use them, then we need to start with the appropriate tools. Our work shows it's entirely possible to make these tools – to not only automatically detect hateful content but to also do so in a nuanced and fine-grained manner.



Provided by The Conversation

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation: How we built a tool that detects the strength of Islamophobic hate speech on Twitter (2019, January 2) retrieved 24 July 2019 from https://phys.org/news/2019-01-built-tool-strength-islamophobic-speech.html