My bachelor assignment was on how to detect cyberbullying in Twitter messages. Machine learning was a great way to tackle this problem.

For the uninitiated: machine learning is a catch-all term for when a machine (computer) learns from certain characteristics of a dataset and applies that knowledge to do something with similar data. In short, it learns how to classify things from things that have already been classified.

Data gathering

For this to work we first need some Twitter messages that show bullying behaviour and some that don't. For a few reasons, it's also nice if the amounts of both are about equal. To increase our chances of finding bullying messages on Twitter (because there are a lot of other messages), we set two criteria for the messages we collected. One: it contains swearing. We thought that verbal abuse would be a good indicator of bullying. (However, I did find a different study that saw no great correlation between swearwords and abuse on Instagram; it's the third reference in the .pdf.) Two: it contains an @-mention. We figured that a bully attacking a victim would address the bullying to the victim.
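As a rough sketch of what such a filter could look like in Python (the swearword list and the tweet format here are made up for illustration; the real list was of course longer):

    import re

    # Hypothetical, abbreviated swearword list for illustration only.
    SWEARWORDS = {"idiot", "loser", "stupid"}

    MENTION_RE = re.compile(r"@\w+")

    def is_candidate(tweet_text):
        """Keep a tweet only if it contains both swearing and an @-mention."""
        words = {w.strip(".,!?").lower() for w in tweet_text.split()}
        has_swearing = bool(words & SWEARWORDS)
        has_mention = MENTION_RE.search(tweet_text) is not None
        return has_swearing and has_mention

    print(is_candidate("@sam you are such a loser"))  # True
    print(is_candidate("great game last night!"))     # False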

Then we manually annotated about a thousand of these messages as being bullying or not. About a third turned out to be bullying.

Pre-processing

As mentioned before, it is nice when the classes are balanced, i.e. when there are as many bullying messages as normal ones; many classifiers do better that way. The more technical explanation is in the paper, but I increased the number of bullying messages using SMOTE (Synthetic Minority Oversampling TEchnique). It increases the number of items in the smaller class by generating new items that 'look similar' to the existing ones. This way it (hopefully) avoids the overfitting you would get from simply duplicating items: the classifier becomes so well trained on the data you gave it that it doesn't know what to do with new data, and accuracy suffers.
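The thesis work was done in Weka, but to get a feel for what SMOTE does, here is a minimal sketch using the imbalanced-learn library in Python (the feature matrix is random toy data, with the same one-third/two-thirds split as our dataset):

    import numpy as np
    from imblearn.over_sampling import SMOTE

    # Toy data: 667 'normal' and 333 'bullying' samples, 20 features each.
    X = np.random.rand(1000, 20)
    y = np.array([0] * 667 + [1] * 333)

    # SMOTE synthesizes new minority samples by interpolating between
    # existing minority samples and their nearest neighbours.
    X_balanced, y_balanced = SMOTE(random_state=42).fit_resample(X, y)
    print(np.bincount(y_balanced))  # both classes now have 667 samples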

Classifying

For the actual classifying I used the Weka software. It has its issues, but is well suited for this task. I compared a few different classifiers and their parameters before settling on the LibSVM classifier.
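There is no single Python equivalent of Weka's LibSVM wrapper, but scikit-learn's SVC is built on the same libsvm library, so an analogous setup would look roughly like this (tiny made-up training set, purely for illustration):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Hypothetical mini training set: 1 = bullying, 0 = normal.
    tweets = ["@sam you are such a loser", "@kim great game last night!"]
    labels = [1, 0]

    # SVC uses libsvm under the hood, like Weka's LibSVM classifier.
    model = make_pipeline(CountVectorizer(), SVC(kernel="linear"))
    model.fit(tweets, labels)
    print(model.predict(["@sam nice goal yesterday!"]))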

Feature exploration

Besides the text of the tweet, there are other things in a tweet that might indicate a bullying message. For example, maybe cyberbullying happens more often just after school, or after bedtime. I checked this by taking the time each tweet was posted, converting it to the tweeter's local time, and comparing the posting-time distributions of the bullying and normal messages.

For the statistical analysis I normalized the data by letting the day start at about 7am, so that late-night hours count as the tail end of the previous day rather than the start of a new one. The analysis suggests that there is a significant difference between the posting times of the two classes. Due to a couple of factors, though, I decided against adding this feature to the model.
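A minimal sketch of that 'day starts at 7am' shift, assuming the timestamp has already been converted to the author's local time:

    from datetime import datetime

    DAY_START_HOUR = 7  # treat 7am as hour zero of the 'social day'

    def shifted_hour(local_time: datetime) -> int:
        """Map a local timestamp to hours since 7am, so a late-night tweet
        (e.g. 1am) sorts after the evening instead of before the morning."""
        return (local_time.hour - DAY_START_HOUR) % 24

    print(shifted_hour(datetime(2015, 6, 1, 23, 30)))  # -> 16
    print(shifted_hour(datetime(2015, 6, 2, 1, 0)))    # -> 18, still 'late'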

Another thing that might have an effect is the difference between "I hate you" and "I don't hate you". In a simple bag-of-words model (without bigrams or other n-grams) these look pretty similar: if "hate" is a feature found in bullying messages, the classifier is likely to sort both as bullying. Checking for these negations in the data showed that they are used pretty much everywhere and hardly differ between bullying and normal messages. I didn't implement it, but looking at n-grams (where n consecutive words are taken together as a single 'word') instead of single words would catch more of these cases.
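To see the difference, compare the unigram and bigram features that scikit-learn's CountVectorizer extracts for the two sentences (just an illustration, not part of the thesis pipeline):

    from sklearn.feature_extraction.text import CountVectorizer

    sentences = ["I hate you", "I don't hate you"]

    unigrams = CountVectorizer(ngram_range=(1, 1))
    print(unigrams.fit(sentences).get_feature_names_out())
    # ['don' 'hate' 'you'] -- the two sentences share almost all features

    bigrams = CountVectorizer(ngram_range=(1, 2))
    print(bigrams.fit(sentences).get_feature_names_out())
    # now includes 'don hate', a feature that tells the two apart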

Classification helper tool

I also made a small tool to help with the manual classification. It takes data as input and asks you to annotate each message in order. When you have done a few, you can tell it to build a model of the data so far. It then uses this model to sort the rest of the data by the likelihood that a message is a bullying message. Hopefully, you now encounter more messages of the smaller bullying class when you resume annotating. If you keep annotating and re-training, the classifier gets better over time and keeps surfacing more samples of the bullying messages.

An added benefit is that when the language changes, the model will pick up those changes too. To improve the tool, you could even make it so that features that were strong in the past but are now hardly seen get a lower weight, and vice versa.
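The core loop of the tool looks roughly like this (a sketch, not the actual code; the manual annotation step around it is left out):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    def rank_unlabelled(labelled, labels, unlabelled):
        """Retrain on everything annotated so far, then sort the remaining
        messages so the most likely bullying candidates come first."""
        model = make_pipeline(CountVectorizer(), SVC(kernel="linear"))
        model.fit(labelled, labels)
        # Higher decision values mean 'more like the bullying class'
        # (assuming labels 0 = normal, 1 = bullying).
        scores = model.decision_function(unlabelled)
        ranked = sorted(zip(scores, unlabelled), reverse=True)
        return [msg for _, msg in ranked]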

Fun future experiment

Word2Vec is a really cool tool that turns words into multidimensional vectors and then lets you do some interesting things with them. For example, many countries are decently 'close' to each other. And if you take 'king', subtract 'man' and add 'woman', the resulting vector is very close to the one for 'queen'.

It might be cool to use this to find other words that are close to strong feature words, and give those neighbours weight before the classifier ever encounters them in new data.
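With the gensim library and a pre-trained model this is only a few lines; the model name below is just one of the small pre-trained sets gensim can download, not the vectors I would necessarily train on tweets:

    import gensim.downloader as api

    # Small pre-trained word vectors (GloVe, 50 dimensions).
    vectors = api.load("glove-wiki-gigaword-50")

    # The classic analogy: king - man + woman ~= queen.
    print(vectors.most_similar(positive=["king", "woman"],
                               negative=["man"], topn=1))

    # And the idea above: expand a strong feature word with its neighbours.
    print(vectors.most_similar("hate", topn=5))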

South Park relevance

Finally, a few days before I presented my bachelor thesis, an episode of South Park came out in which Butters had to go through Cartman's social media feeds and remove all of the nasty messages. If he had had access to this tool, his job would have been a lot easier. Although maybe it should first be tweaked to show more nice messages than nasty ones, because he quickly got overwhelmed by the amount of negativity he was reading. Or, if the classifier got accurate enough, he wouldn't have to read any messages at all and could just let the computer do the job.

Here is the .pdf of the paper.