Machine learning and Twitter

My bachelor's assignment was a message classification problem in machine learning.

The goal was to separate bullying from non-bullying Twitter messages. I applied several machine learning algorithms to this task and compared their results.

For classification I used the Weka machine learning suite. I wrote a framework in Java for easily testing classifiers and their parameters against each other.

I also looked at other potential features, both in the metadata and in more detailed textual analysis: for example, the time of day a message was posted, the presence of swearing, and the negation of feature words.
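To make those three kinds of features concrete, here is a minimal Python sketch of how they could be extracted from a single message. It is not the code used in the assignment: the word lists, the `NOT_` prefix and the function name `extract_features` are all made up for illustration, and negation is only detected when a negation word comes directly before a feature word.

```python
from datetime import datetime

# Hypothetical, tiny word lists for illustration only.
SWEAR_WORDS = {"damn", "hell"}
NEGATIONS = {"not", "no", "never"}
FEATURE_WORDS = {"ugly", "stupid", "nice"}


def extract_features(text, posted_at):
    """Return a dict of metadata and textual features for one message."""
    # Very rough tokenisation: split on whitespace, strip punctuation.
    tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
    tokens = [t for t in tokens if t]

    features = {
        "hour_of_day": posted_at.hour,  # metadata feature
        "has_swearing": any(t in SWEAR_WORDS for t in tokens),
    }

    for i, token in enumerate(tokens):
        if token in FEATURE_WORDS:
            # Treat "not nice" as a different feature than "nice".
            negated = i > 0 and tokens[i - 1] in NEGATIONS
            features[("NOT_" if negated else "") + token] = True

    return features


example = extract_features("You are not nice!", datetime(2013, 5, 1, 23, 15))
```

Here `example` contains the posting hour, a swearing flag, and a `NOT_nice` feature instead of a plain `nice` feature.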

I also built a Python tool to help with manual classification. It uses the manually annotated dataset to sort incoming messages by how likely they are to be bullying messages. As the user classifies more messages, the automatic classifier improves, which makes it easier to pick out bullying messages.
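The sorting loop can be sketched roughly as follows. This is an illustration only: the text above does not say which classifier the tool used, so a small naive Bayes model with add-one smoothing is assumed here, and the class name `BullyingRanker` and its methods are hypothetical.

```python
import math
from collections import Counter


class BullyingRanker:
    """Ranks messages by estimated bullying likelihood (naive Bayes, assumed)."""

    def __init__(self):
        # Word and message counts per label (True = bullying).
        self.word_counts = {True: Counter(), False: Counter()}
        self.message_counts = {True: 0, False: 0}

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def add_label(self, text, is_bullying):
        """Record one manually classified message."""
        self.message_counts[is_bullying] += 1
        self.word_counts[is_bullying].update(self.tokenize(text))

    def score(self, text):
        """Log-odds that the message is bullying; higher means more likely."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        v = len(vocab) + 1  # +1 leaves room for unseen words
        total_pos = sum(self.word_counts[True].values())
        total_neg = sum(self.word_counts[False].values())

        # Smoothed prior, then one smoothed likelihood ratio per token.
        log_odds = math.log(
            (self.message_counts[True] + 1) / (self.message_counts[False] + 1)
        )
        for token in self.tokenize(text):
            p_pos = (self.word_counts[True][token] + 1) / (total_pos + v)
            p_neg = (self.word_counts[False][token] + 1) / (total_neg + v)
            log_odds += math.log(p_pos / p_neg)
        return log_odds

    def rank(self, messages):
        """Most likely bullying messages first."""
        return sorted(messages, key=self.score, reverse=True)


ranker = BullyingRanker()
ranker.add_label("you are so stupid", True)
ranker.add_label("have a nice day", False)
queue = ranker.rank(["what a nice day", "you are stupid"])
```

Each call to `add_label` updates the counts, so re-ranking the remaining messages after every manual decision gives the behaviour described above: the more messages are annotated, the better the ordering of the queue becomes.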