While social media might seem like a confusing tangle, discernible structures underlie and order online communities like Twitter, according to a new study by researchers from Royal Holloway, University of London and Princeton University.
The article by John Bryden, Sebastian Funk, and Vincent Jansen, titled “Word usage mirrors community structure in the online social network Twitter” and published in EPJ Data Science, describes how Twitter users group themselves (consciously or unconsciously) into “tribes” that are identifiable by shared language patterns, and whose members also tend to share vocations, political opinions, and hobbies.
The researchers tracked 75 million public messages sent on Twitter by around 250,000 users, and used these data to determine which individuals were more likely to talk to each other. They then correlated these relationships with linguistic patterns to identify a wide range of sub-communities, which they term “tribes.”
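The first step described above, working out who talks to whom and grouping users accordingly, can be illustrated with a small sketch. This is not the study's algorithm, which the article does not spell out; it swaps in a much simpler stand-in that links two users once they have exchanged a minimum number of messages and then takes connected groups. The function name `interaction_groups`, the `min_messages` threshold, and the toy user names are all hypothetical.

```python
from collections import Counter, defaultdict


def interaction_groups(messages, min_messages=2):
    """Group users who exchange at least `min_messages` messages.

    `messages` is an iterable of (sender, recipient) pairs.
    Returns a list of sets of user names, largest group first.
    Simplified stand-in, not the method used in the study.
    """
    # Count messages per unordered pair of users, ignoring self-messages.
    pair_counts = Counter(
        frozenset(pair) for pair in messages if pair[0] != pair[1]
    )

    # Keep only pairs that talk often enough, as an undirected graph.
    neighbors = defaultdict(set)
    for pair, count in pair_counts.items():
        if count >= min_messages:
            a, b = tuple(pair)
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Collect connected components with a depth-first search.
    seen, groups = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            user = stack.pop()
            if user in seen:
                continue
            seen.add(user)
            group.add(user)
            stack.extend(neighbors[user] - seen)
        groups.append(group)
    return sorted(groups, key=len, reverse=True)


# Hypothetical toy data, not taken from the study.
messages = [
    ("ana", "ben"), ("ben", "ana"), ("ben", "cat"), ("cat", "ben"),
    ("dov", "eli"), ("eli", "dov"),
    ("ana", "eli"),  # a single stray message, below the threshold
]
groups = interaction_groups(messages)
```

On this toy input the single message between "ana" and "eli" falls below the threshold, so the two groups stay separate. A real analysis at the scale of 75 million messages would need a proper community-detection method rather than plain connected components.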
Unsurprisingly, one such community was interested in -- perhaps “obsessed with” would be more accurate -- Justin Bieber, and members of this group displayed their own distinctive verbal tics. Jansen noted: “Interestingly, just as people have varying regional accents, we also found that communities would misspell words in different ways. The Justin Bieber fans have a habit of ending words in ‘ee’, as in ‘pleasee’.”
The Twitter language tribes get pretty, well, tribal, in the sense of being tightly focused on their parochial interests. According to Funk, another group, which the researchers nicknamed “anipals,” “was interested in hosting parties to raise funds for animal welfare, while another was a fascinating growing community interested in the concept of gratitude.”
The model passes the predictive test, according to the authors, who assert: “The words used by an individual user, in turn, can be used to predict the community of which that user is a member.” The authors report roughly 80% accuracy in predicting which community a Twitter user belongs to based on linguistic analysis.
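To make the idea of predicting a community from word usage concrete, here is a minimal sketch. The article does not say which classifier the authors used, so this swaps in a plain multinomial naive Bayes scorer with add-one smoothing and equal weight for every community. The function names `train` and `predict`, the community labels, and the sample messages are all hypothetical.

```python
import math
from collections import Counter


def train(community_texts):
    """Count word frequencies per community.

    `community_texts` maps a community name to a list of messages.
    Returns (per-community word counts, overall vocabulary).
    """
    counts = {
        name: Counter(w for msg in msgs for w in msg.lower().split())
        for name, msgs in community_texts.items()
    }
    vocab = set().union(*counts.values())
    return counts, vocab


def predict(text, counts, vocab):
    """Return the community whose word distribution best fits `text`.

    Illustrative naive Bayes scoring, not the study's method.
    """
    # Words never seen in training carry no information, so skip them.
    words = [w for w in text.lower().split() if w in vocab]
    best, best_score = None, -math.inf
    for name, wc in counts.items():
        total = sum(wc.values())
        # Add-one smoothing so a word unseen in one community
        # does not rule that community out entirely.
        score = sum(
            math.log((wc[w] + 1) / (total + len(vocab))) for w in words
        )
        if score > best_score:
            best, best_score = name, score
    return best


# Hypothetical toy messages, not taken from the study.
community_texts = {
    "bieber_fans": ["pleasee follow me", "i love youu pleasee"],
    "anipals": ["pawty tonight for the shelter",
                "raise funds for animal welfare"],
}
counts, vocab = train(community_texts)
guess_a = predict("pleasee follow back", counts, vocab)
guess_b = predict("funds for the shelter", counts, vocab)
```

Here the distinctive misspelling “pleasee” pulls the first message toward the hypothetical fan group, while the second message matches the fundraising vocabulary. Reproducing anything like the reported accuracy would require the real data and the authors' actual model.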