Mitigating online gender harassment through
1) user feedback, 2) empathetic innovation, and 3) data-science products

Disclaimer: Given the nature of the problem we're trying to mitigate with our products, please note that as you scroll through the website,
you will be exposed to some offensive and violent language.
Please continue at your own discretion.


Empathization is an effort that, thus far, has produced two data-science products aimed at fostering empathy in how people regard themselves and each other. The products are informed by user feedback and target an epidemic -- online gender harassment -- that is part of an even larger issue: global gender inequality, which exists both online and offline.

The current products revolve around two groups of Twitter users: people who repeatedly send online gender harassment tweets, and people who receive and are affected by such harassment. The products are built upon artificial intelligence (AI) algorithms that learn from human judgments of gender harassment and automate its detection.

Among tweets the algorithms flag as online gender harassment, 76-80% are flagged correctly. The user-facing products have shown the scalable potential to not only detect but also take action on over 1 million offensive tweets per week.

Global Problem

Unequal treatment of women cuts across race/ethnicity, nationality, and region. However, the opportunity to elevate gender equality exists through daily interpersonal interactions.

According to the United Nations Human Rights Office of the High Commissioner, gender equality refers to “equal rights, responsibilities and opportunities. . . However, after 60 years, it is clear that it is the human rights of women that we see most widely ignored around the world”. And McKinsey Global Institute estimates that advancing gender equality could add $12+ trillion per year to future global GDP. While both social and financial aspects exist, our efforts revolve around the social one.

Given that gender inequality comprises myriad issues, we target our efforts at a specific sub-issue that has reached epidemic proportions: online gender harassment. Such harassment is prevalent (Norton, 2016; Women, Action, & the Media, 2015; Pew Research Center, 2014) across Twitter, Facebook, YouTube, etc. Consequently, many women disengage from social media and share their perspectives less. Yet, women deserve equal opportunities to contribute online and offline.

Source: Norton, 2016 via Claire Reilly, c|net, 2016

Various women granted valuable interviews, helping us understand their personal experiences with online gender harassment on Twitter and what they view as potential solutions. Regarding potential solutions, women provided feedback that has been incorporated into the product work described in the next section. As for the global problem, some women said they anonymize their usernames to reduce gender-based backlash. Various women said their engagement with Twitter and other social media has declined. And some believe they are treated worse online, where offenders (men and women) reveal their true character. Moreover, people's harassing behavior can bleed into their offline behavior.

Tara Moss (Canadian-Australian author, women's rights advocate, and UNICEF ambassador) explains, “it's feeding into the higher rates of sexual violence and sexual harassment that women are experiencing in the physical world.” Other research uncovers:

  • In parts of the globe, nearly 1 in 2 women has been harassed online, while 3 out of every 4 women under age 30 have experienced online harassment. In addition, "Women are twice as likely to receive death threats online, and women are also twice as likely to receive threats of sexual violence and rape. They're also more likely to be the target of revenge porn, sextortion and sexual harassment" (Source: Norton, 2016 via Claire Reilly, c|net, 2016).
  • In some cases, accounts with feminine usernames incur an average of 100 sexually explicit or threatening messages a day, whereas accounts with masculine usernames receive 3.7 (2014 article citing a 2006 University of Maryland study)
  • WAM! (Women, Action, and the Media) study: "The vicious targeting of women, women of color, queer women, trans women, disabled women, and other oppressed groups who speak up online has reached crisis levels. Hate speech and violent threats are being used to silence the voices of women and gender non-conforming people in the public discourse every day. Examples of the impact these attacks are having on women’s lives are everywhere" (Women, Action, & the Media, 2015).
  • Twitter General Counsel Vijaya Gadde admits: "These users often hide behind the veil of anonymity on Twitter and create multiple accounts expressly for the purpose of intimidating and silencing people" (Washington Post, 2015).
  • “Online violence against women is an overt expression of the gender discrimination and inequality that exists offline. Online, it becomes amplified,” says Jac sm Kee of the Association for Progressive Communications (APC), a Global Fund for Women grantee partner, which provided the above examples of online violence and harassment. “The most important way to shift this is to enable women and girls to engage with the Internet at all levels – from use, creation, and development to the imagination of what it should and can be" (Global Fund for Women, 2015).

    The Automated Twitter Bot is the first of our products. This bot, disguised to appear as a young white male, is designed to detect gender harassment tweets and intervene by calling out the offensive language of the tweet in a reply to the offender. This product is designed with the intent to mitigate abusive online behavior at the source.

    The AI behind the bot detects gender harassment tweets using an ensemble of eight models: five Gradient Boosting Decision Trees (GBDTs), two Feed-Forward Neural Networks (FNNs), and one Logistic Regression (LR) model. Tweets are classified based on the average predicted probability of harassment across these models. Our default probability threshold is set at 70%, defined through the rigorous process of analyzing almost 20,000 tweets for language specifically indicative of gender harassment.
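The averaging-and-thresholding step can be sketched as follows. This is a minimal illustration, not our production code: it assumes eight pre-trained scikit-learn-style models, and the function names are ours for the example.

```python
import numpy as np

HARASSMENT_THRESHOLD = 0.70  # default threshold described above

def ensemble_probability(models, X):
    """Average each model's predicted probability of harassment."""
    probs = np.column_stack(
        [m.predict_proba(X)[:, 1] for m in models]  # P(harassment) per model
    )
    return probs.mean(axis=1)

def is_harassment(models, X, threshold=HARASSMENT_THRESHOLD):
    """Flag tweets whose averaged probability clears the threshold."""
    return ensemble_probability(models, X) >= threshold
```

Any object exposing a scikit-learn-style `predict_proba` (GBDT, FNN wrapper, or LR) can slot into the `models` list unchanged.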

    The method and message of intervention were informed by two studies. One used ReThink, a software product designed to prevent adolescents from sending or posting hurtful messages. The second was an NYU field experiment that addressed racial harassment on Twitter. Both studies found that checking offensive language with a simple message was effective.


    (Names and parts of some messages have been blacked out for privacy reasons)


    Field Experiment and Results

    Hypothesis: Post-intervention, the number of offensive tweets (tweets with a harassment probability of 70% or more) per offender in the treatment group will be lower than the number of offensive tweets per offender in the control group. The intervention is the bot's reply to the offender.

    Setup: Over 25 million tweets were run through our AI ensemble of models to identify about 4K offenders, excluding porn and bot accounts. Each bot is now set up to track around 1.5K offender accounts for offensive tweets.

    Randomization: As soon as a tweet from a selected offender is flagged as gender harassment, the offender is randomly placed in treatment or control. If they get placed in treatment, the bot replies to their offensive tweet six minutes later. No reply to offenders in the control group.
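A minimal sketch of that assignment logic, assuming Python; the function and variable names are illustrative, not our production code. The six-minute delay comes from the setup described above.

```python
import random

TREATMENT, CONTROL = "treatment", "control"
REPLY_DELAY_SECONDS = 6 * 60  # bot replies six minutes after the flagged tweet

assignments = {}  # offender_id -> group, fixed at the first flagged tweet

def assign_group(offender_id, rng=random):
    """Randomly place an offender in treatment or control, exactly once.

    Subsequent flagged tweets from the same offender keep the original
    assignment, so each offender stays in one arm of the experiment.
    """
    if offender_id not in assignments:
        assignments[offender_id] = rng.choice([TREATMENT, CONTROL])
    return assignments[offender_id]
```

Only offenders assigned to `TREATMENT` would receive the delayed bot reply; `CONTROL` offenders receive nothing.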

    The experiment ran April 15-23 (with results posted and presented previously) after a brief pilot study. A new, larger pilot study was run June-July 2017, while a full study will run July-August 2017. The rightmost column will be populated with full study results in September 2017.


    The Gender Harassment Tweets Blocker is the second product. In contrast to the first product, women can download this Chrome browser extension to automatically block tweets that the product predicts to be gender harassment. Based on feedback, the extension launches with two customizable features. Users can adjust a setting that automatically hides/removes tweets at their preferred threshold (e.g., tweets with a 60%+ chance of harassment, 80%+ chance of harassment, etc.). In addition, they can click a button to flag tweets the extension didn't block as gender harassment (similar to marking spam in email), or to flag tweets as not gender harassment (similar to restoring email that went to spam incorrectly).

    With positive user experience our top priority, we plan to add a complex but essential layer of web security around the product before release. In addition, the product will be accompanied by a 5-minute, step-by-step video tutorial showing how to quickly start using it. Please stay tuned for September 2017. . . The product has been demonstrated at UC Berkeley and at Google.

    Chrome Web Store: Free Downloadable Product To Be Released

    Artificial Intelligence

    We used active learning (machine learning) to collect enough tweets (dated 2017 and earlier) that human annotators, such as women workers on Mechanical Turk, regard as gender harassment. That allowed our AI models to better learn and predict what humans regard as gender harassment language and symbols.

    We found that roughly 0.09% (9 out of 10,000) of tweets are harassment. Rather than read through 10,000 tweets to find roughly 9 harassment ones, active learning (machine learning) helped us circumvent that needle-in-a-haystack search. We first labeled 1K tweets gathered via various methods (e.g., the Twitter live stream via API, Twitter keyword searches via API, harassment tweets cited in articles), then used our earliest baseline model (Logistic Regression) to output predicted probabilities on the tweets, before starting the cycle of active learning. Below are actual tweets we presented to an initial audience on 2/15/2017.

    [If you prefer not to read what many regard as highly offensive / misogynistic tweets, please bypass the table below, and jump to the next circular diagram.]

    With active learning -- iteratively moving back and forth between data collection and machine learning -- we retrained our earliest baseline model, improving its ability to predict probability of harassment on past labeled tweets and new unlabeled tweets. For instance, our earliest model predicted the tweet, “you deserve vagina cancer”, at only 50.6% probability of harassment. As the model learned further, it eventually predicted that tweet with 70%+ probability of harassment. Our active learning process. . .
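One round of that loop can be sketched as below. This is a simplified illustration, assuming a scikit-learn Logistic Regression baseline with TF-IDF features and uncertainty sampling (picking tweets the model is least sure about); the exact sampling strategy and feature pipeline we used may differ, and the human-labeling step is represented only by the returned indices.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def uncertainty_sample(model, X_unlabeled, k=100):
    """Pick the k tweets whose predicted probabilities are closest to 0.5."""
    probs = model.predict_proba(X_unlabeled)[:, 1]
    return np.argsort(np.abs(probs - 0.5))[:k]

def active_learning_round(texts_labeled, labels, texts_unlabeled, k=100):
    """Train on labeled tweets, then choose unlabeled tweets to annotate next."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(texts_labeled)
    model = LogisticRegression().fit(X, labels)
    X_u = vec.transform(texts_unlabeled)
    return uncertainty_sample(model, X_u, k)  # indices to send for human labeling
```

Each round, the newly labeled tweets are appended to the training set and the model is retrained, which is how the predicted probability for a tweet like the one above climbed from 50.6% to 70%+.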

    Our data collection process. . .

    We chose to leverage and tune three categories of models: Gradient Boosting Decision Trees (GBDTs), Feed Forward Neural Networks (FNNs), and Logistic Regression (LR). And we sought not the best single model but the best combination of models. Our artificial intelligence process. . .

    Rather than take a tweet's predicted probability of harassment from one model, we took a tweet's average predicted probability of harassment across multiple models for better reliability. Different models can make different mistakes in predicting the likelihood that tweets are offensive. For instance, for specific tweets, two models might predict low probability of gender harassment incorrectly, whereas six models might predict high probability of gender harassment correctly. By taking their average, the models can compensate for each other.
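A worked numeric version of that example (the eight probabilities below are made up for illustration):

```python
# Two models wrongly predict low probability of gender harassment,
# six correctly predict high probability -- averaging compensates.
preds = [0.30, 0.35] + [0.85, 0.88, 0.90, 0.82, 0.87, 0.91]
avg = sum(preds) / len(preds)
print(round(avg, 3))  # 0.735 -- the average still clears the 0.70 threshold
```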

    We used an automated approach that viewed the results of thousands of ensembles (where one ensemble refers to one combination of models). The graph below shows results for three separate combinations of models. The combination in the leftmost column is tied to our user-facing products: the Twitter Bot and Gender Harassment Tweets Blocker. Note that our flexible approach allows replacement of one ensemble with another, if users and stakeholders prefer different performance. [Technical Language (Optional): As an interim step, we trained our final ensemble on the labeled train + validation data, then ran it on the labeled test data once (AUC: 91.3%, Precision: 78.9%, Recall: 32.5%). Then we proceeded to create a sampling distribution of results. Our final ensemble and specific alternative ensembles were eventually retrained on all labeled data (train + validation + test), before linking our final ensemble to our user-facing products in the wild.]

    As we collect more labeled tweets via various channels, including via the Gender Harassment Tweets Blocker, our ensemble of models should improve even further.

    [Technical Language (Optional): Our final models analyze words, not characters, despite our preference for some models in our ensemble to analyze characters. For instance, we initially analyzed characters as well, leveraging the vectorizer setting "analyzer='char'" with random search across "ngram_range=(1,4)". Some of our initial GBDT models achieved about 95%+ precision and 80% recall. However, GBDT feature importances revealed that some single characters, such as " ' ", carried too much weight in the predicted probability of harassment, despite limited occurrences. So we concluded that a dataset larger than 18.8K tweets seems necessary to analyze characters, and that we should not use that character-level method until then. Thus, to be fair and reasonable, we discarded ensembles that use that method and yield better results, and instead selected ensembles that both perform well and should generalize to new tweets in the Twitter universe. As revealed in the graph above, we built a sampling distribution to show not only our averages, but the standard errors around those averages. The small standard errors indicate each ensemble's consistent performance on 15 cross-validation samples of 6K+ tweets. The 15 samples were derived by randomizing the seed for 5 iterations and, within each iteration, implementing 3-fold cross-validation.]
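The two featurization choices contrasted in that note can be sketched with scikit-learn's TfidfVectorizer (the parameter values mirror the note above; the two example tweets are just for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Character-level configuration we ultimately discarded (needs more data):
char_vec = TfidfVectorizer(analyzer='char', ngram_range=(1, 4))
# Word-level configuration our final models use:
word_vec = TfidfVectorizer(analyzer='word')

docs = ["you deserve vagina cancer", "have a great day"]
Xc = char_vec.fit_transform(docs)
Xw = word_vec.fit_transform(docs)

# Character n-grams produce a far larger, sparser feature space than words,
# which is why single characters like "'" could dominate on only 18.8K tweets.
print(Xc.shape[1] > Xw.shape[1])  # True
```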

    Our combination of 8 models (5 GBDTs, 2 FNNs, and 1 LR) yielded gender harassment probabilities across a sample of 46.2 million tweets. . .

    If users collectively tweet an average of 500 million times a day (David Sayce, November 2016; Business Insider, June 2015), our products (if scaled) could have not only detected but also responded to around 1.18 million tweets per week. In full transparency, that also means our products could have incorrectly flagged about 331,000 tweets per week. However, the AI underlying our Twitter Bot and Gender Harassment Tweets Blocker can be tuned for more correct predictions (in exchange for lower detection rates of harassment tweets). For instance, users of the Gender Harassment Tweets Blocker can change the default of 0.70 (hiding tweets with a 70%+ chance of harassment) to 0.85 (hiding tweets with an 85%+ chance of harassment) to have fewer tweets incorrectly flagged as gender harassment.
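The trade-off the user setting controls can be seen in a toy example (the probabilities below are made up for illustration):

```python
# Raising the cutoff from 0.70 to 0.85 flags fewer tweets:
# fewer false positives, but also fewer true detections.
probs = [0.55, 0.72, 0.78, 0.86, 0.91, 0.69, 0.83]

flagged_70 = [p for p in probs if p >= 0.70]
flagged_85 = [p for p in probs if p >= 0.85]
print(len(flagged_70), len(flagged_85))  # 5 2
```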

    Projected number of harassment tweets that our AI could have detected on Twitter's full dataset (3/6/2017 - 4/16/2017). The horizontal green line reflects the projected average of 168K harassment tweets a day across the 6-week timeframe.

    Future Possibilities

    Implement an existing list of user feedback for the Twitter Bot and Gender Harassment Tweets Blocker

    Continue to learn from users on product concepts aimed to mitigate online gender harassment

    Reach out to writers and organizations whose gender-harassment research and advocacy has inspired us to consider partnerships

    Create a corresponding tweets blocker for phone and tablet given 82% of Twitter active users are on mobile

    Continue field experiments for cause-effect conclusions on product impact

    Collect more labeled tweets and/or try other AI methods to further detect and mitigate online gender harassment


    Derek S. Chan

    Manager (Strategy, Product, & Analytics) at Oath (acquired Yahoo!); Artist in live theater

    Shruti van Hemmen

    Data Scientist at Intelisent

    Apekshit Sharma

    Software Engineer, Cloudera

    Women Who Granted Anonymized Interviews


    Joyce Shen

    Investment Director at Tenfore Holdings; Lecturer at UC Berkeley

    Alberto Todeschini

    Lecturer at UC Berkeley

    D. Alex Hughes

    Lecturer at UC Berkeley