Hate Speech Detector and Analytic Tool
This tool extracts comments from news articles based on filter keywords.
Comments are extracted and assigned weighted scores.
Identifies and exposes the culture of hate found in online comment sections.
Designed for UIUC CS410 Text Information Systems.
The application runs behind the scenes against a collection of articles and associated comments. Each article is evaluated against a list of keywords indicating it may be a topic that generates comments of interest. For the identified articles it then extracts all comments and ranks them according to the keywords discovered. A list of articles ranked by the content of their comments is then returned to the end user on the results page.
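The flow described above (filter articles by keyword, extract their comments, rank by the keywords found) can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the `rank_articles` and `tokenize` names, the article dictionary layout (`title`, `text`, `comments`), and the plain occurrence count used as a comment score are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rank_articles(articles, keywords):
    """Return (total_score, title, scored_comments) tuples, best first.

    `articles` is a hypothetical list of dicts with the keys
    "title", "text" and "comments" (a list of strings).
    """
    keywords = {k.lower() for k in keywords}
    ranked = []
    for article in articles:
        # Step 1: skip articles that mention none of the filter keywords.
        if not keywords & set(tokenize(article["text"])):
            continue
        # Step 2: score every comment by its number of keyword occurrences
        # (an assumed stand-in for the application's real scoring).
        scored = [
            (sum(tok in keywords for tok in tokenize(comment)), comment)
            for comment in article["comments"]
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Step 3: the article's score is the sum of its comment scores.
        total = sum(score for score, _ in scored)
        ranked.append((total, article["title"], scored))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

Calling `rank_articles(articles, ["race", "hate", "kill"])` on such a list would return only the articles whose text matches a keyword, ordered by the combined score of their comments.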
Currently the application comes preconfigured to run against a set of articles and comments that have been scraped from the web, so the user can test its functionality.
This application consists of several technologies customized to the needs of our target users. The installation instructions let an end user download the source code and provide links for setting up the application.
The "custom brew" of this application is its use of keyword weights. Typical search engines rank documents by a calculated score, for example BM25. Weights let the end user control the impact of each keyword on a word-by-word basis. The weights are stored in the /text/weights.txt file in the folder structure. The format of this file is:
{'race':100,
'hate':100,
'kill':100, ...}
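As an illustration of how a file in this format could be read and applied, here is a small sketch. It assumes the file holds a single Python-style dict literal, as the excerpt above suggests, and it scores a comment by simply summing the weights of the keywords it contains. How the application actually combines weights with BM25 is not described here, so the additive scoring and the `parse_weights` and `score_comment` helpers are hypothetical.

```python
import ast
import re


def parse_weights(raw):
    """Parse dict-literal text into a {keyword: weight} mapping."""
    # ast.literal_eval only evaluates literals, so it is safer than eval().
    weights = ast.literal_eval(raw.strip())
    if not isinstance(weights, dict):
        raise ValueError("weights text must contain a single dict literal")
    return {str(word).lower(): float(weight) for word, weight in weights.items()}


def score_comment(comment, weights):
    """Sum the weights of every keyword occurrence in the comment.

    Assumed additive scoring for illustration only.
    """
    tokens = re.findall(r"[a-z']+", comment.lower())
    return sum(weights.get(token, 0.0) for token in tokens)


# Only the three entries shown in the excerpt; the real file has more.
SAMPLE = "{'race':100,\n'hate':100,\n'kill':100}"
weights = parse_weights(SAMPLE)
```

To use the real file instead of `SAMPLE`, the text could be read with `open(path).read()` and passed to `parse_weights`. Raising a keyword's weight then increases the score of every comment that contains it.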
Please view our team video for a demonstration of how to edit keywords, weights, queries, and categories.
A link to the video can be found under the "how to install" section of this user guide.
We are grateful to the people listed in our references for making their source code open to the public,
and we want to make a special call-out to Nick Hirakawa for publishing what we used as the base of our BM25.
Coming Soon!