April 25, 2017

How to Install This Project

To run this application locally, you will need to install the software listed below.
These directions are for a Windows 10 machine.

Solution Overview

The following tools and technologies were used in the construction of this application. Some of them were removed in the end, but will be re-added.

Python      Visual Studio/Code   BM25
PyCharm     Scrapy               MetaPy
Brackets    HTML 5               JavaScript
Excel       Jupyter              CSS
Anaconda    JSON                 D3
Jira

Project overview

Our project is a custom implementation of the BM25 ranking algorithm. We have added a number of new features to the code base.

We make use of the BM25 algorithm derived from https://github.com/nhirakawa/BM25. We present our results using D3 (Data-Driven Documents), a JavaScript library for producing dynamic, interactive data visualizations in web browsers, which lets us visualize our final results for demonstration. In particular, we build on the code at https://bl.ocks.org/seemantk/3368f8c9b3d896965879.
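
For readers unfamiliar with BM25, the ranking idea can be sketched as follows. This is a minimal, self-contained illustration of the standard Okapi BM25 formula, not the code in this project or in the repository linked above. The function name bm25_scores, the defaults k1=1.2 and b=0.75, and the smoothed IDF variant are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    `documents` is a non-empty list of token lists. The parameter values
    are common defaults, assumed here for illustration only.
    """
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs

    # Document frequency: how many documents contain each term.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF variant that never goes negative.
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Calling bm25_scores(["hate"], docs) on a list of tokenized comments returns one score per comment. A comment that does not contain the query term scores 0, and repeated occurrences raise the score with diminishing returns.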

You can clone our repository at Get the Code!

To run our code, Python is required. A detailed installation guide can be found at https://realpython.com/installing-python/.
The application can be run from the command line with a Python shell, but we recommend using an IDE such as Visual Studio Code or PyCharm. We prefer PyCharm because its integrated hints make working with Python easy for a new developer. Visual Studio Code is free, which makes it a compelling choice. If you are a student, you can request a free one-year copy of PyCharm Professional from JetBrains.

Database Setup

We have removed the MongoDB portion of the application to make setup easier for the end user. Currently, the data is stored locally in CSV or JSON files.
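
Reading such local files needs only the Python standard library. The sketch below is an illustration under assumptions: the helper name load_records is hypothetical, and the project's actual file names and layouts are not described here, so the loader simply picks a parser from the file extension.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load data from a local .csv or .json file, chosen by extension.

    Hypothetical helper for illustration; not part of the project code.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    if suffix == ".csv":
        # newline="" is what the csv module expects for correct parsing.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    raise ValueError("Unsupported file type: " + suffix)
```

CSV rows come back as dictionaries keyed by the header row, with every value as a string, so numeric columns such as weights would still need to be converted with float().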

Website Configuration

To run the website you will need a local web server. PyCharm has one integrated into the shell, but you must set the default web server to port 8000. Alternatively, you can use the HTTP server that ships with Python (under the Anaconda distribution). These steps were written for Windows machines; on a Mac you will need to perform the equivalent steps.
  • Open a command prompt with Administrator rights
  • In the command prompt run
    python -V
    this allows you to verify that you are running Python. The version should be 3.5 or later
  • Using the cd commands navigate to the location where you have installed the application
  • Navigate down one level: cd html
  • Type in the following command exactly
    python -m http.server
  • Verify that it starts the server on port 8000
  • Click on index.html
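
The same server can also be started from a short Python script, which may be convenient when launching from an IDE. The sketch below is one possible equivalent of the command-line steps above, not part of the project. The function name make_server is hypothetical, and the directory argument it relies on requires Python 3.7 or later, which is newer than the 3.5 minimum stated above.

```python
import functools
import http.server
import sys


def make_server(directory, port=8000):
    """Build an HTTP server that serves files from `directory`.

    Hypothetical helper. Unlike `python -m http.server`, which listens on
    all interfaces by default, this binds to localhost only.
    """
    if sys.version_info < (3, 7):
        raise RuntimeError("The directory argument needs Python 3.7 or later")
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.HTTPServer(("127.0.0.1", port), handler)


# Example usage, run from the application folder (stop with Ctrl+C):
#     server = make_server("html")
#     server.serve_forever()
```

With the server running, the site should be reachable in a browser at http://localhost:8000.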

Setting up Keywords and Filters

The application ships with a default set of documents that allows you to run the application right away. These include a weights file, a query file, hate categories, a list of articles (100) and associated comments (~4000).

Our project video is at Project Light Overview.

Source articles and comments are required to be in a specific format with leading ## symbols. Please view the existing comments and articles under text in order to see more detail.

One interesting experiment for the end user is to add a custom word to a comments section, then update the weights and query to incorporate that term and see how changing the weight for that term drives the position of the document. We liked to use our own names and assign them higher and lower weights.
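
The effect of this experiment can be previewed with a toy example. The sketch below is a deliberate simplification, not the project's scoring code: it ranks documents by a plain weighted term count instead of BM25. The function name rank_documents and the layout of the weights dictionary are assumptions made for illustration.

```python
from collections import Counter


def rank_documents(documents, weights):
    """Return document indices ordered from highest to lowest score.

    Each score is the sum of weight * occurrences over the weighted terms.
    Simplified stand-in for the real ranking, for illustration only.
    """
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append(sum(w * counts[term] for term, w in weights.items()))
    # sorted() is stable, so tied documents keep their original order.
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


docs = [["great", "article"], ["alice", "wrote", "this"]]
low = rank_documents(docs, {"article": 1.0, "alice": 0.5})
high = rank_documents(docs, {"article": 1.0, "alice": 2.0})
```

Here low places the first document on top, while high moves the document containing the custom term "alice" to first position, which mirrors what raising a term's weight does in the application.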