Proposal: Crowdsourcing Online Comment Aggregation

Team Members: Sonia Ng and Amanda Strickler

Introduction: We propose to implement a system for online comment aggregation that combines computational linguistics methods with crowdsourced tasks to surface meaningful, representative comments for users. Our ideal use case is a post with 100 or more commenters: rather than reading every response before writing their own, a user would be shown one representative comment, or a small sample of them. This should encourage participation, since users would no longer feel they must wade through a large number of comments before they understand enough to respond. This approach also differs from the voting mechanisms many websites use to highlight notable comments.

One problem we foresee, however, is loss of information. If we select a representative comment, how “representative” is it? How can we ensure it captures the most relevant information from the discussion? Inevitably, we will have to accept some loss of information in exchange for convenience.

Technical Details: We plan to implement a web application for topic posting and commenting using Python and Google App Engine.

  • 10/31: Have a clustering algorithm working and a basic operational web application
  • 11/7: Submit a progress report
  • 11/14: Have a fully functional web application
  • 11/21-11/28: Run an experiment
  • 12/12: Present the prototype and submit a paper of findings

Objective: We intend to use TF-IDF (term frequency-inverse document frequency) to compare comments, cluster similar comments into sub-topics, and select representative comments from the entire input set. TF-IDF was recently used by Jagan Sankaranarayanan et al. in TwitterStand to process news drawn from Twitter tweets. In addition to TF-IDF clustering, users who wish to post will first be required to perform a task that contributes to the selection of representative comments. For this study, we will assume that most users will not mind performing a simple task before commenting (though we will ask for their feedback on it); we will try to make it as simple and straightforward as “liking” a Facebook comment. One such task could be choosing the top post from a set of “most representative” posts preselected by the clustering algorithm. We also believe this element of human judgment will yield higher-quality comment aggregation than automated methods alone can provide.
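As a sketch of the automated half of this pipeline, the following minimal Python example builds TF-IDF vectors over a set of comments and selects the comment most similar, on average, to all the others. The function names, whitespace tokenization, and "highest mean cosine similarity" selection rule are our own illustrative choices, not a committed design; a real system would use a proper tokenizer and a clustering step before selecting one representative per cluster.

```python
import math
from collections import Counter


def tf_idf_vectors(comments):
    """Build a sparse TF-IDF vector (term -> weight) for each comment."""
    tokenized = [c.lower().split() for c in comments]  # naive tokenization
    n = len(tokenized)
    # Document frequency: how many comments contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (count / len(tokens)) * math.log(n / df[t])
                        for t, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_representative(comments):
    """Return the comment with the highest mean similarity to the others."""
    vecs = tf_idf_vectors(comments)
    scores = [sum(cosine(u, v) for v in vecs if v is not u) / (len(vecs) - 1)
              for u in vecs]
    return comments[max(range(len(comments)), key=scores.__getitem__)]
```

In the full proposal, a candidate set chosen this way would then be shown to users, whose "pick the top post" task supplies the human-judgment signal on top of the automated ranking.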

We hope to recruit classmates as users for our experiment. If possible, the class paper readings will be posted both to our website and to Google Plus, and students will submit comments on the papers at both sites. Because selecting representative comments is a subjective task, we will likely evaluate our success manually and intuitively; we may also solicit feedback from classmates after the experiment.

Source: Jagan Sankaranarayanan, Hanan Samet, Benjamin E. Teitler, Michael D. Lieberman, and Jon Sperling. 2009. TwitterStand: news in tweets. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '09). ACM, New York, NY, USA, 42-51.

PROJECT PROGRESS REPORT (added November 7, 2011):
Progress Report for Crowdsourced Comment Aggregation