True or False: Analyzing Customer Reviews In the Market

Team Members: Amanda Strickler, Sonia Ng


The question we are trying to answer is: can we identify which customer reviews are truthful and which are fraudulent? Most businesses now have a customer review system that influences the reputation of a product or business, so we want to find out how many of these reviews are genuine customer opinions. To answer this, we will come up with a set of criteria that help determine whether a post is genuine or not. We will also build a tool to test these criteria.

From our research, work has already been done in this area. For instance, Yoo and Gretzel (2009) manually compared linguistic differences between hotel reviews, and in 2011, Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey T. Hancock came up with an automated process to determine the degree of truthfulness in customer reviews on Yelp and TripAdvisor.

Technical Details

We plan to get in touch with Ott et al. and see if we can get a copy of their code. This will give us a tool to extract and analyze customer reviews. We will then modify it to test our criteria for determining the degree of authenticity of a post.

We are also thinking of talking to faculty members in the Department of Computational Linguistics to ask which factors we should look into when evaluating truthfulness in customer reviews, and which computational linguistics techniques they recommend for this.

In terms of the data we will use, we will focus on one or two review websites, for example, and


We aim to create a tool that determines whether a customer review is truthful or fraudulent, and to use it to analyze data from one or two sites.
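As a rough illustration of what such a tool could look like, the sketch below flags a review when it exceeds thresholds on a few criteria. The two criteria, their thresholds, and all function names are hypothetical placeholders chosen only for the example; they are not findings from prior work and not the criteria we will end up using.

```python
import re

# Hypothetical criteria: each maps review text to a number.
# These are placeholders for illustration, not validated signals.
def exclamation_rate(text):
    """Exclamation marks per whitespace-separated word."""
    words = text.split()
    return text.count("!") / len(words) if words else 0.0

def first_person_rate(text):
    """Share of words that are first-person pronouns."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    first_person = {"i", "me", "my", "mine", "we", "us", "our"}
    return sum(w in first_person for w in words) / len(words)

# Criterion name -> (function, threshold). Thresholds are arbitrary assumptions.
CRITERIA = {
    "exclamation_rate": (exclamation_rate, 0.05),
    "first_person_rate": (first_person_rate, 0.10),
}

def flagged_criteria(text):
    """Return the names of the criteria whose threshold the review exceeds."""
    return [name for name, (fn, threshold) in CRITERIA.items()
            if fn(text) > threshold]

def classify(text, min_flags=2):
    """Label a review 'fraudulent' if at least min_flags criteria are exceeded."""
    return "fraudulent" if len(flagged_criteria(text)) >= min_flags else "truthful"
```

Keeping each criterion as a separate function would let us add, remove, or re-threshold criteria without changing the rest of the tool.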

The accuracy of our tool and our criteria will determine how successful we are at this task.
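To make the success measure concrete, here is a minimal sketch of how accuracy could be computed once we have reviews with known labels. It assumes each review is labeled either "truthful" or "fraudulent"; the function names and label strings are our own assumptions for illustration.

```python
def accuracy(predicted, actual):
    """Fraction of reviews whose predicted label matches the known label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labeled review")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def confusion_counts(predicted, actual, positive="fraudulent"):
    """Count true/false positives and negatives, treating `positive` as the target label."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for p, a in zip(predicted, actual):
        if p == positive and a == positive:
            counts["tp"] += 1
        elif p == positive:
            counts["fp"] += 1
        elif a == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts
```

The confusion counts are included because accuracy alone can be misleading when one label is much more common than the other in the data.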


M. Ott, Y. Choi, C. Cardie, and J.T. Hancock. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.

Comment (Ben): Definitely an important area, and I'm glad to see that you have started looking at the related work. One of the big challenges here is having a reliable dataset. Getting Ott et al.'s data set may be even more valuable than their code. Also, given that they have already identified some linguistic characteristics of truthful reviews, my big question is - what makes you think you can do better than them? In order to pursue this, I think you need a solid idea of what technical approach you will take to make an advance here.