With today's availability of data, the growth of customer-generated data, in the form of reviews and ratings given to establishments, has created an opportunity to explore how this data can be harnessed.
This research attempts a thorough analysis of the elements of customer reviews that contribute to the overall customer experience, providing theoretical and practical insights for the hospitality industry, especially regional hotels. The purpose of this research is thus to use big data to understand the linguistic attributes of customer reviews that lead to guest satisfaction and dissatisfaction.
This research was conducted as part of my work as a member of a research lab. Together with colleagues, I took on a project to explore whether something could be done beyond the usual positive and negative sentiment analysis. Following a literature review, we decided to extend it by examining the linguistic properties of reviews alongside user characteristics.
We decided to test and observe hotels and accommodations on platforms such as Traveloka and Airbnb.
The key contribution of this study is that it links a technical dimension of online reviews, namely their descriptive style, with overall customer satisfaction. Building on previous studies, the research objective is to investigate the role of the technical variables of online customer reviews, including subjectivity, diversity, readability, sentiment polarity, lexicon and review length, in predicting overall customer satisfaction, along with the role of customers' review participation (involvement) in influencing overall customer satisfaction.
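$$\text{Customer Overall Rating} = \beta_0 + \beta_1\,\text{Involvement} + \beta_2\,\text{Readability} + \beta_3\,\text{Polarity} + \beta_4\,\text{Subjectivity} + \beta_5\,\text{Word Count} + \beta_6\,\text{Lexicon} + \beta_7\,\text{Diversity} + \beta_8\,\text{Star Level}$$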
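The model above is a standard multiple linear regression, so its coefficients can be estimated by ordinary least squares. The following is a minimal sketch with NumPy, not the study's actual estimation code (the software used is not stated). The predictor values and the "true" coefficients are synthetic and chosen only to show that the fit recovers known values; they are not the study's data or results.

```python
import numpy as np

# Predictors in the order they appear in the model (beta1 .. beta8).
PREDICTORS = ["involvement", "readability", "polarity", "subjectivity",
              "word_count", "lexicon", "diversity", "star_level"]

def fit_ols(X, y):
    """Return [beta0, beta1, ..., beta8] estimated by ordinary least squares."""
    # Prepend a column of ones so the first coefficient is the intercept beta0.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic illustration only: random predictors and made-up coefficients.
rng = np.random.default_rng(0)
n = 500
X = rng.random((n, len(PREDICTORS)))
true_beta = np.array([3.0, 0.5, -0.2, 2.0, -0.3, -0.1, 0.4, 0.8, 0.1])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0.0, 0.05, n)

beta = fit_ols(X, y)
for name, b in zip(["intercept"] + PREDICTORS, beta):
    print(f"{name:>12}: {b:+.3f}")
```

`lstsq` gives point estimates only. A library such as statsmodels (`OLS(...).fit().summary()`) additionally reports the standard errors and p-values needed to judge which variables are significant.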
The hotel industry constantly strives to please travelers, but travelers usually have different tastes depending on their destination, travel style and previous experiences. In the hospitality industry, there is growing interest in using user-generated data to gain insight into research problems that traditional methods have not addressed well. In particular, the experience and satisfaction of hotel guests have long been a subject of interest, as they are generally accepted to lead to consumer loyalty, repeat transactions, positive word-of-mouth and, eventually, higher profitability.
Reviews describe the reviewer's experience after visiting a place and offer suggestions and tips for prospective tourists to consider before visiting. Reviews and feedback are very important for companies as well as for governments, because they provide useful insights into goods and services. In the age of digital tourism, many customers book hotels online and post comments after their stay. These online reviews, consisting of both textual comments and ratings, create an electronic word-of-mouth (eWOM) effect that influences future customer demand and the financial performance of hotels, and therefore have significant business value.
Online reviews written by consumers are a significant aspect of hotels' online business around the world, because prospective guests want to know the product and service quality that previous consumers experienced. Through online feedback, accommodation owners try to understand guests' satisfaction and expectations in order to improve their marketing strategy and decision making.
Customer satisfaction can be measured as the difference between the perceived quality of a product or service and the customer's pre-purchase expectations. Many studies have found that customer satisfaction plays an important role in encouraging consumers' behavioral engagement, which results in favorable feedback, repeat purchases and recommendations of the product or service to others.
Aside from the numeric rating, the textual review is also important to evaluate. Its advantage is that it better reflects the customer's experience and perception of the service. Textual reviews have an open structure that lets authors express their opinions without formatting constraints, and they carry each customer's linguistic style, which may hint at their overall evaluation of the product.
This study uses both quantitative and qualitative research techniques to determine which linguistic elements of a review significantly affect the overall customer satisfaction level for Traveloka accommodations in Bandung. Satisfaction and dissatisfaction are analyzed using a big data approach. The samples are collected from the Traveloka website and cover 1- to 5-star properties, limited to properties with at least 100 reviews to reduce bias. The number of reviews per property in Bandung Raya ranges from 100 to more than 7,000. In total, the researcher collected 199,554 reviews across hotels, apartments, villas, resorts and bed and breakfast properties, which were then compiled for pre-processing.
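The sampling rule described above (1- to 5-star properties with at least 100 reviews) can be sketched as a simple filter. The `Property` record layout and the helper names below are hypothetical; the study does not describe how its scraped data were structured.

```python
from __future__ import annotations
from dataclasses import dataclass

MIN_REVIEWS = 100  # properties with fewer reviews are excluded to reduce bias

@dataclass
class Property:
    # Hypothetical record layout, not the study's actual data schema.
    name: str
    star_level: int        # 1 to 5
    reviews: list[str]

def select_sample(properties):
    """Keep 1- to 5-star properties that have at least MIN_REVIEWS reviews."""
    return [p for p in properties
            if 1 <= p.star_level <= 5 and len(p.reviews) >= MIN_REVIEWS]

def flatten_reviews(properties):
    """Pair each review with its property's star level for pre-processing."""
    return [(p.star_level, text) for p in properties for text in p.reviews]

props = [
    Property("Hotel A", 3, ["Clean room."] * 150),
    Property("Villa B", 4, ["Nice view."] * 40),  # too few reviews: dropped
]
sample = select_sample(props)
print([p.name for p in sample])       # ['Hotel A']
print(len(flatten_reviews(sample)))   # 150
```

Keeping the star level alongside each review means it is available later as the control variable in the regression.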

The collected data then go through a series of pre-processing steps. Normalization improves computational performance and puts variables on a comparable scale, so that no variable dominates merely because of its range. The research uses Min-Max normalization. The advantage of this method is that it preserves the relative balance between values before and after normalization and does not produce biased data. The variables normalized are review length and lexicon, because their ranges are extreme and differ widely from those of the other variables.
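Min-Max normalization rescales each value linearly using the minimum and maximum of its variable. A minimal sketch, assuming the common target range of [0, 1] (the study does not state its target range):

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero and map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

word_counts = [10, 20, 50, 110]
print(min_max_normalize(word_counts))  # [0.0, 0.1, 0.4, 1.0]
```

Because the transformation is linear, the ordering and relative spacing of the values are unchanged, which is the "balance before and after normalization" property mentioned above.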
The research uses TextBlob and lexicon processors to measure the linguistic aspects of the reviews. The collected data are compiled and fed into a linguistic processor that comprises the TextBlob and lexicon processors. Other processors, such as NLTK and CoreNLP, were considered, but TextBlob and the lexicon processors best fulfil the research needs. The resulting linguistic properties are then used in a regression analysis to estimate their effect on customer satisfaction, which is measured by the rating in each individual customer review. The regression is a multiple linear regression of one dependent variable, the customer's overall rating, on seven independent variables (involvement, readability, polarity, subjectivity, word count, lexicon and diversity), with star level as a control variable.
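Polarity and subjectivity come from TextBlob's `sentiment` property, which returns polarity in [-1.0, 1.0] and subjectivity in [0.0, 1.0]. The text does not define how word count, diversity and readability were computed, so the stdlib-only sketch below assumes common choices: word count as the number of word tokens, diversity as the type-token ratio (distinct words divided by total words, so fewer repeated words give a higher value) and readability as the Flesch-Kincaid grade level (higher means harder to read). The syllable counter is a rough heuristic, and the lexicon variable is omitted because its definition is not given.

```python
import re

def tokenize(text):
    """Lowercased word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def count_syllables(word):
    """Rough heuristic: number of vowel groups, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def word_count(text):
    return len(tokenize(text))

def diversity(text):
    """Type-token ratio (assumed diversity measure): distinct / total words."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def fk_grade(text):
    """Flesch-Kincaid grade level (assumed readability measure)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    syllables = sum(count_syllables(t) for t in tokens)
    return 0.39 * len(tokens) / sentences + 11.8 * syllables / len(tokens) - 15.59

review = "The room was clean. The room was quiet."
print(word_count(review), diversity(review), round(fk_grade(review), 2))  # 8 0.625 -2.23
```

Each review yields one row of predictor values, which together with its rating forms one observation for the regression.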

The results can be described as follows. The largest positive coefficient belongs to polarity (β3: 2.475), followed by diversity (β7: 0.800), star level (β8: 0.127), lexicon (β6: 0.048) and involvement (β1: 0.006). The negative coefficients, in order of increasing magnitude, are readability (β2: -0.003), word count (β5: -0.087) and subjectivity (β4: -0.237). Subjectivity has a negative effect on customer ratings: more subjective online reviews are associated with lower customer ratings. Subjectivity arises when customers express their emotions through online reviews. Diversity has a positive effect on customer ratings: greater diversity in online reviews, meaning that users repeat words less often, is associated with higher ratings. Readability has a negative effect on customer ratings. A higher readability score means that readers need a higher level of knowledge and comprehension to understand the meaning of the text.
Polarity has a positive effect on customer ratings: more positive opinions are associated with higher customer ratings. The more intense the sentiment, the more convincing the expression is to readers. Word count has a negative effect on customer ratings, as wordier comments tend to come with lower ratings. Review involvement has a positive effect on customer ratings: reviewers with higher involvement tend to give hotels higher scores. Customers who are more involved in online reviewing have stayed in more hotels, which makes it easier for them to compare hotels. Lastly, this study adds a new variable, lexicon, which has a positive effect on customer ratings. The diversity of lexical choices and the appropriate lexical type can influence a reader's judgment of a text, and lexical capability is an essential component of language skill and fluency.
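Because the model is linear, each coefficient can be read as the change in the predicted overall rating for a one-unit change in that variable, holding all other variables fixed. As a worked illustration using two of the reported coefficients (the step of 0.1 is an arbitrary choice, and coefficient magnitudes are directly comparable only when the variables share a scale):

$$\Delta \widehat{\text{Rating}} = \beta_3\,\Delta\text{Polarity} = 2.475 \times 0.1 \approx 0.25$$
$$\Delta \widehat{\text{Rating}} = \beta_4\,\Delta\text{Subjectivity} = -0.237 \times 0.1 \approx -0.024$$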
The theoretical implication of this study is that it confirms the relationship between customer ratings and linguistic properties, ranging from conventional sentiment analysis to the diversity of the lexicon used. Using linguistic characteristics makes a review's score more informative and better reflects the customer's perspective, because it captures more insight than a single score. This study confirms the findings of previous studies cited in the literature review, and it also indicates that the effects of linguistic properties are not specific to one platform or geolocation but hold across language nativity, geolocation and platform. Finally, this study highlights the relationship between the lexical diversity of a customer review and the overall score given, as an alternative or supporting variable to sentiment analysis.
Three variables show a negative correlation with customer rating: readability, subjectivity and word count. Subjectivity has a coefficient of about -0.24 on customer rating; a high subjectivity value indicates a more emotionally charged review and tends to pull the overall review score down. Positive sentiment in a review correlates positively with the overall score the customer gave, and reviewers who write reviews often appear more inclined to rate positively. The last significant factor is involvement, which positively affects the overall score. These results allow businesses to measure and analyse their customer satisfaction better and to allocate resources optimally by targeting specific linguistic criteria when quantifying their customers' reviews. Accommodation and hospitality managers could gain a new view of their customers by looking into the more detailed aspects of reviews.
This study is limited to one greater metropolitan area and its satellite cities, and its data come from a single online platform. It is also limited to specific common linguistic factors, and although it uses data from a user base that uses English as a second language, it does not take into account other potential variables such as cultural background. Future research could broaden the sample base, change the object of study or the data source, and collect data at the provincial level. Further research could also extend the analysis to hospitality industries other than accommodation, such as culinary or amusement businesses.