PRFashion24: A Dataset for Sentiment Analysis of Fashion Products Reviews in Persian

Read original: arXiv:2405.18060 - Published 5/29/2024 by Mehrimah Amirpour, Reza Azmi

Overview

  • The PRFashion24 dataset is a comprehensive Persian dataset collected from various online fashion stores, spanning from April 2020 to March 2024.
  • The dataset contains 767,272 reviews and is the first Persian-language dataset to cover diverse categories within the fashion industry.
  • The study aimed to harness deep learning techniques, specifically Long Short-Term Memory (LSTM) networks and a combination of Bidirectional LSTM and Convolutional Neural Network (BiLSTM-CNN), to analyze and reveal sentiments towards online fashion shopping.

Plain English Explanation

The PRFashion24 dataset is a collection of over 767,000 reviews from Persian-language online fashion stores, gathered over four years (April 2020 to March 2024). It is the first Persian-language dataset to offer such a broad look at how people feel about shopping for fashion online.

The researchers used two different deep learning models, LSTM and BiLSTM-CNN, to analyze the sentiment expressed in these reviews. LSTM is a type of artificial neural network that is well-suited for processing sequential data, like text. BiLSTM-CNN combines a Bidirectional LSTM, which can analyze text from both directions, with a Convolutional Neural Network, which is good at identifying patterns in data.

The LSTM model correctly predicted the sentiment (positive or negative) of the reviews 81.23% of the time. The BiLSTM-CNN model performed slightly better, reaching an accuracy of 82.89%. The analysis also indicates that opinions about online fashion shopping among Persian-speaking shoppers are predominantly positive.

The researchers plan to release the dataset and the optimized models on GitHub upon publication, with the aim of advancing sentiment analysis for Persian and encouraging further research in this area.

Technical Explanation

The researchers collected the PRFashion24 dataset from various Persian-language online fashion stores over a four-year period, from April 2020 to March 2024. The dataset consists of 767,272 reviews and is the first Persian-language dataset to cover diverse categories within the fashion industry.

To analyze the sentiment expressed in these reviews, the researchers employed two deep learning techniques: Long Short-Term Memory (LSTM) networks and a combination of Bidirectional LSTM and Convolutional Neural Network (BiLSTM-CNN). LSTM is a type of recurrent neural network that is well-suited for processing sequential data, such as text, and can capture long-term dependencies. The BiLSTM-CNN model combines the strengths of a Bidirectional LSTM, which can analyze text from both directions, and a Convolutional Neural Network, which is adept at identifying patterns in data.
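To make the architecture described above more concrete, the following is a minimal, untrained forward pass written in plain Python. It is a sketch under stated assumptions, not the authors' implementation: the ordering (BiLSTM first, then a 1D convolution with max-over-time pooling), the layer sizes, the omission of biases, and all names such as `LSTMCell`, `bilstm`, and `conv1d_maxpool` are illustrative choices that the summary does not specify. The weights are random, so the output probability carries no meaning.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LSTMCell:
    """Standard LSTM cell; biases are omitted to keep the sketch short."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x_t, h_prev].
        self.w = {
            gate: make_matrix(hidden_size, input_size + hidden_size, rng)
            for gate in "ifog"
        }

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(a) for a in matvec(self.w["i"], z)]    # input gate
        f = [sigmoid(a) for a in matvec(self.w["f"], z)]    # forget gate
        o = [sigmoid(a) for a in matvec(self.w["o"], z)]    # output gate
        g = [math.tanh(a) for a in matvec(self.w["g"], z)]  # candidate values
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, seq):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in seq:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm(seq, fwd, bwd):
    """Read the sequence in both directions and concatenate states per token."""
    left = fwd.run(seq)
    right = bwd.run(seq[::-1])[::-1]  # re-align backward states with tokens
    return [a + b for a, b in zip(left, right)]


def conv1d_maxpool(states, filters, width):
    """Slide each filter over windows of `width` steps, apply ReLU, max over time."""
    pooled = []
    for filt in filters:
        best = 0.0
        for t in range(len(states) - width + 1):
            window = [v for state in states[t:t + width] for v in state]
            activation = max(0.0, sum(w * v for w, v in zip(filt, window)))
            best = max(best, activation)
        pooled.append(best)
    return pooled


# Hypothetical sizes chosen only for illustration.
EMBED, HIDDEN, WIDTH, N_FILTERS = 4, 3, 2, 5
rng = random.Random(0)
fwd = LSTMCell(EMBED, HIDDEN, rng)
bwd = LSTMCell(EMBED, HIDDEN, rng)
filters = make_matrix(N_FILTERS, WIDTH * 2 * HIDDEN, rng)
out_weights = [rng.uniform(-0.5, 0.5) for _ in range(N_FILTERS)]

review = make_matrix(6, EMBED, rng)  # six stand-in token embeddings
states = bilstm(review, fwd, bwd)
features = conv1d_maxpool(states, filters, WIDTH)
prob_positive = sigmoid(sum(w * f for w, f in zip(out_weights, features)))
```

In this arrangement the BiLSTM gives every token a representation informed by both its left and right context, and the convolution then looks for local patterns across neighbouring tokens. A practical model would be built and trained with a deep learning framework rather than hand-rolled like this.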

The LSTM model achieved an accuracy of 81.23% in predicting the sentiment (positive or negative) of the reviews. The BiLSTM-CNN model performed somewhat better, reaching an accuracy of 82.89%, a gap of 1.66 percentage points. These results indicate that deep learning techniques are effective for analyzing sentiment in Persian-language fashion reviews.
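For readers less familiar with the metric, accuracy is simply the share of reviews whose predicted label matches the true label. The sketch below computes it on a handful of made-up labels (not taken from PRFashion24) and also computes a majority-class baseline, which is one common way to put an accuracy figure in context when one class dominates. The function names and the toy labels are assumptions for illustration only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    if not y_true:
        raise ValueError("label list must be non-empty")
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# Toy labels, invented for this example.
y_true = ["pos", "pos", "pos", "neg", "pos", "neg", "pos", "pos"]
y_pred = ["pos", "pos", "neg", "neg", "pos", "pos", "pos", "pos"]

toy_accuracy = accuracy(y_true, y_pred)
toy_baseline = majority_baseline(y_true)

# Gap between the two accuracies reported in the summary, in percentage points.
reported_gap = round(82.89 - 81.23, 2)
```

On these toy labels the classifier and the always-positive baseline happen to score the same, which illustrates why a baseline is worth reporting alongside accuracy. The summary does not state the class balance of PRFashion24, so no such comparison is made here for the reported figures.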

The researchers state that both the optimized models and the PRFashion24 dataset will be made available on GitHub upon publication, with the aim of advancing sentiment analysis for Persian and encouraging further research in this area.

Critical Analysis

The PRFashion24 dataset and the associated deep learning models provide valuable insights into the sentiments of Persian-speaking online fashion shoppers. However, there are a few areas that could be explored further.

One potential limitation is the representativeness of the dataset. While it is the first of its kind in the Persian language, it is still limited to a specific time frame and may not capture the full range of opinions and experiences across all demographics and geographic regions. Expanding the dataset to include a more diverse set of reviews over a longer period could help strengthen the findings.

Additionally, the study focuses solely on sentiment analysis, but it might be interesting to explore other aspects of the data, such as the relationships between product categories, brand preferences, and customer behavior. Incorporating these elements into the analysis could provide a more comprehensive understanding of the Persian fashion industry.

Furthermore, the researchers could consider applying other deep learning architectures or hybrid approaches, such as the use of attention mechanisms or multimodal models, to see if they can further improve the sentiment analysis performance.
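As one illustration of what an attention mechanism could add, the sketch below implements simple dot-product attention pooling over a sequence of hidden states: each state is scored against a query vector, the scores are normalized with a softmax, and the states are averaged using those weights. This is a generic technique, not something the paper is reported to use; the function name `attention_pool`, the query vector, and the toy states are all assumptions.

```python
import math


def attention_pool(states, query):
    """Return softmax attention weights and the weighted average of the states."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, states))
        for d in range(dim)
    ]
    return weights, context


# Three toy hidden states; in a trained model the query would be learned.
toy_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 1.0]
weights, context = attention_pool(toy_states, toy_query)
```

Here the third state aligns best with the query and therefore receives the largest weight. In a sentiment model, this kind of pooling lets the classifier focus on the tokens that carry the most opinion, instead of treating every position equally.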

Overall, the PRFashion24 dataset and the deep learning models presented in this study represent a valuable contribution to sentiment analysis for the Persian language. Once the dataset and models are released as planned, they should make further research in this area considerably easier.

Conclusion

The PRFashion24 dataset and the associated deep learning models provide a comprehensive analysis of sentiment in Persian-language online fashion reviews. With over 767,000 reviews spanning diverse fashion categories, this dataset is a significant step forward in understanding the attitudes and opinions of Persian-speaking online shoppers.

The LSTM and BiLSTM-CNN models developed in this study demonstrate the effectiveness of deep learning techniques in sentiment analysis, achieving accuracies of 81.23% and 82.89%, respectively. These findings not only contribute to the understanding of sentiment analysis in the Persian-language context but also have the potential to inform businesses and policymakers about the preferences and concerns of Persian-speaking fashion consumers.

The planned release of the dataset and optimized models on GitHub should open the door to further exploration and innovation in this field. This research sets the stage for future studies that could examine the relationships between product categories, brand preferences, and customer behavior, as well as apply more advanced deep learning architectures to sentiment analysis in the Persian-language fashion domain.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PRFashion24: A Dataset for Sentiment Analysis of Fashion Products Reviews in Persian

Mehrimah Amirpour, Reza Azmi

The PRFashion24 dataset is a comprehensive Persian dataset collected from various online fashion stores, spanning from April 2020 to March 2024. With 767,272 reviews, it is the first dataset of its kind that encompasses diverse categories within the fashion industry in the Persian language. The goal of this study is to harness deep learning techniques, specifically Long Short-Term Memory (LSTM) networks and a combination of Bidirectional LSTM and Convolutional Neural Network (BiLSTM-CNN), to analyze and reveal sentiments towards online fashion shopping. The LSTM model yielded an accuracy of 81.23%, while the BiLSTM-CNN model reached 82.89%. This research aims not only to introduce a diverse dataset in the field of fashion but also to enhance the public's understanding of opinions on online fashion shopping, which predominantly reflect a positive sentiment. Upon publication, both the optimized models and the PRFashion24 dataset will be available on GitHub.

5/29/2024

Prompt2Fashion: An automatically generated fashion dataset

Georgia Argyrou, Angeliki Dimitriou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou

Despite the rapid evolution and increasing efficacy of language and vision generative models, there remains a lack of comprehensive datasets that bridge the gap between personalized fashion needs and AI-driven design, limiting the potential for truly inclusive and customized fashion solutions. In this work, we leverage generative models to automatically construct a fashion image dataset tailored to various occasions, styles, and body types as instructed by users. We use different Large Language Models (LLMs) and prompting strategies to offer personalized outfits of high aesthetic quality, detail, and relevance to both expert and non-expert users' requirements, as demonstrated by qualitative analysis. Up until now the evaluation of the generated outfits has been conducted by non-expert human subjects. Despite the provided fine-grained insights on the quality and relevance of generation, we extend the discussion on the importance of expert knowledge for the evaluation of artistic AI-generated datasets such as this one. Our dataset is publicly available on GitHub at https://github.com/georgiarg/Prompt2Fashion.

9/16/2024

Sentiment Analysis Dataset in Moroccan Dialect: Bridging the Gap Between Arabic and Latin Scripted dialect

Mouad Jbel, Mourad Jabrane, Imad Hafidi, Abdulmutallib Metrane

Sentiment analysis, the automated process of determining emotions or opinions expressed in text, has seen extensive exploration in the field of natural language processing. However, one aspect that has remained underrepresented is the sentiment analysis of the Moroccan dialect, which boasts a unique linguistic landscape and the coexistence of multiple scripts. Previous works in sentiment analysis primarily targeted dialects employing Arabic script. While these efforts provided valuable insights, they may not fully capture the complexity of Moroccan web content, which features a blend of Arabic and Latin script. As a result, our study emphasizes the importance of extending sentiment analysis to encompass the entire spectrum of Moroccan linguistic diversity. Central to our research is the creation of the largest public dataset for Moroccan dialect sentiment analysis that incorporates not only Moroccan dialect written in Arabic script but also in Latin letters. By assembling a diverse range of textual data, we were able to construct a dataset of 20,000 manually labeled texts in Moroccan dialect, along with publicly available lists of stop words in Moroccan dialect. To dive into sentiment analysis, we conducted a comparative study on multiple machine learning models to assess their compatibility with our dataset. Experiments were performed using both raw and preprocessed data to show the importance of the preprocessing step. We were able to achieve 92% accuracy with our model, and to further demonstrate its reliability we tested it on smaller publicly available datasets of Moroccan dialect, with favorable results.

9/16/2024

KazSAnDRA: Kazakh Sentiment Analysis Dataset of Reviews and Attitudes

Rustem Yeshpanov, Huseyin Atakan Varol

This paper presents KazSAnDRA, a dataset developed for Kazakh sentiment analysis that is the first and largest publicly available dataset of its kind. KazSAnDRA comprises an extensive collection of 180,064 reviews obtained from various sources and includes numerical ratings ranging from 1 to 5, providing a quantitative representation of customer attitudes. The study also pursued the automation of Kazakh sentiment classification through the development and evaluation of four machine learning models trained for both polarity classification and score classification. Experimental analysis included evaluation of the results considering both balanced and imbalanced scenarios. The most successful model attained an F1-score of 0.81 for polarity classification and 0.39 for score classification on the test sets. The dataset and fine-tuned models are open access and available for download under the Creative Commons Attribution 4.0 International License (CC BY 4.0) through our GitHub repository.

4/11/2024