Comparison of commercial decoder-only large language models for multilingual sentiment analysis of short text

Authors

  • John Burns University of St Andrews, Scotland, United Kingdom
  • Tom Kelsey University of St Andrews, Scotland, United Kingdom https://orcid.org/0000-0002-8091-1458

DOI:

https://doi.org/10.29329/jsomer.38

Keywords:

Multilingual Analysis, Sentiment Analysis, Social Media, Artificial Intelligence, Large Language Models

Abstract

This article explores multilingual sentiment analysis of short texts using three commercial decoder-only Large Language Models (“LLMs”): OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Gemini. The training data for these models is approximately 90% English, and it remains an open question whether it is better to evaluate text in its original language or to translate it into English first. We build on previous research on sentiment analysis of multilingual short texts, such as those found on social media, using 1000 short text samples in seven languages (English, Spanish, French, Portuguese, Arabic, Japanese, and Korean), which were also translated into English using Google Translate. We processed these samples with decoder-only LLMs and compared the results with those from other methods (encoder-only LLMs, RNNs, and lexicons). We found that decoder-only LLMs achieved the highest accuracy of all the sentiment analysis methods when working with the original-language data. The only exception was the French data, where an RNN was the most accurate. Among the three decoder-only LLMs, ChatGPT had the highest accuracy in four of the seven languages and Claude in two, while Gemini ranked second in six of the seven.

Published

04.12.2025

How to Cite

Burns, J., & Kelsey, T. (2025). Comparison of commercial decoder-only large language models for multilingual sentiment analysis of short text. Journal of Social Media Research, 2(4), 319–331. https://doi.org/10.29329/jsomer.38
