Social media analytics for investigations: A survey of recent trends, challenges and future research direction
DOI:
https://doi.org/10.29329/jsomer.57
Keywords:
sentiment analysis, natural language processing, machine learning, crime investigations, social media analytics
Abstract
Social media analytics has emerged as a critical tool for investigations, providing valuable insights into user interactions, trends, and behaviors across various platforms. With over 4.76 billion active social media users globally, this vast data source has proven useful in government investigations and criminal litigation. However, extracting reliable information from social media presents significant challenges, including data representativeness, ethical and legal concerns, user behavior interpretation, and the growing issue of misinformation. This paper reviews recent trends in social media analytics, particularly its application in investigations, highlighting the increasing use of machine learning for data processing and analysis. Our analysis reveals the transformative impact of social media analytics, which enables advanced evidence collection, suspect identification, criminal network mapping, and proactive crime prevention. However, significant challenges persist regarding data validity, bias minimization, ethical responsibilities, and misinformation. Additionally, the paper offers future research directions, emphasizing the need for standardized frameworks, adaptive ethical guidelines, AI-driven real-time multimodal analysis, and robust privacy protection measures. These advancements aim to enhance the reliability and effectiveness of social media analytics for investigative purposes.
License
Copyright (c) 2025 Sarbagya Shakya, Edgar Ceh-Varela

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
