"AI can enhance data collection, analysis, prediction, and evaluation processes"

25 Jun 2024


(1) Hamid Reza Saeidnia, Department of Information Science and Knowledge Studies, Tarbiat Modares University, Tehran, Islamic Republic of Iran;

(2) Elaheh Hosseini, Department of Information Science and Knowledge Studies, Faculty of Psychology and Educational Sciences, Alzahra University, Tehran, Islamic Republic of Iran;

(3) Shadi Abdoli, Department of Information Science, Université de Montréal, Montreal, Canada;

(4) Marcel Ausloos, School of Business, University of Leicester, Leicester, UK and Bucharest University of Economic Studies, Bucharest, Romania.

Abstract and Introduction

Materials and Methods


RQ 1: AI and scientometrics

RQ 2: AI and webometrics

RQ 3: AI and bibliometrics


RQ 4: Future of Scientometrics, Webometrics, and Bibliometrics with AI

RQ 5: Ethical Considerations of Scientometrics, Webometrics, and Bibliometrics with AI

Conclusion, Limitations, and References


In this report, we emphasize the importance and potential of integrating AI algorithms with scientometrics, webometrics, and bibliometrics, drawing on numerous examples from the literature. The paradigm shift that AI algorithms have brought to these fields has revealed new possibilities for analysis, prediction, and pattern-mining-based recommendations. This review thus underscores the prospects and value of integrating AI into scientometrics, webometrics, and bibliometrics, and highlights the synergy that this integration can achieve and foster.

In brief, AI helps scientometrics by providing efficient and accurate methods to analyze and derive insights from scientific publications, citation networks, and collaborative relationships. This should enable researchers to gain a deeper understanding of scientific knowledge, trends, and impact, facilitating better decision-making and advances in scientific research. Moreover, AI enhances webometrics by providing efficient and automated methods to analyze web-based scientific data, understand link structures and social interactions, assess web impact, and provide personalized recommendations. This enables researchers to gain insights into the web-based scientific ecosystem, facilitate collaborations, and improve research visibility and impact in the digital age. In addition, AI enhances bibliometrics by automating data collection, providing accurate author disambiguation, analyzing citation networks, assessing research impact, and providing personalized recommendations. This enables researchers to gain insights into scholarly communication, assess research performance, and make informed decisions in their bibliometric analyses. Overall, AI presents an efficient and scalable approach to scientometrics, webometrics, and bibliometrics, enabling researchers to extract meaningful insights from vast and diverse sources of scientific information.
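As a purely illustrative aside, the kind of citation network analysis mentioned above can be sketched in a few lines of Python. The toy graph, the paper labels, and the function names below are hypothetical and do not come from any study reviewed here. The sketch computes raw citation counts and a basic PageRank score, which are classical graph measures rather than AI models, although scores of this kind are often used as input features for learning-based citation prediction.

```python
from collections import defaultdict


def citation_counts(citations):
    """Count how often each paper is cited (its in-degree in the graph)."""
    counts = defaultdict(int)
    for citing, cited_list in citations.items():
        counts[citing] += 0  # ensure never-cited papers appear with a count of 0
        for cited in cited_list:
            counts[cited] += 1
    return dict(counts)


def pagerank(citations, damping=0.85, iterations=50):
    """Basic PageRank by power iteration over a citation graph."""
    papers = set(citations)
    for cited_list in citations.values():
        papers.update(cited_list)
    papers = sorted(papers)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited_list = citations.get(p, [])
            if cited_list:
                # A paper passes its score on to the papers it cites.
                share = damping * rank[p] / len(cited_list)
                for q in cited_list:
                    new_rank[q] += share
            else:
                # A paper with no references spreads its score evenly.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical toy data: each key cites the papers in its list.
TOY_CITATIONS = {
    "paper_A": ["paper_C"],
    "paper_B": ["paper_A", "paper_C"],
    "paper_C": [],
    "paper_D": ["paper_C", "paper_A"],
}

counts = citation_counts(TOY_CITATIONS)
ranks = pagerank(TOY_CITATIONS)
top_paper = max(ranks, key=ranks.get)
```

In this toy graph, `paper_C` receives the most citations and also the highest PageRank score. A realistic pipeline would first need the data collection and author disambiguation steps described above.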

In conclusion, the integration of artificial intelligence (AI) into scientometrics, webometrics, and bibliometrics holds significant potential for advancing research and understanding in these fields. AI can enhance data collection, analysis, prediction, and evaluation processes, providing researchers with valuable insights and improving decision-making processes.

However, the use of AI in these areas also raises important ethical considerations that must be carefully addressed. Data privacy and security, bias and fairness, transparency and explainability, accountability and responsibility, informed consent, impact on employment and society, and continuous monitoring and evaluation are among the key ethical considerations that should be taken into account. To ensure the responsible and ethical use of AI, interdisciplinary collaboration, stakeholder engagement, and ongoing evaluation are crucial. Researchers, policymakers, ethicists, and stakeholders from various fields should work together to develop guidelines, frameworks, and best practices that promote ethical AI use in scientometrics, webometrics, and bibliometrics. By addressing these ethical considerations, we can harness the full potential of AI to advance knowledge, improve research practices, and contribute to the betterment of society while ensuring fairness, transparency, and accountability in the use of these technologies.


In this study, we did not include the gray literature in our search and review process, nor did we manually search Google Scholar. Instead, our intention was to focus on reliable databases. Although Google Scholar is often referred to as a database, it is actually a search engine: it may index low-quality articles, and its results are not limited to reliable studies. By not searching Google Scholar, we also aimed to minimize the number of overlapping studies.

However, it is important to note that this restrictive approach may have caused us to overlook certain articles, so our study may regrettably exclude relevant information. We believe that, up to the time of writing and submitting this paper, we have guarded against most such omissions. Nevertheless, future studies may benefit from a comprehensive review that includes the gray literature, in order to provide readers with a broader perspective.



This paper is available on arXiv under a CC BY 4.0 DEED license.