Methodology

        The Ranking Web, or Webometrics, is the largest academic ranking of Higher Education Institutions. Every six months since 2004, the Cybermetrics Lab (Spanish National Research Council, CSIC) has carried out an independent, objective, free and open scientific exercise to provide reliable, multidimensional, updated and useful information about the performance of universities from all over the world, based on their web presence and impact.

History

        The Cybermetrics Lab has been developing quantitative studies on the academic web since the mid-nineties. A first indicator was presented during the EASST/4S conference in Bielefeld (1996), and the collection of web data from European universities started in 1999, supported by the EU-funded project EICSTES. These efforts are a follow-up to our scientometric research, started in 1994, which has been presented at the conferences of the International Society for Scientometrics and Informetrics (ISSI, 1995-2011) and the International Conferences on Science and Technology Indicators (STI-ENID, 1996-2012) and published in high-impact journals (Journal of Informetrics, Journal of the American Society for Information Science and Technology, Scientometrics, Journal of Information Science, Information Processing & Management, Research Evaluation and others). In 1997 we launched an all-electronic, open access, peer-reviewed journal, Cybermetrics, devoted to the publication of webometrics-related papers.

        In 2003, after the publication of the breakthrough ranking by Shanghai Jiao Tong University, the Academic Ranking of World Universities (ARWU), we decided to adopt the main innovations proposed by Liu and his team. The ranking would be built from publicly available web data, combining the variables into a composite indicator, and with truly global coverage. The first edition was published in 2004, and the ranking has appeared twice a year since 2006. After 2008 the portal has also included webometrics rankings for research centers, hospitals, repositories and business schools.

Objectives and motivation

        The original aim of the Ranking is to promote academic web presence, supporting the Open Access initiatives in order to significantly increase the transfer of scientific and cultural knowledge generated by universities to society as a whole. To achieve this objective, the publication of rankings is one of the most powerful and successful tools for starting and consolidating processes of change in academia, increasing scholars’ commitment and setting up badly needed long-term strategies.

        The objective is not to evaluate websites, their design or usability, or the popularity of their contents according to the number of visits or visitors. Web indicators are considered proxies for a correct, comprehensive and deep evaluation of the university’s global performance, taking into account its activities and outputs and their relevance and impact.

        In the end, a reliable rank is only possible if the web presence is a trustworthy mirror of the university. In the second decade of the 21st century the Web is key to the future of all university missions, as it is already the most important scholarly communication tool, the future channel for off-campus distance learning, the open forum for community engagement and the universal showcase for attracting talent, funding and resources.

Philosophy and justification

        Webometrics publishes only a single Ranking of Universities in each edition. The combination of indicators is the result of careful investigation, and it is not open to individual choice by users without enough knowledge or expertise in this field. Other publishers provide series of very different rankings that use exactly the same data in different fashions, which is completely useless and very confusing.

        Webometrics is a ranking of all the universities of the world, not only a few hundred institutions from the developed world. Of course, “World-class” universities usually are not small or very specialized institutions.

        Webometrics is continuously doing research to improve the ranking, changing or evolving the indicators and the weighting model to provide a better classification. It is a shame that a few rankings maintain stability between editions without correcting errors or tuning up indicators.

        Rankings backed by a for-profit company exploiting rank-related business or with strong political links reflected in individual ranks should be checked with care.

        Rankings based only on research (bibliometrics) are biased against technologies, computer science, social sciences and humanities, disciplines that usually account for more than half of the scholars and students in a standard comprehensive university. Webometrics also measures, in an indirect way, other missions such as teaching or the so-called third mission, considering not only the scientific impact of university activities, but also the economic relevance of technology transfer to industry, community engagement (social, cultural, environmental roles) and even political influence.

        Webometrics uses link analysis for quality evaluation, as it is a far more powerful tool than citation analysis or global surveys. In the first case, bibliometrics only counts formal recognition between peers, while links include not only bibliographic citations but also third parties’ involvement with university activities. Surveys are not a suitable tool for World Rankings, as there is not a single individual with deep (several semesters per institution), multi-institutional (several dozen), multidisciplinary (hard sciences, biomedicine, social sciences, technologies) experience in a representative sample (different continents) of universities worldwide.

        Research output is also a key topic for Webometrics, including not only formal publications (e-journals, repositories) but also informal scholarly communication. Web publication is cheaper and can maintain the high quality standards of peer review processes. It can also reach much larger potential audiences, offering access to scientific knowledge to researchers and institutions located in developing countries and also to third parties (economic, industrial, political or cultural stakeholders) in their local community.

        We intend to motivate both institutions and scholars to have a web presence that accurately reflects their activities. If the web performance of an institution is below the position expected from its academic excellence, university authorities should reconsider their web policy, promoting substantial increases in the volume and quality of their electronic publications.

        Prospective students should use additional criteria when choosing a university. The Webometrics ranking correlates well with the quality of education provided and with academic prestige, but other non-academic variables need to be taken into account.

Composite indicators and Web Impact Factor

        Probably one of the major contributions of the Shanghai Ranking was to introduce a composite indicator, combining a series of indicators through a weighting system. Traditional bibliometric indexes are built on ratios, like Garfield’s Journal Impact Factor, which, being based on variables that follow power law distributions, is useless for describing large and complex scenarios. The proposal by Ingwersen in 1997 for a similarly designed Web Impact Factor (WIF), using a links/webpages (L/W) ratio, is equally doomed by the mathematical artifacts it generates.

        Following the Shanghai model, we developed an indicator that transforms the ratio L/W into the formula aL+bW, where L and W should be normalized in advance and a and b are weights adding up to 100%. We strongly discourage the use of WIF due to its severe shortcomings. The composite indicator can be designed with different sets of variables and weightings according to the developer’s needs and models.
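
        The contrast between the L/W ratio and the weighted sum aL+bW can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three sites and their counts are invented, the function names are hypothetical, equal weights (a = b = 0.5) are chosen arbitrarily, and the normalization (logarithm of each count relative to the logarithm of the maximum) is one possible choice, not necessarily the one used in the published ranking.

```python
import math


def web_impact_factor(links, pages):
    """Ratio-based WIF: inlinks divided by webpages (L/W)."""
    return links / pages


def composite_score(l_norm, w_norm, a=0.5, b=0.5):
    """Weighted sum aL + bW for values already normalized to [0, 1]."""
    if not math.isclose(a + b, 1.0):
        raise ValueError("weights a and b must add up to 100%")
    return a * l_norm + b * w_norm


# Hypothetical sites as (inlinks, webpages); invented numbers for illustration.
sites = {
    "tiny_site": (500, 10),
    "mid_site": (100_000, 50_000),
    "large_site": (5_000_000, 1_000_000),
}

max_links = max(links for links, _ in sites.values())
max_pages = max(pages for _, pages in sites.values())

wif = {name: web_impact_factor(l, w) for name, (l, w) in sites.items()}

# Assumed normalization: log of each count relative to the log of the maximum.
score = {
    name: composite_score(
        math.log1p(l) / math.log1p(max_links),
        math.log1p(w) / math.log1p(max_pages),
    )
    for name, (l, w) in sites.items()
}

best_by_wif = max(wif, key=wif.get)
best_by_score = max(score, key=score.get)
print(best_by_wif, best_by_score)
```

        With these invented figures the ratio puts the ten-page site first, because a very small denominator inflates L/W, while the weighted sum puts the largest and most linked site first.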

Design and weighting of indicators

        Webometrics uses an “a-priori” scientific model for building the composite indicator. Other rankings choose arbitrary weights for strongly dependent variables and even combine raw values with ratios. None of them follows a logical ratio between activity-related and impact-related variables, i.e. each group representing 50% of the total weighting. As for the individual variables, some have values larger than zero for only a few universities, and others separate universities according to differences so small that they are lower than their error rates.

        Prior to combination the values should be normalized, but the practice of using percentages is mostly incorrect due to the power law distribution of the data.

        Webometrics log-normalizes the variables before combining them according to a 1:1 ratio between the activity/presence and visibility/impact groups of indicators.
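
        A small Python sketch may help show why percentages behave poorly on power-law data while a logarithmic normalization keeps institutions distinguishable. The exact log-normalization used in the ranking is not specified here, so the form below, log(1+v)/log(1+max), is an assumption, and the inlink counts are invented.

```python
import math


def percent_of_max(values):
    """Express each value as a fraction of the largest one."""
    top = max(values)
    if top == 0:
        return [0.0 for _ in values]
    return [v / top for v in values]


def log_normalize(values):
    """Assumed log-normalization: log(1 + v) / log(1 + max)."""
    top = math.log1p(max(values))
    if top == 0:
        return [0.0 for _ in values]
    return [math.log1p(v) / top for v in values]


# Hypothetical inlink counts following a power-law-like decay.
inlinks = [1_000_000, 100_000, 10_000, 1_000, 100]

pct = percent_of_max(inlinks)
logn = log_normalize(inlinks)

for raw, p, g in zip(inlinks, pct, logn):
    print(f"{raw:>9}  percent={p:.4f}  log-normalized={g:.4f}")
```

        With percentages, everything below the top institution is squeezed close to zero, so most of the scale is wasted on a handful of leaders. The log-normalized values spread the same data evenly across the scale.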

 

 

        The current composite indicator is now built as follows:

Visibility (50%)

IMPACT. The quality of the contents is evaluated through a "virtual referendum", counting all the external inlinks that the university webdomain receives from third parties. Those links recognize the institutional prestige, the academic performance, the value of the information and the usefulness of the services as presented in the webpages, according to the criteria of millions of web editors from all over the world. The link visibility data is collected from the two most important providers of this information: Majestic SEO and ahrefs. Both use their own crawlers, generating different databases that should be used jointly to fill gaps or correct mistakes. The indicator is the product of the square root of the number of backlinks and the number of domains originating those backlinks, so link popularity is important, but link diversity is even more so. The maximum of the normalized results is the impact indicator.
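
The IMPACT calculation described above can be sketched as follows. This is a minimal illustration, not the production method: the provider figures are invented, the function names are hypothetical, the normalization form is assumed, and reading "the maximum of the normalized results" as the larger of the two per-provider values is also an assumption. No real provider API is called; the counts are passed in as plain dictionaries.

```python
import math


def raw_impact(backlinks, domains):
    """Square root of the backlink count times the number of linking domains."""
    return math.sqrt(backlinks) * domains


def normalize(values):
    """Assumed log-normalization relative to the best value of one provider."""
    top = max(values.values(), default=0.0)
    if top <= 0:
        return {name: 0.0 for name in values}
    return {name: math.log1p(v) / math.log1p(top) for name, v in values.items()}


def impact_indicator(provider_a, provider_b):
    """Each argument maps a university to (backlinks, domains) from one provider."""
    norm_a = normalize({u: raw_impact(*c) for u, c in provider_a.items()})
    norm_b = normalize({u: raw_impact(*c) for u, c in provider_b.items()})
    universities = norm_a.keys() | norm_b.keys()
    # A university missing from one provider falls back to the other one.
    return {u: max(norm_a.get(u, 0.0), norm_b.get(u, 0.0)) for u in universities}


# Invented figures: univ_y has four times the backlinks but far fewer domains.
provider_a = {"univ_x": (1_000_000, 2_000), "univ_y": (4_000_000, 500)}
provider_b = {"univ_x": (810_000, 2_500)}  # univ_y is a gap in this source

impact = impact_indicator(provider_a, provider_b)
print(impact)
```

In this example univ_x scores higher than univ_y despite having fewer backlinks, because its links come from four times as many domains.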

Activity (50%)

PRESENCE (1/3). The total number of webpages hosted in the main webdomain (including all the subdomains and directories) of the university, as indexed by the largest commercial search engine (Google). It counts every webpage, including all the formats recognized individually by Google, both static and dynamic pages and other rich files. It is not possible to have a strong presence without the contribution of everybody in the organization, as the top contenders are already able to publish millions of webpages. Having additional domains, or alternative central ones for foreign languages or marketing purposes, is penalized in this indicator and is also very confusing for external users.

OPENNESS (1/3). The global effort to set up institutional research repositories is explicitly recognized in this indicator, which takes into account the number of rich files (pdf, doc, docx, ppt) published in dedicated websites according to the academic search engine Google Scholar. Both the total records and those with correctly formed file names are considered (for example, Adobe Acrobat files should end with the suffix .pdf). The objective is to consider recent publications, which currently are those published between 2008 and 2012 (new period).

EXCELLENCE (1/3). The academic papers published in high-impact international journals play a very important role in the ranking of universities. Using simply the total number of papers can be misleading, so we restrict the indicator to excellent publications only, i.e. the university scientific output that is part of the 10% most cited papers in their respective scientific fields. Although this is a measure of the high-quality output of research institutions, the data provider, the Scimago group, supplied non-zero values for more than 5200 universities (period 2003-2010). In future editions it is intended to match the counting periods between the Scholar and Scimago sources.
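
The weighting scheme listed above (Visibility 50%; Activity 50%, split equally among PRESENCE, OPENNESS and EXCELLENCE) can be written down as a short Python sketch. The university names and indicator values are invented, the function names are hypothetical, and the inputs are assumed to be already normalized to the range 0 to 1.

```python
import math

# Weights from the text: IMPACT is 50%; the other three share the remaining 50%.
WEIGHTS = {
    "impact": 0.5,
    "presence": 0.5 / 3,
    "openness": 0.5 / 3,
    "excellence": 0.5 / 3,
}


def composite_score(indicators):
    """Weighted sum of already-normalized indicator values."""
    return sum(weight * indicators[name] for name, weight in WEIGHTS.items())


def rank(universities):
    """Return university names ordered from best to worst composite score."""
    return sorted(
        universities,
        key=lambda name: composite_score(universities[name]),
        reverse=True,
    )


# Invented, pre-normalized values for two hypothetical universities.
universities = {
    "univ_a": {"impact": 0.9, "presence": 0.6, "openness": 0.6, "excellence": 0.6},
    "univ_b": {"impact": 0.6, "presence": 0.8, "openness": 0.8, "excellence": 0.8},
}

ranking = rank(universities)
print(ranking)
```

Here univ_a ranks first: its advantage in the single visibility indicator outweighs univ_b's lead in all three activity indicators, which reflects the 1:1 balance between the two groups.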

Advantages and shortcomings

        Coverage. Webometrics is the largest ranking by number of HEIs analyzed, but there is no classification of the different institutional types, so research-intensive universities are listed together with community colleges or theological seminaries. However the rank segregates all of them so it is not difficult to build sub-rankings for those interested.

        University missions. The direct measurement of the teaching mission is virtually unfeasible, and evaluations based on surveys (subjective), student/scholar ratios (unreliable data and results that do not discriminate) or employment results (with many variables involved other than quality of teaching) should be avoided. Webometrics ranks this mission indirectly, using web presence as an indicator of the commitment of scholars to their students. It is not perfect, but the future of this mission is clearly in the web arena, and any institution or individual not realizing that is losing ground very fast.

        Big numbers. The quality of the data depends not only on the source used, but also on the numbers involved. For example, the number of universities with more than one Nobel Prize is probably lower than 200 (including all of those granted since 1900), which makes it very difficult to rank them correctly. The same applies to citation data, the most powerful bibliometric tool, which provides figures in the order of thousands and tens of thousands. Link data offer far larger numbers, usually two or even three orders of magnitude larger. Certainly the web indicators are noisier, but statistically they are better suited for uncovering patterns and discriminating among a larger number of institutions.

        Size-dependent. There is no debate about this issue: the most popular rankings, including Webometrics, are size dependent, although size does not refer to the number of scholars or students (Harvard or especially MIT are not large in that sense) but probably to resources (current funding, past funding reflected in buildings, laboratories or libraries). But this criticism is not correct, as none of the rankings is really measuring efficiency but global performance. The economic wealth of nations can be measured in terms of GDP (USA, China, Japan) or in terms of GDP per capita (Luxembourg, Emirates, Norway); both indicators are correct, but their objectives are completely different.

        Bad naming practices. University managers are still struggling to convince their authors to assign the correct affiliations in scientific publications. The situation is not much better on the Web, with several hundred institutions having more than one central webdomain, keeping old domains active, using alternative domains for international (English) contents or sharing domains with third parties. Even among those universities with only one domain, many change the domain frequently, sometimes without any apparent good reason. A strange but relatively common situation is when a national top level domain is replaced by an “.edu” domain (that usually refers to a USA university!) even when the country has a clearly defined academic subdomain (edu.pl, edu.ua, ac.kr). These changes, and especially the preservation of several domains over time, are penalized very severely in the Webometrics ranking. But of course it is also a very misleading practice that decreases the web visibility of the universities. It probably does not have such a strong effect on local populations, but it is really confusing for global audiences.

        Fake and non-accredited universities. We do our best not to include fake institutions, checking especially online institutions, international institutions and foreign branches when they have an independent web domain or subdomain. Any suggestion on these issues is very welcome.

        For more information please contact:

Isidro F. Aguillo

Cybermetrics Lab - CSIC
Albasanz, 26-28
28037 Madrid. SPAIN

Bibliography:

- Aguillo, I. F.; Granadino, B.; Ortega, J. L. & Prieto, J. A. (2006). Scientific research activity and communication measured with cybermetric indicators. Journal of the American Society for Information Science and Technology, 57(10): 1296-1302.

- Wouters, P.; Reddy, C. & Aguillo, I. F. (2006). On the visibility of information on the Web: an exploratory experimental approach. Research Evaluation, 15(2):107-115.

- Ortega, J. L.; Aguillo, I. F. & Prieto, J. A. (2006). Longitudinal Study of Contents and Elements in the Scientific Web environment. Journal of Information Science, 32(4): 344-351.

- Kretschmer, H. & Aguillo, I. F. (2005). New indicators for gender studies in Web networks. Information Processing & Management, 41(6): 1481-1494.

- Aguillo, I. F.; Granadino, B.; Ortega, J.L. & Prieto, J.A. (2005). What the Internet says about Science. The Scientist, 19(14):10, Jul. 18, 2005.

- Kretschmer, H. & Aguillo, I. F. (2004). Visibility of collaboration on the Web. Scientometrics, 61(3): 405-426.

- Cothey, V.; Aguillo, I. F. & Arroyo, N. (2006). Operationalising “Websites”: lexically, semantically or topologically? Cybermetrics, 10(1): Paper 4. http://cybermetrics.cindoc.csic.es/articles/v10i1p4.pdf