PRESENTATION
The Webometrics Ranking formally and explicitly adheres
to the Berlin
Principles on Ranking of Higher Education Institutions.
The ultimate aim is the continuous improvement
and refinement of the methodologies according to a set
of agreed principles of good practice.
0) Background of the project.
The “World Universities'
ranking on the Web” is an initiative
of the Cybermetrics Lab, a research
group of the Centro de Información y Documentación
(CINDOC),
part of the National Research Council (CSIC),
the largest public research body
in Spain.
The Cybermetrics Lab is devoted to the
quantitative analysis of Internet and Web content,
especially content related to the generation
and scholarly communication of scientific knowledge.
This is an emerging discipline that has been called
Cybermetrics (our team developed the free electronic
journal Cybermetrics and has published it
since 1997) or Webometrics.

With these rankings we intend to
provide extra motivation to researchers worldwide
for publishing more and better scientific content
on the Web, making it available to colleagues and
people wherever they are located.
The "Webometrics Ranking of
World Universities" was officially launched in
2004, and it is updated every 6 months (data collected
in January and July and published one month later).
The Web indicators used are based on and correlated with
traditional scientometric and bibliometric indicators,
and the goal of the project is to convince academic
and political communities of the importance of
web publication, not only for disseminating
academic knowledge but also for measuring scientific
activities, performance and impact.
A) Purposes and Goals of Rankings
1. Assessment of higher education
(processes, and outputs) in the Web. The Web
indicators and we are already publishing comparative
analysis with similar initiatives. But the current
objective of the Webometrics Ranking is to promote
Web publication by universities, evaluating the commitment
to the electronic distribution of these organizations
and to fight a very concerning academic digital divide
which is evident even among world universities from
developed countries. Although we do not
intend to assess universities' performance solely on
the basis of their web output, the Webometrics Ranking
measures a wider range of activities than the
current generation of bibliometric indicators, which
focus only on the activities of the scientific elite.
2. Ranking purpose and target
groups. The Webometrics Ranking measures the
volume, visibility and impact of the web pages published
by universities, with special emphasis on scientific
output (refereed papers, conference contributions,
pre-prints, monographs, theses, reports, …)
but also taking into account other materials (courseware,
seminar or workshop documentation, digital libraries,
databases, multimedia, personal pages, …) and
general information on the institution, its
departments, research groups or supporting services,
and the people working there or attending courses.
The direct target group for the Ranking is
the university authorities. If the web performance
of an institution is below the position expected from
its academic excellence, they should reconsider
their web policy, promoting substantial increases
in the volume and quality of their electronic publications.
Faculty members are an indirect target group, as we expect
that in the near future web information could be
as important as other bibliometric and scientometric
indicators for evaluating the scientific performance
of scholars and their research groups.
Finally, candidate students should not use these data
as the sole guide for choosing a university, although
a top position means that the institution has a policy
that encourages new technologies and has the resources
to adopt them.
3. Diversity of institutions:
Missions and goals of the institutions. Quality
measures for research-oriented institutions, for example,
are quite different from those that are appropriate
for institutions that provide broad access to underserved
communities. Institutions that are being ranked and
the experts that inform the ranking process should
be consulted often.
4. Information sources and interpretation
of the data provided. Web information
is accessed mainly through search engines. These intermediaries
are free, universal, and very powerful even when considering
their shortcomings (coverage limitations and biases,
lack of transparency, commercial secrets and strategies,
irregular behaviour). Search engines are key for measuring
the visibility and impact of universities' websites.
There are a limited number of sources that can be
useful for webometric purposes: 7 general search engines
(Google*, Yahoo Search*, Live (MSN) Search*, Exalead*,
Ask (Teoma), Gigablast and Alexa) and 2 specialised
scientific databases (Google Scholar* and Live Academic).
All of them have very large independent databases,
but because of the availability of their data collection
procedures (APIs), only those marked with an asterisk
are used in compiling the Webometrics Ranking.
5. Linguistic, cultural, economic,
and historical contexts. The project aims
for truly global coverage, not narrowing the analysis
to a few hundred institutions (world-class universities)
but including as many organizations as possible. The
only requirement in our international rankings is
having an autonomous web presence with an independent
web domain. This approach allows a larger number of
institutions to monitor their current ranking and
the evolution of this position after adopting specific
policies and initiatives. Universities in developing
countries have the opportunity to know precisely the
indicators' threshold that marks the limit of the
elite.
Currently identified biases of the Webometrics Ranking
include the traditional linguistic one (more than
half of the internet users are English-speaking people)
and a new disciplinary one (technology instead of
biomedicine is at the moment the hot topic). Since
in most cases the infrastructure (web space) and the
connectivity to the Internet already exist, the economic
factor is not considered a major limitation (at least
for the top 3,000 universities).
B) Design and Weighting of Indicators
6. Methodology used to create
the rankings. The unit for analysis is the institutional
domain, so only universities and research centres
with an independent web domain are considered. If
an institution has more than one main domain, two
or more entries are used with the different addresses.
About 5-10% of the institutions have no independent
web presence, most of them located in developing countries.
Our catalogue of institutions includes not only universities
but also other Higher Education institutions following
the recommendations of UNESCO. Names and addresses
were collected from both national and international
sources including among others:
University activity is multi-dimensional
and this is reflected in its web presence. The
best way to build the ranking is therefore to combine
a group of indicators that measure these different aspects.
Almind & Ingwersen proposed the first Web indicator,
the Web Impact Factor (WIF), based on link analysis that
combines the number of external inlinks and the number
of pages of the website, a ratio of 1:1 between visibility
and size. This ratio is kept for the ranking, but two
new indicators are added to the size component: the number
of documents, measured from the number of rich files
in a web domain, and the number of publications
collected by the Google Scholar database. As already
noted, the four indicators were obtained
from the quantitative results provided by the main
search engines as follows:
Size (S). Number
of pages recovered from four engines: Google, Yahoo,
Live Search and Exalead. For each engine, results
are log-normalised to 1 for the highest value. Then
for each domain, maximum and minimum results are
excluded and every institution is assigned a rank
according to the combined sum.
Visibility (V). The total number
of unique external links received (inlinks) by a
site can only be obtained with confidence from Yahoo
Search, Live Search and Exalead. For each engine,
results are log-normalised to 1 for the highest
value and then combined to generate the rank.
Rich Files (R). After evaluation
of their relevance to academic and publication activities
and considering the volume of the different file
formats, the following were selected: Adobe Acrobat
(.pdf), Adobe PostScript (.ps),
Microsoft Word (.doc) and Microsoft Powerpoint
(.ppt). These data were extracted using
Google, and the results for each file type were merged
after log-normalising in the same way as described
above.
Scholar (Sc). Google Scholar provides
the number of papers and citations for each academic
domain. These results from the Scholar database
represent papers, reports and other academic items.
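The log-normalisation and trimming steps described for the Size indicator can be sketched in Python. This is only an illustration under stated assumptions: the text does not give the exact normalisation formula, so log(1 + n) / log(1 + max) is assumed here, and the domains and page counts are hypothetical.

```python
import math


def log_normalise(counts):
    """Scale one engine's raw counts so that the largest maps to 1.0.

    Assumption: log(1 + n) / log(1 + max). The source only states that
    results are "log-normalised to 1 for the highest value".
    """
    top = max(counts.values())
    if top <= 0:
        return {domain: 0.0 for domain in counts}
    return {d: math.log1p(n) / math.log1p(top) for d, n in counts.items()}


def size_scores(per_engine):
    """Sum the normalised values per domain after dropping max and min."""
    normalised = [log_normalise(c) for c in per_engine.values()]
    # Only domains reported by every engine are scored (assumption).
    domains = set.intersection(*(set(n) for n in normalised))
    scores = {}
    for d in domains:
        values = sorted(n[d] for n in normalised)
        # Exclude the lowest and highest value when there are enough engines.
        trimmed = values[1:-1] if len(values) > 2 else values
        scores[d] = sum(trimmed)
    return scores


def to_ranks(scores):
    """Rank 1 is the highest score; ties are broken by name (assumption)."""
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return {d: i + 1 for i, d in enumerate(ordered)}


# Hypothetical page counts returned by the four engines.
pages = {
    "google": {"a.example": 1000, "b.example": 100, "c.example": 10},
    "yahoo": {"a.example": 900, "b.example": 200, "c.example": 20},
    "live": {"a.example": 800, "b.example": 50, "c.example": 5},
    "exalead": {"a.example": 500, "b.example": 80, "c.example": 8},
}
size_rank = to_ranks(size_scores(pages))
print(size_rank)
```

Taking logarithms before normalising keeps a few very large sites from dominating the scale, and dropping the extreme values per domain limits the effect of a single engine behaving irregularly.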
The four ranks were combined according
to a weighted formula:
Webometrics Rank (position) =
4*RankV + 2*RankS + 1*RankR + 1*RankSc
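A minimal Python sketch of the weighted combination in the formula above. The weights are taken from the formula; the domains and their individual ranks are hypothetical. It is assumed that a lower weighted sum gives a better final position, since each input is a rank in which 1 is best, and that ties are broken alphabetically.

```python
# Weights from the formula: Visibility, Size, Rich Files, Scholar.
WEIGHTS = {"V": 4, "S": 2, "R": 1, "Sc": 1}


def weighted_sum(ranks):
    """Return 4*RankV + 2*RankS + 1*RankR + 1*RankSc for one institution."""
    return sum(WEIGHTS[key] * ranks[key] for key in WEIGHTS)


def final_positions(rank_table):
    """Order institutions by weighted sum, lowest first (assumption)."""
    sums = {domain: weighted_sum(r) for domain, r in rank_table.items()}
    ordered = sorted(sums, key=lambda d: (sums[d], d))
    return {d: i + 1 for i, d in enumerate(ordered)}


# Hypothetical per-indicator ranks for three institutions.
rank_table = {
    "a.example": {"V": 1, "S": 2, "R": 3, "Sc": 1},
    "b.example": {"V": 2, "S": 1, "R": 1, "Sc": 3},
    "c.example": {"V": 3, "S": 3, "R": 2, "Sc": 2},
}
positions = final_positions(rank_table)
print(positions)
```

With these weights, visibility (4) counts as much as the three size-related ranks together (2 + 1 + 1), which matches the 1:1 ratio between visibility and size mentioned earlier.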
7. Relevance and validity of
the indicators. The indicators were chosen
according to several criteria (see Notes),
some intended to capture quality and academic
and institutional strengths, others to
promote web publication and Open Access initiatives.
The inclusion of the total number of pages is based
on the recognition of a new global market for academic
information, for which the web is a suitable platform for
the internationalization of the institutions. A strong
and detailed web presence providing exact descriptions
of the structure and activities of the university
can attract new students and scholars worldwide.
The number of external inlinks received by a domain
is a measure that represents the visibility and impact
of the published material, and although there is a
great diversity of motivations for linking, a significant
fraction work in a way similar to bibliographic citations.
The success of self-archiving and other repository-related
initiatives can be roughly represented by
rich file and Scholar data. The huge numbers involved
with the pdf and doc formats mean that not only administrative
reports and bureaucratic forms are involved. PostScript
and Powerpoint files are clearly related to academic
activities.
8. Measure outcomes in preference
to inputs whenever possible. Data on inputs are
relevant as they reflect the general condition of a given establishment
and are more frequently available. Measures of outcomes provide a more accurate
assessment of the standing and/or quality of a given institution or program.
We expect to offer a better balance in the future,
but the current edition intends to call attention to
incomplete strategies, inadequate policies and bad
practices in web publication before attempting a more
complete scenario.
9. Weighting the different indicators:
Current and future evolution. The current rules
for ranking indicators, including the described weighting
model, have been tested and published in scientific
papers. Research on this topic is still under way,
but the final aim is to develop a model that includes
additional quantitative data, especially bibliometric
and scientometric indicators.
C) Collection and Processing of Data
10. Ethical standards. We
identified some relevant biases in the search engines
data including under-representation of some countries
and languages. As the behaviour is different for each
engine, a good practice consists of combining results
from several sources. Any other mistake or error is
unintentional and it should not affect the credibility
of the ranking. Please contact us if you think the
ranking is not objective and impartial in any way.
11. Audited and verifiable data. The only
source for the data of the Webometrics Ranking is
a small set of globally available, free access search
engines. All the results can be reproduced following
the described methodology, taking into account
the explosive growth of web content, its volatility
and the irregular behaviour of the commercial engines.
12. Data collection. Data are collected during
the same week, in two consecutive rounds for each
strategy, and the higher value is selected. Every website
under a common institutional domain is explored, but
no attempt has been made to combine contents or links
from different domains.
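The selection of the higher value from two collection rounds can be sketched in Python as follows. The domains and counts are hypothetical, and treating a domain missing from one round as a count of 0 is an assumption, not something the text states.

```python
def pick_higher(round_one, round_two):
    """Keep, for each domain, the higher of the two collected values."""
    domains = round_one.keys() | round_two.keys()
    # A domain absent from one round is treated as 0 there (assumption).
    return {
        d: max(round_one.get(d, 0), round_two.get(d, 0))
        for d in domains
    }


# Hypothetical results for one engine and one indicator.
first = {"a.example": 120, "b.example": 40}
second = {"a.example": 110, "b.example": 55, "c.example": 7}
collected = pick_higher(first, second)
print(collected)
```

Each domain is handled on its own key, so values from different domains are never merged, in line with the statement that contents or links from different domains are not combined.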
13. Quality of the ranking processes. After
automatic collection of data, positions are checked
manually and compared with previous editions. Some
of the processes are duplicated and new expertise
is added from a variety of sources. Pages that link
to the Webometrics Ranking are explored and comments
from blogs and other fora are taken into account.
Finally, our mailbox receives a lot of requests and
suggestions that are acknowledged individually.
14. Organizational measures to enhance credibility.
The ranking results and methodologies are discussed
in scientific journals, and presented in international
conferences. We expect international advisory or even
supervisory bodies to take part in future developments
of the ranking.
D) Presentation of Ranking Results
15. Display of data and factors
involved. The published tables show all the Web
indicators used in a very synthetic and visual way.
Rankings are provided not only from a central Top
4000 classification but also considering several regional
rankings for comparative purposes.
16. Updating and error reduction. The listings
are served from dynamic ASP pages built on several
databases that can be corrected when errors or typos
are detected.
Comments welcome
Our group is grateful for comments,
suggestions and proposals that can be useful for improving
this website. We try to maintain an objective position
on the quantitative data provided but mistakes can
occur. Please take into account that mergers, domain
changes or network problems can affect the ranking
of the institutions.
Currently the members of our team
are Isidro F. AGUILLO, José Luis ORTEGA, Mario
FERNÁNDEZ (Webmaster) and Helena ZAMORA.
For more information please contact:
Isidro F.
Aguillo
CINDOC - CSIC
Joaquín Costa, 22
28002 Madrid. SPAIN
Notes:
- Aguillo, I. F.; Granadino, B.;
Ortega, J. L.; Prieto, J. A. (2006). Scientific research
activity and communication measured with cybermetric
indicators. Journal of the American Society
for Information Science and Technology,
57(10): 1296-1302.
- Wouters, P.; Reddy, C. & Aguillo,
I. F. (2006). On the visibility of information on
the Web: an exploratory experimental approach. Research
Evaluation, 15(2):107-115.
- Ortega, J. L.; Aguillo, I. F.; Prieto,
J. A. (2006). Longitudinal Study of Contents and Elements
in the Scientific Web environment. Journal
of Information Science, 32(4):344-351.
- Kretschmer, H. & Aguillo,
I. F. (2005). New indicators for gender studies in
Web networks. Information Processing &
Management, 41(6): 1481-1494.
- Aguillo, I. F.; Granadino, B.;
Ortega, J.L. & Prieto, J.A. (2005). What the Internet
says about Science. The Scientist,
19(14):10, Jul. 18, 2005.
- Kretschmer, H. & Aguillo,
I. F. (2004). Visibility of collaboration on the Web.
Scientometrics, 61(3): 405-426.
- Cothey, V.; Aguillo, I. F. & Arroyo,
N. (2006). Operationalising “Websites”:
lexically, semantically or topologically? Cybermetrics,
10(1): Paper 4. http://www.cindoc.csic.es/cybermetrics/articles/v10i1p4.html