15 Dec. 2017

MADAP: A method for depicting academic disciplines through Google Scholar Citations

Alberto Martín-Martín, Enrique Orduna-Malea, 
Emilio Delgado López-Cózar 
A novel method for depicting academic disciplines through Google Scholar Citations: 
The case of Bibliometrics
Scientometrics, in press. 

This article describes a procedure to generate a snapshot of the structure of a specific scientific community and their outputs based on the information available in Google Scholar Citations. We call this method MADAP (Multifaceted Analysis of Disciplines through Academic Profiles). 
The international community of researchers working in Bibliometrics, Scientometrics, Informetrics, Webometrics, and Altmetrics was selected as a case study. The records of the top 1,000 most cited documents by these authors were manually checked to fill any missing information and deduplicate fields like the journal names and book publishers. The results suggest that it is feasible to use Google Scholar Citations and the MADAP method to produce an accurate depiction of the community of researchers working in Bibliometrics (both specialist and occasional) and their publication habits (main publication venues such as journals and book publishers). 
Additionally, the wide document coverage of Google Scholar (especially books and book chapters) enables more comprehensive analyses of the documents published in a specific discipline than were previously possible with other citation indexes, finally shedding light on what until now had been a blind spot in most citation analyses.

Top 25 influential specialist/occasional authors in Bibliometrics according to Google Scholar Citations

Top 25 most influential documents in Bibliometrics according to Google Scholar Citations

Top 25 most influential journals in Bibliometrics according to Google Scholar Citations

Network of the Bibliometrics discipline through the MADAP method in Google Scholar (author-journal)

11 Dec. 2017

Visibility in Google Scholar of 48 Institutional Repositories of Peruvian Universities

Alhuay-Quispe, J., Quispe-Riveros, D., Bautista-Ynofuente, L., Pacheco-Mendoza, J. (2017). 
Metadata Quality and Academic Visibility Associated with Document Type Coverage in Institutional Repositories of Peruvian Universities. 
Journal of Web Librarianship, 1-14.

This article analyzes the level of metadata quality (MQ ratio) and the level of academic visibility in Google Scholar (IGS ratio) associated with the coverage of four types of documents (theses, articles, books, and conferences) in repositories of Peruvian universities. 
This research is a cross-sectional descriptive and correlational study with intentional non-probabilistic sampling that analyzes 48 repositories from national (n = 10) and private (n = 38) universities integrated in the Peruvian National Digital Repository Alicia (alicia.concytec.gob.pe). 
Regarding the MQ ratio, we found a median of 0.67 [IQR: 0.552–0.891] for national universities and a median of 0.65 [IQR: 0.407–0.838] for private universities (p = .542). Regarding the IGS ratio, we found a median of 0.32 [IQR: 0.241–0.596] for national universities and a median of 0.62 [IQR: 0.464–0.749] for private universities (p = .054). Spearman's rank correlation shows a moderate correlation (ρ = 0.594; p < .01) between the MQ ratio and the thesis coverage indicator, and a low correlation (ρ = 0.157) between the index of document indexing in Google Scholar and the proportion of documents harvested in Alicia. 
We conclude that the highest proportion of academic visibility is concentrated in private universities, while metadata quality and the number of items integrated in Alicia favor public universities.
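
As an illustration of the statistic used in this abstract, the following is a minimal sketch of Spearman's rank correlation computed as the Pearson correlation of average ranks. It is not the authors' code; the function names (`average_ranks`, `pearson`, `spearman_rho`) and the sample values are assumptions made only for this example.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-repository ratios, invented for illustration only.
mq_ratio = [0.41, 0.55, 0.67, 0.84, 0.89]
thesis_coverage = [0.30, 0.25, 0.60, 0.55, 0.90]
print(round(spearman_rho(mq_ratio, thesis_coverage), 3))
```

Working on ranks rather than raw values is what makes the coefficient suitable for ratios like these, where only the ordering of repositories is assumed to be meaningful.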

7 Dec. 2017

Google Scholar as a source for scholarly evaluation: a bibliographic review of database errors

Enrique Orduna-Malea, Alberto Martín-Martín, 
Emilio Delgado López-Cózar
Google Scholar as a source for scholarly evaluation: a bibliographic review of database errors
Revista Española de Documentación Científica 40.4 (2017): e185.

The launch of Google Scholar back in 2004 meant a revolution not only in the scientific information search market but also in research evaluation processes. Its dynamism, unparalleled coverage, and uncontrolled indexing make Google Scholar an unusual product, especially when compared to traditional bibliographic databases. Conceived primarily as a discovery tool for academic information, it presents a number of limitations as a bibliometric tool. The main objective of this chapter is to show how Google Scholar operates and how its core database may be used for bibliometric purposes. To do this, the general features of the search engine (in terms of document typologies, disciplines, and coverage) are analyzed. Lastly, several bibliometric tools based on Google Scholar data, both official (Google Scholar Metrics, Google Scholar Citations) and some developed by third parties (H Index Scholar, Publishers Scholar Metrics, Proceedings Scholar Metrics, Journal Scholar Metrics, Scholar Mirrors), as well as software to collect and process data from this source (Publish or Perish, Scholarometer), are introduced, aiming to illustrate the potential bibliometric uses of this source.

2 Nov. 2017

Google Scholar: The big data bibliographic tool

Delgado López-Cózar, E., Orduña-Malea, E., Martín-Martín, A., Ayllón, J.M. (2017). 
Google Scholar: The big data bibliographic tool
In: Cantu-Ortiz, FJ. (ed.). Research Analytics: Boosting University Productivity and Competitiveness through Scientometrics. 
CRC Press (Taylor & Francis Group), 2017, pp. 59-80 

The launch of Google Scholar back in 2004 meant a revolution not only in the scientific information search market but also in research evaluation processes. Its dynamism, unparalleled coverage, and uncontrolled indexing make Google Scholar an unusual product, especially when compared to traditional bibliographic databases. Conceived primarily as a discovery tool for academic information, it presents a number of limitations as a bibliometric tool. The main objective of this chapter is to show how Google Scholar operates and how its core database may be used for bibliometric purposes. To do this, the general features of the search engine (in terms of document typologies, disciplines, and coverage) are analyzed. Lastly, several bibliometric tools based on Google Scholar data, both official (Google Scholar Metrics, Google Scholar Citations) and some developed by third parties (H Index Scholar, Publishers Scholar Metrics, Proceedings Scholar Metrics, Journal Scholar Metrics, Scholar Mirrors), as well as software to collect and process data from this source (Publish or Perish, Scholarometer), are introduced, aiming to illustrate the potential bibliometric uses of this source.

10 Oct. 2017

Global mapping of artificial intelligence in Google and Google Scholar

Omar, M., Mehmood, A., Choi, G.S., Park, H.W. (2017).
Global mapping of artificial intelligence in Google and Google Scholar. 
Scientometrics, in press.

The worldwide presence of AI needs to be quantified. This study proposes a descriptive approach and the use of multiple methods and data. An extensive electronic corpus of books was utilized to see the worldwide drift of intellectuals’ minds toward AI and identified related terms, their future trends, and convergence. 
Using the best bigram proposed by Ngrams Viewer, this study explores the human mind through Google query data, linking popular regions, countries, and related topics to the concept of AI. URL datasets were collected using two popular search engines (SEs), Google and Google Scholar (GS). A URL analysis identified key entities (organizations, institutes, and countries) and their yearly trends. 
Top-level domains revealed the global web ecology and the annual information growth of AI in SE environments. Information gathered through one approach was fed into the other, revealing a complementary relationship. AI is popular across the globe, and has left traces in many different countries. In this field, GS dominates Google, in relation to the number of sites and domains it includes. Top results reveal the popularity of AI among professionals, artists, programmers, and researchers. The pros and cons of the approaches are also discussed. 
In addition, this study aims to predict the impact of AI on society, as interpreted through the lenses of well-established theories. The dominance of AI may trap society into aspiring toward an easy life, dependent on intelligent machines. Consistent policies are needed to smooth out future economic cycles in the AI field.

6 Oct. 2017

Evolution of profiles of Consejo Superior de Investigaciones Científicas in Academia.edu, Google Scholar Citations and ResearchGate (2014-2015)

Ortega, JL (2017)
Toward a homogenization of academic social sites: A longitudinal study of profiles in Academia.edu, Google Scholar Citations and ResearchGate
Online Information Review, 41(6): 812-825

The purpose of this paper is to analyze the distribution of profiles from academic social networking sites according to disciplines, academic statuses and gender, and detect possible biases with regard to the real staff distribution. In this way, it intends to know whether these academic places tend to become specialized sites or, on the contrary, there is a homogenization process.

To this purpose, the evolution of profiles of one organization (Consejo Superior de Investigaciones Científicas) in three major academic social sites (Academia.edu, Google Scholar Citations and ResearchGate) is tracked through six quarterly samples from April 2014 to September 2015.

Longitudinal results show important disciplinary biases but with a strong increase of new profiles from different areas. They also suggest that these virtual spaces are gaining more stability and that they tend toward a balanced environment.

This is the first longitudinal study of profiles from three major academic social networking sites, and it sheds light on the future of these platforms’ populations.

3 Oct. 2017

The lost academic home: institutional affiliation links in Google Scholar Citations

Orduña-Malea, E., Ayllón, J. M., Martín-Martín, A., 
Delgado López-Cózar, E. (2017)
The lost academic home: institutional affiliation links in Google Scholar Citations. 
Online Information Review, 41 (6), 762-781

Google Scholar Citations (GSC) provides an institutional affiliation link which groups together authors who belong to the same institution. The purpose of this paper is to ascertain whether this feature is able to identify and normalize all the institutions entered by the authors, and whether it is able to assign all researchers to their own institution correctly.

Systematic queries to GSC’s internal search box were performed under two different forms (institution name and institutional e-mail web domain) in September 2015. The whole Spanish academic system (82 institutions) was used as a test. Additionally, specific searches to companies (Google) and world-class universities were performed to identify and classify potential errors in the functioning of the feature.

Although the affiliation tool works well for most institutions, it is unable to detect all existing institutions in the database, and it is not always able to create a unique standardized entry for each institution. Additionally, it also fails to group all the authors who belong to the same institution. A wide variety of errors have been identified and classified.

Research limitations/implications
Even though the analyzed sample is good enough to empirically answer the research questions initially proposed, a more comprehensive study should be performed to calibrate the real volume of the errors.

Practical implications
The discovered affiliation link errors prevent institutions from being able to access the profiles of all their respective authors using the institutions lists offered by GSC. Additionally, it introduces a shortcoming in the navigation features of Google Scholar which may impair web user experience.

Social implications
Some institutions (mainly universities) are under-represented in the affiliation feature provided by GSC. This fact might jeopardize the visibility of institutions as well as the use of this feature in bibliometric or webometric analyses.

This work proves inconsistencies in the affiliation feature provided by GSC. A whole national university system is systematically analyzed and several queries have been used to reveal errors in its functioning. The completeness of the errors identified and the empirical data examined are the most exhaustive to date regarding this topic. Finally, some recommendations about how to correctly fill in the affiliation data (both for authors and institutions) and how to improve this feature are provided as well.

27 Sept. 2017

Metric study of information literacy in Latin America: from bibliometrics to altmetrics

Uribe-Tirado, A.; Alhuay-Quispe, J. (2017). 
Estudio métrico de ALFIN en Iberoamérica: de la bibliometría a las altmetrics. 
Revista Española de Documentación Científica, 40(3): e180

This study identifies the presence, productivity and influence of Ibero-American authors that write about information literacy (InfoLit). Using bibliometric and altmetric indicators, it seeks to analyze the impact and subsequent use of their scholarly works on social and scientific platforms. Fifty-five authors with the highest productivity were identified, based on the results of bibliometric studies on InfoLit carried out on both an international and Ibero-American scale in searches of major databases as well as publications collected in a Latin American wiki. 

Subsequently an analysis of bibliometric and altmetric indicators at the author and publication level was carried out, based on the results of searches on eight scientific platforms (Google Scholar, ResearchGate, Academia.edu, Mendeley, ORCID, IraLIS, E-LIS and EXIT), three social networks (Facebook, Twitter and LinkedIn), and data provided by a commercial supplier (Altmetric.com). 

Overall, we found a greater presence of authors in ResearchGate (58%), Academia.edu (51%) and Google Scholar (49%) than in Mendeley (25%) and ORCID (18%). As to social platforms, the greatest potential influence lies with Facebook, due to its high number of followers ( / top 10 authors). In addition, an analysis with the Spearman rho statistic shows, for some sources and platforms, a low correlation between the number of citations in Google Scholar and readings in Mendeley (r = 0.382), and low negative correlations for mentions in blogs (r = -0.237), Google+ (r = -0.214) and Twitter (r = -0.183). In conclusion, both productivity and impact-visibility center on specific authors writing about InfoLit, and various measurement resources show that for these authors there is a positive two-way impact from bibliometric to altmetric indicators and vice versa.

Comparing the number of citations of Google Scholar and ResearchGate in a sample of Colombian researchers

Aguillo-Caño I., Uribe-Tirado A., López-López W. (2017)
Visibilidad de los investigadores colombianos según sus indicadores en Google Scholar y ResearchGate. Diferencias y similitudes con la clasificación oficial del sistema nacional de ciencia -COLCIENCIAS. 
Revista Interamericana de Bibliotecología, 40(3), 221-230. 
doi: 10.17533/udea.rib.v40n3a03

The aim of this study is to contextualize the results obtained regarding the classification of researchers who work in Colombian institutions according to their public Google Scholar Citations (GSC) profiles (1390 with an h-index equal to or higher than 5). 
To this end, the study compares its findings with the data obtained for Colombian authors from the social network ResearchGate (RG) and with the local information provided by Colciencias, the Colombian agency that publishes a researcher classification through a platform named ScienTI. 
Findings revealed significant discrepancies between GSC and RG regarding the four classification categories Colciencias provides. This suggests that Colciencias must reconsider its assessment criteria, including new sources and indicators. Considering that the two sources (GSC, RG) and the two indicators (h-index, RG-Index) behave differently across disciplines, Colciencias must also be careful with disciplinary assignments, adopting international classifications and developing discipline-related indicators. Colombian academic and research organizations should become more active in recognizing the potential and importance of Internet platforms to make their research work visible and increase its impact (Ciencia 2.0).
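
The h-index threshold mentioned in this abstract can be made concrete with a short sketch. The code below computes the h-index from a list of per-paper citation counts and keeps the profiles that reach a given threshold; the helper names (`h_index`, `profiles_at_or_above`) and the sample profiles are hypothetical and are not taken from the study.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def profiles_at_or_above(profiles, threshold=5):
    """Keep the names of profiles whose h-index is >= threshold."""
    return [name for name, cites in profiles.items() if h_index(cites) >= threshold]


# Hypothetical citation counts per paper, invented for illustration only.
sample_profiles = {
    "researcher_a": [40, 22, 15, 9, 7, 6, 2],
    "researcher_b": [12, 3, 1],
    "researcher_c": [5, 5, 5, 5, 5],
}
print(profiles_at_or_above(sample_profiles))
```

Because the h-index depends only on the citation counts a platform reports, the same author can fall on different sides of such a threshold in different sources, which is one way the discrepancies described above can arise.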

25 Sept. 2017

International Survey of Research University Faculty: Use of Bibliometric Ratings, Identifiers & Indicators

International Survey of Research University Faculty: Use of Bibliometric Ratings, Identifiers & Indicators
Report by Primary Research Group
ISBN: 978-157440-472-2

This study presents data from 325 faculty of major universities in the USA, Canada, the UK, Ireland and Australia about how they view bibliometric indicators such as the h-index, how trustworthy they are believed to be and how often they are checked or calculated.

Characteristics of the Sample 
Personal Awareness of the h-index 
Frequency of Checking the H-Index 
Use of Journal Impact Factor Ratings and Rankings When Deciding on Publication Venue 
Ease of Finding Journal Impact Factors 
Views on Journal Impact Factor Trustworthiness 
Use Scholar Identifiers 
Use of Thomson Reuters Research ID [sic]
Use of Orchid ID 
Use of the International Standard Number Identifier 
Use of arXiv 
Use of Web of Science 
Use of Google Scholar 
Use of Scopus 
Use of JSTORE [sic]
Use of Journal Citation 
Use of Scimago, bepress and SciVal 
Use of SciFinder 
Use of CrossRef 

The study presents data on the use of particular tools and indicators, giving specific data for all of the following: Web of Science, Scopus, Google Scholar, ORCHID ID, Thomson-Reuters Research ID, Scimago, bepress, SciVal, JSTOR, International, SciFinder, arXiv and CrossRef, among others.

Data in the report is broken out by tenure status, gender, age, semester teaching load, academic field, academic title, and political views of the survey participant, as well as by the country of origin, public/private status and world ranking of the universities of the survey participants.

Just a few of this 113-page report’s many findings are that:
  • Google Scholar was the most frequently used tool for bibliometrics, with 85% of the respondents reporting its use. All faculty age 30 or under reported using Google Scholar with this percentage declining to 77% of faculty 60 years and over.
  • SciFinder use was reported by 7% of respondents, especially at private institutions (10%) and by faculty 30 years or younger (22%).
  • 76% of faculty in the UK/Ireland had an ORCHID ID vs. only 35% in the USA.
  • Political conservatives are more likely than those with more centrist or left-wing views to feel that bibliometric measures are trustworthy.
  • Faculty in literature and languages found the most difficulty in finding journal impact factor data for use in their career planning.

19 Sept. 2017

Tracking Scholarly Publishing of Hospitals Using MEDLINE, Scopus, WoS and Google Scholar

Pylarinou, S., & Kapidakis, S. (2017). 
Tracking Scholarly Publishing of Hospitals Using MEDLINE, Scopus, WoS and Google Scholar. 
Journal of Hospital Librarianship, 17(3), 209-216. 

Scientific literature focuses on facilitating communication among researchers. Many studies have been conducted to compare effectiveness, coverage, and performance among databases available to researchers and/or librarians. In this study, the authors compared MEDLINE/PubMed, Scopus, Web of Science (WoS), and Google Scholar performance regarding searching for scholarly publishing of institutions such as hospitals. 

Query searches for the scholarly publications of specific hospital personnel were run, and the resulting articles were compared. The MEDLINE/PubMed database, Scopus and Web of Science offer the option to search by affiliation. Affiliations in Google Scholar can be searched by running a query as an exact phrase. Queries were phrased in a way that was suitable for each source as well as to enable comparison. To facilitate comparison we limited research to 2016. Data were collected at the end of August 2016.

Natural language use when authors denote affiliation affects retrieval of scholarly publishing. Effectiveness of searching scholarly publishing of a specific institution is better served when there is concurrent use of many databases.

An affiliation-based search could be better served if searchers use multiple sources in combination. In this study, a comparison of bibliographic database results gave precedence to MEDLINE/PubMed. Between the freely available resources MEDLINE/PubMed and Google Scholar, MEDLINE/PubMed also provided better results.

15 Sept. 2017

Is Google Scholar useful for the evaluation of Chinese journals?

Zhang, Y., Lun, H., & Yang, Z. (2017)
Is Google Scholar Useful for the Evaluation of Non-English Scientific Journals? The Case of Chinese Journals
iConference 2017 Proceedings (pp. 241–261). https://doi.org/10.9776/17025

This study aims to explore how useful Google Scholar is for the evaluation of non-English journals with the case of Chinese journals. Based on a sample of 150 Chinese journals across two disciplines (Library and Information Science, Metallurgical Engineering & Technology), it provides a comparison between Google Scholar and Chongqing VIP, which is an important Chinese citation database, from three aspects: resource coverage, journal ranking and citation data. 

Results indicate that Google Scholar is equipped with sufficient resources and citation data for the evaluation of Chinese journals. However, the Chinese journal ranking reported by Google Scholar Metrics is not yet sufficiently developed. Even so, Google Scholar can serve as an alternative source of citation data to Chinese citation databases. The Average Citation is a useful metric in the evaluation of Chinese journals with data from Google Scholar to provide a comprehensive reflection of journals’ impact. Overall, Google Scholar is useful and worthy of attention when evaluating Chinese journals.

19 Jul. 2017

Google Scholar Citations, the system that covers the most publications by an author: The cases of B Cronin and WG Stock

Publication hit lists of authors, institutes, scientific disciplines etc. within scientific databases like Web of Science or Scopus are often used as a basis for scientometric analyses and evaluations of these authors, institutes etc. However, such information services do not necessarily cover all publications of an author. The purpose of this article is to introduce a re-interpreted scientometric indicator called “visibility,” which is the share of the number of an author’s publications on a certain information service relative to the author’s entire oeuvre based upon his/her probably complete personal publication list. To demonstrate how the indicator works, scientific publications (from 2001 to 2015) of the information scientists Blaise Cronin (N = 167) and Wolfgang G. Stock (N = 152) were collected and compared with their publication counts in the scientific information services ACM, ECONIS, Google Scholar, IEEE Xplore, Infodata eDepot, LISTA, Scopus, and Web of Science, as well as the social media services Mendeley and ResearchGate. For almost all information services, the visibility amounts to less than 50%. The introduced indicator represents a more realistic view of an author’s visibility in databases than the currently applied absolute number of hits in those databases.
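
The "visibility" indicator defined in this abstract is a simple ratio, and a minimal sketch may help make it concrete. The function name `visibility` and the count of 80 indexed publications are assumptions made for illustration; only the oeuvre size N = 167 comes from the abstract.

```python
def visibility(indexed_count, oeuvre_size):
    """Share of an author's complete publication list found on one service."""
    if oeuvre_size <= 0:
        raise ValueError("oeuvre_size must be positive")
    if not 0 <= indexed_count <= oeuvre_size:
        raise ValueError("indexed_count must lie between 0 and oeuvre_size")
    return indexed_count / oeuvre_size


# N = 167 is the oeuvre size reported for Blaise Cronin in the abstract.
# The value 80 is a hypothetical count, not a figure from the study.
share = visibility(80, 167)
print(f"{share:.1%}")
```

Expressing coverage as a share of the full oeuvre, rather than as a raw hit count, is what allows services of very different sizes to be compared for the same author.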

18 Jul. 2017

Scientific information discovery: Still a mission of the academic library? Google & Google Scholar empire

Rodríguez-Bravo, B.; Simões, MG; Vieira-de-Freitas, MC; Frías, JA (2017)
Descubrimiento de información científica: ¿todavía misión y visión de la biblioteca académica? 
El profesional de la información, 26 (3): 464-479

Access to quality content is key to research and one of the core values that scholars assign to the library. Bibliographic data play a fundamental role in university libraries, which devote abundant resources to obtaining and hosting them for access.
This study investigates where and how bibliographic information is discovered, and highlights the role of search engines, databases, repositories, and web-scale discovery services in that process. The effort that libraries have made in implementing these services seems to have paid off in relation to the increase in the use of collections. However, Google remains the top option for discovering scientific information. This is a review study, based on the analysis of original research and results from recent reports.

17 Jul. 2017

An evidence-based review of academic web search engines (Google Books, Google Scholar, Microsoft Academic), 2014-2016: Implications for librarians’ practice and research agenda

Academic web search engines have become central to scholarly research. While the fitness of Google Scholar for research purposes has been examined repeatedly, Microsoft Academic and Google Books have not received much attention. Recent studies have much to tell us about Google Scholar’s coverage of the sciences and its utility for evaluating researcher impact. But other aspects have been understudied, such as coverage of the arts and humanities, books, and non-Western, non-English publications. User research has also tapered off. A small number of articles hint at the opportunity for librarians to become expert advisors concerning scholarly communication made possible or enhanced by these platforms. This article seeks to summarize research concerning Google Scholar, Google Books, and Microsoft Academic from the past three years with a mind to informing practice and setting a research agenda. Selected literature from earlier time periods is included to illuminate key findings and to help shape the proposed research agenda, especially in understudied areas.

14 Jul. 2017

Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation. Review of the Literature

Studies comparing GS to controlled databases such as Scopus, Web of Science (WOS) and others have been published almost since GS's inception. These studies focus on its coverage, quality and ability to replace controlled databases as a source of reliable scientific literature. In addition, GS's introduction of citation tracking and journal metrics has spurred a body of literature focusing on its ability to produce reliable metrics. In this article we aimed to review some studies in these areas in an effort to provide insights into GS's ability to replace controlled databases in various subject areas. We reviewed 91 comparative articles from 2005 until 2016 which compared GS to various databases, especially Web of Science (WOS) and Scopus, in an effort to determine whether GS can be used as a suitable source of scientific information and as a source of data for scientific evaluation. Our results show that GS has significantly expanded its coverage through the years, which makes it a powerful database of scholarly literature. However, the quality of resources indexed and the overall policy still remain unknown. Caution should be exercised when relying on GS for citations and metrics, mainly because it can be easily manipulated and its indexing quality still remains a challenge.

• Google Scholar is constantly expanding and includes publishers' content as well as content not available in controlled databases.
• Google Scholar provides citations counts that are broader than those covered by controlled databases.
• Google Scholar should be used with controlled databases especially when clinical information retrieval is required.
• Google Scholar is challenging when advanced searching is required.
• Google Scholar does not support data downloads and therefore is difficult to use as a sole bibliometric source.
• Google Scholar lacks quality control and clear indexing guidelines.

4 Jul. 2017

Faculty Use of Author Identifiers and Researcher Networking Tools

This cross-sectional survey focused on faculty use and knowledge of author identifiers and researcher networking systems, and professional use of social media, at a large state university. Results from 296 completed faculty surveys representing all disciplines (9.3% response rate) show low levels of awareness and variable resource preferences. The most utilized author identifier was ORCID while ResearchGate, LinkedIn, and Google Scholar were the top profiling systems. Faculty also reported some professional use of social media platforms. The survey data will be utilized to improve library services and develop intra-institutional collaborations in scholarly communication, research networking, and research impact.

28 Jun. 2017

Classic papers: déjà vu, a step further in the bibliometric exploitation of Google Scholar

After giving a brief overview of Eugene Garfield’s contributions to the issue of identifying and studying the most cited scientific articles, manifested in the creation of his Citation Classics, the main characteristics and features of Google Scholar’s new service, Classic Papers, as well as its main strengths and weaknesses, are addressed. This product currently displays the most cited English-language original research articles by field, published in 2006.

What does Google Scholar’s Classic Papers offer?

The top 10 most cited English-language original research articles published in 2006 in each of 252 subject categories, according to the data available in Google Scholar as of May 2017. The total number of articles displayed in the product is 2515.

In order to make it to this product, articles must meet the following criteria:

- They must have been published in 2006.
- They must be journal articles, articles deposited in repositories, or conference communications.
- The documents must describe original research. Review articles, introductory articles, editorials, guides, commentaries, etc. are explicitly excluded.
- They must be written in English.
- They must be among the top 10 most cited documents in their respective subject category.
- They must have received at least 20 citations.

As one would expect, this product has the identifying traits of most of Google’s products:

- Simple and straightforward: a list of the most cited articles in each discipline, with a simple browsing interface.
- Easy to use and understand: organized by broad scientific areas and inside of them by subject categories. Three clicks are enough to reach the documents or the public Google Scholar Citations profiles of their authors.
- Minimal information: As a whole, the product displays just over 2500 highly cited articles. Each article presents the most basic bibliographic information.
- Little methodological transparency: It is common for Google Scholar not to declare in detail how their products are developed.

Regarding the last point, there are four critical aspects about which more precise information is needed, since they could compromise the reliability and validity of the product:
The first of them is related to what Google understands as a research article.
The second aspect has to do with the subject classification of the articles.
The third aspect has to do with another crucial issue related to the way Google Scholar works: can we be sure they have successfully merged together all the versions indexed in Google Scholar of these documents?
The fourth aspect has to do with the threshold selected to consider an article a “classic paper”.

20 Jun. 2017

A Novel Improvement to Google Scholar Ranking Algorithms Through Broad Topic Search

Google Scholar uses ranking algorithms to find the most relevant academic research possible. However, its algorithms use an exact keyword match and citation count to sort its results. This paper presents a novel improvement to Google Scholar algorithms by aggregating multiple synonymous searches into one set of results, offsetting the necessity to guess all potential search phrases for a research topic. This design science research method uses a broad topic analysis that examines search queries, finds synonymous phrases, and combines all keyword searches into one set of results based on current Google Scholar citation count algorithms. To support and evaluate this research-in-progress, several users will compare multiple niche search queries against old and new algorithms. The expectation of this design is to introduce modern algorithm techniques to academic search engines, resulting in greater quality, discoverability, and core topic diversity of published research.
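
The aggregation idea described in this abstract can be sketched as follows: run several synonymous queries, merge the results, drop duplicates, and sort by citation count. This is only an illustrative sketch under stated assumptions, not the paper's implementation. Google Scholar offers no official search API, so `fake_search` and `FAKE_INDEX` are hypothetical stand-ins, and matching duplicates by normalized title is a simplification chosen for the example.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]  # expected keys: "title" and "citations"


def aggregate_searches(
    queries: Iterable[str],
    search: Callable[[str], List[Record]],
) -> List[Record]:
    """Merge results of several queries, deduplicate, sort by citations."""
    merged: Dict[str, Record] = {}
    for query in queries:
        for record in search(query):
            # Assumption: two records with the same normalized title are the same paper.
            key = str(record["title"]).strip().lower()
            kept = merged.get(key)
            if kept is None or int(record["citations"]) > int(kept["citations"]):
                merged[key] = record
    return sorted(merged.values(), key=lambda r: int(r["citations"]), reverse=True)


# Hypothetical in-memory index standing in for a real search backend.
FAKE_INDEX = {
    "citation analysis": [
        {"title": "Paper A", "citations": 120},
        {"title": "Paper B", "citations": 45},
    ],
    "citation counting": [
        {"title": "paper a", "citations": 120},
        {"title": "Paper C", "citations": 80},
    ],
}


def fake_search(query: str) -> List[Record]:
    """Return canned results for a query; unknown queries return nothing."""
    return FAKE_INDEX.get(query, [])


combined = aggregate_searches(["citation analysis", "citation counting"], fake_search)
print([record["title"] for record in combined])
```

Passing the search function in as a parameter keeps the merging logic independent of any particular data source, so the same sketch could be pointed at a different backend.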

17 Apr. 2017

On Low Overlap Among Search Results of Academic Search Engines: Google Scholar, Semantic Scholar, Microsoft Academic, Scopus

Mitra, A., & Awekar, A. (2017)
On Low Overlap Among Search Results of Academic Search Engines. 

Introduction
To tackle information overload, researchers are increasingly depending on niche academic search engines. Recent works have shown that two major general web search engines, Google and Bing, have a high level of agreement in their top search results. Various works have tried to predict the coverage of academic search engines. However, to the best of our knowledge, there is no existing work that systematically studies overlap in the search results of academic search engines.

Methods
We collected over 2300 query terms from the 2012 ACM Computing Classification System. We sent these 2500 queries to all four selected academic search engines. For each academic search engine, we looked at the top eight results, as each academic search engine returns at least eight results on the first page. We computed the similarity between any two result sets using the Jaccard similarity.

Results
For all 2500 queries, the intersection set of all four academic search engines considered together was always empty. In other words, for each query, no research article appears in the top results list of all four academic search engines. This shows strong disagreement among academic search engines. We therefore observe that overlap in the search result sets of any pair of academic search engines is significantly low, and in most cases the search result sets are mutually exclusive.
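
The overlap measure described in this abstract can be illustrated with a minimal sketch of the Jaccard similarity between the top-k result sets of each pair of engines, together with the intersection across all engines. The function names, the engine labels and the paper identifiers below are hypothetical and are not taken from the paper; treating two empty sets as having similarity 0.0 is a convention assumed here.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 for two empty sets (assumed convention)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_overlap(results: Dict[str, List[str]], k: int = 8) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity of the top-k results for every pair of engines."""
    top = {engine: set(ids[:k]) for engine, ids in results.items()}
    return {(x, y): jaccard(top[x], top[y]) for x, y in combinations(sorted(top), 2)}


def common_to_all(results: Dict[str, List[str]], k: int = 8) -> Set[str]:
    """Items that appear in the top-k results of every engine."""
    sets = [set(ids[:k]) for ids in results.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical ranked result lists for a single query.
toy_results = {
    "engine_a": ["p1", "p2", "p3"],
    "engine_b": ["p2", "p3", "p4"],
    "engine_c": ["p5"],
}
print(pairwise_overlap(toy_results))
print(common_to_all(toy_results))
```

An empty set from `common_to_all` for a query corresponds to the situation reported in the results, where no article appears in the top results of all engines at once.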