This week, UKSG Insights published a provocative new article that contains important constructive criticism from research administrators and librarians who use bibliometrics and altmetrics tools, including Dimensions.
The article rightly challenges metrics vendors on issues of transparency, coverage, responsible metrics, open metrics and services, and indicator development. It gave us a lot to think about, and to celebrate, too: according to the article, one in four respondents were already using Dimensions only a month after we launched!
After author Lizzie Gadd invited our response on Twitter, we thought it would be worthwhile to provide a lengthier, more detailed response than is possible in 280 characters.
In this post, I highlight how we think that Dimensions does a good job of meeting the community’s needs described in the article, and where we have room to grow.
What we think we do well
Partnering with the community
Dimensions was created from the ground up with input from over 100 community partners. We continue to develop Dimensions based on community feedback and suggestions.
Moreover, we recognize and respect community ownership and stewardship for bibliometrics data. After all, open citation data is what made Dimensions possible in the first place. We do our best to give back in the form of open data that the community can use to validate existing citation-based metrics and innovate by creating new metrics.
High-quality data as a starting point
As a ‘new kid on the block’, Dimensions has to focus on data quality. After all, we are up against other companies who have had decades to work on both coverage and quality.
From the beginning, we have opted for precision over recall, for example in the areas of institutional affiliation matching and researcher disambiguation. In both areas, our approach is to use open standards and infrastructure: our own GRID for research organizations, ORCID for researcher disambiguation, and so on.
Precision over recall means that we would rather split a researcher’s body of work into several profiles than run the risk of assigning the wrong publication to a person. This data is easily amendable by researchers themselves: a researcher can curate her ORCID record, and the changes will be picked up in the next disambiguation run.
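To make that trade-off concrete, here is a minimal sketch of a conservative grouping rule, not our production pipeline: the record shape and the `orcid` field name are assumptions for the example. Publications are merged into one profile only when they share a verified ORCID iD; everything else stays in its own profile rather than risk a wrong attribution.

```python
from collections import defaultdict

def group_publications(publications):
    """Conservatively group publications into researcher profiles.

    Publications sharing a verified ORCID iD are merged into one profile;
    publications without an ORCID each get their own profile, trading
    recall (one profile per person) for precision (no wrong attributions).
    """
    profiles = defaultdict(list)
    singletons = []
    for pub in publications:
        orcid = pub.get("orcid")
        if orcid:
            profiles[orcid].append(pub)
        else:
            # Same author name is not enough evidence: keep it separate.
            singletons.append([pub])
    return list(profiles.values()) + singletons

pubs = [
    {"title": "Paper A", "orcid": "0000-0002-1825-0097"},
    {"title": "Paper B", "orcid": "0000-0002-1825-0097"},
    {"title": "Paper C"},  # no ORCID, so it stays a separate profile
]
print(group_publications(pubs))
```

A split profile heals itself as soon as the researcher claims the stray publication in ORCID, which is why the conservative rule is safe to apply at scale.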
Though we’ve gotten some good feedback from the community regarding our coverage and data quality to date, there is always room for improvement, which is why we have also listed “data quality” as something we can improve upon–more below!
More open, interoperable, and reusable data
We make our data available in open, standardized formats (primarily CSV and JSON), so that it is easy to repurpose our data in analyses, visualizations, and reports. We also make publication and citation data as open as possible (within the limits of some publishers’ restrictions) through the free Dimensions webapp and the open Dimensions Metrics API. Scientometrics researchers can also apply for free access to Dimensions Plus and the Dimensions API, so they can use our data in their own research with minimal red tape and restrictions. We have also put a lot of effort into the Dimensions API and developed a domain-specific querying language to support the easy ‘mash up’ of Dimensions data with other data sources.
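Because exports come out in standard JSON and CSV, reuse needs nothing beyond a standard library. The sketch below aggregates citations by year from an illustrative JSON export; the field names (`doi`, `year`, `times_cited`) are assumptions for this example, not the exact Dimensions schema.

```python
import json

# An illustrative export; field names are assumptions, not the real schema.
export = """
[{"doi": "10.1000/a", "year": 2016, "times_cited": 12},
 {"doi": "10.1000/b", "year": 2016, "times_cited": 3},
 {"doi": "10.1000/c", "year": 2017, "times_cited": 7}]
"""

records = json.loads(export)

# Sum citation counts per publication year.
citations_by_year = {}
for rec in records:
    citations_by_year[rec["year"]] = citations_by_year.get(rec["year"], 0) + rec["times_cited"]

print(citations_by_year)  # {2016: 15, 2017: 7}
```

The same few lines work as the join step in a ‘mash up’: once records carry stable identifiers such as DOIs, merging them with grant, patent, or policy data is a dictionary lookup away.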
Promoting the responsible use of metrics
We agree that metrics services have a “duty of care” to end users to help promote the responsible use of metrics. That’s why we are signatories to DORA, and why we only include a limited number of carefully selected, community-approved metrics in our products. We have also begun working with the academic scientometric community to redefine the metrics and their (responsible) presentation in Dimensions, which will be made public in an upcoming Dimensions release. We do our best to organize as many educational opportunities as possible for Dimensions users, including user days and webinars, and are constantly looking to expand and improve the educational services we offer.
Article-level subject indexing
Survey respondents described the importance of accurate article-level subject indexing when benchmarking across institutions and niche subject areas. We agree that journal-level subject indexing (whereby an entire journal’s identified subject area is applied to all the articles that are published in the journal, rather than each article being analyzed to determine its specific topic) is less than desirable, which is why we have taken an article-level indexing approach for Dimensions.
We also have developed techniques to apply granular topics (based on the standardized Australian FOR subject areas) to journal articles, making the identification of research in niche subject areas much easier for those using our data in analyses.
That said, we are well aware that our subject area coverage is not always 100% accurate nor as granular as it could be, so we are constantly working to improve what we do. For example, we are currently broadening and improving the training sets for our machine-learning-based classifications using various methods, from subject-matter experts’ input to journal-level classifications applied in the background.
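As an illustration of what article-level (rather than journal-level) classification means in practice, the toy classifier below assigns each article its own subject code from its abstract text. The FOR-style labels and training snippets are invented for the example, and a real system uses far richer models and much larger training sets.

```python
from collections import Counter
import math

# Toy training set: a few invented keyword snippets per FOR-style code.
TRAINING = {
    "01 Mathematical Sciences": "theorem proof algebra topology manifold",
    "06 Biological Sciences": "gene protein cell species enzyme genome",
    "08 Information and Computing Sciences": "algorithm network data software computation",
}

def vectorize(text):
    """Bag-of-words term counts for a text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(abstract):
    """Assign the single best-matching subject code to one article."""
    vec = vectorize(abstract)
    return max(TRAINING, key=lambda code: cosine(vec, vectorize(TRAINING[code])))

print(classify("A new algorithm for software-defined network data flows"))
# → "08 Information and Computing Sciences"
```

The point of the sketch is the unit of analysis: each abstract is scored on its own text, so an informatics paper in a biology journal still lands in the computing category.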
Finding the balance between innovation and the basics
When working on the concept for Dimensions, the balance between perfecting the basics and offering innovation was front and center in our discussions. It was clear that building a tool that allowed only citation-based analysis was too narrow, and that for any analysis a robust and relevant citation graph was required.
We realized the value to be added if we created an inclusive, world-class publications index, and ‘bolted on’ other highly relevant data sources (such as a global grant database, patents, clinical trials and policy documents), consistently linking all the data together into one large dataset. In doing so, Dimensions allows a broader view of the trajectory of research from funding to later impact reflected in policy papers (to name just one example).
What we’ve ended up with is a tool that offers citation data for ‘old fashioned’ basic analysis, as well as a larger, comprehensive data set to be used by the scientometric community for the development of new metrics.
We have also put a lot of care into taking fresh approaches to common bibliometrics challenges, for example by developing machine learning approaches that make possible subject-level article classification based on textual analysis. We also recognize the importance of getting “the basics” right, like researcher disambiguation and accurate article metadata, and work hard to do so.
How we think we can improve
More comprehensive coverage
Complete, accurate disciplinary coverage is in high demand from survey respondents, and rightly so. After all, it’s difficult to do departmental and institutional bibliometric analyses using incomplete datasets!
Providing complete coverage of all published research in all disciplines poses major challenges, not least the limited availability of open data that can be reused without commercial restrictions, inadequate metadata, and poor discoverability. Though we are currently one of the largest research indexes (at 96 million publications and counting!), we recognize that we are not as comprehensive as we could be.
The concept behind Dimensions is to provide a true database, which is as inclusive as possible, and not a curated database, where a decision-making body decides which research “makes the cut” for inclusion.
By “inclusive”, we mean that we strive to strike the balance between providing comprehensive coverage (i.e. all scholarly work ever produced, no matter the publisher, source, or quality) and providing access to high-quality research (i.e. a very selective, restricted subset of research deemed “excellent” by a small group of appointed experts).
We believe that the decision-making power over what research is relevant belongs in the hands of the user – different use cases require different data scopes.
In Dimensions, this has been realized by implementing various journal lists, like PubMed and the Australian ERA 2015 journal list, which can be used to refine searches. And Dimensions is prepared to host community-provided lists to support discipline-specific journal sets, national selections, or quality-driven subsets of research; we are working with the community on integrating some at the moment.
We’ll continue to work within these constraints by partnering with like-minded organizations to get research into Dimensions so that it can be used by the larger community.
Improved education and in-product “signposts”
The article’s author points out that all vendors can promote the responsible use of metrics by “making it very clear what their sources are, how the indicators are calculated and what their limitations are (e.g. sample sizes and confidence intervals)” and by offering an “easy-to-find and comprehensive list of data sources for [their products].”
We enthusiastically agree, and add that in-product “signposts” could be added to most research indices to offer additional context for end users–for example, listing metrics’ definitions, calculations, and limitations in pop-ups and knowledge base articles that are easily accessible wherever any metrics are displayed.
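One lightweight way to implement such signposts is to store each metric’s definition and caveats alongside its value, so any surface that shows the number can show the context too. The structure below is an illustrative sketch, and the wording of the definitions and limitations is ours for the example, not official documentation.

```python
# Illustrative signpost metadata, rendered wherever a metric is displayed.
METRIC_SIGNPOSTS = {
    "times_cited": {
        "definition": "Number of citations to this publication from indexed sources.",
        "limitation": "Coverage varies by discipline; recent papers have had "
                      "less time to accrue citations.",
    },
    "fcr": {
        "definition": "Field Citation Ratio: citations relative to the average "
                      "for publications of similar age and field.",
        "limitation": "Unstable for very recent publications and small samples.",
    },
}

def render_metric(name, value):
    """Return a display string pairing a metric value with its signpost."""
    signpost = METRIC_SIGNPOSTS[name]
    return (f"{name} = {value}\n"
            f"  What it is: {signpost['definition']}\n"
            f"  Caveat: {signpost['limitation']}")

print(render_metric("fcr", 1.8))
```

Keeping the signpost next to the value, rather than in a separate manual, means a pop-up or knowledge-base link can never drift out of sync with the metric it explains.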
We launched Dimensions with in-depth documentation to help our end users use our data responsibly, and we recognize that there are many ways we can make this information more accessible and robust.
What do you think?
Given the thoughtful challenges put to Dimensions and other vendors by the UKSG Insights article, what do you think we are doing well? How do you think we can improve? Please share your thoughts here in the comments, or tweet us at @DSDimensions!