From open principles to reliable analytics infrastructure

In a recent Digital Science blog post, Simon Porter wrote about “research information citizenship” and the idea that there are no shortcuts when building research infrastructure. Persistent identifiers such as DOIs, ORCID iDs and ROR IDs have strengthened the foundations of the scholarly ecosystem. But as Porter argued, building infrastructure is only the first step. The harder challenge is ensuring that the metadata flowing through that infrastructure is complete, consistently deposited and sustainably maintained.

For institutions and foundations, this is not an abstract issue. It directly affects the reliability of the data used for benchmarking, reporting, and strategic decision-making.

Recent discussions around affiliation metadata coverage in large open catalogues have illustrated how fragile research intelligence can become when upstream metadata flows are inconsistent. Where publishers do not deposit affiliation data into Crossref, downstream systems are often forced to fill those gaps by scraping publisher websites, a method that depends on continued, often informal, access to those sites. When publishers introduce bot-detection measures or change their technical infrastructure, that access can disappear without warning.

For universities, foundations and funding bodies, the consequences are tangible. Publication counts can shift unexpectedly. Peer comparisons become harder to interpret. Grant-to-output tracking weakens. Reporting to boards, governments or donors becomes more difficult to defend. Even tools that rely on affiliation data for author disambiguation and research integrity checks can be affected.

Porter’s core argument is that openness cannot be claimed by declaration alone. It has to be earned through proper implementation and sustained stewardship. When platforms that aggregate research metadata from multiple sources achieve the appearance of completeness by drawing on information that is not genuinely open, they can disguise fragility rather than resolve it. Gaps at the source still propagate through the system; they are simply harder to see.

Dimensions demonstrates an alternative approach: building analytics out from an open core. Using the shared infrastructure of Crossref, DataCite, ORCID and ROR as its foundation, Dimensions fills gaps through formal data partnerships with publishers and structured metadata ingestion, rather than through methods that depend on informal or fragile access. The goal is not to claim completeness, but to ensure that institutional analysis is grounded in metadata sources that are transparent and sustainable.

Formal data agreements do more than enable comprehensive analysis across open and closed data sources. In the current metadata landscape, they are still necessary to secure basic access to complete records. Until the open core is truly complete, with affiliations, funding information and organisational identifiers consistently deposited at source, the way a platform fills those gaps matters. Where access depends on formal agreements with clear provenance, institutional analysis is far less likely to fluctuate because of changes in web access or incomplete metadata deposition.

Trustworthy research intelligence is not built overnight. But by combining open infrastructure with structured data partnerships, careful stewardship and a commitment to the kind of research information citizenship Porter describes, it is possible to build systems that institutions can rely on over time.

Want to see how Dimensions can deliver reliable, transparent research intelligence? Contact the Dimensions team to book a demo.