Detecting tortured phrases to unmask fake science

Take a well-used scientific phrase or term and put it through a ‘copy-paraphrase-paste’ grinder, with a machine-paraphrasing tool doing the paraphrasing, and the end product is a ‘tortured phrase.’ Breast cancer becomes bosom peril; kidney failure, kidney disappointment; solar energy, sun-orient force; and so on. The list of these tortured phrases is long, and what they signal is deception, misconduct, and fraud in science.

Guillaume Cabanac, Professor of Computer Science at the University of Toulouse, France, dubbed the “deception sleuth” by Nature, is on a mission to unmask the scientific fraudsters whose publications are marked by tortured phrases. In 2022 alone, Cabanac and his colleagues uncovered over 3,000 papers that contained tortured phrases, and some of these papers found their way into reputable journals from publishers such as Elsevier and Springer Nature. So why are tortured phrases considered “fingerprints” of scientific deception, misconduct, and fraud?

In a Dimensions webinar, “Using Dimensions to Weed Out Tortured Phrases Papers”, Cabanac explained that fraudsters began using machine-paraphrasing tools such as SpinBot to replace words with synonyms and so slip past routine anti-plagiarism checks. The result, however, was often a paper riddled with awkward phrases in place of commonly used scientific terms. To detect these unreliable and often fake papers, Cabanac and his colleagues began submitting tortured phrases as “fingerprint” queries to the Dimensions database.
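The fingerprint idea can be illustrated with a minimal Python sketch that checks a passage against a short list of known tortured phrases. This is not the detector Cabanac and his colleagues built: the phrase list holds only the three examples quoted in this article, and the names `FINGERPRINTS` and `find_tortured_phrases` are hypothetical.

```python
import re

# Tortured phrase -> the established term it replaces.
# Only the three examples quoted in this article; a real list is far longer.
FINGERPRINTS = {
    "bosom peril": "breast cancer",
    "kidney disappointment": "kidney failure",
    "sun-orient force": "solar energy",
}


def find_tortured_phrases(text):
    """Return (tortured phrase, expected term) pairs found in text, sorted by phrase."""
    hits = []
    for phrase, expected in sorted(FINGERPRINTS.items()):
        # Split the fingerprint into words, then allow any run of whitespace
        # or hyphens between them so line breaks and hyphen variants still match.
        words = re.split(r"[\s-]+", phrase)
        pattern = r"\b" + r"[\s-]+".join(re.escape(w) for w in words) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append((phrase, expected))
    return hits
```

Calling `find_tortured_phrases` on a sentence that mentions "bosom peril" would return that phrase paired with "breast cancer". Plain string matching of this kind only finds phrases that are already on the list, which is why the list itself has to keep growing.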

Cabanac says that Dimensions was the database of choice because it provides an API for programmatic access to metadata and full text, and because its coverage of the peer-reviewed literature is among the most comprehensive. “Each bibliographic record comes with a set of metadata: title, byline, venue, publisher, publication year, DOI when available, among others. The document type (e.g., article, proceedings paper, monograph, book, and preprint), citation count, and Altmetric Attention Score are also provided,” Cabanac and his colleague and co-author Cyril Labbé write in their study “Prevalence of nonsensical algorithmically generated papers in the scientific literature.” To systematically and continuously track unreliable papers with tortured phrases, Cabanac and his colleagues developed the Problematic Paper Screener. This tool’s “Torture detector” scrutinizes the more than 140 million publications indexed in Dimensions to uncover papers with tortured expressions.
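As a rough illustration of what a fingerprint query might look like, the Python sketch below only builds a query string for one tortured phrase; it does not contact any service. The query shape follows the Dimensions Search Language as generally documented (an exact-phrase full-text search over publications), but the source name `full_data`, the returned fields, and the helper name `build_fingerprint_query` are assumptions here and should be checked against the official documentation. It is not the query the Problematic Paper Screener actually runs.

```python
def build_fingerprint_query(phrase, limit=100):
    """Build an exact-phrase, full-text query string for one tortured phrase.

    Assumption: the 'search ... in full_data for ... return ...' shape and the
    field names below match the Dimensions Search Language; verify before use.
    """
    # Keep the sketch simple: refuse phrases that would need extra escaping.
    if '"' in phrase or "\\" in phrase:
        raise ValueError("phrase must not contain quotes or backslashes")
    # Escaped inner quotes request an exact phrase match rather than
    # a match on the individual words.
    return (
        "search publications in full_data "
        f'for "\\"{phrase}\\"" '
        "return publications[id+doi+title+year] "
        f"limit {limit}"
    )
```

Building one query per fingerprint keeps each result set traceable to the phrase that produced it, which makes it easier to review hits phrase by phrase.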

Uncovering the problematic papers is, however, just one step in decontaminating the scientific literature, Cabanac explained in the webinar. Once detected, the problematic papers have to be flagged and, after due process, retracted. Unscrupulous scientists often cite each other’s work, and retracted papers remain available in the public domain, he explained. In addition, the advent of text-generating large language models such as ChatGPT could further complicate the task of weeding out pseudo-scientific literature. But growing a community that detects problematic science, and raising awareness of how to go about it, is a step in the right direction, he said.

If you want to learn more about how Cabanac and his colleagues detect and uncover problematic science, we invite you to sign up to watch the recording of the Dimensions webinar.

If you want more information on Dimensions, contact the Dimensions team.