Each year, the prestigious scientific journal Nature publishes a list of ten people who mattered in science during the year. For 2021, one of them is a Frenchman, Guillaume Cabanac. A computer science researcher at the University of Toulouse (France), he works to detect questionable content in scientific articles. To better understand his approach, Sciences et Avenir interviewed him.

Sciences et Avenir: What are the "tortured phrases" that you and your fellow researchers refer to?

Guillaume Cabanac: As a researcher, you read articles to obtain information and you write articles to communicate new knowledge. With my colleagues, we realized that some articles validated by the scientific community and published by the major publishers contain incongruous, unexpected or even erroneous expressions. This is what we call "tortured phrases". Instead of "artificial intelligence", an article would say "counterfeit consciousness". You could thus read, in an article: "counterfeit consciousness allows the vehicle to move around the city." Instead of "breast cancer", you would find "bosom peril". It sounds almost poetic and can make you smile, but when you are a researcher working in a particular field, you have to use the terms that the profession has established. Someone who works on breast cancer is not working on "bosom peril"; it makes no sense.

In 2005, researchers at the Massachusetts Institute of Technology (MIT) created a program called SCIgen (Scientific Generator) that produced science-like text from scratch, as a joke. They used it to create fake articles, which they then submitted to conferences and journals they considered "predatory" (journals with fraudulent practices that do not carry out proper peer review, editor's note). The researchers wanted to see whether these outlets would accept and publish the fake papers, and they did. It is, in effect, a scam.

In practice, how do you detect these strange expressions found in so many academic articles?

Yes, these strange expressions appear in hundreds of articles, including some from the best publishers. To identify them, we use a snowball mechanism. What does that mean? As scientists, we read articles and come across strange, surprising cases. We note them down and add them to our list. We check that the expression does not legitimately exist in any scientific field. We then run queries on Dimensions, a search engine for the scientific literature.

With the Problematic Paper Screener, a piece of software I created, every tortured phrase we add to the list lets us find more documents containing nonsense. Every night in my lab, my software runs its queries to identify articles that contain nonsense, and in the morning I look at what comes out, read it, and assess it. Is this an unfortunate use of a tortured phrase, or is it the equivalent of fraud? Since there are thousands of articles and checking each one takes 20 minutes, other scientists are helping us and posting the results on PubPeer, a post-publication review site.
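The nightly screening step can be pictured as a fingerprint lookup: a growing list of known tortured phrases is matched against article text, and every hit is queued for a human read. The sketch below is only an illustration under stated assumptions. It is not the Problematic Paper Screener's actual code (which queries a literature search engine rather than scanning local text), the names `FINGERPRINTS`, `screen` and `flag_documents` are hypothetical, and the three phrase pairs are examples reported in the tortured-phrases literature.

```python
import re

# Hypothetical fingerprint list: tortured phrase -> the established term it replaces.
# In the workflow described above, this list grows by "snowballing" as new cases are spotted.
FINGERPRINTS = {
    "counterfeit consciousness": "artificial intelligence",
    "bosom peril": "breast cancer",
    "irregular woodland": "random forest",
}

# Precompile one case-insensitive, whole-word pattern per fingerprint.
PATTERNS = {
    phrase: re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    for phrase in FINGERPRINTS
}


def screen(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, expected term) pairs found in the text."""
    return [
        (phrase, FINGERPRINTS[phrase])
        for phrase, pattern in PATTERNS.items()
        if pattern.search(text)
    ]


def flag_documents(docs: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Map each document id to its hits, keeping only documents with at least one hit.

    A hit is a lead for manual review, not proof of misconduct.
    """
    flagged = {}
    for doc_id, text in docs.items():
        hits = screen(text)
        if hits:
            flagged[doc_id] = hits
    return flagged


if __name__ == "__main__":
    sample = {
        "doc-1": "Counterfeit consciousness allows the vehicle to move around the city.",
        "doc-2": "We apply artificial intelligence to the early detection of breast cancer.",
    }
    for doc_id, hits in flag_documents(sample).items():
        print(doc_id, hits)
```

Run as a script, only `doc-1` is printed, with the pair `('counterfeit consciousness', 'artificial intelligence')`. Word boundaries keep a fingerprint from matching inside a longer word; the final judgment (clumsy wording or fraud) is left to the human reviewer, as in the process described in the interview.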

Ultimately, what do you hope this research will achieve?

Every article containing tortured phrases reflects scientific misconduct, malpractice on the part of its authors. Most of the time, these authors copied the abstracts of articles they found interesting, ran them through software that replaces words with synonyms, and pasted the result while presenting themselves as the sole authors. It is a new form of plagiarism that current anti-plagiarism software does not yet detect. I see it as a contamination.

The scientific literature is supposed to disseminate only established facts and reliable information; that is what a scientific article is. You must be able to trust an article in order to build your own research on it. In 2019, the publisher Springer Nature produced an entire book generated by artificial intelligence. So imagine: if the sources used are not reliable, the result will not be either.

During Covid-19, for example, there was a flood of scientific results, and hospital doctors did not have time to analyze the thousands of new results published every day. They were therefore given summaries, prepared by epidemiologists with the help of software. But if those summaries rely on unreliable sources, they can do more harm than good. Our work has already led to the removal of more than 800 publications, which is unprecedented!

