Error Culture Community Collection
The following table is a collection of practices from a variety of scientific disciplines that qualify as Questionable Research Practices (QRPs) or Honest Errors (HEs) in academic research. The table is explicitly not complete: practices not mentioned here may still qualify as QRPs or HEs. Example practices are roughly ordered chronologically according to the successive phases of an empirical scientific process (column “Project Stage”). We are aware that these phases are not represented in all areas of science. On a second level, the research field or topic is narrowed down (column “Topic / Field”).
We are seeking to identify more such common practices, from more areas of research, topics, and project stages, that are considered QRPs or HEs and occur regularly. We invite you to add rows to the table and extend the current variety of topics with anonymous examples, ideally providing sources as evidence of these practices. If your example falls into a category that does not yet exist, either as a project stage or as a topic, feel free to create it.
| Project Stage | Topic / Field | QRP | Honest Error |
|---|---|---|---|
| Planning | Ethics | **Change of plans** After a research study has been approved by an ethics committee, all relevant changes to the design that alter the risk to (human) test subjects need to be re-evaluated (Mehta et al., 2023), and the approved protocols need to be followed (Silverman, 2007). | **Anonymization** Anonymization of personal data according to standards that are later found to be incomplete. In this case, anonymized data may be de-anonymized and personal data may be breached (European Data Protection Supervisor and Agencia Española de Protección de Datos (AEPD-EDPS), 2021). |
| Planning | Pre-registration | **PARKing** Pre-registering after the results are known (Yamada, 2018), for example when (parts of) the data have already been collected and analysed*. Using this QRP, the chances increase of being able to support the previously registered hypotheses, thereby boosting one's own reputation. *Particularly tempting when meta-analyses or review articles are registered and when existing datasets are reused, such as longitudinal studies or census data. | **Unsuitable methods** Pre-registering an unsuitable method for data capture, analysis, statistics, etc. (Krypotos et al., 2022). This could in principle be any error in the research process. As examples, think of planning (and pre-registering) parametric statistics when data are nonparametric, or estimating the wrong number of observations based on effect sizes from the literature. |
| Data generation | Simulation | - | **Poor experimental design in simulation studies** In engineering and computer science, simulation studies that create artificial histories (e.g. to evaluate design decisions) are an important method. In simulation studies, the so-called design of experiments is essential for the significance of the results; a poor experimental design can lead to a lack of validity or even to erroneous results. This honest error is typically due to a lack of knowledge about this area of simulation. |
| Data generation | Transcription of interviews | **Transcription decisions** Using transcription decisions to steer data quality in a desired direction, e.g. omission of pauses indicating insecurity, or of changes of voice (### Ref). | **Source destruction** Losing original sources, e.g. due to a coffee spill on paper documents or equipment failure (sound recorder, camera, etc.). |
| Data generation | Sample documentation | - | **Attrition in animal studies** Many animal studies fail to report attrition (Holman et al., 2016), i.e. the number of animals that had to be excluded during an experiment because they showed effects seen as “outliers”, or because they died during the experiments. This practice is presumably relatively common, because reporting attrition is assumed to hinder a paper’s chances of publication. Yet this information is crucial for estimating true effect sizes (Holman et al., 2016). |
| Data generation | Sample documentation | **Attrition in animal studies** The failure to report attrition described in the previous row becomes a QRP when animals are deliberately excluded, and the exclusions left unreported, e.g. to improve a paper’s chances of publication (Holman et al., 2016). | **Sampling bias** In any field of research that runs statistics on only a subset of cases from a given population, the selected sample can be unrepresentative of that population by chance. In the social sciences, psychology, medical research, etc., a sample can also be unrepresentative when the search for participants is restricted to a specific kind of people: recruiting participants in a hospital will lead to a generally less healthy sample. In field research, sampling sites can be systematically biased, leading to false conclusions (Tancev & Pascale, 2020). Training data for large language models may be biased towards specific kinds of texts, etc. |
| Data analysis | Statistics | **p-hacking** Analysing data in various ways until a statistical test yields the desired p-value (often p < 0.05), i.e. exploiting the multiple comparisons problem strategically. In practice this could mean analysing many outcome variables of one experiment without a pre-specified hypothesis and reporting only the statistically significant results. This approach increases the likelihood of finding a significant result purely by chance and is thus considered a faulty practice (Stefan & Schönbrodt, 2023). | **Spurious correlations** Reporting a significant parametric correlation (e.g. Pearson’s r) between two measures which is in fact driven by an undetected (and true) outlier, by independent subgroups (Makin & Orban De Xivry, 2019), or by hidden factors that make the correlation seem implausible (Young, 2001). |
| Data analysis | Post production | **Filter parameters** In time series data, filter parameters can be tuned to obscure specific details of the data. More seriously, temporal filtering can deliberately introduce effects. | **Sampling parameters / aliasing** The frequency at which continuous time series data are digitally sampled may obscure effects: daily cycles in temperature, for example, cannot be resolved when samples (temperature measurements) are taken only once per day, whereas annual cycles can be shown with daily measurements. This effect is also highly relevant when sampling more rapid events or continuous signals such as electric potentials, light intensity, sound, etc. Generally, this problem is known as “aliasing”. |
| Data analysis | Machine learning | - | **Machine learning without expertise** Machine learning methods can meaningfully augment traditional scientific methods, and techniques are often available “off the shelf” (Hutson, 2019). For researchers with little expertise in these highly complex methods, it is often hard to “do it right”. Thus, errors are prone to occur when ML is applied without a deeper understanding of the specific prerequisites and workflows (Kapoor & Narayanan, 2023). |
| Publication | Illustration | **Image manipulation** To give readers a better understanding of the analysed data, it is relatively common to include images of the actual data in a research article. To this end, pictures are sometimes manipulated to look better, for example by adding contrast or zooming in on the most relevant part, but also to show more information in one plot, for example by pasting together parts of multiple images so that they appear as one (van Rossum et al., 2022). Researchers recently found that 19% of screened papers from the medical literature contained “problematic images” (Berrío & Kalliokoski, 2024). For a deeper insight into the problem of image manipulation (especially for Western blots), consider Bik et al. (2016). | **Figure captions / legends** In the compilation of complex figures to illustrate an article, mislabelled axes, plots, pictures, legends, etc. often occur. While such mislabels can confuse the reader, or may even lead to misinterpretation, they are honest errors that are hard to avoid. |
| Publication | Writing | **Text recycling** Researchers are often tempted to re-use passages of their own previous texts, whether published in articles or grant proposals, or unpublished, such as theses. Even when the original (first) source of the text is cited correctly, this practice is sometimes seen as unethical. | - |
| Publication | Tactics | **Salami slicing** Dividing a coherent body of research (or dataset) into smaller parts (salami slicing) can overstate the relevance of the original research from the readers’ perspective. The goal of salami slicing is, of course, to inflate an author’s publication statistics. Whether a body of research has been unethically split is often a matter of debate (Adams, 2022; Urbanowicz & Reinke, 2018). | - |
| Publication | Tactics | **Parallel submission** Submitting the same manuscript to different publishers increases the chances of publication, for example because the author can then choose to respond only to the “milder” reviewer comments. When it leads to multiple publications of the same article, it can even be categorised as FFP (fabrication, falsification, plagiarism) (Koçak, 2022). | - |
| Publication | Process | - | **Last-minute publications** With a body of research growing faster every year, it is practically impossible for authors to be aware of all relevant publications in their field, especially in quickly evolving ones. Highly relevant publications often appear shortly before one’s own. Failing to cite such last-minute publications can be considered an honest error in the publication process. |
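The multiple-comparisons mechanism behind p-hacking (row “Data analysis / Statistics”) can be illustrated with a small simulation. The sketch below is a minimal, hypothetical example with parameters chosen purely for illustration: each simulated experiment measures ten outcome variables under a true null effect, tests each with a two-sample z-test (unit variance assumed known), and we count how often at least one outcome comes out “significant” by chance.

```python
import math
import random

rng = random.Random(42)

def p_value(a, b):
    """Two-sided p-value of a two-sample z-test, assuming unit variance."""
    n = len(a)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2.0 / n)
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def experiment(n_outcomes=10, n=30):
    """One null experiment: no true effect on any outcome.
    Returns True if at least one outcome reaches p < 0.05."""
    for _ in range(n_outcomes):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if p_value(a, b) < 0.05:
            return True
    return False

n_sim = 2000
fwer = sum(experiment() for _ in range(n_sim)) / n_sim
print(f"Chance of >= 1 'significant' outcome under the null: {fwer:.2f}")
print(f"Analytic family-wise error rate: {1 - 0.95**10:.2f}")  # ~0.40
```

With ten independent null tests, reporting only the significant one inflates the false-positive rate from the nominal 5% to roughly 40%, which is the core of the problem described in the table.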
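A single outlier can likewise manufacture the spurious correlations mentioned in the same row. The following sketch (again hypothetical; sample size, seed, and outlier value are our own choices) computes Pearson’s r for two independently drawn variables, then adds one extreme point and recomputes:

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [rng.gauss(0, 1) for _ in range(30)]  # drawn independently of x: true r = 0

r_clean = pearson_r(x, y)
# One extreme, undetected outlier now dominates the correlation:
r_outlier = pearson_r(x + [15.0], y + [15.0])
print(f"r without outlier: {r_clean:+.2f}")
print(f"r with one outlier: {r_outlier:+.2f}")
```

The single added point drags r from near zero to a large, “significant-looking” value, which is why Makin & Orban De Xivry (2019) recommend always inspecting scatter plots rather than trusting the coefficient alone.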
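The aliasing problem from the row “Data analysis / Post production” can be made concrete with a toy diurnal temperature signal (the signal shape and sampling rates below are illustrative assumptions, not real data): sampling a 24-hour cycle exactly once per day always hits the same phase, so the cycle vanishes from the sampled data, while hourly sampling recovers its full amplitude.

```python
import math

def temperature(hour):
    """Toy diurnal signal: 24-hour sine cycle, mean 10, amplitude 5."""
    return 10.0 + 5.0 * math.sin(2.0 * math.pi * hour / 24.0)

# Sampling once per day (every 24 h) over two weeks hits the same phase each time:
daily = [temperature(h) for h in range(0, 24 * 14, 24)]
# Sampling every hour resolves the cycle:
hourly = [temperature(h) for h in range(0, 24 * 14, 1)]

print(f"daily samples:  range = {max(daily) - min(daily):.2f}")   # cycle invisible
print(f"hourly samples: range = {max(hourly) - min(hourly):.2f}") # full amplitude
```

The daily samples show essentially zero variation although the underlying signal swings by 10 units; the same mechanism distorts any signal sampled below twice its highest frequency (the Nyquist criterion).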