11 Culture and Ethics
Photo by Museo de Altamira y D. Rodríguez
11.1 Culture
Tools and strategies will only go so far. Culture is essential to building reproducible data analyses.
People who are scared of being wrong in public will not work as hard to find errors. They will also be less likely to share their code and data or to adopt transparent and reproducible practices.
Instead of dunking on people who issue corrections or make errors, we should celebrate their transparency.
11.1.1 Continuous Improvement
Continuous improvement is an ongoing practice of improving processes and outputs. It favors incremental improvement over breakthrough improvement.
There are many continuous improvement models, but most share at least three features:
- Analyze performance
- Identify areas for improvement
- Make incremental changes
11.1.2 Error Log
Code reviews, tests, and assertions are essential for analyzing performance and identifying areas for improvement.
Error logs are one tool for analyzing performance. Any time an error makes it past code review, document the error in a running tracker and note the plan for remedying it. The errors can be analytic (e.g. 2 + 2 = 5) or process-related (e.g. “we started too late on data collection and missed our deadline”).
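A running tracker can be as simple as a CSV file that grows by one row per error. The sketch below is one way to set that up, assuming a hypothetical schema of four columns (`logged_on`, `kind`, `description`, `remedy`); the names `ErrorEntry`, `log_error`, and `read_log` are illustrative, not part of any established tool. It only restricts `kind` to the two categories named above, analytic and process.

```python
import csv
import os
import tempfile
from dataclasses import asdict, dataclass, fields
from datetime import date


# Hypothetical schema: the columns are illustrative assumptions, not a standard.
@dataclass
class ErrorEntry:
    logged_on: str    # ISO date the error was recorded
    kind: str         # "analytic" or "process"
    description: str  # what went wrong
    remedy: str       # plan for fixing it and preventing a repeat


VALID_KINDS = {"analytic", "process"}


def log_error(path: str, entry: ErrorEntry) -> None:
    """Append one entry to a CSV error log, writing a header if the file is new."""
    if entry.kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {sorted(VALID_KINDS)}")
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(ErrorEntry)])
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(entry))


def read_log(path: str) -> list[ErrorEntry]:
    """Load every logged error so the team can review them together."""
    with open(path, newline="", encoding="utf-8") as f:
        return [ErrorEntry(**row) for row in csv.DictReader(f)]


if __name__ == "__main__":
    # Write to a throwaway directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "error_log.csv")
        log_error(path, ErrorEntry(
            logged_on=date.today().isoformat(),
            kind="process",
            description="Started too late on data collection and missed the deadline",
            remedy="Schedule data collection before drafting the analysis plan",
        ))
        for entry in read_log(path):
            print(entry.kind, "-", entry.description)
```

A plain CSV keeps the log readable without special software and easy to put under version control next to the analysis code; a shared spreadsheet or issue tracker would serve the same purpose.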
11.1.3 Blameless Postmortem
Incident postmortems are common in data engineering:
An incident postmortem brings people together to discuss the details of an incident: why it happened, its impact, what actions were taken to mitigate it and resolve it, and what should be done to prevent it from happening again. ~ Atlassian
A blameless postmortem approaches the incident postmortem without any cynicism or hidden agendas:
In a blameless postmortem, it’s assumed that every team and employee acted with the best intentions based on the information they had at the time. Instead of identifying—and punishing—whoever screwed up, blameless postmortems focus on improving performance moving forward. ~ Atlassian
The term “incident postmortem” hides some of the value of this approach.
- We don’t need an incident to host this type of discussion.
- This type of meeting need not wait until a data analysis is finished.
Holding regular meetings to analyze performance, identify areas for improvement, and make incremental changes, with the assumption that everyone acted with good intentions, will improve collaboration and strengthen data analyses.
11.2 Ethics
We could talk for eight more hours about the ethics of statistics and data science.
The social sciences are in a multi-decade transformation motivated by a series of major issues:
- Multiple testing
- Hypothesizing After Results are Known (HARKing)
- p-hacking
- Publication bias
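Multiple testing is a problem because false positives accumulate. Assuming every null hypothesis is true and the tests are independent, the chance of at least one "significant" result among m tests at level α is 1 − (1 − α)^m. The sketch below computes that rate and checks it with a small simulation that relies on p-values being uniformly distributed under a true null; the function names are illustrative.

```python
import random


def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across m independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


def simulate_fwer(m: int, alpha: float = 0.05, trials: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo check: under a true null, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "study" runs m tests; count it if any test crosses the threshold.
        if any(rng.random() < alpha for _ in range(m)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for m in (1, 5, 20):
        print(m, round(familywise_error_rate(m), 3), simulate_fwer(m))
```

With 20 independent tests at α = 0.05, the formula gives roughly a 64% chance of at least one spurious finding, which is why running many tests and reporting only the significant ones is so misleading.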
Transparency and reproducibility help with some of these issues, and so does adopting version control at the start of a project. Pre-registration is a final tool that can help (Nosek et al. 2018).
Pre-registration is the process of submitting a pre-analysis plan before analyzing the data. This separates hypothesis generation from hypothesis testing, which is necessary because the same data cannot be used to both generate and test a hypothesis.