5 ways people screw up controls in their data

In the last few weeks I’ve reviewed a load of reports, papers, and some data for a colleague. Basically, I’ve spent a few weeks reading all about other people’s data. It’s nice but mildly frustrating that I’m not generating my own (about half my lab is down for maintenance).

The one constant in all the work that I’ve been reading has been the varying quality of controls. And by varying quality what I actually mean is that it’s mostly a steaming pile of garbage – I was just trying to be polite. Oh and while I’m trying to be polite, I obviously don’t mean the data of my colleague whose work is perfect and beyond reproach. In fact it seems odd I’d even mention that I was looking at their data alongside all this other terrible data… *ahem*.


So to try and help people who are misguided as to the use of control data, I thought I would summarise the top 5 ways I have seen people screw up running or presenting proper controls. If you are guilty of any of these then you deserve whapping on the nose with a rolled up virtual newspaper.

5. Guessing

Now you probably have a pretty good idea about what your control should do compared to your data. If you don’t then I think you might have problems beyond missing the control data.

But thinking you know what the control data will do is no excuse for using data that is the “theoretical baseline” or “zero”. Just because you think it should have no signal doesn’t mean that it does have no signal – who knows, maybe it’ll suddenly jump up and sing Livin’ la Vida Loca (only applicable to experiments in the early 2000s). You don’t know if you don’t run it.

So actually run the thing and check! I realise this means more work and possibly actual time in the lab but I do slightly feel this is a sacrifice probably worth making.
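To put some made-up numbers on that, here is a rough Python sketch of the difference between assuming the blank reads zero and actually measuring it. The readings and function names are hypothetical, and the "mean plus three standard deviations" threshold is just one common rule of thumb, not the only way to do it.

```python
from statistics import mean, stdev

def baseline_corrected(sample_readings, control_readings):
    """Subtract the mean of a measured control from each sample reading."""
    baseline = mean(control_readings)
    return [reading - baseline for reading in sample_readings]

def detection_threshold(control_readings, k=3.0):
    """Mean of the measured control plus k standard deviations (rule of thumb)."""
    return mean(control_readings) + k * stdev(control_readings)

# Hypothetical readings, purely for illustration.
control = [0.21, 0.19, 0.23, 0.20, 0.22]  # the blank you actually ran
samples = [0.25, 0.24, 0.26]              # the exciting new data

corrected = baseline_corrected(samples, control)
threshold = detection_threshold(control)

# Against an assumed zero baseline, every sample looks like a signal.
hits_assuming_zero = sum(s > 0.0 for s in samples)
# Against the measured control, far fewer survive.
hits_vs_control = sum(s > threshold for s in samples)
```

With these invented numbers, all three samples look like a signal against an imaginary zero, but only one clears the threshold set by the control that was actually run.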

4. Measure a totally different thing

This sounds like a no-brainer, but when running a control, measure the same thing you are measuring in your experiment.

I saw some work where the sensor was a temperature probe inside some giant machine while it was running. The control wasn’t another type of sensor, or even a calibration run of the probe. It was the same temperature sensor slapped on the outside of the machine. Because obviously, internal and external temperature are the same thing…

This isn’t a control run, it’s a totally different run of the sensor. That might actually be really interesting data, but it’s not a control.

3. Someone else’s data

Okay this one is a bit of a half point, because this is sort of okay in some circumstances, but they are rare so you’d better be sure before you go ignoring this step. I mean *really* sure… And I will check.

Sometimes you might be working with a technique or system that is crazy well established. Practically a standard in every laboratory in the world. So why run a control when you can just point to some excellent baseline/control data someone else has done?

Because that’s a terrible idea! What if there is some environmental issue in your lab that you’ve not realised or factored in? You won’t know because to you, it’ll look like good data when actually it’s because the lab cat likes to sleep on the machine while it’s running overnight. The whole point of using controls is to rule out eventualities and errors you can’t predict. If you just download that data then you’re not really doing that. Personally, I’d argue it’s way worse than downloading a car.

2. Using old data

Stuff changes. As you develop your experiment, you will tweak and improve things. Often the best and most thorough controls you ever run are at the start of this process. And they’re the ones you might keep linking back to as you slowly optimise your clever thing.

Changes to reagents or manufacturing conditions are mostly driven by how many tiny things you can now detect – but those same changes can also mess with performance when there are no tiny things anywhere near your sensor. Who knows what kind of weird effects you might get. You might find that drawing blue lines on pregnancy tests in felt tip pen really improves how well your machine reads them. But it’s probably going to affect the number of false positives on negative samples.

Unlike dogs, controls are not for life – they are only good until you go fiddling with something and should be discarded frequently. Even if you’ve named them.
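As a rough illustration of why the old controls stop counting, here is a small Python sketch that works out the false positive rate on known-negative samples before and after a tweak. The data and the function name are invented for the example.

```python
def false_positive_rate(negative_sample_calls):
    """Fraction of known-negative samples that the reader called positive."""
    if not negative_sample_calls:
        raise ValueError("need at least one negative control sample")
    return sum(negative_sample_calls) / len(negative_sample_calls)

# Hypothetical reader calls on known-negative samples (True = called positive).
old_controls = [False, False, False, False, False,
                False, False, True, False, False]   # run before the tweak
new_controls = [False, True, False, True, True,
                False, False, True, False, False]   # re-run after the tweak

old_rate = false_positive_rate(old_controls)
new_rate = false_positive_rate(new_controls)
```

In this made-up case the rate goes from 1 in 10 to 4 in 10, which you would never spot if you kept quoting the controls from before you got the felt tip out.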

1. What are controls?

I haven’t quite worked out why but there are a large number of scientists with a severe crippling phobia of running controls. At least I assume they have a phobia, because it’s the only reason I can think of why controls are seemingly so rare in papers. Maybe it’s just that control data is shy and even when written into papers, it hides behind the acknowledgements.

Either way, there are plenty of papers where the data is presented all on its lonesome, without any sign of a control. I’m not sure if I’m more annoyed that people would publish it or that without the controls, it still managed to get past peer review!

I’ve lost count of the number of new sensor systems I see presented showing a nice strong reaction to X but no data for what happens if you don’t give it X.
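For the sake of argument, here is a minimal Python sketch of what that missing comparison might look like, using Welch's t statistic as one reasonable way to compare the "with X" and "without X" runs. The readings and the function name are hypothetical, and a real analysis would need more replicates and a proper p-value.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(with_x, without_x):
    """Welch's t statistic for two independent sets of readings."""
    mean_diff = mean(with_x) - mean(without_x)
    std_err = sqrt(variance(with_x) / len(with_x)
                   + variance(without_x) / len(without_x))
    return mean_diff / std_err

# Hypothetical sensor readings, purely for illustration.
with_x = [1.9, 2.1, 2.0, 2.2, 1.8]     # the "nice strong reaction"
without_x = [1.7, 2.0, 1.9, 2.1, 1.8]  # the control nobody ran

t_stat = welch_t(with_x, without_x)
```

Here the sensor reads about 2.0 with X, which looks lovely on its own, but it also reads about 1.9 without X, and the t statistic comes out at roughly 1. That is not much of a reaction at all.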

If you take anything away from this blog post, it should be DO YOUR CONTROLS. The worst that could happen is that your idea isn’t actually any good, which is great – you now know not to waste your time with it and can move on to something far more productive, like spending an afternoon writing a blog post.

N.B. any comments about my use of the term data as a plural and a singular should be directed to this infographic

One thought on “5 ways people screw up controls in their data”

  • May 11, 2016 at 16:05

    I’m quite tempted to bang my head against a desk when I don’t see controls in a paper in Ecology. ‘We changed x and the wild animals out in a largely uncontrolled natural setting did this. Textbooks suggest that without changing x they would have done something else’.

