How to deal with data errors – a ‘Text Adventure’

Welcome to an Errant Science text adventure. This text adventure is designed to fully test your ability to handle data in a lab environment. You will be faced with choices that will interrogate the very core of your understanding of data… well, one choice. But that choice is so crucial that from it, we will be able to gaze into your very soul… or make snap judgements about one specific thing. Look, just enjoy it!

The Story begins

You are in a lab. The lab is small, mostly white-walled with a stained blue floor. You are a young research science student fresh out of graduate labs. You have a deep and possibly tragic backstory that will have significant resonance with the plot. However, this is a science blog and I didn’t really want to spend too much time on it so please, just imagine that backstory as best you can – but one key detail needs to be that you have a hamster you love very dearly.

You have been tasked with performing a simple experiment to test out a new sampling process. It’s a quick experiment and only takes about an hour to collect the 10 data points you’ve been asked for. Like any good researcher, you record this data on the nearest scrap of slightly burnt, chemical stained paper.

Now, without going upstairs to process the data you can’t be sure, but you think there is a chance that your first data point might not be quite right. In that moment of realisation the universe splits into 5 parallel universes, each following the path of a different choice which you are about to make. If this was a movie, there’d be some kind of dramatic music signifying this – it would probably go dun dun dun DUUUUUN! But sitting in the lab you don’t realise this, you simply look at the data and…

…scrub out the first data point and do an extra run?

…kinda make that 3 into a 1 and fudge the data back into shape?

…leave the point in and do an extra run to balance out the bad one?

…leave the point in and go upstairs to do some basic data analysis?

…have a stretch and then sleep on the piece of paper that you wrote the data on?

Click on your chosen option to jump to that section and read the consequences of your split-second decision. Or just scroll through and read the lot.

Scrub the first run

You look down at the data before you and smartly put a line through the first data point. You then realise it wasn’t quite straight and so put another one through to make it a kind of cross, but that one isn’t quite central so you do a couple more to make it even. Eventually the data point is just a black smudge completely coloured with biro ink and has even broken through the paper in a couple of places. Satisfied with your data eradication skills, you repeat the experiment one more time and add the additional point to the side. The data point satisfyingly matches all the other data. You report your findings, everyone is happy and no one suspects the error from before.

A few months later you are doing the same experiment again, this time the conditions have changed but it’s the same basic experiment. You run 10 sets and again, the first data point is wrong. You think nothing of it and again simply add an extra data point and remove the first one with the same furious biro application as before.

While these experiments are currently only small, eventually this work becomes part of a bigger project. Your data is one of a minor set behind the stability of an active compound used in a very specialist form of medicine for Hopping Hamster Syndrome. One of your reasons for wanting to help with this project is that your own hamster, Amsterdamus, developed this disorder after accidentally watching a TV segment on climate change denial, which caused him to hop with rage.

By calling in every favour you can think of, you get him into the trial of the new drug as his one hope of peace from his hopping condition. Amsterdamus’ trial day arrives and thanks to the use of an alphabetically sorted subjects list, Amsterdamus is first of 10 to be injected. One by one the other hamsters get better but little Amsterdamus continues to hop, the drug seemingly failed on him – there was something wrong with the first preparation which not only didn’t cure it but actually made things worse.

You trace the error and realise that signs of this problem stem back to your early experiments, which clearly showed a bias in that first data point. If only you’d made note of your error in your data you would have seen this pattern emerge earlier and saved your hamster. Unfortunately, by removing the data you doomed your poor hamster to constantly hopping himself to death. RIP Amsterdamus.

Fudge the data

You look round quickly at the empty lab to check that no one is watching and very carefully (with your hand cupped round your writing) you change the 3 to a 1. You lean back and try to look relaxed but deep down you know you’ve crossed a line, you’ve just falsified lab data. But you remind yourself that it’s just a simple experiment and besides, the data was clearly going that way anyway.

After composing yourself, you go upstairs and present the data to your boss. He sits behind his desk and looks over your data set. Slowly, a frown deepens on his brow. “This is excellent work!” he says, reaching for his phone. He dials a single number and after a short while says “Yes, we have one.” and hangs up. Almost before the handset has dropped into the base station, the door opens and a man in a suit walks in. “Come with me please,” he says to you. You look to your supervisor who just nods, so with some confusion you follow the suited man from the room.

What happens next is a blur of poor writing and if this was a movie, then the rest of this paragraph would be narrated by a deep-voiced actor with a motivational soundtrack, as a montage. You are taken to a new research institute where they explain that you are to be trained as one of the finest scientists ever to have lived. You are given access to facilities and resources you never dreamed of, and produce discoveries and inventions that other people can’t even comprehend. The weeks fly by and soon you have been there for months.

One day the man in the suit returns, with a box. “We may have made a mistake, we have doubts about your selection…” he explains. You show him your work and plead to stay. He explains that there have been questions over the results of your first experiment. “They want proof that you are committed and loyal to working here.” he explains further. He whips a theatrical cloth from off the box to reveal your hamster, Mittens. “I want you to look your hamster in the eye and tell him that you did the experiment honestly”.

You think about all that you have and all that you can now achieve with the resources available to you. With that in your mind, you lean down and look Mittens in the eye as he looks back full of love and admiration, and say “I did the experiment honestly and recorded all the data I collected.” In that moment you hear the man in the suit sigh with relief but also see a look of betrayal and disgust in Mittens’ eyes as he sees through your lie, as only a true pet can. The man in the suit slaps your back and says “I knew you were legit, thanks, keep up the good work.”

Over the months you and Mittens try to move on but he never trusts you again. Every time you hold him you can see his sad stare of betrayal. A few months later he dies of a broken heart, leaving you alone. Late at night all you can think about is “If only I’d not changed that data!” RIP Mittens.

Do an extra run

You tut to yourself, and sigh – 10% of your data is junk! You immediately repeat the run, adding another data point to it. When you are finally finished, you look at the data and realise that now 9% of your data is junk – that’s still too high! You look at your watch irritably, it’s now just a little after going home time. But you’re not letting this win. You work out that to reduce the error rate down to a very reasonable 5%, you need to do almost double the results again. Fine, you grit your teeth and get it done – you’d rather have good data than go home on time today.
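For anyone who wants to check your sleep-deprived sums, here is a minimal Python sketch of the arithmetic above. The function names `junk_percent` and `runs_needed` are made up for illustration, and the sketch assumes that "junk" simply means bad runs divided by total runs.

```python
def junk_percent(bad_runs: int, total_runs: int) -> float:
    """Share of runs that are junk, as a percentage."""
    return 100 * bad_runs / total_runs


def runs_needed(bad_runs: int, target_percent: int) -> int:
    """Smallest total number of runs at which the junk share is at or
    below target_percent, assuming no further bad runs turn up.
    Integer ceiling division avoids floating-point surprises."""
    return (bad_runs * 100 + target_percent - 1) // target_percent


first_attempt = junk_percent(1, 10)   # one bad point in the original 10
after_one_more = junk_percent(1, 11)  # one extra run barely helps
total_for_5_percent = runs_needed(1, 5)
```

Note the assumption baked into `runs_needed`: every extra run has to come out clean. Diluting a bad point does not explain it, and the bad point is still sitting in the data set.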

At run 19 your heart skips a beat. You look at the readout – another bad reading! “Fuck!” you curse. It was probably just tiredness that caused it, you reason. But you’ve worked well into the evening now and you’re not about to quit now! You quickly work out that if you work through the night, you can get 40 runs done and even out the bad data.

Two days later, you are still in the lab. You haven’t slept a wink since that first experiment. As you continue to do the experiment, the error in the experiments keeps happening – seemingly at random. Sometimes you manage 20-30 runs without an error, before it goes wrong again and messes up the data! You are driven now by a burning desire to not let this stupid experiment win!

You wake up the following day – a fellow researcher is standing over you looking concerned. You try to explain that you can’t stop and that you have to continue the experiment. The researcher looks confused and tries to get you to go home. Partly out of exhaustion and partly because he’s a lot bigger than you and has a very determined look in his eyes, you go home. Stumbling into your flat, you feel broken and lost and, as if to add salt to your wound, you realise that your poor diabetic hamster, Datum, has sadly passed away – a victim of your obsession. RIP Datum.

Do some data analysis

You pick up your data and head upstairs to your computer and begin processing the results. Like a good researcher, you instantly exclude the data point from the analysis and make a note in your lab book and the data sheet explaining that this is an outlier point. You also add a few notes on possible reasons for it, based on your observations at the time.
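If your lab book happens to be a Python script, the same habit might look something like the sketch below. Everything in it is hypothetical: the `Reading` class, the `flag_outlier` and `included_mean` helpers, the note text and the ten values are invented for illustration and are not the actual data from the scrap of burnt paper. The point is only that the suspect reading is flagged and annotated, never deleted.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    run: int
    value: float
    excluded: bool = False  # left out of the analysis, but still on record
    note: str = ""          # why it was excluded, as observed at the time


def flag_outlier(readings: list[Reading], run: int, note: str) -> Reading:
    """Mark one run as excluded and record the reason. Nothing is removed."""
    for reading in readings:
        if reading.run == run:
            reading.excluded = True
            reading.note = note
            return reading
    raise ValueError(f"no reading recorded for run {run}")


def included_mean(readings: list[Reading]) -> float:
    """Mean of the readings that have not been flagged as excluded."""
    return mean(r.value for r in readings if not r.excluded)


# Made-up values for illustration only: the first run looks off.
values = [3.0, 1.0, 1.1, 0.9, 1.0, 1.0, 1.1, 0.9, 1.0, 1.0]
readings = [Reading(run=i + 1, value=v) for i, v in enumerate(values)]

flag_outlier(readings, run=1, note="Suspected outlier; possible warm-up effect on first sample")
```

Because the original value and the note both survive, a pattern in the excluded points (say, always the first run) is still there to be spotted months later.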

Months later you repeat the experiment with slightly different conditions – and the same pattern emerges. At first you don’t notice as months go by, but in looking over your past experiments you quickly realise that there is a similarity in the error between the experiments. You rush to the lab and in just a few short experiments, it turns out you’ve discovered cold fusion.

You immediately inform the world’s science community by publishing your work in PeerJ – an open access publication. (who is more than welcome to pay me in freebies for their inclusion in this blog post and the subsequent flood of free publicity it will surely generate, I bet as many as 1 or maybe even 2 people click that link!!) From the open source designs you upload to the internet, your work is soon replicated and providing free power around the world.

Within weeks, a special meeting of the Nobel Prize committee is called and they award you a Nobel medal for “Best at Science, ever!” along with a billion dollar check, to fund any future research you want.

You return home after your trip to Sweden to pick up your prize, to your pet hamster Algernon. On seeing you, Algernon looks up from the special hamster mansion you built for him and greets you with the warm, fuzzy smile of a proud hamster.

Sleep on the paper

While looking at the data, you realise that you have a decision to make – you have a stretch and begin walking around on the piece of paper in circles. It is at that moment you remember you are a cat.

Like all cats, you are inexplicably drawn to sleeping on important paperwork and/or keyboards. From birth your mother, Berúthiel, taught you your skill for locating and selecting paperwork with vital significance, in preference to any other non-important pieces of paper. As you grew from a kitten, your desire to find important paper to sleep on drove you more than any other cat. While others were happy to simply sleep on paper they found around them, you wanted to get to paper that no cat could ever find by chance.

So you begin to enact a plan to slowly interact with the human world, become one of them, blend in. From this position, you build up status, and remain seemingly undiscovered during your search for important paper. Your research leads you to university where you study science to learn about the important ‘papers’ the humans keep talking about.

Aside from one near miss with a co-ed one drunken night while a student, no-one has realised your true identity. You even buy a pet hamster (which you name Luncheon) as part of your cover. Eventually you manage to get through graduate school and become a researcher in a prestigious lab, where you find yourself today.

And as you stare down at the paper in front of you, your feline instincts tell you that this paper has power, this paper has behind it, a number of important choices – all of which you can interrupt by sleeping on it. And so after years of planning and manipulation, you curl up and sleep.

You sleep the sleep of a victorious cat that has finally achieved its life’s work. You dream of your pet hamster, Luncheon, and how you will eat him as a celebration of your victory. RIP Luncheon.

/u/Lycopodium posted a great comment on /r/Labrats

I’m wondering about the difference between doing extra runs and recording outliers. Wouldn’t you want to do extra runs, not to bring the error rate down or reach some p-value, but to investigate the cause? Rather than just noting possible explanations…actually test them (if reasonable)? The payoff of a more reliable protocol, catching a problem, or a better understanding of what’s going on can be pretty valuable

I’d add this to the post as it’s really good advice but I feel I’ve written enough harrowing hamster stories for one day so you’ll have to make do with the comment 🙂


