Papers, thousands of them!

Last week, I started a brand new, super exciting project. Aside from quickly brainstorming an awesome name for the project (VADR), I’ve spent the last week the way many researchers start a project: reading papers.

One of the biggest elephant-sized problems with this early-stage project work is the ever-present horror that is the journal paywall. I’m not going to talk about that too much here because others have got that pretty well covered. What I wanted to talk about was the ancient art of filtering the torrential output of all of science down to just a couple of papers to read.

I think every PhD student or researcher has had that moment of horror at realising how many papers they need to read through to find That One.

Annually, there are 1.8 million articles published in any one of 28,000 journals, by somewhere between 5 and 9 million different researchers. Cumulatively, it has been estimated that there are around 50 million published papers in total.

Knowing which of these papers to read is a special skill that takes time to hone. On at least two separate occasions, I have finished working on experiments that lasted 1-2 months only to find that someone had already published that work. So to aid any researcher or PhD student needing to filter their impossible pile of papers down to something reasonable, here is my process for paper filtering.

Normally, over a period of several weeks, I hoard papers like first edition books. If I see anything that might be even tangentially relevant to my work, I file it neatly in my paper indexing program. The best way of finding some kind of structure is to find a good review paper that neatly covers the area you are working in.

Once my papers reach a kind of critical mass where I can’t possibly read or even scan-read all of them, I then start playing detective. Starting with the paper that looks like it has the most relevance for me, I read it carefully and then dissect every referenced factoid. With each of these referenced facts, I go to a new paper and quickly scan-read it for relevant information.

Before I do anything silly (like read the whole paper), the very first thing I do when chasing down important information is make sure that the particular fact I’m following actually originates in the paper it is cited from. It’s not uncommon for a particular value to be referenced and re-referenced several times, so the paper in front of you may only be passing it along from somewhere else. Obviously, these intermediate papers can be of value/interest, but I can’t stress enough how important it is to make sure that the fact/value you are using is referenced correctly.

Dammit Dave, read your papers properly

A big part of the project preceding my PhD research was based around a single value which had been re-referenced twice. Unfortunately, it turned out that the first person to reference the value had misread it and actually used the wrong one, and none of the following 3-4 authors had noticed.

But I’m getting sidetracked… back to scan-reading papers.

Scan-reading papers is not like scan-reading books or a magazine – when I scan-read a paper I focus on three things, in this order:

  1. Images
  2. Abstract
  3. Conclusions/discussion

[Cartoon… or well where a cartoon should be if I didn’t forget my cartoon drawing kit today. I’ll add one in later, for now close your eyes and imagine a hilarious stick drawing]

The images are the best place to start, because good authors often add some kind of image to explain their concept with better clarity than any 500 word abstract could ever manage. Images will often also have detail about experimental set-up and data structure – none of which is traditionally in the abstract.

Speaking of which, if the images look interesting, the next place I look is the abstract. The abstract will often give you some clue as to how focused the paper is likely to be on your particular area. Some authors and journals try not to put some details (such as the final conclusions) in the abstract. This is what I would technically term “being awkward” and in conjunction with my comments on paywalls, is a practice that should be punishable by something involving tar and feathers…

If the paper is still looking interesting/relevant, the last place I look will be the conclusions/discussion. This final chapter is often the most important part of the paper as this is where the author will clearly sum up their work and arguments, essentially providing a nice breakdown of any important details that might have been missed during a scan-read.

If by some miracle the paper still looks relevant, only then do I actually read it. If it fails any of the above checks, then it occasionally gets a few keywords added to it (to help searching later) and then it is returned to my giant digital library.
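For anyone who thinks better in code, the checks above can be sketched roughly like this. It is only an illustration of the order I look at things in: the `Paper` class, its fields and the `triage` function are all made up for this example, and the "is this relevant?" judgement still has to come from a human.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record; the three flags stand in for my own judgement.
    title: str
    images_relevant: bool
    abstract_relevant: bool
    conclusions_relevant: bool
    keywords: list = field(default_factory=list)


def triage(paper, keywords=()):
    """Run the three scan-read checks in order and stop at the first failure."""
    checks = [
        ("images", paper.images_relevant),
        ("abstract", paper.abstract_relevant),
        ("conclusions", paper.conclusions_relevant),
    ]
    for name, passed in checks:
        if not passed:
            # Tag it so it can be found later, then back to the library it goes.
            paper.keywords.extend(keywords)
            return f"shelved after {name}"
    return "read in full"


keeper = Paper("A very relevant paper", True, True, True)
maybe_later = Paper("A tangential paper", True, False, True)

print(triage(keeper))
print(triage(maybe_later, keywords=["tangential"]))
```

The point of stopping at the first failed check is that the cheap checks come first, so most papers never cost more than a glance at the figures.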

Reading papers can be daunting if you aren’t efficient in processing them. Finding ways to distil information from them is vital to keeping track of relevant work in your area. While I have laid out my method here, I should stress that although this works for me, it might not work for you. Scan-reading information is a very personal thing based quite a lot on how you absorb information.

But everyone needs a method, because by my calculation – if you were to read everything published by the world’s researchers, then you’d be reading approximately 17 words a minute all day every day, 7 days a week 365 days a year.

Update: My terrible math was fixed by @NanoWire. You would need to read approximately 10,000 words per minute (or 3.5 papers per minute), not 17 wpm. I will now go hang my head in shame.
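For anyone who wants to check the corrected sum, here is a rough sketch of the arithmetic. The 1.8 million articles a year comes from the figure quoted earlier in the post. The 3,000 words per paper is purely my assumption for this example, so treat the words-per-minute result as a ballpark rather than a measurement.

```python
# From the post: roughly 1.8 million articles are published each year.
PAPERS_PER_YEAR = 1_800_000

# Assumption (not from the post): an average paper is about 3,000 words long.
WORDS_PER_PAPER = 3_000

# Reading all day, every day, 365 days a year.
MINUTES_PER_YEAR = 365 * 24 * 60

papers_per_minute = PAPERS_PER_YEAR / MINUTES_PER_YEAR
words_per_minute = papers_per_minute * WORDS_PER_PAPER

print(f"{papers_per_minute:.1f} papers per minute")
print(f"{words_per_minute:,.0f} words per minute")
```

With those inputs it comes out at roughly 3.4 papers per minute and a little over 10,000 words per minute, which is in line with the corrected figures.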
