Five Do’s and Don’ts for Using Digital Newspapers

By Nathaniel Zelinsky

Nathaniel Zelinsky is an MPhil student in Historical Studies at the University of Cambridge.

Digitized newspaper databases are an increasingly popular resource for young students of history. It is easy to understand their appeal to the “Google” generation: from the comfort of your own bedroom, you can access countless primary sources without going to a library. I personally use a lot of digitized newspapers in my MPhil thesis on Second World War propaganda. Unfortunately, I think too many professors, especially older ones, often point their students to newspaper databases without much practical advice.

This post contains my top five “do’s” and “don’ts” for first-time users of digitized newspapers. This advice might be especially helpful for undergraduates or even high school students.

Lesson One: Do Contextualize[1]

Let’s start with an example: imagine that you’re interested in researching how the American press portrayed Adolf Hitler during the Second World War. You run a word search for “Hitler” in a newspaper database and receive 10,000 hits.

Not all of these 10,000 newspaper articles are of the same historical worth. In real life, a front-page article was much more influential than an article that ran on page 23 tucked into the lower left corner, underneath two advertisements. However, on your computer screen, the front-page headline and the page 23 article both appear as PDFs divorced from their larger context.

Always click the “full-page” option to see where an article appeared on the physical page. Ask yourself: Was the article visually appealing? Did readers simply glance over it or read it in depth?

Just as not all articles are equal, not all newspapers are the same. Everyone knows the difference between the Daily Mail and The Times. But, sometimes, digital databases contain obscure papers. Before you cite a new source, research it. What were its political biases? What type of readership did it have? These are basic questions for historians, but they are easy to forget when you’re using digital databases that present every newspaper equally.

Lesson Two: Don’t Cherry Pick

Let’s assume you decide to research something obscure, like how the American press portrayed Franklin Roosevelt’s pet dog. You run a search and only find five articles. It is very tempting — but incorrect — to draw conclusions based on this limited evidence. You could easily and convincingly write:

“The press portrayed Roosevelt’s dog as an all-American icon. For example, consider the following five articles…”

Your reader would never know that only those five articles actually discussed the dog. With so many newspapers at your fingertips, you will always find something about your topic. Be honest with yourself when a few articles are actually insignificant.

Lesson Three: Do Some Math, Just a Little Bit

But what if you do find a treasure trove of information about Roosevelt’s pet dog? You will want to show your reader the wealth of data that you’ve uncovered. In this case, do some simple math!

Consider the following two statements:

(A) “The press mostly portrayed Roosevelt’s dog as an American icon. Here are some examples…”

(B) “Of 100 articles that mention Roosevelt’s dog, 90 portrayed it as an American icon. Here are some examples…”

Statement B is much more convincing than statement A. With digitized newspapers, it’s easy to tally up simple statistics. However, if you want to do this type of analysis successfully, you need to keep meticulous track of every article that you read from the moment you start researching, so plan ahead.
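If you keep your notes in a structured form, the tally behind a statement like (B) takes only a few lines of code. The following is a minimal sketch in Python, not a prescribed method: the log format, the field names (`paper`, `date`, `page`, `portrayal`), the category labels and the sample rows are all invented for illustration.

```python
from collections import Counter

# Hypothetical research log: one entry per article, recorded as you read.
# Field names, labels and rows are placeholders, not real data.
articles = [
    {"paper": "Paper A", "date": "1942-03-01", "page": 1, "portrayal": "icon"},
    {"paper": "Paper A", "date": "1942-05-14", "page": 23, "portrayal": "icon"},
    {"paper": "Paper B", "date": "1943-01-09", "page": 4, "portrayal": "neutral"},
    {"paper": "Paper B", "date": "1943-07-22", "page": 1, "portrayal": "icon"},
    {"paper": "Paper C", "date": "1944-11-30", "page": 12, "portrayal": "icon"},
]


def tally_portrayals(log):
    """Return {label: (count, percentage of all logged articles)}."""
    counts = Counter(entry["portrayal"] for entry in log)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


summary = tally_portrayals(articles)
for label, (n, pct) in summary.items():
    print(f"{label}: {n} of {len(articles)} articles ({pct}%)")
```

Recording the page number alongside each article also lets you separate front-page coverage from items buried deep in the paper, which ties back to Lesson One.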

Lesson Four: Do Know that You Can’t Search For Everything

There are some things for which you can’t search digitally, in particular comics, advertisements and photos. Keep this in mind, especially when you’re researching an image-based newspaper like the Illustrated London News.

Lesson Five: Do Use Microfilm and Hard Copies

When I started at Cambridge, I was unhappy to learn that the University Library did not subscribe to every available database of newspapers. Worse, many important wartime British papers have not yet been digitized. With a sour face, I reluctantly trudged up to the library’s dingy microfilm reading room.

In the end, though, being forced to use microfilm was an immeasurably useful experience. In the process of reading newspapers from “cover to cover,” I unintentionally immersed myself in the specific nuances of the precise moments of history I was studying. Had I relied on keyword searches alone, I would have lost some of this broader context. I also found a number of important sources I would not have otherwise come across. In future projects, I think I might always begin with a microfilm edition of a newspaper for background research and then home in on specific articles with a digital database.

You could also try to read a newspaper “cover to cover” electronically, but I find this is a slow and cumbersome process, because you have to wait a few seconds for each new newspaper page to load.

[1] On this lesson, I was largely influenced by an excellent article by Adrian Bingham: Adrian Bingham, “The Digitization of Newspaper Archives: Opportunities and Challenges for Historians,” Twentieth Century British History, Vol. 21, No. 2 (2010): 225-231.

2 Comments
  1. Thanks Nathan, this is an excellent article which I am sure that other students (and other researchers) will find very valuable. I would just add a couple of other points which might be useful to consider in addition.
    1. Take into account possible errors in digitisation which may affect your search. Old or broken typefaces, creases in the paper and manual errors such as putting the newspaper in the scanner the wrong way up (yes, this really does happen more often than you think!) will result in gobbledygook text which the search engine can’t read. Such errors increase in probability when you have older material with source text which is not easily machine-readable, and/or the material was digitised a long time ago (because the technology driving optical character recognition and natural language processing has improved dramatically in recent years).
    2. If you are using any kind of quantitative measure to make claims about the representativeness of some element of your data (eg the portrayal of Roosevelt’s dog), it is worth thinking about, and making clear to the reader, what you know about the collection strategy which underlies the digitised content you are searching. Was the original archive created in a systematic effort by some major cultural institution (eg a national library like the BL or the Library of Congress), or is it a more ad hoc collection driven by the specialist interests of a particular collector? This is not to say that one kind of collection is more valuable than the other; that depends on the question you are trying to answer. But search engines are good at ‘smoothing out’ such differences.
    3. Finally, in relation to the last point, I absolutely agree with this and would add further that it is extremely important to remember that when you ‘read’ digitised newspapers through a search engine you are reading them in a way which nobody at the time had the option of doing (a point which is not true of contemporary newspapers, as they are frequently read through all sorts of different filters: search engines, twitter, your friends’ Facebook walls, email, apps and the newspaper’s own website).

    May 9, 2014
  2. Nathaniel Zelinsky

    Hi Anne: I completely agree on all counts.
    To piggyback on your first point re: digitization quality. This might just be my own experience, but I have found the problem tends to get worse with non-20th-century newspapers. I’ve come across 18th and 19th century papers (in this case, American papers in the Readex database) with pretty spotty digital recognition. Not only were the original hard copies of poor quality, leading to bad digitization, but the irregular typesetting and spelling meant the software had trouble picking up keywords.

    May 9, 2014

