Color Combinations and Readability

It goes without saying that the readability of text on a web page can be affected by the combination of colors a designer chooses to differentiate the foreground from the background. We have all visited a page where someone used something like black text on a dark purple background, making it virtually impossible to glean information from the page. This problem would go away if all web pages were simply black text on a white background, but that would make for a pretty boring web. Further, we have all been to pages where color combinations other than black on white were used and the text was still easy to read. The question, then, is where do we draw the line? When is a color combination unreadable to most people, and how can we know “officially,” without simply relying on our own judgment?

Gurus

Web design guidelines often include recommendations for appropriate color combinations, many of which call for high contrast between text and background, with particular emphasis on the traditional black on white. “Web gurus” are quick to make definitive statements about design and readable text, as exemplified by Jakob Nielsen:

Use colors with high contrast between the text and the background. Optimal legibility requires black text on white background (so-called positive text). White text on a black background (negative text) is almost as good. Although the contrast ratio is the same as for positive text, the inverted color scheme throws people off a little and slows their reading slightly. Legibility suffers much more for color schemes that make the text any lighter than pure black, especially if the background is made any darker than pure white (Nielsen 2000, p 125).

Unfortunately, Nielsen does not offer any references for this statement. In fact, an examination of the small amount of research that exists on this topic indicates that the relationship between text-background color combinations and readability is not at all clear-cut.

Research

Most of the research examining the readability of text on a computer screen as a function of foreground-background color combinations was done before the World Wide Web existed. One of the most consistent findings is that the effects of colors on readability are not consistent (Radl 1980). For example, one study failed to find any significant differences among 24 different color combinations on a text search task (Pace 1984). On the other hand, regardless of the specific color combination, higher levels of contrast appear to lead to better readability (Bruce and Foster 1982; Radl 1980).

There are very few experimental studies of the readability of web pages as a function of font/background colors (Hill and Scharff 1997). One exception is a pair of experiments conducted by Hill and Scharff (Hill and Scharff 1997; Hill and Scharff 1999). The results of these studies are consistent with the pre-web research in that higher contrast was generally found to be more readable. However, again, the relationship was far from perfect. For example, they found that green text on a yellow background and black text on a light gray background were each more readable than black on white in different experiments. To make matters more complicated, their results often differed as a function of font type: the green-on-yellow advantage mentioned above, for example, appeared primarily with Times New Roman and was much weaker with Arial.

Now, at some point, you might wonder how these researchers determined whether a given color combination was more or less readable. If you did wonder this, good, because it is an important question to ask about any research involving humans and theoretical constructs like “readability.” In virtually all of the studies mentioned above, readability was measured via a simple (“low level”) task such as a search task. For example, a user is asked to find a given word within some body of text, and the quicker users find the word, the more “readable” the text-background combination is deemed to be.

Another way of conceptualizing “readability” is to ask users simply to rate the readability of given combinations. Although this is less “objective,” it has the advantage that people can take into account factors beyond those that lead to better performance on a search task; “reading” usually involves more than finding a given word within a body of text. In fact, this is exactly what Hill and colleagues did as a prerequisite for one of their web experiments. They collected data from a web site where many subjects rated a large number of different color combinations. Their general findings were: 1) black and white was consistently rated the most readable combination; 2) color combinations that included black were rated more readable than those that did not; and 3) darker text on lighter backgrounds was rated higher than lighter text on darker backgrounds.

Retention

A student of mine, Patrick Hanna, and I conducted an experiment that addressed this issue of text/background color combinations and readability, with some twists (Hall & Hanna, 2003). First, instead of using basic processing measures as our objective measures of readability, we used recall tests, which tap a much “higher level” type of processing. Recall is obviously important in educational sites, but it also matters in other contexts, such as e-commerce, where a site is much more usable if a customer can retain information from one page to the next. We also explicitly included different types of information: participants studied pages containing educational information about a nerve cell (neuron) and pages containing commercial information about a fictional television (the “hallaview”). Our study also differed from those above in that we included various rating measures. We not only asked participants to rate the “readability” of the pages; we also asked them to rate issues related to aesthetics and the degree to which the colors affected any tendency to buy. More specifically, our rating measures fell into four categories: readability, aesthetics, professional appearance, and behavioral intention (the degree to which the colors made the participant want to buy a product).

Participants read one of the text passages (educational or commercial) and then completed a recall test and the ratings; they then read the other passage, followed by the same measures. To control for order effects, the order of the passages was counterbalanced across participants, so half read the educational passage first and the other half read the commercial passage first. We selected four different font/background combinations: black/white, white/black, dark blue/light blue, and teal/black. Participants were randomly divided into four groups based on the color combination of the text, so each participant viewed only one combination.

We found that participants’ recall did not differ significantly as a function of font color: the color combination of the text had little effect on how well they retained the information. This is certainly not something you would predict from Nielsen’s quote above. As for the ratings, on the other hand, participants did rate black text on a white background as significantly more readable than the other combinations. Interestingly, they also rated the dark blue on light blue quite high, in comparison to the white text on a black background. This is interesting because white text on a black background has the maximum possible contrast. Though this commentary is about readability rather than aesthetics and behavioral intention, in case you are wondering: participants rated the dark blue on light blue as most pleasing and the black on white as most professional, and no color combination significantly influenced the desire to buy a product.

Algorithm

Description

It would certainly be helpful to us as web designers if we had a formula, based on some quantifiable characteristic of the colors, to determine “readability”: we could just apply the algorithm and know for sure whether a combination was readable. In fact, such an algorithm does exist, though (as we will see) it is not perfect. The W3C has a specific recommendation for acceptable color combinations based on the RGB levels of the colors involved.

The W3C’s recommendations are based on two formulas that yield two different contrast scores. One formula represents the difference in hue and the other the difference in brightness. For both, you only need to know the RGB representation of the foreground and the background.

The following formula represents the difference in hue:

|TextR - BackgroundR| + |TextG - BackgroundG| + |TextB - BackgroundB|

The following formula represents brightness:

((R × 299) + (G × 587) + (B × 114)) / 1000

The difference in brightness is the absolute value of the difference between the text brightness and the background brightness.

According to the W3C, a brightness difference score of 125 or higher and a hue difference score of 500 or higher together represent acceptable contrast for readability.
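
To make the arithmetic concrete, here is a minimal sketch of the two formulas and the acceptance test in TypeScript. The names (Rgb, brightness, colorDifference, brightnessDifference, isReadable) are my own illustrative choices; only the formulas themselves and the 125/500 thresholds come from the W3C recommendation.

```typescript
// An RGB color, each channel in the range 0-255.
interface Rgb {
  r: number;
  g: number;
  b: number;
}

// W3C brightness: ((R × 299) + (G × 587) + (B × 114)) / 1000.
function brightness(c: Rgb): number {
  return (c.r * 299 + c.g * 587 + c.b * 114) / 1000;
}

// W3C hue (color) difference: the sum of the per-channel
// absolute differences between text and background.
function colorDifference(text: Rgb, background: Rgb): number {
  return (
    Math.abs(text.r - background.r) +
    Math.abs(text.g - background.g) +
    Math.abs(text.b - background.b)
  );
}

// Brightness difference: the absolute value of the difference
// between the two brightness scores.
function brightnessDifference(text: Rgb, background: Rgb): number {
  return Math.abs(brightness(text) - brightness(background));
}

// W3C thresholds: 125 or higher for brightness, 500 or higher for hue.
function isReadable(text: Rgb, background: Rgb): boolean {
  return (
    brightnessDifference(text, background) >= 125 &&
    colorDifference(text, background) >= 500
  );
}
```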

Not surprisingly, there are websites where these scores can be calculated automatically. For example, Juicy Studio’s Color Contrast Test allows you to make these comparisons by entering the hexadecimal code for each color.
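
Building on the sketch above, the same kind of comparison can be run from hexadecimal codes, which is the form of input such tools accept. The hexToRgb helper here is again my own, not part of any tool’s API.

```typescript
// Parse a hexadecimal code such as "#336699" into the Rgb type above.
function hexToRgb(hex: string): Rgb {
  const n = parseInt(hex.replace("#", ""), 16);
  return { r: (n >> 16) & 0xff, g: (n >> 8) & 0xff, b: n & 0xff };
}

// Black on white: brightness difference 255, hue difference 765 -- passes.
console.log(isReadable(hexToRgb("#000000"), hexToRgb("#FFFFFF"))); // true

// Both formulas are symmetric, so white on black passes as well.
console.log(isReadable(hexToRgb("#FFFFFF"), hexToRgb("#000000"))); // true
```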

Validation

The algorithm the W3C recommends was developed by Chris Ridpath and colleagues, who also published a validation study on the web. In psychological measurement (often referred to as psychometrics), the term validity refers to the degree to which a measure actually represents what it is supposed to measure. In the case of these readability formulas, the question is: do they actually represent the degree to which people find a given combination readable? To answer this question, Ridpath and colleagues created 42 versions of text on a web page background and classified each on a seven-point scale from most readable to least readable, based on its hue and brightness difference scores. They then had several participants rate these pages for readability. They found that there was, indeed, a strong and significant relationship between the readability ratings the participants gave and the difference scores the formulas yielded. However, the relationship was not perfect, and there was quite a bit of variance; in particular, there were some substantial outliers. In some cases, for example, participants rated pages high on readability that the algorithm scored quite low, and vice versa.

References