The idea that the brightness of colors is associated with positivity is new neither in everyday experience nor in scientific thinking. Numerous studies have indicated a brightness-positivity association that applies cross-culturally (Adams & Osgood, 1973; Saito, 1996; Valdez & Mehrabian, 1994). Nonetheless, only a handful of studies (Hemphill, 1996; Lakens, Fockenberg, Lemmens, Ham, & Midden, 2013; Lakens, Semin, & Foroni, 2012; Meier, Robinson, & Clore, 2004; Meier, Fetterman, & Robinson, 2015) have tested this association directly.

Different theoretical accounts have been proposed to explain the brightness-positivity association. One account, supported by Meier and colleagues (2004, 2015) and by Huang, Tse, and Xie (2017), is based on conceptual metaphor theory (Lakoff & Johnson, 1980). According to conceptual metaphor theory, people can represent abstract concepts (such as valence) in terms of concrete concepts (such as brightness) through metaphoric associations. Because brightness is metaphorically connected to valence, a brightness-positivity association arises. On this account, the association between brightness and positivity is learned. In contrast, based on indications of high cross-cultural similarity (e.g., Adams & Osgood, 1973; Saito, 1996), we (Specker et al., 2018) recently postulated that the association between brightness and positivity is automatic and universal and most likely arises early in development. We used the Implicit Association Test (hereafter IAT; Greenwald, McGhee, & Schwartz, 1998) as well as an explicit rating task to assess the association between brightness and positivity. Our results were consistent with these hypotheses, indicating that the association is both automatic and universal.

These notions are also in line with accounts such as that of Lakens et al. (2013), who proposed that people have a “brightness bias” whereby the overall brightness of a stimulus serves as an implicit marker of positivity. More recently, building on this idea of brightness as an implicit marker of positivity, Schietecat, Lakens, IJsselsteijn, and De Kort (2018) proposed a dimension-specificity hypothesis. This hypothesis builds on the work of Osgood, Suci, and Tannenbaum (1957), whose research focused on the meaning that people assign to concepts (in our case, assigning the meaning of “positivity” to bright stimuli). In their seminal work they distinguished three broad dimensions of meaning: evaluation, activity, and potency. Within this framework, the dimension-specificity hypothesis proposes that the meaning attributed to brightness (e.g., “positivity”) depends on which dimension the context makes salient. Brightness should be associated with calmness when the evaluation dimension is salient (essentially a brightness-positivity association), whereas it should be associated with aggression when the activity dimension is salient (essentially a brightness-negativity association).

Our previous findings (Specker et al., 2018) are in line with the dimension-specificity account. The studies by Schietecat et al. (2018) assume that brightness is automatically associated with positivity, so that manipulating the brightness of a stimulus makes the evaluation dimension salient. Both our results and theirs seem to support this idea, indicating that brightness is indeed automatically associated with positivity.

Nonetheless, the evidence supporting this idea is limited. As mentioned, only a small set of studies has directly investigated this effect (Hemphill, 1996; Lakens, Fockenberg, Lemmens, Ham, & Midden, 2013; Lakens, Semin, & Foroni, 2012; Meier, Robinson, & Clore, 2004; Meier, Fetterman, & Robinson, 2015). Crucially, the majority of these studies investigated achromatic brightness differences (white vs. black). These studies therefore provide limited information about the association between brightness and positivity for luminous brightness differences (bright vs. dark). This is especially so because, of the two studies that did not use achromatic stimuli, one used color patches of different hues that were not controlled for brightness or saturation (Hemphill, 1996) and the other (Lakens et al., 2013) used complex naturalistic pictures.

Both our study (Specker et al., 2018) and the study by Schietecat et al. (2018) addressed this issue. Compared with Hemphill (1996) and Schietecat et al. (2018), our design had the benefit of focusing more narrowly on the boundary conditions of the effect and of using better controlled stimuli. In Hemphill (1996), participants saw colored cardboard rectangles of 10 different hues; unfortunately, how these rectangles were produced and under which conditions they were viewed was not reported. In Schietecat et al. (2018), viewing conditions were not held constant: because participants completed the experiment at home, monitor settings were not controlled, and color presentation may therefore have varied widely. Additionally, in both cases participants may have completed the task at different times of day, so that varying daylight conditions could have influenced the results. In fact, Schietecat et al. (2018) explicitly manipulated this in Experiments 2A and 2B and found that brighter surrounding light also acts as a cue for positivity, though only with achromatic stimuli.

In sum, our original study (Specker et al., 2018) extended previous research by improving the methodological design and showing consistent effects. In addition, it has the potential to inform further research (e.g., on the dimension-specificity hypothesis) that assumes an automatic association between brightness and positivity. To strengthen the claims made in our original study, we decided to replicate our results. We preregistered the replication on the Open Science Framework: https://osf.io/uafyb/. In the current study we present a direct replication of Study 1 and Study 2 of Specker et al. (2018). Both studies use the IAT; Study 1 uses color patches of the same hue, and Study 2 uses black-and-white stimuli. In our original study (Specker et al., 2018) we improved control over the chromatic stimuli compared with previous studies; nonetheless, this control was not perfect, and our manipulation of brightness also led to small variations in hue. We therefore conducted Study 2 with achromatic stimuli to exclude potential confounds of hue and/or saturation. In the original study only Study 2 included an explicit measure; in our replication we also included this measure in Study 1. If an association between brightness and positivity exists, it should apply to both stimulus types as well as both measurement methods. We first discuss the replication of Study 1 (also Study 1 in our replication) and then the replication of Study 2 (also Study 2 in our replication). Finally, we present the results of an explorative meta-analysis of both the original study and the replication, to give a comprehensive overview of the effect and to investigate potential moderators. Although the term explorative is not usually applied to meta-analysis, we use it here to emphasize that this part of our analysis was not included in our preregistration.

Study 1

Participants

We based our power analysis on the effects reported by Specker et al. (2018). The smallest effect size reported in that study was a Cohen’s dz of .90, the effect size of the explicit association between brightness and positivity for black-and-white stimuli in a Western cultural context. To err on the side of caution, we assumed a slightly smaller effect size of .80 for our power analysis in G*Power (Faul, Erdfelder, Lang, & Buchner, 2007). This indicated that 15 participants were needed to achieve a power of .80 at α = .05. We therefore collected data from 15 participants. All participants were psychology students at the University of Vienna and received course credit for their participation. Eight participants were female, and the mean age was 22.93 years (SD = 3.96). The experiment was carried out in accordance with the Declaration of Helsinki, following a procedure approved by the local ethics committee of the University of Vienna. We preregistered an exclusion criterion of a minimum accuracy of 60%: anyone below that threshold would be excluded. Since no participant met the exclusion criterion, all participants were included in the analysis.
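The sample-size calculation above can be approximated outside G*Power. The following is a minimal sketch under our own assumptions, not G*Power's implementation: it computes the power of a one-sample (paired) t-test from the noncentral t distribution by numerical integration and searches for the smallest N that reaches the target power. All function names are hypothetical, and the text does not state whether the G*Power calculation was one- or two-tailed, so the `two_sided` flag is left to the reader.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def t_critical(tail_prob: float, df: int) -> float:
    """Value c with P(T > c) = tail_prob for a central t, found by bisection."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(t: float) -> float:
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 0.5 - simpson(pdf, 0.0, mid) > tail_prob:
            lo = mid  # upper tail still too large: critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


def power_one_sample_t(dz: float, n: int, alpha: float = 0.05,
                       two_sided: bool = True) -> float:
    """Power of a one-sample (or paired) t-test for a true effect dz."""
    df = n - 1
    delta = dz * math.sqrt(n)  # noncentrality parameter
    c = t_critical(alpha / 2 if two_sided else alpha, df)
    log_k = -(df / 2) * math.log(2.0) - math.lgamma(df / 2)

    def integrand(v: float) -> float:
        # T' = (Z + delta) / sqrt(V / df) with V ~ chi-square(df):
        # chi-square density at v times P(reject H0 | V = v).
        if v <= 0.0:
            return 0.0  # endpoint dropped; exact for df > 2
        density = math.exp(log_k + (df / 2 - 1) * math.log(v) - v / 2)
        s = math.sqrt(v / df)
        p = 1.0 - norm_cdf(c * s - delta)   # P(T' > c | v)
        if two_sided:
            p += norm_cdf(-c * s - delta)   # P(T' < -c | v)
        return density * p

    upper = df + 12 * math.sqrt(2 * df) + 20  # chi-square mass beyond is negligible
    return simpson(integrand, 0.0, upper, 4000)


def required_n(dz: float, target_power: float = 0.80, alpha: float = 0.05,
               two_sided: bool = True, n_max: int = 1000) -> int:
    """Smallest N (searching from 3) whose power reaches target_power."""
    for n in range(3, n_max + 1):
        if power_one_sample_t(dz, n, alpha, two_sided) >= target_power:
            return n
    raise ValueError("target power not reached within n_max")


print(required_n(0.80, two_sided=True))
print(required_n(0.80, two_sided=False))
```

With dz = .80, α = .05, and a target power of .80, the two-sided search should return 15, in line with the sample size reported above, while the one-sided version needs fewer participants. Numerical integration was chosen over a normal approximation so that the sketch needs only the standard library.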

Method

Our method, materials, and procedure were identical to Study 1 of Specker et al. (2018): we used an IAT, the only difference being that we also included an explicit rating task. The order of the IAT and the explicit rating task was counterbalanced: half of the participants first completed the IAT and then gave explicit ratings, and the other half first gave explicit ratings and then completed the IAT.

Materials

IAT task. We used the standard IAT design, which includes error feedback for incorrect responses: the error message remains on the screen until the correct key is pressed. Key assignments were visible throughout the task, as recommended by Gawronski, Deutsch, and Banse (2011). We counterbalanced the key assignments to eliminate block order as a potential confound: half of the participants first completed the congruent blocks (i.e., bright/positive and dark/negative) followed by the incongruent blocks (i.e., bright/negative and dark/positive), and the other half first completed the incongruent blocks followed by the congruent blocks.

Explicit Rating Task. In the explicit rating task, participants rated each stimulus on positivity. They were asked: “How positive is this color patch?” and answered on a 7-point scale ranging from 1 (negative) to 7 (positive).
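In the Results, these ratings are turned into a difference score: the standardized mean rating of the dark patches subtracted from that of the bright patches. The text does not say how ratings were standardized, so the sketch below assumes, purely for illustration, that each participant's ratings are z-standardized across all of that participant's rated patches before averaging per category. The function name and data layout are hypothetical.

```python
from statistics import mean, stdev


def explicit_difference_score(ratings: dict[str, list[float]]) -> float:
    """Bright-minus-dark difference of one participant's standardized ratings.

    ASSUMPTION: ratings are z-standardized within the participant, across
    all rated patches, before averaging per category.
    """
    pooled = ratings["bright"] + ratings["dark"]
    m, sd = mean(pooled), stdev(pooled)
    if sd == 0:
        return 0.0  # same rating for every patch: no bright/dark difference

    def z_mean(values: list[float]) -> float:
        return mean([(x - m) / sd for x in values])

    return z_mean(ratings["bright"]) - z_mean(ratings["dark"])


# Hypothetical 1-7 ratings from one participant.
ratings_example = {"bright": [6, 5, 6, 7], "dark": [3, 4, 2, 3]}
print(round(explicit_difference_score(ratings_example), 3))
```

Under this assumption the score reduces to the bright-minus-dark mean difference divided by the participant's own rating SD; a positive value indicates that bright patches were rated as more positive.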

Stimuli. We created 18 blue color patches differing in brightness as stimuli for the IAT task. Nine color patches represented the bright category and nine the dark category. The bright category ranged from 70–90% brightness and the dark category from 10–30% brightness, in steps of about 2–3%. All images can be found here: https://osf.io/w8479/. As a manipulation check, all stimuli were measured with a colorimeter (i1Xtreme by X-Rite) while presented full screen on the CRT monitor that participants used during the test. For this manipulation check we focused on the lightness component (L*) of the CIELAB color space. The dark category was lower in lightness (and therefore darker) than the bright category, t(16) = –15.49, p < .001. All stimuli were presented against a grey background (RGB: 128, 128, 128; LAB: 64.3, –4.4, –23.1). To ensure consistent color presentation across participants, the IAT task was conducted on a CRT monitor in a windowless room with constant lighting conditions.

To represent the positive and negative categories, we selected words from the Berlin Affective Word List Reloaded (BAWL-R; Võ et al., 2009), a cross-validated resource of German words rated for valence and arousal. Negative words included asocial, sad, and loveless (“asozial”, “traurig”, and “lieblos” in German), and positive words included loved, great, and courageous (“beliebt”, “toll”, and “mutig” in German). The positive and negative words differed in valence, t(16) = 63.41, p < .001, but not in arousal, p = .77. We also matched for word length: on average, negative words had 7.22 characters and positive words 6.44 characters, a nonsignificant difference, p = .244.
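The stimulus checks above (lightness of dark vs. bright patches; valence, arousal, and length of negative vs. positive words) compare two independent sets of items, and the reported df = 16 is consistent with 9 items per category and a pooled-variance test. The paper does not name the exact test, so the Student's t sketch below is an assumption; the function name is hypothetical and the values in the usage lines are invented, not the measured stimuli.

```python
import math
from statistics import mean, variance


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Independent-samples Student's t (pooled variance) and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


# Hypothetical lightness values, NOT the measured stimuli.
dark = [1.0, 2.0, 3.0]
bright = [4.0, 5.0, 6.0]
t, df = student_t(dark, bright)
print(f"t({df}) = {t:.2f}")
```

A negative t here, as in the manipulation check, means the first set (dark) has the lower mean.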

Experimental scripts and related stimulus material are freely available on the Open Science Framework: https://osf.io/w8479/.¹

Results

As in Specker et al. (2018), a D-score was calculated for every participant as recommended by Greenwald, Nosek, and Banaji (2003). D-scores can range from –2 to +2, and D-scores of .65 or higher are typically considered large effects. Since the IAT is a relative measure, a positive D-score indicates a stronger association between brightness and positivity than between brightness and negativity, and consequently also a stronger association between darkness and negativity than between darkness and positivity. The mean D-score in our sample was .65 (SD = .37), indicating a large effect. As in Specker et al. (2018), we then conducted a one-sided one-sample t-test, which showed a significant difference from 0 in the expected (positive) direction, t(14) = 6.58, p < .001, Cohen’s dz = 1.73,² CI [.91, 2.51]. For the explicit association, as in Specker et al. (2018), a difference score was created by subtracting the standardized mean rating of the dark category from the standardized mean rating of the bright category. A one-sided one-sample t-test on this difference score did not show a significant difference from 0, t(14) = –.90, p > .05, Cohen’s dz = –0.23, CI [–.74, .28]. Finally, as in Specker et al. (2018), we computed the correlation between the implicit and explicit associations, r = –.39, p > .05, as illustrated in Figure 1.
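For readers unfamiliar with the D-score, the sketch below follows the general structure of the improved scoring algorithm of Greenwald, Nosek, and Banaji (2003) as we understand it: trials slower than 10,000 ms are dropped; for the practice pair and the test pair of blocks separately, the difference between the mean incongruent and mean congruent latency is divided by the inclusive standard deviation of all trials in both blocks; D is the mean of the two quotients. Because this IAT required participants to correct errors, error latencies already include the correction time and no extra penalty is added. The paper does not list its block structure, so the block labels, data layout, and function name are hypothetical, and the participant-level exclusion for very fast responders is omitted.

```python
from statistics import mean, stdev


def d_score(latencies: dict[str, list[float]]) -> float:
    """IAT D-score from response latencies (ms) per block.

    Hypothetical keys: "practice_congruent", "practice_incongruent",
    "test_congruent", "test_incongruent", where congruent means the
    bright/positive + dark/negative pairing.
    """
    # Drop trials slower than 10,000 ms.
    rt = {k: [x for x in v if x <= 10_000] for k, v in latencies.items()}

    parts = []
    for stage in ("practice", "test"):
        con = rt[f"{stage}_congruent"]
        inc = rt[f"{stage}_incongruent"]
        inclusive_sd = stdev(con + inc)  # SD over all trials of both blocks
        # Slower incongruent responses give a positive D (bright-positive link).
        parts.append((mean(inc) - mean(con)) / inclusive_sd)
    return mean(parts)  # average of the practice and test quotients


latencies_example = {
    "practice_congruent": [600, 650, 700, 620],
    "practice_incongruent": [800, 850, 900, 780],
    "test_congruent": [610, 640, 690, 12_000],  # last trial gets trimmed
    "test_incongruent": [790, 860, 880, 820],
}
print(round(d_score(latencies_example), 2))
```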

Figure 1 

Scatterplot of the correlation between explicit and implicit (IAT D-score) association. To aid interpretation, lines are drawn at the 0 point of both axes. Because both measures are difference scores, a positive score (i.e., above the horizontal zero line or to the right of the vertical zero line) represents an association between brightness and positivity.

Discussion

Study 1 replicated the findings of Study 1 of Specker et al. (2018): both studies found a large effect for the implicit association between brightness and positivity. Study 1 did not, however, show the explicit association between brightness and positivity that should theoretically have been present, and we therefore interpret this as a failure to replicate. Besides a simple Type II error, the most probable reason for this failure is that, when varying brightness, we did not perfectly control for variation in hue or saturation. It is therefore possible that the effects found are due to hue and/or saturation rather than brightness. This was also the main reason the original study included a second study with achromatic stimuli: if similar effects are found with achromatic stimuli, one can be more confident that they are due to brightness. Conversely, the explicit effect may have disappeared because these stimuli were less well controlled. This becomes more likely in light of the original finding that explicit rating effects are smaller than implicit effects. It should be noted, though, that the smaller explicit effect is likely due, at least in part, to the fact that reaction-time-based measures with many trials have less variance, and therefore larger effect sizes, than a one-trial explicit rating. Another likely explanation for the absence of an explicit effect is that in the explicit rating task people confounded “positivity” with “liking”. In other words, someone who saw a dark image they liked may have perceived it as more positive, and someone who saw a bright image they disliked may have perceived it as more negative. In any case, this suggests that stimulus type and measurement type influence the size of the effect.
In this case, the odds were stacked against finding the effect, since the study combined less well controlled stimuli with explicit rating as a measurement. We investigate this notion in more detail later by means of a meta-analysis.

Study 2

Participants

We based our power analysis on the effects reported by Specker et al. (2018). The smallest effect size reported in that study was a Cohen’s dz of .90, the effect size of the explicit association between brightness and positivity for black-and-white stimuli in a Western cultural context. To err on the side of caution, we assumed a slightly smaller effect size of .80 for our power analysis in G*Power (Faul et al., 2007). This indicated that 15 participants were needed to achieve a power of .80 at α = .05. We therefore collected data from 15 participants. All participants were psychology students at the University of Vienna and received course credit for their participation. Eleven participants were female, and the mean age was 19.67 years (SD = 1.39). The experiment was carried out in accordance with the Declaration of Helsinki, following a procedure approved by the local ethics committee of the University of Vienna. We preregistered an exclusion criterion of a minimum accuracy of 60%: anyone below that threshold would be excluded. Since no participant met the exclusion criterion, all participants were included in the analysis.

Method

Our method, materials, and procedure were identical to Study 2 of Specker et al. (2018) and to Study 1 of the current paper, except for the color stimuli. Instead of variations of blue, we constructed achromatic stimuli by mixing black and white in different ratios. The bright category consisted of 8 color patches filled with 10 or 20% black (and 90 or 80% white, respectively). The dark category consisted of 8 color patches filled with 10 or 20% white (and 90 or 80% black, respectively). All images can be found here: https://osf.io/w8479/. Again, the IAT task and the explicit rating task were conducted on a CRT monitor in a windowless room with constant lighting conditions.

Results

We used the same analyses as in Specker et al. (2018) and Study 1. Thus, we calculated a D-score for every participant as recommended by Greenwald, Nosek, and Banaji (2003). The mean D-score in our sample was .95 (SD = .23), indicating a large effect. A one-sided one-sample t-test showed a significant difference from 0 in the expected (positive) direction, t(14) = 15.77, p < .001, Cohen’s dz = 4.07, CI [2.49, 5.64]. For the explicit association, we created a difference score by subtracting the standardized mean rating of the dark category from the standardized mean rating of the bright category. A one-sided one-sample t-test on this difference score showed a significant difference from 0 in the expected (positive) direction, t(14) = 11.88, p < .001, Cohen’s dz = 3.06, CI [1.83, 4.29]. Finally, we computed the correlation between the implicit and explicit associations, r = .63, p < .01, as illustrated in Figure 2. As Figure 2 shows, this correlation was mainly driven by one data point (the one on the far left); to test this exploratively, we recalculated the correlation without this outlier, which yielded r = .067.
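The inferential steps shared by both studies can be sketched compactly: a one-sample t statistic against zero, Cohen's dz computed as mean / SD (one common definition, which equals t / √N; published dz values can differ slightly depending on the computation used), and the Pearson correlation between implicit and explicit scores, recomputed without one observation as in the outlier check above. Function names are hypothetical and the numbers in the usage lines are made up. p-values and confidence intervals for dz, which require the central and noncentral t distributions, are left to a statistics package.

```python
import math
from statistics import mean, stdev


def one_sample_t(scores: list[float]) -> tuple[float, int, float]:
    """t statistic against 0, its df, and Cohen's dz (mean / SD)."""
    n = len(scores)
    dz = mean(scores) / stdev(scores)
    return dz * math.sqrt(n), n - 1, dz  # t = dz * sqrt(n)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_without(x: list[float], y: list[float], index: int) -> float:
    """Correlation recomputed after dropping the observation at `index`."""
    keep = [i for i in range(len(x)) if i != index]
    return pearson_r([x[i] for i in keep], [y[i] for i in keep])


# Made-up scores for five participants, NOT data from this study.
implicit = [0.9, 1.1, 0.8, 1.0, 0.2]
explicit = [1.5, 1.4, 1.6, 1.5, 0.1]
t, df, dz = one_sample_t(implicit)
print(f"t({df}) = {t:.2f}, dz = {dz:.2f}")
print(f"r = {pearson_r(implicit, explicit):.2f}, "
      f"r without participant 5 = {r_without(implicit, explicit, 4):.2f}")
```

As with the r = .63 vs. r = .067 comparison above, a single extreme pair can dominate a correlation in a sample this small.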

Figure 2 

Scatterplot of the correlation between explicit and implicit (IAT D-score) association. To aid interpretation, lines are drawn at the 0 point of both axes. Because both measures are difference scores, a positive score (i.e., above the horizontal zero line or to the right of the vertical zero line) represents an association between brightness and positivity.

Discussion

Study 2 replicated the findings of Study 2 of Specker et al. (2018): both studies found a large effect for the implicit and explicit associations between brightness and positivity. Taking these findings together with those of Study 1, we replicated the effect in three out of four cases. There are two possible explanations for why we were able to replicate the explicit rating effect in Study 2. One is the higher level of control over our stimuli; the other is that positivity (i.e., the evaluation dimension) is more salient with achromatic stimuli. Because black and white are stronger bipolar opposites than luminous differences in brightness, they make this dimension more salient, which in turn leads to larger effects. In any case, these results, like those of Study 1, suggest that both stimulus type and measurement type moderate the size of the effect. We now investigate these notions in more detail and give a more comprehensive estimate of the effect by means of an explorative³ meta-analysis.

Meta-Analysis

We ran a fixed-effects model over the results of both the original study (Specker et al., 2018) and our replication. We chose a fixed-effects rather than a random-effects model because the fixed-effects model assumes a single, homogeneous effect size; although this assumption often does not hold in meta-analyses of many different studies, the high similarity across our studies made it the best-suited model. The results are illustrated in Figure 3. The fixed-effects model indicated a meta-analytic effect of 1.31 [1.12, 1.51]. However, there was residual heterogeneity not explained by the model, Q(8) = 86.88, p < .001. This was consistent with our earlier descriptive comparisons of the included studies, which indicated that small variations in study design seemed to influence the effect. We therefore ran a fixed-effects model with moderators to see whether they could explain the residual heterogeneity. As discussed above, our studies employed two different stimulus types (chromatic vs. achromatic), with the achromatic stimuli being better controlled and more salient. In addition, descriptively, both this replication and the original study showed larger effects for implicit than for explicit measurement, although, as noted above, this difference is likely due at least in part to the lower variance of reaction-time measures with many trials compared with a one-trial explicit measure. We therefore used stimulus type (chromatic vs. achromatic) and measure type (implicit vs. explicit) as moderators. The fixed-effects model with moderators showed a significant effect of the moderators, Q(2) = 81.43, p < .001. Moreover, there was no significant residual heterogeneity, Q(6) = 5.44, p > .05, indicating that the moderators account for the heterogeneity in the effects.
The regression coefficient of stimulus type is –1.37, p < .001: when the stimuli are chromatic, the estimated effect (the strength of the association between brightness and positivity) decreases by 1.37. The regression coefficient of measure type is 2.06, p < .001: when the measure is implicit, the estimated effect increases by 2.06.
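The model described above can be outlined with inverse-variance weighting. The sketch below is not the authors' analysis script (that is on OSF); it shows a fixed-effect pooled estimate with a 95% CI, the heterogeneity statistic Q (df = k − 1), and a fixed-effects meta-regression whose total Q is split into a moderator part Q_M (df = number of moderators) and a residual part Q_E (df = k − number of coefficients). Two choices are assumptions: the sampling variance of dz is approximated as 1/n + dz²/(2n), and moderators are dummy-coded (chromatic = 1, implicit = 1). Function names are hypothetical, and the effect sizes in the usage lines are invented, not the studies in Figure 3.

```python
import math


def dz_variance(dz: float, n: int) -> float:
    """ASSUMED large-sample variance of Cohen's dz: 1/n + dz^2 / (2n)."""
    return 1.0 / n + dz * dz / (2.0 * n)


def fixed_effect(y: list[float], v: list[float]) -> tuple[float, float, float]:
    """Inverse-variance pooled estimate, its standard error, and Q."""
    w = [1.0 / vi for vi in v]
    est = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - est) ** 2 for wi, yi in zip(w, y))
    return est, math.sqrt(1.0 / sum(w)), q


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def meta_regression(y: list[float], v: list[float],
                    x: list[tuple[float, ...]]) -> tuple[list[float], float, float]:
    """Fixed-effects meta-regression by weighted least squares.

    Returns coefficients (intercept first), Q_M, and residual Q_E.
    """
    w = [1.0 / vi for vi in v]
    rows = [[1.0, *xi] for xi in x]  # design matrix with intercept column
    p = len(rows[0])
    xtwx = [[sum(wi * r[i] * r[j] for wi, r in zip(w, rows)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(wi * r[i] * yi for wi, r, yi in zip(w, rows, y)) for i in range(p)]
    b = solve(xtwx, xtwy)
    fitted = [sum(bj * rj for bj, rj in zip(b, r)) for r in rows]
    q_e = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, fitted))
    q_m = fixed_effect(y, v)[2] - q_e  # heterogeneity explained by moderators
    return b, q_m, q_e


# Invented effects (dz, n, chromatic, implicit); NOT the studies in Figure 3.
studies = [(1.7, 15, 1, 1), (0.3, 15, 1, 0), (3.9, 15, 0, 1), (2.1, 15, 0, 0)]
y = [s[0] for s in studies]
v = [dz_variance(s[0], s[1]) for s in studies]
est, se, q = fixed_effect(y, v)
print(f"pooled dz = {est:.2f} [{est - 1.96 * se:.2f}, {est + 1.96 * se:.2f}], "
      f"Q({len(y) - 1}) = {q:.2f}")
b, q_m, q_e = meta_regression(y, v, [(s[2], s[3]) for s in studies])
print(f"b = {[round(c, 2) for c in b]}, Q_M = {q_m:.2f}, Q_E = {q_e:.2f}")
```

Each Q value would be compared against a chi-square distribution with its df to obtain the p-values reported above; that step is omitted to keep the sketch dependency-free.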

Figure 3 

Forest plot of the fixed-effects model. All effect sizes in the plot are Cohen’s dz values, and all confidence intervals are 95% confidence intervals. The size of each box represents the sample size (N) of the study.

General Discussion

We were able to replicate the effects of Specker et al. (2018) in three out of four cases, and the meta-analysis indicated a large effect. Together, this indicates a strong and replicable association between brightness and positivity. However, as the meta-analysis showed, the size of the effect depends on measurement type as well as stimulus type. The effect is stronger when measured implicitly than when measured explicitly. As noted, this is likely due to the fact that reaction-time-based measures with many trials have less variance, and therefore larger effect sizes, than a one-trial explicit measure. Another likely explanation is that people confounded “positivity” with “liking”: someone who saw a dark image they liked may have perceived it as more positive, and someone who saw a bright image they disliked may have perceived it as more negative. In addition, the effect is stronger with better controlled and more salient stimuli. These findings give researchers interested in the effect concrete guidance when designing a study, both in terms of effect size estimates for power analysis and in terms of stimulus and measurement design. In particular, researchers who want to use brightness as a cue (e.g., to make the evaluation dimension salient) can use these findings in their research design. To facilitate future research, we have made the experimental scripts, analysis scripts, all related stimulus material, and our data freely available on the Open Science Framework: https://osf.io/w8479/.⁴

Lakens et al. (2012) have suggested that the association between brightness and positivity may be explained by a strong association between black and negativity rather than between white and positivity. Due to the relative nature of the IAT, our data cannot exclude this explanation. However, the inclusion of chromatic stimuli provides a potential argument against it, since that explanation rests on associations with achromatic stimuli. As noted in Specker et al. (2018), the hue we used (blue) is typically regarded as the most preferred color (e.g., Hemphill, 1996) while at the same time being stereotypically associated with sadness (Soriano & Valenzuela, 2009). Thus, one can assume that blue would be seen both as positive (due to being liked) and as negative (due to its association with sadness). We think it is more likely that people have a general brightness bias, as later postulated by Lakens et al. (2013). This more symmetrical relationship is in line with the metaphorical account (Huang et al., 2017; Meier et al., 2004; Meier et al., 2015), in which people represent abstract concepts (such as valence) in terms of concrete concepts (such as brightness) through metaphoric associations. However, metaphors have to be learned, which is incongruent with the high cross-cultural similarity we found in Specker et al. (2018). In addition, the fact that the effect is larger with implicit than with explicit measures, as indicated by the meta-analysis, suggests an automatic association that does not rely on explicit metaphors. Admittedly, through learning, people could internalize explicit metaphorical associations, which would then lead to implicit associations. Nonetheless, if this were the case, one would expect the original explicit (metaphorical) association to be at least as strong as the resulting implicit association. This is not the case, making the metaphorical account less plausible.
Thus, our results are in line with the account we recently postulated (Specker et al., 2018), namely that the association between brightness and positivity is automatic and universal and most likely arises early in development.

In sum, this study provides clear support for a strong association between brightness and positivity and offers new insight into the conditions under which the effect is present.

Data Accessibility Statement

As noted in the main body of the text, our experimental scripts, analysis scripts, all related stimuli material and our data are freely available on the Open Science Framework: https://osf.io/w8479/. The original pre-registration can be found at: https://osf.io/uafyb/.