An overwhelming majority of behavioral science research participants are “WEIRD” (Westernized, Educated, Industrialized, Rich, Democratic; Henrich, Heine, & Norenzayan, 2010) and sampled from the convenient population of university undergraduate students; however the motivation of the research participants can vary. Often, research participants are motivated to participate by being compensated for their time with money, although many universities also encourage students to participate in research for partial course credit. Thus, some studies include participants who were compensated with partial course credit, some include those who were compensated with money, and some include a mix of these two groups of participants. The motivation of participants and method of compensation is sometimes, but not always, reported in manuscripts, and is rarely considered in analyses.
Despite the important practical implications of understanding whether the motivation to participate in research influences the results of a study, this question has received little empirical consideration. In the discipline of economics, differences in performance between cash and credit incentives have received some attention, with most results suggesting that type of motivation had little effect on the results. For example, Brown, Kruse and Thompson (2001) compared how money versus extra class points influenced behavior during a risk-mitigating investment task. They found that the average amount participants would “pay” to avoid risk did not differ between the two forms of incentives. Komai and Grossman (2006; as cited in Luccasen & Thomas, 2014) found that monetary incentives versus extra course credit led to similar results in a group investment game. Finally, Luccasen and Thomas (2014) examined participant behavior in a partnered trust game with either monetary rewards or extra credit at stake. They found that overall, participant performance was not significantly different for those in the cash versus credit incentive condition.
Of important note, in the economic studies detailed above, participants are performing tasks where the amount of money or course credit they subsequently receive is not compensation for their time, but is a direct result of their performance on the task. Many studies have been conducted to determine whether financial incentives influence task performance, particularly in decision making paradigms and studies testing economic theory (see Hertwig & Ortmann, 2001 for a discussion). Unlike these studies, in typical psychology experiments, participants will receive the money or course credit as compensation simply for performing the experimental task, regardless of whether they perform the task well.
In a series of experiments Brase, Fiddick and Harries (2006) found that, overall, participants who were compensated with pay outperformed those compensated with partial course credit on a cognitive task—a medical diagnosis problem. This effect was bolstered if the participants were from a top-tier compared to second-tier university and those participants who were from a top-tier university and compensated with pay performed the best. In a related study, a large sample of participants were asked to solve a Bayesian reasoning task (Brase, 2009). One third of the participants were given partial course credit, one third were given a cash compensation ($5) and the remainder were given cash compensation ($3) plus the incentive to earn additional money ($9) if the problem was solved correctly. Participants in the latter group, who received performance based cash incentives in addition to cash compensation for participating, performed better than the credit and flat-fee compensation group.
Although these results suggest that pay may enhance performance, a different conclusion was reached in a recent study examining performance in a more typical workplace environment. Kvaløy, Nieken and Schöttner (2015) asked students to come to the lab to enter data for 2 hours for pay. They found that when participants were given a bonus in their pay for performance, their performance actually suffered. This reduction in performance occurred unless participants were given a motivational talk in addition to receiving a monetary bonus. Their findings suggest that pay for performance, in addition to pay for coming in to perform a task, might actually decrease motivation to perform the task well. This performance cost is in line with evidence of a “hidden cost of reward”, in which the intrinsic motivation to complete a task is reduced by monetary pay for performance (Deci, 1971). This is sometimes referred to as the “overjustification effect” (Lepper, Greene, & Nisbett, 1973), whereby intrinsic motivation to perform the task well is diminished when extrinsic rewards are introduced.
Although not the majority, some fields within psychology (e.g., decision-making research) offer financial incentives for task performance in addition to compensating participants for their time. As noted above, offers of extra incentives for performance can have different results depending on the task and on the underlying motivation to participate of the person performing the task. One area of research in which the motivation to participate, and thus the method of participant compensation, might be critical to the results is experiments examining the effects of motivation on task performance. In the design adopted by many studies examining motivation (e.g., Adcock, Thangavel, Whitfield-Gabrieli, Knutson, & Gabrieli, 2006; Murayama & Kitagami, 2014; Murty, Tompary, Adcock, & Davachi, 2016; Shigemune et al., 2010; Spaniol, Bowen, Wegier, & Grady, 2015; Spaniol, Schain, & Bowen, 2013; Wittmann et al., 2005), in addition to compensation for time, rewards are manipulated within the task and used as an incentive for participants to earn additional monetary rewards based on their task performance. In these studies, the overall finding is that items associated with a high reward are remembered better than items associated with a low reward. In the current study we examine whether the results on a rewarded cognitive task (a memory test) differ between samples of participants who signed up to participate with different motivations from the outset. Specifically, participants compensated for their time with partial course credit were compared to those who signed up to participate for cash. Given the mixed findings in the prior work cited above regarding how the motivation of the participant and motivation for good task performance can interact, and given the increase in psychology studies examining reward effects on task performance, particularly in cognitive psychology, this is an essential research question.
We are unaware of any studies that have explicitly tested whether the results of a study differ depending on the participant's motivation to participate (i.e., type of compensation offered) when the research question being tested pertains to motivation. As noted above, one study (Brase, 2009) found that offering a flat fee for compensation in addition to performance-based rewards resulted in better performance; however, in that study there was no group that received credit in addition to a performance-based bonus. An additional study compared participants who were compensated with cash to those compensated with credit on a working memory task with reward cues (Zedelius, Veling, & Aarts, 2013). The results suggested that high value compared to low value cues decreased working memory performance for those participating for cash but not credit; those participating for credit performed similarly when there was a high or low value cue in the environment. In that study, the high and low value cues were not tied to performance-based rewards, leaving open the question of whether such a difference in performance between the cash and credit groups would exist when the value cues were linked to rewards participants could earn for their task performance. The present study addressed this open question.
In the first experiment reported here, participants were recruited to participate in a memory study and at the time of recruitment they chose whether they wanted to be compensated with cash (cash group) or credit (credit group) for their time. Both the cash and credit groups were given the same monetary rewards for their performance on the task, regardless of their motivation to participate and how they chose to be compensated for their time. Given the results above (Brase, 2009), we hypothesize that those compensated with cash will be more motivated by the additional pay on the task compared to those compensated for their time with credit. In line with the overjustification effect, those who choose to be compensated for their time with credit may be more intrinsically motivated to participate; thus, extra performance-based incentives may interfere with their willingness or ability to perform the task well. We chose to examine this within the context of a rewarded memory task, where better memory for high versus low reward items (e.g., Adcock et al., 2006; Murty, LaBar, Hamilton, & Adcock, 2011; Murty et al., 2016), usually only after a period of consolidation (e.g., Murayama & Kitagami, 2014; Murayama & Kuhbandner, 2011; Spaniol et al., 2013), is a robust finding, but the motivation and compensation for completing the study are rarely reported or included as a factor in the results.
The study was approved by the Institutional Review Board at Boston College and all participants gave informed consent. Participants were compensated for participation with either partial course credit (1 credit/hour) or cash payment of $10/hour. All participants (both cash and credit) were recruited through the Boston College online undergraduate participant pool. Over the Fall 2015 and Spring 2016 semesters, the two different groups of participants were recruited using separate but identical (other than method of compensation) advertisements; one indicated that participants would be compensated for their time with cash and another indicated that participants would be compensated with partial course credit. No one was permitted to sign up or participate for both cash and credit. In addition, all participants had the opportunity to earn monetary rewards for their performance on the memory task, but participants were not aware of this until they arrived for the study to ensure that these performance-based rewards were not incentive for participation. To reduce confounds associated with neurological and psychiatric disorders, we tested only a healthy sample of younger adults, and all interested participants completed a medical screening questionnaire to assess past and current medical conditions such as psychological disorders, head trauma or injuries, and medications that could affect central nervous system function. To minimize possible sample differences in the two conditions, any individual who had sustained a head injury resulting in loss of consciousness or any individual who reported a current or prior neuropsychological or psychiatric diagnosis (e.g., epilepsy, depression), or current medication that could affect central nervous system function was not scheduled for an appointment. 
Four participants were excluded from analyses, two due to experimenter error and two because of high scores (greater than 13) on the Beck Depression Inventory indicating the participants were currently experiencing symptoms of mild depression, leaving 22 eligible participants in the credit and 23 in the cash group. See Table 1 and the results section for participant characteristics.
| Characteristic | Cash (Expt 1) (N = 23) | Credit (Expt 1) (N = 22) | Extra Credit (Expt 2) (N = 23) |
| --- | --- | --- | --- |
| Age (years) | 20.04 (1.74) | 19.00 (.93) | 19.04 (1.15) |
| Education (years) | 14.13 (1.49) | 13.09 (1.19) | 13.04 (1.33) |
| Shipley | 32.74 (3.64) | 30.18 (3.02) | 31.48 (3.07) |
| BIS | 25.96 (3.43) | 25.41 (4.23) | 27.70 (2.44) |
| BAS-Drive | 14.96 (1.52) | 16.41 (2.28) | 14.43 (2.63) |
| BAS-Reward Responsivity | 23.04 (2.06) | 22.64 (1.62) | 22.30 (2.80) |
| BAS-Fun Seeking | 15.83 (2.08) | 16.69 (1.84) | 14.57 (2.62) |
| TRAIT | 36.83 (9.54) | 38.91 (10.06) | 38.43 (10.82) |
| STAI-1 | 31.65 (6.49) | 36.32 (13.34) | 37.43 (9.96) |
| STAI-2 | 29.78 (7.39) | 33.32 (11.48) | 35.39 (13.18) |
| BDI | 2.78 (3.10) | 3.23 (2.81) | 5.22 (4.06) |
| Digit Span | 70.87 (10.96) | 71.18 (7.72) | 67.26 (11.23) |
| Earnings 1 | $3.65 ($1.42) | $3.10 ($1.62) | 80.39 (24.95) points |
| Earnings 2 | $2.49 ($1.25) | $1.77 ($1.61) | 30.50 (42.37) points |
| Total Earnings | $6.13 ($2.45) | $4.86 ($2.90) | 110.91 (57.10) points |
The study design included two within-subjects factors of reward (high versus low) and delay (short [approx. 10 minutes] versus long [24-hour]) and one between-subjects factor (compensation method: cash versus course credit).
Because group was a between-subjects factor, a series of questionnaires was administered to characterize each compensation group's general cognitive abilities, current mental health status, and (via one questionnaire) motivational drives. The placement of three questionnaires also provided a short delay between the encoding and same-day retrieval tasks.
Beck Depression Inventory (Beck, Steer, & Carbin, 1988). This inventory was used to determine whether participants were experiencing depressive symptoms. Because depressive symptoms can diminish motivation, we set this measure as an exclusionary measure prior to data collection, and any participant with a score above 13 was not included in the experimental analyses.
Spielberger State-Trait Anxiety Inventory (Spielberger, Gorsuch, Lushene & Vagg, 1970). This self-report measure indicates the intensity of feelings of anxiety; it distinguishes between state anxiety (a temporary condition experienced in specific situations) and trait anxiety (a general tendency to perceive situations as threatening).
Digit Symbol-Coding. On this subtest of the Wechsler Adult Intelligence Scale (WAIS-III; Wechsler, 1997), numbers are paired with symbols on a key and the participant has 90 seconds to go through a grid of 93 numbers and place the correct symbol above each number. This task measures visual-motor speed and complexity, as well as motor coordination.
Shipley Institute of Living Scale: Vocabulary (Shipley, 1940). This vocabulary test consists of 40 items, providing a measure of verbal, crystallized intelligence. Participants are asked to select a synonym for a target word from a list of 4 choices.
BIS/BAS scales (Carver & White, 1994). This scale was included to assess individual differences in sensitivity to approach and avoidance motives, particularly to establish whether the two groups differ on these measures of motivation that could explain any differences in reward motivated memory. The behavioural avoidance (or inhibition) system (BIS) is said to regulate aversive motives, in which the goal is to move away from something unpleasant. The behavioural approach system (BAS) is believed to regulate appetitive motives, in which the goal is to move toward something desired. The BAS portion of the scale consists of 3 subscales: BAS drive, BAS fun seeking, and BAS reward responsiveness. The drive scale includes items relating to the persistent pursuit of desired goals. The fun seeking scale has items pertaining to a desire for new rewards and the propensity to engage in spontaneous behaviour and events in order to gain potential rewards. Reward responsiveness contains items that focus on positive responses to the occurrence or anticipation of rewards.
Stimuli & Apparatus
Stimuli consisted of 248 indoor and outdoor scenes used in a previous reward and memory study (Spaniol et al., 2013), originally from a picture database in CorelDraw. None of the images contained humans or animals. The stimuli were divided into 2 stimulus lists for encoding counterbalancing purposes. Each list contained half indoor and half outdoor scenes and assignment of either indoor or outdoor images to high-reward and low-reward status and target or distractor status was counterbalanced. Four stimulus lists at retrieval and assignment of specific stimulus sets to high-reward target status, low-reward target status, and distractor status and immediate versus delayed recognition test were counterbalanced within each participant group resulting in 8 counterbalancing conditions. The order of trials within a stimulus list was randomized for each person. E-Prime (Psychology Software Tools, Inc.) was used for stimulus presentation and response collection on a desktop computer with a 17” screen. All stimuli were presented centrally against a black background. Instructions and cues appeared in white 18-point Arial font. The responses were made using keys 1–6 on the keyboard to rate the confidence in their memory from “sure new” to “sure old”.
See Figure 1 for a schematic of the paradigm. The experiment took place over two days. Session one took approximately 1 hour and session two took approximately 30 minutes. Upon arrival for session one, participants read and signed the consent form and were then given detailed instructions by the experimenter about the task, including the reward structure. Participants completed practice trials of the encoding task and were encouraged to ask any questions during this time. Practice trials were not included in the analysis.
During the encoding phase, each trial began with a cue indicating how much the upcoming stimulus was worth if the participant successfully remembered it on the subsequent memory test. Each stimulus was worth either $.25 (high reward) or $.01 (low reward). After the cue, participants were shown the stimulus and asked to make a judgment about whether it depicted an indoor or outdoor scene.
After encoding, participants were asked to complete some paper-pencil questionnaires detailed in the section above, to create a short, filled delay interval. After completion of the questionnaires (approximately 10 minutes) participants practiced the retrieval task and were then given the experimental version of the task. Half of the studied items were tested during session 1 (intermixed with an equal number of new items) and the other half of the studied items were tested the next day during session 2 (intermixed with new items that had not been seen during session 1). During retrieval, participants made a sure-new to sure-old judgment using numbers 1-6 on the keyboard. To streamline the results, analyses collapse across confidence (i.e., responses 1, 2 and 3 = new and responses 4, 5 and 6 = old); the same patterns of results held even when only high confidence responses were analyzed and trials with low confidence (i.e., responses 3 and 4) were removed. Both target “old” stimuli and distractor “new” stimuli were presented in random order during the retrieval phase. For each correct “old” response (regardless of confidence), participants earned either the $.25 or $.01 that the stimulus was previously cued with during encoding. To deter participants from simply pressing “old” to every stimulus, they were penalized $0.13 for false alarms. Participants were paid out in cash the full amount they earned on the memory test after both the immediate and the delayed test. Participants did not receive feedback on their performance until the end of each retrieval phase.
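The scoring and payout rules described above can be illustrated with a short sketch. This is not the authors' analysis code: the `Trial` record layout and the function names are hypothetical, amounts are held in integer cents to avoid rounding error, and nothing is assumed about a floor on negative earnings because the text does not mention one. The figure of 60 high-reward and 60 low-reward targets is an assumption, chosen only because it reproduces the $15.60 maximum bonus mentioned in the discussion.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

HIGH_REWARD_CENTS = 25           # $.25 cue (from the text)
LOW_REWARD_CENTS = 1             # $.01 cue (from the text)
FALSE_ALARM_PENALTY_CENTS = 13   # $0.13 per false alarm (from the text)


@dataclass
class Trial:
    """Hypothetical record for one retrieval trial (not the authors' format)."""
    is_target: bool            # True for studied ("old") items
    cue_cents: Optional[int]   # reward cue shown at encoding; None for distractors
    response: int              # 1 = "sure new" ... 6 = "sure old"


def said_old(response: int) -> bool:
    """Collapse across confidence: 1-3 count as "new", 4-6 count as "old"."""
    if not 1 <= response <= 6:
        raise ValueError(f"response must be 1-6, got {response}")
    return response >= 4


def score(trials: Iterable[Trial]) -> dict:
    """Return hit rate, false alarm rate, and earnings (in cents) for one test."""
    trials = list(trials)
    targets = [t for t in trials if t.is_target]
    distractors = [t for t in trials if not t.is_target]
    hits = [t for t in targets if said_old(t.response)]
    false_alarms = [t for t in distractors if said_old(t.response)]
    # Each hit pays its encoding cue value; each false alarm costs the penalty.
    earnings = (sum(t.cue_cents for t in hits)
                - FALSE_ALARM_PENALTY_CENTS * len(false_alarms))
    return {
        "hit_rate": len(hits) / len(targets) if targets else float("nan"),
        "false_alarm_rate": (len(false_alarms) / len(distractors)
                             if distractors else float("nan")),
        "earnings_cents": earnings,
    }


def max_bonus_cents(n_high: int, n_low: int) -> int:
    """Largest possible bonus if every target is a hit and there are no false alarms."""
    return n_high * HIGH_REWARD_CENTS + n_low * LOW_REWARD_CENTS


if __name__ == "__main__":
    example = [
        Trial(True, HIGH_REWARD_CENTS, 6),   # hit on a high-reward item
        Trial(True, HIGH_REWARD_CENTS, 3),   # miss ("maybe new")
        Trial(True, LOW_REWARD_CENTS, 4),    # hit on a low-reward item
        Trial(False, None, 5),               # false alarm
        Trial(False, None, 1),               # correct rejection
    ]
    print(score(example))
    # ASSUMPTION: 60 high-reward and 60 low-reward targets across both tests.
    print(max_bonus_cents(60, 60) / 100)
```

Because distractors carry no cue value, the sketch yields a single false alarm rate per test, which is why hit rates rather than a sensitivity measure are compared across reward levels.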
To examine the question of whether the pattern of results is different depending on the motivation for participation, we present three sets of results. First we present analyses to reveal the pattern of results with all participants, including group (i.e., participants recruited with compensation of monetary payment or participants recruited with compensation of partial course credit) as a between subjects factor. To follow up the results of this analysis, we then examine the results separately for the credit and cash group. Note that hit rates are reported rather than a measure of sensitivity such as d’ because, due to the experimental design, distractor items were not paired with a reward value and thus there were not different false alarm rates associated with high and low reward.
Participant Characteristics. The two participant groups differed by one year on age, t(43) = 2.50, p = .02, ƞ2 = .13, and on years of education, t(43) = 2.58, p = .01, ƞ2 = .13. The two groups also differed on the Shipley vocabulary test, t(43) = 2.56, p < .01, ƞ2 = .13, and on the Drive subscale of the BAS, t(43) = –2.52, p < .02, ƞ2 = .13. The two groups did not differ on anxiety levels (STAI–1, STAI–2 or Trait), Digit Symbol, the BIS, or the fun-seeking or reward-responsivity subscales of the BAS, t(43) ≤ 1.50, p ≥ .14, ƞ2 ≤ .05. See Table 1 for means of the participant characteristics.
Earnings. A 2 × 2 repeated-measures ANOVA with delay as the within-subjects factor and payment group (credit, cash) as the between-subjects factor revealed only a main effect of delay such that participants earned significantly more on day 1 (M = $3.37, SD = $1.53) than day 2 (M = $2.13, SD = $1.47), F(1, 43) = 42.70, p < .001, ηp2 = .50. There was no main effect of group nor a Delay × Group interaction, F(1, 43) ≤ 2.53, p ≥ .12.
Memory performance. A 2 × 2 × 2 repeated-measures ANOVA on hit rates with reward (high, low) and study-test delay (day 1, day 2) as within-subjects factors and payment group (credit, cash) as a between-subjects factor revealed a main effect of reward F(1,43) = 10.15, p = .003, ηp2 = .19, and a main effect of delay, F(1,43) = 67.20, p < .001, ηp2 = .61. The Reward × Delay interaction was not significant F(1, 43) = 2.59, p = .115, ηp2 = .06, nor the main effect of Group, Delay × Group, or Reward × Delay × Group interaction F(1, 43) ≤ .12, p ≥ .73, but critically, there was a significant Reward × Group interaction, F(1, 43) = 7.02, p = .01, ηp2 = .14, see Figure 2. Participants in the cash group had significantly higher hit rates for high reward items (M = .61, SE = .03) compared to low reward items (M = .52, SE = .03), t(22) = 4.23, p < .001, ƞ2 = .45 while this comparison between high reward (M = .56, SE = .03) and low reward items (M = .55, SE = .03) was not significantly different for the credit group, t(21) = .37, p = .72. Probing this interaction in a different way, an independent samples t-test revealed that the two groups were not statistically different in high reward hit rate, t(43) = 1.23, p = .23, ƞ2 = .03 nor low reward hit rate t(43) = .69, p = .50, ƞ2 = .01, but the patterns diverged in opposite ways in the two conditions, leading to the significant interaction depicted in Figure 2. Finally, examining the pattern in individual subjects, there were 14/23 participants (60%) in the cash group and only 8/22 (36%) in the credit group who showed more than a 5% memory benefit of high > low reward items. The raw data for each individual participant are available on Figshare (https://figshare.com/s/f35abd07dec68fa1a2b3).
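The effect sizes reported alongside the test statistics can be checked with the standard conversions from t and F. The text does not state how the effect sizes were computed, so the formulas below are an assumption rather than the authors' procedure; they are the conventional identities η² = t² / (t² + df) and ηp² = (F × df_effect) / (F × df_effect + df_error), and the function names are illustrative.

```python
def eta_squared_from_t(t: float, df: int) -> float:
    """Eta squared for a t-test: t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)


def partial_eta_squared_from_f(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared for an ANOVA effect: F*df1 / (F*df1 + df2)."""
    return (f * df_effect) / (f * df_effect + df_error)


if __name__ == "__main__":
    # Values taken from the memory-performance results above.
    print(round(eta_squared_from_t(4.23, 22), 2))              # cash group, high vs. low reward
    print(round(partial_eta_squared_from_f(10.15, 1, 43), 2))  # main effect of reward
    print(round(partial_eta_squared_from_f(67.20, 1, 43), 2))  # main effect of delay
    print(round(partial_eta_squared_from_f(7.02, 1, 43), 2))   # Reward x Group interaction
```

Under these conversions the four calls round to .45, .19, .61 and .14, matching the values reported for the corresponding tests.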
Because the cash and credit groups differed on some participant characteristics, analyses were re-run with age, education, Shipley score and the drive subscale of the BAS as covariates. Even with these covariates entered into the between-subjects ANCOVA, the Reward × Group interaction remained significant, F(1, 39) = 6.92, p = .01, ηp2 = .15, and no other significant effects emerged, F(1, 39) ≤ 2.55, p ≥ .12, ηp2 ≤ .06. For false alarms, an ANOVA revealed no main effect of delay, no main effect of group, nor a significant Delay × Group interaction, F(1, 43) ≤ 1.78, p ≥ .19, ηp2 = .04. This pattern was the same when covariates of age, education, and test scores were entered.
Figure 3 illustrates what proportion of high reward, low reward and new items were given each of the six responses at each retrieval delay separately for the cash (top graph) and credit group (bottom graph). The proportion of each response type for target (i.e., “old”) items was entered into separate 2 × 2 × 2 repeated-measures ANOVAs with reward (high, low) and study-test delay (day 1, day 2) as within-subjects factors and payment group (credit, cash) as a between-subjects factor. “Sure New” responses given to target items revealed a significant main effect of delay, F(1, 43) = 4.75, p = .04, ηp2 = .10, such that participants used this response more on day 2 (M = .12, SE = .02) compared to day 1 (M = .09, SE = .01). For “Probably New” responses, there was a significant main effect of delay, F(1, 43) = 6.93, p = .01, ηp2 = .14, such that participants used this response more on day 2 (M = .17, SE = .02) than day 1 (M = .13, SE = .02), and a significant main effect of reward, F(1, 43) = 7.32, p = .01, ηp2 = .15, qualified by a marginally significant Reward × Group interaction, F(1, 43) = 3.83, p = .06, ηp2 = .08. Participants in the cash group used the “Probably New” response more often for low reward items (M = .18, SE = .01) compared to high reward items (M = .13, SE = .02), t(22) = 2.97, p = .01, ƞ2 = .29, but the credit group did not modulate their use of this response based on reward value, t(21) = .62, p = .54, ƞ2 = .02 (high: M = .14, SE = .02; low: M = .15, SE = .03). For “Maybe New” and “Maybe Old” there were significant effects of delay, F(1, 43) ≥ 19.10, p ≤ .001, ηp2 ≥ .31, such that participants used these responses more on day 2 (Maybe New: M = .23, SE = .03; Maybe Old: M = .18, SE = .02) than day 1 (Maybe New: M = .15, SE = .02; Maybe Old: M = .12, SE = .01).
Further, there was a Reward × Group interaction for “Maybe Old” responses, F(1, 43) = 4.31, p = .04, ηp2 = .09, such that the cash group used this response slightly more, t(22) = 1.94, p = .07, ƞ2 = .15, for high reward items (M = .17, SE = .02) compared to low reward items (M = .15, SE = .02), but again the credit group did not modulate the use of this response based on reward, t(21) = 1.08, p = .29, ƞ2 = .13 (high: M = .14, SE = .02; low: M = .15, SE = .02). No significant effects emerged on the proportion of responses for “Probably Old”, F(1, 43) ≤ 3.16, p ≥ .08, ηp2 ≤ .02. The “Sure Old” response type revealed a significant main effect of delay, F(1, 43) = 3.42, p = .07, ηp2 = .07, and a main effect of reward, F(1, 43) = 12.45, p = .001, ηp2 = .23, qualified by a marginally significant Reward × Delay interaction, F(1, 43) = 3.42, p = .07, ηp2 = .07, such that participants used this response more for high reward items (M = .43, SE = .03) compared to low reward items (M = .37, SE = .03), t(44) = 3.46, p = .01, ƞ2 = .21, on day 1, but did not modulate their use of this response based on high (M = .22, SE = .03) or low (M = .20, SE = .03) reward on day 2, t(44) = 1.41, p = .17, ƞ2 = .04. There were no significant main effects of the between-subjects factor group on any of the response types, F(1, 43) ≤ 3.23, p ≥ .08, ηp2 ≤ .07.
We chose to follow up the significant findings from the between-subjects ANOVA and particularly the Group × Reward interaction with separate ANOVAs for both the credit and the cash group.
Earnings. Participants in the credit group earned significantly more on the task on day 1 (M = $3.10, SD = $1.62) than on day 2 (M = $1.77, SD = $1.62), t(21) = 4.31, p < .001, ƞ2 = .47.
Memory Performance. The overall hit rate across conditions was .55 (SD = .13). A 2 × 2 repeated-measures analysis of variance (ANOVA) with reward (high, low) and study-test delay (day 1, day 2) as within-subjects factors revealed only a main effect of delay, F(1, 21) = 25.49, p < .001, ηp2 = .55, where hit rates were higher on day 1 (M = .62, SE = .03) than on day 2 (M = .48, SE = .03). Neither the main effect of reward nor the Reward × Delay interaction was significant, F(1, 21) ≤ .68, p ≥ .42, ηp2 ≤ .03.
The overall false alarm rate was .24 (SD = .11). There was no effect of delay on false alarms, t(21) = 1.21, p = .24, ƞ2 = .07.
Earnings. Participants in the cash group earned significantly more on day 1 (M = $3.65, SD = $1.42) than on day 2 (M = $2.49, SD = $1.25), t(22) = 5.10, p < .001, ƞ2 = .54.
Memory Performance. The overall hit rate across conditions was .56 (SD = .13). A 2 × 2 repeated-measures ANOVA with reward (high, low) and study-test delay (day 1, day 2) as within-subjects factors revealed a main effect of reward F(1, 22) = 17.89, p < .001, ηp2 = .45, such that hit rates were higher for high reward items (M =.61, SE =.03) compared to low reward items (M = .52, SE = .03). There was also a main effect of delay, F(1, 22) = 45.24, p < .001, ηp2 = .67, with higher hit rates on day 1 (M = .64, SE =.03) than on day 2 (M = .49, SE = .03). The Reward × Delay interaction was not significant, F(1, 22) = 2.70, p = .12, ηp2 = .11.
The overall false alarm rate was .21 (SD = .09). There was no main effect of delay on false alarm rate, t(22) = .73, p = .47, ƞ2 = .02.
A robust finding in the motivation literature is that participants remember high reward items better than low reward (e.g., Adcock et al., 2006). The goal of Experiment 1 was to determine if the type of compensation chosen by participants relates to the results on a rewarded memory task. In line with previous work and our hypothesis, we show that those who are given cash for participation perform differently than those who are given credit (Brase, 2009; Brase et al., 2006; Zedelius et al., 2013). Participants who were recruited and compensated with cash were sensitive to the reward manipulations on the task and seemingly modulated their behavior in order to maximize memory for high reward compared to low reward items. Those who were recruited and compensated for their time with credit (but still given cash for performance) were not sensitive to these manipulations, as their memory performance did not differ for high and low reward items. At the individual subject level, 60% of participants in the cash group showed a memory benefit of at least 5% for high reward compared to low reward items, but only 36% of credit participants exhibited this benefit, although base rates are low so these percentages should be critically considered. Although the two groups’ memory was not different for the two reward conditions, they are pulling apart from one another in different directions, which led to the significant Reward × Group interaction. In other words, the cash group’s memory was numerically higher than credit group’s for high reward items, but numerically poorer than the credit group for low reward items. These particular findings suggest that the credit group was not simply doing the task poorly overall. Instead, these results indicate that the cash group is doing something different to preferentially remember the high compared to low reward items, but the credit group is not doing this.
Examination of memory confidence indicated that overall, both groups became less confident in their memory after a delay, but also in line with memory results, at some levels of confidence—“Probably New” and “Maybe Old”—participants in the cash group were more confident in their memory for high reward compared to low reward items, but the credit group did not modulate their use of the confidence responses based on reward value. Both groups used the “Sure Old” response more for high reward compared to low reward items on day 1, but not on day 2.
It is likely that only those in the cash group were motivated by the performance-based rewards, while those participating for credit may not have been motivated to modulate their effort for high compared to low reward items. These results are not exactly in line with predictions of the overjustification effect or findings from Kvaløy and colleagues (2015), as performance in the credit group was not impaired; instead, this group of participants was perhaps more motivated to perform well overall, given that this was a class requirement, rather than to modulate their behavior for these extra performance-based monetary rewards.
Because this is a between-subject design, there is the possibility that it was group differences in those who chose to be compensated by cash vs. credit, rather than any direct effect of compensation method, that yielded these results. However, we did our best to minimize such group differences by recruiting from a homogeneous population (all students at Boston College with a similar health history) and by measuring personality and cognitive factors that we thought could influence performance. The cash group did score significantly higher on a measure of vocabulary, but controlling for these group differences did not affect the pattern of results. The credit group scored higher on the drive subscale of the BIS/BAS; this drive scale is made of items pertaining to the persistent pursuit of desired goals. If the credit group was interested in earning additional monetary rewards, this questionnaire suggests they would go after this desired goal more than the cash group. Given that they did not modulate their performance based on reward value, but performed well overall, it seems this is more evidence that the credit group was not motivated to the same extent as the cash group by the monetary rewards but was motivated to perform the task to the best of their ability regardless of monetary outcome.
It is possible that those who participated for credit deemed the amount they could possibly earn on the task as minimal and not worth the effort, although this seems unlikely for two reasons. First, participants in the credit group did not underperform compared to the cash group: overall memory performance was not significantly different between the two groups; rather, sensitivity to the high and low reward manipulations is what differed between the two groups. Second, it is unclear why the credit group would not be interested in earning this potential $15.60 bonus, since they were there doing the task anyway; nonetheless, their results suggest they were not motivated by these reward manipulations. It is also possible that those who participated for cash were doing so because they “needed” the money (Zedelius et al., 2013) such that their motivation to earn more money was high; those who participated for credit came in for a different reason (i.e., course grade) and so they may have had little motivation to earn extra cash rewards, but were perhaps motivated to simply do the task well. As our goal was to induce motivation with high and low rewards, some of the participants were asked whether they were in fact motivated by the rewards in the study. Anecdotally, of the 16 asked in the cash group, 8 (50%) said yes, whereas of the 18 asked in the credit group, only 6 (33%) said yes, adding to our conclusion that the motivation to participate needs to be considered when recruiting participants and deciding on in-task motivation manipulations.
Related to this previous point is the idea of congruency. Indeed, in the memory literature, there is plenty of evidence of “mood congruency” effects, such that memory is better if there is a match between the affective state of the individual and affective valence of the stimuli to be encoded or memory to be retrieved (e.g., Mayer, McCormick, & Strong, 1995). Perhaps, when one is motivated to participate in order to obtain monetary payment one is more sensitive to the monetary manipulations of the task. However, when one is motivated by partial course credit and views the experiment as an educational experience (perhaps intrinsically motivated) this may be at odds with the reward manipulations of the task (Kvaløy et al., 2015). If participants who came in for credit had the option to earn additional credit for performance, this might create a congruent motivational mindset that leads to differential effects on task performance. This hypothesis is tested in Experiment 2.
In the previous experiment, the compensation for participation and the incentives for good performance on the task were congruent for the cash group, but incongruent for the credit group. We wanted to establish whether participants who participated for credit would modulate their performance if their incentive for good task performance was congruent with their motivation for participating. In this study, participants were recruited and compensated for their time with partial course credit and were given the opportunity to earn bonus credit based on their performance on the task. We kept as many aspects of the task as possible the same as in Experiment 1, with a few exceptions due to departmental rules regarding research participation credit. In line with work showing mood congruency effects in the emotional memory literature, our hypothesis was that introducing partial course credit as an incentive for task performance would lead participants to modulate their performance based on the value of the stimuli, because there would be congruency between the motivation to participate and the motivation to perform well on the task.
The study was approved by the Institutional Review Board at Boston College and all participants gave informed consent. Participants were compensated for their time with partial course credit and were recruited through the Boston College online undergraduate participant pool over the Fall 2016 semester. No one was permitted to participate if they had completed Experiment 1. All interested participants completed the same medical screening and met the same inclusion criteria as described in the methods section of Experiment 1. In addition to receiving credit for their time (1.5 credits), all participants had the opportunity to earn a bonus 0.5 credit based on their performance on the memory task. Participants were not aware of this bonus credit until they arrived for the study, to ensure that the performance-based reward was not an incentive for participation. Thirty-six participants were tested; two were excluded for failing to complete part 2 of the study, one for a high depression score at the time of testing (BDI = 19), and one for disclosing an exclusionary diagnosis after enrolling in the study. Of the remaining 32 participants, 9 were excluded because they did not need the extra course credit offered as a bonus for task performance (see Procedure).
The study design included two within-subjects factors of points (high versus low) and delay (short [approx. 10 minutes] versus 24-hour).
The same series of questionnaires completed in Experiment 1 (see methods section above) was administered.
Stimuli & Apparatus
Same as reported in Experiment 1 above.
Many aspects of the research procedure were identical to Experiment 1, with the major exception of task performance motivation. As an additional change to the procedure, participants were told that they could earn points for correctly recognizing images on the memory test, and that if their points balance was higher than average (meaning higher than the average points balance of participants who had completed the experiment so far), they would be able to complete another short task to earn a bonus 0.5 credit. They were told they would be informed of whether they qualified for this bonus at the end of the experiment on day 2; participants were unaware that, in keeping with departmental regulations about course credit assignment, all participants in fact “qualified” for the opportunity to complete the additional task for bonus credit. During the encoding phase, each trial began with a cue indicating how many points the upcoming stimulus was worth if the participant successfully remembered it on the subsequent memory test. Each stimulus was worth either 5 points (high points) or 1 point (low points). After the cue, participants were shown the stimulus and asked to make a judgment about whether it depicted an indoor or outdoor scene.
After encoding, participants were asked to complete the questionnaires and then practice the retrieval task. Responses on the retrieval task were the same as in Experiment 1 (i.e., sure-new to sure-old judgments). Both target “old” stimuli and distractor “new” stimuli were presented in random order during the retrieval phase. For each correct “old” response (regardless of confidence), participants earned the 1 or 5 points with which the stimulus had been cued during encoding. To deter participants from simply pressing “old” to every stimulus, they were penalized 2.75 points for each false alarm. At the end of the experimental version of the retrieval task, participants were told their points balance. At the end of the memory test on day 2, a screen appeared which indicated their points balance and that they had qualified for the bonus credit. Participants were asked to verbally confirm whether or not they needed the bonus credit and, if they did, participants then completed a simple online reaction time task (~3 minutes). Only participants who indicated that they needed the bonus credit were included in the analyses (N = 9 were excluded).
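To make the scoring rule concrete, the following is a minimal illustrative sketch, not the software used to run the experiment. It assumes that responses are coded 1 to 6 with 4 to 6 counting as “old”, that misses and correct rejections neither earn nor cost points, and that new items are flagged with `None`; the names `Trial` and `points_balance` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Penalty for calling a new (distractor) item "old", as described in the procedure.
FALSE_ALARM_PENALTY = 2.75


@dataclass(frozen=True)
class Trial:
    # Points cued at encoding: 5 (high) or 1 (low); None marks a new item (assumed coding).
    cued_points: Optional[int]
    # Confidence response, assumed coded 1 ("Sure New") through 6 ("Sure Old").
    response: int


def said_old(response: int) -> bool:
    """Responses 4, 5 and 6 count as "old", regardless of confidence."""
    return response >= 4


def points_balance(trials: Iterable[Trial]) -> float:
    """Sum points for hits and subtract the penalty for false alarms.

    Assumption: misses and correct rejections leave the balance unchanged.
    """
    balance = 0.0
    for trial in trials:
        if not said_old(trial.response):
            continue  # "new" responses earn and cost nothing in this sketch
        if trial.cued_points is None:
            balance -= FALSE_ALARM_PENALTY  # false alarm on a distractor
        else:
            balance += trial.cued_points  # hit: earn the cued value
    return balance


example = [
    Trial(cued_points=5, response=6),     # hit on a high-point item
    Trial(cued_points=1, response=4),     # hit on a low-point item
    Trial(cued_points=None, response=5),  # false alarm
    Trial(cued_points=5, response=2),     # miss
    Trial(cued_points=None, response=1),  # correct rejection
]
print(points_balance(example))
```

In this toy example the two hits earn 6 points and the single false alarm costs 2.75, so responding “old” indiscriminately is discouraged while confidence level plays no role in the payoff.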
For the same reasons as Experiment 1, only hit rates rather than measures of sensitivity are reported. To streamline the results, analyses were collapsed across confidence (i.e., responses 1, 2 and 3 = new and responses 4, 5 and 6 = old); the same patterns of results held even when only high confidence responses were analyzed and trials with low confidence (i.e., responses 3 and 4) were removed.
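As an illustration of the collapsing described above, the sketch below computes the proportion of “old” responses for a set of items, once across all confidence levels and once with the low-confidence responses (3 and 4) removed. It is a hypothetical reconstruction under the stated response coding (1 to 3 = new, 4 to 6 = old), not the analysis script used in the study; the function name `proportion_old` is an assumption. Applied to studied items it yields a hit rate, and applied to new items it yields a false alarm rate.

```python
from typing import Sequence

LOW_CONFIDENCE = (3, 4)  # "Maybe New" and "Maybe Old" under the assumed 1-6 coding


def proportion_old(responses: Sequence[int], drop_low_confidence: bool = False) -> float:
    """Proportion of responses classified as "old" (4, 5 or 6).

    If drop_low_confidence is True, trials answered with 3 or 4 are removed
    before the proportion is computed, leaving only higher-confidence trials.
    """
    kept = [
        r for r in responses
        if not (drop_low_confidence and r in LOW_CONFIDENCE)
    ]
    if not kept:
        raise ValueError("no trials left after filtering")
    return sum(1 for r in kept if r >= 4) / len(kept)


# Hypothetical responses to eight studied items from one participant.
studied = [6, 5, 4, 3, 2, 1, 6, 4]
print(proportion_old(studied))                            # collapsed across confidence
print(proportion_old(studied, drop_low_confidence=True))  # low-confidence trials removed
```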
Total Points Earned
Participants earned significantly more points on day 1 (M = 80.40, SD = 24.96) than on day 2 (M = 30.52, SD = 42.37), t(22) = 6.03, p < .001, η2 = .63.
Overall hit rate was .54 (SE = .03). Hit rates were submitted to a 2 × 2 repeated-measures ANOVA with reward points (high, low) and study-test delay (day 1, day 2) as within-subjects factors. This ANOVA revealed only a main effect of delay, F(1, 22) = 66.15, p < .001, ηp2 = .75, where hit rates were higher on day 1 (M = .62, SE = .03) than on day 2 (M = .47, SE = .03). Neither the main effect of reward nor the Reward × Delay interaction was significant, F(1, 22) ≤ 1.04, p ≥ .32, ηp2 ≤ .05.
Overall false alarm rate was .21 (SE = .02) and did not differ across the two delays, t(22) = 1.31, p = .20, η2 = .08.
Figure 4 illustrates what proportion of high-point, low-point and new items were given each of the six responses at each retrieval delay. The proportion of each response for target (i.e., “old”) items was entered into separate 2 × 2 repeated-measures ANOVAs with points (high, low) and study-test delay (day 1, day 2) as within-subjects factors. There were significant main effects of delay, such that participants were more likely to use “Sure New”, F(1, 22) = 21.95, p < .001, ηp2 = .50, on day 2 (M = .24, SE = .03) than day 1 (M = .15, SE = .03), “Maybe New”, F(1, 22) = 8.5, p < .01, ηp2 = .28, on day 2 (M = .18, SE = .03) than day 1 (M = .13, SE = .02), and “Maybe Old”, F(1, 22) = 8.36, p = .01, ηp2 = .28, on day 2 (M = .15, SE = .03) than day 1 (M = .10, SE = .02), but more likely to use “Sure Old” on day 1 (M = .45, SE = .03) compared to day 2 (M = .25, SE = .03), F(1, 22) = 80.95, p < .001, ηp2 = .79. There was one main effect of reward, F(1, 22) = 6.62, p < .02, ηp2 = .23, such that participants were more likely to use “Maybe Old” for low-point (M = .17, SE = .03) compared to high-point (M = .14, SE = .03) items. There were no significant Reward × Delay interactions, F(1, 22) ≤ 2.28, p ≥ .15, ηp2 ≤ .09.
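For readers who wish to check the effect sizes reported with these F tests, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below is an illustrative helper written for this purpose (the function name is hypothetical), not part of the original analysis pipeline. As an example, the main effect of delay on hit rate, F(1, 22) = 66.15, corresponds to ηp2 of approximately .75.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    effect_term = f_value * df_effect
    return effect_term / (effect_term + df_error)


# Main effect of delay on hit rate reported above: F(1, 22) = 66.15.
print(round(partial_eta_squared(66.15, 1, 22), 2))
```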
Comparison of three groups
Participant Characteristics. To examine whether the group participating for extra credit differed from the participants in Experiment 1, characteristics from all three groups were entered into a one-way ANOVA. In addition to the differences described in Experiment 1, this analysis revealed that the cash group was on average 1 year older, t(44) = 2.30, p = .03, η2 = .11, and had 1 more year of education, t(44) = 2.61, p = .01, η2 = .13, than the extra credit group. The extra credit group scored slightly lower than the credit group from Experiment 1 on the BAS-drive subscale, t(43) = 2.69, p = .01, η2 = .24, and on the BAS-fun seeking subscale, t(43) = 3.12, p = .003, η2 = .30. See Table 1 for means of the participant characteristics.
Memory Performance. To examine whether the group participating for extra credit differed from the participants in Experiment 1 in overall memory performance, hit rates and false alarm rates from all three groups were entered into repeated-measures ANOVAs. The main effect of group was not significant in the ANOVA on either hit rates or false alarm rates, F(1, 65) ≤ .63, p ≥ .53, ηp2 ≤ .02. No other effects involving group emerged except a significant Reward × Group interaction on hit rate. This was driven by the pattern of results reported for Experiment 1, whereby participants in the cash group had a significantly higher hit rate for high reward items compared to low reward items, but this pattern was not true of either credit group.
Memory performance collapsed across credit groups
As an exploratory analysis, we combined the credit group from Experiment 1 and the extra credit group from Experiment 2 to examine whether the expected effect of higher hit rates for high versus low reward items would emerge. Even with the greater power afforded by this larger sample size (N = 45), there was no effect of reward on memory, F(1, 44) = 0.98, p = .33, ηp2 = .02, with the hit rate for high reward items (M = .55, SE = .02) not statistically different from that for low reward items (M = .54, SE = .02). There was an expected main effect of delay, F(1, 44) = 80.69, p < .001, ηp2 = .65, such that hit rates were higher on day 1 (M = .62, SE = .02) than day 2 (M = .48, SE = .02), but no Reward × Delay interaction, F(1, 44) = 0.81, p = .37, ηp2 = .02.
The goal of Experiment 2 was to determine if the congruency between motivation to participate and task performance motivation led to a modulation in memory performance that could account for the lack of a reward effect in the credit group of Experiment 1. The results indicated that again in Experiment 2, credit-compensated participants did not modulate their performance in the expected way, and that memory for high-value items was not significantly different from memory for low-value items. The findings suggest that the congruency of the motivation may not be the mechanism driving differences in memory performance between the two groups in Experiment 1. The extra credit group tested here performed in the same way as the credit group from Experiment 1, and not like the cash group. The extra credit group analyses were restricted to those who needed the bonus credit, and in a number of cases, it was the only remaining credit that students needed to fulfill their quota to receive full research participation credits; thus, their motivation to perform well on the task and to earn that bonus credit should have been high. Like the credit group in Experiment 1, the participants in Experiment 2 were not simply underperforming, as overall hit and false alarm rates are similar across the three groups. Instead, the results suggest that the motivation of the participants to participate is simply leading to differences in how participants perform this cognitive task and congruency in motivation for participation and task-based rewards cannot explain these differences. Further, these findings highlight our overall point that in studies of reward motivation, the motivation of the participants in addition to the type of in-task incentive needs to be considered.
A notable difference between the paradigms in Experiments 1 and 2 was that Experiment 1 participants earned monetary rewards, which have no satiation limit, whereas in Experiment 2, participants earned points toward a “potential” and finite amount of bonus credit. This difference was due to departmental restrictions on offering course credit bonuses. Despite this and other paradigm changes, the pattern of results from the extra credit group looks very similar to the pattern of results from the credit group in Experiment 1. This further strengthens the idea that the motivation to participate can have a strong and robust effect on results, perhaps above and beyond task manipulations.
Finally, it may not be appropriate to compare the effects of cash rewards offered in Experiment 1 to the points offered in Experiment 2. It is possible that cash and points do not hold the same motivational weight and therefore have different effects on cognitive performance. Given that the results from our credit group in Experiment 1 mirrored the results from the extra credit group in Experiment 2, this seems unlikely. Additionally, there is substantial evidence that earning points, even points that do not lead to any tangible reward, can influence memory encoding strategies to maximize point accumulation. For example, participants modulate their encoding behavior and remember items associated with high point values better than items with low point values (e.g., Castel et al., 2011; Cohen, Rissman, Suthana, Castel, & Knowlton, 2014). So although they may not be equivalent, evidence suggests rewards (regardless of currency) have similar effects on memory.
Understanding how compensation relates to performance on laboratory tasks is important for behavioral science research, and our work adds to this small psychology literature. The results suggest that researchers examining the effects of motivation on performance (e.g., high versus low reward effects on memory) should take into consideration the motivation of the participants to participate. We extend previous findings by showing that even when both groups are given the opportunity to earn additional performance-based monetary rewards, differences in motivation to participate in research (cash or credit compensation) lead to different patterns of results. The findings were not in line with the “overjustification effect”, the idea that extrinsic rewards can undermine any intrinsic motivation to perform the task well and affect task performance in a negative way. The participants in the two credit groups did not appear to be unmotivated to perform the task; overall memory performance was similar across the three groups. Rather, the participants who were motivated by cash performed the rewarded memory task differently than the other two groups, remembering the high-reward images more selectively. The findings from Experiment 2 suggest that even when there is congruency between motivation to participate and performance rewards, there still is no modulation of memory by reward in the credit-compensation group. It is possible that credit-compensated participants are motivated to simply perform the task to the best of their ability, regardless of in-task motivation manipulations, as this is considered to be part of their educational experience.
Although future work will be needed to clarify the motivational mechanisms giving rise to these group differences, at a practical level, the findings underscore the methodological importance for psychology researchers to report how participants were recruited and compensated for their time and, at least in tasks of motivated memory, to consider this factor in analyses.
The memory task used in this study is frequently utilized in studies examining effects of motivation on memory, but in many psychology experiments, participants receive money or course credit for participation, not based on their performance on the task. Typically, participants receive money or course credit simply for showing up for the experiment, regardless of whether they do the task properly or do the task well. It is entirely possible that performance would not differ between cash and credit compensation groups when performance-based rewards are not part of the task.
There are a few limitations of the current study that provide a starting point for future work on the topic of participant motivation. First, participants were not randomly assigned to the cash or credit groups; thus, we cannot make causal inferences about why group differences exist. Our design does not rule out that there are group differences in who chooses these compensation types (although we tried to measure and control for differences that occurred to us as potential sources of group differences and included these as covariates). It is quite plausible that individuals who chose credit compensation represented a subset of students who, due to factors we did not measure, were less likely to show effects of reward on memory. In line with our goal regarding methodological concerns, our design does have high external validity. The way that we recruited and had participants sign up for studies is typical in psychology research, where an ad is posted which indicates participants can complete the study for either cash or credit (participant’s choice). It is not possible to randomly assign participants to participate for course credit if they are not enrolled in a course that includes this requirement. Additionally, a factor that was not controlled was the time during the semester that participants in the credit group came to participate. Participants were recruited during the Fall 2015 and Spring 2016 semesters for Experiment 1 and Fall 2016 for Experiment 2. Within the credit groups, motivations to participate and perform the task well might vary over the course of the semester, even on a weekly basis (e.g., motivation to earn the credits may be higher after receiving a poor grade on a midterm examination); additionally, participants may be more likely to need the extra credit at the beginning and end of the semester (Bender, 2007). This is not a problem specific to credit motivation, however; it is also possible that motivations of the cash group could change over the course of the semester, for example, when funds from summer earnings or student loans begin to dwindle. Future research should examine the influence and possible interactive effects of this variable and the “need” for money more explicitly. Second, higher hit rates for high versus low reward items are a robust finding in the literature, and it was unexpected that the type of compensation participants chose to receive would interact with this pattern, specifically that those receiving partial course credit did not show this pattern. While we were able to replicate the pattern of results from the credit group in Experiment 1 with the extra credit group in Experiment 2, and an analysis collapsed across these two credit groups provided further support for a null effect, we acknowledge that power might be limited with these sample sizes and future work with larger samples would be beneficial for replication. Finally, although we found a null effect of reward-modulated memory in participants who enrolled in the study for course credit across two experiments, there are many factors that may influence the effectiveness of reward incentives (e.g., magnitude of difference between high and low value, amount of the penalty for false alarms), and we could not test all of the possible variants. Our goal is not to claim that there is no effect of reward in a credit-compensated group. Rather, to enable better replication and extension of studies of rewarded memory, the field would do well to elucidate whether there are conditions under which participants who enroll in a study for course credit show robust effects of reward incentives on memory, and researchers should be encouraged to report, and consider in analyses, the compensation means used to recruit their participants. Doing so will help to inform theories about the factors that may influence such reward-modulated memory.
Only those who were recruited and compensated for their time with cash appeared to be motivated by the performance-based rewards; only this cash-compensated group modulated their memory performance based on the monetary value of the stimulus. This pattern of results was not true for participants who were recruited and compensated with partial course credit, despite being offered the same performance-based rewards as the cash-compensated group (cash reward: Experiment 1) or the same type of reward for which they enrolled in the study (course credit reward: Experiment 2). The results from the current study suggest that recruiting and compensating participants with cash versus partial course credit may influence the results on a rewarded memory task. At least in some of these paradigms, participants recruited for course credit compensation may be less likely to show reward-modulated memory than participants recruited for cash compensation. These results reveal a heretofore undiscussed source of variability in studies of reward-modulated memory and underscore the importance of reporting compensation method in future studies.
Contributed to conception and design: HJB, EAK
Contributed to acquisition of data: HJB
Contributed to analysis and interpretation of data: HJB, EAK
Drafted and/or revised the article: HJB, EAK
Approved the submitted version for publication: HJB, EAK