Risk attitude elicitation using a multi-lottery choice task: Real vs. hypothetical incentives

ABSTRACT We present a bi-dimensional multi-lottery choice task that can be used to elicit agents' risk attitudes in financial environments. The task is implemented with both hypothetical and real monetary incentives in a between-subjects and a within-subjects experiment. On aggregate, we observe choices implying significantly lower risk aversion when incentives are real, and the differences grow with the stakes at play. We also obtain significant differences between hypothetical and real rewards in the estimated utility weighting and probability weighting parameters. We find that the use of hypothetical incentives in multi-lottery choice tasks for evaluating individual risk aversion can be misleading.


INTRODUCTION
In a recent survey, Harrison and Rutström (2008) affirm that reliable laboratory methods exist to determine the individual risk aversion of a subject, and that these methods could be systematically employed to ensure greater control over tests and applications of theory that depend on risk attitudes. They clearly advocate saliently motivating subjects' responses. We investigate, at the individual level, the consequences of not doing so. A test widely used among psychologists is Zuckerman's (1978) Sensation Seeking Scale (SSS), while economists mainly use binary lotteries of the Holt and Laury (2002) type (HL). The SSS asks about different types of risks, including financial risks, while HL is framed exclusively in the monetary domain. Both tests share the problem of characterizing an individual's risk aversion along a single dimension. The Sabater-Grande and Georgantzís (2002) test (SGG) that we use allows us to obtain two parameters of the utility function efficiently.
The role of incentives in individual decision making under risk and uncertainty has been explored repeatedly in the literature. Since Edwards (1953) found, as we do, an increase in the willingness to take risks when participants play for real money, there have probably been more studies comparing "hypothetical" with "real" decisions in this context than in any other area of experimental economics. However, the issue is still far from settled, and many articles using either method are still published today. Our aim in this study is to analyse the existence, direction and practical relevance of the difference between risk aversion levels inferred under hypothetical and real incentives.
The general consensus among psychologists seems to be that hypothetical risky choices give a reasonable, qualitatively correct picture of real choices. Wärneryd (1996) supports their use in survey contexts. Wiseman and Levin (1996) carry out three experiments in which subjects make risky decisions with hypothetical or real consequences, finding no significant differences in any of them. Beattie and Loomes (1997) suggest that in simple pairwise choices incentives appear to make very little difference to performance. Many economists, too, perhaps influenced by the psychologists' experimental tradition, as suggested by Harrison and Rutström (2008), do not always motivate subjects monetarily when asking about their risk preferences. For instance, Kuhberger et al. (2002) find that moving from small incentives (hypothetical payoffs, real low payoffs) to high incentives (real high payoffs) changes choices, but that the same choices are made with real high payoffs as with hypothetical high payoffs. Dohmen et al. (2005) find that answers to a general risk attitude question predict actual behavior in a lottery quite well. Faff et al. (2008) also find no significant differences between hypothetical and real payoffs when comparing financial risk tolerance with risk aversion.
However, standard experimental economics methodology (Smith, 1982) advocates salient economic rewards when designing an experiment, and many studies report different results with hypothetical and real incentives. The usual presumption is that if subjects do not take hypothetical gains seriously, they may be tempted to take more risks (that is, to appear less risk averse) than when they can actually win. Camerer and Hogarth (1999) review 74 studies comparing the behavior of experimental subjects who were not paid, or were paid low or high financial incentives according to their performance. Contrary to our findings, they conclude that when incentives are low, subjects declare that they would be more risk-loving than they actually are when incentives are increased. Etchart-Vincent and l'Haridon (2008) also find that subjects exhibit more risk seeking when choices are hypothetical than when they are real. Holt and Laury (2002) find that increasing the size of real payoffs leads subjects to behave in a more risk-averse manner in both the gain and the loss domain, while with hypothetical payments more than half of the subjects who are risk averse for gains turn out to be risk seeking for losses.
Our results are in line with the studies in the literature which claim a difference between hypothetical and real payments. However, in contrast with most previous studies, we observe choices which are, on average, less risk averse when payments are real.
A within-subjects design is more reliable than a between-subjects design, but it carries the potential bias of a carryover effect across sessions, which is very difficult to control even when order effects are taken into account: once a subject has been incentivized to think seriously about his risk preferences, he will probably remember his decision and try to be consistent with it even if asked again hypothetically. We therefore used both a between-subjects and a within-subjects design, cross-checking the robustness of our results in this way, with the added advantage of having a relatively large number of observations under both conditions. In total, 786 subjects participated in our lotteries, and 402 of them received real rewards for their decisions. No other study comparing hypothetical with real incentives in risk aversion elicitation has a comparable sample size.
Our results clearly favor saliently motivating the answers to the risk-aversion test: the elicited level of risk aversion decreases significantly relative to the no-payment case.
In the next section we explain the experimental design in detail. Then, in Section 3, we present the results. Conclusions and references follow.

EXPERIMENTAL DESIGN AND ECONOMETRIC MODEL
We organized two treatments. In the between-subjects treatment (BST), our subjects were a relatively large sample of 695 volunteers recruited among undergraduate Business Administration students at University Jaume I in Spain. Of these, 384 subjects received no money for the lottery decision task and 311 subjects faced the real monetary consequences of the lottery they had chosen.
ARTÍCULOS DOCTRINALES Risk attitude elicitation using a multi-lottery choice task REVISTA ESPAÑOLA DE FINANCIACIÓN Y CONTABILIDAD. Vol. XL, n.º 152 · octubre-diciembre 2011
In the within-subjects treatment (WST), 91 Business Administration students, also recruited as volunteers from the same university and who had not participated in the previous treatment, were presented with the same lottery decisions as in the BST but had to face both conditions: first taking hypothetical decisions and, about one year later, repeating the test with real payment for the lotteries. The temporal stability of risk aversion estimates has been studied in detail by Harrison et al. (2005) and Andersen et al. (2008). Their results show evidence of stability for appropriately built risk aversion measures over periods of up to one and a half years, provided the subjects' personal socioeconomic conditions do not change substantially. Baucells and Villasís (2010) find some evidence of individual changes over a three-month period, but they did not pay their subjects, which, as we show in the present article, can by itself lead to inconsistent decisions.
Besides allowing a consistency check against the between-subjects design, the within-subjects treatment introduced a long time span between the two sessions in order to minimize possible carryover effects. Average earnings in the case of real payments were 6€, and the lotteries were explained and completed in about 10 minutes.
The experiments in which the lotteries were played involved no show-up fee and no randomized payment. Therefore, our results are not very dependent on the possible sample selection biases pointed out by , particularly in the within-subjects treatment, where the same subjects, and hence exactly the same distribution of risk aversion levels, appear under both conditions.
Rather than the usual tests based on binary choice tasks à la Holt and Laury, subjects were presented with the SGG multi-lottery choice task, which is more appropriate for our purposes because of the variety of results it produces (1). The task is designed to capture efficiently two dimensions of a subject's preferences towards risky choice. i) First, as other lottery tasks do, it distinguishes between risk-neutral or risk-loving subjects and subjects with different degrees of risk aversion. ii) Second, it explores the subjects' reaction to an increase in the magnitude of the risk compensation, that is, an increase in the stakes at play. By asking our subjects to take four decisions, we obtain four points of their utility function, depending on the size of the compensation for risk, whereas the most widely used method obtains one point after asking for multiple, normally ten, choices.
As shown in table 1, the SGG task involves four panels of ten lotteries each. Each lottery j = 1, …, 10 entails a chance $p_j$ of earning $X_j$ € (else nothing). Each participant in our experiment had to choose one of the ten lotteries in each of the four panels, which were presented simultaneously.
After the choices were collected, a four-sided die determined which panel would be paid in the case of real payments. Subjects choosing the certain payoff in the selected panel were paid 1€. Subsequently, a 10-sided die was thrown to determine the "winning-lottery threshold". If the result was 0, no payment was made to those who had chosen a probabilistic payoff; if the result was any other number between 1 and 9, subjects whose chosen loss probability was lower than or equal to that number divided by 10 received the prize corresponding to the probability chosen, and the others received nothing.
Each of the four panels is constructed using a certain payoff, c = 1€, and the expected earnings, $p_j X_j$, exceed c by t times the probability of not winning, $1 - p_j$, as implied by the formula:
$$p_j X_j = c + t(1 - p_j)$$
That is, an increase in the probability of the unfavorable outcome is linearly compensated by an increase in the expected payoff. We use four different risk premium parameters in the four panels, t = 0.1, 1, 5 and 10, implying an increase in the return of risky choices as we move from one panel to the next.
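The panel construction and the die rule can be checked with a few lines of code. This is a hypothetical sketch, not a copy of table 1: it assumes that the ten lotteries in each panel have winning probabilities 1.0, 0.9, ..., 0.1 (consistent with the certain 1€ option and with 0.1 being the riskiest choice), and it uses exact fractions so the checks are not blurred by rounding. The names `prize`, `panel` and `win_probability` are illustrative.

```python
from fractions import Fraction

C = Fraction(1)  # certain payoff c = 1 EUR
# Risk premium t in panels 1-4, as stated in the text.
T_VALUES = [Fraction(1, 10), Fraction(1), Fraction(5), Fraction(10)]
# Assumption: winning probabilities 1.0, 0.9, ..., 0.1 for lotteries j = 1..10.
PROBS = [Fraction(11 - j, 10) for j in range(1, 11)]


def prize(p, t, c=C):
    """Prize X_j that solves p_j * X_j = c + t * (1 - p_j)."""
    return (c + t * (1 - p)) / p


def panel(t):
    """List of (p_j, X_j) pairs for the panel with risk premium t."""
    return [(p, prize(p, t)) for p in PROBS]


def win_probability(p):
    """Probability of being paid under the die rule for a lottery with winning probability p.

    The certain option (p = 1) is always paid. For a risky option, a 10-sided die
    shows d in {0, ..., 9}; it pays iff d >= 1 and the loss probability 1 - p <= d / 10.
    """
    if p == 1:
        return Fraction(1)
    paying_faces = sum(1 for d in range(1, 10) if 1 - p <= Fraction(d, 10))
    return Fraction(paying_faces, 10)


if __name__ == "__main__":
    for t in T_VALUES:
        row = ", ".join(f"{float(p):.1f}:{float(x):.2f}" for p, x in panel(t))
        print(f"t = {float(t)} -> {row}")
```

Under this assumption every risky lottery is paid with exactly its stated probability, and the expected payoff rises linearly in the loss probability at rate t.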
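In order to study the subjects' reaction to an increase in t, that is, in the magnitude of the risk compensation, we define the elasticity of the probability chosen in panel i = 2, 3 and 4 with respect to the increase in the risk premium as:
$$e_i = \frac{(p_i - p_{i-1})/p_{i-1}}{(t_i - t_{i-1})/t_{i-1}} \qquad (1)$$
where $p_i$ and $t_i$ denote the probability chosen and the risk premium in panel i. Additionally, we define
$$e^{\max}_{p,t} = \frac{(p_4 - p_1)/p_1}{(t_4 - t_1)/t_1}$$
as the elasticity of the probability chosen in panel 1 to the maximum increase in risk compensation, occurring in panel 4 (2).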
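As an illustration, the elasticities can be computed per subject from the four chosen probabilities. This sketch rests on an assumption: it uses the simple arc elasticity with the earlier panel as the base, which is one natural reading of the definition above. The function names and the example subject are hypothetical.

```python
T_VALUES = [0.1, 1.0, 5.0, 10.0]  # risk premium t in panels 1-4


def arc_elasticity(p_from, p_to, t_from, t_to):
    """Relative change in the chosen probability divided by the relative change in t.

    Assumption: percentage changes are measured from the earlier panel.
    """
    return ((p_to - p_from) / p_from) / ((t_to - t_from) / t_from)


def subject_elasticities(choices):
    """choices: probabilities chosen in panels 1..4 (hypothetical input).

    Returns ({2: e_2, 3: e_3, 4: e_4}, e_max), where e_max compares panel 1 with panel 4.
    Values are negative when the subject moves to riskier lotteries as t grows.
    """
    e = {
        i: arc_elasticity(choices[i - 2], choices[i - 1], T_VALUES[i - 2], T_VALUES[i - 1])
        for i in (2, 3, 4)
    }
    e_max = arc_elasticity(choices[0], choices[3], T_VALUES[0], T_VALUES[3])
    return e, e_max


if __name__ == "__main__":
    e, e_max = subject_elasticities([0.8, 0.6, 0.5, 0.4])
    print({i: round(v, 4) for i, v in e.items()}, round(e_max, 4))
```

A larger absolute value means a stronger move towards riskier lotteries when the risk compensation increases.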
Assuming, for instance, a constant relative risk aversion (CRRA) utility function $u(X_j) = X_j^{1-\alpha}$, where j = 1, 2, …, 10, it can be checked that a subject maximizing expected utility would choose the lottery j with the probability closest to $p = \alpha(1 + c/t)$.
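The claim that an expected-utility maximizer chooses a probability close to p = α(1 + c/t) follows from a short first-order condition. A sketch, assuming u(X) = X^(1-α) with u(0) = 0 and treating the winning probability p as a continuous choice variable (in the task it is restricted to the ten listed values):

```latex
\begin{aligned}
X(p) &= \frac{c + t(1-p)}{p}, \qquad
EU(p) = p\,u\bigl(X(p)\bigr) = p^{\alpha}\,\bigl(c + t(1-p)\bigr)^{1-\alpha},\\[4pt]
\frac{d \ln EU(p)}{dp} &= \frac{\alpha}{p} - \frac{(1-\alpha)\,t}{c + t(1-p)} = 0
\;\Longrightarrow\; \alpha\,(c + t) = t\,p
\;\Longrightarrow\; p^{*} = \alpha\left(1 + \frac{c}{t}\right).
\end{aligned}
```

Because ln EU(p) is a sum of concave functions of p for 0 ≤ α ≤ 1, p* is the unique maximizer. It falls as t rises, and α = 0 (risk neutrality) gives p* = 0, so the riskiest available lottery is chosen.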
On the one hand, this confirms the intuitively expected outcome that the lower the probability of winning a subject chooses, the less risk averse he is, whereas risk-neutral or risk-loving subjects would choose $p_j = 0.1$ in all panels. On the other hand, it predicts that the subject should choose riskier lotteries as we move from panel 1 to panel 4. Thus, for risk-averse expected utility maximizers, sensitivity to the attraction of a higher risk compensation t can be approximated by the difference in their choices across subsequent panels.
Our multi-lottery approach also allows us to estimate maximum likelihood models of utility functions in a similar way to . However, we have to adapt a structural model of binary choice to more than two categories, given that the SGG test offers ten possible choices.
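The prediction can also be checked numerically on the discrete menu. This sketch assumes winning probabilities 1.0, 0.9, ..., 0.1 and u(X) = X^(1-α); the helper names are hypothetical. It picks the expected-utility-maximizing lottery in each panel and compares it with the continuous optimum α(1 + c/t).

```python
C = 1.0                            # certain payoff c (EUR)
T_VALUES = [0.1, 1.0, 5.0, 10.0]   # risk premium t in panels 1-4
# Assumption: winning probabilities 1.0, 0.9, ..., 0.1 (rounded to avoid float noise).
PROBS = [round(1.0 - 0.1 * k, 1) for k in range(10)]


def expected_utility(p, t, alpha, c=C):
    """EU of winning X = (c + t(1 - p)) / p with probability p; u(X) = X**(1 - alpha), u(0) = 0."""
    x = (c + t * (1.0 - p)) / p
    return p * x ** (1.0 - alpha)


def best_lottery(t, alpha):
    """Winning probability of the EU-maximizing lottery in the panel with premium t."""
    return max(PROBS, key=lambda p: expected_utility(p, t, alpha))


def continuous_optimum(t, alpha, c=C):
    """p* = alpha * (1 + c / t), clipped to the available range [0.1, 1]."""
    return min(1.0, max(0.1, alpha * (1.0 + c / t)))


if __name__ == "__main__":
    for alpha in (0.0, 0.3, 0.6, 0.9):
        picks = [best_lottery(t, alpha) for t in T_VALUES]
        targets = [round(continuous_optimum(t, alpha), 3) for t in T_VALUES]
        print(f"alpha={alpha}: chosen {picks}, p* {targets}")
```

The discrete choice always lies within one step (0.1) of the clipped continuous optimum, and it never becomes safer as t rises, which is the panel-to-panel pattern described above.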
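First, we estimate a CRRA utility function using the SGG lottery data and assuming expected utility theory (EUT). We assume that the utility a subject derives from a prize is defined by:
$$u(X_j) = X_j^{1-\alpha} \qquad (2)$$
where $X_j$ is the prize of lottery j and α is the utility weighting parameter (the coefficient of relative risk aversion, so that a lower α means less risk aversion).
Under EUT, and since the alternative outcome is zero with u(0) = 0, the value associated with lottery j satisfies:
$$V_j = p_j\,u(X_j) + \varepsilon_j = p_j X_j^{1-\alpha} + \varepsilon_j$$
where $\varepsilon_j$ is the stochastic error, with expected value $E(\varepsilon_j) = 0\ \forall j$. The probability of a subject selecting lottery j over all other possible lotteries is:
$$P_j = \Pr\left(V_j > V_k \ \ \forall k \neq j\right)$$
Assuming that the $\varepsilon_j$ are independently and identically distributed (IID) type I extreme value errors, so that differences between errors follow a logistic distribution, this probability takes the multinomial logit form:
$$P_j = \frac{\exp\left(p_j X_j^{1-\alpha}\right)}{\sum_{k=1}^{10} \exp\left(p_k X_k^{1-\alpha}\right)}$$
The log-likelihood of the multinomial logit model is then:
$$\ln L(\alpha) = \sum_{n=1}^{N} \sum_{j=1}^{10} z_{nj} \ln P_j$$
where, $\forall n = 1, \dots, N$, $z_{nj} = 1$ if individual n chooses lottery j and $z_{nj} = 0$ otherwise.
(2) The elasticities obtained from our data are shown in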
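One way to see how the multinomial logit log-likelihood is used is to maximize it over a grid of α values. The sketch below is hypothetical in several respects: it assumes winning probabilities 1.0, 0.9, ..., 0.1, uses no separate noise (scale) parameter, replaces a numerical optimizer with a grid search, and ignores any clustering correction of the standard errors. Choice counts per panel play the role of the sums of the indicators z_nj.

```python
import math

C = 1.0
T_VALUES = [0.1, 1.0, 5.0, 10.0]                      # risk premium t in panels 1-4
PROBS = [round(1.0 - 0.1 * k, 1) for k in range(10)]  # assumed p_j grid


def eut_values(t, alpha):
    """Deterministic part p_j * X_j**(1 - alpha) for the ten lotteries of a panel."""
    return [p * ((C + t * (1.0 - p)) / p) ** (1.0 - alpha) for p in PROBS]


def logit_probs(values):
    """Multinomial logit probabilities exp(V_j) / sum_k exp(V_k), computed stably."""
    top = max(values)
    weights = [math.exp(v - top) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(alpha, counts):
    """Sum over panels and lotteries of count_j * ln P_j(alpha).

    counts[i][j] is the number of subjects choosing lottery j in panel i.
    """
    ll = 0.0
    for t, row in zip(T_VALUES, counts):
        probs = logit_probs(eut_values(t, alpha))
        ll += sum(z * math.log(pj) for z, pj in zip(row, probs) if z > 0)
    return ll


def fit_alpha(counts, steps=1000):
    """Grid-search maximum likelihood estimate of alpha on [0, 1)."""
    grid = [k / steps for k in range(steps)]
    return max(grid, key=lambda a: log_likelihood(a, counts))


if __name__ == "__main__":
    # Synthetic 'expected' counts for 100 subjects generated with alpha = 0.6.
    counts = [[100 * pj for pj in logit_probs(eut_values(t, 0.6))] for t in T_VALUES]
    print(fit_alpha(counts))
```

Feeding the estimator counts generated by the model itself recovers the generating α, which is a quick sanity check before using real choice data.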
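Second, we estimate maximum likelihood models of utility functions assuming Rank Dependent Utility Theory (RDUT). We consider the Tversky and Kahneman (1992) probability weighting function:
$$w(p_j) = \frac{p_j^{\gamma}}{\left(p_j^{\gamma} + (1 - p_j)^{\gamma}\right)^{1/\gamma}}$$
where γ is the probability weighting parameter, which allows each subject to interpret the same objective probability in a personal way.
Under RDUT, the value associated with a lottery satisfies:
$$V_j = w(p_j)\,X_j^{1-\alpha} + \varepsilon_j$$
so that the choice probabilities and the log-likelihood are obtained as under EUT, with $w(p_j)$ in place of $p_j$.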
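The Tversky and Kahneman (1992) weighting function and the resulting RDU value of a single-prize lottery fit in two short functions. This sketch assumes u(X) = X^(1-α) with u(0) = 0, so that the RDU value reduces to w(p) u(X); the γ values in the example are the WST estimates reported in the results, and the function names are hypothetical.

```python
def tk_weight(p, gamma):
    """Tversky-Kahneman (1992) weighting w(p) = p^g / (p^g + (1 - p)^g)^(1/g)."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def rdu_value(p, x, alpha, gamma):
    """RDU value of 'x with probability p, else 0': w(p) * x**(1 - alpha), since u(0) = 0."""
    return tk_weight(p, gamma) * x ** (1.0 - alpha)


if __name__ == "__main__":
    for gamma in (0.647, 0.678, 1.0):
        print(gamma, [round(tk_weight(p, gamma), 3) for p in (0.1, 0.5, 0.9)])
```

With γ = 1 the weighting is linear and RDU collapses to EUT; lowering γ lifts w(0.1) further above 0.1 and pushes w(0.9) further below 0.9, which is why a lower γ under real payment means stronger overweighting of small and underweighting of large probabilities.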

RESULTS
In table 2 we present descriptive statistics of the choices made by panel, treatment and reward method. Additionally, figures 1 and 2 present histograms of the subjects' probability choices by panel and reward method for the between-subjects and the within-subjects treatments, respectively.
The median chosen probability is around 0.4 in the panels with real rewards and 0.5 in the hypothetically paid panels, and this difference is significant according to Mann-Whitney tests in the BST and Wilcoxon signed-rank tests in the WST. Specifically, table 4 shows that with real rewards the probability chosen by subjects is significantly lower than the probability chosen with hypothetical rewards in both treatments, with the exception of panel 4 in the WST.
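Both nonparametric tests can be sketched without a statistics library. This is an illustration under stated assumptions, not the authors' procedure: it uses the normal approximation with a tie correction for the Mann-Whitney U test (no continuity correction), computes only the W+ statistic of the Wilcoxon signed-rank test, and stores chosen probabilities as integer tenths so that ties are detected exactly. The data and function names are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """1-based ranks, with tied values sharing the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test for independent samples (BST-style comparison).

    Returns (U for sample x, p-value from the tie-corrected normal approximation).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    u = sum(midranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(c ** 3 - c for c in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


def wilcoxon_w_plus(first, second):
    """Wilcoxon signed-rank statistic W+ for paired samples (WST-style comparison).

    Zero differences (second - first) are dropped; the absolute values of the remaining
    differences are ranked and the ranks of the positive differences are summed.
    """
    diffs = [b - a for a, b in zip(first, second) if b != a]
    ranks = midranks([abs(d) for d in diffs])
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


if __name__ == "__main__":
    real = [4, 3, 5, 4, 2, 4]          # hypothetical choices, in tenths of probability
    hypothetical = [5, 6, 5, 4, 7, 5]
    print(mann_whitney(real, hypothetical))
    print(wilcoxon_w_plus(hypothetical, [4, 3, 5, 5, 2, 4]))
```

The Mann-Whitney test fits the BST because the two reward groups contain different subjects, while the signed-rank statistic uses each WST subject's own pair of choices.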

Result 1: In both the between-subjects and the within-subjects treatments, our subjects on average choose riskier lotteries in the SGG test when given real payments than when given hypothetical ones.
Additionally, a Levene test (table 5) shows that in the between-subjects treatment the variance of the probabilities chosen by subjects is significantly higher with hypothetical payments than with real ones in every panel. In the within-subjects treatment, by contrast, we obtain this finding for panel 4 only.
Result 2: In the SGG multiple lottery task, real rewards generate more concentrated choices than hypothetical rewards.
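The Levene statistic compares the spread of choices across reward conditions. The following sketch, with hypothetical data in tenths of probability and a hypothetical function name, computes the original mean-centred version of the statistic; the median-centred Brown-Forsythe variant would replace each group mean with the group median. Under the null hypothesis of equal variances the statistic is compared with an F(k - 1, N - k) distribution, which this sketch does not evaluate.

```python
def levene_w(*groups):
    """Levene's W (mean-centred) for k groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group's mean.
    devs = []
    for g in groups:
        mean = sum(g) / len(g)
        devs.append([abs(v - mean) for v in g])
    group_means = [sum(d) / len(d) for d in devs]
    grand_mean = sum(sum(d) for d in devs) / n_total
    # Between-group and within-group sums of squares of the absolute deviations.
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means))
    within = sum((v - m) ** 2 for d, m in zip(devs, group_means) for v in d)
    return (n_total - k) / (k - 1) * between / within


if __name__ == "__main__":
    real = [4, 4, 4, 5, 3, 4]           # hypothetical, concentrated choices
    hypothetical = [2, 8, 5, 3, 7, 5]   # hypothetical, more dispersed choices
    print(levene_w(real, hypothetical))
```

With k = 2 groups and N = 12 observations, the 5% critical value of F(1, 10) is about 4.96, so this toy example would only just reject equal variances.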
Using a Kolmogorov-Smirnov test (see table 6), we find that hypothetical and real rewards generate significantly different distributions of observations (3). Figures 1 and 2 show that with real payments the distribution generally shifts to the left, implying lower levels of risk aversion, and becomes more peaked (kurtosis grows), reflecting lower variance in the decisions.
Comparing the elasticities of choices with hypothetical and real payments using a Mann-Whitney test (see table 4), we find that in the BST subjects' reaction to an increase in risk compensation is larger when rewards are real than when they are hypothetical, with the exception of $e_4$. In the WST this effect is confirmed only for $e^{\max}_{p,t}$.
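The two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical distribution functions. The sketch below computes it for hypothetical choice data and attaches the asymptotic p-value from the Kolmogorov distribution; with discrete choices and small samples that p-value is only approximate (typically conservative), so it is shown for illustration only. Function names are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(x, y):
    """D = max |F_x(v) - F_y(v)| over all observed values v."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        fx = bisect_right(xs, v) / len(xs)  # share of x at or below v
        fy = bisect_right(ys, v) / len(ys)  # share of y at or below v
        d = max(d, abs(fx - fy))
    return d


def ks_asymptotic_pvalue(d, n1, n2, terms=100):
    """P(K > lambda) with lambda = d * sqrt(n1 n2 / (n1 + n2)), via the Kolmogorov series."""
    lam = d * math.sqrt(n1 * n2 / (n1 + n2))
    if lam < 0.2:  # the series converges slowly here and the tail probability is ~1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * total))


if __name__ == "__main__":
    real = [4, 3, 5, 4, 2, 4]          # hypothetical, in tenths of probability
    hypothetical = [5, 6, 5, 4, 7, 5]
    d = ks_statistic(real, hypothetical)
    print(d, ks_asymptotic_pvalue(d, len(real), len(hypothetical)))
```

A leftward shift of the real-payment distribution shows up as its empirical distribution function running above the hypothetical one, which is where the maximal gap D is attained.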

Result 3: In the BST the change in subjects' chosen probability from panel to panel is relatively greater when payments are real than when they are hypothetical. In the WST only the change between the first and the last panel is significant.
Apart from calculating the elasticities, we have estimated the multinomial logit models presented in Section 2, under both EUT and RDUT, for the two reward conditions: real vs. hypothetical payment. We estimate them by maximum likelihood using the clustering method that allows for the possibility of correlation between responses by . Estimation results are reported in table 7. Under EUT, the results for the WST and the BST are analogous. In the WST the average CRRA parameter estimate α is 0.600 with real payment and 0.639 with hypothetical payment. This difference is significant, confirming again our Result 1 that subjects are more risk averse when payments are hypothetical. The results are equivalent for the BST: α is 0.621 with real payment and 0.665 with hypothetical payment, and this difference is also significant. These values are in accordance with those obtained by .
Under RDUT, the CRRA coefficient is again 0.600 with real payment and 0.634 with hypothetical payment in the WST. In the BST these values are 0.619 and 0.658, respectively. All these results are very similar to those under EUT. Regarding the estimates of the probability weighting parameter γ, we obtain a value of 0.647 with real payment and 0.678 with hypothetical payment for the WST. This difference is significant and, since a lower γ makes the inverse-S shape of the weighting function more pronounced, it indicates that the overweighting (underweighting) of small (large) probabilities is stronger under real payment. We can observe these effects in figure 3.
In the BST we obtain equivalent results, the estimated γ being 0.638 under real payment and 0.681 under hypothetical payment (see figure 4).

Result 4: Overweighting (underweighting) of small (large) probabilities is greater under real payment.
There are no significant differences between the WST and the BST in either α or γ, which shows the robustness of the result. To our knowledge, this is the first paper showing that probability weighting is affected by whether real or hypothetical rewards are used (4).

CONCLUSION
We have analyzed the existence, direction and practical relevance of the difference between risk aversion levels inferred under hypothetical and real incentives. Measuring individuals' risk aversion can prove very useful for interpreting the decisions they take under financial risk. Different tests have been developed to this end in both the psychological and the economic literature. We present results based on the Sabater-Grande and Georgantzís (2002) multi-lottery choice test of risk attitude. In contrast to previous studies, we find that subjects are less risk averse when incentives are real than when they are hypothetical.
Apart from explaining the characteristics of the test, we show that the way in which it is applied is also crucial. If incentives are hypothetical, the answers are noisier, less sensitive to changes in the stakes at play, and show a greater level of risk aversion than if subjects are monetarily motivated. The SGG test we use has good properties, allowing us to efficiently obtain two parameters of the agent's utility function under Rank Dependent Utility Theory. The estimated value of the utility weighting parameter α is significantly lower under real than under hypothetical payments. This means that our subjects are less risk averse under real incentives, and the estimated value of α (around 0.60 for real payment) is in line with the values obtained by  in different studies with other samples. We also obtain differences in the estimated value of γ, the probability weighting parameter in RDUT, which is also significantly lower under real payments. The estimated value of γ (close to 0.64 for real payment) implies the typical overweighting of small probabilities and underweighting of large probabilities by our subjects.
(4) Using a between-subjects design, Harrison et al. (2010) do not find any significant hypothetical bias for purchasing managers assuming a rank dependent utility model.
We obtain these results from a sample of subjects larger than that of any other comparable study, and we use a double design: both within-subjects and between-subjects treatments were implemented, so that the smaller within-subjects treatment served as a robustness check for the between-subjects treatment, yielding similar results.