INDIVIDUAL EXPECTATIONS AND AGGREGATE BEHAVIOR IN LEARNING-TO-FORECAST EXPERIMENTS

Models with heterogeneous interacting agents explain macro phenomena through interactions at the micro level. We propose genetic algorithms as a model for individual expectations to explain aggregate market phenomena. The model explains all stylized facts observed in aggregate price fluctuations and individual forecasting behaviour in recent learning-to-forecast laboratory experiments with human subjects (Hommes et al. 2007), simultaneously and across different treatments.


Introduction
An important feature of models with heterogeneous, interacting agents is that they can explain macro phenomena through simple interactions at the micro level (e.g. Kirman, 1993; Lux, 1995). Agent-based models have been particularly successful in explaining the main stylized facts of financial markets, such as fat tails and clustered volatility in asset returns (Arthur et al., 1997, Lux and Marchesi, 1999 and Hommes, 2002, among others). Duffy (2006) presents a recent overview of how agent-based models can explain individual behavior and aggregate phenomena in macroeconomics. The main purpose of our paper is to explain aggregate price behavior through interactions of individual learning. In particular, we provide a simple theory of individual learning through genetic algorithms explaining all stylized facts of aggregate price fluctuations in the recent learning to forecast laboratory experiments of Hommes et al. (2007).
Laboratory experiments with human subjects have become an important tool in economic analysis, complementing theoretical, computational and empirical work. A recurring observation from experiments is that individuals often do not behave fully rationally, but tend to use simple heuristics, possibly biased, in making decisions under uncertainty (Tversky and Kahneman, 1974). An extensive bounded rationality research program is developing (e.g. Sargent, 1993) and laboratory experiments are particularly suited to identify behavioral rules that individuals use in economic decision making out of an ocean of potential alternatives (e.g. Kahneman, 2003).
Individual expectations, their interaction and the aggregate outcome they create are at the heart of economics. Duffy (2008), for example, argues that laboratory experiments are important to study the adaptive process by which individuals learn and may or may not enforce convergence to a rational expectations outcome at the macro level. Recently, a number of learning to forecast experiments have been conducted to study individual expectation formation and aggregate outcomes, e.g. in Marimon and Sunder (1994), Gerber et al. (2002), Hommes et al. (2005), Sutan and Willinger (2005), Adam (2007) and Heemeijer et al. (2008). In these experiments, subjects must forecast the price of a good, which is determined by market clearing with feedback from individual expectations. Aggregate demand and supply are computerized, e.g. derived from profit and utility maximization given subjects' individual forecasts. An advantage of this experimental setup is that it provides 'clean data' on expectations, ceteris paribus. These experimental data can therefore be used to test various theories of bounded rationality, individual expectations and learning at the micro level and test how their interaction matches aggregate behavior. Hommes et al. (2007) conducted learning to forecast experiments in what is perhaps the simplest setting, the classical cobweb 'hog cycle' model describing a standard commodity market with a production lag. Historically, the cobweb model has served as a simple framework to develop and test various expectations hypotheses. Ezekiel (1938) started with naive expectations, Nerlove (1958) advocated adaptive expectations, Muth's seminal paper (Muth, 1961) used the cobweb framework to introduce rational expectations and, more recently, Brock and Hommes (1997) used it to introduce endogenous selection among heterogeneous expectations rules (see footnote 1).
In their learning to forecast experiments, Hommes et al. (2007) considered three different treatments: a stable, an unstable and a strongly unstable treatment. Stable here refers to the stability of the classical cobweb model under naive expectations (see footnote 2). They observed the following three stylized facts:
1. the sample mean of realized prices was very close to the rational expectations (RE) benchmark in all three treatments;
2. the sample variance of realized prices, however, depended on the treatment: a) it was close to the theoretical variance of the RE benchmark in the stable treatment, while b) it was significantly higher than the RE benchmark (excess volatility) in the unstable and strongly unstable treatments;
3. in all treatments, realized market prices exhibited no significant linear autocorrelation.
Footnote 1: The cobweb model has been used in many different applications, ranging from markets for lawyers (Freeman, 1975), engineers (Freeman, 1976) and public school teachers (Zarkin, 1985) to oil (Krugman, 2001), cattle (Rosen et al., 1994) and beef (Chavas, 2000).
Footnote 2: The stability condition states that the ratio between marginal supply and marginal demand at steady state must be smaller than 1 in absolute value (Ezekiel, 1938).
These stylized facts were quite robust over a series of experiments, but they appeared hard to explain by standard learning mechanisms offered by the theoretical literature. While many adaptive learning rules lead to eventual convergence to rational expectations and some other learning rules may generate unstable dynamics and excess volatility, homogeneous expectations models are unable to explain the full set of stylized facts simultaneously (Hommes, 2009). In this paper we propose a simple model for micro behavior in order to explain the observed experimental results at the macro level. We model individual learning through genetic algorithms (GAs). As it turns out, GA experiments with a small population of agents match all stylized facts simultaneously across different treatments within a market setting that exactly corresponds to that of the laboratory experiments. While it is certainly hard to imagine GAs as an accurate description of human learning in the literal sense, we argue that they may share key properties of the adaptations of human subjects when exposed to a new situation that they cannot penetrate theoretically.
We also investigate the degree of heterogeneity in individual forecasting behaviour. Heterogeneity in forecasting future asset prices is supported by evidence from stock market survey data, e.g. Vissing-Jorgensen (2003) and Shiller (2000), and in inflation expectations survey data in Branch (2004) and Mankiw et al. (2003). Moreover, in these survey data heterogeneity shows substantial variation through time. Consistent with the findings in the laboratory experiments, in our GA learning simulations heterogeneity decreases over time: it disappears in the stable treatment but persists in the (strongly) unstable treatments.
Using the GAs for individual learning, our paper makes another contribution that goes beyond the limitations of laboratory experiments. Laboratory experiments are costly, because subjects must be paid according to their performance, and typically experimental markets are small because of capacity limitations. After fitting our GA model to individual learning, we can easily investigate price behavior in alternative, more realistic market scenarios through numerical simulations. In particular, we investigate the occurrence of excess volatility when the number of subjects in the market becomes large and/or when the number of rules per individual becomes large. We also investigate how excess volatility depends on a continuum of parameters such as the ratio of marginal supply and demand.
The paper is organized as follows. Section 2 recalls the learning to forecast experiments, while Section 3 recalls some basic facts of GA-learning. Section 4 compares the stylized facts of the GA simulations to the laboratory experiments, while Section 5 presents simulations of GA learning in more realistic market scenarios. Finally, Section 6 concludes.

The forecasting experiment
Hommes et al. (2007) report on a set of cobweb experiments with K = 6 participants per session. The participants were asked to predict next period's price under very limited information on the structural characteristics of the market. The realized price $p_t$ in the experiments was determined by the (unknown) market equilibrium between demand and supply:
$$D(p_t) = \frac{1}{K} \sum_{i=1}^{K} S(p^e_{i,t}), \qquad (1)$$
with $p^e_{i,t}$ the price forecast of participant i at time t. We normalize the supply side by dividing by the number of firms to facilitate comparison of settings with different K. Supply $S(p^e_{i,t})$ was determined by a nonlinear schedule (eq. 2) governed by the parameter $\lambda$, while demand was formalized via a simple linear schedule:
$$D(p_t) = a - d\,p_t + \eta_t, \qquad (3)$$
with $\eta_t$ a small stochastic shock drawn from a Normal distribution. Both demand and supply can be derived from profit and utility maximization, and are thus consistent with rational behavior. The resulting equilibrium price is obtained as:
$$p_t = \frac{a - \frac{1}{K} \sum_{i=1}^{K} S(p^e_{i,t})}{d} + \epsilon_t, \qquad (4)$$
where $\epsilon_t = \eta_t / d$. Given the parameters a, d and $\lambda$, the aggregate realized price $p_t$ depends on individual price expectations as well as the realization of the stochastic shocks.
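The price formation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the experimental software: the tanh form of the supply schedule, the values of `A` and `D`, the noise standard deviation and the six forecasts are all assumptions chosen for the example, since the text names a, d and λ without giving them.

```python
import math
import random

# Assumed parameter values, for illustration only: the text names a, d and
# lambda without stating them, and the tanh supply curve is likewise assumed.
A = 2.3   # demand intercept (assumption)
D = 0.25  # demand slope (assumption)


def supply(forecast, lam):
    """Assumed S-shaped supply schedule; lam tunes its nonlinearity."""
    return math.tanh(lam * (forecast - 6.0)) + 1.0


def realized_price(forecasts, lam, eps=0.0):
    """Invert linear demand at the average supply implied by the forecasts."""
    avg_supply = sum(supply(f, lam) for f in forecasts) / len(forecasts)
    return (A - avg_supply) / D + eps


rng = random.Random(0)                       # fixed seed for reproducibility
forecasts = [5.0, 5.5, 6.0, 6.5, 7.0, 5.2]   # K = 6 hypothetical forecasts
shock = rng.gauss(0.0, 0.5)                  # epsilon_t, sd 0.5 assumed
price = realized_price(forecasts, lam=0.5, eps=shock)
```

Because supply rises with the forecast, raising every forecast lowers the realized price, which is the negative feedback that participants were told about in qualitative terms.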
Participants were only informed about the basic principles of the cobweb-type market. They were advisors to producers, whose only job was to accurately forecast the price of the good for 50 subsequent periods. Pay-offs were defined as a quadratic function of forecasting errors, truncated at 0 (eq. 5). Participants were informed that the price would be determined by market clearing and that it would have to be within the range [0, 10]. Furthermore, they knew that there was (negative) feedback from individual price forecasts to the realized market price in the sense that if their forecast increased, supply would increase and consequently the market price would decrease. Subjects, however, did not know how large these feedback effects would be, as they had no knowledge of the underlying market equilibrium equations. One could say that subjects had qualitative information about the market, but no quantitative details.
Participants thus had to rely solely on time series observations and their own behavior vis-à-vis their predictions. Due to the nonlinear aggregation of expectations, the superimposed noise and the ignorance of agents concerning the structural form and parameters, conscious coordination on some kind of rational expectations equilibrium would be extremely demanding if not impossible. This setting is close to the informational assumptions of various theoretical models in the literature on learning and bounded rationality (e.g. Sargent, 1993 and Evans and Honkapohja, 2001) so that the experimental subjects' behavior could be contrasted with various learning rules. Under rational expectations, all individuals would predict $p^e_t = p^*$, that is, they would predict the price at which demand and supply intersect. Given that all individuals have rational expectations, realized prices will be given by
$$p_t = p^* + \epsilon_t. \qquad (6)$$
Given the limited market information one cannot expect that all individuals have rational expectations at the outset, but one can hope that in such a simple, stationary environment individuals would learn to have rational expectations. The stylized facts of these cobweb experiments have already been summarized in the introduction. We briefly recall them here: (1) the sample mean of realized prices is close to the RE benchmark $p^*$ in all three treatments; (2) the sample variance of realized prices depends on the treatment: it is close to the RE benchmark in the stable treatment, but significantly higher in the unstable and strongly unstable treatments; (3) realized market prices do not exhibit significant linear autocorrelations. Item (3) indicates that even in the unstable and strongly unstable cases, agents did not leave any linear predictability unexploited. Apparently, the interaction of agents' individual forecasting rules washes out linear predictability in aggregate price behavior.
While this points to a certain efficiency of their dispersed efforts to predict market prices, market prices did fluctuate 'excessively' in the unstable and strongly unstable treatments. In these cases, price fluctuations exceeded those warranted by the exogenous noise component by more than one order of magnitude, so that participants' attempts at learning about the market's behavior apparently intensified price fluctuations. While these results were quite robust over a series of experiments, they appeared hard to explain by standard learning mechanisms offered by the theoretical literature.
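To make the RE benchmark concrete, the following sketch solves for the price $p^*$ at which demand and normalized supply intersect, and evaluates the ratio of marginal supply to marginal demand that governs stability under naive expectations. It is a hedged illustration: the tanh supply curve, the values of `A` and `D`, and the three λ values are assumptions, not figures taken from the text.

```python
import math

A, D = 2.3, 0.25  # assumed demand intercept and slope (not given in the text)


def supply(price, lam):
    """Assumed S-shaped supply schedule; lam tunes its nonlinearity."""
    return math.tanh(lam * (price - 6.0)) + 1.0


def re_price(lam, lo=0.0, hi=10.0, iterations=60):
    """Bisection on excess demand a - d*p - S(p), which falls as p rises."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if A - D * mid - supply(mid, lam) > 0.0:
            lo = mid      # demand exceeds supply: the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def slope_ratio(lam):
    """Marginal supply over marginal demand at the steady state."""
    p_star = re_price(lam)
    marginal_supply = lam * (1.0 - math.tanh(lam * (p_star - 6.0)) ** 2)
    return marginal_supply / D


for lam in (0.22, 0.5, 2.0):   # hypothetical stable / unstable / strongly unstable
    print(lam, round(re_price(lam), 2), round(slope_ratio(lam), 2))
```

Under these assumed values the ratio is below 1 only for the smallest λ, which is the sense in which a treatment is called stable; the two larger values violate the condition.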
Our goal here is to model individual learning via genetic algorithms (GAs), so that the interaction of these rules produces the stylized facts observed in the experiments simultaneously and across treatments.

Learning through Genetic Algorithms
Genetic algorithms were introduced in the seventies as a stochastic learning algorithm (Holland, 1975). In order to solve complex optimization problems with multiple maxima or minima and possible discontinuities, this approach mimics evolutionary operations in nature. One typically starts out with a randomly initiated population of candidate solutions. These initial blind trials are typically encoded as chromosomes (strings) using a binary alphabet. After evaluation of the fitness of the members of the initial population (in terms of the objective function), one applies the genetic operations of reproduction, crossover, and mutation. Economic applications have mostly added the election operator (Arifovic, 1996) as an additional step in the loop of genetic operations between successive generations of individuals. In the following we provide details of these operators and their implementation in the present setting:
1. Reproduction: in the transition from one generation to the next, the first step consists in sampling copies of strings from the old generation depending on their fitness. In conformity with the pay-off function used in the laboratory experiments, fitness was defined as a negative quadratic function of forecast errors with truncation at zero (eq. 7). The most common reproduction operator is reproduction depending on relative fitness, i.e. copies are sampled from the old population with probabilities $f_i / \sum_j f_j$, biasing the population of new agents towards strategies with higher fitness. Other algorithms in the literature are rank-dependent reproduction or tournament selection, in which one repeatedly draws $n_1$ individuals with replacement from the pool and accepts the $n_2 < n_1$ with the highest fitness from the subsample until the new generation is complete.
2. Crossover: when the pool of members of a new generation is complete, genetic material is exchanged between them in order to find new (possibly better) candidate solutions by recombination of the old ones. The simplest version is random selection of a pair of parent strings, determining a cut-off value within the string and swapping part of the genetic material of the parents when creating their offspring. We follow this approach and take the genetic material of each of the two offspring from the left (right) hand side of their 'father' and the right (left) hand side of their 'mother'. This operation takes place with a probability $p_{cross}$, while with probability $1 - p_{cross}$ the parent strings are transferred unchanged into the new generation. We note that more involved crossover schemes as well as versions with more than two parents can also be found in the literature.
3. Mutation: any position (bit) within a chromosome may be flipped to the other value (from '0' to '1' or vice versa in the binary alphabet). This happens with a probability $p_{mut}$ once the reproduction and crossover operations are finished.
4. Election: in most economic applications, the usual range of genetic operators has been extended by the election algorithm. This compares new chromosomes that have emerged from crossover and mutation with their parents and only admits them to the population if their virtual fitness (measured with respect to the environment in which their parents had to compete) is at least as high as their parents' fitness. This operator serves to prevent agents from adopting clearly inferior strategies. Most new strategies that emerge in a genetic process are far off the mark and conscious agents would not voluntarily adopt these new strategies if their trial performance ranks them way below the previous ones.
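The four operators listed above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the bit-string representation as Python `str`, the toy fitness `count_ones` and the choice to compare each child with the parent that supplied its left-hand part are all assumptions made for the example.

```python
import random


def reproduce(pool, fitness, rng):
    """Sample a new pool with probabilities f_i / sum_j f_j."""
    if sum(fitness) <= 0.0:            # every payoff truncated at zero
        return [rng.choice(pool) for _ in pool]
    return rng.choices(pool, weights=fitness, k=len(pool))


def crossover(mother, father, p_cross, rng):
    """One-point crossover: swap the tails behind a random cut-off."""
    if rng.random() >= p_cross:
        return mother, father          # parents pass on unchanged
    cut = rng.randrange(1, len(mother))
    return mother[:cut] + father[cut:], father[:cut] + mother[cut:]


def mutate(chromosome, p_mut, rng):
    """Flip each bit independently with probability p_mut."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < p_mut else b for b in chromosome)


def next_generation(pool, fitness_fn, p_cross, p_mut, rng):
    """Reproduction, crossover, mutation, then election against the parent."""
    parents = reproduce(pool, [fitness_fn(c) for c in pool], rng)
    new_pool = []
    for i in range(0, len(parents) - 1, 2):
        pair = (parents[i], parents[i + 1])
        children = crossover(pair[0], pair[1], p_cross, rng)
        for parent, child in zip(pair, children):
            child = mutate(child, p_mut, rng)
            # Election: keep the child only if it is at least as fit.
            keep_child = fitness_fn(child) >= fitness_fn(parent)
            new_pool.append(child if keep_child else parent)
    if len(parents) % 2:               # odd pool size: last string is copied
        new_pool.append(parents[-1])
    return new_pool


def count_ones(chromosome):
    """Toy fitness used only for this demonstration."""
    return chromosome.count("1")


rng = random.Random(1)
pool = ["".join(rng.choice("01") for _ in range(40)) for _ in range(10)]
for _ in range(30):
    pool = next_generation(pool, count_ones, p_cross=0.6, p_mut=0.01, rng=rng)
```

In the forecasting setting, `fitness_fn` would decode a string into rule parameters and score the forecast it implies; the toy fitness only keeps the example self-contained.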
In many applications of genetic algorithms, the qualitative outcome is largely independent of the particular version of an operator that one adopts (cf. Lux and Schornstein, 2005, for a detailed comparison of various set-ups within a learning context). One may even skip one or the other of the operators (e.g. crossover or election) without changing the overall qualitative results. In our simulations, as in various previous economic applications, the results appear to be quite robust under variation of GA parameters and implementations of operators. Unfortunately, one has to rely exclusively on simulations since theoretical results for GAs within an interactive context seem to be essentially out of reach. In our setting with artificially intelligent agents, we tried to reproduce the design of the experiments as closely as possible. This applies not only to the parametrization of demand and supply functions and the choice of a fitness function identical to the payoff function in the laboratory experiments, but also to the number of agents.
Hence we report below experiments conducted with K = 6 agents using genetic algorithms to evolve forecasting strategies.
Economic applications of GAs as a learning device have mostly applied them in the sense of 'social learning': the number of agents in these papers equals the number of chromosomes and each agent's chromosome type determines her strategic behavior in the market place (e.g. Arifovic, 1996; Dawid, 1999; Arifovic and Gencay, 2000; Lux and Schornstein, 2005). When the genetic operations are applied to this pool of trader-chromosomes, information is effectively shared and incorporated into the entire new generation via the evolutionary dynamics. This design is certainly at odds with the set-up of the experimental market in which subjects are separated from each other and are not allowed to actively exchange information. We therefore assumed that each agent in our computer experiment had her own pool of strategies or forecasting rules which undergo their genetic evolution independently from the rules of other agents (cf. LeBaron et al., 1999, for a similar approach). In our experiments reported below, we endowed each agent with M = 10 different chromosomes encoding pairs $(\alpha_i, \beta_i)$ of the first order autoregressive forecasting rule detailed below. The active rule of each agent, i.e. the rule on which her actual forecast was based, was determined by random draws with probabilities equal to the relative fitness obtained in the last round (which is a monotonic function of the proximity of the forecast to the realized price, cf. eq. 7).
Genetic algorithms require a functional specification of the forecasting rule, whose fitness-maximizing parameter values are then searched for via the evolutionary algorithm. The simplest specification of a rule would be a constant price forecast. A slightly more complex version would use a constant together with a first order autoregressive component:
$$p^e_{i,t} = \alpha_i + \beta_i (p_{t-1} - \alpha_i). \qquad (8)$$
This first order autoregressive (AR1) rule seems a natural forecasting scheme, as agents could simply implement it via a linear autoregression using the sample average as their estimate of $\alpha_i$ and the first order sample autocorrelation as the estimate of $\beta_i$. Moreover, the AR1 forecasting rule (8) has a simple behavioral interpretation, with $\alpha_i$ representing an anchor or observed average price level around which the market price fluctuates, and $\beta_i$ representing the observed persistence or anti-persistence of price fluctuations.
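As a small worked illustration of the rule-of-thumb estimation just described, the sketch below computes the sample average and the first order sample autocorrelation of a price path and feeds them into an AR1 forecast. It assumes the rule is written in deviation-from-anchor form, forecast = α + β (last price − α); the function names and the zigzag price path are hypothetical.

```python
def ar1_forecast(alpha, beta, last_price):
    """AR1 rule in deviation form: anchor plus a share of the last deviation."""
    return alpha + beta * (last_price - alpha)


def fit_ar1(prices):
    """Return (sample mean, first order sample autocorrelation)."""
    n = len(prices)
    mean = sum(prices) / n
    deviations = [p - mean for p in prices]
    total = sum(x * x for x in deviations)
    if total == 0.0:                   # constant series: no autocorrelation
        return mean, 0.0
    lagged = sum(deviations[t] * deviations[t - 1] for t in range(1, n))
    return mean, lagged / total


prices = [5.0, 6.0, 5.0, 6.0, 5.0, 6.0]   # hypothetical zigzag price path
alpha, beta = fit_ar1(prices)
forecast = ar1_forecast(alpha, beta, prices[-1])
```

For a zigzag path the estimated β is negative, so after a price above the anchor the rule predicts a price below it, which is the anti-persistence interpretation given in the text.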
As discussed by Hommes (2009), a representative agent model where all agents employ the same fixed rule, e.g. the rule (8), or where all agents adopt the same adaptive learning scheme, e.g. sample autocorrelation or least squares learning, as a uniform learning mechanism for the whole population, cannot explain all stylized facts of the experiments simultaneously. A homogeneous adaptive learning rule either always enforces convergence to RE (i.e. does not explain the second observed stylized fact, the excess volatility in the strongly unstable case) or, in cases where the adaptive learning rule leads to excess volatility, it generates anti-persistent price behavior with significantly negative first order autocorrelation, violating the third stylized fact in the laboratory experiments. In our GA-model, we apply the same functional scheme in a heterogeneous agent framework with genetically evolved sets of parameters $\alpha_i$, $\beta_i$ that could differ across individuals.
The key question then is whether the interaction of individual forecasting rules (8) can explain all stylized facts observed in aggregate price behaviour simultaneously.
In our simulations the two parameters $\alpha_i$ and $\beta_i$ are encoded in one string of length l = 40, the first 20 bits representing $\alpha_i$ and the last 20 bits representing $\beta_i$ (the number of bits is quite arbitrary and only needs to be large enough for a sufficiently fine-grained structure of the resulting real-valued strategies). $\alpha_i$ is restricted to the interval [0, 10], just as in the instructions to participants in the laboratory experiments. The interval for $\beta_i$ is more arbitrary and has been set symmetrically around zero, $\beta_i \in [-2, 2]$, allowing for quite strong serial correlation or anti-correlation. The transition from the binary coded evolutionary process to the real-valued forecasts requires decoding each 20-bit substring into a real number in the corresponding interval (eq. 9). It is well known that short run GA-simulations may be sensitive to the initialization of the GAs. Therefore, in our (short run) simulations we have chosen an initialization in line with individual forecasts in the first and the second period of the experiment, as discussed in more detail below.
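One common way to map such a bit string to real-valued parameters is a linear rescaling of the integer each substring encodes. The sketch below is an assumption about how the decoding could be done, not a reproduction of the paper's formula; in particular the bit ordering (most significant bit first) and the function name `decode` are choices made for the example.

```python
def decode(chromosome, alpha_range=(0.0, 10.0), beta_range=(-2.0, 2.0)):
    """Split the string in half and rescale each half to its interval."""
    half = len(chromosome) // 2

    def to_real(bits, low, high):
        # int(bits, 2) reads the substring as a base-2 integer (MSB first).
        return low + (high - low) * int(bits, 2) / (2 ** len(bits) - 1)

    alpha = to_real(chromosome[:half], *alpha_range)
    beta = to_real(chromosome[half:], *beta_range)
    return alpha, beta


lowest = decode("0" * 40)     # all bits off: lower ends of both intervals
highest = decode("1" * 40)    # all bits on: upper ends of both intervals
```

With 20 bits per parameter the grid has a step of 10 / (2^20 - 1) for α and 4 / (2^20 - 1) for β, which is what makes the resulting strategies fine-grained.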

Experiments with Genetic Algorithms
In this section we report the results from GA simulations. Unless reported otherwise, each of the six agents is endowed with M = 10 chromosomes and the crossover probability is $p_{cross}$ = 0.6 (but different values yield similar results). Genetic learning would converge to the rational expectations equilibrium if, uniformly across the population, all $\beta_{i,t}$ tended to zero and the $\alpha_{i,t}$ converged to the RE price $p^*$. Since the experiments run over a limited number of rounds, an appropriate alignment of our GA simulation with the lab settings is required. Note that in order to start the evaluation of the fitness of agents' strategies, we need two realizations of the market price: the first one serves as the anchor value for the AC strategies in eq. (8) and the subsequent realization serves to evaluate the quality of the AC forecast using eq. (9). As a consequence, evolutionary strategies could be evaluated for the first time at t = 3. In order to align the GAs to the lab experiments, we therefore choose for periods 1 and 2 forecasts and prices from the experiments, while the GA population of each agent is initialized randomly. In this way, the 'initial conditions' of the GAs are set equal to those of the lab experiments and our artificial agents are initially subject to the same incentives as the human subjects. As it turns out, this alignment typically guarantees greater similarity than, say, a randomized choice of forecasts at t = 1 and t = 2 (while qualitative results are still pretty much the same under different initialization schemes). Obtaining closer proximity by accurate alignment should be seen as an encouraging finding. It is worth emphasizing that this is not a fine-tuning of our algorithm but rather an attempt at matching the experimental scenario as closely as possible.
In our simulations, consistent with the laboratory experiments, we find that the market price fluctuates around the RE benchmark with a sample mean very close to it, but that the level of volatility depends strongly on the treatment. In the stable case, the sample variance is close to its rational expectations benchmark, but it increases significantly beyond the RE benchmark if we proceed to the 'unstable' and 'strongly unstable' scenarios. Fig. 1 shows snapshots from longer simulation runs and Table 1 summarizes some key statistics, for all three treatments, averaged over 1,000 simulations of 50 periods each. The sample mean of individual forecasts (Mean($p^e$)) has been obtained by averaging the individual forecasts over all subjects (K = 6) and all experiments (J = 6) for each treatment. The sample variance of individual forecasts (Var($p^e$)) has been computed as follows. Let $p^j_{t,k}$ be the price forecast for time period t, by subject k, in experiment j; then the mean forecast for period t in experiment j is $\mu^j_t = \frac{1}{K} \sum_k p^j_{t,k}$, with K = 6 in the experiments. The sample variance of this mean forecast over all rounds (T = 50) of experiment j is denoted $\mathrm{Var}_j(p^e)$ (eq. 10). The sample variance of individual forecasts is then obtained by averaging over all experiments (J = 6) or over all simulations (J = 1,000), respectively, within each treatment:
$$\mathrm{Var}(p^e) = \frac{1}{J} \sum_j \mathrm{Var}_j(p^e). \qquad (11)$$
In a similar vein, the variance of realized prices, Var(p), has been computed according to (11), averaging over J = 1,000 simulations. Table 1 shows the statistics for the GA simulations, the laboratory experiments and the RE benchmark. The table shows that the GA-simulations are surprisingly close to the laboratory experiments across all treatments.
Figure 1: Snapshots from simulations of GA-learning: realized prices (solid) and RE benchmarks (broken lines) for all three treatments (stable, unstable and strongly unstable) and three different values of the mutation probability $p_{mut}$.
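The forecast statistic defined above can be computed as in the following sketch. It is illustrative only: the nested-list layout `forecasts[j][t][k]`, the function name and the toy numbers are assumptions, and the within-experiment variance is normalized by T here, which the text does not specify.

```python
def forecast_variance(forecasts):
    """Average over experiments j of the variance over t of the mean forecast.

    forecasts[j][t][k] is the forecast of subject k for period t in
    experiment j (layout assumed for this example).
    """
    per_experiment = []
    for experiment in forecasts:
        # mu^j_t: cross-sectional mean forecast in each period
        means = [sum(row) / len(row) for row in experiment]
        centre = sum(means) / len(means)
        # Var_j(p^e): variance of mu^j_t over periods (divisor T assumed)
        per_experiment.append(
            sum((m - centre) ** 2 for m in means) / len(means)
        )
    return sum(per_experiment) / len(per_experiment)


toy = [
    [[1.0, 3.0], [3.0, 5.0]],   # experiment 1: mean forecasts 2 and 4
    [[5.0, 5.0], [5.0, 5.0]],   # experiment 2: constant mean forecast
]
var_pe = forecast_variance(toy)
```

The same function applied to a list of simulated runs instead of laboratory sessions gives the corresponding simulation statistic, with J equal to the number of runs.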
Besides the three treatments of the laboratory experiments, we also distinguish between different settings for the mutation probability, $p_{mut}$ = 0.01, 0.025 and 0.05, as this appears to be the more interesting aspect of the GA design. As can be seen, price fluctuations also increase ceteris paribus with higher mutation probability due to the higher rate of new forecasting rules entering the population. As in other applications of GAs (cf. Lux and Schornstein, 2005), varying other parameters as well as choosing different specifications of the operators appears to cause no major changes in the overall outcome.
[...] stylized facts above. We note that the qualitative outcome was quite robust under various modifications of the GA learning mechanism. For example, we get similar results when dispensing with the election operator. The main difference in this setting is that fluctuations in the 'unstable' and 'strongly unstable' treatments become more pronounced and that we see somewhat more significant negative autocorrelation in the first few lags: as mentioned above, this feature can be easily explained by the mean-reverting nature of the dynamics with more random mutations admitted to the population. On economic grounds we might, however, argue that agents should not allow obviously unsuccessful strategies to enter their set of forecast functions (which is why the election operator was introduced in economic settings), so that we would not place too much weight on these results.
Interestingly, adopting a simpler concept of learning that dispenses with the AC parameter $\beta_i$ and restricts forecast rules to a constant $\alpha_i$ also leads to results that share some of the stylized facts. While this scenario leads to similar outcomes for volatility in the three treatments, it is, however, also characterized by anti-persistent price behavior and more significant zigzag patterns of autocorrelations in the strongly unstable case. The simplest GA with only constant rules, therefore, seems to inherit at least part of the oscillating dynamics of the benchmark case of homogeneous naive or adaptive expectations. A more intelligent type of forecast rule such as our AR1 rules, taking into account both the average price level and first order autocorrelation, is required to remove linear forecastability. Stated differently, individual learning of the mean alone is not consistent with the laboratory experiments; some more sophisticated form of individual learning, taking into account whether prices are persistent or anti-persistent, is needed to remove autocorrelations in aggregate prices. The interaction of individual rules, learning both the price level and the first order autocorrelation, leads to the correct aggregate price level and washes out all autocorrelations in aggregate price fluctuations.
Why do the GA experiments reproduce the experimental results so well? Our conjecture is that GAs and human subjects share the tendency of learning by experience and of shifting their strategies towards a specification that would have performed well in the recent past. This is actually the consequence of the built-in genetic operators of GAs. While this leads to an optimal solution for static problems (at least, if the evolutionary process is allowed to run long enough), with the changing objective functions of an interactive environment it could also lead to repetitive patterns (cf. Lux and Schornstein, 2005). In the absence of structural knowledge about the underlying mechanisms of a decision problem, humans can also still determine what actions or decisions would have performed well in the past. Quite clearly, the laboratory experiments with their unknown forms of the underlying market functions and added stochasticity could not have been fully penetrated by the experimental subjects. However, they could easily focus on the past success and failure of their forecasts and learn to maintain successful strategies. Exploiting the mean together with short-run autocorrelations seems to be one of the simplest strategies that could be pursued in a rule-of-thumb manner without computing the sample autocorrelation exactly (which would normally not be possible given the time pressure of most experimental settings). These rough computations lead to stochastic fluctuations that are similar to the fluctuations caused by the evolutionary dynamics of the GA. The latter feature distinguishes our heterogeneous learning scenario from homogeneous learning models (Hommes, 2009), which seem unable to explain the full set of stylized facts. Heterogeneity in individual forecasting thus seems to be a key element in explaining all stylized facts at the aggregate level simultaneously.
In summary, our conjecture is that the orientation at successful performance in the past within a reasonable class of forecasting heuristics, together with the heterogeneity of the GA design, explains its proximity to human behaviour in the lab. We may note that such conformity has also been found in a number of other cases, e.g. in an experimental foreign exchange market (Arifovic, 1996) or public good experiments (Casari, 2004). It is also related to the work of Erev and Roth (1999) on reinforcement learning to explain experiments with repeated games. 11 Anufriev and Hommes (2009) have recently used another form of reinforcement learning to explain learning to forecast experiments in a different, asset pricing framework.
11 We also investigated a reinforcement learning (RL) algorithm with similar strategy sets as in our GA setting. Individuals were endowed with AC strategies with parameter space α_i ∈ [0, 10] and β_i ∈ [−2, 2]. Initial parameters were drawn randomly from a uniform distribution and updated with probabilities computed via relative fitness (as in the GA experiments) with fitness defined by the payoff function (5). We distinguished between a myopic RL algorithm, only using the last payoff as fitness function, and a full memory RL scheme that computed fitness as the arithmetic average of all previous payoffs. Qualitative results of both settings were not too different, however. The basic outcome of these RL experiments was that (i) agents never converged to the RE benchmark: the variance was always above the RE benchmark, even in the 'stable' scenario, and mean values were slightly further away from RE prices than under GA learning; (ii) in all cases, realized market prices showed pronounced cyclical patterns, indicating that RL agents were not able to exploit all linear structure; (iii) results seemed to be entirely insensitive to the number of strategies (we allowed the strategy set of agents to vary from M = 50 over 500 up to 5000). Detailed results are available upon request.
These examples already suggest that heterogeneity quickly disappears in the stable treatment, while heterogeneity is highly persistent in the strongly unstable treatment. Figure 4 (bottom panel) also shows the average degree of heterogeneity, that is, the time development of the standard deviations of individual forecasts (K = 6 individuals per group) averaged over all groups in each treatment. More precisely, let $p^j_{t,k}$ be the price forecast for time period t, by subject k, in experiment j; then the mean forecast for period t in experiment j is $\mu^j_t = \frac{1}{K} \sum_{k=1}^{K} p^j_{t,k}$. The standard deviation of individual forecasts at date t is computed around this mean over the K = 6 subjects of experiment j. The average degree of heterogeneity at date t over all experiments (J = 6) within a treatment is then defined as the average of these standard deviations. The time development of the average degree of heterogeneity in Figure 4 (bottom panel) exhibits two important features: (1) for all treatments forecast heterogeneity decreases over time, and (2) forecast heterogeneity is persistent in the unstable treatment and highly persistent in the strongly unstable treatment.
The corresponding figure for the GA-learning simulations reports the average degree of heterogeneity in the stable, the unstable and the strongly unstable treatments. This figure shows that GA-learning simulations reproduce the patterns of the average degree of heterogeneity in the laboratory experiments quite nicely: a quick decrease of forecasting heterogeneity in the stable treatment and a much slower decrease in the (strongly) unstable treatment. In fact, both in the experiments and in the GA-learning simulations the unstable and strongly unstable treatments exhibit a non-monotonic development of forecasting heterogeneity, with an increase in heterogeneity in the early stage of the experiment/simulations due to overshooting and a decrease in heterogeneity after periods 5 to 7 due to learning.
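The heterogeneity measure described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function name and the nested-list data layout are hypothetical, the forecast values are made up, and the within-group standard deviation uses the population normalization 1/K (the sample normalization 1/(K − 1) would be an equally defensible reading).

```python
from statistics import fmean, pstdev


def average_heterogeneity(forecasts):
    """Average degree of heterogeneity per period.

    forecasts[j][t][k] is the forecast of subject k for period t in
    experiment (group) j. For every period, the standard deviation of
    forecasts within each group is computed and then averaged over groups.
    """
    n_periods = len(forecasts[0])
    result = []
    for t in range(n_periods):
        # Assumption: population standard deviation (1/K normalization).
        group_sds = [pstdev(group[t]) for group in forecasts]
        result.append(fmean(group_sds))
    return result


# Made-up data: J = 2 groups, 2 periods, K = 3 subjects per group.
forecasts = [
    [[4.0, 5.0, 6.0], [5.0, 5.0, 5.0]],
    [[2.0, 5.0, 8.0], [5.0, 5.0, 5.0]],
]
heterogeneity = average_heterogeneity(forecasts)
```

In the made-up data all subjects agree in the second period, so the measure falls to zero there, which is the pattern one would associate with coordination on a common forecast.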

Beyond the Laboratory Setting
In contrast to experiments with human subjects, additional experiments with genetic algorithms can be conducted at essentially zero cost. In this section we extend our previous experiments in various directions not covered by the laboratory experiments. Among other things, we investigate long run price behaviour and how price behaviour depends upon parameter values, in particular the parameter tuning the nonlinearity of the supply curve, and we investigate the consequences of an increase in the number of agents and forecasting strategies in the GA populations. Table 2 summarizes the long run statistics for all three treatments and three different mutation probabilities, p_mut = 0.01, 0.025 and 0.05. As can be seen, in the stable case the long run average degree of heterogeneity, Var(p^e), is small and price volatility is quite close to the RE benchmark 0.25. In the unstable treatment price volatility is slightly above the RE variance, while in the strongly unstable treatment the long run price variance is significantly above the RE benchmark due to a larger average degree of heterogeneity. The strongly unstable treatment thus exhibits persistent heterogeneity and long run excess price volatility. Moreover, both the average degree of heterogeneity and excess price volatility increase with the mutation probability. This seems intuitively clear, since a higher mutation probability leads to a higher rate of new forecasting rules entering the population.

Parameter sensitivity
In the next set of simulations we explore the transition between the 'nice' price behaviour of the 'stable' treatment and the excessive price volatility of the unstable treatments. Recall that the difference between these treatments is the parameter λ tuning the nonlinearity of the supply curve. We ran the same type of GA experiments with 800 different values of the slope parameter λ ranging from 0.005 to 4 (with increments of 0.005). Fig. 6 reports the mean values and variances of realized prices over 50,000 rounds together with their RE benchmark. It turns out that the variance of realized prices is close to its RE benchmark of 0.25 only for very small values of λ, with an almost perfectly linear increase with λ thereafter. 12 In contrast, the average price stays close to its RE benchmark over the whole range of our experiments. While there appears to be a slightly increasing wedge between the average price and the RE solution for increasing λ, the deviation is always very small compared to the difference between the realization of the second moment and its RE benchmark. We conjecture that this increasing wedge might be more an artifact of our simulation design than a 'true' indication of (small) deviations of the mean price from the rational expectations price. Since p_RE is slightly above the center of the admissible range [0, 10], larger fluctuations would generate some asymmetries in realized prices with a slight dominance of lower rather than higher prices. The slight deviation from RE in the first moment (which remains smaller than 2 percent in all scenarios) would then be a numerical consequence of the large deviation in the second moment from its RE benchmark.
Notes (Table 2): Long run simulations with K = 6 GA agents (chromosomes) and different mutation probabilities p_mut. The first and second moments for market prices, Mean(p) and Var(p), are computed from simulations over 50,000 time steps after discarding the first 10,000 observations as transient sample. Mean(p^e) is the mean over the whole simulation of the average forecast across the 6 'agents' in each period. The average degree of heterogeneity, Var(p^e), has been computed according to (10), averaged over T = 50,000 periods after a transient of 10,000 periods.
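The truncation argument for the small wedge in the first moment can be illustrated with a stylized calculation. The sketch below is not the GA simulation: it simply draws Gaussian values around a center slightly above the midpoint of [0, 10] and clips them to the admissible range. The center 5.7, the standard deviations and the function name are all hypothetical choices made for illustration.

```python
import random
from statistics import fmean


def clipped_mean_shift(center, sd, low=0.0, high=10.0, n=50_000, seed=0):
    """Mean of the clipped draws minus the mean of the same unclipped draws."""
    rng = random.Random(seed)
    raw = [rng.gauss(center, sd) for _ in range(n)]
    clipped = [min(high, max(low, x)) for x in raw]
    return fmean(clipped) - fmean(raw)


# Small fluctuations never reach the bounds, so clipping changes nothing.
small = clipped_mean_shift(center=5.7, sd=0.5)
# Large fluctuations hit the nearer upper bound more often than the lower
# one, which pulls the average of the clipped values downwards.
large = clipped_mean_shift(center=5.7, sd=3.0)
```

Under these assumptions the downward shift appears only when the fluctuations are large, in line with the conjecture that the deviation in the first moment is a by-product of the deviation in the second moment.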

Simulations with many agents
Another set of Monte Carlo runs investigates what happens if we increase the pool of participants in our forecasting experiments. While laboratory settings are typically restricted to small numbers of agents due to technical restrictions, the availability of subjects and the costs of running large experiments, we can easily extend our previous GA setting to much larger numbers of artificial agents. Since we have normalized supply by dividing by the number of firms in eq. (1), the RE benchmarks for first and second moments remain the same for all population sizes, K. Table 3 compares the results for population numbers K ∈ {6, 12, 30, 100, 150, 600}. Initialization of the GA simulation is based on the first and second period individual forecasts in the experiments with K = 6 subjects in Hommes et al. (2007) as well as the experiments with K = 12 subjects in the strongly unstable treatment in van de Velden (2001). For the first period, all experiments (with 6 or 12 subjects, with stable, unstable and strongly unstable treatments) have been pooled and the resulting distribution has been fitted with a normal N(5.271, 1.393). Second period forecasts in the experiments differ between treatments, but are very similar for the 6 and 12 subjects cases in the strongly unstable treatment. We therefore pooled the forecasts of period 2 over all experiments of each treatment and fitted Normal distributions N(5.279, 1.698) for the stable treatment (only cases with 6 subjects), N(5.952, 1.266) for the unstable treatment (only cases with 6 subjects), and N(6.885, 1.225) for the strongly unstable treatment (pooled over all experiments with 6 or 12 subjects). The above experiments with K = 6 to K = 600 agents have been initialized by random draws from the pertinent Normal distribution, i.e., the same over all settings in period 1 but the treatment-dependent ones for period 2.
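The initialization step can be sketched as follows, using the fitted parameters quoted above. Several details are assumptions rather than statements about the original simulations: the function and constant names are hypothetical, the text does not say whether the second parameter of N(·, ·) is a standard deviation or a variance (the sketch defaults to a standard deviation and exposes a flag for the other reading), and no truncation to an admissible range is applied.

```python
import random

# Fitted (first parameter, second parameter) pairs quoted in the text.
PERIOD1 = (5.271, 1.393)
PERIOD2 = {
    "stable": (5.279, 1.698),
    "unstable": (5.952, 1.266),
    "strongly_unstable": (6.885, 1.225),
}


def initial_forecasts(treatment, n_agents, seed=None, second_is_variance=False):
    """Draw period-1 and period-2 initial forecasts for n_agents agents."""
    rng = random.Random(seed)

    def draw(params):
        mean, spread = params
        # Assumption: the second parameter is a standard deviation unless
        # the caller states that it is a variance.
        sd = spread ** 0.5 if second_is_variance else spread
        return [rng.gauss(mean, sd) for _ in range(n_agents)]

    # Period 1 uses the pooled distribution; period 2 is treatment-specific.
    return draw(PERIOD1), draw(PERIOD2[treatment])


period1, period2 = initial_forecasts("strongly_unstable", n_agents=600, seed=42)
```

Passing a seed makes the draws reproducible, which is convenient when comparing population sizes across otherwise identical runs.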
Apparently, higher numbers of agents have a tendency to dampen fluctuations. While there is not much difference in the experiments with stable slope parameter λ = 0.22, the effect is more pronounced in the unstable and strongly unstable scenarios. The stable case stays close to the RE benchmark for all sizes of the population with the variance of price fluctuations close to the variance of the random term. In the other cases, the excess fluctuations are clearly reduced when increasing the size of the population. In the 'unstable' case (λ = 0.5) price volatility decreases from Var(p) = 0.556 for K = 6, to Var(p) = 0.362 for K = 30, already fairly close to the RE benchmark of 0.25. In the strongly unstable case (λ = 2) price volatility is reduced by more than 50% for K = 600, but is still significantly higher than the RE benchmark (Var(p) = 0.694). In the strongly unstable treatment in the short run, i.e. for the first 50 periods, excess volatility thus persists when increasing the number of agents to K = 600.
Another striking feature of these GA-simulations is that an increase of the number of agents beyond K = 30 has little effect upon aggregate behavior. In all treatments, price volatility and the average degree of heterogeneity drop significantly when the number of agents is increased from K = 6 to K = 30, but hardly drop when the number of agents is further increased from K = 30 to K = 600. This suggests that it may be possible to study macro phenomena in relatively small laboratory experiments with about 30 subjects, a size that is manageable in most experimental laboratories. See also the discussion about the relevance of laboratory experiments in macro in e.g. Duffy (2008). Table 4 gives an overview of the same statistics in long run simulations, based on 50,000 periods, after a transient of 10,000 periods. Both the stable and the unstable treatments converge to the RE benchmark, with price volatility of Var(p) = 0.255 and Var(p) = 0.269 respectively for K = 600. The strongly unstable treatment also approaches the RE benchmark relatively closely in the long run, although price volatility, Var(p) = 0.359 for K = 30 and Var(p) = 0.315 for K = 600, is still more than 25% above the RE benchmark.
The laboratory experiments provide a simple and stylized framework that is stationary for 50 periods. In real markets with fluctuating prices, one would perhaps expect larger exogenous shocks to occur occasionally. From this perspective, what we have called the short run, i.e. the first 50 periods, could be more relevant to real markets than the long run where the market is stationary for a very long time.
The decrease of volatility with increasing population is probably easy to explain: adding more agents evokes a law of large numbers. Since our GA agents are effectively independent stochastic processes, their individual fluctuations should be averaged out when aggregating over more and more individuals. 13 This is what seems to happen in our experiments. Note, in particular, the strong decrease of the variance of average price expectations in all settings. Of course, if price expectations converged to the RE benchmark, realized prices would only fluctuate due to the exogenous random noise component.
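The averaging argument can be illustrated with an idealized sketch in which individual forecasts are independent Gaussian draws, so that the variance of the average forecast falls roughly as 1/K. This deliberately ignores the feedback from forecasts to prices and the GA dynamics themselves. The function name, the unit standard deviation and the number of periods are hypothetical choices.

```python
import random
from statistics import fmean, pvariance


def variance_of_average_forecast(n_agents, n_periods=500, sd=1.0, seed=1):
    """Variance over time of the cross-sectional average of independent forecasts."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_periods):
        # Assumption: forecast deviations are i.i.d. across agents and periods.
        forecasts = [rng.gauss(0.0, sd) for _ in range(n_agents)]
        averages.append(fmean(forecasts))
    return pvariance(averages)


# Population sizes taken from the range considered in the text.
results = {k: variance_of_average_forecast(k) for k in (6, 30, 600)}
```

Under full independence the variance keeps shrinking with K. In the simulations reported in the text, by contrast, price volatility cannot fall below the variance of the exogenous noise, which is one reason why the gains beyond about 30 agents are small there.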

Simulations with many rules
Finally, Table 5 reports results of GA-learning simulations where the number of agents has again been fixed to K = 6, while the number of rules, M, available to each agent increases from M = 10 to M = 60.
Note that in a GA setting, the rules in an individual's set need not necessarily be different. In fact, convergence of the GA would imply that the population of rules of an agent becomes fully homogeneous. Increasing M thus does not necessarily mean that an individual has more different rules in each period; it only increases the potential sophistication of the set of rules. As it turns out, at least in the 'strongly unstable' treatment, this higher sophistication leads to an increase of the volatility of realized (as well as expected) prices. With M = 60 rules per agent, price volatility has almost doubled from Var(p) = 0.824 to Var(p) = 1.546. In contrast, variation of the number of rules M seems to leave the results of the stable and unstable treatments almost unchanged. We conjecture that the higher number of chromosomes allows agents to react more easily to price fluctuations around the RE benchmark. With a high λ, under naive expectations a small deviation from p_RE would lead to a step-wise increase or decrease of the price for some time. Autocorrelation detection by some agents could reinforce this tendency as they would already forestall the direction of the subsequent price changes. With a large number of chromosomes, the chances of evolving such momentarily advantageous rules increase. If such rules are admitted to the population, they would enhance fluctuations. It might, therefore, be a mixture of 'naive' adaptation of some agents (modifications of α_i) and trend chasing of others (adapting α_i and β_i) that generates the higher volatility in this case. Unfortunately, a systematic analysis of the interplay between the number of agents and their behavior in experimental settings is beyond the limit of available laboratory resources. Given the autonomous adaptation of human subjects to different environments, it is not clear whether their learning behavior would remain unchanged in groups of different sizes.
Our simulations suggest that, at least in the strongly unstable treatment, an increase of the number of learning rules per agent may be a potentially destabilizing force counterbalancing the stabilizing force of an increase in the number of agents in the market. It therefore seems possible that changes of behavior might compensate for the law-of-large-numbers tendency in larger groups.

Concluding Remarks
Genetic algorithm learning of simple forecasting strategies provides an accurate description of individual expectations at the micro level and, at the same time, the interaction of these individual rules matches observed aggregate price behavior at the macro level surprisingly well. In the simple framework of the classical cobweb model, the interaction of individual GA-learning rules is able to reproduce all stylized facts in aggregate prices (correct sample mean, excess volatility depending on demand/supply characteristics, and no linear predictability) observed in recent learning to forecast laboratory experiments with human subjects. In contrast to homogeneous learning rules, the interaction of heterogeneous GA-learning rules explains all stylized facts simultaneously and across various treatments. It should be emphasized that these results are robust and not sensitive to the GA specification or the two GA parameters (the mutation probability p_mut and the crossover probability p_cross). The GAs attempt to learn two parameters, the sample mean and the first order autocorrelation coefficient, in a simple AR(1) forecasting rule. Evolutionary selection within a simple class of individual forecasting heuristics that take into account both the observed sample mean and the first order sample autocorrelation thus explains aggregate price behavior surprisingly well.
We have also looked at the average degree of heterogeneity of individual forecasting behaviour. In all treatments, heterogeneity decreases over time. In the stable treatment heterogeneity quickly disappears and the price settles down to its RE benchmark. In the (strongly) unstable treatment heterogeneity decreases somewhat due to learning, but heterogeneity persists, even in the long run. These results suggest that economic theory needs to go beyond representative agent models with homogeneous expectations. The matching of our GA-simulation results with laboratory experiments is consistent with a theory of endogenous selection of heterogeneous expectations, for example, as in Evans and Ramey (1992), Brock and Hommes (1997) and, more recently, in Reis (2006).
Fitting a GA-learning model to the laboratory experiments allows one to go beyond experiments and simulate alternative and more realistic market environments. Through GA-simulations, we have seen that adding more agents to the market has a stabilizing effect, that is, price volatility decreases as the number of agents increases. However, in the strongly unstable treatment, excess price volatility persists when the number of agents becomes large. On the other hand, increasing the potential sophistication by allowing more strategies per individual has a destabilizing effect and makes price behavior more volatile. Additional laboratory experiments could reveal more information about the number of strategies subjects are using, in order to explore which of these two forces will dominate.