Written by Bill Price
In a recent discussion on Biofortified, the question was raised of whether small scale research plots can represent real world results. For reasons of experimental control, practicality, and economy, the majority of agricultural research is carried out at smaller scales, i.e. in growth chambers, greenhouses, and small field plots. Almost uniformly, the results of such studies are extrapolated to larger “field” level scales for reporting purposes. While this translation may seem like a straightforward conversion, it can have considerable effects on the interpretation and inference made from the research. Specifically, it is important to understand how error rates at the small scale carry over and affect the larger scale results.
In this post, I will use a journal article cited in the discussion above (Elmore, et al. Glyphosate-Resistant Soybean Cultivar Yields Compared with Sister Lines, Agron. J. 93:408–412, 2001; Accessed from: http://digitalcommons.unl.edu/agronomyfacpub/29/). This article examined the effects of genetically engineered herbicide resistance on production characteristics of several soybean varieties. While multiple varieties and herbicides were considered in the research, those of interest here are the lines genetically altered for glyphosate resistance (GR) and their corresponding “sister” lines which were genetically similar with the exception of having no glyphosate resistance (Non-GR).
Table 5 in the article presents average comparisons for these two varietal groups. The yield of Non-GR lines is given as 3.68 Mega-grams (Mg) per hectare while that of GR lines is 3.48 Mg per hectare. Statistical significance between these two groups is not explicitly shown due to an apparent typographic error in the table (no letter designation is given for the GR group). A standard error (SE) of 0.08 is reported; however, it is not clear whether this represents the error of the means themselves or the error of the mean comparison, 3.68 – 3.48 = 0.20 Mg per hectare. In order to proceed with this discussion I will assume that the standard error is for the contrast itself. This appears to be consistent with the presentation of other tables in the paper and is the “best case” scenario for the researchers with respect to the variability of the data. From here, we can see that an approximate 95% confidence interval on the difference in means (roughly the difference ± 2*SE, or 0.04 to 0.36) would almost, but not quite, cover zero. This implies some significance, although marginal. Still, the difference of 0.20 Mg or 200 kg per hectare would not be negligible to a producer, especially when compounded over several hectares. Perhaps this is a case where statistical tests belie a real practical difference, as recently covered on Biofortified by David Tribe in GMO statistics Part 10: the King of Hearts is NOT equivalent to the King of England.
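The interval arithmetic is simple enough to check by hand; a minimal sketch, using only the means and SE reported in Table 5 and my assumption that the SE applies to the contrast itself:

```python
# Approximate 95% confidence interval for the yield difference,
# assuming (as discussed above) the reported SE of 0.08 is the SE
# of the contrast between varietal groups.
non_gr = 3.68  # Mg/ha, Non-GR sister lines (Table 5)
gr = 3.48      # Mg/ha, GR lines (Table 5)
se = 0.08      # Mg/ha, assumed SE of the difference

diff = non_gr - gr
lower = diff - 2 * se
upper = diff + 2 * se
print(f"difference = {diff:.2f} Mg/ha")
print(f"approx. 95% CI: ({lower:.2f}, {upper:.2f}) Mg/ha")  # (0.04, 0.36)
```

Because the lower bound sits just above zero, the difference is marginally significant under this assumption.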
At this point, it is useful to consider how the experiments were carried out, what was actually measured, and how the data were collected. In the methods section we find that the soybeans were grown in typical field plots measuring 4 rows wide and 9.1 m (30 feet) long. The rows were on a standard soybean production spacing of 0.76 m (30 inches). In order to provide a buffer from adjacent plots, only the center two rows were harvested. It is not mentioned whether a buffer or border zone was used between adjacent plots within a row; however, I will assume a 5 foot border here, as this is similar to standard practice in such studies and does not greatly affect the demonstration given here. This leaves us with an effective plot size of 60 inches x 25 feet, or 11.613 m². Through the magic of metric conversion, it turns out that the yield numbers reported in Table 5 also represent the units of 100 g per square meter. This, of course, leads us to plot level yields. For the example above, the difference in varietal groups, 20 g per square meter, is equivalent to 232.3 g per plot. Conveniently, the authors have also supplied us with seed size information in Table 5; thus, assuming an average seed size of 0.144 g per seed, the difference observed was approximately 1613 seeds.
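For readers who want to trace the conversion, the chain of unit arithmetic above can be sketched as follows (the 5 foot border is my assumption; plot dimensions and seed weight come from the paper):

```python
# Converting the reported per-hectare yield difference down to the plot level.
plot_width_m = 60 * 0.0254    # two center rows at 30 in spacing = 60 in
plot_length_m = 25 * 0.3048   # 30 ft plot minus an assumed 5 ft border
plot_area_m2 = plot_width_m * plot_length_m   # ~11.613 m²

# 1 Mg/ha = 1,000,000 g per 10,000 m² = 100 g/m²
diff_g_per_m2 = 0.20 * 100                      # 20 g/m²
diff_g_per_plot = diff_g_per_m2 * plot_area_m2  # ~232.3 g per plot

seed_weight_g = 0.144                           # average seed size (Table 5)
diff_seeds = diff_g_per_plot / seed_weight_g    # ~1613 seeds

print(f"plot area: {plot_area_m2:.3f} m²")
print(f"difference: {diff_g_per_plot:.1f} g/plot, about {diff_seeds:.0f} seeds")
```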
This still seemed fairly substantial, but was difficult for me to visualize. To satisfy my curiosity, I took a trip to the local co-op and picked up some soybeans. From these I determined that 1600 seeds is equivalent to approximately 420 ml (~1.7 cups) in volume, or about what you could hold in two hands. Using similar computations, the 2*SE used in the confidence interval above translates to about 1300 seeds, or 340 ml (~1.4 cups). A cup and a half of seed translates to a potential difference in metric tons between varietal groups at the production level. How is this happening? First consider the process of plot harvesting. The paper states that a small plot harvester was used for this purpose. For those not familiar with these machines, they are usually scaled down, car sized versions of full size combines. They are complete replications of larger machines, having a sickle bar cutter and reel to collect plant material, which is then passed through the machine where the debris and chaff are separated from the seed. An operator sits on top controlling the direction, speed, cutter height, etc. Often, a second person will ride or walk alongside the harvester catching the seed from each plot in a bucket or grocery sack. Harvesting is a dirty, dusty business subject to human failures. It is not hard to imagine the loss of a cup or two of seed over 25+ feet of plot during this process. Seeds can be dropped, missed, shattered to the ground by the cutter/reel, or simply blown out the back with the chaff if the settings on the sieves are incorrect. Care must also be taken to pause the combine between plots in the border zone (typically mowed down prior to harvest) in order to allow the combine to finish thrashing and processing the plot material. Matters can be further exacerbated if the seed from each plot is run through a seed cleaner prior to weighing. Of course, on top of all this, there is variation due to spatial location and arrangement, micro-climates, etc.
These are all sources of variation that skilled researchers strive to minimize. The problem with scaling small plot results to full scale production levels is that the errors encountered in plot harvesting either do not occur in full scale scenarios or, when they do occur, they do not scale up proportionally. The proportion of seed missed relative to the total amount taken in, for example, can be much higher for a small machine compared to a full size machine. Small scale spatial variation is much more influential on small plot measurements compared to those taken across a wide area. A common way to measure these differences is the coefficient of variation (CV), the ratio of the standard deviation of the response to its mean. In full scale production, this ratio is typically much smaller than the corresponding values from small scale research. In other words, small variations can have a large influence in research data, but similarly scaled errors are not likely to occur at the field level.
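The CV comparison can be made concrete with a small illustration. The numbers below are hypothetical, chosen only to show how the same mean yield with different variability produces very different CVs at plot scale versus field scale:

```python
def cv(std_dev, mean):
    """Coefficient of variation as a percentage: std dev relative to the mean."""
    return 100 * std_dev / mean

# Hypothetical values for illustration only: a small-plot trial with
# substantial plot-to-plot variability, and a whole-field measurement
# whose larger area averages out much of that local variation.
plot_cv = cv(std_dev=0.35, mean=3.5)    # small-plot trial, Mg/ha
field_cv = cv(std_dev=0.10, mean=3.5)   # field-level measurement, Mg/ha
print(f"plot CV: {plot_cv:.0f}%, field CV: {field_cv:.1f}%")
```

The same mean response carries a much larger relative error at the plot scale, which is why extrapolating plot-level variability directly to field level can mislead.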
To be clear here, I in no way mean to be critical of these particular researchers. By all accounts they have carried out a set of designed studies to the best of their abilities. I picked this article because of its relevance, convenience, and reported information. It should be generally noted, however, that research methods have their limits in resolution, and these limitations can translate to apparently large real world differences. The interpretation of such differences should be considered with caution.
So, given these difficulties, of what use are small scale studies? How should we interpret their findings? While I hope I have shown that we should use caution and common sense when extrapolating to field scale levels, small scale studies have much more value than that. Interpretation of results within a study, whether re-scaled or not, is always important. Ranking and comparison of treatments, varieties, or other experimental effects are usually unaffected by these problems (see for example http://www.colostate.edu/depts/prc/pubs/ComparisonOfLargePlot_KL.pdf). Small scale trials also allow researchers to control for outside influences such as environmental conditions, in order to more accurately measure experimental treatment effects. They are indispensable for “proof of concept” experiments where the objective is to isolate a given process or test a specified hypothesis. As part of the scientific method, small scale studies play an important role helping researchers refine and define their research problems. Often these results are then expanded to larger scale trials, thereby encompassing a wider array of potential variation and allowing better assessment of their viability in the real world.
Too often, however, the initial small scale results are picked up, extrapolated, and used by interested parties without consideration of potential problems. Consider the conclusion drawn here by the authors: “Glyphosate resistant sister lines yielded 5% (200 kg ha⁻¹) less than the non-GR sisters (GR effect).” This conclusion has also been cited as definitive evidence of yield drag in the widely distributed report Failure to Yield by Doug Gurian-Sherman (Union of Concerned Scientists). Yet, we have seen that this difference could have been as low as 40 kg per hectare. Reporting or interpreting the field level extrapolations without acknowledging the variability is misleading. A slight increase in the variability (0.02 Mg or ~160 seeds) would have led to a conclusion of non-significance. Stating that a difference was found is fine, but stating unequivocally that a loss of 200 kg per hectare can be expected is not. As consumers of research results we must be aware of the limitations of small scale trials and correctly assess the interpretations of the extrapolations we make from them.
Written by Guest Expert
Bill Price has a PhD in plant science. He has worked in agricultural research for nearly 40 years and is currently a statistician in the College of Agriculture at the University of Idaho. His work includes diverse topics including but not limited to dairy science, human nutrition, weed science, and benthic microbiology.
I’m thinking the small-scale field trials are more reliable than farm-scale field trials. On small plots, it would be easier to ensure the GM crop and its non-GM comparator are grown in soil of equal quality.
In farm-scale fields, soil quality generally varies substantially, i.e., enough to throw off an attempt to interpret the results. The variation is visually apparent to whoever is running the combine.
More reliable in what sense? For measures of plant characteristics, they (small plots) can be, but for field level measurements there can be many problems, one of which I outlined above. The field-scale variability you refer to can be an issue, for example, if the plots are located in an atypical environment, say overly fertile or the opposite. For plot to plot comparisons, this is fine, but for extrapolation to field level, it is not and will lead to biased estimates. The field-scale variability is, in fact, exactly what you want to see in a field level trial, allowing testing over a range of environmental conditions.
Each system has its strengths and weaknesses. We just need to remember those when looking at the data.
The discussion seems to assume that the small scale studies are done in lieu of “real world” experience. I feel that often the controlled small scale studies are done after “real world” observations are reported. The small scale studies are often done in an attempt to isolate the factor or factors that resulted in the “real world” observations. This is normally seen in the Introduction section of the publication as the justification for doing the research.
No, I don’t think that is implied at all. Small scale studies are done in lieu of large scale studies, not the real world, for the reasons outlined in the quote. The motives may be reactionary to “Real World observations(TM)” or not. Problems can arise when we project the results from either setting to the real world. Experiments are just models of the real world and, hence, by definition, have limitations. We need to understand what those are.
I think that the Nebraska paper that was cited in the lead article is an illustration of my point.
First there are the real world reports of a yield drag.
They also quoted some unpublished research that found no yield gain, but since this “research” was never published, I have not included it.
Then they design experiment(s) to isolate the possible causes.
Then they analyze the results using acceptable statistical methods and report their conclusions.
Please notice that they stated “the potential”.
The original paper in this thread stated the following:
If one reads the actual Gurian-Sherman paper, one finds the following:
How can that statement be considered a justification for the statement in the lead article?
I am not really sure what you are arguing here, Henry. Yes, a real problem was observed, and yes, they did a decent experiment to assess it. Yes, they do qualify their results as you cite. It is unfortunate, however, that they did not use such phrasing right up front, in the actual paper, in the abstract, which will likely be the most quoted, cited, referenced section of a paper. There they say:
“Glyphosate resistant sister lines yielded 5% (200 kg ha⁻¹) less than the non-GR sisters (GR effect).”,
implying they measured an absolute 5% or 200kg per hectare change. They did not. They extrapolated this figure from the amount of seed you could hold in your hands. Not everyone citing these things reads them as deeply as you do. I wish they did, but they do not.
I am not saying they didn’t see something. I am saying we should all be careful with such large extrapolations. This is true whether we are trying to draw a line between cells in a petri dish to an embryo in vivo or a cageful of rats to the human population in general.
Citing results as per hectare is like citing speed as per hour. I doubt that people in agriculture would interpret that they utilized 2.471 acres (4 significant figures) and found 200. kg less (3 significant figures). The 5 % number (instead of 5.9 or 5.99 etc.) should indicate to the scientifically trained reader that they are reporting data to a precision of approximately 1 part in 20. (Also writing 200 without a decimal point is not indicating 1 part in 200.)
I.e., the large extrapolation problem is yours, not theirs.
Sorry to disagree, Henry, but not all who see this are people in agriculture or scientists. Greenpeace, for example, has used the percentage values to conclude RR soybeans have cost US farmers millions of tons in output.
The extrapolation problem is everyone’s when taken out of context.
I have seen the 5% figure touted by many anti-GE groups, such as Greenpeace, The Center for Food Safety, and the Organic Center. They have extrapolated the experiment to the whole of GE soy crops and drawn broad sweeping conclusions from it. It is good to note that although Failure to Yield from the UCS also referred to it, they acknowledged that it didn’t take into account weed issues in non-GE soy, and their conclusion was that when you take that into account, GE soy didn’t yield significantly more nor less.
The Greenpeace link does not work. From a Google search I found this one:
Notice the use of “also”, i.e., the main 5% figure came from an analysis of multiple field trials, not just from the supporting Elmore and colleagues experiment.
I think it is important to introduce the Benbrook paper link.
According to Google Scholar it has been cited by 68 more recent scientific papers.
Here is one that specially discusses the Benbrook paper in detail:
This paper has been cited by 14 more recent papers.
I honestly don’t even think the 5% yield drag associated with RR soy is even that controversial – Monsanto did, after all, spend millions developing a 2nd generation product which didn’t have the drag (although as Monsanto are profiting from that… perhaps the drag was a conspiracy all along)
All PDiff is pointing out is that a 5% drag isn’t necessarily what you’ll see all the time – essentially what should be reported is a 0-10% drag (which is what the initial paper discussed illustrates – as far as I recall one RR variety beat its sister line) – the all important piece there is the plot, which clearly illustrates that on the whole RR plots performed under the control (similar plots are what are used in yield comparisons pitting variety vs variety – and they’re used because they’re what work)
You’ll note Henry’s Greenpeace quote above only mentions the average and worst case scenarios – which is clearly indicative of intellectual dishonesty (on GP’s behalf)
The following was stated:
H. Kuska comment. The 10% was not the worse case scenario???? Did you read the cited reference?
Please compare the above quote with what Greenpeace wrote:
No Henry, I read the quote you posted – which clearly states
This is an intellectually dishonest approach – do they mention that in some locations the roundup ready yields beat their sister lines? No, they cite the lowest available data points and the average but fail to mention the highest data points.
I’m not sure what your whole “The 10% was not the worse case scenario???? ” bit was about – That many question marks, to paraphrase Terry Pratchett rather badly, is the sign of a sick sick mind (this is humour before you ask that punitive measures be taken).
Ewan, the Executive Summary that Benbrook gave was presented by me earlier. I will repeat it here with my lead-in question as to whether you had read it:
Please note that the Greenpeace article did not select the most favorable 6.7%, or the next most favorable 6.1%, that Benbrook cited. Instead they went with the overall average:
Benbrook then mentioned that in some areas the yield difference was 10% or higher. The Greenpeace article just cited what Benbrook stated in the Executive Summary about some areas having a higher yield difference. Greenpeace did give the full reference. The Benbrook summary did not mention that there were some fields where the RR variety outyielded the conventional variety, probably because the overall result for each area did not support the individual reversals. In the actual article he devotes a section to this subject:
(Multiple question marks indicate disbelief.)
The Benbrook paper has been legitimately questioned as to selective use of data and mysteriously changing (inflated) numbers. See here.
I apologize for the broken GP link. The link you give is the one I tried to reference. Regardless of where the 5 or 10% came from, it is ludicrous to hyper-project those values over multiple years and locations to billions of dollars lost. In doing so, they have, as I anticipated, directly ignored the tentative language used by Elmore et al. and Benbrook regarding weed free trials, as well as the phrasing “potential” of Elmore et al. you pointed to earlier.
I already gave the link and the conclusion section of the Carpenter 2001 report. I had read the whole report, and when I came to the section about questions concerning the numbers, my immediate thought was why this author did not contact Benbrook for clarification. (If this had been published in a reviewed scientific journal, I expect that the reviewers/editor would have insisted on it.)
It is not ignoring the weed free trials limitation of Elmore that is important as the main data came from Benbrook, it is ignoring the following from Benbrook:
I.e., Greenpeace used 1998 data to estimate losses in the 2006-2009 period. They ignored the possibility suggested by the “future breeding enhancements” comment of Benbrook. This is a completely different deficiency than what your original “article” discusses.