The "Class Level Information" table lists the levels of the variables specified in the CLASS statement and the ordering of the levels (Output 38.1.2). The "Number of Observations" table displays the number of observations read and used in the analysis.
Output 38.1.2 Class Level Information and Number of Observations
The "Dimensions" table lists the size of relevant matrices (Output 38.1.3).
Output 38.1.3 Model Dimensions Information in GLM Analysis
Because of the absence of G-side random effects in this model, there are no columns in the Z matrix. The 21 columns in the X matrix comprise the intercept, 4 columns for the block effect, and 16 columns for the entry effect. Because no RANDOM statement with a SUBJECT= option was specified, the GLIMMIX procedure does not process the data by subjects (see the section Processing by Subjects for details about subject processing).
The "Optimization Information" table provides information about the methods and size of the optimization problem (Output 38.1.4).
Output 38.1.4 Optimization Information in GLM Analysis
With few exceptions, models fit with the GLIMMIX procedure require numerical methods for parameter estimation. The default optimization method for (overdispersed) GLM models is the Newton-Raphson algorithm. In this example, the optimization involves 19 parameters, corresponding to the number of linearly independent columns of the X matrix.
The "Iteration History" table shows that the procedure converged after 3 iterations and 13 function evaluations (Output 38.1.5). The Change column measures the change in the objective function between iterations; however, this is not the monitored convergence criterion. The GLIMMIX procedure monitors several features simultaneously to determine whether to stop an optimization.
Output 38.1.5 Iteration History in GLM Analysis
The "Fit Statistics" table lists information about the fitted model (Output 38.1.6). The -2 Log Likelihood values are useful for comparing nested models, and the information criteria AIC, AICC, BIC, CAIC, and HQIC are useful for comparing nonnested models. On average, the ratio between the Pearson statistic and its degrees of freedom should equal one in GLMs. Values larger than one indicate overdispersion. With a ratio of 2.37, these data appear to exhibit more dispersion than expected under a binomial model with block and varietal effects.
Output 38.1.6 Fit Statistics in GLM Analysis
The "Parameter Estimates" table displays the maximum likelihood estimates (Estimate), standard errors, and t tests for the hypothesis that the estimate is zero (Output 38.1.7).
Output 38.1.7 Parameter Estimates in GLM Analysis
The "Type III Tests of Fixed Effects" table displays significance tests for the two fixed effects in the model (Output 38.1.8).
Output 38.1.8 Type III Tests of Block and Entry Effects in GLM Analysis
These tests are Wald-type tests, not likelihood ratio tests. The entry effect is clearly significant in this model with a p-value of <0.0001, indicating that the 16 wheat varieties are not equally susceptible to infestation by the Hessian fly.
Example 38.1 Binomial Counts in Randomized Blocks
In the context of spatial prediction in generalized linear models, Gotway and Stroup (1997) analyze data from an agronomic field trial. Researchers studied 16 varieties (entries) of wheat for their resistance to infestation by the Hessian fly. They arranged the varieties in a randomized complete block design on an 8 x 8 grid. Each quadrant of that arrangement constitutes a block.
The outcome of interest was the number of damaged plants (Y_ij) out of the total number of plants growing on the unit (n_ij). The two subscripts identify the block (i = 1, ..., 4) and the entry (j = 1, ..., 16). The following SAS statements create the data set. The variables lat and lng denote the coordinates of an experimental unit on the grid.
Analysis as a GLM
If infestations are independent among experimental units, and all plants within a unit have the same propensity for infestation, then the counts Y_ij are binomial random variables. The first model considered is a standard generalized linear model for independent binomial counts:
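In the notation assumed here (subscripts i for block and j for entry, as introduced above), the model can be written as

    Y_ij ~ Binomial(n_ij, pi_ij)
    logit(pi_ij) = eta_ij = beta_0 + b_i + tau_j

where pi_ij is the probability of infestation, b_i are fixed block effects, and tau_j are fixed entry effects.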
The PROC GLIMMIX statement invokes the procedure. The CLASS statement instructs the GLIMMIX procedure to treat both block and entry as classification variables. The MODEL statement specifies the response variable and the fixed effects in the model. PROC GLIMMIX constructs the X matrix of the model from the terms on the right side of the MODEL statement. The GLIMMIX procedure supports two kinds of syntax for the response variable. This example uses the events/trials syntax. The variable y represents the number of successes (events) out of n Bernoulli trials. When the events/trials syntax is used, the GLIMMIX procedure automatically selects the binomial distribution as the response distribution. Once the distribution is determined, the procedure selects the link function for the model. The default link for binomial data is the logit link. The preceding statements are thus equivalent to the following statements:
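A sketch of the equivalent statements, with the distribution and link function made explicit (the data set name HessianFly is assumed; the data step is not shown above):

```sas
proc glimmix data=HessianFly;
   class block entry;
   /* DIST= and LINK= state explicitly what the events/trials
      syntax selects by default */
   model y/n = block entry / dist=binomial link=logit solution;
run;
```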
The SOLUTION option in the MODEL statement requests that solutions for the fixed effects (parameter estimates) be displayed.
The "Model Information" table describes the model and methods used in fitting the statistical model (Output 38.1.1).
The GLIMMIX procedure recognizes that this is a model for uncorrelated data (the variance matrix is diagonal) and that parameters can be estimated by maximum likelihood. The default method for computing the denominator degrees of freedom for F tests and t tests is the RESIDUAL method. This corresponds to choosing n - rank(X) as the degrees of freedom, where n is the sum of the frequencies used in the analysis. You can change the degrees-of-freedom method with the DDFM= option in the MODEL statement.
Output 38.1.1 Model Information in GLM Analysis
Analysis with Random Block Effects
There are several possible reasons for the overdispersion noted in Output 38.1.6 (Pearson ratio = 2.37). The data might not follow a binomial distribution, one or more important effects might not have been accounted for in the model, or the data might be positively correlated. If important fixed effects have been omitted, then you might need to consider adding them to the model. Because this is a designed experiment, it is reasonable not to expect further effects apart from the block and entry effects that represent the treatment and error control design structure. The reasons for the overdispersion must lie elsewhere.
If overdispersion stems from correlations among the observations, then the model should be appropriately adjusted. The correlation can have multiple sources. First, it might not be the case that the plants within an experimental unit responded independently. If the probability of infestation of a particular plant is altered by the infestation of a neighboring plant within the same unit, the infestation counts are not binomial and a different probability model should be used. A second possible source of correlations is the lack of independence of experimental units. Even if treatments were assigned to units at random, they might not respond independently. Shared spatial soil effects, for example, can be the underlying factor. The following analyses take these spatial effects into account.
First, assume that the environmental effects operate at the scale of the blocks. By making the block effects random, the marginal responses will be correlated due to the fact that observations within a block share the same random effects. Observations from different blocks will remain uncorrelated, in the spirit of separate randomizations among the blocks. The next set of statements fits a generalized linear mixed model (GLMM) with random block effects:
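A sketch of these statements, assuming the data set is named HessianFly (the data step is not shown above):

```sas
proc glimmix data=HessianFly;
   class block entry;
   model y/n = entry / solution;
   random block;   /* block effects are now G-side random effects */
run;
```

Moving block from the MODEL statement to the RANDOM statement changes it from a fixed effect to a G-side random effect.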
Because the conditional distribution—conditional on the block effects—is binomial, the marginal distribution will be overdispersed relative to the binomial distribution. In contrast to adding a multiplicative scale parameter to the variance function, treating the block effects as random changes the estimates compared to a model with fixed block effects.
In the presence of random effects and a conditional binomial distribution, PROC GLIMMIX does not use maximum likelihood for estimation. Instead, the GLIMMIX procedure applies a restricted (residual) pseudo-likelihood algorithm (Output 38.1.9). The "restricted" attribute derives from the same rationale by which restricted (residual) maximum likelihood methods for linear mixed models attain their name; the likelihood equations are adjusted for the presence of fixed effects in the model to reduce bias in covariance parameter estimates.
Output 38.1.9 Model Information in GLMM Analysis
The "Class Level Information" and "Number of Observations" tables are as before (Output 38.1.10).
Output 38.1.10 Class Level Information and Number of Observations
The "Dimensions" table indicates that there is a single G-side parameter, the variance of the random block effect (Output 38.1.11). The "Dimensions" table has changed from the previous model (compare Output 38.1.11 to Output 38.1.3). Note that although the block effect has four levels, only a single variance component is estimated. The Z matrix has four columns, however, corresponding to the four levels of the block effect. Because no SUBJECT= option is used in the RANDOM statement, the GLIMMIX procedure treats these data as having arisen from a single subject with 64 observations.
Output 38.1.11 Model Dimensions Information in GLMM Analysis
The "Optimization Information" table indicates that a quasi-Newton method is used to solve the optimization problem. This is the default optimization method for GLMM models (Output 38.1.12).
Output 38.1.12 Optimization Information in GLMM Analysis
In contrast to the Newton-Raphson method, the quasi-Newton method does not require second derivatives. Because the variance of the block effect is bounded below by zero, the procedure enforces this lower boundary constraint, and the optimization method changes to a dual quasi-Newton method. The fixed effects are profiled from the likelihood equations in this model, so the resulting optimization problem involves only the covariance parameters.
The "Iteration History" table appears to indicate that the procedure converged after four iterations (Output 38.1.13). Notice, however, that this table has changed slightly from the previous analysis (see Output 38.1.5). The Evaluations column has been replaced by the Subiterations column, because the GLIMMIX procedure applied a doubly iterative fitting algorithm. The entire process consisted of five optimizations, each of which was iterative. The initial optimization required four iterations, the next one required three iterations, and so on.
Output 38.1.13 Iteration History in GLMM Analysis
The "Fit Statistics" table shows information about the fit of the GLMM (Output 38.1.14). The log likelihood reported in the table is not the residual log likelihood of the data. It is the residual log likelihood for an approximated model. The generalized chi-square statistic measures the residual sum of squares in the final model, and its ratio with the degrees of freedom is a measure of the variability of the observations about the mean model.
Output 38.1.14 Fit Statistics in GLMM Analysis
The variance of the random block effects is rather small (Output 38.1.15).
Output 38.1.15 Estimated Covariance Parameters and Approximate Standard Errors
If the environmental effects operate on a spatial scale smaller than the block size, the random block model does not provide a suitable adjustment. From the coarse layout of the experimental area, it is not surprising that random block effects alone do not account for the overdispersion in the data. Adding a random component to a generalized linear model is different from adding a multiplicative overdispersion component, for example, via the PSCALE option in PROC GENMOD or a RANDOM _RESIDUAL_ statement in PROC GLIMMIX. Such overdispersion components do not affect the parameter estimates, only their standard errors. A genuine random effect, on the other hand, affects both the parameter estimates and their standard errors (compare Output 38.1.16 to Output 38.1.7).
Output 38.1.16 Parameter Estimates for Fixed Effects in GLMM Analysis
Output 38.1.17 Type III Test of Entry in GLMM Analysis
Because the block variance component is small, the Type III test for the variety effect in Output 38.1.17 changes very little compared to the GLM analysis (Output 38.1.8).
Analysis with Smooth Spatial Trends
You can also consider these data in an observational sense, where the covariation of the observations is subject to modeling. Rather than deriving model components from the experimental design alone, environmental effects can be modeled by adjusting the mean and/or correlation structure. Gotway and Stroup (1997) and Schabenberger and Pierce (2002) supplant the coarse block effects with smooth-scale spatial components.
The model considered by Gotway and Stroup (1997) is a marginal model in that the correlation structure is modeled through residual-side (R-side) random components. This exponential covariance model is fit with the following statements:
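A sketch of these statements, assuming the data set is named HessianFly and using the lng and lat grid coordinates defined with the data:

```sas
proc glimmix data=HessianFly;
   class entry;
   model y/n = entry / solution;
   /* R-side exponential spatial covariance among all observations */
   random _residual_ / subject=intercept type=sp(exp)(lng lat);
run;
```

The TYPE=SP(EXP)(lng lat) option requests an exponential covariance structure computed from the distances between the grid coordinates.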
Note that the block effects have been removed from the statements. The keyword _RESIDUAL_ in the RANDOM statement instructs the GLIMMIX procedure to model the R matrix. Here, R is modeled as an exponential covariance structure matrix. The SUBJECT=INTERCEPT option means that all observations are considered correlated. Because the random effects are residual-type (R-side) effects, there are no columns in the Z matrix for this model (Output 38.1.18).
Output 38.1.18 Model Dimension Information in Marginal Spatial Analysis
In addition to the fixed effects, the GLIMMIX procedure now profiles one of the covariance parameters, the variance of the exponential covariance model (Output 38.1.19). This reduces the size of the optimization problem. Only a single parameter is part of the optimization, the "range" (SP(EXP)) of the spatial process.
Output 38.1.19 Optimization Information in Spatial Analysis
The practical range of a spatial process is the distance at which the correlation between data points has decreased to at most 0.05. The parameter reported by the GLIMMIX procedure as SP(EXP) in Output 38.1.20 corresponds to one-third of the practical range, so the practical range in this process is three times the reported SP(EXP) estimate. Correlations extend beyond a single experimental unit, but they do not appear to exist on the scale of the block size.
Output 38.1.20 Estimates of Covariance Parameters
The sill of the spatial process, the variance of the underlying residual effect, is estimated as 2.5315.
Output 38.1.21 Type III Test of Entry Effect in Spatial Analysis
The F value for the entry effect has been sharply reduced compared to the previous analyses. The smooth spatial variation accounts for some of the variation among the varieties (Output 38.1.21).
In this example three models were considered for the analysis of a randomized block design with binomial outcomes. If data are correlated, a standard generalized linear model often will indicate overdispersion relative to the binomial distribution. Two courses of action are considered in this example to address this overdispersion. First, the inclusion of G-side random effects models the correlation indirectly; it is induced through the sharing of random effects among responses from the same block. Second, the R-side spatial covariance structure models covariation directly. In generalized linear (mixed) models these two modeling approaches can lead to different inferences, because the models have different interpretations. The random block effects are modeled on the linked (logit) scale, and the spatial effects are modeled on the mean scale. Only in a linear mixed model are the two scales identical.
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
This entry is co-authored by Aaron S. Richmond, Ph.D., who is a faculty member at Metropolitan State College of Denver and a nationally known educator. A few weeks ago he told me about an exercise he did in one of his classes that had a somewhat unexpected outcome. Aaron and I invite you to consider the ethical, social, and pedagogical implications:
A few weeks back, after watching some particularly bad student videos on YouTube, I (Aaron) decided to have my own intro psychology students try some! I created an optional classroom assignment in which I asked students to film a 1-2 minute video illustrating Bandura's Observational Learning Theory. To my amazement, over 90% of my students submitted videos! They approached the assignment earnestly, creatively, and sincerely. Honestly, it was one of the most enjoyable assignments to grade I've ever given. I spent more time than I usually do grading a set of papers: watching their YouTube posts, sometimes laughing to the point of crying (e.g., http://www.youtube.com/watch?v=DfV654wy5O0; http://www.youtube.com/watch?v=pE1moIJAXmo).
As I reflected on the assignment, it struck me that about 20% of my students made their videos in public where they "tricked" innocent bystanders into mimicking modeled behavior. For example, one group conducted the "invisible rope experiment." The three students were located on a major campus thoroughfare. One student was located on each side of the walkway. They pretended to be pulling a rope while the third student pretended to high-step over the rope. Subsequently, people mimicked the third student's behavior by high-stepping over the invisible rope.
It made me feel uneasy that my students were tricking unsuspecting people. I shared my story with Mitch, and we started to explore.
Our first question: Where did students get this tendency toward trickery? One possibility: Psychology itself has a long history of using deception in research. The most famous example is the Milgram "obedience" studies, in which unsuspecting participants gave what they thought were painful shocks to another person. These experiments were very controversial, and some psychologists (e.g., Baumrind, 1964) argued that the deception involved was not worth the new knowledge that Milgram provided about our willingness to inflict pain just because we're told to by an authority.
In some quarters psychologists have the reputation of saying anything to anybody just to see the reaction. But psychologists argue that they do not deceive people willy-nilly (did we really use that word?). The APA Ethics Code says that deception should only be used as a last resort, when there's no other way to test an important hypothesis. The Code also says that psychologists should undo the deception after the experiment is over. This is called "debriefing": each participant is told the true purposes of the study and given a chance to have their questions answered. When done with respect and with sensitivity to participants' reactions, the ends of deception (new knowledge) can justify the means.
The long tradition of psychological research may have been one factor in students' use of trickery. The trends and influences might be broader. As we discussed the videos more, we were reminded of the ABC News "What Would You Do?" segments in which John Quiñones intentionally and with premeditation put unsuspecting citizens into morally compromising dilemmas.
Deception and lying on TV have their own long traditions. Some of us are old enough to remember the original "Candid Camera," Joe Isuzu, and that show where potential brides were told that some bachelor was a millionaire.
Our next question: Did the students do unethical things by lying to passers-by just to prove a point, get a grade, or pass the time? And was it unethical to make the assignment without explicitly prohibiting the deceptive videos? If we don't think there were ethical violations (and we don't), we could still take a "positive ethics" approach and ask whether the instructor's or the students' behavior was ethically competent or ethically excellent. Consider these factors:
1. How bad was the deception? How potentially embarrassing (or in other ways harmful) is the situation in which we put research participants, or innocent bystanders? Research participants and people on the street might feel exploited when they are deceived.
Another form of harm comes from potential violations of confidentiality. People's privacy may be violated when they are tricked into revealing parts of themselves that they would not have revealed if told the truth.
2. Should we undo the deception? Do we owe participants in our studies, or videos, an explanation of what happened to them? And how do we determine whether the explanation itself might make people feel worse? A general issue: Can we lie to people to spare them suffering that we caused ourselves? At what point are we just trying to hide our own behavior under the guise of preventing bad feelings?
3. How important is the outcome? What are the goals or motivations involved? It could be argued that the primary motivation of ABC News is not the application of scientific methods to create reliable new knowledge about human behavior. Rather, the primary motivation may be simple pandering to the voyeurism of the American public in an attempt to garner higher ratings and advertising revenue. Is the purpose of Quiñones' show important enough to justify the lying?
As psychologists we can argue that the "purer" motivation to do science justifies deception more than the profit motive. Might some students have been motivated more by the fun of deception rather than (or in addition to) trying to illustrate a theory? What if students felt that deceptive or tricky videos might get higher grades? If indeed we did (unintentionally, of course) give higher grades to deceptive videos because they struck us as more "creative," is that a value we want to teach our students?
4. How disrespectful is the deception? This is a bit fuzzier than the other concepts. Some folks simply don't like to be lied to or deceived even if no harm is done.
The bottom line: When does deception cross the line from a useful tool intended primarily to create knowledge to an inappropriate, disrespectful exploitation of fellow human beings for the primary purposes of making money, entertaining, earning higher course grades, getting tenure, or achieving other goals? Maybe there's no ethical dilemma, but we believe that incorporating these ethical factors into our decisions will help us become better teachers.
Baumrind, D. (1964). Some thoughts on ethics of research: After reading Milgram's "Behavioral Study of Obedience." American Psychologist, 19, 421-423.
Mitch Handelsman is a professor of psychology at the University of Colorado Denver and the co-author (with Sharon Anderson) of Ethics for Psychotherapists and Counselors: A Proactive Approach (Wiley-Blackwell, 2010). He is also an associate editor of the two-volume APA Handbook of Ethics in Psychology (American Psychological Association, 2012).
© 2011 Mitchell M. Handelsman. All Rights Reserved