
There’s a (somewhat) recent article in Inside Higher Ed that discusses new changes being made by the College Board’s College-Level Examination Program (CLEP) to its writing examination.  (Saw this article in my NCTE inbox newsletter a couple weeks ago.)  Although these changes might indeed be improvements, it’s still fundamentally flawed to rely on these kinds of tests for writing course placement.  Unfortunately, many universities have few other choices due to the time and money that other assessment methods require.

It’s interesting to read this article, having now skimmed through much of O’Neill, Moore, and Huot (OM&H)’s chapters on assessment history and theory.  I admit I didn’t read those two chapters super thoroughly, especially as the authors maintain that the history of assessment in particular is more of a complicated web than a straightforward continuum.  Still, I’d like to summarize here some concepts on the history and theory of assessment according to OM&H.

Historicizing Writing Assessment

Their history focuses largely on admissions and standardized testing, and it demonstrates the importance of the validity and reliability of test results.

They outline the century-long development of writing tests and standardized testing (particularly the role of the ETS and CEEB in this history).  Assessment has long been outsourced (e.g., today’s SAT) for various reasons, one of which was the popular impetus to nationalize/standardize college admissions testing.  Multiple-choice testing is also the easiest way to test a large number of people cheaply and quickly.

Last, and most interestingly, the CEEB was a way of wresting control over college admissions from secondary teachers, despite evidence that high school performance is the single best indicator of college success:

A test was assumed to be better at helping university admissions personnel make important, consequential decisions about students than judgements of secondary teachers. From the beginning (if the establishment of the CEEB can be called the beginning of writing assessment), teachers would have to struggle to be a part of important decisions made on the basis of the assessment of student writing. (17)

(Other factors behind the formation of the CEEB can be found on pp. 15–17 of OM&H.)

Problems with testing

The problems that have preoccupied assessment for the past century: reliability and validity.  It has been especially difficult for WPAs to get an accurate view of reliability and validity for a few reasons:

  • the fact that this vocabulary (i.e., “reliability” and “validity” measures) and set of concepts lies outside traditional modes of thought in the humanities (see intro to this volume).
  • the fact that reliability (read: consistency of a test’s measures) was difficult to establish, both in terms of consistency of the overall scores (instrument reliability) and agreement in scoring between independent scorers (interrater reliability): “Without consistency in scoring, students’ scores on their writing would depend on who read the papers rather than who wrote them. Without consistency in scoring, it would be impossible to argue for the validity of decisions based on such scores” (19).  (See the quick sketch after this list for one way interrater consistency can be quantified.)
  • the fact that “proof of validity” was established by the test administrators/manufacturers themselves rather than by neutral third parties.
  • studies of testing validity don’t even appear until about 50 years into assessment scholarship.
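
Since the measurement vocabulary here is the part that feels most foreign to those of us trained in the humanities, here is a quick illustrative sketch (in Python, with made-up holistic scores on a 1–6 scale; none of this comes from OM&H) of how interrater consistency might be quantified, both as simple percent agreement and as a correlation between two independent readers:

```python
# Illustrative sketch only: hypothetical holistic scores (1-6 scale)
# assigned by two independent raters to the same ten student essays.
rater_a = [4, 3, 5, 2, 4, 6, 3, 4, 5, 2]
rater_b = [4, 3, 4, 2, 5, 6, 3, 4, 5, 3]

# Percent agreement: how often the two raters assign exactly the same score.
agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
percent_agreement = agreements / len(rater_a)

# Pearson correlation: do the raters rank the essays similarly,
# even when their exact scores differ?
def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

print(f"Percent agreement: {percent_agreement:.0%}")
print(f"Score correlation: {pearson(rater_a, rater_b):.2f}")
```

Published assessments typically report more sophisticated statistics (Cohen’s kappa, for instance), but the underlying question is the one OM&H raise: would a student’s score change if a different reader had scored the paper?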

Testing through indirect methods is, of course, not ideal when compared with direct methods that look at student texts: “Not only is testing but one response to the need for change in education in general and writing programs in particular, it is also not a very strong one” (32).  Indirect testing tends to focus on usage (mechanics, grammar) instead of on the critical thinking behind actual writing.  And even when tests do contain essays, the prompts tend to be themes rather than essays asking for literary interpretation or questions based on background reading.

Important decisions used to be made by tests alone, but disturbingly, the tests/methods themselves were seldom rigorously questioned.

Validity theory, especially in early literature, tended to be erroneously conflated with reliability.  It was once defined as follows by J. P. Guilford: “In a very general sense, a test is valid for anything with which it correlates” (qtd. in 30).  In modern-day validity theory, the focus shifts “from the accuracy of a measure to its value” (30). Validity is a separate consideration from reliability and needs more attention in assessment scholarship, even to this day.

In sum: The importance of understanding the history of assessment

Understanding the history of contemporary validity theory provides the WPA and writing teacher with an understanding of the professional standards educational testing professionals are supposed to follow.  This kind of information about the development of validity allows a WPA or writing teacher to look at a range of statistical data offered as proof of validity and to remind her colleagues that validity is more than a row of statistical correlations. (33)

Considering theory

The authors see theory in a less formalized way, “as basic assumptions and beliefs that inform actions and practices” (36), and as something inextricably tied to practice (they cite a number of models, including Louise Phelps’s posited dialectic between theory and practice).

According to Lorrie Shepard in an article on classroom assessment, education since the early 20th c. has been largely framed by social constructivist assumptions. … “In this framework, assessments should address learning processes as well as products, be formative and ongoing, feed back into learning, elicit higher-order thinking, require self-evaluation, and have explicit criteria and expectations” (38).  Though Shepard’s discussion pertains to classroom learning, these sorts of considerations should also guide large-scale assessment.

The authors spend a fair amount of time on theories of language and literacy, arguing that “[w]ithout a sense of how language is learned and how literacy functions, an assessment may not yield information that is accurate, useful, or valid” (38).

As the CCCC has declared (see appendix B), writing is a social act, and ideally assessment “respects language variety and diversity and assesses writing on the basis of effectiveness for readers, acknowledging that as purposes vary, criteria will as well” (qtd. in 43).  The socio-cultural situation of the text is an important consideration in assessment: literacy backgrounds and socio-cultural factors shape language use, and socio-cultural situations also affect how we (as assessors) read these texts:  “In administering writing assessments, we need to be sure we think through the way socio-cultural factors can influence not only students’ reading of a task and written response but also our interpretations of the responses.  We draw conclusions about the student based on our interpretation of the response, which may or may not be accurate” (42).

Language and literacy theories such as these are more familiar to English professionals; assessment theories (educational measurement concepts like validity and reliability), however, are not.  Several theoretical approaches to researching/thinking about/constructing arguments for validity and reliability are discussed here (i.e., work by Sandra Murphy, William L. Smith, Pamela Moss, Cherry & Meyer, Jay Parkes, and the Standards for Educational and Psychological Testing by AERA/APA/NCME).  Much of it is valuable and interesting, but it’s too much to get into here in any detail.

What I will say, though, is that some researchers in this field (e.g., Messick) urge assessors to undertake assessment in the spirit of a research project, and move beyond reliability to validity.  Validity theory “involves constructing a sound argument to support the interpretation and use of test scores from both theoretical and empirical evidence” (46).  Evidence for validity arguments should come from test aspects such as test content, response process, internal structure, relationship to external variables, and consequences of testing.

Writing Assessment Theory

Interestingly, “[a]s a field, composition and rhetoric doesn’t yet have a clearly articulated and widely acceptable theoretical position on writing assessment” (53), which Huot posits is partially due to (a) the divide in values and scholarship between K-12 and college writing instruction, (b) the fact that tests are developed, marketed, and administered by outside (often commercial) testing orgs, (c) the fact that assessment often meets an “immediate need…[which] consumes composition scholars’ and administrators’ efforts with little attention given to articulating theories that will inform and direct the assessments let alone working toward developing them” (55), and (d) district/state/federal testing and curriculum mandates.

Because of this, assessment is not well defined in the field, and research is broadly drawn and interdisciplinary by nature.  The field would benefit from more unified approaches.

In closing this segment, the authors present Huot’s basic principles of assessment:

  • Site-based (specific; needs based)
  • Locally controlled (managed by local institution)
  • Context-sensitive (accounts for and adapts models for the local context)
  • Rhetorically based (audience, purpose, context)
  • Accessible (transparent)
  • Theoretically consistent (w/ literacy and language scholarship and w/ recent assessment scholarship)

Here the authors briefly outline several suggested methods for conducting program assessment.  (Ah, the nitty-gritty!)

I think the information I summarize below is helpful in and of itself.  In addition, though, I hope to use this basic outline of methods as a starting point for investigating published models (I mentioned finding two published articles on program assessment from WSU; I’ve also seen their assessment referred to in this chapter).

Some terms

  • direct measure = students being assessed produce writing/text, which is then evaluated
  • indirect measure = students are assessed without producing writing–multiple choice, fill-in-the-blank, etc.
  • qualitative vs. quantitative methods = a distinction the authors mention (“depending upon the degree to which they acknowledge contextual influences” [117]) but don’t explain very clearly.  There is an interesting but short discussion of the tensions that can arise from assessment methods: higher-ups and others in the university are often looking for more quantitative, cause-and-effect kinds of measures, while a WPA is more in a position to make qualitative kinds of assessments (it’s difficult to show definitively that any kind of program produces clear improvements in student writing).  But this is something in the theory behind assessment that I definitely need to investigate further.

Some advice from the authors

  • More in-depth information on the assessment methods they outline is available in research guides such as MacNealy’s Strategies for Empirical Research in Writing (1999).
  • It can be helpful to contact others on campus who have experience with empirical research–not only other faculty, but also administrators in research-oriented offices.

Method 1: Surveys

  • Sometimes surveys already in use (e.g., teacher evals, first-year student surveys) can be employed; other times, it is helpful to design surveys specifically for the assessment
  • Consider modifying for audience (slightly different questions for students than for teachers)
  • Keeping surveys brief helps improve response rate (10 questions max)
  • Anonymity also improves response rate

Method 2: Interviews

  • More flexible, allowing for more in-depth responses, since interviews present an opportunity for dialogue and follow-up questions
  • Easing anxiety:  The authors feel that a third party can be best for conducting interviews, as “[a] formal interview can often be intimidating–especially for contingent faculty” (120).  Aggregating results is also recommended.
  • Focus-group interviews can help save time (see example in appendices – p. 191)

Method 3: Teaching Materials

  • Administrators might collect one kind of document or several (combined into a course portfolio): syllabi, classroom activities, assignment sheets, miscellaneous handouts, course readings.  Can be analyzed on their own or together.
  • Portfolios: instructors should assemble all documents used during the period being assessed (the quarter, year, etc.), collect any observations and possibly some student samples, and also provide a reflection: “if it is to be useful, it should address questions or issues important to the assessment.  If the connection between course content and program learning outcomes is a concern, for example, then instructors can be asked to reflect on the ways that their course supported these outcomes” (121).
  • Administrators, if examining student samples, might consider systematically randomizing selections (e.g., all instructors provide samples from students numbered 2, 4, 6, and 8 on their roster; see the sampling sketch after this list).
  • Reducing the obvious time burden: random sampling can be used, or, even better, administrators might “identify one key issue to focus on during the analysis” (122).  For instance, in order to keep evaluators focused when reading course portfolios, the authors created a reading guide sheet that outlined the portfolio reading process to follow for each sample, along with a questionnaire of focused questions (see appendix, pp. 184-185).
  • Easing the portfolio process for instructors: informing instructors about course portfolios at the beginning of the term and encouraging them to compile as they go eases the time burden.  Anonymity can also ease fears that individual faculty will be evaluated.
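
On the sampling point above, here is a minimal sketch (in Python; the course numbers, roster sizes, and seed are hypothetical, not drawn from the book) of how an administrator might pull a reproducible random sample of roster positions from each section rather than leaving the selection to individual instructors:

```python
# Illustrative sketch only: pick a random but reproducible set of roster
# positions per section, so instructors aren't hand-selecting which
# student samples to submit. Section names and sizes are hypothetical.
import random

sections = {"ENGL 1001-01": 22, "ENGL 1001-02": 19, "ENGL 1001-03": 24}
samples_per_section = 4

rng = random.Random(2010)  # fixed seed so the same selection can be re-created later
for section, roster_size in sections.items():
    positions = sorted(rng.sample(range(1, roster_size + 1), samples_per_section))
    print(f"{section}: collect samples from roster positions {positions}")
```

Fixing the seed keeps the selection both random and auditable: anyone re-running the script gets the same roster positions.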

Method 4: Student Writing Samples

Method 5: Teaching Observations

It’s interesting trying to find literature on program assessment—is this an area that’s under-researched?  Most literature seems to cover classroom evaluation.  A few keyword searches have mostly turned up K-12 case studies, though I will be reading through a few to see if anything can transfer.  Most promising are two articles published in Assessing Writing that study the writing program at Washington State University, as well as some starting questions/considerations I came across from the WPA council.

In the meantime, there is an excellent (and longish) chapter on program assessment in O’Neill, Moore and Huot.  This book has been almost overwhelmingly helpful, and I suspect it will be the most on-target research I will find this summer.  I like the way they differentiate between classroom and program assessment:

Program assessment differs from other types of writing assessments because the focus is not on individual student performance but on collective achievement.  So while a program assessment might include evaluation of student writing as a data-gathering method, it requires that the writing be considered in terms of what it says about student learning generally and how that learning is supported by curricula, instruction, and instructional materials. (109)

When we look at a writing program, they say, we are looking at the entire “learning context” and evaluating how all parts interact—what aspects are working, how are they working, and why? (109)

Now then…

Since these authors advocated so strongly for a contextualized, historically driven, and theoretically informed approach in their introduction, it’s no surprise that they suggest first reflecting on purpose (how will this assessment be used?) and on the program itself.

Starting questions/considerations (from O’Neill, Moore and Huot 110-113)

  1. How will we use the results of this assessment?
  2. How do we define our program?  What elements of our program are we assessing?  (For instance, are we assessing FYC courses, or extending beyond into upper-level courses or developmental courses?)
  3. What is it that we want to know?  (What’s currently happening?  Is it what we expected to see?  What in the program seems to be working or not working?)
  4. What information do we already have?  (Do we already have student demographics that will prove useful?  Standardized teaching observation reports?)

Much of the third consideration above seems to be tied closely with program outcomes.  (I’m thinking that it’s possible that our own program at UC, which has some pretty clear outcomes established for FYC and intermediate composition courses, has already done a lot of the legwork in examining this.)

Coming up next: Designing the assessment itself (“Matching methods to Guiding Questions”)

Also, found a great website from the WPA: http://www.wpacouncil.org/assessment-gallery

In their introduction to A Guide to College Writing Assessment, O’Neill, Moore, and Huot argue that an informed knowledge of the history of and theories behind assessment—being aware of “the connections among assessment practice, research, and theory” as well as “the assumptions informing [our] assessment practice[s]” (8)—is vital to successfully implementing assessment within a department, program, or institution.  In their view, the ability to implement practices in meaningful ways hinges on knowledge of the assumptions that inform these practices.

Meaningful assessment, they inform us, is possible when well contextualized by administration.  It then allows us to negotiate some of the considerations I raised at the end of my last post (what will this assessment do for learning within our curriculum, how will it affect our teaching, what will it mean for our students, and so on).  It can also be integral to enriching and informing our own professional lives as scholars and teachers.

This text is both theoretical and practical; the first three chapters are devoted to historicizing, theorizing, and (locally) contextualizing assessment, respectively.  Then the authors discuss practical applications and more specific methods.  Because material on classroom evaluation is already so prevalent, the authors chose instead to explore the following areas:

– Placement evaluation
– Exit examination
– Programmatic assessment
– Faculty evaluation

Practical discussions are, in the end, what will prove most useful in the research project I’ve undertaken.  I’ll follow up in my next few posts by examining some of the methods discussed in the text, as well as thinking about how to adapt some of its thoughts on contextualization.  However, I’ll also be trying to digest some of the theory that accompanies the practical discussions (so it may appear in some future posts).

More on the challenges facing large-scale assessment

Before moving on to a more detailed discussion of content, I’d like to briefly devote more time to some of the challenges facing large-scale assessment.

In their introduction, the authors present challenges that further support their call for understanding the theories and research that inform our knowledge of and assumptions about assessment.  According to O’Neill et al., one of the great challenges for large-scale assessment is unfamiliarity with assessment literature, theory, and current practices; opportunities to participate in (or study) this field are relatively rare in graduate programs in rhet comp.  They point out that it may seem strange or intimidating to negotiate literature about assessment, as it’s often written from a social science perspective (referencing “measuring” and “validating” data) that is foreign to the liberal-arts philosophy of most WPAs and rhet comp specialists.  Much of the literature also makes little (if any) reference to the theories and research supporting models from other institutions.

Unfamiliarity with assessment (or poor assessment practices), they say, can lead to tensions within a department or institution.  Assessment can often seem imposed, tedious, and meaningless, and can even be perceived as controlling or threatening, especially when purposes, audiences, and implications are not made transparent.  Yet it is vital to more effective teaching and learning within departments and programs.

Further, the authors “have discovered that even those faculty and administrators who do recognize the importance of assessment, are willing to do it, and know the basics of large-scale assessment often have trouble translating their understanding and knowledge into assessments that work” (O’Neill et al. 8).  Thus, they reinforce their philosophy of recognizing the theories and history that guide practice.

This post draws on Michael R. Neal’s review of five new books on assessment, including A Guide to College Writing Assessment by Peggy O’Neill, Cindy Moore, and Brian Huot (Logan: Utah State University Press, 2009).

So…why assessment?

As Michael R. Neal writes in a recent CCC review essay, while assessment is unavoidable in (and vital to) our field of rhetoric and composition, its benefits to teaching and learning are not always immediately clear.  This is especially the case with programmatic and top-down assessments like teaching assessments, admissions testing, placement exams, accreditation, etc.; these are “legitimate assessment situations we face that are often detached, or at least more distant, from classroom instruction and thus it is more difficult to determine if and how they relate to learning” (Neal 747).

Though it sometimes seems nebulous in a large-scale context, Neal (and other authors I am investigating) argues that assessment can and should play a vital role in learning.  It is a powerful tool for WPAs, as it has the potential to powerfully affect teaching and learning and to encourage the development of certain writing skills and types of writing.  Assessments “have the power to influence curriculum and pedagogy, to categorize teachers and writers, and, ultimately, to define ‘good writing,'” and a WPA who embraces programmatic assessment has the ability “to affect, even ‘transform,’ teaching and learning across the university community” (O’Neill, Moore and Huot 2).

Learning and assessing

Though its connection to learning makes assessment a valuable tool to explore, Neal also points out that not all assessments are intended to foster learning.  Take, for example, standardized testing and placement exams, which function to rank and compare students against one another.  He also reminds us of some of the faulty assumptions regarding assessment and its relationship to learning:

Assessments influence what is learned or how knowledge is measured, but they do not necessarily result in learning. By mandating top-down assessments for accountability, governing bodies often incorrectly assume that learning—and a type of learning that is valuable—will be a direct consequence of more testing. Especially in light of these assumptions that drive much of our educational policy, we need to define the relationship between assessment and learning so we do not conflate them and implicitly enable this educational rhetoric, one that has had such deleterious effects on the teaching of writing. (Neal 749; emphasis added)

It is important, therefore, to carefully consider our perspective on this assessment–learning relationship.  For O’Neill, Moore, and Huot, learning from assessment is framed in terms of “what educators need to know about students, their writing, their teachers, or the programs, schools, or institutions where writing is taking place” (Neal 755).  The difficulty is in using assessment honestly—O’Neill et al. discuss the fine line that lies between critically investigating the beliefs that guide a program and using assessment to support these theories on language and learning, based on the “assumption that…[they] are fine as they are” (Neal 755).

O’Neill et al. is, according to Neal, an incredibly comprehensive resource for WPAs interested in inquiry-based methods and data collection.  It also emphasizes the importance of using this data to support learning—not simply creating a report, but using the data/analysis as a basis for follow-up on elements like teaching, placement, etc.

Starting questions

So, from my forays into Neal and O’Neill’s introduction, I’m realizing that in order to begin to think about assessment, we need to first consider questions such as:

  • Exigency: What is motivating assessment at this particular moment? (O’Neill et al. 11)
  • Purpose: What is (are) the ultimate purpose(s) in terms of teaching and learning? (O’Neill et al. 11)
  • More specifically on purpose: how do we see large-scale assessment affecting (and contributing to) learning within our program?
  • How will our data inform our courses?  How will instructors participate in assessment and make use of assessment results? (O’Neill et al. 12)
  • What makes “good writing”?  How do we define “good writing”?
  • Who are our students?  What writing experiences and abilities do they have, and how do we know? (O’Neill et al. 12)
  • What data-collection and assessment methods could be advantageous (or conversely, problematic) for our particular program?
  • EDIT (7/9/2010): More suggested questions and considerations for beginning a program assessment can be found in Appendix I (p. 186) of O’Neill, Moore, and Huot.

In short, Neal’s emphasis on the assessment–learning relationship is critical to developing an early idea of what we hope to ultimately achieve through assessing a writing program.  And O’Neill et al. make a compelling case for considering the rhetorical context when considering assessment models—for assessment is most effective when locally tailored.

Creative Commons License
Investigating Writing Program Assessment by Christina M. LaVecchia is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Permissions beyond the scope of this license may be available at https://ucwpassessment.wordpress.com/contact/ .