Educational assessment or educational evaluation is the systematic process of documenting and using empirical data on the knowledge, skill, attitudes, aptitude and beliefs of learners to refine programs and improve student learning. Assessment data can be obtained by directly examining student work to assess the achievement of learning outcomes, or from data from which one can make inferences about learning. Assessment is often used interchangeably with ''test'', but assessment is not limited to tests. Assessment can focus on the individual learner, the learning community (class, workshop, or other organized group of learners), a course, an academic program, the institution, or the educational system as a whole (also known as granularity). The word 'assessment' came into use in an educational context after the Second World War.

As a continuous process, assessment establishes measurable and clear student learning outcomes, provides a sufficient amount of learning opportunities to achieve these outcomes, implements a systematic way of gathering, analyzing and interpreting evidence to determine how well student learning matches expectations, and uses the collected information to inform improvement in student learning. Assessment is an important aspect of the educational process which determines the level of accomplishment of students. The final purpose of assessment practices in education depends on the ''theoretical framework'' of the practitioners and researchers, their assumptions and beliefs about the nature of the human mind, the origin of knowledge, and the process of learning.


Types

The term ''assessment'' is generally used to refer to all activities teachers use to help students learn and to gauge student progress.Black, Paul, & Wiliam, Dylan (October 1998). "Inside the Black Box: Raising Standards Through Classroom Assessment." Phi Delta Kappan. Available at http://www.pdkmembers.org/members_online/members/orders.asp?action=results&t=A&desc=Inside+the+Black+Box%3A+Raising+Standards+Through+Classroom+Assessment&text=&lname_1=&fname_1=&lname_2=&fname_2=&kw_1=&kw_2=&kw_3=&kw_4=&mn1=&yr1=&mn2=&yr2=&c1= (PDKintl.org). Retrieved January 28, 2009. Assessment can be divided for the sake of convenience using the following categorizations:
# Placement, formative, summative and diagnostic assessment
# Objective and subjective
# Referencing (criterion-referenced, norm-referenced, and ipsative (forced-choice))
# Informal and formal
# Internal and external


Placement, formative, summative and diagnostic

Assessment is often divided into initial, formative, and summative categories for the purpose of considering different objectives for assessment practices.

* Placement assessment – Placement evaluation is used to place students, according to prior achievement or personal characteristics, at the most appropriate point in an instructional sequence, in a unique instructional strategy, or with a suitable teacher. It is conducted through placement testing, i.e. the tests that colleges and universities use to assess college readiness and place students into their initial classes. Placement evaluation, also referred to as pre-assessment or initial assessment, is conducted prior to instruction or intervention to establish a baseline from which individual student growth can be measured. This type of assessment is used to determine a student's skill level in the subject, and it helps the teacher explain the material more efficiently. These assessments are not graded.
* Formative assessment – Formative assessment is generally carried out throughout a course or project. Formative assessment, also referred to as "educative assessment", is used to aid learning. In an educational setting, a formative assessment might be a teacher (or peer) or the learner providing feedback on a student's work, and would not necessarily be used for grading purposes. Formative assessments can take the form of diagnostic tests, standardized tests, quizzes, oral questions, or draft work. Formative assessments are carried out concurrently with instruction, and the result may count. Formative assessments aim to establish whether students understand the instruction before a summative assessment is given.
* Summative assessment – Summative assessment is generally carried out at the end of a course or project. In an educational setting, summative assessments are typically used to assign students a course grade, and they are evaluative. Summative assessments summarize what the students have learned, to determine whether they understand the subject matter well. This type of assessment is typically graded (e.g. pass/fail, 0–100) and can take the form of tests, exams or projects. Summative assessments are often used to determine whether a student has passed or failed a class. A criticism of summative assessments is that they are reductive, and learners discover how well they have acquired knowledge too late for it to be of use.
* Diagnostic assessment – Diagnostic assessment addresses the difficulties that arise during the learning process.

Jay McTighe and Ken O'Connor proposed seven practices for effective learning. One of them is showing the criteria of the evaluation before the test. Another is the importance of pre-assessment to establish a student's skill levels before giving instruction. Giving plenty of feedback and encouragement are other practices.

Educational researcher Robert Stake explains the difference between formative and summative assessment with the following analogy: "When the cook tastes the soup, that's formative. When the guests taste the soup, that's summative."

Summative and formative assessment are often referred to in a learning context as ''assessment of learning'' and ''assessment for learning'' respectively. Assessment of learning is generally summative in nature and intended to measure learning outcomes and report those outcomes to students, parents and administrators. Assessment of learning generally occurs at the conclusion of a class, course, semester or academic year. Assessment for learning is generally formative in nature and is used by teachers to consider approaches to teaching and next steps for individual learners and the class.Earl, Lorna (2003). Assessment as Learning: Using Classroom Assessment to Maximise Student Learning. Thousand Oaks, CA: Corwin Press.

A common form of formative assessment is ''diagnostic assessment''. Diagnostic assessment measures a student's current knowledge and skills for the purpose of identifying a suitable program of learning. ''Self-assessment'' is a form of diagnostic assessment which involves students assessing themselves. ''Forward-looking assessment'' asks those being assessed to consider themselves in hypothetical future situations.Reed, Daniel. "Diagnostic Assessment in Language Teaching and Learning." Center for Language Education and Research, available at Google.com. Retrieved January 28, 2009.
''Performance-based assessment'' is similar to summative assessment, as it focuses on achievement. It is often aligned with the standards-based education reform and outcomes-based education movement. Though ideally they are significantly different from a traditional multiple-choice test, performance-based assessments are most commonly associated with standards-based assessment, which uses free-form responses to standard questions scored by human scorers on a standards-based scale: meeting, falling below or exceeding a performance standard rather than being ranked on a curve. A well-defined task is identified and students are asked to create, produce or do something, often in settings that involve real-world application of knowledge and skills. Proficiency is demonstrated by providing an extended response. Performance formats are further differentiated into products and performances. The performance may result in a product, such as a painting, portfolio, paper or exhibition, or it may consist of a performance, such as a speech, athletic skill, musical recital or reading.


Objective and subjective

Assessment (either summative or formative) is often categorized as either objective or subjective. Objective assessment is a form of questioning which has a single correct answer. Subjective assessment is a form of questioning which may have more than one correct answer (or more than one way of expressing the correct answer). There are various types of objective and subjective questions. Objective question types include true/false, multiple choice, multiple-response and matching questions. Subjective questions include extended-response questions and essays. Objective assessment is well suited to the increasingly popular computerized or online assessment format.

Some have argued that the distinction between objective and subjective assessments is neither useful nor accurate because, in reality, there is no such thing as "objective" assessment. In fact, all assessments are created with inherent biases built into decisions about relevant subject matter and content, as well as cultural (class, ethnic, and gender) biases.Joint Information Systems Committee (JISC). "What Do We Mean by e-Assessment?" JISC InfoNet. Retrieved January 29, 2009 from http://tools.jiscinfonet.ac.uk/downloads/vle/eassessment-printable.pdf
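
To illustrate why objective formats suit computerized scoring, here is a minimal Python sketch of machine-scoring objective items; the question data and function name are hypothetical examples, not drawn from any particular assessment system.

```python
# Minimal sketch of machine-scoring objective items (true/false and
# multiple choice). All question data and names are hypothetical.

ANSWER_KEY = {
    "q1": "B",      # multiple choice
    "q2": "True",   # true/false
    "q3": "D",      # multiple choice
}

def score_objective(responses: dict) -> float:
    """Return the fraction of items answered correctly.

    Each item is marked right or wrong against a single keyed
    answer, which is what makes the format 'objective'.
    """
    correct = sum(1 for item, key in ANSWER_KEY.items()
                  if responses.get(item) == key)
    return correct / len(ANSWER_KEY)

student_responses = {"q1": "B", "q2": "False", "q3": "D"}
print(f"Score: {score_objective(student_responses):.0%}")  # Score: 67%
```

Subjective formats, by contrast, require human (or far more elaborate automated) judgment precisely because there is no single keyed answer to compare against.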


Basis of comparison

Test results can be compared against an established criterion, against the performance of other students, or against previous performance:

*''Criterion-referenced assessment'', typically using a criterion-referenced test, as the name implies, occurs when candidates are measured against defined (and objective) criteria. Criterion-referenced assessment is often, but not always, used to establish a person's competence (whether s/he can do something). The best-known example of criterion-referenced assessment is the driving test, when learner drivers are measured against a range of explicit criteria (such as "Not endangering other road users").
*''Norm-referenced assessment'' (colloquially known as "grading on the curve"), typically using a norm-referenced test, is not measured against defined criteria. This type of assessment is relative to the student body undertaking the assessment. It is effectively a way of comparing students. The IQ test is the best-known example of norm-referenced assessment. Many entrance tests (to prestigious schools or universities) are norm-referenced, permitting a fixed proportion of students to pass ("passing" in this context means being accepted into the school or university rather than an explicit level of ability). This means that standards may vary from year to year, depending on the quality of the cohort; criterion-referenced assessment does not vary from year to year (unless the criteria change). The contrast with criterion referencing is illustrated in the sketch after this list.Educational Technologies at Virginia Tech. "Assessment Purposes." VirginiaTech DesignShop: Lessons in Effective Teaching, available at Edtech.vt.edu. Retrieved January 29, 2009.
*''Ipsative assessment'' is self-comparison, either in the same domain over time, or comparative to other domains within the same student.
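
The practical difference between the first two approaches can be made concrete in code. The following minimal Python sketch (with a hypothetical cutoff, pass fraction, and score data) shows that under a fixed criterion the pass rate varies with the cohort, whereas under norm referencing the passing standard varies instead.

```python
# Minimal sketch contrasting criterion-referenced and norm-referenced
# pass/fail decisions. The cutoff, pass fraction, and scores are
# hypothetical illustrations.

scores = {"ana": 82, "ben": 58, "cai": 91, "dee": 74, "eli": 66}

def criterion_referenced_pass(scores, cutoff=70):
    """Everyone meeting a fixed criterion passes: the pass rate
    varies with cohort quality, but the standard does not."""
    return {name for name, s in scores.items() if s >= cutoff}

def norm_referenced_pass(scores, pass_fraction=0.4):
    """A fixed proportion of the cohort passes ('grading on the
    curve'): the standard varies with cohort quality."""
    n_pass = max(1, round(len(scores) * pass_fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_pass])

print(criterion_referenced_pass(scores))  # {'ana', 'cai', 'dee'}
print(norm_referenced_pass(scores))       # {'cai', 'ana'}
```

Rerunning both functions on a stronger cohort would raise the criterion-referenced pass rate but leave the norm-referenced pass rate fixed, which is exactly the year-to-year variation in standards described above.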


Informal and formal

Assessment can be either ''formal'' or ''informal''. Formal assessment usually implies a written document, such as a test, quiz, or paper. A formal assessment is given a numerical score or grade based on student performance, whereas an informal assessment does not contribute to a student's final grade. An informal assessment usually occurs in a more casual manner and may include observation, inventories, checklists, rating scales, rubrics, performance and portfolio assessments, participation, peer and self-evaluation, and discussion.Valencia, Sheila W. "What Are the Different Forms of Authentic Assessment?" Understanding Authentic Classroom-Based Literacy Assessment (1997), available at Eduplace.com. Retrieved January 29, 2009.


Internal and external

Internal assessment is set and marked by the school (i.e. teachers); students receive the mark and feedback regarding the assessment. External assessment is set by a governing body and is marked by non-biased personnel. Some external assessments give much more limited feedback in their marking. However, for tests such as Australia's NAPLAN, detailed feedback is given on the criteria students addressed, so that teachers can evaluate and compare students' learning achievements and plan for the future.


Standards of quality

In general, high-quality assessments are considered those with a high level of reliability and validity. Other general principles are practicality, authenticity and washback.


Reliability

Reliability relates to the consistency of an assessment. A reliable assessment is one that consistently achieves the same results with the same (or similar) cohort of students. Various factors affect reliability, including ambiguous questions, too many options within a question paper, vague marking instructions and poorly trained markers. Traditionally, the reliability of an assessment is based on the following:
# Temporal stability: Performance on a test is comparable on two or more separate occasions.
# Form equivalence: Performance among examinees is equivalent on different forms of a test based on the same content.
# Internal consistency: Responses on a test are consistent across questions. For example, in a survey that asks respondents to rate attitudes toward technology, consistency would be expected in responses to the following questions:
#* "I feel very negative about computers in general."
#* "I enjoy using computers."Yu, Chong Ho (2005). "Reliability and Validity." Educational Assessment. Available at Creative-wisdom.com. Retrieved January 29, 2009.
The reliability of a measurement ''x'' can also be defined quantitatively as R_x = V_T / V_x, where R_x is the reliability of the observed (test) score ''x'', and V_T and V_x are the variances of the 'true' score (i.e., the candidate's innate performance) and of the measured test score respectively. R_x can range from 0 (completely unreliable) to 1 (completely reliable). There are four types of factors affecting reliability: student-related factors, such as personal problems, sickness, or fatigue; rater-related factors, including bias and subjectivity; test administration-related factors, i.e. the conditions of the test-taking process; and test-related factors, which concern the nature of the test itself.
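
Because the true-score variance V_T cannot be observed directly, reliability must be estimated in practice; one standard estimate is the test-retest correlation, corresponding to the temporal stability listed above. The following is a minimal Python sketch, assuming hypothetical score data.

```python
# Minimal sketch: since V_T in R_x = V_T / V_x is unobservable,
# reliability is estimated in practice. One standard estimate is
# the test-retest correlation (temporal stability): the same
# candidates sit the test on two occasions and their scores are
# correlated. All score data here is hypothetical.
from statistics import correlation  # available in Python 3.10+

occasion_1 = [71, 64, 88, 55, 92, 77]  # same six candidates,
occasion_2 = [74, 60, 85, 58, 95, 75]  # tested on two occasions

reliability_estimate = correlation(occasion_1, occasion_2)
print(f"Test-retest reliability estimate: {reliability_estimate:.2f}")
```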


Validity

Valid assessment is one that measures what it is intended to measure. For example, it would not be valid to assess driving skills through a written test alone. A more valid way of assessing driving skills would be through a combination of tests that help determine what a driver knows, such as through a written test of driving knowledge, and what a driver is able to do, such as through a performance assessment of actual driving. Teachers frequently complain that some examinations do not properly assess the syllabus upon which the examination is based; they are, effectively, questioning the validity of the exam.

Validity of an assessment is generally gauged through examination of evidence in the following categories:
# Content validity – Does the content of the test measure stated objectives?
# Criterion validity – Do scores correlate to an outside reference? (e.g., do high scores on a 4th grade reading test accurately predict reading skill in future grades? See the sketch at the end of this section.)
# Construct validity – Does the assessment correspond to other significant variables? (e.g., do ESL students consistently perform differently on a writing exam than native English speakers?)

Others are:
* consequential validity
* face validity

A good assessment has both validity and reliability, plus the other quality attributes noted above for a specific context and purpose. In practice, an assessment is rarely totally valid or totally reliable. A ruler which is marked wrongly will always give the same (wrong) measurements: it is very reliable, but not very valid. Asking random individuals to tell the time without looking at a clock or watch is sometimes used as an example of an assessment which is valid, but not reliable: the answers will vary between individuals, but the average answer is probably close to the actual time. In many fields, such as medical research, educational testing, and psychology, there will often be a trade-off between reliability and validity. A history test written for high validity will have many essay and fill-in-the-blank questions. It will be a good measure of mastery of the subject, but difficult to score completely accurately. A history test written for high reliability will be entirely multiple choice. It isn't as good at measuring knowledge of history, but can easily be scored with great precision. We may generalize from this: the more reliable our estimate is of what we purport to measure, the less certain we are that we are actually measuring that aspect of attainment.

It is well to distinguish between "subject-matter" validity and "predictive" validity. The former, used widely in education, predicts the score a student would get on a similar test but with different questions. The latter, used widely in the workplace, predicts performance. Thus, a subject-matter-valid test of knowledge of driving rules is appropriate, while a predictively valid test would assess whether the potential driver could follow those rules.
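
Criterion validity, for example, is typically examined statistically by correlating test scores with the outside reference measure. The following is a minimal Python sketch for the 4th-grade reading example above; all numbers are hypothetical illustrations.

```python
# Minimal sketch of a criterion-validity check: do test scores
# correlate with an outside reference measure? The 4th-grade
# reading-test scores (predictor) and later reading-skill measures
# (criterion) below are hypothetical numbers for illustration only.
from statistics import correlation  # available in Python 3.10+

grade4_test  = [62, 75, 81, 58, 90, 70]  # predictor: test scores
grade6_skill = [60, 78, 84, 55, 93, 68]  # criterion: later skill

validity_coefficient = correlation(grade4_test, grade6_skill)
print(f"Criterion-validity coefficient: {validity_coefficient:.2f}")
```

A coefficient near 1 would support the claim that the test predicts the criterion; a coefficient near 0 would undermine it.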


Practicality

This principle refers to the time and cost constraints involved in constructing and administering an assessment instrument: the test should be economical to deliver, its format should be simple to understand, solving it should take a suitable amount of time, it should be simple to administer, and its scoring procedure should be specific and time-efficient.


Authenticity

An assessment instrument is authentic when it is contextualized, contains natural language and meaningful, relevant, and interesting topics, and replicates real-world experiences.


Washback

This principle refers to the consequences of an assessment on teaching and learning within classrooms. Washback can be positive or negative: positive washback refers to the desired effects of a test, while negative washback refers to its undesired, negative consequences. Instructional planning can be used to promote positive washback.


Evaluation standards

In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation has published three sets of standards for evaluations. ''The Personnel Evaluation Standards'' were published in 1988, ''The Program Evaluation Standards'' (2nd edition) were published in 1994, and ''The Student Evaluation Standards'' were published in 2003. Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing and improving the identified form of evaluation. Each of the standards has been placed in one of four fundamental categories to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will provide sound, accurate, and credible information about student learning and performance. In the UK, an award in Training, Assessment and Quality Assurance (TAQA) is available to assist staff in learning and developing good practice in relation to educational assessment in adult, further and work-based education and training contexts.


Summary table of the main theoretical frameworks

The following table summarizes the main ''theoretical frameworks'' behind almost all the theoretical and research work, and the instructional practices in education (one of them being, of course, the practice of assessment). These different frameworks have given rise to interesting debates among scholars.


Controversy

Concerns over how best to apply assessment practices across public school systems have largely focused on questions about the use of high-stakes testing and standardized tests, often used to gauge student progress, teacher quality, and school-, district-, or statewide educational success.


No Child Left Behind

For most researchers and practitioners, the question is not whether tests should be administered at all—there is a general consensus that, when administered in useful ways, tests can offer valuable information about student progress and curriculum implementation, as well as formative uses for learners.American Psychological Association. "Appropriate Use of High-Stakes Testing in Our Nation's Schools." APA Online, available at