Legal Information Retrieval

Legal information retrieval is the science of information retrieval applied to legal text, including legislation, case law, and scholarly works. Accurate legal information retrieval is important for giving both laypeople and legal professionals access to the law. Its importance has grown with the vast and rapidly increasing number of legal documents available electronically (Jackson et al., p. 60). Legal information retrieval is part of the growing field of legal informatics.

In a legal setting, it is frequently important to retrieve all information related to a specific query. However, commonly used Boolean search methods (exact matches of specified terms) on full-text legal documents have been shown to have an average recall rate as low as 20 percent (Blair, D.C., and Maron, M.E., 1985, p. 293), meaning that only 1 in 5 relevant documents is actually retrieved. In that study, the researchers believed that they had retrieved over 75% of relevant documents. Low recall may result in failing to retrieve important or precedential cases. In some jurisdictions this may be especially problematic, as legal professionals are ethically obligated to be reasonably informed as to relevant legal documents.

Legal information retrieval attempts to increase the effectiveness of legal searches by increasing the number of relevant documents retrieved (a high recall rate) and reducing the number of irrelevant documents retrieved (a high precision rate). This is a difficult task, as the legal field is prone to jargon, polysemes (words that have different meanings when used in a legal context), and constant change.

Techniques used to achieve these goals generally fall into three categories: Boolean retrieval, manual classification of legal text, and natural language processing of legal text.
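
The recall and precision rates discussed above can be made concrete with a short calculation. The following is a minimal sketch in Python; the function name and the document identifiers are hypothetical and chosen only for illustration. It reproduces the situation in which only 1 in 5 relevant documents is retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually found
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: 10 relevant documents exist for the query.
relevant = {f"case-{i}" for i in range(10)}
# The search returns 2 of them plus 1 irrelevant document.
retrieved = {"case-0", "case-1", "memo-99"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here recall is 2 out of 10 (20 percent) while precision is 2 out of 3, which shows how a result list can look accurate to the searcher while most relevant documents remain unretrieved.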


Problems

Applying standard information retrieval techniques to legal text can be more difficult than applying them in other subjects. One key problem is that the law rarely has an inherent taxonomy (Peters, W. et al. 2007, p. 120). Instead, the law is generally filled with open-ended terms, which may change over time. This can be especially true in common law countries, where each decided case can subtly change the meaning of a certain word or phrase.

Legal information systems must also be programmed to deal with law-specific words and phrases. Though this is less problematic for words that exist solely in law, legal texts also frequently use polysemes: words that have different meanings in legal and in everyday usage, potentially both within the same document. The legal meaning may also depend on the area of law in which the word is applied. For example, in the context of European Union legislation, the term "worker" has four different meanings (Peters, W. et al. 2007, p. 131):
  1. Any worker as defined in Article 3(a) of Directive 89/391/EEC who habitually uses display screen equipment as a significant part of his normal work.
  2. Any person employed by an employer, including trainees and apprentices but excluding domestic servants.
  3. Any person carrying out an occupation on board a vessel, including trainees and apprentices, but excluding port pilots and shore personnel carrying out work on board a vessel at the quayside.
  4. Any person who, in the Member State concerned, is protected as an employee under national employment law and in accordance with national practice.
It also has the common meaning:
  1. A person who works at a specific occupation.
Though the terms may be similar, correct information retrieval must differentiate between the intended use and irrelevant uses in order to return the correct results.

Even if a system overcomes the language problems inherent in law, it must still determine the relevancy of each result. In the context of judicial decisions, this requires determining the precedential value of the case (Maxwell, K.T., and Schafer, B. 2008, p. 8). Case decisions from senior or superior courts may be more relevant than those from lower courts, even where the lower court's decision contains more discussion of the relevant facts. The opposite may be true, however, if the senior court has only a minor discussion of the topic (for example, if it is a secondary consideration in the case). An information retrieval system must also be aware of the authority of the jurisdiction. A case from a binding authority is most likely of more value than one from a non-binding authority.

Additionally, the intentions of the user may determine which cases they find valuable. For instance, a legal professional attempting to argue a specific interpretation of law might find a minor court's decision that supports their position more valuable than a senior court's position that does not. They may also value similar positions from different areas of law, different jurisdictions, or dissenting opinions.

Overcoming these problems can be made more difficult by the large number of cases available. The number of legal cases available via electronic means is constantly increasing (in 2003, US appellate courts handed down approximately 500 new cases per day), meaning that an accurate legal information retrieval system must incorporate methods of both sorting past data and managing new data.


Techniques


Boolean searches

Boolean searches, where a user may specify terms such as the use of specific words or judgments by a specific court, are the most common type of search available via legal information retrieval systems. They are widely implemented but overcome few of the problems discussed above. The recall and precision rates of these searches vary depending on the implementation and the searches analyzed. One study found a basic Boolean search's recall rate to be roughly 20%, and its precision rate to be roughly 79%. Another study implemented a generic search (that is, one not designed for legal uses) and found a recall rate of 56% and a precision rate of 72% among legal professionals. Both numbers increased when searches were run by non-legal professionals, to a 68% recall rate and a 77% precision rate. This is likely explained by the legal professionals' use of complex legal terms.
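
As an illustration of why exact-match Boolean retrieval can miss relevant documents, the following is a minimal sketch of an AND query over an inverted index, written in Python. It is not the implementation of any particular legal search service; the function names and the three sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Return the IDs of documents that contain every query term exactly."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents: A and B describe the same situation in different words.
docs = {
    "A": "The employee was dismissed without notice.",
    "B": "The worker was dismissed without notice.",
    "C": "The court considered the notice period.",
}
index = build_index(docs)
result = boolean_and(index, ["worker", "dismissed"])
print(sorted(result))
```

The query matches only document B. Document A is arguably just as relevant but uses "employee" instead of "worker", so an exact-match search never sees it, which is one way recall falls even when precision stays high.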


Manual classification

In order to overcome the limits of basic Boolean searches, information systems have attempted to classify case law and statutes into more computer-friendly structures. Usually, this results in the creation of an ontology to classify the texts, based on the way a legal professional might think about them (Maxwell, K.T., and Schafer, B. 2008, p. 2). These ontologies attempt to link texts on the basis of their type, their value, and/or their topic areas. Most major legal search providers now implement some sort of classification search, such as Westlaw's “Natural Language” (Westlaw Research, http://www.westlaw.com) or LexisNexis' Headnote (Lexis Research, http://www.lexisnexis.com) searches. Additionally, both of these services allow browsing of their classifications, via Westlaw's West Key Numbers or Lexis' Headnotes. Though these two search algorithms are proprietary and secret, it is known that they employ manual classification of text (though this may be computer-assisted).

These systems can help overcome the majority of problems inherent in legal information retrieval systems, in that manual classification has the greatest chance of identifying landmark cases and understanding the issues that arise in the text (Maxwell, K.T., and Schafer, B. 2008, p. 3). In one study, ontological searching resulted in a precision rate of 82% and a recall rate of 97% among legal professionals. The legal texts included, however, were carefully controlled to just a few areas of law in a specific jurisdiction. The major drawback of this approach is that it requires highly skilled legal professionals and large amounts of time to classify texts. As the amount of text available continues to increase, some have stated their belief that manual classification is unsustainable.


Natural language processing

In order to reduce the reliance on legal professionals and the amount of time needed, efforts have been made to create a system to automatically classify legal text and queries (Ashley, K.D. and Bruninghaus, S. 2009, p. 125; Gelbart, D. and Smith, J.C. 1993, p. 142). Adequate translation of both would allow accurate information retrieval without the high cost of human classification. These automatic systems generally employ natural language processing (NLP) techniques that are adapted to the legal domain, and also require the creation of a legal ontology. Though multiple systems have been postulated, few have reported results. One system, “SMILE,” which attempted to automatically extract classifications from case texts, resulted in an f-measure (a single score calculated from both recall and precision) of under 0.3, compared to a perfect f-measure of 1.0 (Ashley, K.D. and Bruninghaus, S. 2009, p. 159). This is probably much lower than an acceptable rate for general usage. Despite the limited results, many theorists predict that the evolution of such systems will eventually replace manual classification systems.
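
For reference, the f-measure can be computed as sketched below in Python. This assumes the common balanced form (F1, the harmonic mean of precision and recall); the source does not state which variant the SMILE evaluation used. The function name is hypothetical, and the inputs are the precision and recall figures quoted earlier in this article for Boolean and ontological searches.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing relevant was retrieved
    return 2 * precision * recall / (precision + recall)


# Figures quoted in this article (precision, recall).
boolean_f1 = f_measure(0.79, 0.20)      # basic Boolean search study
ontological_f1 = f_measure(0.82, 0.97)  # ontological search study

print(f"Boolean: {boolean_f1:.2f}, ontological: {ontological_f1:.2f}")
```

Under this assumption the Boolean figures give an f-measure of roughly 0.32 and the ontological figures roughly 0.89. Because the harmonic mean is pulled toward the lower of the two values, a system with high precision but low recall still scores poorly.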


Citation-based ranking

In the mid-1990s the Room 5 case law retrieval project used citation mining for summaries and ranked its search results based on citation type and count. This slightly pre-dated the PageRank algorithm developed at Stanford, which is also a citation-based ranking. Ranking of results was based as much on jurisdiction as on number of references (Loui, R. P., Norman, J., Altepeter, J., Pinkard, D., Craven, D., Linsday, J., & Foltz, M. (1997, June). Progress on Room 5: A testbed for public interactive semi-formal legal argumentation. In Proceedings of the 6th international conference on Artificial intelligence and law (pp. 207-214). ACM).
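
The general idea of ranking by citation count and jurisdiction can be sketched as follows in Python. This is not the Room 5 algorithm, whose details are not given here. The scoring rule (inbound citation count multiplied by a jurisdiction boost), the function and parameter names, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def rank_by_citations(citations, jurisdiction_of, home_jurisdiction, home_boost=2.0):
    """Rank candidate cases by inbound citation count, boosted for the home jurisdiction.

    citations: iterable of (citing_case, cited_case) pairs.
    jurisdiction_of: dict mapping each candidate case to its jurisdiction.
    home_boost: hypothetical multiplier for cases from the user's jurisdiction.
    """
    counts = Counter(cited for _citing, cited in citations)

    def score(case):
        boost = home_boost if jurisdiction_of[case] == home_jurisdiction else 1.0
        return counts[case] * boost

    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(jurisdiction_of, key=lambda case: (-score(case), case))


# Hypothetical data: A is cited most often, but B is from the user's jurisdiction.
citations = [("D", "A"), ("E", "A"), ("F", "A"), ("D", "B"), ("E", "B"), ("F", "C")]
jurisdiction_of = {"A": "other", "B": "home", "C": "home"}

ranking = rank_by_citations(citations, jurisdiction_of, home_jurisdiction="home")
print(ranking)
```

With these assumed weights, case B (two citations, home jurisdiction) ranks above case A (three citations, other jurisdiction), showing how jurisdiction can count for as much as the number of references.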




See also

* Computer-assisted legal research