NCSA Brown Dog

NCSA Brown Dog is a research project to develop a method for easily accessing stored historic research data, in order to maintain the long-term viability of large bodies of scientific research. It is supported by the National Center for Supercomputing Applications (NCSA), which is funded by the National Science Foundation (NSF).


History

Brown Dog is part of the DataNet partners program funded by NSF in 2008. DataNet was conceived to address the increasingly digital and data-intensive nature of science, engineering and education. Brown Dog is part of a follow-on effort called Data Infrastructure Building Blocks (DIBBs), focused on building software to support DataNet. The project was proposed by researchers at NCSA and the University of Illinois Urbana-Champaign, together with researchers from Boston University and the University of North Carolina at Chapel Hill.


Unstructured, uncurated, long tail data

Much scientific data comes in small, unstructured and uncurated collections and is thus not easily shared. Such data is sometimes referred to as "long tail" data, a term borrowed from statistics that here refers to the tail of the distribution of project sizes. Most of these smaller projects lack the resources to properly steward the data they produce. Long tail data, both past and present, has the potential to inform future research in many study areas, but much of it has become inaccessible due to obsolete software and file formats. The resulting inability to review data from older research disrupts the overall scientific research process.


Approach

Brown Dog describes itself as the "super mutt" of software (thus the name "Brown Dog"), serving as a low-level data infrastructure to interface digital data content across the internet. Its approach is to use every possible source of automated help (i.e., software) in existence in a robust and provenance-preserving manner to create a service that can deal with as much of this data as possible. The project sees the broader impact of its work in its potential to serve the general public as a sort of "DNS for data", with the goal of making all data and all file formats as accessible as webpages are today.


Technology

Brown Dog seeks to address problems involving the use of uncurated and unstructured data collections through the development of two services: the Data Access Proxy (DAP), to aid in the conversion of file formats, and the Data Tilling Service (DTS), for the automatic extraction of metadata from file contents. Once the services are developed, researchers and the general public will be able to download browser plugins and other tools from the Brown Dog tool catalog.


Data Tilling Service

Data Tilling Service (DTS) will allow users to search data collections using an existing file to discover other similar files in a collection. A DTS search field will be appended to configured browsers, where example files can be dropped. This tells DTS to search all the files under a given URL for files similar to the dropped file. For example, while browsing an online image collection, a user could drop an image of three people into the search field, and the DTS would return all images in the collection that also contain three people. If DTS encounters a foreign file format, it will utilize DAP to make the file accessible. DTS also indexes the data and extracts and appends metadata to files and collections, enabling users to gain some sense of the type of data they are encountering. This service runs on port 9443.
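The query-by-example search described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Brown Dog's actual implementation: it assumes each file has already been reduced to a numeric feature vector by some extractor, and it ranks files by cosine similarity, a technique chosen here for simplicity. The function names `cosine` and `find_similar`, the `threshold` parameter, and the sample vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # An all-zero vector has no direction, so treat it as dissimilar.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def find_similar(query, collection, threshold=0.9):
    """Return names of files whose features resemble `query`, best match first.

    `collection` maps a file name to its (hypothetical) extracted features.
    """
    scored = [(cosine(query, feats), name) for name, feats in collection.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]

# Hypothetical features standing in for whatever an extractor would produce.
collection = {
    "a.jpg": [3, 0, 1],
    "b.jpg": [3, 0, 1.2],
    "c.jpg": [0, 5, 0],
}
matches = find_similar([3, 0, 1], collection)
```

Here the dropped example file corresponds to `query`, and `matches` lists the files in the collection judged similar to it.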


Data Access Proxy

Data Access Proxy (DAP) allows users to access data files that would otherwise be unreadable. Similar to an internet gateway or Domain Name Service, the DAP configuration would be entered into a user's machine and browser settings. Data requests over HTTP would first be examined by DAP to determine if the native file format is readable on the client device. If not, DAP converts the file into the best available format readable by the client machine. Alternatively, the user could specify the desired format themselves. This service runs on port 8184.
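One way to picture the conversion step is as a search over a graph of available format converters. The sketch below is a minimal illustration, not DAP's actual logic: it assumes "best available format" means the one reachable in the fewest conversion steps, and the `CONVERSIONS` table and the `conversion_path` function are hypothetical names invented for this example.

```python
from collections import deque

# Hypothetical table: source format -> formats some tool can produce from it.
CONVERSIONS = {
    "wpd": ["doc", "rtf"],
    "doc": ["pdf", "txt"],
    "rtf": ["txt"],
    "pdf": ["txt"],
}

def conversion_path(source, readable, conversions=CONVERSIONS):
    """Return the shortest chain of formats from `source` to a readable one.

    `readable` is the set of formats the client can open. Returns None when
    no chain of conversions reaches any of them.
    """
    if source in readable:
        return [source]  # The native format is already readable.
    queue = deque([[source]])
    seen = {source}
    while queue:  # Breadth-first search finds the fewest-step chain first.
        path = queue.popleft()
        for target in conversions.get(path[-1], []):
            if target in seen:
                continue
            if target in readable:
                return path + [target]
            seen.add(target)
            queue.append(path + [target])
    return None

path = conversion_path("wpd", {"pdf"})
```

A user-specified target format fits the same sketch: pass a `readable` set containing only that format.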


Use cases

Brown Dog targets three use cases proposed by groups within the EarthCube research communities. Developers and researchers from these communities will work together on use cases that span geoscience, engineering, biology and social science.


Long tail vegetation data in ecology and global change biology

This use case is led by Michael Dietze, Boston University.
Data on the abundance, species composition, and size structure of vegetation is critically important for a wide array of sub-disciplines in ecology, conservation, natural resource management, and global change biology. However, addressing many of the pressing questions in these disciplines will require that terrestrial biosphere and hydrologic models are able to assimilate the large amount of long-tail data that exists but is largely inaccessible. The Brown Dog team, in cooperation with researchers from Dietze's lab, will facilitate the capture of a huge body of smaller research-oriented vegetation data sets collected over many decades, as well as historical vegetation data embedded in Public Land Survey data dating back to 1785. This data will be used as initial conditions for models, to make sense of other large data sets, and for model calibration and validation.


Designing green infrastructure considering storm water and human requirements

This use case is led by Barbara Minsker, William Sullivan and Arthur Schmidt, all of the University of Illinois at Urbana-Champaign.
This case study involves developing novel green infrastructure design criteria and models that integrate requirements for storm water management with ecosystem and human health and well-being. Data accessibility and availability are a major challenge in addressing the scientific and social problems associated with the design of green spaces. This study will focus on identified areas of the Green Healthy Neighborhood Planning region within the City of Chicago where existing local sewer performance is most deficient and where changes in impervious area through green infrastructure would be beneficial to underserved neighborhoods. Brown Dog will be used to extract long-tail experimental data on human landscape preferences and health impacts. This data will be used to develop a human health impacts model that will then be linked together with a terrestrial biosphere model and a storm water model using Brown Dog technology.


Development and application for critical zone studies

This use case is led by Praveen Kumar, University of Illinois at Urbana-Champaign.


NSF Award

CIF21 DIBBs: Brown Dog was awarded in the winter of 2013 with a start date of October 1, 2013. The estimated expiration date is September 30, 2018. The award amount was $10,519,716.00, the largest DIBBs award. The principal investigator is Kenton McHenry of NCSA at the University of Illinois at Urbana-Champaign. The co-leaders are Jong Lee, NCSA/UIUC; Barbara Minsker, Civil and Environmental Engineering, University of Illinois at Urbana-Champaign; Praveen Kumar, Civil and Environmental Engineering, University of Illinois at Urbana-Champaign; and Michael Dietze, Department of Earth and Environment, Boston University.


External links

* Official website: http://browndog.ncsa.illinois.edu