The Collective Knowledge (CK) project is an open-source framework and repository to enable collaborative, reproducible and sustainable research and development of complex computational systems. CK is a small, portable, customizable and decentralized infrastructure helping researchers and practitioners:
* share their code, data and models as reusable Python components and automation actions with a unified JSON API, JSON meta information, and a UID based on FAIR principles
* assemble portable workflows from shared components (such as multi-objective autotuning and design space exploration)
* automate, crowdsource and reproduce benchmarking of complex computational systems
* unify predictive analytics (scikit-learn, R, DNN)
* enable reproducible and interactive papers
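The first item in the list above (components with JSON meta information, a UID, and actions behind a unified JSON API) can be illustrated with a minimal sketch. It does not use the CK library itself; `new_uid`, `make_component`, `run_action`, the 16-character identifier format and the `return` status key are all assumptions made for this illustration.

```python
import json
import uuid


def new_uid():
    """Return a random 16-character hexadecimal identifier (illustrative format)."""
    return uuid.uuid4().hex[:16]


def make_component(alias, tags):
    """Build a hypothetical component entry: UID, readable alias, JSON-serializable meta."""
    return {"uid": new_uid(), "alias": alias, "meta": {"tags": sorted(tags)}}


def run_action(request, registry):
    """Dispatch a dict-in/dict-out action over a registry keyed by UID.

    Every reply carries a 'return' code: 0 for success, non-zero for failure
    (a convention assumed for this sketch).
    """
    if request.get("action") == "search":
        wanted = set(request.get("tags", []))
        found = [c["uid"] for c in registry.values()
                 if wanted <= set(c["meta"]["tags"])]
        return {"return": 0, "uids": found}
    return {"return": 1, "error": "unknown action: %r" % request.get("action")}


# Register one component and look it up by tag through the same JSON-style call.
component = make_component("image-classifier", ["model", "python"])
registry = {component["uid"]: component}
reply = run_action({"action": "search", "tags": ["model"]}, registry)
print(json.dumps(reply))
```

Because both the request and the reply are plain dictionaries, the same call could be serialized to JSON and issued from a command line or over a network without changing the action code.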
Notable usages
* ARM uses CK to accelerate computer engineering
* Several ACM-sponsored conferences use CK to automate the Artifact Evaluation process
* Imperial College (London) uses CK to automate and crowdsource compiler bug detection
* Researchers from the University of Cambridge used CK to help the community reproduce results of their publication in the International Symposium on Code Generation and Optimization (CGO'17) during Artifact Evaluation
* General Motors (USA) uses CK to crowd-benchmark convolutional neural network optimizations
* The Raspberry Pi Foundation and the cTuning foundation released a CK workflow with a reproducible "live" paper to enable collaborative research into multi-objective autotuning and machine learning techniques
* IBM uses CK to reproduce quantum results from Nature
* CK is used to automate the MLPerf benchmark
Portable package manager for portable workflows
CK has an integrated cross-platform package manager that uses Python scripts, a JSON API and JSON meta-descriptions to automatically rebuild, on a user's machine, the software environment required to run a given research workflow.
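As a rough illustration of how a JSON meta-description can drive environment detection, the sketch below resolves a workflow's declared dependencies against the executables available on a machine. It is not CK's package manager: the `META` layout, the `deps` and `executables` keys and the `resolve` helper are hypothetical, and a real package manager would also need to download and build whatever is missing.

```python
import json
import shutil

# Hypothetical JSON meta-description of what a workflow needs.
META = json.loads("""
{
  "workflow": "demo-benchmark",
  "deps": [
    {"name": "python interpreter", "executables": ["python3", "python"]},
    {"name": "hypothetical tool", "executables": ["no-such-tool-for-this-sketch"]}
  ]
}
""")


def resolve(meta, which=shutil.which):
    """Map each dependency to the first candidate executable found.

    `which` is injectable so the lookup can be tested without touching PATH.
    Returns (resolved, missing): a name-to-path dict and a list of names.
    """
    resolved, missing = {}, []
    for dep in meta["deps"]:
        path = next((p for p in map(which, dep["executables"]) if p), None)
        if path:
            resolved[dep["name"]] = path
        else:
            missing.append(dep["name"])
    return resolved, missing


resolved, missing = resolve(META)
print("resolved:", resolved)
print("still to install:", missing)
```

Keeping the requirements in JSON rather than in the scripts means the same workflow description can be resolved differently on each platform.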
Reproducibility of experiments
CK enables reproducibility of experimental results via community involvement, similar to Wikipedia and physics. Whenever a new workflow with all its components is shared via GitHub, anyone can try it on a different machine, in a different environment and with slightly different choices (compilers, libraries, data sets). Whenever unexpected or wrong behavior is encountered, the community explains it, fixes the affected components and shares them back.
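One way to picture the "try it elsewhere and report unexpected behavior" loop is a small check that records the local environment next to an observed result and flags large deviations from a shared reference value. This is only a sketch under stated assumptions: `describe_environment`, `check_result` and the 5% relative tolerance are hypothetical and are not taken from CK.

```python
import platform
import sys


def describe_environment():
    """Collect a few facts about the machine that produced a result."""
    return {
        "os": platform.system(),
        "machine": platform.machine(),
        "python": "%d.%d" % sys.version_info[:2],
    }


def check_result(reference, observed, rel_tol=0.05):
    """Compare an observed value with a shared reference (assumed non-zero).

    Returns a report that can be shared back with the community; 'unexpected'
    is True when the relative deviation exceeds rel_tol.
    """
    deviation = abs(observed - reference) / abs(reference)
    return {
        "environment": describe_environment(),
        "observed": observed,
        "deviation": deviation,
        "unexpected": deviation > rel_tol,
    }


report = check_result(reference=100.0, observed=120.0)
print(report["unexpected"], report["environment"])
```

A report flagged as unexpected carries enough environment detail for others to start explaining the difference.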
External links
* Development site
* Documentation
* Public repository with crowdsourced experiments
* International Workshop on Adaptive Self-tuning Computing Systems (ADAPT) uses CK to enable public reviewing of publications and artifacts via Reddit