Gadfly is a
relational database management system
A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relati ...
written in
Python. Gadfly is a collection of Python modules that provides relational database functionality entirely implemented in Python. It supports a subset of the standard RDBMS
Structured Query Language (SQL).
Gadfly runs wherever Python runs and supports
client
Client(s) or The Client may refer to:
* Client (business)
* Client (computing), hardware or software that accesses a remote service on another computer
* Customer or client, a recipient of goods or services in return for monetary or other valuabl ...
/
server on any platform that supports the standard Python
socket
Socket may refer to:
Mechanics
* Socket wrench, a type of wrench that uses separate, removable sockets to fit different sizes of nuts and bolts
* Socket head screw, a screw (or bolt) with a cylindrical head containing a socket into which the hexa ...
interface. The
file format
A file format is a Computer standard, standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary format, pr ...
s used by Gadfly for storage are cross-platform—a gadfly database directory can be moved from
Windows 95
Windows 95 is a consumer-oriented operating system developed by Microsoft as part of its Windows 9x family of operating systems. The first operating system in the 9x family, it is the successor to Windows 3.1x, and was released to manufactu ...
to
Linux
Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which i ...
using a binary copying mechanism and gadfly will read and run the database.
It supports persistent databases consisting of a collection of structured tables with indices, and a large subset of SQL for accessing and modifying those tables. It supports a log-based recovery protocol which allows committed operations of a
database
In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spa ...
to be recovered even if the database was not shut down in a proper manner (i.e., in the event of a
CPU
A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...
or
software
Software is a set of computer programs and associated software documentation, documentation and data (computing), data. This is in contrast to Computer hardware, hardware, from which the system is built and which actually performs the work.
...
crash,
ut not in the event of a disk crash. It also supports a
TCP/IP
The Internet protocol suite, commonly known as TCP/IP, is a framework for organizing the set of communication protocols used in the Internet and similar computer networks according to functional criteria. The foundational protocols in the suit ...
Client/Server mode where remote clients can access a Gadfly database over a TCP/IP network (such as the Internet) subject to configurable security mechanisms.
Since Gadfly depends intimately on the kw
Parsing
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
package it is distributed as part of the kwParsing package, under the same copyright.
Gadfly allows Python programs to store, retrieve and query tabular data without having to rely on any external database engine or package. It provides an in-memory relational database style engine for Python programs, complete with a notion of a "committed, recoverable transaction" and "aborts".
Usage
The main "gadfly" module attempts to faithfully adhere to
Greg Stein's Python Database
API, as discussed and certified by the Python DB-SIG.
Concurrent database updates are not supported. The "databases" are currently designed to be written/modified by one process in isolation. Multiple processes can access a Gadfly database when accesses are arbitrated by a TCP/IP Gadfly server process.
Creating a new database
Unlike most Python/database-engine interfaces Gadfly databases must be created using Python. To accomplish this programmers use:
import gadfly
connection = gadfly.gadfly()
with no arguments. Then they startup a database using the startup method:
connection.startup("mydatabase", "mydirectory")
Here "mydirectory" must be a directory which exists and which can be written to in order to store the database files. The startup creates some files in "mydirectory". This has the effect of
clobbering any existing Gadfly database called "mydatabase" in the directory "mydirectory". Gadfly will not allow a start up the same connection twice, however.
The first "
import gadfly
" reads in and initializes some rather large data structures used for parsing SQL, and thus may take longer than other module imports.
Within the database the user can create tables, populate them, and commit the result when they are happy:
cursor = connection.cursor()
cursor.execute("CREATE TABLE ph (nm VARCHAR, ph VARCHAR)")
cursor.execute("INSERT INTO ph(nm, ph) VALUES ('arw', '3367')")
cursor.execute("SELECT * FROM ph")
for x in cursor.fetchall():
print x
# prints ('arw', '3367')
connection.commit()
Reconnecting to an existing database
Once a database exists it can be reconnected to:
import gadfly
connection = gadfly.gadfly("mydatabase", "mydirectory")
This will read in the database tables with the most recently committed values. The initialized database is then queried and updated:
cursor = connection.cursor()
cursor.execute("UPDATE ph SET nm='aaron' WHERE nm='arw'")
cursor.execute("SELECT * FROM ph")
for x in cursor.fetchall():
print x
# prints ('aaron', '3367')
If the user does not want to commit updates then the do not execute a commit on the connection object (which writes out the tables). To restore the old values from the existing database they use:
connection.abort()
Updates are only stored upon a connection.commit().
ctually, if autocheckpoint is disabled, updates are only stored to table files on checkpoint—see the documentation on the recovery mechanism.
print cursor.pp()
to "pretty print" the result of any evaluation (which might be
None
for a non-select).
Features
In this version all tables are read into memory upon "connecting" to the database and "touched" tables are written out upon checkpoint. Each table is represented as a separate file in the destination directory, and there is a "data definition" file as well (a list of data definition declarations). During active use a
log file
In computing, logging is the act of keeping a log of events that occur in a computer system, such as problems, errors or just information on current operations. These events may occur in the operating system or in other software. A message or ...
appears in the active directory as well, and if the process crashes this log file is used to recover committed operations.
The SELECT statement
At this point Gadfly supports quite a lot of the SQL semantics requested in the ODBC 2.0 specification. SQL statements supported include the SELECT:
SELECT ALLexpressions or *
FROM tables
HERE condition
Here is an adverb that means "in, on, or at this place". It may also refer to:
Software
* Here Technologies, a mapping company
* Here WeGo (formerly Here Maps), a mobile app and map website by Here
Television
* Here TV (formerly "here!"), ...
ROUP BY group-expressions AVING aggregate-condition nion-clause RDER BY columns
This statement is quite powerful. It reads as follows:
# Make all combinations of rows from the tables (FROM line)
# Eliminate those combinations not satisfying condition (WHERE line)
# (if GROUP present) form aggregate groups that match on group-expressions
# (if HAVING present) eliminate aggregate groups that don't satisfy the aggregate-condition.
# compute the columns to keep (SELECT line).
# (if union-clause present) combine (union, difference, intersect) the result with the result of another select statement.
# if DISTINCT, throw out redundant entries.
# (if ORDER present) order the result by the columns (ascending or descending as specified, with precedence as listed).
The actual implementation in gadfly is much better than the intuitive reading, particularly at steps 1 and 2 (which are combined via optimizing transformations and hash join algorithms).
Conditions may include equalities, and inequalities of expressions. Conditions may also be combined using AND, OR, NOT. Expressions include column names, constants, and standard arithmetic operations over them.
Embedded queries supported include subquery expressions, expr IN (subselect), quantified comparisons, and the EXISTS (subselect) predicate.
Aggregate tests and computations can only be applied after the GROUPing and before the columns are selected (steps 3,4,5). Aggregate operations include COUNT(*), COUNT(expression), AVG(expression), SUM(expression), MAX(expression), MIN(expression), and the non-standard MEDIAN(expression). These may be applied to DISTINCT values (throwing out redundancies, as in COUNT(DISTINCT drinker). if no GROUPing is present the aggregate computations apply to the entire result after step 2.
There is much more to know about the SELECT statement. The test suite test/test_gadfly.py gives numerous examples of SELECT statements.
Table creation and "data types"
Create tables using the CREATE TABLE statement:
CREATE TABLE name (colname datatype colname datatype...
Data types currently "supported" are
integer
An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
,
float
Float may refer to:
Arts and entertainment Music Albums
* ''Float'' (Aesop Rock album), 2000
* ''Float'' (Flogging Molly album), 2008
* ''Float'' (Styles P album), 2013
Songs
* "Float" (Tim and the Glory Boys song), 2022
* "Float", by Bush ...
, and
varchar. They are ignored by the implementation, anything that is hashable and marshallable can currently go in any column (but that is likely to change). For example:
CREATE TABLE frequents
(drinker VARCHAR,
bar VARCHAR,
perweek INTEGER)
At present tuples, complexes, or anything else can be put into a column specified as "VARCHAR".
Other supported statements
Gadfly also supports the searched DELETE and UPDATE; INSERT VALUES and INSERT subselect; CREATE/DROP INDEX, and DROP TABLE. These have the informal syntax:
DELETE FROM table WHERE condition
UPDATE table SET col=expr col=expr...WHERE condition
INSERT INTO table column "> column...">column [, column...VALUES (value [, value..."> column..."> column...">column [, column...VALUES (value [, value...
INSERT INTO table column [, column..."> column...">column [, column...subselect
CREATE [UNIQUE] INDEX name ON table (column [, column...])
DROP TABLE table
DROP INDEX name
Multiple statements may be executed in one cursor.execute(S) by separating the statements with semicolons in S, for example S might have the string value:
DROP INDEX tdindex;
DROP TABLE templikes
SQL is case insensitive.
Dynamic values
Expressions also include the special expression '?' (the ODBC-style dynamic expression) as in:
insertstat = "INSERT INTO ph(nm,ph) VALUES (?, ?)"
cursor.execute(insertstat, ('nan', "0356"))
cursor.execute(insertstat, ('bill', "2356"))
cursor.execute(insertstat, ('tom', "4356"))
Dynamic values allow the cursor to use the same parsed expression many times for a similar operation. Above the insertstat is parsed and bound to the database only once. Using dynamic attributes should speed up accesses. Thus the above should run much faster than the equivalent:
cursor.execute("INSERT INTO ph(nm,ph) VALUES ('nan', '0356')")
cursor.execute("INSERT INTO ph(nm,ph) VALUES ('bill', '2356')")
cursor.execute("INSERT INTO ph(nm,ph) VALUES ('tom', '4356')")
Dynamic attributes can appear in other statements containing expressions (such as SELECTs, UPDATEs and DELETEs too).
For SELECT, UPDATE, and DELETE the dynamic expression substitutions must consist of a single tuple, as in:
stat = "SELECT * FROM ph WHERE nm=?"
cursor.execute(stat, ("nan",))
...
cursor.execute(stat, ("bob",))
...
Since the dynamic substitution eliminates the need for parsing and binding (expensive operations!) the above should run faster than the equivalent:
cursor.execute("SELECT * FROM ph WHERE nm='nan'")
...
cursor.execute("SELECT * FROM ph WHERE nm='bob'")
...
If several similar queries are repeated multiple times, each query "template string" is associated with a unique cursor object so that each template must be parsed and bound only once. Some relatively complex queries from the test suite run 2 to 3 times faster after they have been parsed and bound, even without the kjbuckets builtin. With kjbuckets the same ran 5 to 10 times faster.
Multiple batch inserts and dynamic values
For the special case of INSERT VALUES a list of substitution tuples allows the query engine to perform the inserts in optimized batch mode. Thus the fastest way to perform the three inserts given earlier is:
data = 'nan', "0356")), ('bill', "2356"), ('tom', "4356")stat = "INSERT INTO ph(nm,ph) VALUES (?, ?)"
cursor.execute(stat, data)
It would be even faster if the cursor had previously executed the stat with different data (since then no parsing or binding would occur).
Introspection
By default a gadfly database automatically includes "introspective" tables which allow a gadfly query to "query the shape of the database"—for example to view table names and names of rows in tables:
>>> g = gadfly()
>>> g.startup("dbtest", "dbtest")
>>> c = g.cursor()
>>> c.execute("select * from __table_names__")
>>> print c.pp()
IS_VIEW , TABLE_NAME
=
1 , __TABLE_NAMES__
1 , DUAL
1 , __DATADEFS__
1 , __COLUMNS__
1 , __INDICES__
1 , __INDEXCOLS__
Here DUAL is a standard one row/one column test table (from the Oracle tradition) and the other tables provide information about the database schema:
>>> c.execute("create table t1 (a varchar, b varchar)")
>>> c.execute("create table t2 (b varchar, c varchar)")
>>> c.execute("create unique index t1a on t1(a)")
>>> c.execute("create index t1b on t1(b)")
>>> c.execute("select * from __table_names__")
>>> print c.pp()
IS_VIEW , TABLE_NAME
=
0 , T1
1 , __DATADEFS__
1 , __INDICES__
0 , T2
1 , __TABLE_NAMES__
1 , __COLUMNS__
1 , DUAL
1 , __INDEXCOLS__
>>> c.execute("select * from __columns__")
>>> print c.pp()
COLUMN_NAME , TABLE_NAME
=
A , T1
B , T1
NAME , __DATADEFS__
DEFN , __DATADEFS__
INDEX_NAME , __INDICES__
TABLE_NAME , __INDICES__
IS_UNIQUE , __INDICES__
TABLE_NAME , __TABLE_NAMES__
IS_VIEW , __TABLE_NAMES__
B , T2
C , T2
COLUMN1 , DUAL
TABLE_NAME , __COLUMNS__
COLUMN_NAME , __COLUMNS__
INDEX_NAME , __INDEXCOLS__
COLUMN_NAME , __INDEXCOLS__
>>> c.execute("select * from __indices__")
>>> print c.pp()
IS_UNIQUE , TABLE_NAME , INDEX_NAME
0 , T1 , T1B
1 , T1 , T1A
>>> c.execute("select * from __indexcols__")
>>> print c.pp()
COLUMN_NAME , INDEX_NAME
B , T1B
A , T1A
>>> c.execute("select * from dual")
>>> print c.pp()
COLUMN1
0
Interactive testing
After installation, the created database can be interactively tested from the same directory using the interactive interpreter:
Python 2.1.3 (#1, Apr 30 2002, 19:37:40)
CC 2.96 20000731 (Red Hat Linux 7.1 2.96-96)on linux2
Type "copyright", "credits" or "license" for more information.
>>>
>>> from gadfly import gadfly
>>> connection = gadfly("test", "dbtest")
>>> cursor = connection.cursor()
>>> cursor.execute("select * from frequents")
>>> cursor.description
(('DRINKER', None, None, None, None, None, None), ('PERWEEK', None, None,
None, None, None, None), ('BAR', None, None, None, None, None, None))
>>> print cursor.pp()
DRINKER , PERWEEK , BAR
adam , 1 , lolas
woody , 5 , cheers
sam , 5 , cheers
norm , 3 , cheers
wilt , 2 , joes
norm , 1 , joes
lola , 6 , lolas
norm , 2 , lolas
woody , 1 , lolas
pierre , 0 , frankies
>>>
Architecture
The SQL grammar is described in grammar.py, the binding of the grammar constructs to semantic objects is performed in bindings.py, the semantic objects and their execution strategies is defined in semantics.py. The semantics use a lot of classical and non-classical logic (
cylindric logic) as well as optimization heuristics to define a relatively efficient and correct implementation of SQL.
The most basic data structures of the implementation are given in either kjbuckets0.py or the faster kjbucketsmodule.c, which implement the same data type signatures in Python and in a C extension to Python respectively.
The database.py module is a simple wrapper that provides a standard DBAPI interface to the system.
The test suite test/test_gadfly.py attempts to provide a regression test and a demonstration of the system.
The SQL parser also requires the kwParsing parser generation package, which consists of a number of additional python modules.
Tools
With gfplus a gadfly database can be interactively manipulated with SQL commands.
The tool works similar to
Oracle
An oracle is a person or agency considered to provide wise and insightful counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. As such, it is a form of divination.
Description
The wor ...
's
SQL*Plus.
Concurrency
Because it lacks true
concurrency control
In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while ...
and file-system based indexing it is not appropriate for very large multiprocess
transaction
Transaction or transactional may refer to:
Commerce
*Financial transaction, an agreement, communication, or movement carried out between a buyer and a seller to exchange an asset for payment
*Debits and credits in a Double-entry bookkeeping syst ...
-based systems.
Two applications may access the same database concurrently. However, changes made by one application may not be seen by the other application until after it restarts. This may be because each application loads the database in-memory at startup only.
References
External links
*
Gadfly FAQ
{{DEFAULTSORT:Gadfly (Database)
Free database management systems
Free software programmed in Python