Most of us probably started to use Python as a scripting language to quickly create working code for tasks like numerical and scientific calculations. But, of course, Python is much more than that. If you intend to use Python for more traditional applications, you will sooner or later need to interface with a database. Today, we will see how the Python DB-API can be used for that purpose.
The Python DB-API
Of course there are many databases that can be used with Python – various modules exist like mysql-connector for MySQL, Psycopg for PostgreSQL or sqlite3 for the embedded database SQLite. Obviously, all these drivers differ slightly, but it turns out that they follow a common design pattern which is known as the Python DB-API and described in PEP-249.
If you know Java, you have probably heard of JDBC, which is a standard to access relational databases from Java applications. The Python DB-API is a bit different – it is not a library, but it is a set of design patterns that drivers are supposed to follow.
When using JDBC, there is, for instance, an interface class called java.sql.Connection. Whatever driver and whatever database you use, this interface will always be the same for compliant drivers. For Python, the situation is a bit different. If you use SQLite, then there will be a class sqlite3.Connection. If you use MySQL, there will be a class with the slightly cryptic name mysql.connector.connection_cext.CMySQLConnection. These are different classes, and they do not inherit from a common superclass or interface. Still, the Python DB-API dictates that these classes all have a common set of methods and attributes to make switching between different databases as easy as possible.
Before we start to look at an example, it is helpful to understand the basic objects of the DB-API class model. First, there is a connection, which, of course, represents a connection to a database. How to initially obtain a connection is specific for the database used, and we will see some examples further below.
When a connection is obtained, it is active or open. It can be closed by executing its close() method, which renders the connection unusable and rolls back any uncommitted changes. A connection also has commit() and rollback() methods to support transaction control (however, the specification leaves implementations some freedom with respect to features like auto-commit and isolation levels, and does not even mandate that implementations support transactions at all). The specification is also not fully clear on how transactions are started; it appears that most drivers automatically start a new transaction when a statement is executed and no transaction is in progress.
To actually execute statements and fetch results, a cursor is used. Again, it is implementation specific whether this is realized as a real, server-side cursor or as a local cache within the client. Cursors are created by calling the connection's cursor() method, and a cursor is only valid within the context of its connection (i.e. if a connection is closed, all cursors obtained from it become unusable). Once we have a cursor, we can execute statements on it using either its execute method (which executes a statement with one set of parameters) or its executemany method, which executes a statement several times with different sets of parameters. How exactly parameters are referenced in a statement is implementation specific, and we will see some examples below.
The statement executed via a cursor can be an update or a select statement. In the latter case, we can access the results either via the fetchall() method, which returns all rows of the result set as a sequence, or via the fetchone() method, which returns the next row. The API also defines a method fetchmany(), which returns a specified number of rows at a time.
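As a quick sketch of how these three fetch methods interact on the same result set (using the sqlite3 module, since it ships with Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE numbers (n INTEGER)")
cur.executemany("INSERT INTO numbers (n) VALUES (?)",
                [(i,) for i in range(5)])

cur.execute("SELECT n FROM numbers ORDER BY n")
first = cur.fetchone()    # the next (here: first) row as a tuple
batch = cur.fetchmany(2)  # a list with the next two rows
rest  = cur.fetchall()    # a list with all remaining rows
conn.close()
```

Each fetch call advances the cursor, so here first is (0,), batch is [(1,), (2,)] and rest is [(3,), (4,)].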
Using Python with SQLite
After this general introduction, let us now see how the API works in practice. For the sake of simplicity, we will first use the embedded database SQLite.
One of the nice things about SQLite is that the module needed to use it from a Python program – sqlite3 – is part of the Python standard library and therefore included in any standard Python installation. So no separate installation is needed, and we can start to play with it right away.
To use a database, the first thing we have to do is to establish a connection to it. As mentioned above, this process is specific to the SQLite database. As SQLite identifies a database by the file where the data is stored, we basically only have to provide a file name to the driver to either connect to an existing database or to create a new one (which happens automatically if the file does not yet exist). Here is a code snippet that establishes a connection to a database stored in the file example.db in your home directory.
import sqlite3 as dblib
from pathlib import Path

home = str(Path.home())
db = home + "/example.db"
c = dblib.connect(db)
Now let us create a table. This sounds simple enough: just execute a CREATE TABLE statement. However, this will raise an error if the table already exists. Of course, we could use the IF NOT EXISTS clause to avoid this, but for the sake of demonstration let us choose a different approach.
We first try to drop the table. If the table does not exist, this raises an exception which we can catch; if it exists, it is removed. In either case, we can then proceed to create the table. To execute these statements, we need a cursor as explained above, which we can obtain from the newly created connection. So our code looks as follows (in production code, you would of course put more effort into verifying that the reason for the exception is what you expect):
cursor = c.cursor()
try:
    cursor.execute('''DROP TABLE books''')
except dblib.Error:
    pass
cursor.execute('''CREATE TABLE books (author text, title text)''')
Next, we want to insert data into our database. Suppose we want to insert a book called "Moby Dick", written by Herman Melville. So we want to execute an SQL statement like
INSERT INTO books (author, title) VALUES ('Herman Melville', 'Moby Dick') ;
Of course, we could simply assemble this SQL statement in Python and pass it directly to the execute method of our cursor. But this is not the recommended way of doing things. Instead, one generally uses SQL host variables: variable parts of a statement which are declared inside the statement and, at runtime, bound to Python variables. This is generally advisable, for two reasons. First, the naive approach of manually assembling SQL statements from parameter values makes your program vulnerable to SQL injection. Second, using host variables allows the driver to prepare the statement only once (i.e. to parse and tokenize the statement and prepare an execution plan) even if you insert multiple rows, thus improving performance.
To use host variables with SQLite, you would execute a statement in which the values you want to insert are replaced by placeholders, using question marks.
INSERT INTO books (author, title) VALUES (?, ?) ;
When you call the execute method of a cursor, you pass this SQL string along with the values for the placeholders, and the database driver will then replace the placeholders with their actual values, a process called binding.
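With SQLite's question-mark style, such a bind looks as follows (a minimal, self-contained sketch using an in-memory database for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE books (author TEXT, title TEXT)")

# The second argument supplies the values that the driver
# binds to the two ? placeholders
cur.execute("INSERT INTO books (author, title) VALUES (?, ?)",
            ("Herman Melville", "Moby Dick"))

cur.execute("SELECT author, title FROM books")
row = cur.fetchone()
conn.close()
```

Note that the values travel as a separate tuple and are never spliced into the SQL string itself, which is what protects against SQL injection.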
However, when coding this, there is a subtlety we need to observe. The special characters used for the placeholders are not specified by the DB-API Python standard and might therefore be driver specific. Fortunately, even though the standard does not define the placeholders, it defines a way to obtain them that is the same for all drivers.
Specifically, the standard defines several placeholder styles, and the style a driver uses can be queried via the module-level variable paramstyle. In our example, we use only two of these styles, called pyformat and qmark. Pyformat, the style used e.g. by MySQL, uses Python-like placeholders such as %s, whereas qmark uses – well, you might have guessed it – question marks, as SQLite does. So to keep our code reusable, we first retrieve the placeholder style, translate it into the actual placeholder (a dictionary comes in handy here) and then assemble our statement.
knownMarkers = {'pyformat': '%s', 'qmark': '?'}
marker = knownMarkers[dblib.paramstyle]

examples = [('Dickens', 'David Copperfield'),
            ('Melville', 'Moby Dick')]
sqlString = '''INSERT INTO books (author, title) VALUES ''' \
            + "(" + marker + "," + marker + ")"
cursor.executemany(sqlString, examples)
c.commit()
Note that we explicitly commit at the end, as closing the connection would otherwise roll back our changes. We also use the executemany method, which performs several insertions, binding host variables from a sequence of value tuples.
Reading from our database is now very simple. Again, we first need a connection and a cursor. Once we have those, we assemble a SELECT statement and execute it. We then use the fetchone or fetchall method to retrieve the result set and can iterate through it as usual.
# Get all rows from table books
statement = "SELECT author, title from books;"
cursor.execute(statement)
rows = cursor.fetchall()
# Iterate through the result set
for row in rows:
    print("Author: " + row[0])
    print("Title: " + row[1])
As we can see, the order of the columns in the individual tuples within the result set is as in our SELECT statement. However, I have not found an explicit guarantee for this behaviour in the specification. If we want to make sure that we get the columns right, we can use the cursor.description attribute, which the standard mandates and which contains a sequence of tuples, where each tuple represents a column of the result and contains attributes like the column name and type.
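For instance, with the sqlite3 module, the column names can be extracted from this attribute like so (a sketch; per PEP 249, the first element of each entry in cursor.description is the column name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE books (author TEXT, title TEXT)")
cur.execute("SELECT author, title FROM books")

# Each entry of cursor.description is a 7-item sequence;
# item 0 is the column name
column_names = [col[0] for col in cur.description]
conn.close()
```

Here column_names is ['author', 'title'], matching the column order of the SELECT statement, so we could build a name-to-index mapping instead of relying on positions.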
We have now seen how we can use the DB-API to create database tables, to insert rows and to retrieve results. I have assembled the code that we have discussed in two scripts, one to prepare a database and one to read from it, which you can find here.
Using Python with MySQL
To illustrate the usage of the DB-API for different databases, let us now try to do the same thing with MySQL. There is some more setup involved in this case. First, the driver that we will use – mysql-connector – is not part of the Python standard library and needs to be installed separately, which is done using
pip3 install mysql-connector-python
We will also need the MySQL command line client mysql and the admin client mysqladmin. On an Ubuntu system, you can install them using
sudo apt-get install mysql-client
As we intend to run the MySQL server in a docker container, you also need a Docker engine installed on your system. If you do not have that yet, the easiest way might be to install it using snap.
snap install docker
Next, we need to bring up an instance of the MySQL database using docker run, give it some time to come up, create a database called books and grant access rights to a new user that we will use to access the database from within Python. Here is a short script that performs all these steps and that you can also find here.
# Start docker container
docker run -d --name some-mysql \
    --rm \
    -p 3306:3306 \
    -e MYSQL_ROOT_PASSWORD=my-secret-root-pw \
    -e MYSQL_USER=chr \
    -e MYSQL_PASSWORD=my-secret-pw \
    mysql

# Give the database some time to come up
mysqladmin --user=root --password=my-secret-root-pw \
    --host='127.0.0.1' \
    --silent status
while [ $? == 1 ]
do
    sleep 5
    mysqladmin --user=root --password=my-secret-root-pw \
        --host='127.0.0.1' \
        --silent status
done

# Create database books and grant rights to user chr
mysqladmin --user=root \
    --password=my-secret-root-pw \
    --host='127.0.0.1' create books
echo 'grant all on books.* to 'chr';' \
    | mysql --user=root \
        --password=my-secret-root-pw \
        --host='127.0.0.1' books
BE CAREFUL: if you have read my post on Docker networking, you will know that using the -p switch implies that we can reach the database from every machine in the local network – so if you are not in a fully trusted environment, you definitely want to change this script and the program that will follow to use real passwords, not the ones used in this post.
Let us now see how we can access our newly created database from Python. Thanks to the DB-API, most of the code is actually the same as in the case of SQLite. Basically, there are two differences. First, the module that we have to import is of course different, but we can again use the alias dblib so that the remainder of the code works without changes.
import mysql.connector as dblib
The second change we need to make is the way we obtain the initial database connection, as connecting to a MySQL database requires additional parameters like credentials. Here is the respective statement:
c = dblib.connect(user='chr',
                  password='my-secret-pw',
                  host='127.0.0.1',
                  database='books')
All the remainder of the code can now be taken over from the SQLite case unchanged. On my GitHub page, I have created a separate directory holding the code, and if you compare this code to the code that we used earlier for SQLite, you will see that it is in fact only those two parts of the code that are different.
Of course, there is much more that could be said about database programming with Python. In reality, each database behaves differently, and you will have to deal with specifics like auto-commit, generated fields, different types, performance, result set sizes and so forth. However, the basics are the same thanks to the DB-API, and I hope that I could give you a good starting point for further investigations.