By Peter Hearty, Lutris Technologies UK Ltd.
This article is one of an intermittent series of articles on Javaä database programming in general and InstantDB in particular. Sometimes there’ll be something for absolute beginners; sometimes there’ll be something for more advanced users and occasionally, a little bit of both.
Today we’ve got two sections for you:
1. The “Hello World” of JDBCä” and
2. Importing Data into InstantDB.
OK. You’ve downloaded the InstantDB .zip or .tar.gz file, you’ve followed the instructions and, with a bit of luck, a whole load of stuff has gone wizzing off the top of the screen. So, flushed with success, you sit staring at the screen and think “Now what?”.
Well, lets start by reviewing what happened when you typed the command:
java
org.enhydra.instantdb.ScriptTool
The Java interpreter will have searched through your class path and found the class: org.enhydra .instantdb.ScriptTool in the file idbexmpl.jar. It will then have loaded and started running the class at its Main method.
The ScriptTool program is nothing more than a simple interpreter. It opens a file, starts reading commands from the file and as a result will normally make calls to the Java DataBase Connectivity (JDBC) API. The default ScriptTool command file supplied in the InstantDB Examples directory (called sql1.txt) tells ScriptTool to do the following:
1. load the InstantDB JDBC driver;
2. open an InstantDB database;
3. send some Structured Query Language (SQL) commands to that database.
Although ScriptTool is aware of some of the special features of InstantDB, it could in fact be quite happily used with any relational database that has a JDBC driver, i.e. nearly all of them. We’re not going to dwell too much on the intricacies of ScriptTool in this article (that’s for a later day). The main point is that any Java program can fairly easily access any Relational Database provided there is a suitable JDBC driver available. The JDBC driver is the “glue” that binds a Java database program to an actual database.
The ScriptTool program is a very simple example of a JDBC program. However it’s also quite large and it can be a little difficult to see what it’s doing at times. Instead, we’re going to write a much smaller program that will show you just how easy it really is to write a database program in Java.
Traditionally, when introducing a new programming language or API (or at least, since Brian Kernighan’s famous “Hello World” C example), we begin with the simplest possible program. Figure 1 shows just about the simplest functioning JDBC program that you can write. Unfortunately, it doesn’t say “Hello World”, so my apologies to Mr. Kernighan.
Figure 1 - The "Hello World" Program
Let’s examine the above program line by line.
import
java.sql.*;
In order to talk to databases we need to have access to the JDBC API. This is defined in the java.sql package which is imported above. There then follows the usual syntactic elements needed to start a Java class and to define its Main method.
try {
...
} catch (Exception ex) {}
Nearly all of the JDBC API calls can generate java.sql.SQLException exceptions. We must be ready to catch these wherever possible. If we don’t, then out program will exit without closing its connection. Depending on the database, this might be undesirable. Here, we catch all exceptions, but do nothing with them.
Class.forName ("org.enhydra.instantdb.jdbc.idbDriver");
Now we get on to something a little more interesting. Except that, on the face of it, the above method invocation doesn’t seem to do anything at all! We would normally call Class.forName when we want to be able to find the java.lang.Class object for a given named class. The above line of code seems to do this and then just throw it away. So why do we need it?
The trick is, that in order to use a java.lang.Class object, the class itself must be loaded. Our call to Class.forName ensures that the class in brackets will actually get loaded into the Java Virtual Machine. This is vital for what happens next. Every JDBC driver has a Class constructor that adds the driver to java.sql.DriverManager’s list of known JDBC drivers. So what the above call has achieved is actually very simple. It has found the class: org.enhydra.instantdb.jdbc.idbDriver, which is InstantDB’s JDBC driver; loaded it into memory, and, thanks to idbDriver’s Class constructor, the driver has been registered with DriverManager.
If you wanted to talk to databases other than InstantDB you would load a different driver at this point.
Connection con = DriverManager.getConnection
("jdbc:idb:c:/InstantDB/Examples/sample.prp");
This is the line of code which obtains a Connection to the database. You need a java.sql.Connection object in order to talk to a database from Java. The above line is giving the name of a database (the bit in brackets) to DriverManager and asking it to see if one of its registered JDBC drivers recognises the name. DriverManager simply asks each JDBC driver in turn “is this one of your databases?”, and each driver either says “no” or proceeds to create a connection to the database.
But how do drivers know which databases belong to them? That’s where the database naming convention comes in. All database names in java begin with the url prefix “jdbc:” (a “url” is just another fancy name for a... well... “name”. This is much the same as the way that all web sites begin with “http:” and all FTP connections begin with “ftp:”).
What follows after the “jdbc:” is the “type” of the database. It is this database type which drivers use to identify their own particular databases. All InstantDB databases have the type “idb:”, so the InstantDB database driver will recognise the above url as the name of one of its databases.
What happens next is very much database specific. For some databases the url has to specify things like what machine the database is on, which port to connect to it on, and so on. In the case of InstantDB the bit after “jdbc:idb:” is just treated as a file name. We’ll examine the contents of InstantDB’s .prp files in detail in a later article. For now, all we need to know is that it tells InstantDB everything that it needs to know about the database it’s connecting to. If the database doesn’t actually exist, then InstantDB will create it.
By the time the driver manager returns with a Connection object, InstantDB will have: opened (or created) the database specified in c:/InstantDB/Examples/sample.prp; issued its copyright banner to standard out, and be waiting to receive commands.
Statement stmt = con.createStatement ();
In order to issue commands to the database we first have to create a java.sql.Statement object. We can get one of these just by asking the database connection to create one. That’s what’s happening in the line above. Once we have a statement, we’ll be ready to ask the database to do things for us.
stmt.execute ("DROP TABLE tmp");
Here, we’re finally telling the database to do something. In a relational database, data is stored in tables. Each table consists of rows and columns of data, very similar to a spreadsheet. The above statement is simply telling InstantDB to delete the table called “tmp”. If it exists then the table will be deleted and all of the data in all of the rows and columns in that table will be lost. If the table does not exist, then InstantDB will just ignore the command.
stmt.close();
con.close();
And that’s just about it, except that, since we’re well mannered programmers, we’ll return all of the resources we’ve been using to the database. The above lines allow InstantDB to get rid of any objects that it doesn’t need any more. In addition, InstantDB will spot that the one and only connection to the database has been closed down and so will close the database itself.
Create a file called Simple.java and cut and paste the above program. You’ll have to edit the line that provides the database url to point to the sample.prp file provided in the InstantDB Examples directory. Compile and run and, voila, you’ll have your very first database program!
Next time, we’ll expand on what we’ve done in this article and show you how to get data in and out of the database and how to display that data on the console.
Databases don’t live in isolation. Indeed, it’s very rare for any database to be populated solely by data that it receives from a single application. Being able to exchange data with other databases and importing data from other applications is one of the most important abilities of any database management system.
InstantDB has two means of importing data from other databases and applications. It can import from files, or it can import data directly from other databases. We’ll consider both methods in this article.
Throughout this article we’ll refer to the place where data is being imported from as the source and the table that it’s being import to as the target.
This is probably the most reliable and straightforward way of importing data into InstantDB.
Make sure that a JDBC driver for the source database has been loaded. In a ScriptTool script, for instance, this would mean loading two drivers at the start of the script. Something like:
d org.enhydra.instantdb.jdbc.idbDriver;
d sun.jdbc.odbc.JdbcOdbcDriver;
If the target table does not already exist then create it first using ScriptTool, DBBrowser or any other JDBC application. The target table doesn’t have to exist before you import the data, but making sure that it has been created in advance is the best way to ensure that the imported data gets interpreted as you intended. If the data does not exist before the import begins then InstantDB will try to guess the best data types for each column using the metadata from the source database and will create the table for you.
With both drivers loaded, and your target database open, it is then simply a matter of issuing something like the following SQL request to the target database:
IMPORT customers FROM URL "jdbc:odbc:DBaseEx"
USING "SELECT * FROM customer" BUFFER 2048
The above SQL performs a query against the source database, in this case the one pointed to by the url “jdbc:odbc:DBaseEx”. In this particular example, the JDBC driver being used is the jdbc-odbc bridge. ODBC would expect to find an ODBC Data Source Name (DSN) set up called “DBaseEx”. After that, it’s up to ODBC to find the database and execute the query.
Note that, using the jdbc-odbc bridge in this way allows you to import data from a very wide range of data sources on Microsoft Windows operating systems. ODBC drivers are available for all of the popular Relational Databases as well as for many simpler flat file databases. In addition, ODBC drivers are available for many proprietary application data formats such as Microsoft Excel and for standard data formats such as comma separated values.
Once the query on the target database has completed, InstantDB will retrieve rows of data from the results set and add them to the target table – in this case, the “customers” table. However, adding single rows at a time is a slow and expensive operation. So instead, InstantDB can buffer the rows to minimise the number of IO operations involved in adding data to the table. The above SQL buffers 2048 rows between every write to the target database.
If transaction processing is switched on during an import then you should set the following property in the database’s .prp file:
transImports=512
If this is not set then all the rows of data being imported will do so under a single transaction. The idea here is that, for large imports, it will be possible to commit the data after each discrete chunk has been imported. If practical though, the best thing to do during a large import is to set transaction processing off by setting:
transLevel=0
in the database properties file. If the import fails for any reason, then simply drop the table and start again.
Note that you have to restart the database after changing either of the above properties.
Almost all databases can export data in a variety of text formats. Simple text files are often the simplest, or even the sometimes the only way of transferring data between applications. InstantDB can import data from a variety of text file formats, including fixed width fields and comma separated value files (CSV). The easiest to use are CSV files.
CSV files are exactly what they sound like, rows of data, with each data item separated by a comma. The first few lines of a CSV file might look something like the data shown in figure 2.
Figure 2 - A Typical CSV File
To import data from a CSV file, all you have to do is create your target table (if it doesn’t already exist) and issue some SQL like:
IMPORT orders FROM "orders.txt" BUFFER 2048
Just as with the JDBC import example, it’s worth buffering up the data before writing it to the target database (thus the “BUFFER 2048” bit). Likewise, for a large import, you’ll either want to set the transImports or the transLevel properties in order to control the way transactions get delimited.
For many data transfer operations, CSV files are quite sufficient to get the data from the source to the target. However, there are cases where CSV files are insufficient. Maybe CSV data output isn’t an option, or perhaps commas themselves are part of the data to be imported. Whatever the cause, InstantDB can import from a variety of alternative text formats. However, you do have to work just a little bit harder: you need to create a schema file.
Figure
3 - An Example
Schema File – schema.txt
Figure 4 Figure 5 Figure 6 - An Exmaple Schema File
A schema file is just a short text file that
tells InstantDB something about the data being imported and how it will be laid
out in the source file. A typical schema file is shown in Figure 3. This shows
the schemas for two imports of the same data but in different source formats.
The first shows the schema for data to be imported in fixed width format and held in a file called “ProductsFix.txt”. Each column is listed together with its character width in the import file. For string columns the width also determines the width of the column to be created in the target table. There must be one column definition for each column being imported, even if the target table already exists, so that InstantDB can tell where one data item stops and the next one starts in the import file. The setting:
ColNameHeader=False
tells InstantDB that the names of the columns are not present in the import file itself. The first few lines of ProductsFix.txt might look like Figure 4 (the Product name field has been shortened however so that whole rows of data can be shown).
Figure 4 – Some Fixed Width Data
The second schema in the schema file shows the schema for the same set of columns, but this time held in tab delimited form in a file called “Products.txt”. This schema is very similar. However this time, only string columns need to specify a width. If the target table does not exist before the import takes place then InstantDB will create the table with column widths as specified in the schema file. An example of some data that could be imported using this schema is shown in Figure 5 (although some of the final columns have had to be dropped so as to fit the data on the page).
Figure 5 – Some Tab Delimited Data
This time, ColNameHeader is set to “true” in the schema file, so InstantDB will ignore the first line of the file.
Imports using the above schema file might look something like:
IMPORT products1 FROM “ProductsFix.txt” USING “schema.txt”
IMPORT products2 FROM “Products.txt” USING “schema.txt”
If the tables products1 and products2 did not already exist, then InstantDB would create them using the information in the schema file. Otherwise it will simply add the imported rows onto the end of the existing tables.
We’ve seen how to use CSV delimited, tab delimited and fixed width fields. If some other field delimiter is used then this can also be easily accommodated. The line:
Format=delimited (“|”)
would allow InstantDB to import data from a file containing fields separated by the “|” character.
That about wraps it up for importing data into InstantDB. In a future article we’ll look into how to get the data back out.