weka.core.converters
Class DatabaseLoader

java.lang.Object
  extended by weka.core.converters.AbstractLoader
      extended by weka.core.converters.DatabaseLoader
All Implemented Interfaces:
java.io.Serializable, weka.core.converters.BatchConverter, weka.core.converters.DatabaseConverter, weka.core.converters.IncrementalConverter, weka.core.converters.Loader, weka.core.OptionHandler

public class DatabaseLoader
extends weka.core.converters.AbstractLoader
implements weka.core.converters.BatchConverter, weka.core.converters.IncrementalConverter, weka.core.converters.DatabaseConverter, weka.core.OptionHandler

Reads from a database. Can read a database in batch or incremental mode. In inremental mode MySQL and HSQLDB are supported. For all other DBMS set a pseudoincremental mode is used: In pseudo incremental mode the instances are read into main memory all at once and then incrementally provided to the user. For incremental loading the rows in the database table have to be ordered uniquely. The reason for this is that every time only a single row is fetched by extending the user" query by a LIMIT clause. If this extension is impossible instances will be loaded pseudoincrementally. To ensure that every row is fetched exaclty once, they have to ordered. Therefore a (primary) key is necessary.This approach is chosen, instead of using JDBC driver facilities, because the latter one differ betweeen different drivers. If you use the DatabaseSaver and save instances by generating automatically a primary key (its name is defined in DtabaseUtils), this primary key will be used for ordering but will not be part of the output. The user defined SQL query to extract the instances should not contain LIMIT and ORDER BY clauses (see -Q option). In addition, for incremental loading, you can define in the DatabaseUtils file how many distinct values a nominal attribute is allowed to have. If this number is exceeded, the column will become a string attribute. In batch mode no string attributes will be created. Available options are: -Q the query to specify which tuples to load
The query must have the form: SELECT *| FROM

[WHERE} (default: SELECT * FROM Results0).

-P comma separted list of columns that are a unqiue key
Only needed for incremental loading, if it cannot be detected automatically

-I
Sets incremental loading

Version:
$Revision: 1.3 $
Author:
Stefan Mutter (mutter@cs.waikato.ac.nz)
See Also:
Loader, Serialized Form

Field Summary
static int BOOL
           
static int BYTE
           
static int DATE
           
static int DOUBLE
           
static int FLOAT
           
static int INTEGER
           
static int LONG
           
private  int m_choice
          Decides which SQL statement to limit the number of rows should be used.
private  int m_counter
          Indicates how many rows has already been loaded incrementally
private  DatabaseConnection m_DataBaseConnection
          The database connection
private  weka.core.Instances m_datasetPseudoInc
          Used in pseudoincremental mode.
private  boolean m_firstTime
          Flag indicating that incremental process wants to read first instance
private  java.lang.String m_idColumn
          Name of the primary key column that will allow unique ordering necessary for incremental loading.
private  boolean m_inc
          Flag indicating that incremental mode is chosen (for command line use only)
private  java.util.Hashtable[] m_nominalIndexes
          Stores the index of a nominal value
private  weka.core.FastVector[] m_nominalStrings
          Stores the nominal value
private  int m_nominalToStringLimit
          Limit when an attribute is treated as string attribute and not as a nominal one because it has to many values.
private  weka.core.Instances m_oldStructure
          Set of instances that equals m_structure except that the auto_generated_id column is not included as an attribute
private  weka.core.FastVector m_orderBy
          Contains the name of the columns that uniquely define a row in the ResultSet.
private  boolean m_pseudoIncremental
          Flag indicating that pseudo incremental mode is used (all instances load at once into main memeory and then incrementally from main memory instead of the database)
private  java.lang.String m_query
          The user defined query to load instances.
private  int m_rowCount
          The number of rows obtained by m_query, eg the size of the ResultSet to load
protected  weka.core.Instances m_structure
          The header information that is retrieved in the beginning of incremental loading
protected static java.util.Properties PROPERTIES
          Properties associated with the database connection
protected static java.lang.String PROPERTY_FILE
          The property file for the database connection
static int SHORT
           
static int STRING
           
 
Fields inherited from class weka.core.converters.AbstractLoader
BATCH, INCREMENTAL, m_retrieval, NONE
 
Constructor Summary
DatabaseLoader()
          Constructor
 
Method Summary
private  boolean checkForKey()
          Checks for a unique key using the JDBC driver's method: getPrimaryKey(), getBestRowIdentifier().
 void connectToDatabase()
          Opens a connection to the database
private  java.lang.String endOfQuery(boolean onlyTableName)
          Returns the table name or all after the FROM clause of the user specified query to retrieve instances.
 weka.core.Instances getDataSet()
          Return the full data set in batch mode (header and all intances at once).
 java.lang.String getKeys()
          Gets the key columns' name
 weka.core.Instance getNextInstance()
          Read the data set incrementally---get the next instance in the data set or returns null if there are no more instances to get.
 java.lang.String[] getOptions()
          Gets the setting
 java.lang.String getQuery()
          Gets the query to execute against the database
private  int getRowCount()
          Counts the number of rows that are loaded from the database
 weka.core.Instances getStructure()
          Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.
 java.lang.String getUrl()
          Gets the URL
 java.lang.String getUser()
          Gets the user name
 java.lang.String globalInfo()
          Returns a string describing this Loader
 java.lang.String keysTipText()
          the tip text for this property
private  java.lang.String limitQuery(java.lang.String query, int offset, int choice)
          Used in incremental loading.
 java.util.Enumeration listOptions()
          Lists the available options
static void main(java.lang.String[] options)
          Main method.
 java.lang.String passwordTipText()
          the tip text for this property
private  void printException(java.lang.Exception ex)
          Prints an exception
 java.lang.String queryTipText()
          the tip text for this property
private  weka.core.Instance readInstance(java.sql.ResultSet rs)
          Reads an instance from a database.
 void reset()
          Resets the Loader ready to read a new data set
 void resetStructure()
          Resets the structure of instances
 void setKeys(java.lang.String keys)
          Sets the key columns of a database table
 void setOptions(java.lang.String[] options)
          Sets the options.
 void setPassword(java.lang.String password)
          Sets user password for the database
 void setQuery(java.lang.String q)
          Sets the query to execute against the database
 void setSource()
          Sets the database url using the DatabaseUtils file
 void setSource(java.sql.Connection connection)
          Sets the database url using the given connection
 void setSource(java.lang.String url)
          Sets the database url
 void setSource(java.lang.String url, java.lang.String userName, java.lang.String password)
          Sets the database url
 void setUrl(java.lang.String url)
          Sets the database URL
 void setUser(java.lang.String user)
          Sets the database user
private  void stringToNominal(java.sql.ResultSet rs, int i)
          Converts string attribute into nominal ones for an instance read during incremental loading
 java.lang.String urlTipText()
          the tip text for this property
 java.lang.String userTipText()
          the tip text for this property
 
Methods inherited from class weka.core.converters.AbstractLoader
getRetrieval, setRetrieval, setSource, setSource
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_structure

protected weka.core.Instances m_structure
The header information that is retrieved in the beginning of incremental loading


m_datasetPseudoInc

private weka.core.Instances m_datasetPseudoInc
Used in pseudoincremental mode. The whole dataset from which instances will be read incrementally.


m_oldStructure

private weka.core.Instances m_oldStructure
Set of instances that equals m_structure except that the auto_generated_id column is not included as an attribute


m_DataBaseConnection

private DatabaseConnection m_DataBaseConnection
The database connection


m_query

private java.lang.String m_query
The user defined query to load instances. (form: SELECT *| FROM [WHERE ])


m_pseudoIncremental

private boolean m_pseudoIncremental
Flag indicating that pseudo incremental mode is used (all instances load at once into main memeory and then incrementally from main memory instead of the database)


m_nominalToStringLimit

private int m_nominalToStringLimit
Limit when an attribute is treated as string attribute and not as a nominal one because it has to many values.


m_rowCount

private int m_rowCount
The number of rows obtained by m_query, eg the size of the ResultSet to load


m_counter

private int m_counter
Indicates how many rows has already been loaded incrementally


m_choice

private int m_choice
Decides which SQL statement to limit the number of rows should be used. DBMS dependent. Algorithm just tries several possibilities.


m_firstTime

private boolean m_firstTime
Flag indicating that incremental process wants to read first instance


m_inc

private boolean m_inc
Flag indicating that incremental mode is chosen (for command line use only)


m_orderBy

private weka.core.FastVector m_orderBy
Contains the name of the columns that uniquely define a row in the ResultSet. Ensures a unique ordering of instances for indremental loading.


m_nominalIndexes

private java.util.Hashtable[] m_nominalIndexes
Stores the index of a nominal value


m_nominalStrings

private weka.core.FastVector[] m_nominalStrings
Stores the nominal value


m_idColumn

private java.lang.String m_idColumn
Name of the primary key column that will allow unique ordering necessary for incremental loading. The name is specified in the DatabaseUtils file.


STRING

public static final int STRING
See Also:
Constant Field Values

BOOL

public static final int BOOL
See Also:
Constant Field Values

DOUBLE

public static final int DOUBLE
See Also:
Constant Field Values

BYTE

public static final int BYTE
See Also:
Constant Field Values

SHORT

public static final int SHORT
See Also:
Constant Field Values

INTEGER

public static final int INTEGER
See Also:
Constant Field Values

LONG

public static final int LONG
See Also:
Constant Field Values

FLOAT

public static final int FLOAT
See Also:
Constant Field Values

DATE

public static final int DATE
See Also:
Constant Field Values

PROPERTY_FILE

protected static java.lang.String PROPERTY_FILE
The property file for the database connection


PROPERTIES

protected static java.util.Properties PROPERTIES
Properties associated with the database connection

Constructor Detail

DatabaseLoader

public DatabaseLoader()
               throws java.lang.Exception
Constructor

Throws:
java.lang.Exception
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this Loader

Returns:
a description of the Loader suitable for displaying in the explorer/experimenter gui

reset

public void reset()
           throws java.lang.Exception
Resets the Loader ready to read a new data set

Specified by:
reset in interface weka.core.converters.Loader
Overrides:
reset in class weka.core.converters.AbstractLoader
Throws:
java.lang.Exception - if an error occurs while disconnecting from the database

resetStructure

public void resetStructure()
Resets the structure of instances


setQuery

public void setQuery(java.lang.String q)
Sets the query to execute against the database

Parameters:
q - the query to execute

getQuery

public java.lang.String getQuery()
Gets the query to execute against the database

Returns:
the query

queryTipText

public java.lang.String queryTipText()
the tip text for this property

Returns:
the tip text

setKeys

public void setKeys(java.lang.String keys)
Sets the key columns of a database table

Parameters:
keys - a String containing the key columns in a comma separated list.

getKeys

public java.lang.String getKeys()
Gets the key columns' name

Returns:
name of the key columns'

keysTipText

public java.lang.String keysTipText()
the tip text for this property

Returns:
the tip text

setUrl

public void setUrl(java.lang.String url)
Sets the database URL

Specified by:
setUrl in interface weka.core.converters.DatabaseConverter
Parameters:
string - with the database URL

getUrl

public java.lang.String getUrl()
Gets the URL

Specified by:
getUrl in interface weka.core.converters.DatabaseConverter
Returns:
the URL

urlTipText

public java.lang.String urlTipText()
the tip text for this property

Returns:
the tip text

setUser

public void setUser(java.lang.String user)
Sets the database user

Specified by:
setUser in interface weka.core.converters.DatabaseConverter
Parameters:
the - database user name

getUser

public java.lang.String getUser()
Gets the user name

Specified by:
getUser in interface weka.core.converters.DatabaseConverter
Returns:
name of database user

userTipText

public java.lang.String userTipText()
the tip text for this property

Returns:
the tip text

setPassword

public void setPassword(java.lang.String password)
Sets user password for the database

Specified by:
setPassword in interface weka.core.converters.DatabaseConverter
Parameters:
the - password

passwordTipText

public java.lang.String passwordTipText()
the tip text for this property

Returns:
the tip text

setSource

public void setSource(java.lang.String url,
                      java.lang.String userName,
                      java.lang.String password)
Sets the database url

Parameters:
url - the database url
userName - the user name
password - the password

setSource

public void setSource(java.sql.Connection connection)
Sets the database url using the given connection


setSource

public void setSource(java.lang.String url)
Sets the database url

Parameters:
url - the database url

setSource

public void setSource()
               throws java.lang.Exception
Sets the database url using the DatabaseUtils file

Throws:
java.lang.Exception

connectToDatabase

public void connectToDatabase()
Opens a connection to the database


endOfQuery

private java.lang.String endOfQuery(boolean onlyTableName)
Returns the table name or all after the FROM clause of the user specified query to retrieve instances.

Parameters:
onlyTableName - true if only the table name should be returned, false otherwise
Returns:
the end of the query

checkForKey

private boolean checkForKey()
                     throws java.lang.Exception
Checks for a unique key using the JDBC driver's method: getPrimaryKey(), getBestRowIdentifier(). Depending on their implementation a key can be detected. The key is needed to order the instances uniquely for an inremental loading. If an existing key cannot be detected, use -P option.

Returns:
true, if a key could have been detected, false otherwise
Throws:
java.lang.Exception - if database error occurs

stringToNominal

private void stringToNominal(java.sql.ResultSet rs,
                             int i)
                      throws java.lang.Exception
Converts string attribute into nominal ones for an instance read during incremental loading

Parameters:
rs - The result set
i - the index of the nominal value
Throws:
java.lang.Exception - exception if it cannot be converted

limitQuery

private java.lang.String limitQuery(java.lang.String query,
                                    int offset,
                                    int choice)
Used in incremental loading. Modifies the SQL statement, so that only one instance per time is tretieved and the instances are ordered uniquely.

Parameters:
query - the query to modify for incremental loading
offset - sets which tuple out of the uniquely ordered ones should be returned
choice - the kind of query that is suitable for the used DBMS
Returns:
the modified query that returns only one result tuple.

getRowCount

private int getRowCount()
                 throws java.lang.Exception
Counts the number of rows that are loaded from the database

Returns:
the entire number of rows
Throws:
java.lang.Exception - if the number of rows cannot be calculated

getStructure

public weka.core.Instances getStructure()
                                 throws java.io.IOException
Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.

Specified by:
getStructure in interface weka.core.converters.Loader
Specified by:
getStructure in class weka.core.converters.AbstractLoader
Returns:
the structure of the data set as an empty set of Instances
Throws:
java.io.IOException - if an error occurs

getDataSet

public weka.core.Instances getDataSet()
                               throws java.io.IOException
Return the full data set in batch mode (header and all intances at once).

Specified by:
getDataSet in interface weka.core.converters.Loader
Specified by:
getDataSet in class weka.core.converters.AbstractLoader
Returns:
the structure of the data set as an empty set of Instances
Throws:
java.io.IOException - if there is no source or parsing fails

readInstance

private weka.core.Instance readInstance(java.sql.ResultSet rs)
                                 throws java.lang.Exception
Reads an instance from a database.

Parameters:
rs - the ReusltSet to load
Returns:
an instance read from the database
Throws:
java.lang.Exception - if instance cannot be read

getNextInstance

public weka.core.Instance getNextInstance()
                                   throws java.io.IOException
Read the data set incrementally---get the next instance in the data set or returns null if there are no more instances to get. If the structure hasn't yet been determined by a call to getStructure then method does so before returning the next instance in the data set.

Specified by:
getNextInstance in interface weka.core.converters.Loader
Specified by:
getNextInstance in class weka.core.converters.AbstractLoader
Returns:
the next instance in the data set as an Instance object or null if there are no more instances to be read
Throws:
java.io.IOException - if there is an error during parsing

getOptions

public java.lang.String[] getOptions()
Gets the setting

Specified by:
getOptions in interface weka.core.OptionHandler
Returns:
the current setting

listOptions

public java.util.Enumeration listOptions()
Lists the available options

Specified by:
listOptions in interface weka.core.OptionHandler
Returns:
an enumeration of the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Sets the options. Available options are: -Q the query to specify which tuples to load
The query must have the form: SELECT *| FROM [WHERE} (default: SELECT * FROM Results0).

-P comma separted list of columns that are a unqiue key
Only needed for incremental loading, if it cannot be detected automatically

-I
Sets incremental loading

Specified by:
setOptions in interface weka.core.OptionHandler
Parameters:
options - the options
Throws:
java.lang.Exception - if options cannot be set

printException

private void printException(java.lang.Exception ex)
Prints an exception

Parameters:
ex - the exception to print

main

public static void main(java.lang.String[] options)
Main method.

Parameters:
options - the options