20120610

Weka - 連接MS SQL SERVER 2008 R2

說明

Weka是學習Data Mining人們的好伙伴。而我們常常對它有一些特別的要求。

 

例如。

 

能不能從MS SQL SERVER撈資料呢?

 

答案當然是「可以」,只是撈了之後,會不會分析是另外一回事。

 

今天,我就來介紹如何讓Weka利用ODBC從MS SQL SERVER 2008撈資料。

 

 

設定方式

1. 首先請參考這一篇「Java - 安裝Microsoft JDBC Driver for SQL Server」,將JDBC安裝至Windows上。

2. 在Weka的安裝目錄中,找到名為「Runweka.ini」的檔案,並用記事本軟體開啟。

可看到如下內容 (請注意到最後一行,就是我們要修改的部份,也就是「cp」,代表ClassPath)

   1: # Contains the commands for running Weka either with a command prompt
   2:  
   3: # ("cmd_console") or without the command prompt ("cmd_default").
   4:  
   5: # One can also define custom commands, which can be used with the Weka
   6:  
   7: # launcher "RunWeka.class". E.g., to run the launcher with a setup called
   8:  
   9: # "custom1", you only need to specify a key "cmd_custom1" which contains the
  10:  
  11: # command specification.
  12:  
  13: #
  14:  
  15: # Notes: 
  16:  
  17: # - This file is not a DOS ini file, but a Java properties file.
  18:  
  19: # - The settings listed here are key-value pairs, separated by a "=". Every
  20:  
  21: # key can only be listed ONCE.
  22:  
  23: #
  24:  
  25: # Author FracPete (fracpete at waikato dot ac dot nz)
  26:  
  27: # Version $Revision: 1.3 $
  28:  
  29: # setups (prefixed with "cmd_")
  30:  
  31: cmd_default=javaw -Dfile.encoding=#fileEncoding# -Xmx#maxheap# -classpath "#wekajar#;#cp#" #mainclass#
  32:  
  33: cmd_console=cmd.exe /K start cmd.exe /K "java -Dfile.encoding=#fileEncoding# -Xmx#maxheap# -classpath \"#wekajar#;#cp#\" #mainclass#"
  34:  
  35: cmd_explorer=javaw -Dfile.encoding=#fileEncoding# -Xmx#maxheap# -classpath "#wekajar#;#cp#" weka.gui.explorer.Explorer
  36:  
  37: cmd_knowledgeFlow=java -Dfile.encoding=#fileEncoding# -Xmx#maxheap# -classpath "#wekajar#;#cp#" weka.gui.beans.KnowledgeFlow
  38:  
  39: # placeholders ("#bla#" in command gets replaced with content of key "bla")
  40:  
  41: # Note: "#wekajar#" gets replaced by the launcher class, since that jar gets
  42:  
  43: # provided as parameter
  44:  
  45: maxheap=1024m
  46:  
  47: # The MDI GUI
  48:  
  49: #mainclass=weka.gui.Main
  50:  
  51: # The GUIChooser
  52:  
  53: mainclass=weka.gui.GUIChooser
  54:  
  55: # The file encoding; use "utf-8" instead of "Cp1252" to display UTF-8 characters in the 
  56:  
  57: # GUI, e.g., the Explorer
  58:  
  59: fileEncoding=Cp1252
  60:  
  61: # The classpath placeholder. Add any environment variables or jars to it that
  62:  
  63: # you need for your Weka environment.
  64:  
  65: # Example with an enviroment variable (e.g., THIRD_PARTY_LIBS):
  66:  
  67: # cp=%CLASSPATH%;%THIRD_PARTY_LIBS%
  68:  
  69: # Example with an extra jar (located at D:\libraries\libsvm.jar):
  70:  
  71: # cp=%CLASSPATH%;D:\\\\libraries\\\\libsvm.jar
  72:  
  73: # Or in order to avoid quadrupled backslashes, you can also use slashes "/":
  74:  
  75: # cp=%CLASSPATH%;D:/libraries/libsvm.jar
  76:  
  77: cp=%CLASSPATH

 

3. 在最後一行的後方加入如下JDBC的路徑「D:/SW/jdbc/sqljdbc_4.0/cht/sqljdbc4.jar」。

請注意,要記得在加入路徑之前,加入一個「;」,作為分隔之用。

這邊我指定的是sqljdbc4.jar,這是JDBC 4.0版,可支援到SQL SERVER 2012



   1: cp=%CLASSPATH%;D:/SW/jdbc/sqljdbc_4.0/cht/sqljdbc4.jar

 

4. 進入Weka的安裝目錄,將weka.jar (為一ZIP壓縮檔)中的「weka.jar\weka\experiment\DatabaseUtils.props.mssqlserver2005」解壓縮,並放至到Weka的安裝目錄,更名為「DatabaseUtils.props」。

201200610-13-Weka-連接MS SQL SERVER 2008 R2-W

 

5. 更改「DatabsaeUtils.props」中的設定如下:

(其中的「SQLSERVERNAME」為SQL SERVER的實體名稱;「DBNAME」為資料庫名稱;「TEST」為使用者ID;「***」為密碼,請依實際情況輸入)



   1: jdbcDriver= com.microsoft.sqlserver.jdbc.SQLServerDriver 
   2:  
   3: jdbcURL=jdbc:sqlserver://localhost;instanceName=SQLSERVERNAME;databaseName=DBNAME;Persist Security Info=True;User ID=TEST;Password=***

 


7. 另外還要在DatabaseUtils.prop中設定MS SQL Server與Weka間型別的對應,請在DatabaseUtils.prop中找到如下程式區塊



   1: # specific data types 
   2:  
   3: # string, getString() = 0; --> nominal 
   4:  
   5: # boolean, getBoolean() = 1; --> nominal 
   6:  
   7: # double, getDouble() = 2; --> numeric 
   8:  
   9: # byte, getByte() = 3; --> numeric 
  10:  
  11: # short, getByte()= 4; --> numeric 
  12:  
  13: # int, getInteger() = 5; --> numeric 
  14:  
  15: # long, getLong() = 6; --> numeric 
  16:  
  17: # float, getFloat() = 7; --> numeric 
  18:  
  19: # date, getDate() = 8; --> date 
  20:  
  21: # text, getString() = 9; --> string 
  22:  
  23: # time, getTime() = 10; --> date 
  24:  
  25: varchar=0 
  26:  
  27: float=2 
  28:  
  29: tinyint=3 
  30:  
  31: int=5

 


這邊定義了SQL Server的資料欄位型別與Weka的型別對應,不過還是有不足的地方,所以我們繼續加入以下資料



   1: nvarchar=0 
   2:  
   3: bit=1 
   4:  
   5: datetime=8 
   6:  
   7: nchar=0

這樣就可以對應大部份的型別了。

 

8. 啟動Weka,進入Explorer。選擇「Open DB...」

201200610-9-Weka-連接MS SQL SERVER 2008 R2-W

 

9. 連結字串的部份,應該會顯示第5步驟輸入的「jdbc:sqlserver://localhost;instanceName=SQLSERVERNAME;databaseName=DBNAME;Persist Security Info=True;User ID=TEST;Password=***」。

201200610-10-Weka-連接MS SQL SERVER 2008 R2-W

 


10. 再按下「User...」,輸入認證用的帳號及密碼。


201200610-11-Weka-連接MS SQL SERVER 2008 R2-W


 

11. 按下[Connect]之後,就可以在Query中輸入T-SQL來擷取資料了。

 

12. 大功告成。

 

 

參考

Java - 安裝Microsoft JDBC Driver for SQL Server

建立連接 URL

[Wekalist] Sybase: java.sql.SQLException: No suitable driver

Opening Windows Databases in Weka (Outdated)

Windows Databases

weka_experiment_DatabaseUtils.props

沒有留言: