說明
Weka是學習Data Mining人們的好伙伴。而我們常常對它有一些特別的要求。
例如。
能不能從MS SQL SERVER撈資料呢?
答案當然是「可以」,只是撈了之後,會不會分析是另外一回事。
今天,我就來介紹如何讓Weka利用ODBC從MS SQL SERVER 2008撈資料。
設定方式
1. 首先請參考這一篇「Java - 安裝Microsoft JDBC Driver for SQL Server」,將JDBC安裝至Windows上。
2. 在Weka的安裝目錄中,找到名為「Runweka.ini」的檔案,並用記事本軟體開啟。
可看到如下內容 (請注意到最後一行,就是我們要修改的部份,也就是「cp」,代表ClassPath)
1: # Contains the commands for running Weka either with a command prompt
2:
3: # ("cmd_console") or without the command prompt ("cmd_default").
4:
5: # One can also define custom commands, which can be used with the Weka
6:
7: # launcher "RunWeka.class". E.g., to run the launcher with a setup called
8:
9: # "custom1", you only need to specify a key "cmd_custom1" which contains the
10:
11: # command specification.
12:
13: #
14:
15: # Notes:
16:
17: # - This file is not a DOS ini file, but a Java properties file.
18:
19: # - The settings listed here are key-value pairs, separated by a "=". Every
20:
21: # key can only be listed ONCE.
22:
23: #
24:
25: # Author FracPete (fracpete at waikato dot ac dot nz)
26:
27: # Version $Revision: 1.3 $
28:
29: # setups (prefixed with "cmd_")
30:
31: cmd_default=javaw -Dfile.encoding=#fileEncoding# -Xmx#maxheap# -classpath "#wekajar#;#cp#" #mainclass#
32:
33: cmd_console=cmd.exe /K start cmd.exe /K "java -Dfile.encoding=#fileEncoding# -Xmx#maxheap# -classpath \"#wekajar#;#cp#\" #mainclass#"
34:
35: cmd_explorer=javaw -Dfile.encoding=#fileEncoding# -Xmx#maxheap# -classpath "#wekajar#;#cp#" weka.gui.explorer.Explorer
36:
37: cmd_knowledgeFlow=java -Dfile.encoding=#fileEncoding# -Xmx#maxheap# -classpath "#wekajar#;#cp#" weka.gui.beans.KnowledgeFlow
38:
39: # placeholders ("#bla#" in command gets replaced with content of key "bla")
40:
41: # Note: "#wekajar#" gets replaced by the launcher class, since that jar gets
42:
43: # provided as parameter
44:
45: maxheap=1024m
46:
47: # The MDI GUI
48:
49: #mainclass=weka.gui.Main
50:
51: # The GUIChooser
52:
53: mainclass=weka.gui.GUIChooser
54:
55: # The file encoding; use "utf-8" instead of "Cp1252" to display UTF-8 characters in the
56:
57: # GUI, e.g., the Explorer
58:
59: fileEncoding=Cp1252
60:
61: # The classpath placeholder. Add any environment variables or jars to it that
62:
63: # you need for your Weka environment.
64:
65: # Example with an enviroment variable (e.g., THIRD_PARTY_LIBS):
66:
67: # cp=%CLASSPATH%;%THIRD_PARTY_LIBS%
68:
69: # Example with an extra jar (located at D:\libraries\libsvm.jar):
70:
71: # cp=%CLASSPATH%;D:\\\\libraries\\\\libsvm.jar
72:
73: # Or in order to avoid quadrupled backslashes, you can also use slashes "/":
74:
75: # cp=%CLASSPATH%;D:/libraries/libsvm.jar
76:
77: cp=%CLASSPATH
3. 在最後一行的後方加入如下JDBC的路徑「D:/SW/jdbc/sqljdbc_4.0/cht/sqljdbc4.jar」。
請注意,要記得在加入路徑之前,加入一個「;」,作為分隔之用。
這邊我指定的是sqljdbc4.jar,這是JDBC 4.0版,可支援到SQL SERVER 2012
1: cp=%CLASSPATH%;D:/SW/jdbc/sqljdbc_4.0/cht/sqljdbc4.jar
4. 進入Weka的安裝目錄,將weka.jar (為一ZIP壓縮檔)中的「weka.jar\weka\experiment\DatabaseUtils.props.mssqlserver2005」解壓縮,並放至到Weka的安裝目錄,更名為「DatabaseUtils.props」。
5. 更改「DatabsaeUtils.props」中的設定如下:
(其中的「SQLSERVERNAME」為SQL SERVER的實體名稱;「DBNAME」為資料庫名稱;「TEST」為使用者ID;「***」為密碼,請依實際情況輸入)
1: jdbcDriver= com.microsoft.sqlserver.jdbc.SQLServerDriver
2:
3: jdbcURL=jdbc:sqlserver://localhost;instanceName=SQLSERVERNAME;databaseName=DBNAME;Persist Security Info=True;User ID=TEST;Password=***
7. 另外還要在DatabaseUtils.prop中設定MS SQL Server與Weka間型別的對應,請在DatabaseUtils.prop中找到如下程式區塊
1: # specific data types
2:
3: # string, getString() = 0; --> nominal
4:
5: # boolean, getBoolean() = 1; --> nominal
6:
7: # double, getDouble() = 2; --> numeric
8:
9: # byte, getByte() = 3; --> numeric
10:
11: # short, getByte()= 4; --> numeric
12:
13: # int, getInteger() = 5; --> numeric
14:
15: # long, getLong() = 6; --> numeric
16:
17: # float, getFloat() = 7; --> numeric
18:
19: # date, getDate() = 8; --> date
20:
21: # text, getString() = 9; --> string
22:
23: # time, getTime() = 10; --> date
24:
25: varchar=0
26:
27: float=2
28:
29: tinyint=3
30:
31: int=5
這邊定義了SQL Server的資料欄位型別與Weka的型別對應,不過還是有不足的地方,所以我們繼續加入以下資料
1: nvarchar=0
2:
3: bit=1
4:
5: datetime=8
6:
7: nchar=0
這樣就可以對應大部份的型別了。
8. 啟動Weka,進入Explorer。選擇「Open DB...」 。
9. 連結字串的部份,應該會顯示第5步驟輸入的「jdbc:sqlserver://localhost;instanceName=SQLSERVERNAME;databaseName=DBNAME;Persist Security Info=True;User ID=TEST;Password=***」。
10. 再按下「User...」,輸入認證用的帳號及密碼。
11. 按下[Connect]之後,就可以在Query中輸入T-SQL來擷取資料了。
12. 大功告成。
參考
Java - 安裝Microsoft JDBC Driver for SQL Server
[Wekalist] Sybase: java.sql.SQLException: No suitable driver
Opening Windows Databases in Weka (Outdated)