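Installing Sphinx-for-chinese (Chinese full-text search) on Debian/Linux
Sphinx is a SQL-based full-text search engine, but for Chinese users it has one fatal flaw: it does not support Chinese. I later came across sphinx-for-chinese, a full-text search engine based on Sphinx that supports Chinese word segmentation. After downloading and installing it I found that it works well, so the installation steps are described below.
- Download the required packages: sphinx-for-chinese-0.9.9-r2117.tar.gz and xdict_1.1.tar.gz. Download page: http://code.google.com/p/sphinx-for-chinese/downloads/list
- Install sphinx-for-chinese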
$ tar zxvf sphinx-for-chinese-0.9.9-r2117.tar.gz
$ cd sphinx-for-chinese-0.9.9-r2117
$ ./configure --prefix=/usr/local/sphinx
$ make
$ sudo make install
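After `make install`, it can help to confirm that the tools used in the later steps actually landed under the chosen prefix. The sketch below is one way to do that. `check_sphinx_install` is a hypothetical helper name, and the tool list (indexer, search, mkdict) is taken only from the commands that appear in this walkthrough, not from an authoritative manifest of the package.

```shell
# Hypothetical helper: report which of the tools used in this walkthrough
# are present (and executable) under the given install prefix.
check_sphinx_install() {
    prefix="${1:-/usr/local/sphinx}"
    missing=0
    for tool in indexer search mkdict; do
        if [ -x "$prefix/bin/$tool" ]; then
            echo "ok: $prefix/bin/$tool"
        else
            echo "missing: $prefix/bin/$tool"
            missing=1
        fi
    done
    # Exit status 0 when everything was found, 1 otherwise.
    return "$missing"
}

check_sphinx_install /usr/local/sphinx || echo "some tools are missing; re-check the make install output"
```

If anything is reported missing, the output of `./configure` and `make` is the first place to look.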
- Create the test database and a sphinx user
mysql> create database test;
mysql> create user 'sphinx'@'localhost' identified by 'sphinx';
mysql> grant all privileges on test.* to 'sphinx'@'localhost';
- Create the sphinx configuration file from the shipped template
$ cd /usr/local/sphinx/etc
$ sudo cp sphinx.conf.dist sphinx.conf
- Edit the configuration file
sql_host = localhost
sql_user = sphinx
sql_pass = sphinx
sql_db = test
sql_port = 3306 # optional, default is 3306
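Note: only the values that differ from sphinx.conf.dist need to be changed (presumably sql_user and sql_pass, so that they match the MySQL account created above).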
- Unpack the dictionary file xdict_1.1.tar.gz
$ tar zxvf xdict_1.1.tar.gz
- Generate the dictionary with the mkdict tool installed earlier
$ /usr/local/sphinx/bin/mkdict xdict.txt xdict
- Copy the generated dictionary xdict into the /usr/local/sphinx/etc directory
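The copy step above has no command attached. A minimal sketch follows, assuming `xdict` was generated in the current directory by the previous step. `install_dict` is a hypothetical helper written for this walkthrough, not part of sphinx-for-chinese.

```shell
# install_dict DICT DEST_DIR
# Copy the compiled dictionary into the Sphinx config directory and
# print where it ended up. Both arguments are required.
install_dict() {
    dict="$1"
    dest_dir="$2"
    if [ ! -f "$dict" ]; then
        echo "dictionary not found: $dict" >&2
        return 1
    fi
    cp "$dict" "$dest_dir/" || return 1
    echo "installed: $dest_dir/$(basename "$dict")"
}
```

Because /usr/local/sphinx/etc is normally writable only by root, the plain equivalent for the paths used here would be `sudo cp xdict /usr/local/sphinx/etc/`.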
- Configure Chinese word segmentation
Open sphinx.conf, find the line 'charset_type = sbcs', and replace it with:
charset_type = utf-8
chinese_dictionary = /usr/local/sphinx/etc/xdict
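The same edit can be scripted instead of made by hand. The sketch below assumes the config contains an uncommented `charset_type = sbcs` line (with any mix of spaces or tabs around the `=`), as the step above describes. `set_chinese_charset` is a hypothetical helper name, and a `.bak` copy of the config is kept next to the original.

```shell
# set_chinese_charset CONF DICT
# Replace the "charset_type = sbcs" line in CONF with the two settings
# needed for Chinese segmentation, keeping the original indentation.
set_chinese_charset() {
    conf="$1"
    dict="$2"
    cp "$conf" "$conf.bak" || return 1
    awk -v dict="$dict" '
        # Match an uncommented "charset_type = sbcs" line, tolerating tabs/spaces.
        /^[ \t]*charset_type[ \t]*=[ \t]*sbcs[ \t]*$/ {
            indent = $0
            sub(/charset_type.*/, "", indent)   # keep only the leading whitespace
            print indent "charset_type = utf-8"
            print indent "chinese_dictionary = " dict
            next
        }
        { print }
    ' "$conf.bak" > "$conf"
}
```

For the paths in this walkthrough it would be run as root, for example `set_chinese_charset /usr/local/sphinx/etc/sphinx.conf /usr/local/sphinx/etc/xdict`. Commented-out lines are left untouched.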
- Edit the SQL script shipped with sphinx-for-chinese and add some Chinese data
$ vi /usr/local/sphinx/etc/example.sql
REPLACE INTO test.documents ( id, group_id, group_id2, date_added, title, content ) VALUES
( 1, 1, 5, NOW(), 'test one', 'this is my test document number one. also checking search within phrases.' ),
( 2, 1, 6, NOW(), 'test two', 'this is my test document number two' ),
( 3, 2, 7, NOW(), 'another doc', 'this is another group' ),
( 4, 2, 8, NOW(), 'doc number four', 'this is to test groups' ),
( 5, 2, 8, NOW(), 'doc number five', '一个' ),
( 6, 2, 8, NOW(), 'doc number six', '我' ),
( 7, 2, 8, NOW(), 'doc number seven', '中国人' );
Note: the rows with ids 5 to 7 are the added Chinese test data.
- Import the data
$ mysql -usphinx -psphinx < example.sql
- Build the indexes
$ sudo /usr/local/sphinx/bin/indexer --all
- Search
$ /usr/local/sphinx/bin/search 我是一个中国人
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/usr/local/sphinx/etc/sphinx.conf'...
index 'test1': query '我是一个中国人 ': returned 0 matches of 0 total in 0.000 sec

words:
1. '我': 1 documents, 1 hits
2. '是': 0 documents, 0 hits
3. '一个': 1 documents, 1 hits
4. '中国人': 1 documents, 1 hits

index 'test1stemmed': query '我是一个中国人 ': returned 0 matches of 0 total in 0.000 sec

words:
1. '我': 1 documents, 1 hits
2. '是': 0 documents, 0 hits
3. '一个': 1 documents, 1 hits
4. '中国人': 1 documents, 1 hits
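Both indexes report 0 matches even though three of the four words are found. As far as I can tell, the `search` utility by default requires every query word to occur in the same document, and no test row contains all of 我, 是, 一个 and 中国人. The per-word statistics are what show that segmentation is working. The sketch below pulls out the words that occur in no document, which is a quick way to see why a match-all query comes back empty. `zero_hit_words` is a hypothetical helper, and it assumes the per-word line format shown in the output above.

```shell
# zero_hit_words: read `search` output on stdin and print each query word
# that is reported with "0 documents" (duplicates across indexes removed).
zero_hit_words() {
    sed -n "s/^[[:space:]]*[0-9][0-9]*\. '\(.*\)': 0 documents, .*/\1/p" | sort -u
}
```

Piping the query above through it, as in `/usr/local/sphinx/bin/search 我是一个中国人 | zero_hit_words`, should list 是 for this data set. The stock `search` tool also has an `--any` switch that matches documents containing any of the query words, which would be expected to return rows 5 to 7 here.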