如何在 Ubuntu 16.04 上安装和设置 Sphinx
在本文中,我们将了解如何在 Ubunt 16.04 上安装和设置 Sphinx,Sphinx 是一个开源搜索引擎,允许进行完整的测试搜索,并且最适合非常有效地使用海量数据执行搜索,数据可以来自任何来源(例如 SQL 数据库、纯文本文件等)
狮身人面像的特点
高级索引和查询的好工具。
高搜索性能和索引。
推进后处理的结果。
通过高级搜索轻松扩展。
可以与 SQL 和 XML 源集成。
可以扩展以处理具有数千个查询的海量数据。
先决条件
在开始之前,我们需要一些先决条件。
我们需要一台 Ubuntu 机器,该机器上有一个具有 sudo 权限的非 root 用户。
MySQL安装在机器上。
在机器上安装 Sphinx
我们可以使用 apt-get 使用 Ubuntu 的本机软件包存储库直接安装 Sphinx,以下是安装 Sphinx 的命令。
$ sudo apt-get install sphinxsearch
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
libmysqlclient20 libstemmer0d
The following NEW packages will be installed:
libmysqlclient20 libstemmer0d sphinxsearch
0 upgraded, 3 newly installed, 0 to remove and 92 not upgraded.
Need to get 2,608 kB of archives.
After this operation, 20.5 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://in.archive.ubuntu.com/ubuntuxenial/universe amd64 libstemmer0d amd 64 0+svn585-1 [62.1 kB]
Get:2 http://in.archive.ubuntu.com/ubuntuxenial-updates/main amd64 libmysqlclie nt20 amd64 5.7.15-0ubuntu0.16.04.1 [809 kB]
Get:3 http://in.archive.ubuntu.com/ubuntu xenial/universe amd64 sphinxsearch amd 64 2.2.9-1build1 [1,737 kB]
Fetched 2,608 kB in 2s (986 kB/s)
Selecting previously unselected package libstemmer0d:amd64.
(Reading database ... 117542 files and directories currently installed.)
Preparing to unpack .../libstemmer0d_0+svn585-1_amd64.deb ...
Unpacking libstemmer0d:amd64 (0+svn585-1) ...
Selecting previously unselected package libmysqlclient20:amd64.
Preparing to unpack .../libmysqlclient20_5.7.15-0ubuntu0.16.04.1_amd64.deb ...
Unpacking libmysqlclient20:amd64 (5.7.15-0ubuntu0.16.04.1) ...
Selecting previously unselected package sphinxsearch.
Preparing to unpack .../sphinxsearch_2.2.9-1build1_amd64.deb ...
Unpacking sphinxsearch (2.2.9-1build1) ...
Processing triggers for libc-bin (2.23-0ubuntu3) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for systemd (229-4ubuntu4) ...
Setting up libstemmer0d:amd64 (0+svn585-1) ...
Setting up libmysqlclient20:amd64 (5.7.15-0ubuntu0.16.04.1) ...
Setting up sphinxsearch (2.2.9-1build1) ...
Adding system user `sphinxsearch' (UID 119) ...
Adding new group `sphinxsearch' (GID 125) ...
Adding new user `sphinxsearch' (UID 119) with group `sphinxsearch' ...
Not creating home directory `/var/run/sphinxsearch'.
Processing triggers for libc-bin (2.23-0ubuntu3) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for systemd (229-4ubuntu4) ...
为 Sphinx 创建测试数据库
现在我们必须使用默认包中附带的示例数据创建一个测试数据库,这将允许您在后面的步骤中测试 Sphinx 搜索。
让我们登录 MySQL,我们将在其中创建测试数据库并导入示例数据库。
$ mysql –u root –p
mysql> create database test;
Query OK, 1 row affected (0.01 sec)
mysql> SOURCE /etc/sphinxsearch/example.sql;
Query OK, 0 rows affected, 1 warning (0.01 sec)
Query OK, 0 rows affected (0.03 sec)
Query OK, 4 rows affected (0.01 sec)
Records: 4 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 10 rows affected (0.01 sec)
Records: 10 Duplicates: 0 Warnings: 0
Mysql> quit
配置 Sphinx 进行搜索
在Sphinx中,我们需要编辑和配置3个主要块以适应我们的环境,其中定义了索引、搜索和源等基本功能,这些可以在配置文件sphinx.conf中找到,该文件位于/etc/sphinxsearch/sphinx。 conf.sample 文件,我们需要将现有示例配置文件复制到 /etc/sphinxsearch 文件夹中
$ cp /etc/sphinxsearch/sphinx.conf.sample /etc/sphinxsearch/sphinx.conf
$ sudo vi /etc/sphoxsearch/sphinx.conf
配置文件应如下所示,带有块
sphinx.conf 中的源块
source src1
{
type = mysql
#SQL settings (for ‘mysql’ and ‘pgsql’ types)
sql_host = localhost
sql_user = roo
tsql_pass = ubuntu
sql_db = test
sql_port = 3306 # optional, default is 3306
sql_query = \
SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
FROM documents
sql_attr_uint = group_id
sql_attr_timestamp = date_added
}
sphinx.conf 中的索引块
index test
{
source = src1
path = /var/lib/sphinxsearch/data/test
docinfo = extern
}
Searchd block in sphinx.conf
searchd
{
listen = 9312:sphinx #SphinxAPI port
listen = 9306:mysql41 #SphinxQL port
log = /var/log/sphinxsearch/searchd.log
query_log = /var/log/sphinxsearch/testquery.log
read_timeout = 5
max_children = 30
pid_file = /var/run/sphinxsearch/testsearchd.pid
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
binlog_path = /var/lib/sphinxsearch/datatest
}
有一次,我们编辑了索引 Sphinx 所需的配置。
管理 Sphinx 上的索引
在这里,我们将使用我们在前面步骤中编辑的配置文件进行索引
$ sudo indexer –all
Sphinx 2.2.9-id64-release (rel22-r5006)
Copyright (c) 2001-2015, Andrew Aksyonoff
Copyright (c) 2008-2015, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'test'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.007 sec, 24319 bytes/sec, 504.03 docs/sec
total 4 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
在生产环境中,我们需要使索引保持最新,因此我们将为此创建一个 cronjob –
$ crontab –e
将以下内容添加到文件末尾。
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h dom mon dow command
@hourly /usr/bin/indexer --rotate --config /etc/sphinxsearch/sphinx.conf –all
启动Sphinx服务
由于我们已经使用配置文件配置了索引,所以我们现在需要编辑 Sphinx 配置文件,默认情况下 Sphinx 守护进程未启动,我们需要编辑 /etc/default/sphinxsearch 中的文件
$ vi /etc/default/sphinxsearch
#
# Settings for the sphinxsearch searchd daemon
# Please read /usr/share/doc/sphinxsearch/README.Debian for details.
#
# Should sphinxsearch run automatically on startup? (default: no)
# Before doing this you might want to modify /etc/sphinxsearch/sphinx.conf
# so that it works for you.
START=yes
以下是启动 Sphinx 守护进程的命令
$ sudo systemctl restart sphinxsearch.services
重新启动 sphinxsearch 服务后,我们将使用以下命令检查状态
$ sudo systemctl status sphinxsearch.service
sphinxsearch.service - LSB: Fast standalone full-text SQL search engine
Loaded: loaded (/etc/init.d/sphinxsearch; bad; vendor preset: enabled)
Active: active (exited) since Mon 2016-09-19 13:00:20 IST; 1h 10min ago
Docs: man:systemd-sysv-generator(8)
Tasks: 0 (limit: 512)
Memory: 0B
CPU: 0
Sep 19 13:00:20 ubuntu-16 systemd[1]: Starting LSB: Fast standalone full-text SQL search engine...
Sep 19 13:00:20 ubuntu-16 sphinxsearch[7804]: To enable sphinxsearch, edit /etc/default/sphinxsearch and set START=ye
Sep 19 13:00:20 ubuntu-16 systemd[1]: Started LSB: Fast standalone full-text SQL search engine.
测试 Sphinx 搜索
现在我们将使用 MySQL 接口使用端口 9306 连接到 SphinxQL。
$ mysql -h0 -P9306
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 2.2.9-id64-release (rel22-r5006)
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
在数据库中搜索单词“test”
mysql> SELECT * FROM test WHERE MATCH('test '); SHOW META;
+------+----------+------------+
| id | group_id | date_added |
+------+----------+------------+
| 1 | 1 | 1474272578 |
| 2 | 1 | 1474272578 |
| 4 | 2 | 1474272578 |
+------+----------+------------+
3 rows in set (0.00 sec)
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total | 3 |
| total_found | 3 |
| time | 0.000 |
| keyword[0] | test |
| docs[0] | 3 |
| hits[0] | 5 |
+---------------+-------+
6 rows in set (0.00 sec)
通过使用设置和配置,我们可以将Sphinx配置为一个强大的搜索引擎,它更高效并且可以处理海量数据,sphinx搜索可以处理数十亿个文档,并且可以处理TB级数据,其中每个可以执行数千个搜索查询第二。