We hope you find this tutorial helpful. In addition to guides like this one, we provide simple cloud infrastructure for developers. Learn more →

How To Install and Configure Sphinx on Ubuntu 14.04

PostedFebruary 18, 2015 76.8k views Miscellaneous

Introduction

Sphinx is an open source search engine that allows full-text searches. It is best known for performing searches over large data very efficiently. The data to be indexed can generally come from very different sources: SQL databases, plain text files, HTML files, mailboxes, and so on.

Some key features of Sphinx are:

  • High indexing and searching performance
  • Advanced indexing and querying tools
  • Advanced result set post-processing
  • Proven scalability up to billions of documents, terabytes of data, and thousands of queries per second
  • Easy integration with SQL and XML data sources, and SphinxQL, SphinxAPI, or SphinxSE search interfaces
  • Easy scaling with distributed searches

In this tutorial we will set up Sphinx with MySQL server using the example SQL file included in the distribution package. It will give you a basic idea of how to use Sphinx for your project.

Prerequisites

Before you begin this guide, you will need:

  • A Ubuntu 14.04 Droplet.
  • A sudo non-root user, which you can set up by following this tutorial.

Step 1 - Installing MySQL Server

Note: You only need to complete this section if you don't already have MySQL server installed.

In this section, we will install MySQL server. MySQL will let us store a database of sample data that we can use to test Sphinx's search abilities.

Update the package list.

sudo apt-get update

Then install MySQL server.

sudo apt-get install mysql-server

Enter a password for the root user when asked, and enter the same password again to confirm.

Step 2 - Installing Sphinx

In this section, we will install Sphinx.

To install Sphinx, run:

sudo apt-get install sphinxsearch

Now you have successfully installed Sphinx on your server. Before starting the Sphinx daemon, let's configure it.

Step 3 – Creating the Test Database

In this section, we will set up a database using the example SQL file provided with the package.

Let's import an example SQL file into the database.

Log in to the MySQL server shell.

mysql -u root -p

Enter the password for the MyQL root user when asked. Your prompt will change to mysql>.

Create a database named test.

CREATE DATABASE test;

Import the SQL file.

SOURCE /etc/sphinxsearch/example.sql;

Then exit the MySQL shell.

quit

Now you should have the test database filled with data.

Step 4 – Configuring Sphinx

In this section, we will configure the Sphinx configuration file.

Create the sphinx.conf file.

sudo nano /etc/sphinxsearch/sphinx.conf

Sphinx configuration consists of 3 main blocks that are essential to run. They are index, searchd, and source. Each of these blocks is described below, and at the end of this step, the entirety of sphinx.conf is included for you to paste into the file.

The source block contains the type of source, username and password to the MySQL server. The first column of the SQL query should be a unique id. The SQL query will run on every index and dump the data to Sphinx index file. Below are descriptions of each field and the source block itself.

  • sql_host: Hostname for the MySQL host. In our example, this is the localhost. This can be a domain or IP address.
  • sql_user: Username for the MySQL login. In our example, this is root.
  • sql_pass: Password for the MySQL user. In our example, this is the root MySQL user's password
  • sql_db: Name of the database that stores data. In our example, this is test.
  • sql_query: This is the query thats dumps data to index.
source src1
{
  type          = mysql

  sql_host      = localhost
  sql_user      = root
  sql_pass      = password
  sql_db        = test
  sql_port      = 3306 # optional, default is 3306

  sql_query     = \
  SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
  FROM documents

  sql_attr_uint     = group_id
  sql_attr_timestamp    = date_added

  sql_query_info        = SELECT * FROM documents WHERE id=$id
}

The index component contains the source and the path to store the data.

  • source: Name of the source block. In our example, this is src1.
  • path: This path to save the index.
  • charset_type: This is the charset of the index. You also set this to utf-8.
index test1
{
  source            = src1
  path          = /var/lib/sphinxsearch/data/test1
  docinfo           = extern
  charset_type      = sbcs
}

The searchd component contains the port and other variables to run the Sphinx daemon.

  • listen: This is the port which sphinx daemon will run. In our example, this is 9312.
  • query_log: This path to save the query log.
  • pid_file: This is path to PID file of Sphinx daemon.
  • max_matches: Maximum number matches to return per search term.
  • seamless_rotate: Prevents searchd stalls while rotating indexes with huge amounts of data to precache.
  • preopen_indexes: Whether to forcibly preopen all indexes on startup.
  • unlink_old: Whether to unlink old index copies on successful rotation.
searchd
{
  listen            = 9312   # Port to listen on
  log           = /var/log/sphinxsearch/searchd.log
  query_log     = /var/log/sphinxsearch/query.log
  read_timeout      = 5
  max_children      = 30
  pid_file      = /var/run/sphinxsearch/searchd.pid
  max_matches       = 1000
  seamless_rotate       = 1
  preopen_indexes       = 1
  unlink_old        = 1
  binlog_path       = /var/lib/sphinxsearch/data
}

The full configuration to copy and paste is below. The only variable you need to change below is the sql_pass in the source block, which is highlighted.

source src1
{
  type          = mysql

  sql_host      = localhost
  sql_user      = root
  sql_pass      = password # change this to your root users MySQL password
  sql_db        = test
  sql_port      = 3306

  sql_query     = \
  SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
  FROM documents

  sql_attr_uint         = group_id
  sql_attr_timestamp    = date_added

  sql_query_info        = SELECT * FROM documents WHERE id=$id
}
index test1
{
  source            = src1
  path              = /var/lib/sphinxsearch/data/test1
  docinfo           = extern
  charset_type      = sbcs
}
searchd
{
  listen            = 9312
  log               = /var/log/sphinxsearch/searchd.log
  query_log         = /var/log/sphinxsearch/query.log
  read_timeout      = 5
  max_children      = 30
  pid_file          = /var/run/sphinxsearch/searchd.pid
  max_matches       = 1000
  seamless_rotate   = 1
  preopen_indexes   = 1
  unlink_old        = 1
  binlog_path       = /var/lib/sphinxsearch/data
}

Step 5 - Adding Data to the Index

In this section, we'll add data to the Sphinx index.

Add data to index using the config we created earlier.

sudo indexer --all

You should get something that looks like the following.

Sphinx 2.0.4-id64-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'test1'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 25600 kb
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.005 sec, 33788 bytes/sec, 700.28 docs/sec
total 3 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 9 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg

To keep the index up to date, lets create a cronjob.

Open crontab.

crontab -e

You may be asked which text editor you want to use. Choose whichever you prefer; in this tutorial, we've used nano.

Copy and paste the following at the end of the file, then save and close it.

@hourly /usr/bin/indexer --rotate --config /etc/sphinxsearch/sphinx.conf --all

The above cronjob will run on every hour and add new data to the index using the default config file.

Step 6 - Starting Sphinx

In this section, we'll start the Sphinx daemon.

By default, the Sphinx daemon is tuned off. To enable Sphinx, first open /etc/default/sphinxsearch.

sudo nano /etc/default/sphinxsearch

Find the line START=no and set it to yes.

START=yes

Then, save and close the file.

Finally, start the Sphinx daemon.

sudo service sphinxsearch start

In this section, we will test our search. Enter the following command.

search this is my test document number

You should get something that looks like the following.

Sphinx 2.0.4-id64-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
index 'test1': query 'this is my test document number ': returned 2 matches of 2 total in 0.002 sec

displaying matches:
1. document=1, weight=7431, group_id=1, date_added=Tue Dec 16 09:49:04 2014
id=1
group_id=1
group_id2=5
date_added=2014-12-16 09:49:04
title=test one
content=this is my test document number one. also checking search within phrases.
2. document=2, weight=7431, group_id=1, date_added=Tue Dec 16 09:49:04 2014
id=2
group_id=1
group_id2=6
date_added=2014-12-16 09:49:04
title=test two
content=this is my test document number two

words:
1. 'this': 4 documents, 4 hits
2. 'is': 4 documents, 4 hits
3. 'my': 2 documents, 2 hits
4. 'test': 3 documents, 5 hits
5. 'document': 2 documents, 2 hits
6. 'number': 3 documents, 3 hits

Above you can find 2 matches from our test index for our search term. You can also find matches per word.

Conclusion

In this tutorial, we have shown you a very simple search using the command line. You can also find official native SphinxAPI implementations for PHP, Perl, Python, Ruby and Java here.

If you are using PHP, here are some handy examples to test Sphinx.

By using Sphinx, you can easily add a custom search to your site. For more information on using Sphinx, visit the project website.

14 Comments

Creative Commons License