VGTech is a blog where the developers and devops of Norway's most visited website share code and tricks of the trade.




Using Elastica to query elasticsearch

PHP

For the last couple of months I have been playing around with elasticsearch, an open source, distributed, RESTful search engine built on top of Apache Lucene. To interact with elasticsearch from PHP I have been using a client called Elastica. This was all fun and games until I needed to run actual queries, which is what our users will be doing most of the time.

Elastica’s documentation does not (yet) say anything about how to search using the client, so I had to dig through the code to find some solutions. Several methods exist, and I’ll present some of them here using different types of queries.

Note: The remainder of this post requires some understanding of how elasticsearch works. I will post more on this subject on this blog in the coming weeks.

Imagine indexing this very blog in elasticsearch, using the following mapping (with cURL):

curl -XPOST 'http://localhost:9200/blog' -d '{
  "mappings": {
    "posts": {
      "properties": {
        "id": {"type": "integer"},
        "title": {
          "type": "multi_field",
          "fields": {
            "title": {"type": "string"},
            "na": {"type": "string", "index": "not_analyzed"}
          }
        },
        "content": {"type": "string"},
        "published": {"type": "date", "format": "YYYY-MM-dd HH:mm:ss"},
        "user": {
          "type": "multi_field",
          "fields": {
            "user": {"type": "string"},
            "na": {"type": "string", "index": "not_analyzed"}
          }
        },
        "categories": {
          "type": "multi_field",
          "fields": {
            "categories": {"type": "string", "index_name": "category"},
            "na": {"type": "string", "index": "not_analyzed"}
          }
        },
        "tags": {
          "type": "multi_field",
          "fields": {
            "tags": {"type": "string", "index_name": "tag"},
            "na": {"type": "string", "index": "not_analyzed"}
          }
        }
      }
    }
  }
}'
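
To make the mapping a bit more concrete, here is a sketch of what a single document of the posts type could look like when built as a PHP array. Only the field names and the date format come from the mapping; the values (user name, content, tags) are made up for illustration. The mapping's "YYYY-MM-dd HH:mm:ss" format lines up with PHP's "Y-m-d H:i:s".

```php
<?php
// Hypothetical blog post: the field names follow the mapping above,
// the values are invented for illustration.
$published = new DateTime('2012-06-01 12:00:00');

$post = array(
    'id'         => 1,
    'title'      => 'Using Elastica to query elasticsearch',
    'content'    => 'Example content about searching with PHP.',
    // Must match the "published" date format declared in the mapping
    'published'  => $published->format('Y-m-d H:i:s'),
    'user'       => 'christer',
    'categories' => array('PHP'),
    'tags'       => array('elastica', 'elasticsearch'),
);

// JSON body that could be sent to elasticsearch when indexing the document
$json = json_encode($post);
echo $json, PHP_EOL;
```

Multi-valued fields such as categories and tags are plain JSON arrays, so no extra mapping is needed for them.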

Simple URI-based queries

Let’s start with a really simple query using cURL:

curl 'http://localhost:9200/blog/posts/_search?q=phpunit'

Using Elastica to perform the same query is pretty straightforward:

<?php
// Create the search object and inject the client
$search = new Elastica_Search(new Elastica_Client());

// Configure and execute the search
$resultSet = $search->addIndex('blog')
                    ->addType('posts')
                    ->search('phpunit');

// Loop through the results
foreach ($resultSet as $hit) {
    // ...
}
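
For comparison, the URI used in the cURL example can also be put together in plain PHP. This is only a sketch: buildSearchUri is a hypothetical helper, not part of Elastica, and the host and port are simply the ones from the cURL command. The point is that http_build_query takes care of URL-encoding search terms that contain spaces or special characters.

```php
<?php
// Hypothetical helper (not part of Elastica): builds the URI for a simple q= search
function buildSearchUri($index, $type, $terms, $host = 'http://localhost:9200')
{
    return sprintf(
        '%s/%s/%s/_search?%s',
        $host,
        rawurlencode($index),
        rawurlencode($type),
        http_build_query(array('q' => $terms))
    );
}

// Single term, same URI as in the cURL example
echo buildSearchUri('blog', 'posts', 'phpunit'), PHP_EOL;

// Several terms: the space is encoded as "+"
echo buildSearchUri('blog', 'posts', 'zend framework'), PHP_EOL;
```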

Complex queries using filters, facets, sorting and more

More complex queries in elasticsearch can be accomplished by including the query in the request body. A filtered query using facets, sorting and from/size (offset/limit) using cURL can look like this:

curl -XPOST 'http://localhost:9200/blog/posts/_search' -d '{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query":"php zend framework",
          "default_operator": "OR",
          "fields": ["title", "content"]
        }
      },
      "filter": {
        "range": {
          "published": {
            "from": "2012-01-01 00:00:00",
            "to": "2013-01-01 00:00:00"
          }
        }
      }
    }
  },
  "facets": {
    "categories": {
      "terms": {
        "field": "categories.na"
      }
    },
    "months": {
      "date_histogram": {
        "field": "published",
        "interval": "month"
      }
    }
  },
  "sort":{
    "published": {
      "order": "desc"
    },
    "title.na": "asc"
  },
  "from": "0",
  "size": "25"
}'
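
When running the cURL request above, the facet counts come back alongside the hits. The snippet below is a sketch of how they could be picked out of the decoded JSON in plain PHP. The response is a hand-written, heavily trimmed sample in the shape I would expect for terms and date histogram facets; the counts and timestamps are invented, so treat the structure as an assumption and check it against your own responses.

```php
<?php
// Hand-written sample response, trimmed to the parts used below.
// The counts and timestamps are invented for illustration.
$json = '{
  "hits": {"total": 3, "hits": []},
  "facets": {
    "categories": {
      "_type": "terms",
      "terms": [
        {"term": "PHP", "count": 2},
        {"term": "Javascript", "count": 1}
      ]
    },
    "months": {
      "_type": "date_histogram",
      "entries": [
        {"time": 1325376000000, "count": 1},
        {"time": 1328054400000, "count": 2}
      ]
    }
  }
}';

$response = json_decode($json, true);

// Terms facet: collect term => count
$categories = array();
foreach ($response['facets']['categories']['terms'] as $entry) {
    $categories[$entry['term']] = $entry['count'];
}

// Date histogram facet: "time" is in milliseconds since the epoch (UTC)
$months = array();
foreach ($response['facets']['months']['entries'] as $entry) {
    $months[gmdate('Y-m', (int) ($entry['time'] / 1000))] = $entry['count'];
}

print_r($categories);
print_r($months);
```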

Queries like the one above can be created in Elastica in many different ways. We can use the query builder and simply pass in the same query as used with cURL:

<?php
$query = new Elastica_Query_Builder('{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query":"php zend framework",
          "default_operator": "OR",
          "fields": ["title", "content"]
        }
      },
      "filter": {
        "range": {
          "published": {
            "from": "2012-01-01 00:00:00",
            "to": "2013-01-01 00:00:00"
          }
        }
      }
    }
  },
  "facets": {
    "categories": {
      "terms": {
        "field": "categories.na"
      }
    },
    "months": {
      "date_histogram": {
        "field": "published",
        "interval": "month"
      }
    }
  },
  "sort":{
    "published": {
      "order": "desc"
    },
    "title.na": "asc"
  },
  "from": "0",
  "size": "25"
}');

// Create a raw query since the query above can't be passed directly to the search method used below
$query = new Elastica_Query($query->toArray());

// Create the search object and inject the client
$search = new Elastica_Search(new Elastica_Client());

// Configure and execute the search
$resultSet = $search->addIndex('blog')
                    ->addType('posts')
                    ->search($query);

// Loop through the results
foreach ($resultSet as $hit) {
    // ...
}

Alternatively, we can build the query using the fluent API of Elastica’s query builder (indented for readability):

<?php
$query = new Elastica_Query_Builder();
$query
  ->query()
    ->filteredQuery()
      ->query()
        ->queryString()
          ->field('query', 'php zend framework')
          ->defaultOperator('OR')
          ->fields(array('title', 'content'))
        ->queryStringClose()
      ->queryClose()
      ->filter()
        ->range()
          ->fieldOpen('published')
            ->field('from', '2012-01-01 00:00:00')
            ->field('to', '2013-01-01 00:00:00')
          ->fieldClose()
        ->rangeClose()
      ->filterClose()
    ->filteredQueryClose()
  ->queryClose()

  ->facets()
    ->fieldOpen('categories')
      ->fieldOpen('terms')
        ->field('field', 'categories.na')
      ->fieldClose()
    ->fieldClose()

    ->fieldOpen('months')
      ->fieldOpen('date_histogram')
        ->field('field', 'published')
        ->field('interval', 'month')
      ->fieldClose()
    ->fieldClose()
  ->facetsClose()

  ->sort()
    ->fieldOpen('published')
      ->field('order', 'desc')
    ->fieldClose()
    ->field('title.na', 'asc')
  ->sortClose()

  ->from(0)
  ->size(25);

// Create a raw query since the query above can't be passed directly to the search method used below
$query = new Elastica_Query($query->toArray());

// Create the search object and inject the client
$search = new Elastica_Search(new Elastica_Client());

// Configure and execute the search
$resultSet = $search->addIndex('blog')
                    ->addType('posts')
                    ->search($query);

// Loop through the results
foreach ($resultSet as $hit) {
    // ...
}

Finally, we can build our query using a set of objects:

<?php
// Query string
$queryString = new Elastica_Query_QueryString('php zend framework');
$queryString->setDefaultOperator('OR')
            ->setFields(array('title', 'content'));

// Filtered query using the query string and a filter
$filteredQuery = new Elastica_Query_Filtered(
    $queryString,
    new Elastica_Filter_Range('published', array(
        'from' => '2012-01-01 00:00:00',
        'to' => '2013-01-01 00:00:00',
    ))
);

// Facets
$categoryFacet = new Elastica_Facet_Terms('categories');
$categoryFacet->setField('categories.na');

$monthsFacet = new Elastica_Facet_DateHistogram('months');
$monthsFacet->setField('published')
            ->setInterval('month');

// Create the main query object
$query = new Elastica_Query($filteredQuery);
$query->setFacets(array($categoryFacet, $monthsFacet))
      ->setSort(array(
          'published' => array('order' => 'desc'),
          'title.na' => 'asc'
      ))
      ->setFrom(0)
      ->setLimit(25);

// Create the search object and inject the client
$search = new Elastica_Search(new Elastica_Client());

// Configure and execute the search
$resultSet = $search->addIndex('blog')
                    ->addType('posts')
                    ->search($query);

// Loop through the results
foreach ($resultSet as $hit) {
    // ...
}
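
Since the query builder examples hand the result of toArray() to Elastica_Query, a plain PHP array should work as well. Below is a minimal sketch of that idea. buildPostQuery is a hypothetical helper of my own, and it also shows how a 1-based page number can be translated into from/size. Only the array construction runs standalone; the Elastica calls are left as comments because they need the library and a running elasticsearch node.

```php
<?php
// Hypothetical helper: builds the request body used in this post as a plain array.
// $page is 1-based and is translated into elasticsearch's from/size (offset/limit).
function buildPostQuery($terms, $page = 1, $perPage = 25)
{
    $page = max(1, (int) $page);

    return array(
        'query' => array(
            'filtered' => array(
                'query' => array(
                    'query_string' => array(
                        'query'            => $terms,
                        'default_operator' => 'OR',
                        'fields'           => array('title', 'content'),
                    ),
                ),
                'filter' => array(
                    'range' => array(
                        'published' => array(
                            'from' => '2012-01-01 00:00:00',
                            'to'   => '2013-01-01 00:00:00',
                        ),
                    ),
                ),
            ),
        ),
        'facets' => array(
            'categories' => array('terms' => array('field' => 'categories.na')),
            'months'     => array('date_histogram' => array(
                'field'    => 'published',
                'interval' => 'month',
            )),
        ),
        'sort' => array(
            'published' => array('order' => 'desc'),
            'title.na'  => 'asc',
        ),
        'from' => ($page - 1) * $perPage,
        'size' => $perPage,
    );
}

// Second page of results, 25 per page
$body = buildPostQuery('php zend framework', 2);
echo json_encode($body), PHP_EOL;

// With Elastica loaded and $search set up as in the examples above,
// the array could be wrapped the same way as the toArray() results:
// $query     = new Elastica_Query($body);
// $resultSet = $search->addIndex('blog')->addType('posts')->search($query);
```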

There are probably more ways to do queries with Elastica as well, and hopefully Elastica’s docs will improve in the future. Feel free to comment if you know about other ways to do the same queries used in this post, or if you need help running other types of queries. Happy searching!

Senior developer at VG. Coder of code, drinker/brewer of beer and listener of metal/punk/hc. @cogocogo | @BeerNorway | www.beernorway.com


12 comments



  • teddy

    Hey Christer,

    Great post, very helpful. I have a question about Elastica.

    I'm trying to integrate it with CI but the autoload function is not working. Were you able to integrate it with a PHP framework so it autoloads the classes?

    Regards and thanks for the post!


    • Christer Edvartsen

      Hi there,

      I have not had any autoloading issues with Elastica. If you install it via composer the autoloader generated for you should take care of it. I have also used a simple implementation of a PSR-0 autoloader like this one: https://gist.github.com/3049155

      If you go with the solution from the gist just remember to update your include_path to include the path to Elastica's lib directory.

  • teddy

    Thanks for the help Christer, will try that.

    Another question: how can I do this with the query builder or the classes?

    "terms":{"country_id": [177,80]}

    The field method of the query builder only accepts strings and encloses the values in double quotes.

    thanks


  • teddy

    I read that the pull was approved, thanks Christer!!


  • Rob Masters

    Thanks for the great post, it's been very useful.

    I wondered if you'd be able to advise on how to do a similar query string search, which boosts fields such as the title (boost: 5) and content (boost: 2) AND also applies a boost based on how recent the published date is (boost: ???). E.g. posts should receive a boost of 5 if published within the previous month, 4 if over a month ago etc., and posts older than 5 months receive no additional boosting. I'm able to boost on fields such as title and content easily enough, but wouldn't have a clue how to go about boosting by date as well.

    I don't think simply sorting by published date is the answer as very relevant results should always be returned first. Any ideas?

    Thanks!


  • Rob Masters

    Ignore my last question, I've found a solution that works. In case anyone is interested:

    http://stackoverflow.com/questions/12091365/is-it-possible-to-boost-newest-items-using-elasticsearch-foqelasticabundle/12299806#12299806


  • tobias

    great post!

    short question: how about stuff like this?

    {
    "type" : "jdbc",
    "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://'$MYSQL_SERVER':3306/smartfeed",
    "user" : "'$MYSQL_SERVER_USER'",
    "password" : "'$MYSQL_SERVER_PASS'",
    "sql" : "SELECT * FROM t_data WHERE t_data_t_ad_id IN ('$1')",
    "acksql" : "UPDATE t_imported SET t_status = \"IMPORTED\" WHERE t_ad_id = '$1'",
    "strategy" : "oneshot"
    },
    "index" : {
    "index" : "t_data",
    "type" : "t_data_product",
    "bulk_size" : 10000,
    "bulk_threshold" : 50
    }
    }'


    • Christer Edvartsen

      Not sure I understand what you're asking about... Care to elaborate?

  • Tobi

    Thank you, that helped me a lot!
    Prost from another German beer brewing and hc loving coder :)


  • Tim Jason

    Very insightful and well detailed tutorial. I was looking for just this.

