    [solr] adding support for Apache Solr · fc7cc571
    ghaughian authored
    updating readme
    
    updating package info
    
    perfecting logic for http solr clients for all operations
    
    renamed properties, tested cloud mode and cleaned code
    
    removed dependency on dynamic field names, updated readme
    
    now enforcing checkstyle
    
    adding solr artifact
    
    removing test cases relying on external dependencies
    
    removed unused maven dependencies, added batch mode support, all try blocks now catch eplicit exceptions, Query/UpdateResponse status codes are handled more granularly, updated readme, added sample schema.xml file to support default field names in ycsb client, updated all license headers to 2016, using SolrClient object as primary client type regardless if Solr is running in Cloud or Stand-alone mode
    
    cleaned code and config files, now accepting a solr base url property, simplified sample schema.xml file, renamed class to SolrClient, now updating documents atomically, added batch support to delete method
    
    updated new line spacing of pom file comments
    
    removed sample schema file, updated readme with more indepth explanation on running/setting up the solr-binding
    
    removed some code lines no longer in use
    
    renamed zookeeper param name, now throwing caught exceptions where appropriate, debug messages are now being logged on stderr
    
    now returning an appropriate error if we receive an unexpected response from solr server, repeated calls to getResults is no longer
    
    now using singletonMap to store update params in, fixed typo and missing id field in sample config in README

Quick Start

This section describes how to run YCSB on Solr running locally.

1. Set Up YCSB

Clone the YCSB git repository and compile:

git clone https://github.com/brianfrankcooper/YCSB.git
cd YCSB
mvn -pl com.yahoo.ycsb:solr-binding -am clean package

2. Set Up Solr

There must be a running Solr instance with a core/collection pre-defined and configured.

  • See this API reference on how to create a core.
  • See this API reference on how to create a collection in SolrCloud mode.

The conf/schema.xml configuration file in the core/collection you just created must define the field names used during benchmarking. The following sample from a schema config file matches the default field names used by the YCSB client:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="field0" type="text_general" indexed="true" stored="true"/>
<field name="field1" type="text_general" indexed="true" stored="true"/>
<field name="field2" type="text_general" indexed="true" stored="true"/>
<field name="field3" type="text_general" indexed="true" stored="true"/>
<field name="field4" type="text_general" indexed="true" stored="true"/>
<field name="field5" type="text_general" indexed="true" stored="true"/>
<field name="field6" type="text_general" indexed="true" stored="true"/>
<field name="field7" type="text_general" indexed="true" stored="true"/>
<field name="field8" type="text_general" indexed="true" stored="true"/>
<field name="field9" type="text_general" indexed="true" stored="true"/>
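If your workload is configured with a different number of fields, the definitions can be generated rather than typed by hand. The following is a minimal sketch, not part of the binding: the gen_fields function and the FIELD_COUNT variable are hypothetical names, and the field types are copied from the sample above.

```shell
#!/bin/sh
# Hypothetical helper: print <field> definitions for "id" plus
# field0..field(N-1). FIELD_COUNT is an assumed variable name;
# the default of 10 matches the sample schema shown above.
FIELD_COUNT=${FIELD_COUNT:-10}

gen_fields() {
  # The unique key field, exactly as in the sample schema.
  echo '<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>'
  i=0
  while [ "$i" -lt "$FIELD_COUNT" ]; do
    echo "<field name=\"field$i\" type=\"text_general\" indexed=\"true\" stored=\"true\"/>"
    i=$((i + 1))
  done
}

gen_fields
```

The output can be pasted into conf/schema.xml; review it against your Solr version's schema before reloading the core/collection.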

If running in SolrCloud mode, ensure there is an external Zookeeper cluster running.

  • See here for details on how to set up an external Zookeeper cluster.
  • See here for instructions on how to use Zookeeper to manage your core/collection configuration files.

3. Run YCSB

Now you are ready to run! First, load the data:

./bin/ycsb load solr -s -P workloads/workloada -p table=<core/collection name>

Then, run the workload:

./bin/ycsb run solr -s -P workloads/workloada -p table=<core/collection name>

For further configuration see below:

Default Configuration Parameters

The default settings for the Solr node are as follows:

  • solr.cloud

    • A Boolean value indicating whether Solr is running in SolrCloud mode. If so, an external Zookeeper cluster must also be running.
    • Default value is false, so Solr is expected to be running in stand-alone mode.
  • solr.base.url

    • The base URL used to connect to a running Solr instance in stand-alone mode.
    • Default value is http://localhost:8983/solr
  • solr.commit.within.time

    • The max time in ms to wait for a commit when in batch mode, ignored otherwise
    • Default value is 1000ms
  • solr.batch.mode

    • Indicates whether inserts/updates/deletes should be committed in batches (frequency controlled by the solr.commit.within.time parameter) or one document at a time.
    • Default value is false
  • solr.zookeeper.hosts

    • A comma-separated list of host:port pairs of Zookeeper nodes used to manage SolrCloud configurations.
    • Must be passed when in SolrCloud mode.
    • Default value is localhost:2181

Custom Configuration

If you wish to customize the settings for the Solr node, you can create a new properties file that contains your desired settings and pass it in via the -P parameter to the 'bin/ycsb' script. Note that the default properties will be kept unless you explicitly override them.

Assuming that we have a properties file named "myproperties.data" that contains custom Solr node configuration, you can execute the following to pass it into the Solr client:

./bin/ycsb run solr -P workloads/workloada -P myproperties.data -s
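For illustration, a file like the one passed above could be created as follows. This is a sketch under assumptions: the property names come from the list of defaults in this README, while the chosen values (batch mode on, a 2000 ms commit window) are arbitrary examples, not recommendations.

```shell
# Write a hypothetical myproperties.data that overrides two defaults.
# Properties not listed here keep their default values.
cat > myproperties.data <<'EOF'
# Commit in batches instead of one document at a time
solr.batch.mode=true
# Maximum time in ms to wait for a commit in batch mode
solr.commit.within.time=2000
# Stand-alone Solr endpoint (same as the default, shown for completeness)
solr.base.url=http://localhost:8983/solr
EOF
```

Run this from the YCSB directory so that the -P myproperties.data argument resolves to the new file.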

If you wish to use SolrCloud mode, ensure a Solr cluster is running with an external Zookeeper cluster and that an appropriate collection has been created. Make sure to pass the following properties as parameters to the 'bin/ycsb' script.

solr.cloud=true
solr.zookeeper.hosts=<zkHost1>:<zkPort1>,...,<zkHostN>:<zkPortN>
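
These properties can also be given on the command line with -p, the same flag used for table= in the Quick Start. The sketch below only assembles and prints such a command so it can be checked before running; the Zookeeper host names and the collection name are hypothetical placeholders.

```shell
# Hypothetical Zookeeper ensemble and collection name; replace with your own.
ZK_HOSTS="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
COLLECTION="ycsb"

# Assemble the command; each -p flag sets one property.
YCSB_CMD="./bin/ycsb run solr -s -P workloads/workloada -p table=$COLLECTION -p solr.cloud=true -p solr.zookeeper.hosts=$ZK_HOSTS"

# Print the command for review; run it from the YCSB directory once it looks right.
echo "$YCSB_CMD"
```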