Quick Start
This section describes how to run YCSB on Solr running locally.
1. Set Up YCSB
Clone the YCSB git repository and compile:
git clone https://github.com/brianfrankcooper/YCSB.git
cd YCSB
mvn -pl com.yahoo.ycsb:solr-binding -am clean package
2. Set Up Solr
There must be a running Solr instance with a core/collection pre-defined and configured.
- See this API reference on how to create a core.
- See this API reference on how to create a collection in SolrCloud mode.
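As a rough sketch of what those two API calls look like, the snippet below builds the request URLs for Solr's CoreAdmin and Collections APIs and prints them instead of sending them. The base URL, the name ycsb, and the shard and replica counts are placeholder assumptions, and the helper function names are made up for this example. A core created through CoreAdmin also needs its configuration directory in place beforehand.

```shell
#!/bin/sh
# Assumed values for illustration; adjust to your installation.
SOLR_BASE_URL="http://localhost:8983/solr"
NAME="ycsb"

# Stand-alone mode: CoreAdmin API request that creates a core.
core_create_url() {
  printf '%s/admin/cores?action=CREATE&name=%s&instanceDir=%s\n' \
    "$SOLR_BASE_URL" "$1" "$1"
}

# SolrCloud mode: Collections API request that creates a collection
# with one shard and one replica (assumed sizes).
collection_create_url() {
  printf '%s/admin/collections?action=CREATE&name=%s&numShards=1&replicationFactor=1\n' \
    "$SOLR_BASE_URL" "$1"
}

# Print both URLs so they can be reviewed before use.
core_create_url "$NAME"
collection_create_url "$NAME"
```

Either URL can then be handed to an HTTP client, for example curl "$(core_create_url "$NAME")".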
The conf/schema.xml configuration file in the core/collection just created must define the field names expected during benchmarking.
The sample below, taken from a schema config file, matches the default field names used by the YCSB client:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="field0" type="text_general" indexed="true" stored="true"/>
<field name="field1" type="text_general" indexed="true" stored="true"/>
<field name="field2" type="text_general" indexed="true" stored="true"/>
<field name="field3" type="text_general" indexed="true" stored="true"/>
<field name="field4" type="text_general" indexed="true" stored="true"/>
<field name="field5" type="text_general" indexed="true" stored="true"/>
<field name="field6" type="text_general" indexed="true" stored="true"/>
<field name="field7" type="text_general" indexed="true" stored="true"/>
<field name="field8" type="text_general" indexed="true" stored="true"/>
<field name="field9" type="text_general" indexed="true" stored="true"/>
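If a workload raises YCSB's fieldcount property above its default of 10, the extra entries can be generated rather than typed by hand. The following is a minimal sketch, assuming the fields keep the default field<N> naming and the text_general type used in the sample; gen_fields is a hypothetical helper, not part of YCSB or Solr.

```shell
#!/bin/sh
# Print one <field> entry for each of field0 .. field(count-1).
# The first argument is the number of fields (defaults to 10).
gen_fields() {
  count="${1:-10}"
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '<field name="field%d" type="text_general" indexed="true" stored="true"/>\n' "$i"
    i=$((i + 1))
  done
}

# Example: entries for a workload run with fieldcount=12.
gen_fields 12
```

The output can be pasted into conf/schema.xml next to the id field definition.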
If running in SolrCloud mode, ensure that an external Zookeeper cluster is running.
- See here for details on how to set up an external Zookeeper cluster.
- See here for instructions on how to use Zookeeper to manage your core/collection configuration files.
3. Run YCSB
Now you are ready to run! First, load the data:
./bin/ycsb load solr -s -P workloads/workloada -p table=<core/collection name>
Then, run the workload:
./bin/ycsb run solr -s -P workloads/workloada -p table=<core/collection name>
For further configuration see below:
Default Configuration Parameters
The default settings for the Solr node that is created are as follows:
- solr.cloud
  - A Boolean value indicating whether Solr is running in SolrCloud mode. If so, an external Zookeeper cluster must also be running.
  - Default value is false, so Solr is expected to be running in stand-alone mode.
- solr.base.url
  - The base URL used to interface with a running Solr instance in stand-alone mode.
  - Default value is http://localhost:8983/solr
- solr.commit.within.time
  - The maximum time in ms to wait for a commit when in batch mode; ignored otherwise.
  - Default value is 1000ms
- solr.batch.mode
  - Indicates whether inserts/updates/deletes should be committed in batches (frequency controlled by the solr.commit.within.time parameter) or one document at a time.
  - Default value is false
- solr.zookeeper.hosts
  - A list of comma-separated host:port pairs of Zookeeper nodes used to manage SolrCloud configurations.
  - Must be passed when in SolrCloud mode.
  - Default value is localhost:2181
Custom Configuration
If you wish to customize the settings used to create the Solr node, you can create a new property file that contains your desired Solr node settings and pass it in via the -P parameter to the 'bin/ycsb' script. Note that the default properties will be kept if you don't explicitly overwrite them.
Assuming that we have a properties file named "myproperties.data" that contains custom Solr node configuration, you can execute the following to pass it into the Solr client:
./bin/ycsb run solr -P workloads/workloada -P myproperties.data -s
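For illustration, such a properties file could be written as shown below. This is only a sketch: the chosen values are arbitrary examples, and it assumes that solr.commit.within.time is given as a plain number of milliseconds. Any property left out keeps its default.

```shell
#!/bin/sh
# Write a custom properties file for the Solr binding.
# Values are examples only; omitted properties keep their defaults.
cat > myproperties.data <<'EOF'
# Stand-alone Solr instance to benchmark
solr.base.url=http://localhost:8983/solr
# Commit in batches rather than one document at a time
solr.batch.mode=true
# Assumed format: plain milliseconds, used only in batch mode
solr.commit.within.time=500
EOF
```

The resulting file is the one passed with -P myproperties.data in the command above.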
If you wish to use SolrCloud mode, ensure that a Solr cluster is running with an external Zookeeper cluster and that an appropriate collection has been created. Make sure to pass the following properties as parameters to the 'bin/ycsb' script.
solr.cloud=true
solr.zookeeper.hosts=<zkHost1>:<zkPort1>,...,<zkHostN>:<zkPortN>
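One possible way to assemble these parameters on the command line is sketched below. The Zookeeper host names and the collection name ycsb are hypothetical, and join_zk_hosts is a helper made up for this example. The script prints the resulting command rather than running it, so it can be checked first.

```shell
#!/bin/sh
# Join host:port arguments with commas, the format that
# solr.zookeeper.hosts expects.
join_zk_hosts() {
  ( IFS=,; printf '%s\n' "$*" )
}

# Hypothetical Zookeeper nodes and collection name.
ZK_HOSTS="$(join_zk_hosts zk1:2181 zk2:2181 zk3:2181)"
COLLECTION="ycsb"

# Print the command instead of executing it.
printf './bin/ycsb run solr -s -P workloads/workloada -p table=%s -p solr.cloud=true -p solr.zookeeper.hosts=%s\n' \
  "$COLLECTION" "$ZK_HOSTS"
```

Passing the two properties with -p keeps them visible in the command itself; placing them in a properties file passed with -P works equally well.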