# Elasticsearch

The Elasticsearch plugin provides Rosetta with functionality relating to the Elasticsearch search engine.

## Dependency Information

    <dependency>
      <groupId>com.k-int.rosetta</groupId>
      <artifactId>rosetta-elasticsearch</artifactId>
      <version>3.0.0</version>
    </dependency>

## Providers

### elasticsearch

- Entity Kind: Provider
- Type: elasticsearch

The elasticsearch Provider allows Rosetta to communicate with Elasticsearch clusters. It interprets the object in the `request` field of a `DataServiceRequest` as an Elasticsearch search request and executes it against the `_search` endpoint of the configured index, or of all indices if no index is specified.

The response is processed such that:

- The `_source` field of each hit is registered as a provider result.
- The `hits.total.value` field is used as the total value within the provider statistics.
- Aggregations are processed into a custom provider field, `aggregations`, which is a list of Rosetta Aggregation objects.

Currently, only terms, date range and filters aggregations are supported.
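
The response handling described above can be sketched in Python. This is an illustrative sketch only, not the plugin's implementation: the function names and the shape of the returned dictionary (`results`, `statistics`, `fields`, and the `name`/`entries` layout of each aggregation) are assumptions made for the example, since the structure of a Rosetta Aggregation object is not described here.

```python
from typing import Any


def to_aggregations(raw: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten Elasticsearch bucket aggregations into a simple list.

    The {"name", "entries"} layout is hypothetical, not the Rosetta model.
    """
    aggregations = []
    for name, body in raw.items():
        buckets = body.get("buckets", [])
        if isinstance(buckets, dict):
            # Named filters aggregations return buckets keyed by filter name.
            entries = [{"value": key, "count": bucket["doc_count"]}
                       for key, bucket in buckets.items()]
        else:
            # terms and date_range aggregations return a list of buckets.
            entries = [{"value": bucket.get("key"), "count": bucket["doc_count"]}
                       for bucket in buckets]
        aggregations.append({"name": name, "entries": entries})
    return aggregations


def process_search_response(response: dict[str, Any],
                            aggregations_key: str = "aggregations") -> dict[str, Any]:
    """Reduce a raw search response to results, a total and aggregations."""
    hits = response.get("hits", {})
    # The _source of each hit becomes one provider result.
    results = [hit["_source"] for hit in hits.get("hits", []) if "_source" in hit]
    # hits.total.value supplies the total in the provider statistics.
    total = hits.get("total", {}).get("value", 0)
    return {
        "results": results,
        "statistics": {"total": total},
        "fields": {aggregations_key: to_aggregations(response.get("aggregations", {}))},
    }
```

Passing a different `aggregations_key` mirrors the provider property of the same name, which controls the metadata key under which the aggregations are exposed.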

#### Properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| host | String | localhost | The host name of the Elasticsearch cluster. |
| port | Integer | 9200 | The port of the Elasticsearch cluster. |
| protocol | String | http | The protocol/scheme of the Elasticsearch cluster (usually 'http' or 'https'). |
| path_prefix | String | null | The path prefix of the Elasticsearch cluster (usually only required if it is behind a reverse proxy). For example, if the full URI for the search endpoint of an index named 'my-index' is 'https://my-server/es/my-index/_search', then the path_prefix would be '/es/'. |
| index | String | null | The index against which the search request will be performed. If not specified, all indices will be searched. |
| skip_ping | Boolean | false | If true, the ping normally used to check the availability of this provider is skipped. This may be necessary if the Elasticsearch cluster is behind a proxy/gateway with strict path limitations. |
| username | String | null | The username to use in Elasticsearch requests. |
| password | String | null | The password to use in Elasticsearch requests. |
| client_timeout | Integer | 3000 | The client timeout in milliseconds to use in Elasticsearch requests. |
| aggregations_key | String | aggregations | The custom provider metadata key to which any aggregations in the response will be bound. |
| date_range_format | String | `{from} - {to}` | The format used for the `value` part of a date range aggregation entry, where the placeholders `{from}` and `{to}` are resolved to the lower and upper date range boundaries of a specific bucket, respectively. |
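
As a rough illustration of how the connection properties combine into the search endpoint, the following Python sketch assembles the URL from `protocol`, `host`, `port`, `path_prefix` and `index`. The function name `build_search_url` is hypothetical, and the sketch always includes the port in the URL, which the plugin may or may not do.

```python
from typing import Optional


def build_search_url(protocol: str = "http",
                     host: str = "localhost",
                     port: int = 9200,
                     path_prefix: Optional[str] = None,
                     index: Optional[str] = None) -> str:
    """Build the _search endpoint URL from the provider properties."""
    # Strip surrounding slashes so '/es/' and 'es' behave the same way.
    prefix = (path_prefix or "").strip("/")
    # Omitting the index targets the cluster-wide _search endpoint,
    # which searches all indices.
    segments = [segment for segment in (prefix, index, "_search") if segment]
    return f"{protocol}://{host}:{port}/" + "/".join(segments)
```

With the defaults this yields the cluster-wide endpoint, while a `path_prefix` of `/es/` and an `index` of `my-index` place both segments ahead of `_search`.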

#### Example

    name: my-provider
    type: elasticsearch
    properties:
      protocol: http
      host: localhost
      port: 9200
      path_prefix: /es/base/path/
      index: my-index
      skip_ping: true
      username: rosetta
      password: password
      client_timeout: 60000
      aggregations_key: aggs
      date_range_format: '{from} to {to}'

## Glyphs

### elastic-generic-search

- Entity Kind: Glyph
- Type: elastic-generic-search

The elastic-generic-search Glyph transforms a GenericSearchRequest into an Elasticsearch search request.

The top-level query generated by this Glyph is a boolean Elasticsearch query. Each GenericSearchRequest parameter is converted as follows:

- The `from` and `size` parameters are interpreted literally.
- Each value in `queries` is used to generate one query for each of the fields configured in the `search_config.search_fields` property.
  - A `query_string` query is generated if query string syntax elements (e.g. "OR" or "AND") are detected in the value; otherwise a `match` query is generated.
  - All queries generated from the `queries` list are combined in the `should` clause of a boolean query, which is in turn combined with the queries resulting from other parameters in the `must` clause of the top-level boolean query.
- Each value in `ids` is used to generate one query for each of the fields configured in the `search_config.id_fields` property.
  - A `match` query is generated in each case.
  - All queries generated from the `ids` list are combined in the `should` clause of a boolean query, which is in turn combined with the queries resulting from other parameters in the `must` clause of the top-level boolean query.
- Each entry in the `filters` map is treated as a separate filter. Its key is interpreted as either a field alias or a literal Elasticsearch field (see the Properties table) and its value is a list of values against which that field should match. The filters are combined in the `filter` list of the top-level boolean query (AND-like logic), and the values of each filter are combined in the `should` clause of a nested boolean query (OR-like logic).
- Each entry in the `fields` map generates a separate query. The key is interpreted as either a field alias or a literal Elasticsearch field, and the value generates either a `query_string` query or a `match` query, depending on the presence of query string syntax (as with `queries`). The queries generated by each entry contribute to the `must` clause of the top-level boolean query.
- Each item in the `aggregations` list generates a `date_range` aggregation if the value matches a configured value of `date_aggregations[].agg_name`, or a `terms` aggregation on the specified field otherwise. The value is interpreted as either a field alias or a literal Elasticsearch field.
- Each entry in the `autocomplete` map generates a specialized `terms` aggregation. The key is used as the field/alias and the value is used to produce a case-insensitive regular expression in the `include` parameter of the aggregation, such that only terms containing words prefixed by the value are included in the entry list for that aggregation. This parameter can be used to produce "autocomplete" style suggestions for a given text input and field, which can be useful, for instance, when faceting on a field with many distinct values.
- Each item in the `sort_clauses` list generates an item in the Elasticsearch `sort` request parameter, using `path` as the sort field/alias and `direction` as the sort direction.
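
The overall shape of the generated request can be sketched in Python. This is a simplified illustration rather than the Glyph's actual code: the `GenericSearchRequest` is modelled as a plain dictionary, the function names are hypothetical, the regular expression used to detect query string syntax is an assumption (the real detection rules are not documented here), and filter values are assumed to produce `match` queries. Boosts, aggregations, `autocomplete`, `exists_fields` and `global_filter` are left out.

```python
import re
from typing import Any

# Assumption: these tokens stand in for "query string syntax elements".
QUERY_STRING_HINT = re.compile(r'\b(AND|OR|NOT)\b|[*?~"()]')


def text_query(field: str, value: str) -> dict[str, Any]:
    """Return a query_string query if syntax is detected, else a match query."""
    if QUERY_STRING_HINT.search(value):
        return {"query_string": {"query": value, "fields": [field]}}
    return {"match": {field: {"query": value}}}


def resolve(name: str, paths: dict[str, str]) -> str:
    """Map a field alias to its Elasticsearch field, or use the name verbatim."""
    return paths.get(name, name)


def build_search_request(request: dict[str, Any],
                         search_fields: list[str],
                         id_fields: list[str],
                         paths: dict[str, str]) -> dict[str, Any]:
    must: list[dict[str, Any]] = []
    filters: list[dict[str, Any]] = []

    # queries: one query per value and search field, OR-ed together.
    queries = request.get("queries", [])
    if queries:
        should = [text_query(f, q) for q in queries for f in search_fields]
        must.append({"bool": {"should": should}})

    # ids: one match query per value and id field, OR-ed together.
    ids = request.get("ids", [])
    if ids:
        should = [{"match": {f: {"query": i}}} for i in ids for f in id_fields]
        must.append({"bool": {"should": should}})

    # fields: one query per entry, each added to the top-level must clause.
    for key, value in request.get("fields", {}).items():
        must.append(text_query(resolve(key, paths), value))

    # filters: AND between filters, OR between the values of one filter.
    for key, values in request.get("filters", {}).items():
        field = resolve(key, paths)
        should = [{"match": {field: {"query": v}}} for v in values]
        filters.append({"bool": {"should": should}})

    body: dict[str, Any] = {"query": {"bool": {"must": must, "filter": filters}}}
    # from and size are copied across unchanged.
    for key in ("from", "size"):
        if key in request:
            body[key] = request[key]
    # sort_clauses: path is the field/alias, direction is the sort order.
    sort = [{resolve(c["path"], paths): {"order": c["direction"]}}
            for c in request.get("sort_clauses", [])]
    if sort:
        body["sort"] = sort
    return body
```

A nested boolean query that contains only `should` clauses requires at least one of them to match, which is what gives the values of a single filter their OR-like behaviour inside the AND-like `filter` list.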

#### Properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| search_config.search_fields | Field[] | Default Search Fields | The list of Elasticsearch fields to use in queries generated from the `queries` parameter of the input `GenericSearchRequest`. |
| search_config.id_fields | Field[] | Default Id Fields | The list of Elasticsearch fields to use in queries generated from the `ids` parameter of the input `GenericSearchRequest`. |
| search_config.fields | Field[] | [] | Elasticsearch fields listed here will use the corresponding boost value when used in queries generated from the `fields` parameter of the input `GenericSearchRequest`. |
| search_config.exists_fields | OptionalField[] | [] | A list of Elasticsearch fields upon which exists queries will be made in all requests generated by this Glyph. |
| path_config.paths | Map | `{}` | A string-valued map whose keys are field aliases and whose values are the Elasticsearch fields to which the aliases are bound. If an alias appears in a request parameter (e.g. `filters`), it is resolved to the corresponding Elasticsearch field when generating the query. |
| path_config.require_alias | Boolean | false | If true, an exception is thrown if no alias is defined for a field referenced in the request. Otherwise, the field name is used verbatim. |
| aggregation_configs[].name | String | null | The alias for this aggregation. |
| aggregation_configs[].size | Integer | null | The size used with this aggregation. Falls back to the `default_aggregation_size` if not specified. |
| aggregation_configs[].auto_complete_size | Integer | null | The size used when this aggregation is used with the `autocomplete` parameter. Falls back to the `size` value for this aggregation, then to `default_aggregation_size`, if not specified. |
| default_aggregation_size | Integer | 10 | The default size parameter used in Elasticsearch aggregations. |
| date_aggregations[].agg_name | String | null | The name of this date range aggregation as it appears in the response, and the alias when used in a date range filter query. |
| date_aggregations[].path | String | null | The Elasticsearch field upon which a date range aggregation is to be made. This field is also used when generating a date range filter query. |
| date_aggregations[].additional_path | String | null | An additional Elasticsearch field to use in a date range filter query. If present, it is combined with `path` within a `should` (i.e. logical OR) clause of a Boolean query. |
| date_aggregations[].format | String | null | The date format to use in the date range aggregation and date range filter. |
| date_aggregations[].date_ranges[].from | String | null | The lower bound for a date range (inclusive). |
| date_aggregations[].date_ranges[].to | String | null | The upper bound for a date range (exclusive). |
| date_range_format | String | `{from} - {to}` | The date range format to use for closed date range filter queries (e.g. "1950 - 2000"). The placeholders `{from}` and `{to}` are parsed into the lower and upper date range boundaries, respectively, and each may only appear once in the format. |
| global_filter | String | null | A global filter, in query string syntax, to apply to all requests. |
| match_operator | Enum: { and, or } | and | The operator used in generated match queries. |
| default_operator | Enum: { and, or } | and | The default operator used in generated query_string queries. |
| request_timeout | Integer | 3000 | The request timeout in milliseconds. |
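
To illustrate how `date_range_format` and the `date_aggregations[]` properties might interact in a date range filter, the Python sketch below parses a closed date range such as "1950 - 2000" and turns it into Elasticsearch `range` queries. The function names are hypothetical, and the use of `gte` for the lower bound and `lt` for the upper bound is an assumption carried over from the inclusive/exclusive bounds documented for `date_ranges`.

```python
import re
from typing import Any, Optional


def parse_date_range(value: str,
                     date_range_format: str = "{from} - {to}") -> Optional[tuple[str, str]]:
    """Split a closed date range into (lower, upper), or return None."""
    for placeholder in ("{from}", "{to}"):
        # Each placeholder may only appear once in the format.
        if date_range_format.count(placeholder) != 1:
            raise ValueError(f"{placeholder} must appear exactly once in the format")
    # Escape the literal parts of the format, then swap in capture groups.
    pattern = re.escape(date_range_format)
    pattern = pattern.replace(re.escape("{from}"), "(?P<lower>.+?)")
    pattern = pattern.replace(re.escape("{to}"), "(?P<upper>.+?)")
    match = re.fullmatch(pattern, value)
    if match is None:
        return None
    return match.group("lower"), match.group("upper")


def date_range_filter(value: str,
                      path: str,
                      additional_path: Optional[str] = None,
                      date_format: Optional[str] = None,
                      date_range_format: str = "{from} - {to}") -> dict[str, Any]:
    """Build a range filter on path, OR-ed with additional_path if present."""
    bounds = parse_date_range(value, date_range_format)
    if bounds is None:
        raise ValueError(f"{value!r} does not match {date_range_format!r}")
    lower, upper = bounds

    def range_query(field: str) -> dict[str, Any]:
        body: dict[str, Any] = {"gte": lower, "lt": upper}
        if date_format:
            body["format"] = date_format
        return {"range": {field: body}}

    if additional_path is None:
        return range_query(path)
    # path and additional_path are combined in a should (logical OR) clause.
    return {"bool": {"should": [range_query(path), range_query(additional_path)]}}
```

Restricting each placeholder to a single occurrence keeps the parse unambiguous, since every capture group then corresponds to exactly one boundary.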

##### Field

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| field | String | null | The Elasticsearch field. |
| boost | Float | null | The boost for this field. |

##### OptionalField

As Field, but with the following additional property.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| requirement | Enum: RequirementLevel | none | The requirement level for the exists query against this field. See RequirementLevel for descriptions of allowed values. |

##### RequirementLevel

| Value | Description |
| --- | --- |
| none | This field is not required to exist in order to match. The presence of the field in a hit will merely boost its relevance. |
| should | This field is not required to exist in order to match, but at least one should-level field must exist. |
| must | This field must exist in order to match. |
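
One way the three requirement levels could map onto Elasticsearch boolean clauses is sketched below in Python. The mapping is an assumption inferred from the descriptions above and is not taken from the plugin's source; the function name `exists_clauses` is hypothetical.

```python
from typing import Any


def exists_clauses(exists_fields: list[dict[str, Any]]) -> dict[str, Any]:
    """Group exists queries by requirement level into one bool query."""
    must: list[dict[str, Any]] = []
    one_of: list[dict[str, Any]] = []
    optional: list[dict[str, Any]] = []
    for entry in exists_fields:
        query = {"exists": {"field": entry["field"]}}
        level = entry.get("requirement", "none")
        if level == "must":
            # The field has to exist for a document to match.
            must.append(query)
        elif level == "should":
            # At least one of the should-level fields has to exist.
            one_of.append(query)
        elif level == "none":
            # Optional: presence only contributes to the relevance score.
            optional.append(query)
        else:
            raise ValueError(f"unknown requirement level: {level!r}")
    if one_of:
        must.append({"bool": {"should": one_of, "minimum_should_match": 1}})
    # minimum_should_match is set to 0 so the optional clauses never become
    # mandatory, even when the must list is empty.
    return {"bool": {"must": must, "should": optional, "minimum_should_match": 0}}
```

Setting `minimum_should_match` explicitly avoids relying on the Elasticsearch default, which requires one `should` clause to match when a boolean query has no `must` or `filter` clauses.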

##### Default Search Fields

    - field: _generic_all_std
      boost: 0.0
    - field: _all
      boost: 0.0

##### Default Id Fields

    - field: '@admin.id'
      boost: 0.0
    - field: admin.id
      boost: 0.0

#### Example

    name: my-glyph
    type: elastic-generic-search
    properties:
      search_config:
        search_fields:
          - field: my_search_field
            boost: 0.0
        id_fields:
          - field: my_id_field
            boost: 0.0
        fields:
          - field: my_field
            boost: 0.0
        exists_fields:
          - field: my_exists_field
            boost: 0.0
            requirement: none
      path_config:
        paths:
          alias_1: field_1
          alias_2: field_2
          alias_3: field_3
        require_alias: false
      aggregation_configs:
        - name: alias_1
          size: 20
          auto_complete_size: 100
      default_aggregation_size: 10
      date_aggregations:
        - agg_name: date_field_alias
          path: date_elasticsearch_field_1
          additional_path: date_elasticsearch_field_2
          format: dd/MM/yyyy
          date_ranges:
            - from: 1850
              to: 1900
            - from: 1900
              to: 1950
            - from: 1950
              to: 2000
      date_range_format: '{from} - {to}'
      global_filter: my_field:(value_1 OR value_2 OR value_3)
      match_operator: and
      default_operator: and
      request_timeout: 3000