# Elasticsearch

The Elasticsearch plugin provides Rosetta with functionality relating to the Elasticsearch search engine.

# Dependency Information

<dependency>
  <groupId>com.k-int.rosetta</groupId>
  <artifactId>rosetta-elasticsearch</artifactId>
  <version>3.0.0</version>
</dependency>

# Providers

# elasticsearch

The elasticsearch Provider allows Rosetta to communicate with Elasticsearch clusters. Specifically, it interprets the object contained in the request field of a DataServiceRequest as an Elasticsearch search request, which is then executed against the _search endpoint for a specific index, or for all indices if none is specified.

The response is processed such that:

  • The _source field of each hit is registered as a provider result.
  • The hits.total.value field is used as the total value within the provider statistics.
  • Aggregations are processed into a custom provider field, aggregations, which is a list of Rosetta Aggregation objects.
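
As an illustration, the processing steps above can be sketched in Python. This is a minimal sketch rather than the plugin's implementation: the function name process_response and the shape of the returned dictionary are assumptions, and aggregations are passed through unchanged instead of being converted into Rosetta Aggregation objects.

```python
from typing import Any, Dict


def process_response(response: Dict[str, Any],
                     aggregations_key: str = "aggregations") -> Dict[str, Any]:
    """Map a raw Elasticsearch search response onto a simplified provider result."""
    hits = response.get("hits", {})
    # Each hit's _source document becomes one provider result.
    results = [hit["_source"] for hit in hits.get("hits", []) if "_source" in hit]
    # hits.total.value supplies the total for the provider statistics.
    total = hits.get("total", {}).get("value", 0)
    # Aggregations are bound to a configurable key (hypothetical result structure).
    metadata = {aggregations_key: response.get("aggregations", {})}
    return {"results": results, "total": total, "metadata": metadata}


sample = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [{"_source": {"title": "A"}}, {"_source": {"title": "B"}}],
    },
    "aggregations": {"subject": {"buckets": [{"key": "art", "doc_count": 2}]}},
}
processed = process_response(sample)
```

Here processed["results"] holds the two _source documents and processed["total"] is taken from hits.total.value.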

# Properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| host | String | localhost | The host name of the Elasticsearch cluster. |
| port | Integer | 9200 | The port of the Elasticsearch cluster. |
| protocol | String | http | The protocol/scheme of the Elasticsearch cluster (usually 'http' or 'https'). |
| path_prefix | String | null | The path prefix of the Elasticsearch cluster (usually only required if it is behind a reverse-proxy). E.g. if the full uri for the search endpoint of an index named 'my-index' is 'https://my-server/es/my-index/_search', then the path_prefix would be '/es/'. |
| index | String | null | The index against which the search request will be performed. If not specified, all indices will be searched. |
| skip_ping | Boolean | false | If true, then the ping normally used to check for availability of this provider will be skipped. This may be necessary if the Elasticsearch cluster is behind a proxy/gateway with strict path limitations. |
| username | String | null | The username to use in Elasticsearch requests. |
| password | String | null | The password to use in Elasticsearch requests. |
| client_timeout | Integer | 3000 | The client timeout in milliseconds to use in Elasticsearch requests. |
| aggregations_key | String | aggregations | The custom provider metadata key to which any aggregations in the response will be bound. |
| date_range_format | String | {from} - {to} | The format used for the value part of a date range aggregation entry, where the placeholders {from} and {to} are resolved to the lower and upper date range boundaries of a specific bucket, respectively. |
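
To show how the connection properties relate to the _search endpoint, the following Python sketch assembles the URL from protocol, host, port, path_prefix and index. The function name search_url is hypothetical, and always including the port in the URL is an assumption; the plugin may build its requests differently.

```python
from typing import Optional


def search_url(host: str = "localhost", port: int = 9200, protocol: str = "http",
               path_prefix: Optional[str] = None, index: Optional[str] = None) -> str:
    """Build the _search endpoint URL from the provider properties."""
    # Strip surrounding slashes and skip unset values so that a missing
    # prefix or index never produces an empty path segment.
    segments = [part.strip("/") for part in (path_prefix, index, "_search") if part]
    segments = [segment for segment in segments if segment]
    return f"{protocol}://{host}:{port}/" + "/".join(segments)


# No index configured: the request targets all indices.
url_all = search_url()
# Cluster behind a reverse proxy, searching a single index.
url_proxied = search_url(host="my-server", port=443, protocol="https",
                         path_prefix="/es/", index="my-index")
```

With the defaults, url_all is http://localhost:9200/_search; with the proxied settings, the path becomes /es/my-index/_search.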

# Example

name: my-provider
type: elasticsearch
properties:
  protocol: http
  host: localhost
  port: 9200
  path_prefix: /es/base/path/
  index: my-index
  skip_ping: true
  username: rosetta
  password: password
  client_timeout: 60000
  aggregations_key: aggs
  date_range_format: '{from} to {to}'

# Glyphs

# elastic-generic-search

The elastic-generic-search Glyph transforms a GenericSearchRequest into an Elasticsearch search request.

The top-level query generated by this Glyph is a boolean Elasticsearch query. Each GenericSearchRequest parameter is converted as follows:

  • The from and size parameters are interpreted literally.
  • Each value in queries is used to generate one query for each of the fields configured in the search_config.search_fields property.
    A query_string query is generated if query string syntax elements are detected (e.g. "OR" or "AND") in the value, or a match query is generated otherwise.
    All queries generated from the queries list are combined in a should clause of a boolean query, which is in turn combined with queries resulting from other parameters using a must clause of the top-level boolean query.
  • Each value in ids is used to generate one query for each of the fields configured in the search_config.id_fields property.
    A match query is generated in each case. All queries generated from the ids list are combined in a should clause of a boolean query, which is in turn combined with queries resulting from other parameters using a must clause of the top-level boolean query.
  • Each entry in the filters map is treated as a separate filter whose key is interpreted as either a field alias or a literal Elasticsearch field (see the Properties table) and whose value is a list of values against which that field should match. Each filter is added to the filter list of the top-level boolean query (AND-like logic), and the values of a single filter are combined in the should clause of a nested boolean query (OR-like logic).
  • Each entry in the fields map generates a separate query in which the key is interpreted as either a field alias or a literal Elasticsearch field, and the value generates either a query_string query or a match query, depending on the presence of query string syntax (as with queries). The queries generated by each entry contribute to the must clause of the top-level boolean query.
  • Each item in the aggregations list generates either a date_range aggregation if the value matches a configured value of date_aggregations[].agg_name, or a terms aggregation on the specified field otherwise. The value is interpreted as either a field alias or a literal Elasticsearch field.
  • Each entry in the autocomplete map generates a specialized terms aggregation using the key as the field/alias and the value to produce a regular expression in the include parameter of the aggregation, such that only terms containing words prefixed by the value are included in the entry list for that aggregation (note the regular expression produced is case-insensitive). This parameter can be used to produce "autocomplete" style suggestions for a given text input and field, which can be useful for faceting on a field with many distinct values, for instance.
  • Each item in the sort_clauses list generates an item in the Elasticsearch sort request parameter using path as the sort field/alias and direction as the sort direction.
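
The combination rules for queries, ids, fields and filters can be sketched in Python as follows. This is an illustration under several assumptions, not the Glyph's actual code: the request is modelled as a plain dictionary, all function names are hypothetical, syntax detection is reduced to spotting AND/OR/NOT, filter values are matched with match queries, minimum_should_match is made explicit, and boosts, exists queries, the global filter, aggregations and sorting are left out.

```python
import re
from typing import Any, Dict, List

# Assumption: the real query string syntax detection is richer than this.
_QUERY_STRING_SYNTAX = re.compile(r"\b(AND|OR|NOT)\b")


def resolve(field: str, paths: Dict[str, str]) -> str:
    """De-reference a field alias, falling back to the literal field name."""
    return paths.get(field, field)


def text_query(field: str, value: str) -> Dict[str, Any]:
    """Use query_string when query string syntax is detected, otherwise match."""
    if _QUERY_STRING_SYNTAX.search(value):
        return {"query_string": {"query": value, "fields": [field]}}
    return {"match": {field: {"query": value}}}


def any_of(queries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """OR-like group: a nested boolean query with a should clause."""
    return {"bool": {"should": queries, "minimum_should_match": 1}}


def build_request(request: Dict[str, Any], search_fields: List[str],
                  id_fields: List[str], paths: Dict[str, str]) -> Dict[str, Any]:
    must: List[Dict[str, Any]] = []
    filters: List[Dict[str, Any]] = []
    if request.get("queries"):
        # One query per (value, search field) pair, OR-ed together.
        must.append(any_of([text_query(f, v)
                            for v in request["queries"] for f in search_fields]))
    if request.get("ids"):
        # Ids always use match queries, one per (value, id field) pair.
        must.append(any_of([{"match": {f: {"query": v}}}
                            for v in request["ids"] for f in id_fields]))
    for key, value in request.get("fields", {}).items():
        must.append(text_query(resolve(key, paths), value))
    for key, values in request.get("filters", {}).items():
        field = resolve(key, paths)
        # Filters are AND-ed with each other; values within a filter are OR-ed.
        filters.append(any_of([{"match": {field: {"query": v}}} for v in values]))
    body: Dict[str, Any] = {"query": {"bool": {"must": must, "filter": filters}}}
    for key in ("from", "size"):
        if key in request:
            body[key] = request[key]
    return body


example = build_request(
    {"queries": ["cats OR dogs"], "filters": {"alias_1": ["a", "b"]}, "from": 0, "size": 10},
    search_fields=["my_search_field"], id_fields=["my_id_field"],
    paths={"alias_1": "field_1"},
)
```

In this example the query value contains OR, so it becomes a query_string query, while the alias_1 filter is resolved to field_1 and its two values are OR-ed inside the top-level filter list.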

# Properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| search_config.search_fields | Field[] | Default Search Fields | The list of Elasticsearch fields to use in queries generated from the queries parameter of the input GenericSearchRequest. |
| search_config.id_fields | Field[] | Default Id Fields | The list of Elasticsearch fields to use in queries generated from the ids parameter of the input GenericSearchRequest. |
| search_config.fields | Field[] | [] | Elasticsearch fields listed here will use the corresponding boost value when used in queries generated from the fields parameter of the input GenericSearchRequest. |
| search_config.exists_fields | OptionalField[] | [] | A list of Elasticsearch fields upon which exists queries will be made in all requests generated by this Glyph. |
| path_config.paths | Map | {} | A string-valued map whose keys are field aliases and whose values are the Elasticsearch field to which the alias is bound. If an alias appears in a request parameter (e.g. filters), then the corresponding Elasticsearch field will be de-referenced when generating the query. |
| path_config.require_alias | Boolean | false | If true, then an exception will be thrown if an alias is not defined for a field referenced in the request. Otherwise, the verbatim value for that field name will be used. |
| aggregation_configs[].name | String | null | The alias for this aggregation. |
| aggregation_configs[].size | Integer | null | The size used with this aggregation. Falls back to the default_aggregation_size if not specified. |
| aggregation_configs[].auto_complete_size | Integer | null | The size used when this aggregation is used with the autocomplete parameter. Falls back to the size value for this aggregation, then default_aggregation_size if not specified. |
| default_aggregation_size | Integer | 10 | The default size parameter used in Elasticsearch aggregations. |
| date_aggregations[].agg_name | String | null | The name of this date range aggregation as it appears in the response and the alias when used in a date range filter query. |
| date_aggregations[].path | String | null | The Elasticsearch field upon which a date range aggregation is to be made. This field is also used when generating a date range filter query. |
| date_aggregations[].additional_path | String | null | An additional Elasticsearch field to use in a date range filter query. If present, it is combined with path within a should (i.e. logical OR) clause of a Boolean query. |
| date_aggregations[].format | String | null | The date format to use in the date range aggregation and date range filter. |
| date_aggregations[].date_ranges[].from | String | null | The lower bound for a date range (inclusive). |
| date_aggregations[].date_ranges[].to | String | null | The upper bound for a date range (exclusive). |
| date_range_format | String | {from} - {to} | The date range format to use for closed date range filter queries (e.g. "1950 - 2000"). The placeholders {from} and {to} are parsed into the lower and upper date range boundaries, respectively, and each may only appear once in the format. |
| global_filter | String | null | A global filter to apply to all requests in query string syntax. |
| match_operator | Enum: { and, or } | and | The operator used in generated match queries. |
| default_operator | Enum: { and, or } | and | The default operator used in generated query_string queries. |
| request_timeout | Integer | 3000 | The request timeout in milliseconds. |
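
The autocomplete behaviour and the size fallback described for auto_complete_size can be illustrated with a short Python sketch. The regular expression shown here is an assumption about how such a pattern could be built (the pattern the Glyph actually generates is not documented here): case-insensitivity is emulated with two-case character classes, and any non-alphanumeric character is treated as a word boundary. All function names are hypothetical.

```python
import re
from typing import Any, Dict, Optional


def case_insensitive(prefix: str) -> str:
    """Expand each cased letter into a two-case class; escape other symbols."""
    parts = []
    for ch in prefix:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append(f"[{lower}{upper}]")
        elif ch.isalnum():
            parts.append(ch)
        else:
            parts.append("\\" + ch)
    return "".join(parts)


def include_pattern(value: str) -> str:
    """Match whole terms in which at least one word starts with the value."""
    # The include regex has to match the entire term, hence the
    # optional leading part and the trailing wildcard.
    return f"(.*[^a-zA-Z0-9])?{case_insensitive(value)}.*"


def autocomplete_aggregation(field: str, value: str,
                             auto_complete_size: Optional[int] = None,
                             size: Optional[int] = None,
                             default_aggregation_size: int = 10) -> Dict[str, Any]:
    """Build a terms aggregation restricted to terms matching the typed value."""
    # Fallback order: auto_complete_size, then size, then default_aggregation_size.
    candidates = (auto_complete_size, size, default_aggregation_size)
    resolved_size = next(s for s in candidates if s is not None)
    return {"terms": {"field": field, "include": include_pattern(value),
                      "size": resolved_size}}


agg = autocomplete_aggregation("field_1", "lo", size=20)
pattern = agg["terms"]["include"]
matches = [term for term in ("Greater London", "london", "Oslo")
           if re.fullmatch(pattern, term)]
```

Python's re module is used here only to try the pattern out locally; Elasticsearch evaluates the include parameter with its own regular expression syntax, so a pattern should be checked against a real cluster before relying on it.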
# Field

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| field | String | null | The Elasticsearch field |
| boost | Float | null | The boost for this field |

# OptionalField

As Field but with the following additional property.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| requirement | Enum: RequirementLevel | none | The requirement level for the exists query against this field. See RequirementLevel for descriptions of allowed values. |

# RequirementLevel

| Value | Description |
| --- | --- |
| none | This field is not required to exist in order to match. The presence of the field in a hit will merely boost its relevance. |
| should | This field is not required to exist in order to match but at least one should-level field must exist. |
| must | This field must exist in order to match. |
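
One way to picture the three requirement levels is as clauses of a boolean query. The Python sketch below is an interpretation of the descriptions above, not the Glyph's documented behaviour: the function name exists_clauses and the exact placement of each clause are assumptions, and boosts are ignored.

```python
from typing import Any, Dict, List


def exists_clauses(exists_fields: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Group exists queries by requirement level into boolean query clauses."""
    optional: List[Dict[str, Any]] = []
    at_least_one: List[Dict[str, Any]] = []
    required: List[Dict[str, Any]] = []
    for config in exists_fields:
        query = {"exists": {"field": config["field"]}}
        level = config.get("requirement", "none")
        if level == "must":
            required.append(query)
        elif level == "should":
            at_least_one.append(query)
        elif level == "none":
            optional.append(query)
        else:
            raise ValueError(f"Unknown requirement level: {level}")
    must = list(required)
    if at_least_one:
        # At least one of the should-level fields has to exist.
        must.append({"bool": {"should": at_least_one, "minimum_should_match": 1}})
    # minimum_should_match of 0 keeps none-level clauses optional, so they
    # only influence relevance and never exclude a hit.
    return {"bool": {"must": must, "should": optional, "minimum_should_match": 0}}


clauses = exists_clauses([
    {"field": "title", "requirement": "must"},
    {"field": "abstract", "requirement": "should"},
    {"field": "summary", "requirement": "should"},
    {"field": "thumbnail"},
])
```

In this sketch a hit must have a title and at least one of abstract or summary, while a thumbnail only affects scoring.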
# Default Search Fields
- field: _generic_all_std
  boost: 0.0
- field: _all
  boost: 0.0
# Default Id Fields
- field: '@admin.id'
  boost: 0.0
- field: admin.id
  boost: 0.0

# Example

name: my-glyph
type: elastic-generic-search
properties:
  search_config:
    search_fields:
      - field: my_search_field
        boost: 0.0
    id_fields:
      - field: my_id_field
        boost: 0.0
    fields:
      - field: my_field
        boost: 0.0
    exists_fields:
      - field: my_exists_field
        boost: 0.0
        requirement: none
  path_config:
    paths:
      alias_1: field_1
      alias_2: field_2
      alias_3: field_3
    require_alias: false
  aggregation_configs:
    - name: alias_1
      size: 20
      auto_complete_size: 100
  default_aggregation_size: 10
  date_aggregations:
    - agg_name: date_field_alias
      path: date_elasticsearch_field_1
      additional_path: date_elasticsearch_field_2
      format: dd/MM/yyyy
      date_ranges:
        - from: 1850
          to: 1900
        - from: 1900
          to: 1950
        - from: 1950
          to: 2000
  date_range_format: '{from} - {to}'
  global_filter: my_field:(value_1 OR value_2 OR value_3)
  match_operator: and
  default_operator: and
  request_timeout: 3000
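
To illustrate how date_range_format relates to a date range filter, the following Python sketch parses a closed range such as "1950 - 2000" and turns it into a range query over path and additional_path. The function names are hypothetical, and the use of gte/lt bounds (mirroring the inclusive lower and exclusive upper bounds of date_ranges) is an assumption rather than documented behaviour.

```python
import re
from typing import Any, Dict, Optional


def parse_date_range(value: str, date_range_format: str = "{from} - {to}") -> Dict[str, str]:
    """Extract the lower and upper boundaries from a formatted date range."""
    for placeholder in ("{from}", "{to}"):
        if date_range_format.count(placeholder) != 1:
            raise ValueError(f"{placeholder} must appear exactly once in the format")
    # Escape the literal text, then turn each placeholder into a capture group.
    pattern = re.escape(date_range_format)
    pattern = pattern.replace(re.escape("{from}"), "(?P<lower>.+?)")
    pattern = pattern.replace(re.escape("{to}"), "(?P<upper>.+?)")
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError(f"'{value}' does not match format '{date_range_format}'")
    return {"from": match.group("lower"), "to": match.group("upper")}


def date_range_filter(value: str, path: str, additional_path: Optional[str] = None,
                      date_format: Optional[str] = None,
                      date_range_format: str = "{from} - {to}") -> Dict[str, Any]:
    """Build a range query, OR-ing path and additional_path when both are set."""
    bounds = parse_date_range(value, date_range_format)

    def range_query(field: str) -> Dict[str, Any]:
        body: Dict[str, Any] = {"gte": bounds["from"], "lt": bounds["to"]}
        if date_format:
            body["format"] = date_format
        return {"range": {field: body}}

    if additional_path is None:
        return range_query(path)
    return {"bool": {"should": [range_query(path), range_query(additional_path)],
                     "minimum_should_match": 1}}


date_filter = date_range_filter("1950 - 2000", "date_elasticsearch_field_1",
                                additional_path="date_elasticsearch_field_2",
                                date_format="yyyy")
```

The yyyy format is used here only so that the year-only boundaries parse cleanly; it is not taken from the example configuration.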