# Export

The Export plugin allows data retrieved via the Rosetta data service to be exported to various export targets (e.g. a CSV file, an Elasticsearch index).

Exports are carried out asynchronously. An export job is first requested and submitted to a blocking queue, and an export job id is generated and included in the response (see the POST /export endpoint). The status of the job can then be polled using that job id in subsequent requests (see the GET /export/job/{id} endpoint). Once the polled job status is COMPLETED, the export output may be retrieved from the target.
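The submit-then-poll flow described above can be sketched in Python as follows. This is a minimal sketch rather than a client shipped with the plugin: `fetch_job` is a hypothetical callable standing in for a GET /export/job/{id} call, and the `status` property name and the `FAILED` terminal status are assumptions (only COMPLETED is named in this document).

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_job(
    fetch_job: Callable[[str], Dict],
    job_id: str,
    terminal_statuses: Iterable[str] = ("COMPLETED", "FAILED"),
    interval_seconds: float = 1.0,
    max_attempts: int = 60,
) -> Dict:
    """Poll a job until it reaches a terminal status or attempts run out.

    fetch_job is a hypothetical wrapper around GET /export/job/{id}; it is
    assumed to return the job as a dict with a "status" key.
    """
    terminal = set(terminal_statuses)
    for attempt in range(max_attempts):
        job = fetch_job(job_id)
        if job.get("status") in terminal:
            return job
        # Wait between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Bounding the number of attempts keeps a caller from waiting indefinitely if a job never leaves the queue.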

The Export plugin also provides a number of export-specific Views that allow the job submission/status endpoints to be bound to specific paths, as well as simplified/customized using request and response transforms.

# REST Endpoints

Each endpoint below is listed in the form: {HTTP request method} {HTTP request path template}, followed by a description of the endpoint.


# POST /export

Submit an export request to the export service.

# Request

Body
Schema: ExportRequest

The request body is a JSON object with a String property, type, whose value determines the schema for the remaining properties.

See individual export targets for specific type values and corresponding request schemas.

# Response

Body
Schema: ExportResponse

A JSON object that describes the outcome of the export request, including the job id (if accepted), status and a string message.

Code
200 if the export request was accepted onto the queue; 403 if the request was refused (e.g. due to the queue being full); or 500 if an error occurred.
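As an illustration, a client might prepare the submission and interpret the response code along these lines. This is a sketch under assumptions: the base URL is a placeholder, the `Content-Type` header is assumed because the body is JSON, and the helper names `build_export_request` and `describe_status` are hypothetical.

```python
import json
import urllib.request


def build_export_request(base_url: str, export_request: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /export request with a JSON body."""
    body = json.dumps(export_request).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/export",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def describe_status(code: int) -> str:
    """Map the documented response codes to a short outcome label."""
    outcomes = {
        200: "accepted",  # request was placed on the queue
        403: "refused",   # e.g. the queue is full
        500: "error",     # an error occurred
    }
    return outcomes.get(code, "unexpected")
```

The request object can then be passed to `urllib.request.urlopen`, or the same body and path can be used with any other HTTP client.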


# POST /export/pipeline

Execute a named export pipeline. The pipelines must be defined in the configuration properties.

# Request

Parameters

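| Name | In | Type | Required | Description |
|------|-------|--------|----------|--------------------------------------|
| name | Query | String | Yes | The name of the pipeline to execute. |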

# Response

Body
Schema: ExportResponse[]

An array of ExportResponse objects, each describing the outcome of the corresponding ExportRequest in the pipeline.


# GET /export/status

Reports the overall status of the export plugin.

# Response

Body
Schema: ExportServiceStatus

An object containing information about the status of the export service, including the total number of jobs in the queue.


# GET /export/job

Lists the currently running and completed export jobs.

# Response

Body
Schema: ExportJob[]

An array of ExportJob objects, including those in the queue and a history of completed jobs.


# GET /export/job/{id}

Retrieves a single export job by id.

# Request

Parameters

| Name | In | Type | Required | Description |
|------|------|--------|----------|--------------------------------|
| id | Path | String | Yes | The id of the job to retrieve. |

# Response

Body
Schema: ExportJob

The ExportJob object whose id is given by the path parameter, id.


# GET /export/explain

Provides information about the job that would be submitted for a given request body (without actually submitting it).

# Request

Body
Schema: ExportRequest

The request body is a JSON object with a String property, type, whose value determines the schema for the remaining properties.

# Response

Body
Schema: ExplainResponse

A JSON object describing details of the submitted ExportRequest, including the corresponding Java class, the resolved type, the full export request (including any default values for each field) and any errors that may have occurred.


# Processes

For a number of export targets, the means by which results are retrieved from the Rosetta data service for the purposes of exporting is defined in terms of one or more processes.

In this context, a process is composed of the following steps:

  1. Execute a starting data service request against the data service to retrieve the first page of results to export, where the starting request body conforms to the CustomPagedRequest schema (i.e. has the from and size parameters).
  2. Export the current page of results.
  3. Form a new data service request by incrementing the from parameter, where the increment amount is influenced by various request fields, but by default, is equal to the size parameter of the starting request.
  4. Execute the incremented request against the data service to retrieve the next page of results.
  5. Repeat steps 2-4 until at least one of the configured exit conditions is met.
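The steps above can be sketched as a loop in Python. This is an illustrative sketch, not the export service's implementation: `execute` and `export_page` are hypothetical callables, the response keys `results`, `total` and `errors` are assumed names (only `found` is named in this document), and evaluating the exit conditions against the next `from` value is also an assumption.

```python
from typing import Callable, Dict, List, Optional, Sequence


def should_exit(conditions: Sequence[str], response: Dict, size: int,
                next_from: int, to: Optional[int]) -> bool:
    """Evaluate the configured exit conditions, combined with Boolean OR."""
    results = response.get("results", [])
    checks = {
        "not_found": not response.get("found", True),
        "size": len(results) < size,
        "size_no_errors": len(results) < size and not response.get("errors"),
        "total": response.get("total", float("inf")) < next_from,
        "to": to is not None and next_from > to,
    }
    return any(checks[name] for name in conditions)


def run_process(
    execute: Callable[[Dict], Dict],
    export_page: Callable[[List], None],
    starting_request: Dict,
    increment_type: str = "size",
    custom_batch_size: int = 0,
    exit_conditions: Sequence[str] = ("not_found", "size_no_errors", "total"),
    to: Optional[int] = None,
) -> int:
    """Run one process and return the number of pages exported."""
    request = {"from": 0, "size": 100, **starting_request}
    step = {"size": request["size"], "one": 1,
            "custom": custom_batch_size}[increment_type]
    if step <= 0:
        raise ValueError("the increment must be positive")
    pages = 0
    while True:
        response = execute(request)                # steps 1 and 4
        export_page(response.get("results", []))   # step 2
        pages += 1
        next_from = request["from"] + step         # step 3
        if should_exit(exit_conditions, response, request["size"], next_from, to):
            return pages                           # step 5
        request = {**request, "from": next_from}
```

With the default exit conditions, a final page that is shorter than `size` and has no errors ends the process.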

# Export Targets

The export target is determined by the type field in the export request body. The following subsections describe each type of export target and are titled for their respective type value.


# csv

Export to a CSV file on the local file system, in an S3 bucket, or in a LocalStack S3 bucket.

Each result is exported as a separate row where the keys are interpreted as the column headers of the output CSV file. If the results are composed of complex data with nested maps and arrays, then the column headers will be the full definite JSONPaths (excluding the "$." prefix) to each leaf node in each result.
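To illustrate how nested results turn into path-style column headers, the following Python sketch flattens a nested structure into a single-level map. It is only an approximation of the behaviour described above: the function name `flatten` is hypothetical, and the exact path notation produced by the export service (for example, how array indices are written) is an assumption.

```python
from typing import Any, Dict


def flatten(value: Any, path: str = "") -> Dict[str, Any]:
    """Map each leaf of a nested structure to a path-like column header."""
    if isinstance(value, dict) and value:
        out: Dict[str, Any] = {}
        for key, child in value.items():
            # Join object keys with dots, omitting any "$." prefix.
            child_path = f"{path}.{key}" if path else str(key)
            out.update(flatten(child, child_path))
        return out
    if isinstance(value, list) and value:
        out = {}
        for index, child in enumerate(value):
            # Array items are addressed by their index in brackets.
            out.update(flatten(child, f"{path}[{index}]"))
        return out
    # Scalars (and empty containers) are treated as leaf values.
    return {path: value}


row = flatten({"name": "John", "aliases": ["J", "Johnny"], "birth": {"year": 1900}})
# row == {"name": "John", "aliases[0]": "J", "aliases[1]": "Johnny", "birth.year": 1900}
```

Headers such as `aliases[1]` vary with the length of each array, which is one reason a flat, CSV-specific result format tends to give more predictable columns.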

Due to the above, it is recommended that the results are converted to a CSV-appropriate JSON format (e.g. a flat map of key-value pairs) before being processed by the export service. This can be done by creating a CSV-specific Profile that has a Glyph registered in the data phase, which converts to a format amenable to CSV exports. Then, configure the export to use this Profile, either in the configuration properties, or as part of the export request.

# Request Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "type": {
      "const": "csv"
    },
    "processes": {
      "type": "array",
      "description": "An array of processes that are executed in sequence by the export service for the purposes of retrieving the results to be exported.",
      "items": {
        "type": "object",
        "properties": {
          "starting_request": {
            "type": "object",
            "description": "The initial data service request made to the Rosetta data service to retrieve the result set for export",
            "properties": {
              "profile": {
                "type": "string",
                "description": "The profile used to retrieve the result set"
              },
              "request": {
                "type": "object",
                "description": "The initial request",
                "properties": {
                  "from": {
                    "type": "integer",
                    "description": "The zero-based result offset for the initial request."
                  },
                  "size": {
                    "type": "integer",
                    "description": "The maximum number of results per page. Functions as the export batch size when 'increment_type' is set to 'size'."
                  }
                }
              }
            },
            "default": {
              "request": {
                "size": 100
              }
            }
          },
          "increment_type": {
            "description": "Determines the method of incrementing the 'from' parameter.",
            "oneOf": [
              {
                "const": "size",
                "description": "Increments the 'from' parameter by the 'size' field of the starting Rosetta request"
              },
              {
                "const": "one",
                "description": "Increments the 'from' parameter by 1"
              },
              {
                "const": "custom",
                "description": "Increments the 'from' parameter by a custom amount given by 'custom_batch_size', which may be different from the size parameter in the starting request"
              }
            ],
            "default": "size"
          },
          "custom_batch_size": {
            "type": "integer",
            "description": "Batch size when used with 'custom' increment type."
          },
          "to": {
            "type": "integer",
            "description": "Maximum value for 'from' (exclusive). Has no effect unless used with 'to' exit condition."
          },
          "exit_conditions": {
            "type": "array",
            "description": "List of exit conditions. Combined with Boolean OR.",
            "items": {
              "oneOf": [
                {
                  "const": "not_found",
                  "description": "Exits if the data response field 'found' is false"
                },
                {
                  "const": "size",
                  "description": "Exits if the number of results is smaller than the request size"
                },
                {
                  "const": "size_no_errors",
                  "description": "Exits if the result set is smaller than the request size and there are no errors"
                },
                {
                  "const": "total",
                  "description": "Exits if the total number of records is smaller than the 'from' request parameter"
                },
                {
                  "const": "to",
                  "description": "Exits if the 'from' parameter exceeds the 'to' parameter in the export request"
                }
              ]
            },
            "default": [
              "not_found",
              "size_no_errors",
              "total"
            ]
          }
        }
      },
      "minItems": 1
    },
    "skip_total_count": {
      "type": "boolean",
      "description": "Determines whether to skip populating the total count before starting the export",
      "default": false
    },
    "config": {
      "type": "object",
      "properties": {
        "export_type": {
          "description": "Determines the type of destination for the export",
          "oneOf": [
            {
              "const": "local",
              "description": "Saves to local file system"
            },
            {
              "const": "s3",
              "description": "Pushes to remote S3 bucket"
            },
            {
              "const": "localstack",
              "description": "Pushes to S3 bucket within a LocalStack deployment"
            }
          ]
        },
        "file_path": {
          "type": "string",
          "description": "The path to a directory in which the output file is saved. The file path must be specified, even if 'export_type' is 's3' or 'localstack' as the output file is created here as a temporary local file before uploading to S3."
        },
        "file_name": {
          "type": "string",
          "description": "The name of the output file to be saved (including the file extension). If this is not specified, then the job id followed by the file extension will be used."
        },
        "s3_config": {
          "type": "object",
          "description": "Configuration for the S3 connection. Required if 'export_type' is set to 's3' or 'localstack'. Ignored otherwise.",
          "properties": {
            "bucket": {
              "type": "string",
              "description": "The name of the S3 bucket"
            },
            "region": {
              "type": "string",
              "description": "The AWS region",
              "default": "eu-west-1"
            },
            "access_key_id": {
              "type": "string",
              "description": "The AWS access key id"
            },
            "secret_key": {
              "type": "string",
              "description": "The AWS secret access key"
            },
            "uri": {
              "type": "string",
              "description": "The URI of the endpoint override to use if 'export_type' is set to 'localstack'. Ignored otherwise."
            },
            "prefix": {
              "type": "string",
              "description": "The path prefix to use when uploading objects to S3"
            }
          },
          "required": [
            "bucket",
            "access_key_id",
            "secret_key"
          ]
        },
        "columns": {
          "type": "array",
          "description": "An ordered list of column headers to be included in the CSV output. If not specified, then the headers are inferred from the structure of the exported records. In this case, the order of the headers is not guaranteed to be consistent.",
          "items": {
            "type": "string"
          }
        },
        "create_directories": {
          "type": "boolean",
          "description": "Creates any non-existent directories as necessary when saving the CSV files. If set to false (default), then the directory specified in 'file_path' will have to be created manually. Otherwise, an error will occur.",
          "default": false
        },
        "deduplicate": {
          "type": "boolean",
          "description": "If true, rows will only be written to the CSV output if they are unique.",
          "default": false
        },
        "deduplication_cache_size": {
          "type": "integer",
          "description": "The size of the deduplication cache. The cache is used to speed up checking whether a CSV row is a duplicate while limiting the maximum memory footprint.",
          "default": 100
        },
        "sort_on": {
          "type": "string",
          "description": "Name of column on which to sort. If none is specified, sorting is skipped."
        },
        "add_bom": {
          "type": "boolean",
          "description": "Determines whether to add a byte order mark (BOM) to the output file",
          "default": false
        }
      },
      "required": [
        "export_type",
        "file_path"
      ]
    }
  },
  "required": [
    "type",
    "processes"
  ]
}

# Example

{
  "type": "csv",
  "processes": [
    {
      "starting_request": {
        "profile": "search",
        "request": {
          "queries": [
            "John Smith"
          ],
          "from": 0,
          "size": 100
        }
      },
      "increment_type": "size",
      "exit_conditions": [
        "not_found",
        "size_no_errors",
        "total"
      ]
    }
  ],
  "skip_total_count": false,
  "config": {
    "export_type": "s3",
    "file_path": "/rosetta/data/temp/s3_export",
    "s3_config": {
      "bucket": "my-bucket",
      "region": "eu-west-1",
      "access_key_id": "abc123",
      "secret_key": "def456",
      "prefix": "export-output/csv"
    },
    "columns": [
      "id",
      "name",
      "description",
      "birth_year",
      "death_year"
    ],
    "create_directories": true,
    "deduplicate": true,
    "sort_on": "name",
    "add_bom": true
  }
}

# json

Export to a JSON file, either on the local file system or in an S3 bucket.

The output JSON file takes the form of a JSON array whose items are the results that have been exported.

# Request Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "type": {
      "const": "json"
    },
    "processes": {
      "type": "array",
      "description": "An array of processes that are executed in sequence by the export service for the purposes of retrieving the results to be exported.",
      "items": {
        "type": "object",
        "properties": {
          "starting_request": {
            "type": "object",
            "description": "The initial data service request made to the Rosetta data service to retrieve the result set for export",
            "properties": {
              "profile": {
                "type": "string",
                "description": "The profile used to retrieve the result set"
              },
              "request": {
                "type": "object",
                "description": "The initial request",
                "properties": {
                  "from": {
                    "type": "integer",
                    "description": "The zero-based result offset for the initial request."
                  },
                  "size": {
                    "type": "integer",
                    "description": "The maximum number of results per page. Functions as the export batch size when 'increment_type' is set to 'size'."
                  }
                }
              }
            },
            "default": {
              "request": {
                "size": 100
              }
            }
          },
          "increment_type": {
            "description": "Determines the method of incrementing the 'from' parameter.",
            "oneOf": [
              {
                "const": "size",
                "description": "Increments the 'from' parameter by the 'size' field of the starting Rosetta request"
              },
              {
                "const": "one",
                "description": "Increments the 'from' parameter by 1"
              },
              {
                "const": "custom",
                "description": "Increments the 'from' parameter by a custom amount given by 'custom_batch_size', which may be different from the size parameter in the starting request"
              }
            ],
            "default": "size"
          },
          "custom_batch_size": {
            "type": "integer",
            "description": "Batch size when used with 'custom' increment type."
          },
          "to": {
            "type": "integer",
            "description": "Maximum value for 'from' (exclusive). Has no effect unless used with 'to' exit condition."
          },
          "exit_conditions": {
            "type": "array",
            "description": "List of exit conditions. Combined with Boolean OR.",
            "items": {
              "oneOf": [
                {
                  "const": "not_found",
                  "description": "Exits if the data response field 'found' is false"
                },
                {
                  "const": "size",
                  "description": "Exits if the number of results is smaller than the request size"
                },
                {
                  "const": "size_no_errors",
                  "description": "Exits if the result set is smaller than the request size and there are no errors"
                },
                {
                  "const": "total",
                  "description": "Exits if the total number of records is smaller than the 'from' request parameter"
                },
                {
                  "const": "to",
                  "description": "Exits if the 'from' parameter exceeds the 'to' parameter in the export request"
                }
              ]
            },
            "default": [
              "not_found",
              "size_no_errors",
              "total"
            ]
          }
        }
      },
      "minItems": 1
    },
    "skip_total_count": {
      "type": "boolean",
      "description": "Determines whether to skip populating the total count before starting the export",
      "default": false
    },
    "config": {
      "type": "object",
      "properties": {
        "export_type": {
          "description": "Determines the type of destination for the export",
          "oneOf": [
            {
              "const": "local",
              "description": "Saves to local file system"
            },
            {
              "const": "s3",
              "description": "Pushes to remote S3 bucket"
            }
          ]
        },
        "file_path": {
          "type": "string",
          "description": "The path to a directory in which the output file is saved. The file path must be specified, even if 'export_type' is 's3' as the output file is created here as a temporary local file before uploading to S3."
        },
        "s3_config": {
          "type": "object",
          "description": "Configuration for the S3 connection. Required if 'export_type' is set to 's3'. Ignored otherwise.",
          "properties": {
            "bucket": {
              "type": "string",
              "description": "The name of the S3 bucket"
            },
            "region": {
              "type": "string",
              "description": "The AWS region",
              "default": "eu-west-1"
            },
            "access_key_id": {
              "type": "string",
              "description": "The AWS access key id"
            },
            "secret_key": {
              "type": "string",
              "description": "The AWS secret access key"
            },
            "prefix": {
              "type": "string",
              "description": "The path prefix to use when uploading objects to S3"
            }
          },
          "required": [
            "bucket",
            "access_key_id",
            "secret_key"
          ]
        }
      },
      "required": [
        "export_type",
        "file_path"
      ]
    }
  },
  "required": [
    "type",
    "processes"
  ]
}

# Example

{
  "type": "json",
  "processes": [
    {
      "starting_request": {
        "profile": "search",
        "request": {
          "queries": [
            "John Smith"
          ],
          "from": 0,
          "size": 100
        }
      },
      "increment_type": "size",
      "exit_conditions": [
        "not_found",
        "size_no_errors",
        "total"
      ]
    }
  ],
  "skip_total_count": false,
  "config": {
    "export_type": "s3",
    "file_path": "/rosetta/data/temp/s3_export",
    "s3_config": {
      "bucket": "my-bucket",
      "region": "eu-west-1",
      "access_key_id": "abc123",
      "secret_key": "def456",
      "prefix": "export-output/json"
    }
  }
}

# elasticsearch

Export to an Elasticsearch index. Each Metadata result is indexed such that the data field is used as the document source and the id field is used as the document id, or a unique id is automatically generated if this field is not populated.

# Request Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "type": {
      "const": "elasticsearch"
    },
    "processes": {
      "type": "array",
      "description": "An array of processes that are executed in sequence by the export service for the purposes of retrieving the results to be exported.",
      "items": {
        "type": "object",
        "properties": {
          "starting_request": {
            "type": "object",
            "description": "The initial data service request made to the Rosetta data service to retrieve the result set for export",
            "properties": {
              "profile": {
                "type": "string",
                "description": "The profile used to retrieve the result set"
              },
              "request": {
                "type": "object",
                "description": "The initial request",
                "properties": {
                  "from": {
                    "type": "integer",
                    "description": "The zero-based result offset for the initial request."
                  },
                  "size": {
                    "type": "integer",
                    "description": "The maximum number of results per page. Functions as the export batch size when 'increment_type' is set to 'size'."
                  }
                }
              }
            },
            "default": {
              "request": {
                "size": 100
              }
            }
          },
          "increment_type": {
            "description": "Determines the method of incrementing the 'from' parameter.",
            "oneOf": [
              {
                "const": "size",
                "description": "Increments the 'from' parameter by the 'size' field of the starting Rosetta request"
              },
              {
                "const": "one",
                "description": "Increments the 'from' parameter by 1"
              },
              {
                "const": "custom",
                "description": "Increments the 'from' parameter by a custom amount given by 'custom_batch_size', which may be different from the size parameter in the starting request"
              }
            ],
            "default": "size"
          },
          "custom_batch_size": {
            "type": "integer",
            "description": "Batch size when used with 'custom' increment type."
          },
          "to": {
            "type": "integer",
            "description": "Maximum value for 'from' (exclusive). Has no effect unless used with 'to' exit condition."
          },
          "exit_conditions": {
            "type": "array",
            "description": "List of exit conditions. Combined with Boolean OR.",
            "items": {
              "oneOf": [
                {
                  "const": "not_found",
                  "description": "Exits if the data response field 'found' is false"
                },
                {
                  "const": "size",
                  "description": "Exits if the number of results is smaller than the request size"
                },
                {
                  "const": "size_no_errors",
                  "description": "Exits if the result set is smaller than the request size and there are no errors"
                },
                {
                  "const": "total",
                  "description": "Exits if the total number of records is smaller than the 'from' request parameter"
                },
                {
                  "const": "to",
                  "description": "Exits if the 'from' parameter exceeds the 'to' parameter in the export request"
                }
              ]
            },
            "default": [
              "not_found",
              "size_no_errors",
              "total"
            ]
          },
          "deletion_query": {
            "type": "object",
            "description": "An optional Elasticsearch query that, if specified, is used to delete documents that match the query from the Elasticsearch index before proceeding with the export."
          }
        }
      },
      "minItems": 1
    },
    "skip_total_count": {
      "type": "boolean",
      "description": "Determines whether to skip populating the total count before starting the export",
      "default": false
    },
    "batch_size": {
      "type": "integer",
      "description": "The maximum number of results to be included in each bulk index request before it is executed."
    },
    "config": {
      "type": "object",
      "properties": {
        "client": {
          "type": "object",
          "properties": {
            "url": {
              "type": "string",
              "description": "The URL for the Elasticsearch cluster to which the results are to be exported. Should either be of the form '{scheme}://{host}:{port}' or '{scheme}://{host}'."
            },
            "path_prefix": {
              "type": "string",
              "description": "The path prefix of the Elasticsearch cluster (i.e. the path to its root/info endpoint). Not required if the cluster is exposed directly. Used in cases where the cluster is mapped to a specific path behind a reverse-proxy, for instance."
            },
            "username": {
              "type": "string",
              "description": "The username to be used in Elasticsearch requests."
            },
            "password": {
              "type": "string",
              "description": "The password to be used in Elasticsearch requests."
            }
          },
          "required": [
            "url"
          ]
        },
        "index": {
          "type": "string",
          "description": "The Elasticsearch index to which the results are to be exported."
        },
        "mappings": {
          "type": "object",
          "description": "The Elasticsearch mappings to use when indexing the results. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/mapping.html."
        },
        "settings": {
          "type": "object",
          "description": "The Elasticsearch index settings to use when indexing the results. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index-modules.html."
        }
      },
      "required": [
        "client",
        "index"
      ]
    }
  },
  "required": [
    "type",
    "processes"
  ]
}

# Example

{
  "type": "elasticsearch",
  "processes": [
    {
      "starting_request": {
        "profile": "search",
        "request": {
          "queries": [
            "John Smith"
          ],
          "from": 0,
          "size": 100
        }
      },
      "increment_type": "size",
      "exit_conditions": [
        "not_found",
        "size_no_errors",
        "total"
      ],
      "deletion_query": {
        "term": {
          "data_type": "person"
        }
      }
    }
  ],
  "skip_total_count": false,
  "batch_size": 50,
  "config": {
    "client": {
      "url": "http://myfakedomain:1234",
      "path_prefix": "/path/to/elasticsearch",
      "username": "my-user",
      "password": "my-pass"
    },
    "index": "target-index",
    "mappings": {
      "dynamic_templates": [
        {
          "all_strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "date_detection": false
    },
    "settings": {
      "index": {
        "max_result_window": 100000
      }
    }
  }
}

# Views

# export/request

An export/request View can be used to submit export requests to the export service and performs the following steps:

  1. Transforms the received HttpDataRequest into an ExportRequest using the Glyphs specified in the transforms.request configuration property.
  2. Submits the ExportRequest to the export service and receives an ExportResponse.
  3. Transforms the ExportResponse into the final response using the Glyphs specified in the transforms.response configuration property.

This View can simplify submitting export jobs by letting the request transform populate the export request fields. Fields can be populated with static values if they are not expected to change between requests (such as output file paths or Elasticsearch client information), with values determined dynamically from query string parameters or other aspects of the HTTP request (e.g. for fields within the starting request that are used to select the results for export), or with a combination of the two.

# Properties

See ExportView.


# export/get

An export/get View can be used to retrieve read-only information about the export service (such as job status) and performs the following steps:

  1. Transforms the received HttpDataRequest into an ExportGetRequest using the Glyphs specified in the transforms.request configuration property.
  2. Retrieves a response from one of three GET endpoints that the export service provides depending on the type field of the ExportGetRequest.
  3. Transforms the response into the final response using the Glyphs specified in the transforms.response configuration property.

# Properties

See ExportView.

# Providers

# export/job-history

The export/job-history Provider retrieves a sublist of the full list of historical export jobs determined by the from and size parameters of the received CustomPagedRequest.
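As a rough illustration of the from/size paging, the sublist selection can be thought of as a slice over the job history. The function name `page_of` is hypothetical, and the ordering of the history and the handling of out-of-range values are assumptions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def page_of(history: Sequence[T], from_: int, size: int) -> List[T]:
    """Return up to `size` items of `history`, starting at offset `from_`."""
    start = max(from_, 0)  # treat negative offsets as zero (assumption)
    return list(history[start:start + max(size, 0)])
```

An offset beyond the end of the history yields an empty list in this sketch.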

# Properties

This Provider has no configuration properties.

# Example

name: my-provider
type: export/job-history

# Configuration Properties

| Property | Type | Default Value | Description |
| --- | --- | --- | --- |
| rosetta.plugins.export.service.queue_size | Integer | 1 | Maximum number of export jobs allowed to be queued before subsequent export requests are rejected. |
| rosetta.plugins.export.service.history_size | Integer | 10 | Maximum number of export jobs retained in memory after completion so that the outcome can be inspected in the API. |
| rosetta.plugins.export.service.pipelines | Map<String,ExportRequest[]> | null | A map of 'pipelines' whose keys are the names of the pipelines and whose values are lists of the export requests to be executed in sequence when a pipeline is invoked. |
| rosetta.plugins.export.csv.profile | String | default | The name of the root profile used when executing CSV export jobs. |
| rosetta.plugins.export.csv.default_config | Map<String, Object> | null | The default CSV config to use if not specified in the request. See the schema for the config field of a CSV export request for more details. |
| rosetta.plugins.export.json.profile | String | default | The name of the root profile used when executing JSON export jobs. |
| rosetta.plugins.export.json.default_config | Map<String, Object> | null | The default JSON config to use if not specified in the request. See the schema for the config field of a JSON export request for more details. |
| rosetta.plugins.export.elasticsearch.profile | String | default | The name of the root profile used when executing Elasticsearch export jobs. |
| rosetta.plugins.export.elasticsearch.default_config | Map<String, Object> | null | The default Elasticsearch config to use if not specified in the request. See the schema for the config field of an Elasticsearch export request for more details. |
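To make the effect of queue_size concrete, the sketch below models a bounded queue in Python that refuses new requests once it is full. It is only an analogy: the submit function, the refusal message, and the use of a UUID for the job id are assumptions, and whether a running job still counts against the limit in the export service is not covered here.

```python
import queue
import uuid


def submit(jobs: queue.Queue, request: dict) -> dict:
    """Queue a request, or refuse it when the queue is already full (hypothetical)."""
    job_id = str(uuid.uuid4())
    try:
        # put_nowait raises queue.Full instead of blocking when maxsize is reached.
        jobs.put_nowait({"id": job_id, "request": request})
    except queue.Full:
        return {"status": "refused", "message": "export queue is full"}
    return {"job_id": job_id, "status": "accepted"}


jobs = queue.Queue(maxsize=1)  # mirrors the default queue_size of 1
first = submit(jobs, {"type": "csv"})
second = submit(jobs, {"type": "csv"})
```

With a capacity of 1 and nothing consuming the queue, the first request is accepted and the second is refused.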

# Data Structures

# ExportGetRequest

The target model for the request phase of an export/get View.

# Schema

{
   "$schema": "https://json-schema.org/draft/2020-12/schema",
   "type": "object",
   "properties": {
      "type": {
         "oneOf": [
            {
               "const": "status",
               "description": "Retrieve the status of the export service from the '/export/status' endpoint"
            },
            {
               "const": "job",
               "description": "Retrieve a single export job from the '/export/job/{id}' endpoint"
            },
            {
               "const": "jobs",
               "description": "Retrieve the full list of export jobs from the '/export/job' endpoint"
            }
         ]
      },
      "id": {
         "type": "string",
         "description": "The export job id used in the '/export/job/{id}' request. Required if 'type' is set to 'job'. Ignored otherwise."
      }
   },
   "required": [
      "type"
   ]
}

# Example

{
   "type": "job",
   "id": "abc123"
}
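The schema above implies a simple mapping from the type field to one of three GET endpoints. A minimal Python sketch of that mapping follows; the function name endpoint_for and the choice of ValueError for invalid input are assumptions for illustration and do not describe how the View is implemented.

```python
def endpoint_for(request: dict) -> str:
    """Map an ExportGetRequest-shaped dict to an export service GET path."""
    kind = request.get("type")
    if kind == "status":
        return "/export/status"
    if kind == "jobs":
        # 'id' is ignored for every type other than 'job'.
        return "/export/job"
    if kind == "job":
        job_id = request.get("id")
        if not job_id:
            raise ValueError("'id' is required when 'type' is 'job'")
        return f"/export/job/{job_id}"
    raise ValueError(f"unsupported type: {kind!r}")


path = endpoint_for({"type": "job", "id": "abc123"})
```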

# ExportRequest

An export request contains all the information necessary to execute an export job and is used as the request body for the POST /export endpoint as well as the components of an export pipeline.

# Schema

{
   "$schema": "https://json-schema.org/draft/2020-12/schema",
   "type": "object",
   "properties": {
      "type": {
         "type": "string",
         "description": "The type of export. Determines the schema for the other properties of the export request."
      }
   },
   "additionalProperties": {
      "description": "For the full schema of an export request, including additional properties, see the documentation for a specific export target."
   }
}

# Example

See export targets for specific examples.


# ExportResponse

The response body for the POST /export endpoint.

# Schema

{
   "$schema": "https://json-schema.org/draft/2020-12/schema",
   "type": "object",
   "properties": {
      "job_id": {
         "type": "string",
         "description": "An automatically generated unique id for the submitted export job. Store this value in order to poll the status of this job in 'GET /export/job/{id}' requests."
      },
      "status": {
         "description": "A status string describing the outcome of the export request submission.",
         "oneOf": [
            {
               "const": "accepted",
               "description": "The job was accepted by the export service and has been queued for execution."
            },
            {
               "const": "refused",
               "description": "The job was refused. Possible reasons include the queue being full or that the request was invalid."
            },
            {
               "const": "error",
               "description": "An internal server error occurred when attempting to submit the export request."
            }
         ]
      },
      "message": {
         "type": "string",
         "description": "A message describing the outcome of the export request submission attempt. Is absent if the request was accepted."
      },
      "error": {
         "type": "object",
         "description": "Describes the error if one occurred while fulfilling the request.",
         "properties": {
            "message": {
               "type": "string",
               "description": "A message describing the error."
            },
            "cause": {
               "type": "string",
               "description": "A message describing the root cause of the error."
            }
         }
      }
   }
}

# Example

{
   "job_id": "d41d8cd9-8f00-3204-a980-0998ecf8427e",
   "status": "accepted"
}
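A client typically needs the job id from an accepted response and a readable failure otherwise. The Python sketch below shows one way to handle the three status values; the function name job_id_from_response and the use of RuntimeError are assumptions, not part of the plugin.

```python
def job_id_from_response(response: dict) -> str:
    """Return the job id of an accepted ExportResponse, or raise with the reason."""
    status = response.get("status")
    if status == "accepted":
        return response["job_id"]
    # 'message' is expected for refused/error outcomes; fall back to the error object.
    detail = (
        response.get("message")
        or (response.get("error") or {}).get("message")
        or "no detail provided"
    )
    raise RuntimeError(f"export request {status}: {detail}")


job_id = job_id_from_response(
    {"job_id": "d41d8cd9-8f00-3204-a980-0998ecf8427e", "status": "accepted"}
)
```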

# ExportJob

Describes an export job that has been submitted to the export service. Used in the response bodies for the GET /export/job/{id} and GET /export/job endpoints.

# Schema

{
   "$schema": "https://json-schema.org/draft/2020-12/schema",
   "$defs": {
      "instant": {
         "type": "string",
         "description": "A date-time string conforming to the ISO 8601 standard."
      }
   },
   "type": "object",
   "properties": {
      "sequence": {
         "type": "integer",
         "description": "The sequence number of this job. Each job is allocated a sequence number reflecting the order in which jobs were created since the Rosetta service was started (starting at zero)."
      },
      "id": {
         "type": "string",
         "description": "An automatically generated unique id for the export job."
      },
      "request": {
         "description": "The export request that caused the job to be created. See the schema for ExportRequest for more details."
      },
      "status": {
         "oneOf": [
            {
               "const": "QUEUED",
               "description": "The export job is queued but not yet started."
            },
            {
               "const": "RUNNING",
               "description": "The export job is currently being executed."
            },
            {
               "const": "COMPLETED",
               "description": "The export job has finished execution."
            },
            {
               "const": "FAILED",
               "description": "The export job encountered an error during execution."
            }
         ]
      },
      "error": {
         "type": "object",
         "description": "Describes the error if one occurred while executing the export job.",
         "properties": {
            "message": {
               "type": "string",
               "description": "A message describing the error."
            },
            "cause": {
               "type": "string",
               "description": "A message describing the root cause of the error."
            }
         }
      },
      "created": {
         "$ref": "#/$defs/instant",
         "description": "The instant that the export job was created."
      },
      "started": {
         "$ref": "#/$defs/instant",
         "description": "The instant that the export job started being executed."
      },
      "finished": {
         "$ref": "#/$defs/instant",
         "description": "The instant that the export job finished executing (whether due to completing successfully or encountering an error)."
      },
      "duration": {
         "type": "string",
         "description": "The duration between the job starting and finishing, or between the job starting and the current time if the job is still running. Represented as an ISO 8601 duration."
      },
      "progress": {
         "type": "integer",
         "description": "The number of results that have so far been exported successfully."
      },
      "total": {
         "type": "integer",
         "description": "The total number of results to be exported."
      },
      "percentage": {
         "type": "integer",
         "description": "The floored percentage of the total number of results that have so far been exported."
      }
   }
}
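The percentage field is described as a floored percentage of progress over total. A small Python sketch of that arithmetic follows; the function name floored_percentage and the decision to return None when total is not positive are assumptions, since the schema does not say how that case is reported.

```python
from typing import Optional


def floored_percentage(progress: int, total: int) -> Optional[int]:
    """Return floor(progress / total * 100), or None when total is not positive."""
    if total <= 0:
        # Assumption: no meaningful percentage exists without a positive total.
        return None
    # Integer arithmetic avoids floating-point rounding before flooring.
    return (progress * 100) // total


percentage = floored_percentage(1200, 1500)
```

For example, 1200 exported results out of 1500 gives 80, and 1 out of 3 gives 33 rather than 33.33.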

# Example

{
   "id": "d41d8cd9-8f00-3204-a980-0998ecf8427e",
   "sequence": 7,
   "request": {
      "type": "csv",
      "processes": [
         {
            "starting_request": {
               "profile": "search",
               "request": {
                  "queries": [
                     "John Smith"
                  ],
                  "from": 0,
                  "size": 100
               }
            },
            "increment_type": "size",
            "exit_conditions": [
               "not_found",
               "size_no_errors",
               "total"
            ]
         }
      ],
      "skip_total_count": false,
      "config": {
         "export_type": "s3",
         "file_path": "/rosetta/data/temp/s3_export",
         "s3_config": {
            "bucket": "my-bucket",
            "region": "eu-west-1",
            "access_key_id": "abc123",
            "secret_key": "def456",
            "prefix": "export-output/csv"
         },
         "columns": [
            "id",
            "name",
            "description",
            "birth_year",
            "death_year"
         ],
         "create_directories": true,
         "deduplicate": true,
         "sort_on": "name",
         "add_bom": true
      }
   },
   "status": "RUNNING",
   "created": "2025-05-07T16:21:26.358120372Z",
   "started": "2025-05-07T16:21:26.358160291Z",
   "duration": "PT27.172225887S",
   "progress": 1200,
   "total": 1500,
   "percentage": 80
}
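Because jobs run asynchronously, a client usually polls the job until its status is COMPLETED or FAILED. The Python sketch below shows one such loop. The fetch_job callable stands in for a GET /export/job/{id} request, and the function name wait_for_job, the polling interval, and the poll limit are all assumptions chosen for illustration.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(
    fetch_job: Callable[[], dict],
    interval_seconds: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status, then return the final job."""
    for _ in range(max_polls):
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval_seconds)
    raise TimeoutError(f"job still active after {max_polls} polls")


# Canned responses stand in for real HTTP calls; sleeping is disabled for the demo.
responses = iter([{"status": "QUEUED"}, {"status": "RUNNING"}, {"status": "COMPLETED"}])
final = wait_for_job(lambda: next(responses), sleep=lambda _seconds: None)
```

Passing fetch_job and sleep as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise without a running service.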

# ExportServiceStatus

The response body for the GET /export/status endpoint.

# Schema

{
   "$schema": "https://json-schema.org/draft/2020-12/schema",
   "type": "object",
   "properties": {
      "job_count": {
         "type": "integer",
         "description": "The number of active jobs (those currently running or queued)."
      }
   }
}

# Example

{
   "job_count": 10
}

# ExportView

The configuration schema for both export/request and export/get Views.

# Properties

As BaseView but with the following additional properties.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| media_type | String | application/json | The media (or MIME) type of the response body for this view. |
| transforms.request | Selector | | The Glyphs to apply to the incoming HttpDataRequest that yield the request to be forwarded to the export service. |
| transforms.response | Selector | | The Glyphs to apply to the response from the export service in order to produce the final response body. |

# Example

name: my-export-view
paths:
  - /my-export
methods:
  - POST
type: export/request
transforms:
  request:
    policy: list
    names:
      - http-to-export-request
  response:
    policy: list
    names:
      - format-export-response
media_type: text/plain

Note that the above example requires two Glyphs, http-to-export-request and format-export-response, to be declared, e.g. using the property rosetta.transform.glyphs.


# ExplainResponse

The response body for the GET /export/explain endpoint.

# Schema

{
   "$schema": "https://json-schema.org/draft/2020-12/schema",
   "type": "object",
   "properties": {
      "class": {
         "type": "string",
         "description": "The resolved fully-qualified Java class of the export request."
      },
      "type": {
         "type": "string",
         "description": "The type of the export request."
      },
      "request": {
         "type": "string",
         "description": "The fully resolved export request including any default values determined by the export request class."
      },
      "error": {
         "type": "object",
         "description": "Describes the error if one occurred while fulfilling the request.",
         "properties": {
            "message": {
               "type": "string",
               "description": "A message describing the error."
            },
            "cause": {
               "type": "string",
               "description": "A message describing the root cause of the error."
            }
         }
      }
   }
}