The Pureinsights Discovery Platform is a cloud-based family of products designed to help create, maintain and monitor production-ready AI applications.
There is no single recipe that fits all: each collection, each use case and each implementation is different in its own way, and evolving from a prototype to a live solution is not a trivial task. This is where Discovery shows its value:
-
An architecture that follows a pay-as-you-go model for only the resources required by the specific use case.
-
A no-code approach with building blocks configured through finite-state machines that provides flexibility while reducing the hassle of developer-related tasks such as error handling and orchestration of services.
-
Configuration changes and tuning happen on the fly, without the downtime of redeploying.
-
Data processing pipelines to extract, transform and load collections from different sources (ETL).
-
Data storage as a "push" model alternative to traditional ETL solutions.
-
Custom REST endpoints with advanced capabilities that adapt to the complexities of processing a query with minimum overhead.
-
Observability with standard monitoring and alerting tools.
What’s new in 2.2.0?
Troubleshooting tool for Ingested Records
When a seed execution completes successfully, its final state is stored and can be accessed through the Records API. This helps to keep track of the processed records: the API reports the status of each record, any errors raised during its execution, and other relevant metadata.
Support for connections with mTLS
You can now configure a Server to communicate with services protected by mTLS. As with the certificates that were already supported, you can add keys and the certificates associated with those keys to establish a connection with mTLS-protected services.
Vespa integration
We added two new Discovery Ingestion and Discovery QueryFlow components to ingest and query documents from a Vespa app, either in Vespa Cloud or self-hosted.
Troubleshooting Tool for Seed Executions
A new tool was added to help troubleshoot seed executions by providing a summary of the status of all jobs involved. It allows users to quickly identify how many jobs are in states such as DONE, FAILED, RUNNING, or CREATED, making it easier to monitor execution progress and spot potential issues during runtime.
Breaking changes
The configuration for Ingestion batch policies has been restructured
To make the configuration of record grouping mechanisms in Discovery Ingestion more intuitive, the policies that refer to an outgoing batch of records are now part of a new type of policy, referred to as Outbound Policy. Additionally, policies regarding records should now be included in a recordPolicy configuration, though the previous record name is still accepted.
Please note that Batch Policies defined outside an Outbound Policy no longer affect outbound batches of records and are ignored. This change applies to both the global recordPolicy (or record) setting of a Seed and its optional override in a Pipeline processor state.
To configure outgoing record batches, the Batch Policies must now be included as follows:
{
"recordPolicy": {
"outboundPolicy": {
"batchPolicy": {
<Batch configuration>
}
},
"batchPolicy": {
// Settings found here will now be ignored
// because they aren't included as part of the "outboundPolicy"
}
},
"batchPolicy": {
// These settings will also be ignored
}
}
It’s recommended to check the respective sections of the affected entities for details on how outbound batch settings are included in the general configuration.
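As a rough aid when reviewing existing configurations, the following Python sketch lists the places where a batchPolicy would now be ignored. The function name and the "maxCount" setting are hypothetical and only serve the illustration; the sketch checks a single configuration object and does not walk the Pipeline processor state overrides.

```python
from typing import Any


def find_ignored_batch_policies(config: dict[str, Any]) -> list[str]:
    """Return the paths of "batchPolicy" entries that sit outside an "outboundPolicy"."""
    ignored = []
    # A batch policy at the top level of the configuration is ignored.
    if "batchPolicy" in config:
        ignored.append("batchPolicy")
    # "recordPolicy" is the recommended key; "record" is the older, still accepted name.
    for key in ("recordPolicy", "record"):
        policy = config.get(key)
        if isinstance(policy, dict) and "batchPolicy" in policy:
            ignored.append(f"{key}.batchPolicy")
    return ignored


seed_config = {
    "recordPolicy": {
        # "maxCount" is a hypothetical batch setting used only for this example.
        "outboundPolicy": {"batchPolicy": {"maxCount": 25}},
        "batchPolicy": {"maxCount": 10},
    },
    "batchPolicy": {"maxCount": 5},
}
print(find_ignored_batch_policies(seed_config))
```

Any path reported by the function points at settings that should be moved under "outboundPolicy" to take effect.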
The Ingestion component Engine Score has been renamed to Insights
The Ingestion component Engine Score is now named Insights. We changed the purpose of this component so that it provides more actions that generate information for later analysis, and the component was renamed to reflect that.
The configuration for an engine score action now looks slightly different:
{
"type": "insights",
"name": "My component",
"config": {
"action": "engine-score:non-contextual",
...
},
...
}
Please note that the type is now insights instead of engine-score, and the name of the action has changed to engine-score:non-contextual.
Basics
Discovery is a platform composed of three products:
-
Discovery Ingestion: a distributed ETL, supported by a finite-state machine that represents the data transformation and loading.
-
Discovery Staging: an abstraction for a Document Database, where collections are represented as buckets with an HTTP interface for simple CRUD operations.
-
Discovery QueryFlow: a configurable REST API with custom endpoints, supported by a finite-state machine that represents the query processing.
All products are supported by the Discovery Core Libraries and API: a common layer for shared concepts and configurations.
Each independent product has its own value and can exist by itself (although both Discovery Ingestion and Discovery QueryFlow require Discovery Staging for their internal operations). However, using them together brings all the tools to create an end-to-end solution.
Architecture
One of the main goals of Discovery is to be cloud agnostic: using the native resources of each cloud provider without affecting the application itself.
This decision has a direct impact on the overall architecture, as external services should be abstracted so that they can be mapped to a managed service available on each cloud provider.
-
The relational database is the main storage for configurations, metadata and the state of the multiple executions.
-
The document database is the service abstracted by Discovery Staging. The default implementation for all cloud providers is MongoDB Atlas.
-
The object storage supports the file server and data processing of binary files in Discovery Ingestion.
-
The message queue handles asynchronous communication between the components.
-
The secrets manager stores secure information such as passwords and credentials. The default implementation is an internal secrets provider.
The Discovery Core Libraries and API is an interface for every Discovery product to interact with these external services. However, despite this standardization, each independent product is designed around its own needs: Discovery QueryFlow and Discovery Staging are monoliths because they need fast responses, while Discovery Ingestion follows an event-driven architecture that targets scalability, with distributed components processing data in parallel.
AWS
Discovery can be integrated with Amazon Web Services (AWS) with managed services that natively support the installation requirements.
The application is deployed using Amazon Elastic Container Service and AWS Lambda in a private subnet, later exposed with Amazon API Gateway and Amazon Route 53.
Other services such as Amazon EventBridge and Amazon CloudWatch support the correct control, autoscaling and monitoring of all components.
Monitoring
All Discovery products constantly publish metrics to a selected monitoring and observability tool.
Integrations
Integrations to external services are represented by Servers, optionally authenticated with a Credential that references an encrypted Secret.
They are re-usable configurations that can later be referenced in Discovery Ingestion Components and Discovery QueryFlow Components.
Connecting to an external service
Servers API
$ curl --request POST 'core-api:8080/v2/server' --data '{ ... }'
$ curl --request GET 'core-api:8080/v2/server'
$ curl --request GET 'core-api:8080/v2/server/{id}'
$ curl --request GET 'core-api:8080/v2/server/{id}/ping'
Note
|
Not all integrations support the |
$ curl --request PUT 'core-api:8080/v2/server/{id}' --data '{ ... }'
Note: The type of an existing server can’t be modified.
$ curl --request DELETE 'core-api:8080/v2/server/{id}'
$ curl --request POST 'core-api:8080/v2/server/{id}/clone?name=clone-new-name'
Query Parameters
name
-
(Required, String) The name of the new Server
$ curl --request POST 'core-api:8080/v2/server/search' --data '{ ... }'
Body
The body payload is a DSL Filter to apply to the search
$ curl --request GET 'core-api:8080/v2/server/autocomplete?q=value'
Query Parameters
q
-
(Required, String) The query to execute the autocomplete search
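The curl calls above can also be prepared from a script. The following Python sketch only builds the requests and does not send them. The http scheme, the JSON Content-Type header, the helper name and the sample ID are assumptions for illustration; the host, port and paths are copied from the curl examples.

```python
import json
import urllib.request
from typing import Optional

# Host and port come from the curl examples; the "http" scheme is an assumption.
BASE_URL = "http://core-api:8080/v2"


def server_request(method: str, path: str = "",
                   body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request against the Servers API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # Sending the payload as JSON is an assumption; the curl examples set no header.
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        f"{BASE_URL}/server{path}", data=data, headers=headers, method=method
    )


# Placeholder payload mirroring the generic server example in this section.
create = server_request("POST", body={
    "type": "my-external-service",
    "name": "My External Service Configuration",
    "config": {},
})

# Hypothetical server ID, used only to show how the path is composed.
server_id = "00000000-0000-0000-0000-000000000000"
ping = server_request("GET", f"/{server_id}/ping")

print(create.get_method(), create.full_url)
print(ping.get_method(), ping.full_url)
```

Passing either object to urllib.request.urlopen would perform the call against a reachable Core API.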
A server has the properties to create an authenticated connection to an external service:
{
"type": "my-external-service",
"name": "My External Service Configuration",
"config": {
...
},
...
}
type
-
(Required, String) The type of external supported service
name
-
(Required, String) The unique name to identify the external service
description
-
(Optional, String) The description for the configuration
config
-
(Required, Object) The configuration to connect to the external service
credential
-
(Optional, UUID) The ID of the credential to authenticate in the external service
certificates
-
(Optional, Object) The custom certificates for encrypted connection (SSL/TLS), loaded using the file storage. The value can be either the string with the location of the certificate, or a detailed configuration
Details
{ "certificates": { "sampleA": { "type": "X.509", "value": "certificates/sample.crt" }, "sampleB": { "value": "certificates/sample.crt" }, "sampleC": "certificates/sample.crt" }, ... }
type
-
(Optional, String) The type of certificate. Defaults to X.509
value
-
(Required, String) The location of the certificate in the file storage
Note: The existence of the certificate will be verified.
keys
-
(Optional, Object) The keys with their respective certificate chain for encrypted connection (mTLS), loaded using the file storage.
Details
{ "keys": { "keyA": { "value": "keys/sample.pem", "certificateChain": [ { "type": "X.509", "value": "certificates/sample.pem" }, { "value": "certificates/sample.pem" }, "certificates/sample.pem" ] } } }
value
-
(Required, String) The location of the key in the file storage. The contents are expected to be PKCS8 encoded key in PEM format.
certificateChain
-
(Required, List) One or more public certificates associated to the key
Note: The existence of the key will be verified.
circuitBreaker
-
(Optional, Object) The circuit breaker configuration as a mechanism to handle request errors and limitations of an external service
Details
{ "circuitBreaker": { "waitInOpenState": "90s", "maxTestRequests": 1 }, ... }
waitInOpenState
-
(Optional, Duration) The maximum time to wait in the OPEN state before transitioning to the HALF_OPEN state
maxTestRequests
-
(Optional, Integer) The maximum number of requests in the HALF_OPEN state
Note: While all server types can be configured with a circuit breaker, not all of them necessarily use it.
labels
-
(Optional, Array of Objects) The labels for the configuration
Details
{ "labels": [ { "key": "My Label Key", "value": "My Label Value" }, ... ], ... }
key
-
(Required, String) The key of the label
value
-
(Required, String) The value of the label
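To make the circuitBreaker settings described above more concrete, here is a minimal Python sketch of a generic circuit breaker driven by the same two values. It is not Discovery's implementation: the CLOSED state, the failure threshold that opens the circuit, and all class and method names are assumptions for illustration.

```python
import time


class CircuitBreaker:
    """Generic circuit breaker sketch using waitInOpenState and maxTestRequests."""

    def __init__(self, wait_in_open_state=90.0, max_test_requests=1,
                 failure_threshold=3, clock=time.monotonic):
        self.wait_in_open_state = wait_in_open_state  # seconds, like "90s"
        self.max_test_requests = max_test_requests
        self.failure_threshold = failure_threshold    # hypothetical, not a documented setting
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0
        self.test_requests = 0

    def allow_request(self) -> bool:
        # After waiting long enough in OPEN, move to HALF_OPEN and allow test requests.
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.wait_in_open_state:
            self.state = "HALF_OPEN"
            self.test_requests = 0
        if self.state == "CLOSED":
            return True
        if self.state == "HALF_OPEN" and self.test_requests < self.max_test_requests:
            self.test_requests += 1
            return True
        return False

    def record_success(self) -> None:
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        # A failed test request, or too many failures, (re)opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()


# A fake clock keeps the example deterministic.
now = [0.0]
breaker = CircuitBreaker(wait_in_open_state=90.0, max_test_requests=1,
                         failure_threshold=2, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
print(breaker.state, breaker.allow_request())
now[0] = 90.0
print(breaker.allow_request(), breaker.state)
```

While the circuit is OPEN, requests are rejected immediately instead of being sent to a service that is already failing or rate limited.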
Authentication and Credentials
Credentials API
$ curl --request POST 'core-api:8080/v2/credential' --data '{ ... }'
$ curl --request GET 'core-api:8080/v2/credential'
$ curl --request GET 'core-api:8080/v2/credential/{id}'
Note: Reading credentials will never expose the referenced secret.
$ curl --request PUT 'core-api:8080/v2/credential/{id}' --data '{ ... }'
Note: The type of an existing credential can’t be modified.
$ curl --request DELETE 'core-api:8080/v2/credential/{id}'
$ curl --request POST 'core-api:8080/v2/credential/{id}/clone?name=clone-new-name'
Query Parameters
name
-
(Required, String) The name of the new Credential
$ curl --request POST 'core-api:8080/v2/credential/search' --data '{ ... }'
Body
The body payload is a DSL Filter to apply to the search
$ curl --request GET 'core-api:8080/v2/credential/autocomplete?q=value'
Query Parameters
q
-
(Required, String) The query to execute the autocomplete search
A credential references a secret with the authentication parameters required to connect to an external service:
{
"type": "my-external-service",
"name": "My External Service Credential",
"secret": "MY_SECRET",
...
}
When the secret provider is internal, it is possible to create a secret during the creation of the credential:
{
"type": "my-external-service",
"name": "My External Service Credential",
"secret": {
"name": "MY_SECRET",
"content": {
"username": <username>,
"password": <password>,
},
...
},
...
}
Note: It is assumed that the referenced secret exists and has the correct JSON-formatted authentication information. However, this is only a soft reference, and any deletion of secret keys won’t be noticed until the next time the secret is required.
type
-
(Required, String) The type of credentials for the external supported service
name
-
(Required, String) The unique name to identify the credentials
description
-
(Optional, String) The description for the configuration
secret
-
(Required, String or Object) Either the secret key to connect to the external service, or an object with the authentication details
labels
-
(Optional, Array of Objects) The labels for the configuration
Details
{ "labels": [ { "key": "My Label Key", "value": "My Label Value" }, ... ], ... }
key
-
(Required, String) The key of the label
value
-
(Required, String) The value of the label
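The two accepted shapes of the secret property can be illustrated with a small Python sketch that assembles a credential payload. The helper name is hypothetical, and the validation only mirrors the required fields listed in this document; the service itself may apply different checks.

```python
from typing import Union


def build_credential(type_: str, name: str, secret: Union[str, dict]) -> dict:
    """Assemble a credential payload with either a secret reference or an inline secret."""
    if isinstance(secret, str):
        # A string is the key of an existing secret (soft reference).
        if not secret:
            raise ValueError("The secret key must not be empty")
    elif isinstance(secret, dict):
        # An object creates the secret inline (internal secret provider only).
        missing = {"name", "content"} - set(secret)
        if missing:
            raise ValueError(f"Inline secret is missing: {sorted(missing)}")
    else:
        raise TypeError("The secret must be a string or an object")
    return {"type": type_, "name": name, "secret": secret}


by_reference = build_credential(
    "my-external-service", "My External Service Credential", "MY_SECRET"
)
inline = build_credential(
    "my-external-service",
    "My External Service Credential",
    # Placeholder values, not real credentials.
    {"name": "MY_SECRET", "content": {"username": "user", "password": "pass"}},
)
print(by_reference["secret"], inline["secret"]["name"])
```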
Secrets
Secrets API
$ curl --request POST 'core-api:8080/v2/secret' --data '{ ... }'
$ curl --request GET 'core-api:8080/v2/secret'
$ curl --request GET 'core-api:8080/v2/secret/{id}'
Note: Reading secrets will never expose their encrypted data.
$ curl --request PUT 'core-api:8080/v2/secret/{id}' --data '{ ... }'
$ curl --request DELETE 'core-api:8080/v2/secret/{id}'
A secret is a representation of a secure JSON. This document could be anything, but its most common usage is for credentials:
{
"name": "MY_SECRET",
"content": {
"username": <username>,
"password": <password>,
},
...
}
Note: When the secrets are backed by an external service, Discovery won’t expose any CRUD for their management.
name
-
(Required, String) The unique name to identify the secret
description
-
(Optional, String) The description for the configuration
content
-
(Required, Object) The JSON to securely store
labels
-
(Optional, Array of Objects) The labels for the configuration
Details
{ "labels": [ { "key": "My Label Key", "value": "My Label Value" }, ... ], ... }
key
-
(Required, String) The key of the label
value
-
(Required, String) The value of the label
Supported Services
Amazon Bedrock
{
"type": "amazon-bedrock",
"name": "My Amazon Bedrock Server",
"config": {
...
}
}
Note: This integration supports the circuit breaker configuration.
region
-
(Required, String) The AWS region
apiCallTimeout
-
(Optional, Duration) The complete duration of an API call
connection
-
(Optional, Object) The configuration of the connection to Amazon Bedrock
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to 60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to 60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to 5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to 5m
backoffPolicy
-
(Optional, Object) The configuration for retries to Amazon Bedrock
Details
type
-
(Optional, String) The type of backoff policy to apply. One of NONE, CONSTANT, or EXPONENTIAL. Defaults to EXPONENTIAL
initialDelay
-
(Optional, Duration) The initial delay before retrying. Defaults to 50ms
retries
-
(Optional, Integer) The maximum number of retries. Defaults to 5
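To get a feeling for what the backoffPolicy defaults imply, the following Python sketch computes a retry schedule. The document does not state the growth factor of the EXPONENTIAL policy or the exact behavior of NONE, so the doubling multiplier and the "no retries" interpretation are assumptions, as is the function name.

```python
def retry_delays_ms(policy_type: str = "EXPONENTIAL", initial_delay_ms: int = 50,
                    retries: int = 5, multiplier: int = 2) -> list:
    """Return the delay (in milliseconds) before each retry attempt."""
    if policy_type == "NONE":
        # Assumption: NONE means the request is not retried at all.
        return []
    if policy_type == "CONSTANT":
        return [initial_delay_ms] * retries
    if policy_type == "EXPONENTIAL":
        # Assumption: the delay doubles on every attempt.
        return [initial_delay_ms * multiplier ** attempt for attempt in range(retries)]
    raise ValueError(f"Unknown backoff policy type: {policy_type}")


print(retry_delays_ms())            # documented defaults, assumed doubling
print(retry_delays_ms("CONSTANT"))
```

Under these assumptions, the default settings wait a little over 1.5 seconds in total across the five retries.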
Authentication
{
"type": "aws",
"name": "My Amazon Bedrock Credentials",
"secret": {
...
}
}
accessKeyId
-
(Required, String) The ID of your access key, used to identify the user
secretAccessKey
-
(Required, String) The secret access key, used to authenticate the user
sessionToken
-
(Optional, String) The session token from an AWS token service, used to authenticate that this user has received temporary permission to access some resource
expirationTime
-
(Optional, Duration) The time after which this identity will no longer be valid. If not provided, the expiration is unknown but it may still expire at some time
Elasticsearch
{
"type": "elasticsearch",
"name": "My Elasticsearch Server",
"config": {
...
}
}
Note
|
This integration supports the |
servers
-
(Required, Array of Strings) The URI for the Elasticsearch installation. Multiple servers will be invoked in round-robin
pathPrefix
-
(Optional, String) The path prefix to add to the servers on each call
connection
-
(Optional, Object) The configuration of the HTTP connection to Elasticsearch
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to 60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to 60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to 5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to 5m
compressRequest
-
(Optional, Boolean) true if the requests must be compressed
followRedirects
-
(Optional, Boolean) true if redirects must be followed
cloudId
-
(Required, String) The ID of the instance in Elastic Cloud
connection
-
(Optional, Object) The configuration of the HTTP connection to Elasticsearch
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to 60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to 60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to 5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to 5m
compressRequest
-
(Optional, Boolean) true if the requests must be compressed
followRedirects
-
(Optional, Boolean) true if redirects must be followed
Authentication
{
"type": "http",
"name": "My Elasticsearch Credentials",
"secret": {
...
}
}
username
-
(Required, String) The username of the credentials
password
-
(Required, String) The password of the credentials
{
"type": "http",
"name": "My Elasticsearch Credentials",
"secret": {
...
}
}
token
-
(Required, String) The token of the credentials
{
"type": "http",
"name": "My Elasticsearch Credentials",
"secret": {
...
}
}
apiKey
-
(Required, String) The API key of the credentials
DSL
Filter | Elasticsearch Query Operator |
---|---|
Term Query when |
|
Bool Query with |
|
Bool Query with |
|
Hugging Face
{
"type": "hugging-face",
"name": "My Hugging Face Server",
"config": {
...
}
}
servers
-
(Required, Array of Strings) The URI for the Hugging Face Inference API service. Multiple servers will be invoked in round-robin
connection
-
(Optional, Object) The configuration of the HTTP connection to Hugging Face
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to 60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to 60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to 5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to 5m
compressRequest
-
(Optional, Boolean) true if the requests must be compressed
followRedirects
-
(Optional, Boolean) true if redirects must be followed
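The round-robin behavior mentioned for the servers property can be pictured with a short Python sketch. The host names are hypothetical, and the sketch only shows the rotation order; it does not describe how Discovery handles unavailable servers.

```python
from itertools import cycle

# Hypothetical URIs standing in for the values of the "servers" property.
servers = ["http://host-a:8080", "http://host-b:8080"]

# cycle() yields the servers in order and starts over after the last one.
rotation = cycle(servers)
picked = [next(rotation) for _ in range(5)]
print(picked)
```

Each consecutive request goes to the next server in the list, so the load is spread evenly across all configured URIs.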
Authentication
{
"type": "http",
"name": "My Hugging Face Credentials",
"secret": {
...
}
}
token
-
(Required, String) The token of the credentials
MongoDB
{
"type": "mongo",
"name": "My MongoDB Server",
"config": {
...
}
}
Note
|
This integration supports the |
servers
-
(Required, Array of Strings) The connection string for the MongoDB/MongoDB Atlas installation. Multiple servers represent a replica set
connection
-
(Optional, Object) The configuration of the connection to MongoDB
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to 60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to 60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to 5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to 5m
compressors
-
(Optional, Array of Strings) A list of data compressors. One of SNAPPY, ZLIB or ZSTD
tls
-
(Optional, Boolean) true if the connection should be done through SSL. Defaults to false
retryWrites
-
(Optional, Boolean) true if the connection should retry requests. Defaults to true
Authentication
{
"type": "mongo",
"name": "My MongoDB Credentials",
"secret": {
"mechanism": "SCRAM-SHA-1",
...
}
}
mechanism
-
(Required, String) The authentication mechanism. Must be SCRAM-SHA-1
username
-
(Required, String) The username of the credentials
password
-
(Required, String) The password of the credentials
source
-
(Required, String) The database name associated with the user’s authentication data. Defaults to admin
{
"type": "mongo",
"name": "My MongoDB Credentials",
"secret": {
"mechanism": "SCRAM-SHA-256",
...
}
}
mechanism
-
(Required, String) The authentication mechanism. Must be SCRAM-SHA-256
username
-
(Required, String) The username of the credentials
password
-
(Required, String) The password of the credentials
source
-
(Required, String) The database name associated with the user’s authentication data. Defaults to admin
{
"type": "mongo",
"name": "My MongoDB Credentials",
"secret": {
"mechanism": "MONGODB-AWS",
...
}
}
mechanism
-
(Required, String) The authentication mechanism. Must be MONGODB-AWS
accessKeyId
-
(Required, String) The AWS access key ID
secretAccessKey
-
(Optional, String) The AWS secret access key
sessionToken
-
(Optional, String) The AWS session token for authentication with temporary credentials when using an AssumeRole request, or when working with AWS resources that specify this value such as Lambda
DSL
Filter | Mongo Query Operator |
---|---|
|
Filter | Mongo Query Operator |
---|---|
Text Operator with a string query |
|
Range Operator with numbers or dates |
|
Range Operator with numbers or dates |
|
Range Operator with numbers or dates |
|
Range Operator with numbers or dates |
|
Range Operator with numbers or dates |
|
Compound Operator with |
|
Compound Operator with |
|
Regex Operator with the keyword analyzer |
Neo4j
{
"type": "neo4j",
"name": "My Neo4j Server",
"config": {
...
}
}
Note
|
This integration supports the |
server
-
(Required, String) The URI to connect to Neo4j, following the supported schemes
connection
-
(Optional, Object) The configuration of the connection to Neo4j
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to 60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to 60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to 5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to 5m
Authentication
{
"type": "neo4j",
"name": "My Neo4j Credentials",
"secret": {
...
}
}
username
-
(Required, String) The username of the credentials
password
-
(Required, String) The password of the credentials
OpenAI
{
"type": "openai",
"name": "My OpenAI Server",
"config": {
...
}
}
Note: This integration supports the circuit breaker configuration.
organizationId
-
(Optional, String) The Organization ID to be added to the requests header
connection
-
(Optional, Object) The configuration of the connection for OpenAI
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to 60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to 60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to 5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to 5m
compressRequest
-
(Optional, Boolean) true if the requests must be compressed
followRedirects
-
(Optional, Boolean) true if redirects must be followed
Authentication
{
"type": "http",
"name": "My OpenAI Credentials",
"secret": {
...
}
}
apiKey
-
(Required, String) The API key of the credentials
OpenSearch
{
"type": "opensearch",
"name": "My OpenSearch Server",
"config": {
...
}
}
Note
|
This integration supports the |
servers
-
(Required, Array of Strings) The URI for the OpenSearch installation. Multiple servers will be invoked in round-robin
pathPrefix
-
(Optional, String) The path prefix to add to the servers on each call
connection
-
(Optional, Object) The configuration of the HTTP connection to OpenSearch
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to 60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to 60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to 5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to 5m
compressRequest
-
(Optional, Boolean) true if the requests must be compressed
followRedirects
-
(Optional, Boolean) true if redirects must be followed
endpoint
-
(Required, String) The host to make the request to, without the http://
signature
-
(Required, Object) The signature for the requests
Details
region
-
(Required, String) The AWS region of the service
serviceName
-
(Required, String) The signing service name
connection
-
(Optional, Object) The configuration of the HTTP connection to OpenSearch
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to 60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to 60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to 5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to 5m
compressRequest
-
(Optional, Boolean) true if the requests must be compressed
followRedirects
-
(Optional, Boolean) true if redirects must be followed
Authentication
{
"type": "http",
"name": "My OpenSearch Credentials",
"secret": {
...
}
}
username
-
(Required, String) The username of the credentials
password
-
(Required, String) The password of the credentials
{
"type": "aws",
"name": "My OpenSearch Credentials",
"secret": {
...
}
}
accessKeyId
-
(Required, String) The ID of your access key, used to identify the user
secretAccessKey
-
(Required, String) The secret access key, used to authenticate the user
sessionToken
-
(Optional, String) The session token from an AWS token service, used to authenticate that this user has received temporary permission to access some resource
expirationTime
-
(Optional, Duration) The time after which this identity will no longer be valid. If not provided, the expiration is unknown but it may still expire at some time
DSL
Filter | OpenSearch Query Operator |
---|---|
Term Query when |
|
Bool Query with |
|
Bool Query with |
|
Solr
{
"type": "solr",
"name": "My Solr Server",
"config": {
...
}
}
Note
|
This integration supports the |
servers
-
(Required, Array of Strings) The URI for the Solr installation. Multiple servers will be invoked in round-robin
connection
-
(Optional, Object) The configuration of the HTTP connection to Solr
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to 60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to 60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to 5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to 5m
compressRequest
-
(Optional, Boolean) true if the requests must be compressed
followRedirects
-
(Optional, Boolean) true if redirects must be followed
Authentication
{
"type": "http",
"name": "My Solr Credentials",
"secret": {
...
}
}
username
-
(Required, String) The username of the credentials
password
-
(Required, String) The password of the credentials
DSL
Filter | Solr Query Operator |
---|---|
Vespa
{
"type": "vespa",
"name": "My Vespa Server",
"config": {
...
}
}
Note
|
This integration supports the |
Note: This integration supports the circuit breaker configuration.
servers
-
(Required, Array of Strings) The URI for the Vespa service. Multiple servers will be invoked in round-robin
connection
-
(Optional, Object) The configuration of the connection for Vespa
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to 60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to 60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to 5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to 5m
compressRequest
-
(Optional, Boolean) true if the requests must be compressed
followRedirects
-
(Optional, Boolean) true if redirects must be followed
backoffPolicy
-
(Optional, Object) The configuration for retries to the Vespa service
Details
type
-
(Optional, String) The type of backoff policy to apply. One of NONE, CONSTANT, or EXPONENTIAL. Defaults to EXPONENTIAL
initialDelay
-
(Optional, Duration) The initial delay before retrying. Defaults to 50ms
retries
-
(Optional, Integer) The maximum number of retries. Defaults to 5
scroll
-
(Optional, Object) The scroll configuration for paginated requests
Details
{ "scroll": { "size": 50 } }
size
-
(Required, String) The size of the scroll request
Authentication
{
"type": "http",
"name": "My Vespa Credentials",
"secret": {
...
}
}
username
-
(Required, String) The username of the credentials
password
-
(Required, String) The password of the credentials
{
"type": "http",
"name": "My Vespa Credentials",
"secret": {
...
}
}
token
-
(Required, String) The token of the credentials
{
"type": "http",
"name": "My Vespa Credentials",
"secret": {
...
}
}
apiKey
-
(Required, String) The API key of the credentials
Voyage AI
{
"type": "voyage-ai",
"name": "My Voyage AI Server",
"config": {
...
}
}
Note: This integration supports the circuit breaker configuration.
connection
-
(Optional, Object) The configuration of the connection for Voyage AI
Details
connectTimeout
-
(Optional, Duration) The timeout to connect to the service. Defaults to
60s
readTimeout
-
(Optional, Duration) The timeout to read the first package from the service. Defaults to
60s
pool
-
(Optional, Object) The configuration for the connection pool
Details
size
-
(Optional, Integer) The size of the connection pool. Defaults to
5
keepAlive
-
(Optional, Duration) The duration before evicting a connection from the pool. Defaults to
5m
compressRequest
-
(Optional, Boolean)
true
if the requests must be compressed followRedirects
-
(Optional, Boolean)
true
if redirects must be followed
backoffPolicy
-
(Optional, Object) The configuration of the back off policy for Voyage AI
Details
type
-
(Optional, String) The type of backoff policy to apply. One of
NONE
,CONSTANT
, orEXPONENTIAL
. Defaults toEXPONENTIAL
initialDelay
-
(Optional, Duration) The initial delay before retrying. Defaults to
50ms
retries
-
(Optional, Integer) The maximum number of retries. Defaults to
5
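As a rough illustration of how the three backoff types differ, the following Python sketch computes the wait before each retry using the documented defaults (50ms initial delay, 5 retries). The doubling factor for EXPONENTIAL and the "no wait" reading of NONE are assumptions, not documented behavior, and the function name is hypothetical.

```python
from datetime import timedelta


def backoff_delays(policy_type="EXPONENTIAL",
                   initial_delay=timedelta(milliseconds=50),
                   retries=5):
    """Return the wait applied before each retry (illustrative only)."""
    if policy_type == "NONE":
        # Assumption: NONE means retrying without waiting.
        return [timedelta(0)] * retries
    if policy_type == "CONSTANT":
        # The same delay is applied before every retry.
        return [initial_delay] * retries
    if policy_type == "EXPONENTIAL":
        # Assumption: the delay doubles after every attempt.
        return [initial_delay * (2 ** attempt) for attempt in range(retries)]
    raise ValueError(f"Unknown backoff type: {policy_type}")


for delay in backoff_delays():
    print(delay.total_seconds())
```

Under these assumptions the default policy waits less than two seconds in total, which is why a larger initialDelay or retries value may be worth considering for slow services.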
Authentication
{
"type": "http",
"name": "My Voyage AI Credentials",
"secret": {
...
}
}
token
-
(Required, String) The token of the credentials
DSL
The Discovery Domain-Specific Language (DSL) is a standardized definition of how to write JSON expressions that can be applied across all Discovery products.
Filters
"Equals" Filter
The value of the field must exactly match the one provided.
{
"equals": {
"field": "my-field",
"value": "my-value",
"normalize": true
}
}
When supported, the normalize
field enables normalization as described by the filter provider. It is enabled by default.
"Less Than" Filter
The value of the field must be less than the one provided.
{
"lt": {
"field": "my-field",
"value": 1
}
}
"Less Than or Equal to" Filter
The value of the field must be less than or equal to the one provided.
{
"lte": {
"field": "my-field",
"value": 1
}
}
"Between" Filter
The value of the field must be greater than or equal to the "from" value (inclusive), and less than the "to" value (exclusive).
{
"between": {
"field": "my-field",
"from": 1,
"to": 10
}
}
"Greater Than" Filter
The value of the field must be greater than the one provided.
{
"gt": {
"field": "my-field",
"value": 1
}
}
"Greater Than or Equal To" Filter
The value of the field must be greater than or equal to the one provided.
{
"gte": {
"field": "my-field",
"value": 1
}
}
"In" Filter
The value of the field must be one of the provided values.
{
"in": {
"field": "my-field",
"values": [
"my-value-a",
"my-value-b"
]
}
}
"Empty" Filter
Checks if a field is empty:
-
For a collection, true if its size is 0.
-
For a String, true if its length is 0.
-
For any other type, true if it is null.
{
"empty": {
"field": "my-field"
}
}
"Exists" Filter
Checks if a field exists.
{
"exists": {
"field": "my-field"
}
}
"Not" Filter
Negates the inner clause.
{
"not": {
"equals": {
"field": "my-field",
"value": "my-value"
}
}
}
"Null" Filter
Checks if a field is null. Note that while the "exists" filter checks whether the field is present or not, the "null" filter expects the field to be present but with a null value.
{
"null": {
"field": "my-field"
}
}
"Regex" Filter
Checks if a field matches the given regex pattern.
{
"regex": {
"field": "my-field",
"pattern": "my-pattern"
}
}
"Boolean" Filter
-
and: All conditions in the list must evaluate to true.
{
"and": [
{
"equals": {
"field": "my-field-a",
"value": "my-value-a"
}
}, {
"equals": {
"field": "my-field-b",
"value": "my-value-b"
}
}
]
}
-
or: At least one condition in the list must evaluate to true.
{
"or": [
{
"equals": {
"field": "my-field-a",
"value": "my-value-a"
}
}, {
"equals": {
"field": "my-field-b",
"value": "my-value-b"
}
}
]
}
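To make the semantics of the filters above concrete, here is a minimal Python sketch that evaluates a filter clause against a flat dictionary. It is not how Discovery or its providers implement filtering: the function name is hypothetical, the normalize flag is ignored, regex filters are assumed to require a full match, and the handling of absent fields is an assumption marked in the comments.

```python
import re

_MISSING = object()  # sentinel: the field is absent from the record


def matches(clause, record):
    """Evaluate one filter clause against a flat dict (illustrative only)."""
    (operator, body), = clause.items()
    if operator == "and":
        return all(matches(inner, record) for inner in body)
    if operator == "or":
        return any(matches(inner, record) for inner in body)
    if operator == "not":
        return not matches(body, record)

    value = record.get(body["field"], _MISSING)
    if operator == "exists":
        return value is not _MISSING
    if operator == "null":
        return value is None  # present, but with a null value
    if operator == "empty":
        if value is _MISSING:
            return False  # assumption: an absent field is not "empty"
        if isinstance(value, (str, list, tuple, dict, set)):
            return len(value) == 0
        return value is None
    if value is _MISSING or value is None:
        return False  # assumption: comparisons never match absent or null fields

    if operator == "equals":
        return value == body["value"]
    if operator == "lt":
        return value < body["value"]
    if operator == "lte":
        return value <= body["value"]
    if operator == "gt":
        return value > body["value"]
    if operator == "gte":
        return value >= body["value"]
    if operator == "between":
        return body["from"] <= value < body["to"]  # "from" inclusive, "to" exclusive
    if operator == "in":
        return value in body["values"]
    if operator == "regex":
        # Assumption: the pattern must match the whole value.
        return re.fullmatch(body["pattern"], str(value)) is not None
    raise ValueError(f"Unsupported filter: {operator}")


record = {"status": "open", "count": 5, "tags": [], "owner": None}
print(matches({"between": {"field": "count", "from": 1, "to": 10}}, record))
```

Nested boolean clauses work through recursion, so an "and" can contain a "not" that wraps an "in", mirroring how the JSON clauses nest.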
Projections
A projection selects which fields (attributes) are returned and which are filtered out of the result:
-
If no includes or excludes fields are defined, all fields are returned.
-
If only the includes fields are defined, only those fields are returned.
-
If only the excludes fields are defined, all available fields, except the ones in the exclusions are returned.
-
If both includes and excludes fields are defined, both are included in the projection.
Note
|
The details of how projections are processed might vary between use cases of the DSL and/or providers, especially when it comes to projections with both included and excluded fields. It’s recommended to check the documentation of the specific component or API that’s going to be used for details such as projections that aren’t allowed. |
{
"includes": ["my-field-a", "my-field-b"],
"excludes": ["my-field-c", "my-field-d"]
}
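The projection rules can be sketched in a few lines of Python. This is only an approximation for top-level fields: the function name is hypothetical, and when both lists are present the sketch assumes the includes are applied first and the excludes afterwards, which may differ from what a given provider does.

```python
def project(document, includes=None, excludes=None):
    """Apply a projection to the top-level fields of a dict (illustrative only)."""
    includes = includes or []
    excludes = excludes or []
    if includes:
        # Only the listed fields are kept.
        selected = {key: value for key, value in document.items() if key in includes}
    else:
        # Without includes, every field is a candidate.
        selected = dict(document)
    # Assumption: excludes are applied after includes.
    return {key: value for key, value in selected.items() if key not in excludes}


document = {"my-field-a": 1, "my-field-b": 2, "my-field-c": 3}
print(project(document, excludes=["my-field-c"]))
```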
Expression Language
The Expression Language is a flexible but simple way to manage and handle configurations. In a JSON document, expressions allow values to be left unresolved and evaluated later, in the context where they are used:
{
"dynamicField": "#{ first_math_function('input') + second_math_function('input') }",
"staticField": "value"
}
As shown in the previous example, an expression consists of one or more constants, operators and functions wrapped between the #{ and } tokens.
Note
|
The Expression Language is case-sensitive and all functions are defined in snake_case. |
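To show how the #{ and } tokens delimit an expression inside a JSON configuration, the following Python sketch walks a parsed configuration and collects the raw expression bodies. It does not evaluate anything and is not the Discovery parser; the function name is hypothetical, and the simple pattern assumes an expression body never contains a closing brace.

```python
import re

# Assumption: an expression body contains no "}" character.
EXPRESSION = re.compile(r"#\{(.*?)\}", re.DOTALL)


def find_expressions(node):
    """Collect the expression bodies found in any string value of a JSON tree."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(find_expressions(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_expressions(item))
    elif isinstance(node, str):
        found.extend(body.strip() for body in EXPRESSION.findall(node))
    return found


config = {
    "dynamicField": "#{ first_math_function('input') + second_math_function('input') }",
    "staticField": "value",
}
print(find_expressions(config))
```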
Constants
Constant | Value |
---|---|
NULL | |
Constant | Value |
---|---|
PI | 3.14159265... |
E | 2.71828182... |
Constant | Value |
---|---|
TRUE | |
FALSE | |
Operators
Operator | Token |
---|---|
Plus | + |
Minus | - |
Multiplication | * |
Division | / |
Power of | ^ |
Modulo | % |
Operator | Token |
---|---|
Equals | = |
Equals | == |
Not equals | <> |
Not equals | != |
Greater than | > |
Greater than or equal to | >= |
Less than | < |
Less than or equal to | <= |
Operator | Token |
---|---|
And | && |
Or | || |
Not | ! |
Operator | Token |
---|---|
Plus | + |
Minus | - |
Operator | Token |
---|---|
Concat | + |
Functions
Function | Description | Example |
---|---|---|
|
Returns the first non-null value, or null if there are none |
|
Function | Description | Example |
---|---|---|
|
Returns the absolute value of a value |
|
|
Rounds a number towards positive infinity |
|
|
Returns the factorial of a number |
|
|
Rounds a number towards negative infinity |
|
|
Performs the logarithm with base e on a value |
|
|
Performs the logarithm with base 10 on a value |
|
|
Returns the highest value from all the parameters provided |
|
|
Returns the lowest value from all the parameters provided |
|
|
Returns random number between 0 and 1 |
|
|
Rounds a decimal number to a specified scale |
|
|
Returns the sum of the parameters |
|
|
Returns the square root of the value provided |
|
Function | Description | Example |
---|---|---|
|
Returns the arc-cosine in degrees |
|
|
Returns the hyperbolic arc-cosine in degrees |
|
|
Returns the arc-cosine in radians |
|
|
Returns the arc-co-tangent in degrees |
|
|
Returns the hyperbolic arc-co-tangent in degrees |
|
|
Returns the arc-co-tangent in radians |
|
|
Returns the arc-sine in degrees |
|
|
Returns the hyperbolic arc-sine in degrees |
|
|
Returns the arc-sine in radians |
|
|
Returns the arc-tangent in degrees |
|
|
Returns the angle of arc-tangent2 in degrees |
|
|
Returns the angle of arc-tangent2 in radians |
|
|
Returns the hyperbolic arc-tangent in degrees |
|
|
Returns the arc-tangent in radians |
|
|
Returns the cosine in degrees |
|
|
Returns the hyperbolic cosine in degrees |
|
|
Returns the cosine in radians |
|
|
Returns the co-tangent in degrees |
|
|
Returns the hyperbolic co-tangent in degrees |
|
|
Returns the co-tangent in radians |
|
|
Returns the co-secant in degrees |
|
|
Returns the hyperbolic co-secant in degrees |
|
|
Returns the co-secant in radians |
|
|
Converts an angle from radians to degrees |
|
|
Converts an angle from degrees to radians |
|
|
Returns the sine in degrees |
|
|
Returns the hyperbolic sine in degrees |
|
|
Returns the sine in radians |
|
|
Returns the secant in degrees |
|
|
Returns the hyperbolic secant in degrees |
|
|
Returns the secant in radians |
|
|
Returns the tangent in degrees |
|
|
Returns the hyperbolic tangent in degrees |
|
|
Returns the tangent in radians |
|
Function | Description | Example |
---|---|---|
|
Formats a date/time with a given pattern as described in Date/Time |
|
|
Gets the current datetime |
|
to_date(string) |
Parses the input pattern as described in Date/Time |
|
to_date(number, number, number, number?, number?) |
When providing a set of integers, year, month and day are required. Hour, minute and second are all optional. The order of the parameters must be as previously mentioned |
|
Function | Description | Example |
---|---|---|
|
Conditional operation where if the boolean expression evaluates to |
|
|
Negates a boolean expression |
|
Function | Description | Example |
---|---|---|
|
Converts String to lower case |
|
|
Converts String to upper case |
|
|
Verifies with a boolean if a string begins with a given substring. Case sensitivity can optionally be specified. If the case sensitivity flag is not sent, it will be set to |
|
|
Verifies with a boolean if a string ends with a given substring. Case sensitivity can optionally be specified. If the case sensitivity flag is not sent, it will be set to |
|
|
Returns a boolean specifying if a string matches a given pattern |
|
|
Returns a boolean specifying if a String is empty |
|
|
Whether the variable is a blank String |
|
|
Returns the length of a given string |
|
|
Concatenates a given set of strings |
|
|
Splits a String into a List by a regex value |
|
|
Strips the punctuation, replacing it by a space |
|
|
Returns a boolean specifying if the first string contains the second one |
|
|
Generates a random UUID v4 |
|
|
Returns the number represented by a string |
|
Function | Description | Example |
---|---|---|
|
Hashes a given object using MD5 |
|
|
Hashes a given object using SHA-256 |
|
Function | Description | Example |
---|---|---|
|
Returns a boolean specifying if a list is empty |
|
|
Returns the amount of items in the list |
|
|
Returns a boolean specifying if the list contains the value |
|
|
Returns the value in the given index |
|
|
Joins a given set of arrays into one |
|
Note
|
An alternative syntax to access to the value on the position |
Function | Description | Example |
---|---|---|
|
Returns a boolean specifying if a map is empty |
|
|
Returns the amount of items in the map |
|
|
Returns a boolean specifying if the map contains the key |
|
|
Returns the value with the given key |
|
Function | Description | Example |
---|---|---|
|
Finds a specific value within a JSON with a JSONPath string |
|
Function | Description | Example |
---|---|---|
|
Reads a file from the storage. The file can optionally be obtained as a byte array representation if the |
|
Script Engine
The Script Engine enables the execution of scripts for advanced handling of execution data. It supports multiple scripting languages and provides tools for JSON manipulation and logging:
Bindings
Each script has bindings to interact with the execution context where it runs:
data
-
(Object) Allows the creation and manipulation of JsonNode instances. Also, if the script runs as part of Discovery Ingestion or Discovery QueryFlow, the binding will expose the corresponding context data.
Details
Method | Description |
---|---|
ArrayNode arrayNode() | Returns a new JSON array |
JsonNode get(String) | Obtains a deep copy of the nodes from the data generated during the execution. Takes the path to the value or node, which must be the JSON Pointer of the data field, for example: data.get("/myfield") |
NullNode nullNode() | Returns a new null node |
ObjectNode objectNode() | Returns a new JSON object |
JsonNode output() | Obtains the value that will be used as output for the component |
JsonNode parseJson(String) | Takes a JSON document in String format and parses it into a JSON document |
JsonNode valueToJson(Object) | Takes any object and tries to convert it into a JSON document |
void set(parameter) | Sets the output field to a primitive type (integer, long, str, float, double) or a JsonNode. In dynamically typed languages such as Python, the method infers the type of the parameter. If the use case needs it, an explicit cast may help control the output node |
log
-
(Object) Exposes an SLF4J logger that can log messages directly into the application
value = 5
if (value <= 10):
    log.error("Example of an error")
var response = data.objectNode();
response.put("intValue", 3);
log.info("Node set with value: " + data.output().get("intValue").asText());
var requestBody = data.get("/numberTest").asInt();
data.set(requestBody);
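The get method of the data binding takes a JSON Pointer such as "/numberTest". For readers unfamiliar with that notation, the following standalone Python sketch resolves a JSON Pointer against a parsed document following RFC 6901. It runs outside the Script Engine and does not use the data binding; the function name is hypothetical.

```python
def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against parsed JSON (illustrative only)."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("A JSON Pointer must be empty or start with '/'")
    node = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" to "/" first, then "~0" to "~", as the RFC requires.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]  # array elements are addressed by index
        else:
            node = node[token]
    return node


execution_data = {"numberTest": 7, "users": [{"name": "Mary"}]}
print(resolve_pointer(execution_data, "/numberTest"))
print(resolve_pointer(execution_data, "/users/0/name"))
```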
Template Engine
The Template Engine, provided by Freemarker, acts as a blueprint that uses a given input to generate various types of documents as either plain text or JSON.
Supported in both Discovery Ingestion and Discovery QueryFlow, it can take a standard template and process it with contextual structured data to create a verbalized representation of the information.
Consider the following input data:
{
"name": "mary",
"users": [
{
"name": "Jane Doe",
"id": 0
},
{
"name": "Mary",
"id": 2
},
{
"name": "Alice",
"id": 3
}
]
}
When processed with the following template:
Hello, ${name?capitalize}!
Users registered:
<#list users as user>
<#if user.id == 0>
name: admin, ID: ${user.id}
<#else>
Name: ${user.name}, ID: ${user.id}
</#if>
</#list>
Then, the output would be:
Hello, Mary!
Users registered:
name: admin, ID: 0
Name: Mary, ID: 2
Name: Alice, ID: 3
-
Hello, ${name?capitalize}! outputs a greeting to the name specified in the JSON data and capitalizes its first letter. Given the JSON data, it outputs Hello, Mary! because mary is now capitalized.
-
<#list users as user> is a directive that iterates over the users array in the JSON data.
-
<#if user.id == 0>, within the list, checks if the user’s id is 0. If true, it outputs name: admin, ID: 0 instead of using the user’s actual name.
-
<#else> handles all other users (where id is not 0): it outputs Name: user.name, ID: user.id.
Template Language
Placeholders
Placeholders are references to the data model passed to the template.
Syntax:
${variableName}
Example data model:
{
"name": "Mary"
}
Example:
${name}
Output: Mary
Comments
Comments are a way to add notes or explanations within your templates.
Syntax:
<#-- Comment -->
Example:
<#-- Hello this is my comment -->
Directives
These are instructions that control the processing flow of the template (like loops and conditionals). The full list of the directives can be found in the Directive reference of the Freemarker documentation.
Assign
Used to define a variable.
Syntax:
<#assign name1=value1>
Example:
<#assign x=1>
Attempt, Recover
Used for error handling in templates. The attempt block contains code that might fail, and the recover block defines what to do if an error occurs in the attempt block.
Syntax:
<#attempt>
attempt block
<#recover>
recover block
</#attempt>
Example:
<#attempt>
${user.name}
<#recover>
Unknown User
</#attempt>
Function, Return
Used to create a method variable. The return directive must have a parameter that specifies the return value of the method.
Syntax:
<#function name param1 param2 ... paramN>
...
<#return returnValue>
...
</#function>
Example:
<#function avg x y>
<#return (x + y) / 2>
</#function>
${avg(10, 20)}
Output: 15
Global
Used to define a variable for all namespaces.
Syntax:
<#global name=value>
Example:
<#global x=1>
If, else, elseif
Used to conditionally skip a section of the template.
Syntax:
<#if condition>
...
<#elseif condition2>
...
<#else>
...
</#if>
Example:
<#if x == 1>
x is 1
<#elseif x == 2>
x is 2
<#else>
x is not 1 nor 2
</#if>
Import
Used to bring all macros and functions from another template file into the current template namespace. Path is the path of the template file to import and hash is the name of the variable through which the namespace can be accessed.
Syntax:
<#import path as hash>
Example:
<#import "library.ftl" as lib>
Include
Used to include the content of another template file into the current template output. It does not make macros or functions from the included file available in the current namespace. Path is the path of the template file to include.
Syntax:
<#include path>
Example:
<#include "header.ftl">
List
Used for iterating over a collection.
-
else: Within a list, it is used to specify output if the list is empty.
-
items: It refers to the current item in the iteration.
-
sep: Used to output something between items, like a separator.
-
break: Exits the loop prematurely.
-
continue: Skips the current iteration and moves to the next item.
Syntax:
<#list sequence as item>
Part repeated for each item
</#list>
Example:
<#list users as user>
${user.name}
<#else>
No users found.
</#list>
Macro
Used to define a reusable block of template code.
Syntax:
<#macro name param1 param2 ... paramN>
...
</#macro>
Example:
<#macro test>
Test text
</#macro>
<#-- call the macro: -->
<@test/>
Output: Test text
File Storage
Files API
$ curl --request PUT 'core-api:8080/v2/file/my/key/to/my/file' --form 'file=@"/../../test.txt"'
$ curl --request GET 'core-api:8080/v2/file/my/key/to/my/file'
$ curl --request GET 'core-api:8080/v2/file'
$ curl --request DELETE 'core-api:8080/v2/file/my/key/to/my/file'
The File Storage handles files with the help of a dedicated "folder" in the Object Storage.
File names can be constructed as nested paths, where a slash at the end of each sub path denotes this sub path as a parent/folder. The name must follow these rules:
-
Parent names can contain alphanumeric characters (i.e. [A-Z], [a-z], [0-9]), hyphens (i.e. -), underscores (i.e. _) and spaces.
-
Character quantity must range from 1 to 255.
-
A nested path can consist of up to 10 levels.
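The naming rules can be checked on the client side before calling the Files API. The Python sketch below is one possible reading of them, not the validation performed by the server: it assumes the character set and the 1 to 255 length range apply to each parent segment, that the final segment (the file name itself) is only required to be non-empty, and that the 10 levels count parent folders. The function name is hypothetical.

```python
import re

# Alphanumerics, hyphen, underscore and space; 1 to 255 characters per parent name.
_PARENT_NAME = re.compile(r"[A-Za-z0-9_\- ]{1,255}")


def is_valid_file_key(key, max_levels=10):
    """Check a nested file key against the documented naming rules (one reading)."""
    *parents, file_name = key.split("/")
    if not file_name:
        return False  # a key ending with a slash denotes a folder, not a file
    if len(parents) > max_levels:
        return False  # assumption: the level limit counts parent folders
    return all(_PARENT_NAME.fullmatch(parent) is not None for parent in parents)


print(is_valid_file_key("my/key/to/my/file"))
print(is_valid_file_key("bad$parent/file"))
```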
Note
|
When executing the endpoints to upload or delete files, the Expression Language is notified about the changes to clear its internal cache. |
Discovery Staging
Discovery Staging is a REST API on top of a Document Database. Its goal is to simplify and standardize the interactions of all Discovery products with any supported provider that can handle JSON content, while giving end users features such as:
-
A push model alternative for the ETL process in Discovery Ingestion.
-
An intermediate repository for Discovery Ingestion, reducing the time and costs of content reprocessing for each processing iteration.
-
Advanced search capabilities in Discovery QueryFlow such as facet snapping based on the user’s input.
Supported Providers
MongoDB
As the NoSQL industry standard for document storage, with a MongoDB Atlas managed service available in the marketplace of all major cloud providers, MongoDB is the default Document Database provider for Discovery Staging in all Discovery installations.
DocumentDB
Amazon DocumentDB (with MongoDB compatibility) is a good alternative Document Database provider for Discovery Staging in installations fully managed by AWS.
Content
Content API
$ curl --request GET 'staging-api:8081/v2/content/{bucketName}/{contentId}?action={action}&include={include}&exclude={exclude}'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
contentId
-
(Required, String) The document ID.
Query Parameters
action
-
(Optional, String) The actions to filter the documents. Defaults to
STORE
include
-
(Optional, Array of Strings) Determines the fields of the document’s content that will be included in the response.
exclude
-
(Optional, Array of Strings) Determines the fields of the document’s content that will be excluded from the response.
$ curl --request POST 'staging-api:8081/v2/content/{bucketName}/{contentId}?parentId={parentId}'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
contentId
-
(Required, String) The document ID.
Query Parameters
parentId
-
(Optional, String) The parent ID of the documents.
Details
Note
|
This endpoint is capable of updating an existing document by using the |
The final content must not exceed the maximum size supported by the chosen provider. Exceeding the limit results in a 413 error in the Staging API’s response.
Documents are stored with metadata. This adds extra size to the final document beyond the content in the request body, so it is recommended to keep the body below the limit imposed by the provider.
The following table details maximum provider limits:
Provider | Limit |
---|---|
MongoDB | BSON size limit (~16 MB / 16793600 bytes) |
DocumentDB | BSON size limit (~16 MB / 16793600 bytes) |
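A client can estimate whether a document is likely to fit before sending it. The Python sketch below compares the UTF-8 size of the JSON body against the limit in the table, minus a safety margin for the metadata that Staging adds. The margin value and the function name are hypothetical, and the JSON size only approximates the BSON size actually stored by the provider.

```python
import json

BSON_LIMIT_BYTES = 16_793_600  # limit listed for MongoDB and DocumentDB


def fits_provider_limit(content, metadata_margin_bytes=4096, limit=BSON_LIMIT_BYTES):
    """Estimate whether a JSON body stays below the provider limit (approximate)."""
    encoded = json.dumps(content, ensure_ascii=False).encode("utf-8")
    # The margin is a hypothetical allowance for the metadata stored with the document.
    return len(encoded) + metadata_margin_bytes <= limit


print(fits_provider_limit({"title": "small document"}))
```

A check like this only reduces the chance of a 413 response; the Staging API remains the authority on whether a document is accepted.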
$ curl --request DELETE 'staging-api:8081/v2/content/{bucketName}/{contentId}'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
contentId
-
(Required, String) The document ID.
$ curl --request DELETE 'staging-api:8081/v2/content/{bucketName}?parentId={parentId}' --data '{ ... }'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
Query Parameters
parentId
-
(Optional, String) The parent ID of the documents.
Body
The body payload is an optional
DSL Filter to apply to the delete
Note
|
The |
$ curl --request POST 'staging-api:8081/v2/content/{bucketName}/scroll?token={token}&parentId={parentId}&size={size}&action={action}' --data '{ ... }'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
Query Parameters
token
-
(Optional, Hex String) The token to paginate the documents.
parentId
-
(Optional, String) The parent ID of the documents.
size
-
(Optional, Int) The number of documents to scroll. Defaults to
25
action
-
(Optional, Array of String) The actions to filter the documents. Defaults to
STORE
Body
The body payload is an optional
DSL Filter and an optional
DSL Projection to apply to the scroll
{
"fields": <Projection DSL>,
"filters": <Filter DSL>
}
Details
Note
|
The |
$ curl --request POST 'staging-api:8081/v2/content/{bucketName}/search?parentId={parentId}&action={action}&page={page}&size={size}&sort={sort}' --data '{ ... }'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
Query Parameters
parentId
-
(Optional, String) The parent ID of the documents.
action
-
(Optional, Array of String) The actions to filter the documents. Defaults to
STORE
page
-
(Optional, Int) The page number. Defaults to
0
size
-
(Optional, Int) The size of the page. Defaults to
20
sort
-
(Optional, Array of String) The sort definition for the page.
Body
The body payload is an optional
DSL Filter and an optional
DSL Projection to apply to the search
{
"fields": <Projection DSL>,
"filters": <Filter DSL>
}
Note
|
The For both The |
The content of the bucket is the data, stored as JSON.
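As an illustration of how the search endpoint's query parameters and body fit together, the following Python sketch assembles the URL and JSON payload without sending the request. The function name and the http scheme in the base URL are assumptions; the path, parameter names and defaults come from the endpoint description above.

```python
import json
from urllib.parse import quote, urlencode


def build_search_request(bucket_name, filters=None, fields=None,
                         page=0, size=20, action="STORE",
                         base_url="http://staging-api:8081"):
    """Build the URL and body for a content search (nothing is sent)."""
    query = urlencode({"action": action, "page": page, "size": size})
    url = f"{base_url}/v2/content/{quote(bucket_name, safe='')}/search?{query}"
    body = {}
    if fields is not None:
        body["fields"] = fields    # Projection DSL
    if filters is not None:
        body["filters"] = filters  # Filter DSL
    return url, json.dumps(body)


url, body = build_search_request(
    "my-bucket",
    filters={"equals": {"field": "my-field", "value": "my-value"}},
    fields={"includes": ["my-field"]},
)
print(url)
print(body)
```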
Buckets
Buckets API
$ curl --request GET 'staging-api:8081/v2/bucket'
$ curl --request GET 'staging-api:8081/v2/bucket/{bucketName}'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
$ curl --request DELETE 'staging-api:8081/v2/bucket/{bucketName}'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
$ curl --request DELETE 'staging-api:8081/v2/bucket/{bucketName}/purge'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
Note
|
Only purges the documents with the |
$ curl --request DELETE 'staging-api:8081/v2/bucket/{bucketName}/index/{indexName}'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
indexName
-
(Required, String) The name of the index.
$ curl --request PUT 'staging-api:8081/v2/bucket/{bucketName}/index/{indexName}' --data '{ ... }'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
indexName
-
(Required, String) The name of the index.
Body
[
{ "fieldA": "ASC" },
{ "fieldB": "DESC" },
...
]
Note
|
However, an empty or null body is also allowed. In these cases, an ascending index is created with the index name as the field name. |
$ curl --request POST 'staging-api:8081/v2/bucket/{bucketName}' --data '{ ... }'
Path Parameters
bucketName
-
(Required, String) The name of the bucket.
Body
{
"indices": [
{
"name": "myIndexA",
"fields": [
{ "fieldA": "ASC" },
{ "fieldB": "DESC" },
...
]
},
...
],
"config":{}
}
A bucket is a complete collection of data. Several operations can be performed on a bucket, where the results vary depending on the user input from the HTTP request.
Metadata description
{
"name": "<Text>",
"documentCount": {
"STORE": "<Number>",
"DELETE": "<Number>"
},
"content": {
"oldest": "<StagingDocument>",
"newest": "<StagingDocument>"
},
"indices": [
{
"name": "<Text>",
"fields": [
{
"fieldA": "ASC|DESC"
}
]
}
]
}
Property | Type | Description |
---|---|---|
name | Text | The bucket name |
documentCount | JSON Object | The total documents in the bucket, divided by action |
documentCount.STORE | Number | The number of documents currently in the bucket with a |
documentCount.DELETE | Number | The number of documents currently in the bucket with a |
content | JSON Object | The content of the bucket, including the oldest and newest documents |
content.oldest | Staging Document | The oldest document in the bucket |
content.newest | Staging Document | The newest document in the bucket |
indices | JSON Array | Array with the name and fields of every index in the bucket |
Index description
Property | Type | Description |
---|---|---|
name | Text | The index name |
fields | Key/Value Pair Array | The fields used for the index. The key of every element is the name of the field, and the value is its sort direction (ASC or DESC). |
Note
|
All indices are over the content of the document |
Note
|
When creating an index, if any of the fields is duplicated the last value specification for a field will take precedence. |
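The precedence rule for duplicated fields can be pictured with a small Python sketch that collapses an index definition the way the note describes: the last specification for a field wins. The function name is hypothetical, and keeping the field at the position of its first appearance is an assumption, since the note only states which value takes precedence.

```python
def normalize_index_fields(fields):
    """Collapse duplicated fields so the last specification wins (illustrative only)."""
    merged = {}
    for entry in fields:
        for name, direction in entry.items():
            # Assumption: the field keeps the position of its first appearance.
            merged[name] = direction
    return [{name: direction} for name, direction in merged.items()]


print(normalize_index_fields([
    {"fieldA": "ASC"},
    {"fieldB": "DESC"},
    {"fieldA": "DESC"},
]))
```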
Note
|
Also in the value of the fields, apart from using (
|
Discovery Ingestion
Discovery Ingestion is a fully-featured extract, transform, and load (ETL) tool that orchestrates the communication with external services while applying data enrichment to the records detected in the given data source. It enables features such as:
-
Flexibility to represent complex data processing scenarios through a finite-state machine.
-
Distributed, auto-scalable model that only consumes resources as needed.
-
Extensive component library for data source scanning, records processing and hooks triggering.
Data Seed
Seeds API
$ curl --request POST 'ingestion-api:8080/v2/seed' --data '{ ... }'
$ curl --request POST 'ingestion-api:8080/v2/seed/{id}?scanType={scan-type}' --data '{ ... }'
Query Parameters
scanType
-
(Required, String) The scan type for the seed execution. Currently, only
FULL
is supported
Body
The body payload contains the execution properties, which override the ones configured in the Seed
$ curl --request POST 'ingestion-api:8080/v2/seed/{id}/halt'
$ curl --request GET 'ingestion-api:8080/v2/seed'
$ curl --request GET 'ingestion-api:8080/v2/seed/{id}'
$ curl --request PUT 'ingestion-api:8080/v2/seed/{id}' --data '{ ... }'
Note
|
The type of an existing seed can’t be modified. |
$ curl --request POST 'ingestion-api:8080/v2/seed/{id}/reset'
$ curl --request DELETE 'ingestion-api:8080/v2/seed/{id}'
$ curl --request POST 'ingestion-api:8080/v2/seed/{id}/clone?name=clone-new-name'
Query Parameters
name
-
(Required, String) The name of the new Seed
$ curl --request POST 'ingestion-api:8080/v2/seed/search' --data '{ ... }'
Body
The body payload is a DSL Filter to apply to the search
$ curl --request GET 'ingestion-api:8080/v2/seed/autocomplete?q=value'
Query Parameters
q
-
(Required, String) The query to execute the autocomplete search
A seed defines the data source for the configuration and the pipeline to follow during the processing of each record through the finite-state machine.
{
"type": "my-component-type",
"name": "My Component Seed",
"config": {
...
},
"pipeline": <Pipeline ID>,
...
}
type
-
(Required, String) The name of the component to execute
name
-
(Required, String) The unique name to identify the configuration
description
-
(Optional, String) The description for the configuration
config
-
(Required, Object) The configuration for the corresponding action of the component. All configurations will be affected by the Expression Language
server
-
(Optional, UUID/Object) Either the ID of the server configuration for the integration or an object with the detailed configuration
Details
{ "server": { "id": "ba637726-555f-4c68-bfed-1c91f4803894", ... }, ... }
id
-
(Required, UUID) The ID of the server configuration for the integration
credential
-
(Optional, UUID) The ID of the credential to override the default authentication in the external service
pipeline
-
(Required, UUID) The ID of the pipeline configuration for all detected records
recordPolicy
-
(Optional, Object) The global configuration for each record during its processing. Can also be referred to as
record
Details
{ "type": "my-component-type", "name": "My Component Seed", "config": { ... }, "pipeline": <Pipeline ID>, "recordPolicy": { ... }, ... }
id
-
(Optional, String) The expression that represents the ID of the record during its processing through the finite-state machine. If not provided, the plain ID of the record will be used
retryPolicy
-
(Optional, Object) The retry policy for failed records in the Seed Execution
Details
{ "type": "my-component-type", "name": "My Component Seed", "config": { ... }, "pipeline": <Pipeline ID>, "recordPolicy": { "retryPolicy": { ... }, ... }, ... }
maxRetries
-
(Required, Integer) The maximum number of retries for processing the record. The retries are executed from the point where the records failed. Defaults to
3
timeoutPolicy
-
(Optional, Object) The timeout policy for records in the Seed Execution
Details
{ "type": "my-component-type", "name": "My Component Seed", "config": { ... }, "pipeline": <Pipeline ID>, "recordPolicy": { "timeoutPolicy": { ... }, ... }, ... }
scan
-
(Required, Duration) The timeout for the scan of each slice of records. Defaults to
1h
process
-
(Required, Duration) The timeout for each record during their execution through the finite-state machine. Defaults to
60s
errorPolicy
-
(Optional, Object) The error policy for records in the Seed Execution
Details
{ "type": "my-component-type", "name": "My Component Seed", "config": { ... }, "pipeline": <Pipeline ID>, "recordPolicy": { "errorPolicy": { ... }, ... }, ... }
scan
-
(Required, String) The error policy for scanned records. Defaults to
FATAL
-
FATAL: A single failed document aborts the complete process
-
IGNORE: Ignores the record scan error and the Seed Execution continues
processor
-
(Required, String) The error policy for records during their execution through the finite-state machine. Defaults to
FAIL
-
FATAL: A single failed document aborts the complete process
-
FAIL: Either marks the document as failed, or sends it to a configured error handling state (if any). Other records continue their execution as expected
-
IGNORE: Ignores the record processing error and its execution continues
outboundPolicy
-
(Optional, Object) The policy for groups of records sent for processing within the finite-state machine. Applied by default to outbound records (i.e., records sent to the next state) in each state thereafter
Details
{ "type": "my-component-type", "name": "My Component Seed", "config": { ... }, "pipeline": <Pipeline ID>, "recordPolicy": { "outboundPolicy": { ... }, ... }, ... }
batchPolicy
-
(Optional, Object) The batch policy for outbound batches of records
Details
{ "type": "my-component-type", "name": "My Component Seed", "config": { ... }, "pipeline": <Pipeline ID>, "recordPolicy": { "outboundPolicy": { "batchPolicy": { ... } }, ... }, ... }
maxCount
-
(Required, Integer) The maximum record count in a batch before flushing. Defaults to
25
flushAfter
-
(Required, Duration) The timeout to flush if no other condition has been met. Defaults to
1m
beforeHooks
-
(Optional, Object) The Hooks to execute before starting the record processing
Details
{ "type": "my-component-type", "name": "My Component Seed", "config": { ... }, "pipeline": <Pipeline ID>, "beforeHooks": { "hooks": [ ... ], "timeout": "60s", "errorPolicy": "IGNORE" }, ... }
hooks
-
(Required, Array of Objects) The list of Hooks to execute
Details
{ "hooks": [ { "id": <Hook ID>, ... } ], "timeout": "60s", "errorPolicy": "IGNORE" }
id
-
(Required, UUID) The ID of the Hook to execute
errorPolicy
-
(Optional, String) Overrides the global policy for errors during the execution of the Hook
-
FATAL: A single failed hook aborts the complete process
-
IGNORE: Ignores the hook error and the Seed Execution continues
timeout
-
(Optional, Duration) Overrides the global timeout for the execution of the Hook
active
-
(Optional, Boolean)
false
to disable the execution of the Hook
errorPolicy
-
(Required, String) The policy for errors during the execution of the Hook. Defaults to
IGNORE
-
FATAL: A single failed hook aborts the complete process
-
IGNORE: Ignores the hook error and the Seed Execution continues
timeout
-
(Required, Duration) The timeout for the execution of the Hook. Defaults to
60s
afterHooks
-
(Optional, Object) The Hooks to execute after completing the record processing
Details
{ "type": "my-component-type", "name": "My Component Seed", "config": { ... }, "pipeline": <Pipeline ID>, "afterHooks": { "hooks": [ ... ], "timeout": "60s", "errorPolicy": "IGNORE" }, ... }
hooks
-
(Required, Array of Objects) The list of Hooks to execute
Details
{ "hooks": [ { "id": <Hook ID>, ... } ], "timeout": "60s", "errorPolicy": "IGNORE" }
id
-
(Required, UUID) The ID of the Hook to execute
errorPolicy
-
(Optional, String) Overrides the global policy for errors during the execution of the Hook
-
FATAL
: A single failed hook aborts the complete process -
IGNORE
: Ignores the hook error and the Seed Execution continues
-
timeout
-
(Optional, Duration) Overrides the global timeout for the execution of the Hook
active
-
(Optional, Boolean)
false
to disable the execution of the Hook
errorPolicy
-
(Required, String) The policy for errors during the execution of the Hook. Defaults to
IGNORE
-
FATAL
: A single failed hook aborts the complete process -
IGNORE
: Ignores the hook error and the Seed Execution continues
-
timeout
-
(Required, Duration) The timeout for the execution of the Hook. Defaults to
60s
properties
-
(Optional, Object) The properties to be referenced with the help of the Expression Language in the configuration of the seed itself, in processors and in hooks
Details
{ "type": "my-component-type", "name": "My Component Seed", "config": { "myProperty": "#{ seed.properties.keyA }" }, "properties": { "keyA": "valueA" }, "pipeline": <Pipeline ID>, ... }
{ "type": "my-component-type", "name": "My Component Processor", "config": { "myProperty": "#{ seed.properties.keyA }" }, ... }
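As a rough illustration of how a placeholder such as `#{ seed.properties.keyA }` could be resolved against the seed properties, here is a minimal Python sketch. It is only a stand-in: the real Expression Language is richer than a single regular expression, and the `resolve` helper and `PLACEHOLDER` pattern are hypothetical names introduced for this example.

```python
import re

# Hypothetical pattern: matches only the "#{ seed.properties.<key> }" form.
PLACEHOLDER = re.compile(r"#\{\s*seed\.properties\.(\w+)\s*\}")


def resolve(value: str, properties: dict) -> str:
    """Replace each seed.properties placeholder with the matching property value."""
    return PLACEHOLDER.sub(lambda m: str(properties[m.group(1)]), value)


seed = {
    "config": {"myProperty": "#{ seed.properties.keyA }"},
    "properties": {"keyA": "valueA"},
}

resolved = {key: resolve(val, seed["properties"]) for key, val in seed["config"].items()}
print(resolved)  # {'myProperty': 'valueA'}
```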
labels
-
(Optional, Array of Objects) The labels for the configuration
Details
{ "labels": [ { "key": "My Label Key", "value": "My Label Value" }, ... ], ... }
key
-
(Required, String) The key of the label
value
-
(Required, String) The value of the label
Records
Seed Records reflect the status and parent-child relationship of records during a specific seed’s latest execution.
Each record in a given seed is identified by the combination of the seed, its parent plainId (if any) and its own plainId, assigned at scan time. These IDs are then used to generate a more system-friendly ID for each record, known as hashId. The hashId is built by first applying the SHA-256 algorithm to the combination of the parent plainId and the record plainId, and then encoding the result with padded Base64URL so that it is safe to use in URLs.
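The hashing steps above can be sketched in Python as follows. The way the two IDs are combined is an assumption (plain concatenation, parent first, UTF-8 encoded), and `hash_id` is a hypothetical helper name. For a record with plain ID `1` and no parent, the result matches the `hash` value shown in the record example of this section.

```python
import base64
import hashlib
from typing import Optional


def hash_id(plain_id: str, parent_plain_id: Optional[str] = None) -> str:
    """Return the padded Base64URL encoding of SHA-256(parent plainId + plainId)."""
    # Assumption: the IDs are joined by plain concatenation, parent first.
    combined = (parent_plain_id or "") + plain_id
    digest = hashlib.sha256(combined.encode("utf-8")).digest()
    # urlsafe_b64encode keeps the trailing "=" padding.
    return base64.urlsafe_b64encode(digest).decode("ascii")


print(hash_id("1"))  # a4ayc_80_OGda4BO_1o_V0etpOqiLx1JwB5S3beHW0s=
```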
Records API
$ curl --request GET 'ingestion-api:8080/v2/seed/{seedId}/record'
$ curl --request GET 'ingestion-api:8080/v2/seed/{seedId}/record/{recordId}'
{
"id": {
...
},
"creationTimestamp": "2025-04-22T16:51:44Z",
"lastUpdatedTimestamp": "2025-04-22T16:51:44Z",
"parent": "FOMM0WPHMpEuBIxMg34VxOkMBi67eVq5R9V3BuLRDdg=",
"status": "FAILURE",
"errors": [
...
]
}
id
-
(Object) The ID of the record
Details
{ "plain": "1", "hash": "a4ayc_80_OGda4BO_1o_V0etpOqiLx1JwB5S3beHW0s=" }
plain
-
(String) The ID of the record before hashing it
hash
-
(String) The ID of the record as a Base64URL string
creationTimestamp
-
(Timestamp) The timestamp when the record was created
lastUpdatedTimestamp
-
(Timestamp) The timestamp when the record was last updated
parent
-
(String) The parent record’s id as a Base64URL string
status
-
(String) The status of the record
Details
-
SUCCESS
: The record was successfully processed -
FAILURE
: The record reported errors during its processing -
QUARANTINE
: The record has been processed many times, and it should not be processed again
-
errors
-
(Array of Objects) The record’s errors, if any
Pipeline
Pipelines API
$ curl --request POST 'ingestion-api:8080/v2/pipeline' --data '{ ... }'
$ curl --request GET 'ingestion-api:8080/v2/pipeline'
$ curl --request GET 'ingestion-api:8080/v2/pipeline/{id}'
$ curl --request PUT 'ingestion-api:8080/v2/pipeline/{id}' --data '{ ... }'
$ curl --request DELETE 'ingestion-api:8080/v2/pipeline/{id}'
$ curl --request POST 'ingestion-api:8080/v2/pipeline/{id}/clone?name=clone-new-name'
Query Parameters
name
-
(Required, String) The name of the new Pipeline
$ curl --request POST 'ingestion-api:8080/v2/pipeline/search' --data '{ ... }'
Body
The body payload is a DSL Filter to apply to the search
$ curl --request GET 'ingestion-api:8080/v2/pipeline/autocomplete?q=value'
Query Parameters
q
-
(Required, String) The query to execute the autocomplete search
A pipeline is the definition of the finite-state machine for records processing:
{
"name": "My Pipeline",
"initialState": "stateA",
"states": {
"stateA": {
...
},
"stateB": {
...
}
},
...
}
name
-
(Required, String) The unique name to identify the pipeline
description
-
(Optional, String) The description for the configuration
initialState
-
(Required, String) The state to use as the starting point of the pipeline, as defined in the states field
states
-
(Required, Object) The states associated to the pipeline
labels
-
(Optional, Array of Objects) The labels for the configuration
Details
{ "labels": [ { "key": "My Label Key", "value": "My Label Value" }, ... ], ... }
key
-
(Required, String) The key of the label
value
-
(Required, String) The value of the label
Processors
Processors API
$ curl --request POST 'ingestion-api:8080/v2/processor' --data '{ ... }'
$ curl --request GET 'ingestion-api:8080/v2/processor'
$ curl --request GET 'ingestion-api:8080/v2/processor/{id}'
$ curl --request PUT 'ingestion-api:8080/v2/processor/{id}' --data '{ ... }'
Note
|
The type of an existing processor can’t be modified. |
$ curl --request DELETE 'ingestion-api:8080/v2/processor/{id}'
$ curl --request POST 'ingestion-api:8080/v2/processor/{id}/clone?name=clone-new-name'
Query Parameters
name
-
(Required, String) The name of the new Processor
$ curl --request POST 'ingestion-api:8080/v2/processor/search' --data '{ ... }'
Body
The body payload is a DSL Filter to apply to the search
$ curl --request GET 'ingestion-api:8080/v2/processor/autocomplete?q=value'
Query Parameters
q
-
(Required, String) The query to execute the autocomplete search
Each component is stateless, and it’s driven by the configuration defined in the processor and by the context created by the current seed execution. This design makes the processor the main building block of Discovery Ingestion.
They are intended to solve very specific tasks, which makes them re-usable and simple to integrate into any part of the configuration.
{
"type": "my-component-type",
"name": "My Component Processor",
"config": {
...
},
...
}
type
-
(Required, String) The name of the component to execute
name
-
(Required, String) The unique name to identify the configuration
description
-
(Optional, String) The description for the configuration
config
-
(Required, Object) The configuration for the corresponding action of the component. All configurations will be affected by the Expression Language
server
-
(Optional, UUID/Object) Either the ID of the server configuration for the integration or an object with the detailed configuration
Details
{ "server": { "id": "ba637726-555f-4c68-bfed-1c91f4803894", ... }, ... }
id
-
(Required, UUID) The ID of the server configuration for the integration
credential
-
(Optional, UUID) The ID of the credential to override the default authentication in the external service
labels
-
(Optional, Array of Objects) The labels for the configuration
Details
{ "labels": [ { "key": "My Label Key", "value": "My Label Value" }, ... ], ... }
key
-
(Required, String) The key of the label
value
-
(Required, String) The value of the label
Hooks
Hooks are a type of Processor, detached from the record processing. They run single pre- or post-actions associated with a long seed execution (e.g. creating indices, changing aliases…).
There are two types of hooks: BEFORE_HOOK and AFTER_HOOK. The before hooks are executed at the beginning, and the after hooks at the end, of the record processing of an execution.
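To make the interaction between the global `errorPolicy`, the per-hook override and the `active` flag more concrete, here is a minimal Python sketch. It is not the platform's implementation: `run_hooks` and the `runners` mapping are hypothetical, hook IDs are shortened to readable names instead of UUIDs, and timeouts are not modelled.

```python
from typing import Callable, Dict, List


def run_hooks(stage: dict, runners: Dict[str, Callable[[], None]]) -> List[str]:
    """Run a beforeHooks/afterHooks object; return the IDs of ignored failures."""
    global_policy = stage.get("errorPolicy", "IGNORE")  # documented default
    ignored = []
    for hook in stage["hooks"]:
        if hook.get("active", True) is False:
            continue  # "active": false disables the hook
        policy = hook.get("errorPolicy", global_policy)  # per-hook override
        try:
            runners[hook["id"]]()
        except Exception:
            if policy == "FATAL":
                raise  # a single failed hook aborts the complete process
            ignored.append(hook["id"])  # IGNORE: the seed execution continues
    return ignored


def ok() -> None:
    pass


def boom() -> None:
    raise RuntimeError("hook failed")


stage = {
    "errorPolicy": "FATAL",
    "hooks": [
        {"id": "create-index"},
        {"id": "swap-alias", "errorPolicy": "IGNORE"},
        {"id": "disabled-hook", "active": False},
    ],
}
print(run_hooks(stage, {"create-index": ok, "swap-alias": boom, "disabled-hook": boom}))
# ['swap-alias']
```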
Data Processing with a State Machine
State Types
Processor State
Executes a single or multiple processors in sequence:
{
"myProcessorState": {
"type": "processor",
"processors": [
...
]
}
}
type
-
(Required, String) The type of state. Must be
processor
processors
-
(Required, Array of Objects) The processors to execute
Details
{ "stateA": { "type": "processor", "processors": [ { "id": <Processor ID>, ... } ], ... } }
id
-
(Required, UUID) The ID of the processor to execute
outputField
-
(Optional, String) The output field that wraps the result of the processor execution. Defaults to the one defined in the component
active
-
(Optional, Boolean)
false
to disable the execution of the processor. Default is true
recordPolicy
-
(Optional, Object) The custom records configuration for the execution of the processor. Overrides the global one defined in the seed being executed. Can also be referred to as
record
Details
{ "id": <Processor ID>, "recordPolicy": { ... } }
id
-
(Optional, String) The expression that represents the ID of the record during its processing. If not provided, the plain ID of the record will be used
timeout
-
(Optional, Duration) The timeout for each record during its processing
retryable
-
(Optional, Boolean) Whether the processor should be retried if failed. Defaults to
false
errorPolicy
-
(Required, String) The error policy for records during their processing
-
FATAL
: A single failed document aborts the complete process -
FAIL
: Either marks the document as failed, or sends it to a configured error handling state (if any). Other records continue their execution as expected -
IGNORE
: Ignores the record processing error and its execution continues
-
outboundPolicy
-
(Optional, Object) The custom policy for groups of records sent for processing to the next state within the finite-state machine
Details
{ "id": <Processor ID>, "recordPolicy": { "outboundPolicy": { ... } } }
batchPolicy
-
(Optional, Object) The batch policy for outbound batches of records, once their processor execution is completed
Details
{ "id": <Processor ID>, "recordPolicy": { "outboundPolicy": { "batchPolicy": { ... } } } }
maxCount
-
(Required, Integer) The maximum record count in a batch before flushing. Defaults to
25
flushAfter
-
(Required, Duration) The timeout to flush if no other condition has been met. Defaults to
1m
next
-
(Optional, String) The next state for the HTTP Request Execution after the completion of the state. If not provided, the current one will be assumed as the final state
onError
-
(Optional, String) The state to use as fallback if the execution of the current state fails. If undefined, the current HTTP Request Execution will complete with the corresponding error message
The output of each processor will be stored in the JSON Data Channel wrapped in the configured outputField
:
{
"defaultFieldName": {
"outputKey": "outputValue"
}
}
Switch State
Use DSL Filters and JSON Pointers over the JSON Data Channel to control the flow of the execution given the first matching condition:
{
"mySwitchState": {
"type": "switch",
"options": [
...
],
"default": "myDefaultState"
}
}
type
-
(Required, String) The type of state. Must be
switch
options
-
(Required, Array of Objects) The options to evaluate in the state
Details
{ "type": "switch", "options": [ { "condition": { "equals": { "field": "/my/input/field", "value": "valueA" }, ... }, "state": "myFirstState" }, ... ], ... }
condition
-
(Required, Object) The predicate described as a DSL Filter over the JSON processing data
state
-
(Optional, String) The next state for the finite-state machine if the
condition
evaluates to true
default
-
(Optional, String) The default state for the finite-state machine if no option evaluates to
true
Note
|
If no state for the finite-state machine is selected, the current one will be assumed as the final state. |
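The way processor and switch states chain together can be sketched with a small Python walker. This is an illustration under simplifying assumptions: only the `equals` filter is evaluated, JSON Pointer escaping is not supported, processors are not executed, and `pointer`, `next_state` and `walk` are hypothetical helper names.

```python
from typing import Any, Optional


def pointer(data: Any, path: str) -> Any:
    """Resolve a simple JSON Pointer such as /my/input/field over nested objects."""
    for token in path.strip("/").split("/"):
        data = data[token]
    return data


def next_state(state: dict, data: dict) -> Optional[str]:
    """Return the state that follows `state`, or None when it is the final one."""
    if state["type"] == "switch":
        for option in state["options"]:
            equals = option["condition"]["equals"]
            try:
                matched = pointer(data, equals["field"]) == equals["value"]
            except (KeyError, TypeError):
                matched = False  # a missing field does not match
            if matched:
                return option.get("state")  # the first matching condition wins
        return state.get("default")
    return state.get("next")  # processor state


def walk(pipeline: dict, data: dict) -> list:
    """Return the ordered names of the states visited for the given data."""
    visited, current = [], pipeline["initialState"]
    while current is not None:
        visited.append(current)
        current = next_state(pipeline["states"][current], data)
    return visited


pipeline = {
    "name": "My Pipeline",
    "initialState": "route",
    "states": {
        "route": {
            "type": "switch",
            "options": [
                {
                    "condition": {"equals": {"field": "/my/input/field", "value": "valueA"}},
                    "state": "stateA",
                }
            ],
            "default": "stateB",
        },
        "stateA": {"type": "processor", "processors": [], "next": "stateB"},
        "stateB": {"type": "processor", "processors": []},
    },
}

print(walk(pipeline, {"my": {"input": {"field": "valueA"}}}))  # ['route', 'stateA', 'stateB']
print(walk(pipeline, {}))  # ['route', 'stateB']
```

The sketch does not guard against cycles between states, so a real walker would need a visit limit or similar protection.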
Seed Execution
Seed Executions API
$ curl --request GET 'ingestion-api:8080/v2/seed/{seedId}/execution'
$ curl --request GET 'ingestion-api:8080/v2/seed/{seedId}/execution/{executionId}'
$ curl --request POST 'ingestion-api:8080/v2/seed/{seedId}/execution/{executionId}/halt'
$ curl --request GET 'ingestion-api:8080/v2/seed/{seedId}/execution/{executionId}/audit'
$ curl --request GET 'ingestion-api:8080/v2/seed/{seedId}/execution/{executionId}/config/seed'
$ curl --request GET 'ingestion-api:8080/v2/seed/{seedId}/execution/{executionId}/config/pipeline/{pipelineId}'
$ curl --request GET 'ingestion-api:8080/v2/seed/{seedId}/execution/{executionId}/config/processor/{processorId}'
$ curl --request GET 'ingestion-api:8080/v2/seed/{seedId}/execution/{executionId}/config/server/{serverId}'
$ curl --request GET 'ingestion-api:8080/v2/seed/{seedId}/execution/{executionId}/config/credential/{credentialId}'
$ curl --request GET 'ingestion-api:8080/v2/seed/{seedId}/execution/{executionId}/job/summary'
{
"id": "1ed146d8-e5d8-49df-9b65-b9f6396183ff",
"creationTimestamp": "2025-03-13T08:59:15Z",
"lastUpdatedTimestamp": "2025-03-13T09:46:59Z",
"triggerType": "MANUAL",
"status": "DONE",
"scanType": "FULL",
"stages": [
"BEFORE_HOOKS",
"INGEST",
"AFTER_HOOKS"
]
}
id
-
(UUID) A unique ID that identifies the seed execution
creationTimestamp
-
(Timestamp) The timestamp when the execution was triggered
lastUpdatedTimestamp
-
(Timestamp) The timestamp when the execution was last updated
triggerType
-
(String) The origin who triggered the execution. Currently, only
MANUAL
status
-
(String) The status of the execution
Details
-
CREATED
: The seed has been triggered, but the execution has not started -
RUNNING
: The seed is being executed -
HALTING
: The seed execution received a HALT
request, but some processing might still be happening -
HALTED
: The seed execution is completely halted -
DONE
: The seed completed its execution successfully -
FAILED
: The seed failed during its execution
-
scanType
-
(String) The scan type for the execution. Currently, only
FULL
stages
-
(Array of Strings) The completed stages of the execution
Details
-
BEFORE_HOOKS
: The hooks before record data processing (if any). -
INGEST
: The record data processing -
AFTER_HOOKS
: The hooks after record data processing (if any).
-
The Seed Execution represents the currently active Seed. It updates with each completed stage of the execution.
When the user starts the execution of a seed, a copy of every user-created configuration related to the seed is stored for use during the entirety of the seed execution:
-
The configuration of the seed being executed
-
The configuration of any hook that’s used by the seed in execution
-
The configuration of the pipeline assigned to the seed
-
The configuration of any processor that’s used in any step of the pipeline
-
The configuration of any server that’s used either by the seed, or any of the related processors
-
The configuration of any credential that’s used either by the seed, or any of the related processors, or any of the related servers.
This means that changing the configuration of the entities used during the execution of a seed will have no effect on the outcome of it. This is done to avoid unexpected and inconsistent behaviors.
Note
|
For security reasons, when the snapshot of the configuration of a credential is stored, the associated secrets are not included in it. A reference to the underlying secret is saved instead. This means that changes applied to secrets mid-seed execution can unpredictably affect the current execution |
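The snapshot behaviour described above, including the exception for secrets, can be illustrated with a short Python sketch. The dictionaries, the `secrets` store and the secret name are hypothetical; the point is only the difference between a copied configuration and a secret that is kept by reference.

```python
import copy

# Hypothetical secret store and live configuration.
secrets = {"my-secret": "token-v1"}
live = {
    "seed": {"name": "My Component Seed", "config": {"index": "my-index"}},
    "credential": {"secret": "my-secret"},  # a reference, not the secret itself
}

# Taken once, when the execution starts.
snapshot = copy.deepcopy(live)

# Changes made while the execution is running:
live["seed"]["config"]["index"] = "another-index"
secrets["my-secret"] = "token-v2"

# The configuration change is not seen by the running execution...
print(snapshot["seed"]["config"]["index"])  # my-index

# ...but the secret is looked up through its reference, so the new value is.
print(secrets[snapshot["credential"]["secret"]])  # token-v2
```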
Every generated record is tagged with a corresponding action to apply during a specific execution:
-
CREATE
: It is a new record for the seed. -
UPDATE
: The record was processed during a previous seed execution, but its content has changed. -
DELETE
: The record is marked to be deleted.
During a seed execution, every record has a status that changes as the seed is processed:
-
PROCESSING
: The record was detected and is currently being processed. -
FAILED
: The processing of the record failed. -
DONE
: The record was successfully processed.
Record Data Channels
During a seed execution, records can produce data in JSON format, as well as binary files.
JSON data is stored in a dedicated bucket within Discovery Staging and can be later referenced using JSON Pointers.
Binary data, such as images, videos or PDFs, is stored in a dedicated container inside the Object Storage.
Record Batches
Seeds can configure how batches are flushed through the finite-state machine.
The seed configuration, and its override in the processor state, define the boundaries of the batch. The first condition to be met triggers the flush, which sends all the records in the batch to the next stage in the pipeline (such as the next processor, the next state of the state machine, or even the end of the pipeline).
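A minimal Python sketch of a batch that flushes on whichever of `maxCount` or `flushAfter` is met first is shown below. It is not the platform's implementation: the `Batcher` class is hypothetical, it assumes the timer starts when the first record enters an empty batch, and it relies on the caller to invoke `tick()` periodically. The defaults mirror the documented ones (25 records, 1 minute).

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Collects records and flushes on max_count or flush_after, whichever comes first."""

    def __init__(self, flush: Callable[[List[dict]], None], max_count: int = 25,
                 flush_after: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self._max_count = max_count
        self._flush_after = flush_after
        self._clock = clock
        self._batch: List[dict] = []
        self._started: Optional[float] = None

    def add(self, record: dict) -> None:
        if not self._batch:
            self._started = self._clock()  # assumption: timer starts with the first record
        self._batch.append(record)
        if len(self._batch) >= self._max_count:
            self.flush()

    def tick(self) -> None:
        """Flush if the oldest record has waited at least flush_after seconds."""
        if self._batch and self._clock() - self._started >= self._flush_after:
            self.flush()

    def flush(self) -> None:
        if self._batch:
            batch, self._batch = self._batch, []
            self._flush(batch)


# Usage with a fake clock, so the example does not need to sleep.
sent: List[List[dict]] = []
now = [0.0]
batcher = Batcher(sent.append, max_count=2, flush_after=60.0, clock=lambda: now[0])
batcher.add({"id": 1})
batcher.add({"id": 2})   # the count condition is met first
batcher.add({"id": 3})
now[0] = 60.0
batcher.tick()           # the timeout condition is met
print(sent)  # [[{'id': 1}, {'id': 2}], [{'id': 3}]]
```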
Expression Language Extensions
| Variable | Description | Example |
|---|---|---|
| | The ID of the seed in execution | |
| | | |
| | The name of the seed | |
| | The description of the seed | |
| | The labels of the seed, grouped by key | |
| | The properties to use during placeholders resolution | |
| | The ID of the seed execution | |
| | The start time of the seed execution | |
| | The scan type of the seed execution | |
| | Trigger type of the seed execution | |
| | The properties to use during placeholders resolution | |
| | The ID of the processor | |
| | | |
| | The name of the processor | |
| | The description of the processor | |
| | The labels of the processor, grouped by key | |
| | The ID of the pipeline | |
| | The name of the pipeline | |
| | The description of the pipeline | |
| | The labels of the pipeline, grouped by key | |
| | The ID of a generated record from a seed execution | |
| | The action of a generated record from a seed execution | |
| | The parent ID of a generated record from a seed execution | |
Components
Elasticsearch
Uses the Elasticsearch integration to invoke the Elasticsearch API.
Scan Action: scan
, search-after
Seed that uses the Search after parameter to retrieve all the documents from an index.
{
"type": "elasticsearch",
"name": "My Elasticsearch Scan Action",
"config": {
"action": "search-after",
"index": "my-index",
"sort": [
...
]
},
"pipeline": <Pipeline>,
"server": <Elasticsearch Server>,
...
}
index
-
(Required, Array of String) The list of Elasticsearch indexes to search on
sort
-
(Required, Array of Objects) The list of sort options
Details
[ { "<field>": "<sort_value>" }, { "<field>": { "<sort_option>": "<sort_value>", ... } }, ... ]
query
-
(Optional, Object) The query body for the search request. If not provided, a match all query will be used instead
size
-
(Optional, Integer) The maximum number of hits to return. Defaults to
100
metadata
-
(Optional, Boolean) Whether to include the metadata or not. Defaults to
false
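The pagination technique behind the `search-after` action can be sketched in Python as follows. The sketch relies on the Elasticsearch `search_after` request parameter and on the `sort` values returned with each hit. The `scan` function and the injected `search` callable (which would send the body to the index's `_search` endpoint) are hypothetical, so this shows the technique rather than the component's actual code.

```python
from typing import Callable, Iterator, Optional


def scan(search: Callable[[dict], dict], sort: list,
         query: Optional[dict] = None, size: int = 100) -> Iterator[dict]:
    """Yield every hit by paging with the search_after request parameter."""
    # A match-all query is used when no query is provided.
    body = {"size": size, "sort": sort, "query": query or {"match_all": {}}}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit mark where the next page starts.
        body = {**body, "search_after": hits[-1]["sort"]}


# Usage with an in-memory stand-in for the search call.
documents = [{"_id": str(n), "sort": [n]} for n in range(5)]


def fake_search(body: dict) -> dict:
    after = body.get("search_after", [-1])[0]
    page = [doc for doc in documents if doc["sort"][0] > after][: body["size"]]
    return {"hits": {"hits": page}}


print([hit["_id"] for hit in scan(fake_search, sort=[{"n": "asc"}], size=2)])
# ['0', '1', '2', '3', '4']
```

Because each page is derived from the previous one, the sort options need to produce a stable, unique ordering, which is why `sort` is a required field.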
Hook Action: aliases
Hook that executes a native Elasticsearch query to the Aliases API.
{
"type": "elasticsearch",
"name": "My Elasticsearch Hook Action",
"config": {
"action": "aliases",
"actions": [
...
]
},
"server": <Elasticsearch Server>,
...
}
actions
-
(Required, Array) The request body
Note
|
Currently, if at least one of the actions on the list is successful, the whole request will be successful. The request only fails if none of them is successful. |
Hook Action: create-index
Hook that executes a native Elasticsearch query to the Create Index API.
{
"type": "elasticsearch",
"name": "My Elasticsearch Hook Action",
"config": {
"action": "create-index",
"index": "my-index",
"body": {
...
}
},
"server": <Elasticsearch Server>,
...
}
index
-
(Required, String) The index name
body
-
(Required, Object) The request body
waitForActiveShards
-
(Optional, Integer) The number of copies of each shard that must be active before proceeding with the operation
masterTimeout
-
(Optional, String) The period to wait for the master node
timeout
-
(Optional, String) The period to wait for a response
Processor Action: bulk
, hydrate
Processor that executes a bulk request to the Elasticsearch Bulk API
{
"type": "elasticsearch",
"name": "My Elasticsearch Processor Action",
"config": {
"action": "hydrate",
"index": "my-index",
"data": "#{ data('/my/record') }",
...
},
"server": <Elasticsearch Server>,
...
}
index
-
(Required, String) The Elasticsearch index to perform the action
data
-
(Required, Object) The data to hydrate
allowOverride
-
(Optional, Boolean) Whether to allow overriding an existing document or not. Defaults to
true
bulk
-
(Optional, Object) The bulk configuration
Details
pipeline
-
(Optional, String) The ID of the Elasticsearch Pipeline to use to preprocess incoming documents
routing
-
(Optional, String) Used to route operations to a specific shard
waitForActiveShards
-
(Optional, String) The number of copies of each shard that must be active before proceeding with the Elasticsearch operation
timeout
-
(Optional, String) The period of time to wait for some operations
requireAlias
-
(Optional, Boolean) Whether the request’s actions must target an index alias
refresh
-
(Optional, String) The refresh type. Supported are: TRUE, FALSE and WAIT_FOR
Details
WAIT_FOR
: Waits for a refresh to make the Elasticsearch operation visible to search
TRUE
: Refreshes the affected shards to make the Elasticsearch operation visible to search
FALSE
: Does nothing with the refreshes
flush
-
(Optional, Object) The flush configuration
Details
maxOperations
-
(Optional, Integer) The maximum number of operations. Defaults to
1000
maxConcurrent
-
(Optional, Integer) The maximum number of concurrent requests waiting to be executed by Elasticsearch. Defaults to
1
maxSize
-
(Optional, String) The maximum size of the bulk request. Defaults to
5MB
flushInterval
-
(Optional, Duration) The interval between flushes
Insights
The Insights component is designed to provide various actions that generate different metrics or information for later analysis.
Processor Action: engine-score:non-contextual
Processor that calculates the query score based on the result position
metadata field value only.
The engine scoring action is designed to power the Engine Scoring Dashboards by evaluating the quality of a search engine’s results in terms of precision
and recall
.
{
"type": "insights",
"name": "My Engine-Score Non-Contextual Processor Action",
"config": {
"action": "engine-score:non-contextual",
"resultPosition": "25",
...
}
}
resultPosition
-
(Required, Integer) Field containing the position of the search result to be used in the engine scoring calculation.
kfactor
-
(Optional, Double) Value between 0 and 1 used to determine the importance of the relevant records. Defaults to 0.9.
startPosition
-
(Optional, Integer) Indicates the start position to take into account when doing the K-factor calculation. Defaults to 1.
precision
-
(Optional, Integer) Number of digits to return after the decimal point for the score value. Defaults to 4.
MongoDB
Performs different actions on MongoDB collections, either reading or writing data depending on the action.
Scan Action: scan
Seed that finds all documents in a MongoDB collection and creates a record for each document found.
{
"type": "mongo",
"name": "My Mongo Scan Action",
"config": {
"action": "scan",
"database": "my-database",
"collection": "my-collection"
},
"pipeline": <Pipeline ID>,
"server": <Mongo Server ID>,
...
}
database
-
(Required, String) The database to connect to
collection
-
(Required, String) The collection whose documents are turned into records
Processor Action: bulk
, hydrate
Processor that stores the records in the pipeline via Bulk Write operations on the specified MongoDB collection.
{
"type": "mongo",
"name": "My Mongo Processor Action",
"config": {
"database": "my-database",
"collection": "my-collection",
"allowOverride": true,
...
},
"server": <Mongo Server ID>,
...
}
database
-
(Required, String) The database to connect to
collection
-
(Required, String) The collection where the records are bulk written
allowOverride
-
(Optional, Boolean) Whether the records should be stored if there is one already with their ID. Defaults to
true
data
-
(Optional, Object) The data to store on the collection. If not provided, it will store the data generated on a previous processor
flush
-
(Optional, Object) The flush configuration
Details
maxCount
-
(Optional, Integer) The maximum number of records in the bulk before flushing
maxWeight
-
(Optional, Long) The maximum weight allowed in a bulk request
flushAfter
-
(Optional, Duration) The time to wait before flushing a bulk request
OpenAI
Processor Action: embeddings
Processor that executes embedding requests to the OpenAI API.
{
"type": "openai",
"name": "My OpenAI Processor Action",
"config": {
"action": "embeddings",
"model": "openai-model",
"input": "#{ data('/my/input') }",
...
},
"server": <OpenAI Server>,
...
}
model
-
(Required, String) The OpenAI model to use
input
-
(Required, String) The input to generate the embeddings
user
-
(Optional, String) A unique identifier representing the end-user
flush
-
(Optional, Object) The flush configuration
Details
maxCount
-
(Optional, Integer) The maximum number of records in the bulk before flushing
maxWeight
-
(Optional, Long) The maximum weight allowed in a bulk request
flushAfter
-
(Optional, Duration) The time to wait before flushing a bulk request
Script
Uses the Script Engine to execute a script for advanced handling of the execution data. Supports multiple scripting languages and provides tools for JSON manipulation and for logging.
Processor Action: process
Processor that executes a script that interacts with the record data generated from a seed execution.
{
"type": "script",
"name": "My Script Processor Action",
"config": {
"action": "process",
"script": <Script>,
...
}
}
action
-
(Required, String) The default script action. Must be
process
language
-
(Optional, String) The language of the script. One of the supported script languages. Defaults to
groovy
script
-
(Required, String) The script to run
Staging
Interacts with buckets and content from Discovery Staging.
Scan Action: scan
, scroll
Seed that scrolls throughout a bucket, and creates the records to be ingested into the pipeline.
{
"type": "staging",
"name": "My Staging Scan Action",
"config": {
"action": "scroll",
"bucket": "my-bucket",
...
},
"pipeline": <Pipeline ID>,
...
}
bucket
-
(Required, String) The bucket to scroll
metadata
-
(Optional, Boolean) Whether to include the metadata or not. Defaults to
false
size
-
(Optional, Integer) The size of the contents result
filter
-
(Optional, DSL Filter) The filter to apply when scrolling
projection
-
(Optional, Projection) The projection to apply when scrolling
actions
-
(Optional, Array of Strings) The actions from the content to be scanned. Defaults to STORE and DELETE
parentId
-
(Optional, String) The parent ID to match
Hook Action: create-bucket
Hook that creates a bucket with the given configuration.
{
"type": "staging",
"name": "My Staging Action",
"config": {
"action": "create-bucket",
...
},
...
}
bucket
-
(Required, String) The bucket name
config
-
(Optional, Object) The bucket configuration
indices
-
(Optional, Array of Objects) The indices for the bucket
Details
{ "indices": [ { "name": "myIndexA", "fields": [ ... ] } ] }
name
-
(Required, String) The index name
fields
-
(Required, Array of Objects) The index fields. Key/Value pairs with the field name, and the corresponding sort ordering, either ASC or DESC
Details
{ "fields": [ { "fieldA": "ASC" }, { "fieldB": "DESC" } ] }
Processor Action: store
, hydrate
Processor that stores the records into the given bucket.
{
"type": "staging",
"name": "My Staging Processor Action",
"config": {
"action": "store",
"bucket": "my-bucket",
...
}
}
bucket
-
(Required, String) The bucket where the documents will be stored
parentId
-
(Optional, String) The parent ID of the documents to store
data
-
(Optional, Object) The data to store on the bucket. If not provided, it will store the data generated on a previous processor
Template
Uses the Template Engine to process dynamic data provided by the user to generate a text output based on a custom template.
Processor Action: process
Processor that processes the provided template with the defined configuration.
{
"type": "template",
"name": "My Template Processor Action",
"config": {
"action": "process",
...
}
}
template
-
(Required, String) The template to process
bindings
-
(Required, Object) The bindings to replace in the template
Details
{ "bindingA": "#{ data('/my/binding/field') }", ... }
Can be later referenced in a template:
My bindingA value is ${bindingA}
outputFormat
-
(Optional, String) The output format of the processed template. Supported formats are: JSON and PLAIN. Defaults to PLAIN
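As an analogy for how bindings are substituted into a template, the following Python sketch uses the standard library's `string.Template`, which happens to share the `${name}` placeholder syntax. It is not the platform's Template Engine. Treating the `JSON` output format as "parse the rendered text as JSON" is an assumption made for this example.

```python
import json
from string import Template

bindings = {"bindingA": "valueA"}

# PLAIN-like output: the rendered text is used as is.
plain = Template("My bindingA value is ${bindingA}").substitute(bindings)
print(plain)  # My bindingA value is valueA

# JSON-like output (assumption): the rendered text is parsed as JSON.
as_json = json.loads(Template('{"field": "${bindingA}"}').substitute(bindings))
print(as_json)  # {'field': 'valueA'}
```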
Vespa
Uses the Vespa integration to send HTTP requests to a Vespa service.
Action: store
Processor that upserts or deletes documents in a Vespa app using the Document API.
{
"type": "vespa",
"name": "My Vespa Store Action",
"config": {
"action": "store",
"namespace": "my-namespace",
"documentType": "my-document-type",
...
},
"server": <Vespa Server>,
...
}
namespace
-
(Required, String) The namespace of the Vespa document
documentType
-
(Required, String) The document type of the Vespa document. Described in the schemas.sd.
data
-
(Optional, Object) The fields of the Vespa document. If not provided, it will store the data generated on a previous processor
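For orientation, this Python sketch shows how `namespace` and `documentType` map onto a path of Vespa's `/document/v1` API. How the component actually issues its requests is not documented here, so the choice of `POST` for writes and `DELETE` for removals, as well as the `document_path` and `store_request` helpers, are assumptions for illustration only.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def document_path(namespace: str, document_type: str, doc_id: str) -> str:
    """Build the /document/v1 path that addresses a single Vespa document."""
    return "/document/v1/{}/{}/docid/{}".format(
        quote(namespace, safe=""),
        quote(document_type, safe=""),
        quote(doc_id, safe=""),
    )


def store_request(namespace: str, document_type: str, doc_id: str,
                  action: str, fields: dict) -> Tuple[str, str, Optional[dict]]:
    """Return (HTTP method, path, body) for a record, based on its action."""
    path = document_path(namespace, document_type, doc_id)
    if action == "DELETE":
        return ("DELETE", path, None)
    # Assumption: CREATE and UPDATE both write the whole document.
    return ("POST", path, {"fields": fields})


print(store_request("my-namespace", "my-document-type", "1", "CREATE", {"title": "Hello"}))
# ('POST', '/document/v1/my-namespace/my-document-type/docid/1', {'fields': {'title': 'Hello'}})
```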
Voyage AI
Uses the Voyage AI integration to send requests to the Voyage AI API. Supports multiple actions for different endpoints of the service.
Action: embeddings
Processor that, given an input string and other arguments such as the preferred model name, returns a response containing a list of embeddings. See Voyage AI Embeddings and the API Text embedding models endpoint.
{
"type": "voyage-ai",
"name": "My Embeddings Action",
"config": {
"action": "embeddings",
"model": "voyage-large-2",
"input": "#{ data('/input') }",
...
},
"server": <Voyage AI Server>,
...
}
model
-
(Required, String) The model to use for the request. See models.
input
-
(Required, String) The input document to be embedded.
truncation
-
(Optional, Boolean) Whether to truncate the input to satisfy the context length limit on the query and the documents. Defaults to true.
inputType
-
(Optional, String) Type of the input text. One of: QUERY or DOCUMENT. Defaults to null.
outputDimension
-
(Optional, Integer) The number of dimensions for resulting output embeddings. Defaults to null.
outputDatatype
-
(Optional, String) The data type for the embeddings to be returned. One of: FLOAT, INT8, UINT8, BINARY or UBINARY. Defaults to FLOAT.
encodingFormat
-
(Optional, String) Format in which the embeddings are encoded. One of: base64. Defaults to null.
flush
-
(Optional, Object) The flush configuration
Details
maxCount
-
(Optional, Integer) The maximum number of records in the bulk before flushing
maxWeight
-
(Optional, Long) The maximum weight allowed in a bulk request
flushAfter
-
(Optional, Duration) The time to wait before flushing a bulk request
Action: multimodal-embeddings
Processor that, given a list of multimodal inputs (text, images, or an interleaving of both) and other arguments such as the preferred model name, returns a response containing a list of embeddings. See Voyage AI Multimodal Embedding and the API Text multimodal embedding models endpoint.
{
"type": "voyage-ai",
"name": "My Multimodal Embeddings Action",
"config": {
"action": "multimodal-embeddings",
"model": "voyage-multimodal-3",
"input": "#{ data('/input') }",
...
},
"server": <Voyage AI Server>,
...
}
model
-
(Required, String) The model to use for the request. See models.
input
-
(Required, Object) The input object to be embedded.
Details
type
-
(Required, String) The type. One of: text, image_url or image_base64.
text
-
(Optional, String) The text if the type text is chosen.
Details
{ "type": "text", "text": "This is a banana." }
imageUrl
-
(Optional, String) The image URL if the type image_url is chosen.
Details
{ "type": "image_url", "imageUrl": "https://raw.githubusercontent.com/voyage-ai/voyage-multimodal-3/refs/heads/main/images/banana.jpg" }
imageBase64
-
(Optional, Object) The Base64-encoded image if the type image_base64 is chosen.
Details
{ "type": "image_base64", "imageBase64": { "mediaType": "image/jpeg", "base64": true, "data": "/9j/4AAQSkZJRgABAQEAYABgAAD(...)" } }
mediaType
-
(Required, String) The data media type. Supported media types are: image/png, image/jpeg, image/webp, and image/gif.
base64
-
(Required, Boolean) Whether the data is encoded in Base64.
data
-
(Required, String) The data itself.
truncation
-
(Optional, Boolean) Whether to truncate the inputs to fit within the context length. Defaults to true.
inputType
-
(Optional, String) Type of the input text. One of: QUERY or DOCUMENT. Defaults to null.
outputEncoding
-
(Optional, String) Format in which the embeddings are encoded. One of: base64. Defaults to null.
flush
-
(Optional, Object) The flush configuration
Details
maxCount
-
(Optional, Integer) The maximum number of records in the bulk before flushing
maxWeight
-
(Optional, Long) The maximum weight allowed in a bulk request
flushAfter
-
(Optional, Duration) The time to wait before flushing a bulk request
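Each input object pairs a type with exactly one matching field. The Python sketch below checks that pairing before a configuration is submitted. The `validate_input` helper is hypothetical and only encodes the rules listed in the properties above; it is not part of the product.

```python
# Field expected for each input type, following the property list above.
FIELD_BY_TYPE = {
    "text": "text",
    "image_url": "imageUrl",
    "image_base64": "imageBase64",
}
SUPPORTED_MEDIA_TYPES = {"image/png", "image/jpeg", "image/webp", "image/gif"}


def validate_input(item):
    """Return a list of problems found in one multimodal input object."""
    field = FIELD_BY_TYPE.get(item.get("type"))
    if field is None:
        return [f"unsupported type: {item.get('type')!r}"]
    if field not in item:
        return [f"missing field: {field}"]
    problems = []
    if field == "imageBase64":
        image = item[field]
        # mediaType, base64 and data are all documented as required.
        for key in ("mediaType", "base64", "data"):
            if key not in image:
                problems.append(f"missing field: imageBase64.{key}")
        if image.get("mediaType") not in SUPPORTED_MEDIA_TYPES:
            problems.append(f"unsupported media type: {image.get('mediaType')!r}")
    return problems


text_problems = validate_input({"type": "text", "text": "This is a banana."})
url_problems = validate_input({"type": "image_url"})
```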
Discovery QueryFlow
Discovery QueryFlow is a lightweight tool that allows users to create Custom REST Endpoints that interact with external services with minimum overhead. It enables features such as:
-
Flexibility to represent complex query processing scenarios through a finite-state machine.
-
On-the-fly tuning of configurations for a fast feedback loop.
-
Extensive component library for advanced interpretation of the HTTP Request.
Custom REST Endpoints
Endpoints API
$ curl --request POST 'queryflow-api:8088/v2/endpoint' --data '{ ... }'
$ curl --request GET 'queryflow-api:8088/v2/endpoint'
$ curl --request GET 'queryflow-api:8088/v2/endpoint/{id}'
$ curl --request PUT 'queryflow-api:8088/v2/endpoint/{id}' --data '{ ... }'
Note: The type of an existing endpoint can't be modified.
$ curl --request DELETE 'queryflow-api:8088/v2/endpoint/{id}'
$ curl --request PATCH 'queryflow-api:8088/v2/endpoint/{id}/enable'
$ curl --request PATCH 'queryflow-api:8088/v2/endpoint/{id}/disable'
$ curl --request POST 'queryflow-api:8088/v2/endpoint/{id}/clone?name=clone-new-name&uri=clone-new-uri&method=clone-new-method'
Query Parameters
method
-
(Required, String) The HTTP Method of the new Endpoint
uri
-
(Required, String) The URI of the new Endpoint
name
-
(Required, String) The name of the new Endpoint
$ curl --request POST 'queryflow-api:8088/v2/endpoint/search' --data '{ ... }'
Body
The body payload is a DSL Filter to apply to the search
$ curl --request GET 'queryflow-api:8088/v2/endpoint/autocomplete?q=value'
Query Parameters
q
-
(Required, String) The query to execute the autocomplete search
An endpoint is the definition of the finite-state machine for query processing:
{
"uri": "/my/custom/endpoint",
"httpMethod": "GET",
"name": "My Custom Endpoint",
"initialState": "stateA",
"states": {
"stateA": {
...
},
"stateB": {
...
}
},
"timeout": "60s"
...
}
httpMethod
-
(Required, String) The HTTP method for the custom endpoint. Must be one of: GET, POST, PUT, DELETE
uri
-
(Required, String) The URI path for the custom endpoint (e.g. /my/path).
The URI can contain variables in any of its paths (e.g. /my/{pathA}, /{pathA}/{pathB}). If present, the value of every placeholder is available as part of the metadata of the HTTP request and can be accessed in the configuration of the processors with the help of the Expression Language.
Details
{ "uri": "/my/{pathA}/endpoint", "httpMethod": "GET", "name": "My Custom Endpoint", "initialState": "stateA", "states": { ... }, "timeout": "60s" ... }
{ "type": "my-component-type", "name": "My Component Processor", "config": { "myProperty": "#{ data('/httpRequest/pathVariables/pathA') }" } ... }
type
-
(Required, String) The content-type of the HTTP response for the custom endpoint. Either default for application/json or stream for text/event-stream. Defaults to default
name
-
(Required, String) The unique name to identify the custom endpoint
description
-
(Optional, String) The description for the configuration
initialState
-
(Required, String) The state to use as starting point of the custom endpoint, as defined in the states field
states
-
(Required, Object) The states associated to the endpoint
properties
-
(Optional, Object) The properties to be referenced with the help of the Expression Language in the configuration of the processors
Details
{ "uri": "/my/custom/endpoint", "httpMethod": "GET", "name": "My Custom Endpoint", "initialState": "stateA", "states": { ... }, "properties": { "keyA": "valueA" }, "timeout": "60s" ... }
{ "type": "my-component-type", "name": "My Component Processor", "config": { "myProperty": "#{ endpoint.properties.keyA }" }, ... }
timeout
-
(Required, Duration) The timeout for the execution of the custom endpoint
labels
-
(Optional, Array of Objects) The labels for the configuration
Details
{ "labels": [ { "key": "My Label Key", "value": "My Label Value" }, ... ], ... }
key
-
(Required, String) The key of the label
value
-
(Required, String) The value of the label
Note: Loops are not forbidden, as they might represent valid use cases depending on the configuration of the states. To avoid getting stuck in infinite loops, all endpoints are required to be configured with a timeout.
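To make the URI placeholders more concrete, the Python sketch below shows one way a template such as /my/{pathA}/endpoint could be matched against an incoming path, producing the values that the examples above read from /httpRequest/pathVariables. The `match_uri` function is a hypothetical illustration; the actual matching rules of QueryFlow are not described in this document.

```python
import re


def match_uri(template, path):
    """Return the path variables if `path` matches `template`, else None."""
    # re.split with a capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>[^/]+)"
        for index, part in enumerate(parts)
    )
    match = re.fullmatch(pattern, path)
    return match.groupdict() if match else None


variables = match_uri("/my/{pathA}/endpoint", "/my/books/endpoint")
# Shape assumed from the data('/httpRequest/pathVariables/pathA') example above.
metadata = {"httpRequest": {"pathVariables": variables}}
```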
Processors
Processors API
$ curl --request POST 'queryflow-api:8088/v2/processor' --data '{ ... }'
$ curl --request GET 'queryflow-api:8088/v2/processor'
$ curl --request GET 'queryflow-api:8088/v2/processor/{id}'
$ curl --request PUT 'queryflow-api:8088/v2/processor/{id}' --data '{ ... }'
Note: The type of an existing processor can't be modified.
$ curl --request DELETE 'queryflow-api:8088/v2/processor/{id}'
$ curl --request POST 'queryflow-api:8088/v2/processor/{id}/clone?name=clone-new-name'
Query Parameters
name
-
(Required, String) The name of the new Processor
$ curl --request POST 'queryflow-api:8088/v2/processor/search' --data '{ ... }'
Body
The body payload is a DSL Filter to apply to the search
$ curl --request GET 'queryflow-api:8088/v2/processor/autocomplete?q=value'
Query Parameters
q
-
(Required, String) The query to execute the autocomplete search
Each component is stateless, and it’s driven by the configuration defined in the processor and by the context created by the current HTTP Request. This design makes the processor the main building block of Discovery QueryFlow.
They are intended to solve very specific tasks, which makes them re-usable and simple to integrate into any part of the configuration.
{
"type": "my-component-type",
"name": "My Component Processor",
"config": {
...
},
...
}
type
-
(Required, String) The name of the component to execute
name
-
(Required, String) The unique name to identify the configuration
description
-
(Optional, String) The description for the configuration
config
-
(Required, Object) The configuration for the corresponding action of the component. All configurations will be affected by the Expression Language
snippets
-
(Optional, Object) The snippets to be referenced in the configuration with the help of the Expression Language
Details
{ "type": "my-component-type", "name": "My Component Processor", "config": { "myProperty": "#{ snippets.snippetA }" }, "snippets": { "snippetA": { ... } }, ... }
Note: Avoid the usage of any reserved operator, such as hyphens, in the name of a snippet.
server
-
(Optional, UUID/Object) Either the ID of the server configuration for the integration or an object with the detailed configuration
Details
{ "server": { "id": "ba637726-555f-4c68-bfed-1c91f4803894", ... }, ... }
id
-
(Required, UUID) The ID of the server configuration for the integration
credential
-
(Optional, UUID) The ID of the credential to override the default authentication in the external service
labels
-
(Optional, Array of Objects) The labels for the configuration
Details
{ "labels": [ { "key": "My Label Key", "value": "My Label Value" }, ... ], ... }
key
-
(Required, String) The key of the label
value
-
(Required, String) The value of the label
Query Processing with a State Machine
State Types
Processor State
Executes a single or multiple processors in sequence:
{
"myProcessorState": {
"type": "processor",
"processors": [
...
]
}
}
type
-
(Required, String) The type of state. Must be processor
processors
-
(Required, Array of Objects) The processors to execute
Details
{ "stateA": { "type": "processor", "processors": [ { "id": <Processor ID>, ... } ], ... } }
id
-
(Required, UUID) The ID of the processor to execute
outputField
-
(Optional, String) The output field that wraps the result of the processor execution. Defaults to the one defined in the component
continueOnError
-
(Optional, Boolean) If true and the processor execution fails, its HTTP response will be stored in its corresponding Data Channel while the other processors in the state continue with their normal execution. If false, the error will either be handled by the onError state, or be propagated to its invoker. Defaults to false
active
-
(Optional, Boolean) Set to false to disable the execution of the processor
next
-
(Optional, String) The next state for the HTTP Request Execution after the completion of the state. If not provided, the current one will be assumed as the final state
onError
-
(Optional, String) The state to use as fallback if the execution of the current state fails. If undefined, the current HTTP Request Execution will complete with the corresponding error message.
The output of each processor will be stored in the JSON Data Channel wrapped in the configured outputField:
{
"defaultFieldName": {
"outputKey": "outputValue"
}
}
If a processor produces a Server-Sent Event, each batch of the received chunks will be stored in the SSE Data Channel as a list wrapped in the configured outputField:
{
"defaultFieldName": [
{
"name": "testFieldName",
"data": "some "
},
{
"name": "testFieldName",
"data": "chunked d"
},
{
"name": "testFieldName",
"data": "ata"
},
{
"name": "testFieldName",
"data": "."
}
]
}
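The sequencing rules of a processor state (skip inactive processors, wrap each result in its outputField, honor continueOnError) can be summarized with a short Python sketch. Everything here is a simplified model: the `run` callables stand in for real components, and storing the error as `{"error": ...}` is an assumption about the stored shape.

```python
def run_processor_state(processors, data_channel):
    """Run processors in order, appending each wrapped output to the channel."""
    for processor in processors:
        if not processor.get("active", True):
            continue  # disabled processors are skipped
        try:
            output = processor["run"](data_channel)
        except Exception as error:
            if not processor.get("continueOnError", False):
                raise  # left to the onError state or to the invoker
            output = {"error": str(error)}  # assumed shape of the stored failure
        data_channel.append({processor["outputField"]: output})
    return data_channel


def failing(_channel):
    raise ValueError("boom")


channel = run_processor_state(
    [
        {"outputField": "a", "run": lambda ch: {"outputKey": "outputValue"}},
        {"outputField": "b", "run": failing, "continueOnError": True},
        {"outputField": "c", "run": lambda ch: {"skipped": True}, "active": False},
    ],
    [],
)
```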
Parallel Endpoint State
Executes a single or multiple Custom REST Endpoints in parallel:
{
"myParallelEndpointState": {
"type": "endpoint",
"endpoints": {
...
}
}
}
type
-
(Required, String) The type of state. Must be endpoint
endpoints
-
(Required, Object) The endpoints to execute in parallel
Details
{ "type": "endpoint", "endpoints": { "myEndpointA": { "id": <Endpoint ID>, ... }, ... }, ... }
id
-
(Required, UUID) The ID of the endpoint to execute in parallel
httpRequest
-
(Optional, Object) The custom HTTP Request to invoke the endpoint. Defaults to the same HTTP Request as the one from the invoker. All fields in the httpRequest can be configured with the help of the Expression Language
Details
{ "httpRequest": { "uri": "/my-endpoint", "method": <HTTP Method>, ... } }
uri
-
(Required, String) The URI paths for the HTTP Request
method
-
(Required, String) The HTTP Method for the HTTP Request
headers
-
(Optional, Object) The headers for the HTTP Request. The value of each header can be either a single String, or an Array of Strings
Details
{ "httpRequest": { "uri": "/my-endpoint", "headers": { "header-a": "value-a", "header-b": [ "value-b-1", "value-b-2" ] } } }
queryParams
-
(Optional, Object) The query parameters for the HTTP Request. The value of each query parameter can be either a single String, or an Array of Strings
Details
{ "httpRequest": { "uri": "/my-endpoint", "queryParams": { "param-a": "value-a", "param-b": [ "value-b-1", "value-b-2" ] } } }
cookies
-
(Optional, Array of Objects) The list of cookies for the HTTP Request
Details
{ "httpRequest": { "uri": "/my-endpoint", "cookies": [ { "name": "cookie-name-a", "path": "/some/path/a", "value": "cookie-value-a", "domain": "cookie-domain-a", "maxAge": 1234 }, { "name": "cookie-name-b", "value": "cookie-value-b", ... } ] } }
name
-
(Required, String) The name of the cookie
value
-
(Required, String) The value of the cookie
path
-
(Optional, String) The path of the cookie
domain
-
(Optional, String) The domain of the cookie
maxAge
-
(Optional, Integer) The maximum age of the cookie
body
-
(Optional, Object) The body of the HTTP Request
continueOnError
-
(Optional, Boolean) If true and the endpoint execution fails, its HTTP response will be stored in its corresponding Data Channel while the other endpoints in the state continue with their normal execution. If false, the error will either be handled by the onError state, or be propagated to its invoker. Defaults to false
active
-
(Optional, Boolean) Set to false to disable the execution of the endpoint. If all endpoints are disabled, the state output will be empty.
next
-
(Optional, String) The next state for the HTTP Request Execution after the completion of all configured endpoints. If not provided, the current one will be assumed as the final state and a 204 - No Content HTTP Response will be returned
onError
-
(Optional, String) The state to use as fallback if the execution of the current state fails. If undefined, the current HTTP Request Execution will complete with the corresponding error message.
Note: Circular references with endpoints in endpoint states are not allowed.
Note
|
Endpoints with response type |
The output of the state stored in the JSON Data Channel is a collection with each HTTP Response:
{
"myParallelEndpointState": {
"myEndpointA": {
"statusCode": <HTTP Status Code>,
...
},
...
}
}
statusCode
-
(Integer) The HTTP Status Code of the HTTP Response
headers
-
(Object) The headers of the HTTP Response. The value of each header can be either a single String, or an Array of Strings
Details
{ "httpResponse": { "headers": { "header-a": "value-a", "header-b": [ "value-b-1", "value-b-2" ] }, ... } }
body
-
(Object) The body of the HTTP Response
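As a rough model of this state, the Python sketch below invokes every active endpoint concurrently and collects the responses under the state name, keyed by the tag of each endpoint. The `invoke` callable and `fake_invoke` are stand-ins for the real HTTP invocation; the thread pool is only one possible way to run the calls in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel_state(state_name, endpoints, invoke):
    """Invoke every active endpoint concurrently and key responses by tag."""
    active = {tag: cfg for tag, cfg in endpoints.items() if cfg.get("active", True)}
    with ThreadPoolExecutor(max_workers=max(len(active), 1)) as pool:
        futures = {tag: pool.submit(invoke, cfg["id"]) for tag, cfg in active.items()}
        responses = {tag: future.result() for tag, future in futures.items()}
    return {state_name: responses}


def fake_invoke(endpoint_id):
    # Stand-in for the real HTTP call; always answers 200.
    return {"statusCode": 200, "headers": {}, "body": {"endpoint": endpoint_id}}


output = run_parallel_state(
    "myParallelEndpointState",
    {"myEndpointA": {"id": "id-a"}, "myEndpointB": {"id": "id-b", "active": False}},
    fake_invoke,
)
```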
Switch State
Use DSL Filters and JSON Pointers over the JSON Data Channel to control the flow of the execution given the first matching condition:
{
"mySwitchState": {
"type": "switch",
"options": [
...
],
"default": "myDefaultState"
}
}
type
-
(Required, String) The type of state. Must be switch
options
-
(Required, Array of Objects) The options to evaluate in the state
Details
{ "type": "switch", "options": [ { "condition": { "equals": { "field": "/httpRequest/queryParams/input", "value": "valueA" }, ... }, "state": "myFirstState" }, ... ], ... }
condition
-
(Required, Object) The predicate described as a DSL Filter over the JSON processing data
state
-
(Optional, String) The next state for the finite-state machine if the condition evaluates to true
default
-
(Optional, String) The default state for the finite-state machine if no option evaluates to true
Note: If no state for the finite-state machine is selected, the current one will be assumed as the final state.
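The first-match behavior of the switch can be sketched in a few lines of Python. This model only supports the `equals` filter from the example above and a simplified JSON Pointer lookup without escape handling; both `resolve` and `next_state` are hypothetical helpers, not platform APIs.

```python
def resolve(pointer, document):
    """Resolve a simple JSON Pointer (no escape handling) or return None."""
    node = document
    for token in pointer.strip("/").split("/"):
        if not isinstance(node, dict) or token not in node:
            return None
        node = node[token]
    return node


def next_state(switch, document):
    """Return the state of the first option whose condition matches."""
    for option in switch["options"]:
        equals = option["condition"]["equals"]  # only `equals` is sketched here
        if resolve(equals["field"], document) == equals["value"]:
            return option.get("state")
    return switch.get("default")  # None means the switch is the final state


switch = {
    "type": "switch",
    "options": [
        {"condition": {"equals": {"field": "/httpRequest/queryParams/input", "value": "valueA"}},
         "state": "myFirstState"},
    ],
    "default": "myDefaultState",
}
request = {"httpRequest": {"queryParams": {"input": "valueA"}}}
selected = next_state(switch, request)
```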
Response State
Final state that formats the expected HTTP response when the endpoint is configured to return application/json.
{
"myResponseState": {
"type": "response",
"statusCode": <HTTP Status Code>,
...
}
}
type
-
(Required, String) The type of state. Must be response
statusCode
-
(Required, Integer) The HTTP status code in the range of [200, 400[ for the response. Defaults to 200
headers
-
(Optional, Object) The HTTP headers to return as part of the response
Details
{ "type": "response", "headers": { "Etag": "#{ data('/my/etag/value') }", ... }, ... }
body
-
(Optional, Object) The HTTP JSON body to return as part of the response
Details
{ "type": "response", "body": { "keyA": "#{ data('/my/data') }", ... }, ... }
snippets
-
(Optional, Object) The snippets to be referenced in the configuration with the help of the Expression Language
Details
{ "type": "response", "body": { "myProperty": "#{ snippets.snippetA }" }, "snippets": { "snippetA": { ... } }, ... }
Note: Avoid the usage of any reserved operator, such as hyphens, in the name of a snippet.
Error State
Final state that returns an error message with error code 9999 and a custom HTTP Status Code and message.
{
"myErrorState": {
"type": "error",
"statusCode": <HTTP Status Code>,
"message": <String>
}
}
type
-
(Required, String) The type of state. Must be error
statusCode
-
(Required, Integer) The HTTP status code in the range of [400, 600[ for the error message. Defaults to 500
message
-
(Required, String) The message to display with the error. Defaults to The request has failed due to reaching a configured error endpoint state
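The two final states use half-open status code ranges: [200, 400[ for a response state and [400, 600[ for an error state, with defaults of 200 and 500. A small Python sketch of that check follows; `is_valid_status_code` is a hypothetical helper, and how the platform itself reports an out-of-range value is not covered here.

```python
# (lower bound included, upper bound excluded, default) per state type.
STATUS_RANGES = {"response": (200, 400, 200), "error": (400, 600, 500)}


def is_valid_status_code(state):
    """Check the statusCode of a response or error state against its range."""
    low, high, default = STATUS_RANGES[state["type"]]
    code = state.get("statusCode", default)
    return low <= code < high


valid_error = is_valid_status_code({"type": "error", "statusCode": 404})
invalid_response = is_valid_status_code({"type": "response", "statusCode": 400})
```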
HTTP Request Execution
Data Channels
Some state types produce data that is available for subsequent states.
JSON Data Channel
The JSON Data Channel handles output in JSON format which can be later referenced using JSON Pointers.
The execution starts with the metadata of the invocation:
{
"id": "55d22c60-6d61-41ce-b8b1-c0f1acd6e5e4",
"httpRequest": {
...
},
"pageable": {
...
},
"properties": {
...
}
}
id
-
(UUID) An auto-generated ID for the request
httpRequest
-
(Object) The HTTP Request that triggered the execution
Details
{ "httpRequest": { "uri": "/my-endpoint", "method": <HTTP Method>, ... } }
uri
-
(Required, String) The URI paths for the HTTP Request
method
-
(Required, String) The HTTP Method for the HTTP Request
headers
-
(Optional, Object) The headers for the HTTP Request. The value of each header can be either a single String, or an Array of Strings
Details
{ "httpRequest": { "uri": "/my-endpoint", "headers": { "header-a": "value-a", "header-b": [ "value-b-1", "value-b-2" ] } } }
queryParams
-
(Optional, Object) The query parameters for the HTTP Request. The value of each query parameter can be either a single String, or an Array of Strings
Details
{ "httpRequest": { "uri": "/my-endpoint", "queryParams": { "param-a": "value-a", "param-b": [ "value-b-1", "value-b-2" ] } } }
cookies
-
(Optional, Array of Objects) The list of cookies for the HTTP Request
Details
{ "httpRequest": { "uri": "/my-endpoint", "cookies": [ { "name": "cookie-name-a", "path": "/some/path/a", "value": "cookie-value-a", "domain": "cookie-domain-a", "maxAge": 1234 }, { "name": "cookie-name-b", "value": "cookie-value-b", ... } ] } }
name
-
(Required, String) The name of the cookie
value
-
(Required, String) The value of the cookie
path
-
(Optional, String) The path of the cookie
domain
-
(Optional, String) The domain of the cookie
maxAge
-
(Optional, Integer) The maximum age of the cookie
body
-
(Optional, Object) The body of the HTTP Request
pageable
-
(Object) The pagination request parameters
Details
{ "page": 0, "size": 25, "sort": [ ... ] }
page
-
(Integer) The page number
size
-
(Integer) The size of the page
sort
-
(Array of Objects) The sort definition for the page
Details
{ "property" : "fieldA", "direction" : "ASC" }
property
-
(String) The property where the sort was applied
direction
-
(String) The direction where the sort was applied. Either ASC or DESC
properties
-
(Object) The execution properties as configured in the Endpoint
Subsequent outputs are appended in order. New data never overrides anything previously generated.
When searching for a path, the JSON Pointer will be evaluated against the most recent output. If it is a match, the node is returned. Otherwise, the search continues with the previous ones until reaching the original HTTP request.
Note: If multiple states generate the same data structure and an earlier one needs to be referenced, the root name of the output can be customized.
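The lookup order described above (most recent output first, then back toward the original HTTP request) can be modeled with a list of outputs searched in reverse. The Python sketch below is an approximation under that assumption; `resolve` and `data` are illustrative helpers that mimic the behavior described, not the Expression Language implementation.

```python
def resolve(pointer, document):
    """Resolve a simple JSON Pointer (no escape handling) or return None."""
    node = document
    for token in pointer.strip("/").split("/"):
        if not isinstance(node, dict) or token not in node:
            return None
        node = node[token]
    return node


def data(pointer, channel):
    """Search outputs from most recent to oldest; return the first match."""
    for output in reversed(channel):
        node = resolve(pointer, output)
        if node is not None:
            return node
    return None


channel = [
    {"id": "55d22c60", "httpRequest": {"queryParams": {"q": "banana"}}},  # invocation metadata
    {"firstSearch": {"hits": 3}},    # earlier output with a customized root name
    {"elasticsearch": {"hits": 5}},
    {"elasticsearch": {"hits": 7}},  # most recent output wins for the same pointer
]
latest_hits = data("/elasticsearch/hits", channel)
```

Because the second entry uses a customized root name, it stays reachable through /firstSearch/hits even though later outputs share the same structure.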
SSE Data Channel
Some components produce text/event-stream content. This data is later emitted to the corresponding HTTP Response for the request.
Note
|
It is possible to have multiple configurations producing events |
Expression Language Extensions
Variable | Description | Example |
---|---|---|
| The ID of the endpoint in execution | |
| The HTTP method of the endpoint in execution | |
| The URI paths of the endpoint in execution | |
| The name of the endpoint in execution | |
| The description of the endpoint in execution | |
| The properties of the endpoint in execution | |
| The labels of the endpoint in execution, grouped by key | |
| The ID of the processor in execution during a processor state | |
| The type of the processor in execution during a processor state | |
| The name of the processor in execution during a processor state | |
| The description of the processor in execution during a processor state | |
| The labels of the processor in execution during a processor state, grouped by key | |
| The unique ID of the current HTTP request | |
| The timestamp when the current HTTP request started | |
Function | Description | Example |
---|---|---|
| Finds a specific node within the JSON processing channel using a JSON Pointer | |
HTTP Response
By default, all Endpoints return a response of application/json content type. This HTTP response is either:
-
The configured response state or error state.
-
The most recent entry in the JSON Data Channel, where:
-
If the root of the document is named httpResponse, the statusCode, headers and body will be used accordingly
Details
{ "httpResponse": { "statusCode": 200, "headers": { "header-a": "value-a", "header-b": [ "value-b-1", "value-b-2" ] }, "body": { ... } } }
statusCode
-
(Integer) The HTTP Status Code of the HTTP Response
headers
-
(Object) The headers of the HTTP Response. The value of each header can be either a single String, or an Array of Strings
Details
{ "httpResponse": { "headers": { "header-a": "value-a", "header-b": [ "value-b-1", "value-b-2" ] }, ... } }
body
-
(Object) The body of the HTTP Response
-
If the node is any other from a processor state, the body will be unwrapped from the outputField and the status code will be 200.
-
In any other case, the response will be 204 - No Content.
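The rules for deriving the response from the most recent JSON Data Channel entry can be condensed into the Python sketch below. It does not model a configured response or error state, and the fallback to 200 when an httpResponse node has no statusCode is an assumption; `build_response` is a hypothetical helper.

```python
def build_response(latest_entry):
    """Derive (status, headers, body) from the most recent channel entry."""
    if latest_entry and "httpResponse" in latest_entry:
        response = latest_entry["httpResponse"]
        # Assumption: fall back to 200 when no statusCode is present.
        return response.get("statusCode", 200), response.get("headers", {}), response.get("body")
    if latest_entry and len(latest_entry) == 1:
        # Any other processor output: unwrap the body from its outputField.
        (body,) = latest_entry.values()
        return 200, {}, body
    return 204, {}, None  # any other case: 204 - No Content


explicit = build_response(
    {"httpResponse": {"statusCode": 201, "headers": {"header-a": "value-a"}, "body": {"ok": True}}}
)
unwrapped = build_response({"elasticsearch": {"hits": 3}})
empty = build_response(None)
```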
Endpoints of type stream return a 207 - Multi-Status response of type text/event-stream. Their body consists of the data of every Server-Sent Event (SSE) emitted and stored in the SSE Data Channel.
Details
name: <Output Field Name>
data: <Chunk>
name
-
(String) The configured outputField for the processor
data
-
(String) The chunked data from the stream
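On the client side, the chunks of a stream usually need to be reassembled per output field. The Python sketch below assumes the events have already been parsed into name/data pairs like the ones shown for the SSE Data Channel; `join_chunks` is a hypothetical helper and parsing of the raw stream is not shown.

```python
def join_chunks(events, name):
    """Concatenate, in order, the data of every event emitted under `name`."""
    return "".join(event["data"] for event in events if event["name"] == name)


events = [
    {"name": "testFieldName", "data": "some "},
    {"name": "testFieldName", "data": "chunked d"},
    {"name": "testFieldName", "data": "ata"},
    {"name": "testFieldName", "data": "."},
]
text = join_chunks(events, "testFieldName")
```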
Invoking a Configured Endpoint
Once an Endpoint is fully configured, it can be invoked as any other REST API by calling its HTTP Method/URI under the /api root path:
$ curl --request GET 'queryflow-api:8088/v2/api/my-endpoint?param=value'
The HTTP Response corresponds to the execution of the finite-state machine for the context created by the HTTP Request:
{
"myResponse": {
...
}
}
Given that the definition of the Endpoint can grow in complexity, the risk of something breaking increases: a condition failed, the output was not as expected, a parameter was wrong…
To identify the problem, the /debug root path offers a complete trace of the execution for the endpoint. Each of the states, their output, their errors and the overall step-by-step path followed by the finite-state machine will be displayed:
$ curl --request GET 'queryflow-api:8088/v2/debug/my-endpoint?param=value'
{
"duration": 692,
"execution": [
{
"state": "stateA",
...
},
...
]
}
duration
-
(Integer) The duration of the HTTP Request Execution in milliseconds
execution
-
(Array of Objects) The details of the finite-state machine invocation. Each entry will depend on the state type in execution
Details
state
-
(String) The name of the state in execution. The same state can be invoked multiple times with different results
Processor State
{ "state": "myProcessorState", "result": [ { "processor": <Processor ID>, ... }, ... ] }
result
-
(Array of Objects) The result of the state execution
Details
processor
-
(UUID) The ID of the executed processor
JSON Data Channel
{ "processor": <Processor ID>, "output": { "myOutputField": { ... } }, "duration": 12 }
output
-
(Object) The data stored in the JSON Data Channel after wrapping the execution of the processor in the configured outputField
duration
-
(Integer) The duration of the execution of the state in milliseconds
SSE Data Channel
{ "processor": <Processor ID>, "output": [ { "name": "testFieldName", "data": "some " }, { "name": "testFieldName", "data": "chunked d" }, { "name": "testFieldName", "data": "ata" }, { "name": "testFieldName", "data": "." } ], "duration": { "execution": 5, "stream": 20 } }
output
-
(Array of Objects) The chunks stored in the SSE Data Channel, using the configured outputField as name
Details
{ "output": [ { "name": "testFieldName", "data": "some " }, { "name": "testFieldName", "data": "chunked d" }, { "name": "testFieldName", "data": "ata" }, { "name": "testFieldName", "data": "." }, ... ], ... }
name
-
(String) The configured outputField for the processor in the state
data
-
(String) The chunk of data
Parallel Endpoint State
{ "state": "myParallelEndpointState", "result": { "tagA": { ... }, ... } }
result
-
(Object) The tag of each configured endpoint and their corresponding response
Details
{ "state": "myParallelEndpointState", "result": { "tagA": { "status": 200, "headers": { ... }, "body": { ... } } } }
statusCode
-
(Integer) The HTTP Status Code of the HTTP Response
headers
-
(Object) The headers of the HTTP Response. The value of each header can be either a single String, or an Array of Strings
Details
{ "httpResponse": { "headers": { "header-a": "value-a", "header-b": [ "value-b-1", "value-b-2" ] }, ... } }
body
-
(Object) The body of the HTTP Response
Switch State
{ "state": "mySwitchState", "matchType": "MATCH", "condition": { ... } }
matchType
-
(String) The type of match after executing the state. One of MATCH for a condition that matched, DEFAULT for the default option or NONE if no condition matched and no default option is configured
condition
-
(Object) The configured DSL Filter that matched during the state execution
Response State
{ "state": "myResponseState", "result": { ... } }
result
-
(Object) The HTTP Response as configured in the state
Details
{ "state": "myResponseState", "result": { "status": 200, "headers": { ... }, "body": { ... } } }
statusCode
-
(Integer) The HTTP Status Code of the HTTP Response
headers
-
(Object) The headers of the HTTP Response. The value of each header can be either a single String, or an Array of Strings
Details
{ "httpResponse": { "headers": { "header-a": "value-a", "header-b": [ "value-b-1", "value-b-2" ] }, ... } }
body
-
(Object) The body of the HTTP Response
Error State
{ "state": "myErrorState", "result": { ... } }
result
-
(Object) The HTTP Response as configured in the state
Details
{ "state": "myErrorState", "result": { "status": 200, "headers": { ... }, "body": { ... } } }
statusCode
-
(Integer) The HTTP Status Code of the HTTP Response
headers
-
(Object) The headers of the HTTP Response. The value of each header can be either a single String, or an Array of Strings
Details
{ "httpResponse": { "headers": { "header-a": "value-a", "header-b": [ "value-b-1", "value-b-2" ] }, ... } }
body
-
(Object) The body of the HTTP Response
Note
|
The debug request is exactly the same as the one sent to the |
Note
|
The debug response contains the |
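A debug trace can be post-processed to find where time is spent. The Python sketch below walks a trace shaped like the examples above and lists the duration of every processor, adding up the execution and stream parts for SSE entries. The `processor_durations` helper and the sample trace values are illustrative assumptions.

```python
def processor_durations(debug_response):
    """Yield (state, processor ID, milliseconds) for every traced processor."""
    for step in debug_response["execution"]:
        result = step.get("result")
        if not isinstance(result, list):
            continue  # only processor states report a list of processor results
        for entry in result:
            duration = entry["duration"]
            if isinstance(duration, dict):  # SSE entries split execution and stream time
                duration = sum(duration.values())
            yield step["state"], entry["processor"], duration


trace = {
    "duration": 692,
    "execution": [
        {"state": "stateA", "result": [
            {"processor": "p1", "output": {"myOutputField": {}}, "duration": 12},
            {"processor": "p2", "output": [], "duration": {"execution": 5, "stream": 20}},
        ]},
        {"state": "mySwitchState", "matchType": "DEFAULT"},
    ],
}
slowest = max(processor_durations(trace), key=lambda item: item[2])
```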
Components
Amazon Bedrock
Send requests to Amazon Bedrock. Supports multiple actions for different endpoints of the service.
Processor Action: invoke-model
Processor that invokes the specified Amazon Bedrock model to run inference using the prompt and inference parameters provided in the configuration.
{
"type": "amazon-bedrock",
"name": "My Amazon Bedrock Processor Action",
"config": {
"action": "invoke-model",
...
},
"server": <Amazon Bedrock Server>,
...
}
model
-
(Required, String) The model to invoke
request
-
(Required, Object) The body of the request
stream
-
(Optional, Boolean) Whether to enable streaming. Defaults to false
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"amazonBedrock": {
...
}
}
Elasticsearch
Uses the Elasticsearch integration to send requests to the Elasticsearch API. Supports multiple actions for common operations such as search, and also provides a mechanism to send raw Elasticsearch queries.
Action: autocomplete
Processor that executes a completion suggester query.
{
"type": "elasticsearch",
"name": "My Elasticsearch Processor Action",
"config": {
"action": "autocomplete",
"index": "my-index",
"text": "#{ data('my/query') }",
"field": "content",
...
},
"server": <Elasticsearch Server>,
...
}
index
-
(Required, String) The index where to search
text
-
(Required, String) The text to search
field
-
(Required, String) The field where to search
skipDuplicates
-
(Optional, Boolean) Whether to skip duplicate suggestions
size
-
(Optional, Integer) The amount of suggestions
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"elasticsearch": {
...
}
}
Action: knn
Processor that executes a k-nearest neighbor (kNN) query using approximate kNN.
{
"type": "elasticsearch",
"name": "My Elasticsearch Processor Action",
"config": {
"action": "knn",
"index": "my-index",
"field": "content",
"maxResults": 5,
"vector": "#{ data('my/vector') }",
"k": 5,
"candidatesPerShard": 20,
...
},
"server": <Elasticsearch Server>,
...
}
index
-
(Required, String) The index where to search
field
-
(Required, String) The field where to search
maxResults
-
(Required, Integer) The maximum number of results
vector
-
(Required, Array of Float) The source vector to compare
k
-
(Required, Long) The number of nearest neighbors
candidatesPerShard
-
(Required, Long) The number of nearest neighbors considered per shard
query
-
(Optional, Object) The query to filter in addition to the kNN search
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"elasticsearch": {
...
}
}
Action: native
Processor that executes a native Elasticsearch query.
{
"type": "elasticsearch",
"name": "My Elasticsearch Processor Action",
"config": {
"action": "native",
"path": "/my-index/_doc/1",
"method": "GET",
...
},
"server": <Elasticsearch Server>,
...
}
path
-
(Required, String) The endpoint of the request, excluding schema, host, port and any path included as part of the connection
method
-
(Required, String) The HTTP method for the request
queryParams
-
(Optional, Map of String/String) The map of query parameters for the URL
body
-
(Optional, Object) The JSON body to submit
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"elasticsearch": {
...
}
}
Action: search
Processor that executes a match query on the index.
{
"type": "elasticsearch",
"name": "My Elasticsearch Processor Action",
"config": {
"action": "search",
"index": "my-index",
"text": "#{ data('my/query') }",
"field": "content",
...
},
"server": <Elasticsearch Server>,
...
}
index
-
(Required, String) The index where to search
text
-
(Required, String) The text to search
field
-
(Required, String) The field where to search
suggest
-
(Optional, Object) The suggester to apply
aggregations
-
(Optional, Map of String/Object) The field with the aggregations to apply
filter
-
(Optional, DSL Filter) The filters to apply
highlight
-
(Optional, Object) The highlighter to apply
pageable
-
(Optional, Pagination) The pagination parameters
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"elasticsearch": {
...
}
}
Action: store
Processor that executes a store request to Elasticsearch.
{
"type": "elasticsearch",
"name": "My Elasticsearch Processor Action",
"config": {
"action": "store",
"index": "my-index",
"document": {
...
},
...
},
"server": <Elasticsearch Server>,
...
}
index
-
(Required, String) The index where to store the document
document
-
(Required, Object) The document to be stored
id
-
(Optional, String) The ID of the document to be stored. If not provided, it will be autogenerated
allowOverride
-
(Optional, Boolean) Whether the document can be overridden or not. Defaults to false
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"elasticsearch": {
...
}
}
Action: vector
Processor that executes a script score query using exact kNN.
{
"type": "elasticsearch",
"name": "My Elasticsearch Processor Action",
"config": {
"action": "vector",
"index": "my-index",
"field": "my_vector_field",
"vector": "#{ data('my/vector') }",
"minScore": 0.92,
"maxResults": 5,
"query": {
...
},
...
},
"server": <Elasticsearch Server>,
...
}
index
-
(Required, String) The index where to search
field
-
(Required, String) The field with the vector
vector
-
(Required, Array of Float) The source vector to compare
minScore
-
(Required, Double) The minimum score for results
maxResults
-
(Required, Integer) The maximum number of results
query
-
(Optional, Object) The query to apply together with the vector search
function
-
(Optional, String) The type of function to use. One of cosineSimilarity, dotProduct, l1norm or l2norm. Defaults to cosineSimilarity
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"elasticsearch": {
...
}
}
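To make the function options more concrete, the following minimal sketch computes the four measures in plain Python for two small vectors. It only illustrates the underlying math. How Elasticsearch turns each value into a final document score inside the script score query is not shown, and the helper names are hypothetical.

```python
import math


def dot_product(a, b):
    """Sum of the element-wise products of both vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product normalized by both vector lengths; ranges from -1 to 1."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def l1_norm(a, b):
    """Manhattan distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_norm(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two orthogonal unit vectors, chosen only for illustration.
query = [1.0, 0.0]
doc = [0.0, 1.0]
```

The two distance measures shrink as vectors get closer while the two similarity measures grow, so a minScore value that suits one function may not carry over to another.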
Hugging Face
Uses the Hugging Face integration to send requests to the Inference API. Supports multiple actions for different endpoints of the service.
Action: summarization
Processor that summarizes a single or multiple texts organized by an autogenerated id. See Summarization Task
{
"type": "hugging-face",
"name": "My Summarization Action",
"config": {
"action": "summarization",
"model": "Falconsai/text_summarization",
"input": "#{ data(\"/httpRequest/body/input\") }",
...
},
"server": <Hugging Face Server>,
...
}
model
-
(Required, String) The model to use for the request
input
-
(Required, Array of Strings) The list of texts to summarize
parameters
-
(Optional, Object) The parameters for the request
Details
minLength
-
(Optional, Integer) The minimum length of the output tokens
maxLength
-
(Optional, Integer) The maximum length of the output tokens
topK
-
(Optional, Integer) The top tokens to consider to create new text
topP
-
(Optional, Float) Defines the tokens that are within the sample operation for the query
temperature
-
(Optional, Float) The temperature of the sampling operation. Defaults to
1.0
repetitionPenalty
-
(Optional, Float) The repetition penalty for the request.
maxTime
-
(Optional, Float) The maximum time that the request should take
options
-
(Optional, Object) The request options
Details
useCache
-
(Optional, Boolean) Whether to cache the results. Useful with deterministic models. Defaults to
true
waitForModel
-
(Optional, Boolean) Whether to wait until the model is ready or not. If false, the response will be 503 - Service Unavailable
Note
The response of the action is stored in the JSON Data Channel as returned by the invoked endpoint:
For a single text input:
{
"huggingFace": "Your summarized text"
}
For multiple text inputs:
{
"huggingFace": [
"Your summarized text for your first input",
"Your summarized text for your second input",
...
]
}
Note
|
Please note that the order of the responses corresponds to the order of the inputs. |
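Because the responses keep the order of the inputs, a consumer can pair each text with its summary by position. The sketch below is a hypothetical client-side helper, not part of the component. It assumes the response has already been read from the JSON Data Channel into a Python dictionary.

```python
def pair_summaries(inputs, response):
    """Return (input, summary) tuples from a summarization response."""
    summaries = response["huggingFace"]
    # A single text input yields a plain string rather than a list.
    if isinstance(summaries, str):
        summaries = [summaries]
    if len(summaries) != len(inputs):
        raise ValueError("expected one summary per input")
    return list(zip(inputs, summaries))


# Placeholder data with the documented multi-input shape.
texts = ["first long text", "second long text"]
response = {"huggingFace": ["summary one", "summary two"]}
pairs = pair_summaries(texts, response)
```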
Action: text-generation
Processor that continues the text from a prompt. See Text Generation Task
{
"type": "hugging-face",
"name": "My Text Generation Action",
"config": {
"action": "text-generation",
"model": "gpt2-large",
"input": "#{ data(\"/httpRequest/body/input\") }",
...
},
"server": <Hugging Face Server>,
...
}
model
-
(Required, String) The model to use for the request
input
-
(Required, String) The prompt from which to generate the response
parameters
-
(Optional, Object) The parameters for the request
Details
maxNewTokens
-
(Optional, Integer) The number of tokens to be generated
returnFullText
-
(Optional, Boolean) Whether to include the input text within the answer or not. Defaults to
true
numReturnSequences
-
(Optional, Integer) The number of propositions to be returned
doSample
-
(Optional, Boolean) Whether to use sampling or not. Use greedy decoding otherwise. Defaults to
true
topK
-
(Optional, Integer) The top tokens to consider to create new text
topP
-
(Optional, Float) Defines the tokens that are within the sample operation for the query
temperature
-
(Optional, Float) The temperature of the sampling operation. Defaults to
1.0
repetitionPenalty
-
(Optional, Float) The repetition penalty for the request.
maxTime
-
(Optional, Float) The maximum time that the request should take
options
-
(Optional, Object) The request options
Details
useCache
-
(Optional, Boolean) Whether to cache the results. Useful with deterministic models. Defaults to
true
waitForModel
-
(Optional, Boolean) Whether to wait until the model is ready or not. If false, the response will be 503 - Service Unavailable
Note
The response of the action is stored in the JSON Data Channel as returned by the invoked endpoint:
{
"huggingFace": "My autogenerated text"
}
Action: feature-extraction
Processor that extracts a matrix of numerical features from a single or multiple texts organized by an autogenerated id. See Feature Extraction Task
{
"type": "hugging-face",
"name": "My Feature Extraction Action",
"config": {
"action": "feature-extraction",
"model": "facebook/bart-base",
"input": "#{ data(\"/httpRequest/body/input\") }",
...
},
"server": <Hugging Face Server>,
...
}
model
-
(Required, String) The model to use for the request
input
-
(Required, Array of String) The list of texts to extract the numerical features
options
-
(Optional, Object) The request options
Details
useCache
-
(Optional, Boolean) Whether to cache the results. Useful with deterministic models. Defaults to
true
waitForModel
-
(Optional, Boolean) Whether to wait until the model is ready or not. If false, the response will be 503 - Service Unavailable
Note
The response of the action is stored in the JSON Data Channel as returned by the invoked endpoint:
For a single text input:
{
"huggingFace":
[[ 2.2187119 , 2.7539337 , 1.0330348 , ... ],
[ -0.2937546 , 0.29999846 , -1.7008113 , ... ],
[ 0.09872855 , 0.53532976 , 0.7232368 , ... ]]
}
For multiple text inputs:
{
"huggingFace":
[
[
[ 2.2187119 , 2.7539337 , 1.0330348 , ... ],
[ -0.2937546 , 0.29999846 , -1.7008113 , ... ],
...
],
[
[ 2.821799 , 2.7055995 , 1.1408421 , ... ],
[ 1.4287674 , 0.39487326 , -3.7841866 , ... ],
...
]
]
}
Note
|
Please note that the order of the responses corresponds to the order of the inputs. |
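A feature matrix is often reduced to a single vector before it is used for similarity search. The sketch below shows mean pooling, one common way to do this. It assumes each row of the single-input matrix corresponds to one token, which depends on the chosen model, and it is not something the component is documented to do on its own.

```python
def mean_pool(matrix):
    """Average the rows of a feature matrix into one vector."""
    if not matrix:
        raise ValueError("matrix must contain at least one row")
    width = len(matrix[0])
    return [sum(row[i] for row in matrix) / len(matrix) for i in range(width)]


# Tiny made-up matrix with the single-input response shape.
response = {"huggingFace": [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]}
vector = mean_pool(response["huggingFace"])
```

For multiple text inputs, the same helper can be applied to each matrix in the outer list.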
Action: fill-mask
Processor that replaces a missing word in a sentence with multiple fitting possibilities. The name of the [MASK] token to be replaced is defined by the chosen model. See Fill Mask Task
{
"type": "hugging-face",
"name": "My Fill Mask Action",
"config": {
"action": "fill-mask",
"model": "distilroberta-base",
"input": "#{ data(\"/httpRequest/body/input\") }",
...
},
"server": <Hugging Face Server>,
...
}
model
-
(Required, String) The model to use for the request
input
-
(Required, Array of String) The list of texts to fill their masks
options
-
(Optional, Object) The request options
Details
useCache
-
(Optional, Boolean) Whether to cache the results. Useful with deterministic models. Defaults to
true
waitForModel
-
(Optional, Boolean) Whether to wait until the model is ready or not. If false, the response will be 503 - Service Unavailable
Note
The response of the action is stored in the JSON Data Channel as returned by the invoked endpoint:
For a single text input:
{
"huggingFace": [
{
"sequence": "Paris is the capital of france",
"score": 0.2705707,
"token": 812,
"tokenStr": " capital"
},
...
]
}
For multiple text inputs:
{
"huggingFace": [
[
{
"sequence": "Paris is the capital of france",
"score": 0.2705707,
"token": 812,
"tokenStr": " capital"
},
...
],
[
{
"sequence": "The Eiffel tower is one of the main tourist spots in Paris.",
"score": 0.9013709,
"token": 8376,
"tokenStr": " tourist"
},
...
],
...
]
}
Action: text-classification
Processor that classifies a text into a group of labels and provides a score for each label. These labels are determined by the model that is used. See Text Classification Task
{
"type": "hugging-face",
"name": "My Text Classification Action",
"config": {
"action": "text-classification",
"model": "distilbert-base-uncased-finetuned-sst-2-english",
"input": "#{ data(\"/httpRequest/body/input\") }",
...
},
"server": <Hugging Face Server>,
...
}
model
-
(Required, String) The model to use for the request
input
-
(Required, Array of String) The list of texts to classify
options
-
(Optional, Object) The request options
Details
useCache
-
(Optional, Boolean) Whether to cache the results. Useful with deterministic models. Defaults to
true
waitForModel
-
(Optional, Boolean) Whether to wait until the model is ready or not. If false, the response will be 503 - Service Unavailable
Note
The response of the action is stored in the JSON Data Channel as returned by the invoked endpoint:
For a single text input:
{
"huggingFace": [
{"label": "LabelA", "score": 0.9998608827590942},
...
]
}
For multiple text inputs:
{
"huggingFace": [
[
{
"label": "LabelA",
"score": 0.9998608827590942
},
...
],
[
{
"label": "LabelC",
"score": 0.9968926310539246
},
...
],
...
]
}
Note
|
Please note that the order of the responses corresponds to the order of the inputs. |
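A typical next step is to keep only the highest-scoring label for each input. The sketch below assumes the multi-input response shape shown above, already loaded into a Python dictionary. The helper name and the sample scores are made up for illustration.

```python
def top_labels(response):
    """Return the best (label, score) pair for each input text."""
    best = []
    for candidates in response["huggingFace"]:
        winner = max(candidates, key=lambda c: c["score"])
        best.append((winner["label"], winner["score"]))
    return best


response = {
    "huggingFace": [
        [{"label": "LabelA", "score": 0.99}, {"label": "LabelB", "score": 0.01}],
        [{"label": "LabelC", "score": 0.75}, {"label": "LabelA", "score": 0.25}],
    ]
}
result = top_labels(response)
```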
Action: zero-shot-classification
Processor that classifies a text into a group of labels without having seen any training examples for those labels, and provides a score for each label. See Zero Shot Classification Task
{
"type": "hugging-face",
"name": "My Zero Shot Classification Action",
"config": {
"action": "zero-shot-classification",
"model": "facebook/bart-large-mnli",
"input": "#{ data(\"/httpRequest/body/input\") }",
...
},
"server": <Hugging Face Server>,
...
}
model
-
(Required, String) The model to use for the request
input
-
(Required, Array of String) The list of texts to classify
parameters
-
(Optional, Object) The parameters for the request
Details
candidateLabels
-
(Required, Array of String) The list of possible labels to classify the input
multiLabel
-
(Optional, Boolean) Whether classes can overlap or not. Defaults to
false
options
-
(Optional, Object) The request options
Details
useCache
-
(Optional, Boolean) Whether to cache the results. Useful with deterministic models. Defaults to
true
waitForModel
-
(Optional, Boolean) Whether to wait until the model is ready or not. If false, the response will be 503 - Service Unavailable
Note
The response of the action is stored in the JSON Data Channel as returned by the invoked endpoint:
For a single text input:
{
"huggingFace": [
{
"label": "labelA",
"score": 0.9998608827590942
},
...
]
}
For multiple text inputs:
{
"huggingFace": [
[
{
"label": "labelA",
"score": 0.9998608827590942
},
...
],
[
{
"label": "labelA",
"score": 0.9968926310539246
},
...
],
...
]
}
Note
|
Please note that the order of the responses corresponds to the order of the inputs. |
Action: token-classification
Processor that assigns a label to the tokens from a single or multiple texts organized by an autogenerated id. See Token Classification Task
{
"type": "hugging-face",
"name": "My Token Classification Action",
"config": {
"action": "token-classification",
"model": "dslim/bert-base-NER",
"input": "#{ data(\"/httpRequest/body/input\") }",
...
},
"server": <Hugging Face Server>,
...
}
model
-
(Required, String) The model to use for the request
input
-
(Required, Array of String) The list of texts to classify their tokens
parameters
-
(Optional, Object) The parameters for the request
Details
aggregationStrategy
-
(Optional, String) The aggregation strategy to use in the request
Details
NONE: Every token gets classified without further aggregation
SIMPLE: Entities are grouped according to the default schema (B-, I- tags get merged when the tag is similar)
FIRST: Same as the SIMPLE strategy except words cannot end up with different tags. Words will use the tag of the first token when there is ambiguity
AVERAGE: Same as the SIMPLE strategy except words cannot end up with different tags. Scores are averaged across tokens and then the maximum label is applied
MAX: Same as the SIMPLE strategy except words cannot end up with different tags. Word entity will be the token with the maximum score
options
-
(Optional, Object) The request options
Details
useCache
-
(Optional, Boolean) Whether to cache the results. Useful with deterministic models. Defaults to
true
waitForModel
-
(Optional, Boolean) Whether to wait until the model is ready or not. If false, the response will be 503 - Service Unavailable
Note
The response of the action is stored in the JSON Data Channel as returned by the invoked endpoint:
For a single text input:
{
"huggingFace": [
{
"score": 0.9990085,
"word": "Omar",
"start": 11,
"end": 15,
"entityGroup": "PER"
},
...
]
}
For multiple text inputs:
{
"huggingFace": [
[
{
"score": 0.9990085,
"word": "Omar",
"start": 11,
"end": 15,
"entityGroup": "PER"
},
...
],
[
{
"score": 0.9949533,
"word": "George Washington",
"start": 0,
"end": 17,
"entityGroup": "PER"
},
...
],
...
]
}
Note
|
Please note that the order of the responses corresponds to the order of the inputs. |
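The start and end values can be used to locate each entity in the original text. The sketch below assumes they are character offsets into the input with an exclusive end, which is consistent with the sample values above but not stated explicitly, and that each entry carries an entityGroup as in the sample. The helper name is hypothetical.

```python
def entity_spans(text, entities):
    """Slice each entity out of the text using its start/end offsets."""
    return [(e["entityGroup"], text[e["start"]:e["end"]]) for e in entities]


text = "My name is Omar"
response = {
    "huggingFace": [
        {"score": 0.9990085, "word": "Omar", "start": 11, "end": 15, "entityGroup": "PER"}
    ]
}
spans = entity_spans(text, response["huggingFace"])
```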
Action: question-answering
Processor that answers a question based on given contexts. See Question Answering Task
{
"type": "hugging-face",
"name": "My Question Answering Action",
"config": {
"action": "question-answering",
"model": "deepset/roberta-base-squad2",
"input": "#{ data(\"/httpRequest/body/input\") }",
...
},
"server": <Hugging Face Server>,
...
}
model
-
(Required, String) The model to use for the request
input
-
(Required, Object) The input for the request
Details
question
-
(Required, String) The question to be answered by the model
context
-
(Required, Array of String) The list of contexts to use to answer the question
minScore
-
(Optional, Float) The min score for each answer
options
-
(Optional, Object) The request options
Details
useCache
-
(Optional, Boolean) Whether to cache the results. Useful with deterministic models. Defaults to
true
waitForModel
-
(Optional, Boolean) Whether to wait until the model is ready or not. If false, the response will be 503 - Service Unavailable
Note
The response of the action is stored in the JSON Data Channel as returned by the invoked endpoint:
{
"huggingFace": [
{
"answer": "Clara",
"score": 0.8979613184928894,
"start": 11,
"end": 16
},
{
"answer": "Los Angeles",
"score": 0.013939359225332737,
"start": 20,
"end": 31
},
...
]
}
Note
|
Please note that the responses are sorted in descending order according to their score value. |
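Since the answers arrive sorted by descending score, the first entry is the best candidate. The sketch below is a hypothetical client-side helper that returns that answer only when it clears a threshold. It relies on the documented ordering and does not re-sort.

```python
def best_answer(response, min_score=0.0):
    """Return the top answer text, or None if it scores below min_score."""
    answers = response["huggingFace"]
    if answers and answers[0]["score"] >= min_score:
        return answers[0]["answer"]
    return None


response = {
    "huggingFace": [
        {"answer": "Clara", "score": 0.8979613184928894, "start": 11, "end": 16},
        {"answer": "Los Angeles", "score": 0.013939359225332737, "start": 20, "end": 31},
    ]
}
top = best_answer(response)
```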
Language Detector
The Language Detector component uses Lingua to identify the language of a specified text input. The languages are referenced using ISO-639-1 (alpha-2 code).
Note
|
Each time a language model is referenced, it will be loaded in memory. Loading too many languages increases the risk of high memory consumption issues. |
Processor Action: process
Processor that detects the language of a provided text.
{
"type": "language-detector",
"name": "My Language Detector Processor Action",
"config": {
"action": "process",
...
}
}
text
-
(Required, String) The text to be evaluated.
defaultLanguage
-
(Optional, String) Default language to select in case no other is detected. Defaults to "en".
minDistance
-
(Optional, Double) Distance between the input and the language model. Defaults to 0.0.
supportedLanguages
-
(Optional, Array of Strings) List of languages supported by the detector. Defaults to [ "en" ].
Details
{ "supportedLanguages": [ "en", "pt" ], ... }
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"language": {
...
}
}
Logger
Logs any type of key/value entry assigned to the message configuration at the given level.
Note
|
The log component always returns an empty output in the data of the execution. The result of the component will only be available in the log file of the QueryFlow API. |
Processor Action: process
Processor that logs any key/value entry provided.
{
"type": "logger",
"name": "My Logger Processor Action",
"config": {
"action": "process",
...
}
}
message
-
(Required, String) Message to log.
level
-
(Optional, String) Logging level. One of: INFO, DEBUG or ERROR. Defaults to INFO.
loggerName
-
(Optional, String) Name of the logger.
MongoDB
Uses the MongoDB integration to send requests to the MongoDB server.
Action: aggregate
Processor that runs a configured aggregation pipeline on a MongoDB database.
{
"type": "mongo",
"name": "My MongoDB Processor Action",
"config": {
"action": "aggregate",
"database": "my-database",
"collection": "my-collection",
"stages": [
...
],
...
},
"server": <MongoDB Server>,
...
}
database
-
(Required, String) The database name
collection
-
(Required, String) The collection name
stages
-
(Required, Array of Objects) The list of MongoDB stages
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"mongo": {
...
}
}
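As an illustration of the stages field, the sketch below assembles an aggregate configuration in Python and prints it as JSON. $match, $sort and $limit are standard MongoDB aggregation stages, while the field names status and createdAt are hypothetical. The server reference is left out because it depends on your deployment.

```python
import json

# Standard MongoDB pipeline stages; the document fields are made up.
stages = [
    {"$match": {"status": "published"}},
    {"$sort": {"createdAt": -1}},
    {"$limit": 5},
]

processor = {
    "type": "mongo",
    "name": "My MongoDB Processor Action",
    "config": {
        "action": "aggregate",
        "database": "my-database",
        "collection": "my-collection",
        "stages": stages,
    },
}

print(json.dumps(processor, indent=2))
```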
Action: autocomplete
Processor that uses the autocomplete operator in a compound must clause. Filters are applied in the filter clause.
{
"type": "mongo",
"name": "My MongoDB Processor Action",
"config": {
"action": "autocomplete",
"database": "my-database",
"collection": "my-collection",
"index": "my-index",
"path": "my-field",
"queries": [
...
],
...
},
"server": <MongoDB Server>,
...
}
database
-
(Required, String) The database name
collection
-
(Required, String) The collection name
index
-
(Required, String) The name for the MongoDB full-text search index
path
-
(Required, String) The indexed field to search
queries
-
(Required, Array of Strings) The phrase or phrases to autocomplete
tokenOrder
-
(Optional, String) The order in which the tokens will be searched. One of ANY or SEQUENTIAL
filter
-
(Optional, DSL Filter) The filter to apply
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"mongo": {
...
}
}
Action: search
Processor that uses the text operator in a compound must clause. Filters are applied in the filter clause.
{
"type": "mongo",
"name": "My MongoDB Processor Action",
"config": {
"action": "search",
"database": "my-database",
"collection": "my-collection",
"index": "my-index",
"paths": [
...
],
"queries": [
...
],
...
},
"server": <MongoDB Server>,
...
}
database
-
(Required, String) The database name
collection
-
(Required, String) The collection name
index
-
(Required, String) The name for the MongoDB full-text search index
paths
-
(Required, Array of Strings) The path or paths to the fields to search
queries
-
(Required, Array of Strings) The phrase or phrases to search
pageable
-
(Optional, Object) The pagination object
Details
{ "page": 0, "size": 25, "sort": [ ... ] }
page
-
(Integer) The page number
size
-
(Integer) The size of the page
sort
-
(Array of Objects) The sort definition for the page
Details
{ "property" : "fieldA", "direction" : "ASC" }
property
-
(String) The property where the sort was applied
direction
-
(String) The direction where the sort was applied. Either ASC or DESC
filter
-
(Optional, DSL Filter) The filter to apply
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"mongo": {
...
}
}
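To reason about which results a pageable object selects, the sketch below converts page and size into a skip/limit window. It assumes pages are zero-based, as the "page": 0 sample suggests. How the component translates pagination internally is not documented here, so treat this as an illustration only.

```python
def page_window(pageable):
    """Return (skip, limit) for a pageable object, assuming zero-based pages."""
    page = pageable["page"]
    size = pageable["size"]
    if page < 0 or size <= 0:
        raise ValueError("page must be >= 0 and size must be > 0")
    return page * size, size


pageable = {
    "page": 2,
    "size": 25,
    "sort": [{"property": "fieldA", "direction": "ASC"}],
}
skip, limit = page_window(pageable)
```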
Action: vector
{
"type": "mongo",
"name": "My MongoDB Processor Action",
"config": {
"action": "vector",
"database": "my-database",
"collection": "my-collection",
"index": "my-index",
"vector": "#{ data('my/vector') }",
"path": "my-field",
"k": 5,
"minScore": 0.92,
...
},
"server": <MongoDB Server>,
...
}
database
-
(Required, String) The database name
collection
-
(Required, String) The collection name
index
-
(Required, String) The name for the MongoDB full-text search index
vector
-
(Required, Array of Float) The kNN vector
path
-
(Required, String) The path to search for the vector in the documents
k
-
(Required, Integer) The number of the nearest neighbors to return
minScore
-
(Required, Double) The minimum score for results
filter
-
(Optional, Object) The search operator filter to apply in the query
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"mongo": {
...
}
}
Neo4j
Executes a read query to a Neo4j server to gather search results from it.
Processor Action: process
Processor that executes a query to a Neo4j server.
{
"type": "neo4j",
"name": "My Neo4j Processor Action",
"config": {
"action": "process",
...
},
"server": <Neo4j Server>,
...
}
database
-
(Required, String) The Neo4j database to query
query
-
(Required, String) The query to be executed
parameters
-
(Required, Map of String/Object) Parameters to be used in the query
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"neo4j": {
...
}
}
OpenAI
Uses the OpenAI integration to send requests to OpenAI. Supports multiple actions for different endpoints of the service.
Action: chat-completion
Processor that executes a chat completion request to OpenAI API.
{
"type": "openai",
"name": "My Chat Completion Action",
"config": {
"action": "chat-completion",
"model": "gpt-4",
...
},
"server": <OpenAI Server>,
...
}
model
-
(Required, String) The OpenAI model to use
user
-
(Required, String) The unique identifier representing the end-user
messages
-
(Required, Array of Objects) The list of messages for the request
Details
[ {"role": "system", "content": "You are a helpful assistant" }, {"role": "user", "content": "Hi!" }, {"role": "assistant", "content": "Hi, how can I assist you today?" } ]
role
-
(Required, String) The role of the message. Must be one of system, user or assistant
content
-
(Required, String) The content of the message
name
-
(Optional, String) The name of the author of the message
frequencyPenalty
-
(Optional, Double) Positive values penalize new tokens based on their existing frequency in the text so far. Value must be between -2.0 and 2.0. Defaults to 0.0
presencePenalty
-
(Optional, Double) Positive values penalize new tokens based on whether they appear in the text so far. Value must be between -2.0 and 2.0. Defaults to 0.0
temperature
-
(Optional, Double) Sampling temperature to use. Value must be between 0 and 2. Defaults to 1
Note
|
It is generally recommended to alter this or topP, but not both. |
topP
-
(Optional, Double) An alternative to sampling with temperature, where the model considers the results of the tokens with top_p probability mass. Defaults to
1
Note
|
It is generally recommended to alter this or temperature, but not both. |
n
-
(Optional, Integer) How many chat completion choices to generate for each input message. Defaults to
1
maxTokens
-
(Optional, Integer) The maximum number of tokens to generate in the chat completion. Defaults to
2048
stop
-
(Optional, Array of String) Up to 4 sequences where the API will stop generating further tokens
stream
-
(Optional, Boolean) Whether to enable streaming or not. Defaults to
false
The response of the action is stored in the JSON Data Channel as returned by the invoked endpoint:
For non-streaming response:
{
"openai": {
"created": <Timestamp>,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The response from the model"
},
"finishReason": "stop"
}
],
"model": "gpt-4-0613",
"usage": {
"promptTokens": 34,
"completionTokens": 95,
"totalTokens": 129
}
}
}
For streaming response:
[
{
"name": "openai",
"data": "The"
},
{
"name": "openai",
"data": " response"
},
{
"name": "openai",
"data": " from"
},
{
"name": "openai",
"data": " the"
},
{
"name": "openai",
"data": " model"
},
...
]
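A consumer of the streaming response usually needs to stitch the chunks back together. The sketch below is a hypothetical client-side helper that concatenates the data of every event named openai, in the order received. It assumes the events have been collected into a Python list.

```python
def join_stream(events, name="openai"):
    """Concatenate the data of every streamed event with the given name."""
    return "".join(e["data"] for e in events if e["name"] == name)


events = [
    {"name": "openai", "data": "The"},
    {"name": "openai", "data": " response"},
    {"name": "openai", "data": " from"},
    {"name": "openai", "data": " the"},
    {"name": "openai", "data": " model"},
]
text = join_stream(events)
```

Each chunk carries its own leading whitespace, so the pieces are joined with an empty separator.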
Action: embeddings
Processor that executes an embeddings request to OpenAI API.
{
"type": "openai",
"name": "My OpenAI Embeddings Action",
"config": {
"action": "embeddings",
"model": "text-embedding-ada-002",
...
}
}
model
-
(Required, String) The OpenAI model to use
user
-
(Required, String) The unique identifier representing the end-user
input
-
(Required, Array of Strings) The list of input texts to be processed
The response of the action is stored in the JSON Data Channel as returned by the invoked endpoint:
{
"openai": {
"embeddings": [
{
"embedding": [ ... ],
"index": 0
},
{
"embedding": [ ... ],
"index": 1
}
],
"model": "text-embedding-ada-002-v2",
"usage": {
"promptTokens": 4,
"totalTokens": 4
}
}
}
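Each embedding carries an index field. The sketch below uses it to return the vectors in a stable order, assuming the index refers to the position of the corresponding text in the input array, as it does in the OpenAI API. The helper name and the short sample vectors are made up for illustration.

```python
def vectors_in_input_order(response):
    """Return the embedding vectors sorted by their index field."""
    items = sorted(response["openai"]["embeddings"], key=lambda e: e["index"])
    return [item["embedding"] for item in items]


# Deliberately out of order to show the effect of sorting.
response = {
    "openai": {
        "embeddings": [
            {"embedding": [0.3, 0.4], "index": 1},
            {"embedding": [0.1, 0.2], "index": 0},
        ]
    }
}
vectors = vectors_in_input_order(response)
```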
Opensearch
Uses the Opensearch integration to send requests to the Opensearch API. It supports multiple actions for common operations such as search, but also provides a mechanism to send raw OpenSearch queries.
Action: autocomplete
Processor that executes a completion suggester query.
{
"type": "opensearch",
"name": "My Opensearch Processor Action",
"config": {
"action": "autocomplete",
"index": "my-index",
"text": "#{ data('my/query') }",
"field": "content",
...
},
"server": <Opensearch Server>,
...
}
index
-
(Required, String) The index where to search
text
-
(Required, String) The text to autocomplete
field
-
(Required, String) The field where to search
skipDuplicates
-
(Optional, Boolean) Whether to skip duplicate suggestions
size
-
(Optional, Integer) The amount of suggestions
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"opensearch": {
...
}
}
Action: fetch
Processor that executes a GET request to retrieve a specified JSON document from an index.
{
"type": "opensearch",
"name": "My Opensearch Processor Action",
"config": {
"action": "fetch",
"index": "my-index",
"id": "document-ID",
...
},
"server": <Opensearch Server>,
...
}
index
-
(Required, String) The index where to search
id
-
(Required, String) The ID of the document
fields
-
(Optional, Projection) The source fields to be included or excluded
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"opensearch": {
...
}
}
Action: knn
Processor that executes an Approximate k-NN query.
{
"type": "opensearch",
"name": "My Opensearch Processor Action",
"config": {
"action": "knn",
"index": "my-index",
"field": "vector-field",
"vector": "#{ data('my/vector') }",
"minScore": 0.92,
"maxResults": 5,
"k": 5,
...
},
"server": <Opensearch Server>,
...
}
index
-
(Required, String) The index where to search
field
-
(Required, String) The field with the vector
vector
-
(Required, Array of Float) The source vector to compare
minScore
-
(Required, Double) The minimum score for results
maxResults
-
(Required, Integer) The maximum number of results
k
-
(Required, Integer) The number of neighbours the search of each graph will return
query
-
(Optional, Object) The query to filter in addition to the kNN search
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"opensearch": {
...
}
}
Action: native
Processor that executes a native Opensearch query.
{
"type": "opensearch",
"name": "My Opensearch Processor Action",
"config": {
"action": "native",
"path": "/my-index/_doc/1",
"method": "GET",
...
},
"server": <Opensearch Server>,
...
}
path
-
(Required, String) The endpoint of the request, excluding schema, host, port and any path included as part of the connection
method
-
(Required, String) The HTTP method for the request
queryParams
-
(Optional, Map of String/String) The map of query parameters for the URL
body
-
(Optional, Object) The JSON body to submit
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"opensearch": {
...
}
}
Action: search
Processor that executes a match query on the index.
{
"type": "opensearch",
"name": "My Opensearch Processor Action",
"config": {
"action": "search",
"index": "my-index",
"text": "#{ data('my/query') }",
"field": "content",
...
},
"server": <Opensearch Server>,
...
}
index
-
(Required, String) The index where to search
text
-
(Required, String) The text to search
field
-
(Required, String) The field where to search
suggest
-
(Optional, Object) The suggester to apply
aggregations
-
(Optional, Map of String/Object) The field with the aggregations to apply
filter
-
(Optional, DSL Filter) The filters to apply
highlight
-
(Optional, Object) The highlighter to apply
pageable
-
(Optional, Pagination) The pagination parameters
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"opensearch": {
...
}
}
Action: store
Processor that stores or updates documents in the given index of Opensearch.
{
"type": "opensearch",
"name": "My Opensearch Processor Action",
"config": {
"action": "store",
"index": "my-index",
"id": "document-id",
"document": {
...
},
...
},
"server": <Opensearch Server>,
...
}
index
-
(Required, String) The index where to store the document
id
-
(Required, String) The ID of the document to be stored.
document
-
(Required, Object) The document to be stored
allowOverride
-
(Optional, Boolean) Whether the document can be overridden or not. Defaults to
false
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"opensearch": {
...
}
}
Action: vector
Processor that executes an Exact kNN with scoring script query.
{
"type": "opensearch",
"name": "My Opensearch Processor Action",
"config": {
"action": "vector",
"index": "my-index",
"field": "my_vector_field",
"vector": "#{ data('my/vector') }",
"minScore": 0.92,
"maxResults": 5,
"query": {
...
},
...
},
"server": <Opensearch Server>,
...
}
index
-
(Required, String) The index where to search
field
-
(Required, String) The field with the vector
vector
-
(Required, Array of Float) The source vector to compare
minScore
-
(Required, Double) The minimum score for results
maxResults
-
(Required, Integer) The maximum number of results
function
-
(Required, String) The function used for the k-NN calculation. The available functions can be found here
query
-
(Optional, Object) The query to apply together with the vector search
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"opensearch": {
...
}
}
Question Detector
The Question Detector component validates whether an input text contains a question. Language codes are referenced using ISO-639-1 (alpha-2 code).
Processor Action: process
Processor that detects if the provided text is a question.
{
"type": "question-detector",
"name": "My Question Detector Processor Action",
"config": {
"action": "process",
...
}
}
text
-
(Required, String) The text to be evaluated.
language
-
(Required, String) The language to use.
questionPrefixes
-
(Optional, Map of String/List) Words that indicate a question. Defaults to
{ "en": [ "what", "who", "why", "where", "when", "how" ] }
Details
{ "questionPrefixes": { "es": [ "que", "quien", "porque", "donde", "cuando", "como" ] }, ... }
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"isQuestion": {
...
}
}
Script
Uses the Script Engine to execute a script for advanced handling of the execution data. Supports multiple scripting languages and provides tools for JSON manipulation and for logging.
Processor Action: process
Processor that executes a script to process and interact with data produced in previous states.
{
"type": "script",
"name": "My Script Processor Action",
"config": {
"action": "process",
"script": <Script>,
...
}
}
language
-
(Optional, String) The language of the script. One of the supported script languages. Defaults to
groovy
script
-
(Required, String) The script to run
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"script": {
...
}
}
Facet Snap
Tries to snap facet values based on a list of tokens extracted from the user query. These facet snaps are returned as a Filter (see Filters DSL) that can later be used as clauses in the query sent to the search engine.
Action: filter
Processor that creates a filter based on the facet values snapped using the query tokens provided as input.
{
"type": "snap",
"name": "My Snap Filter Action",
"config": {
"action": "filter",
"query": "#{ data(\"/httpRequest/queryParams/q\") }",
"tokens": "#{ data(\"/tokens\") }",
...
},
...
}
tokens
-
(Required, Array of Strings) The list of tokens to snap to
query
-
(Required, String) The search query to use
facetStore
-
(Required, String) The Discovery Staging bucket to get the facets from
Details
The facets stored on the bucket are expected to have the following format:
{ "name": "name", "value": "value", "properties": {} }
name
-
(Required, String) The name of the facet
value
-
(Required, String) The value of the facet
properties
-
(Optional, Object) The facet properties. Useful to store additional information for the facet.
snapMode
-
(Optional, String) The mode to compare facets when snapping
Details
QUERY
: The facets will be matched against the input query text
TOKENS
: The facets will be matched against the input tokens, separated by whitespace. This is useful if you are applying any processing to the tokens
includeFacets
-
(Optional, Array of Strings) A list of facets to include when snapping
excludeFacets
-
(Optional, Array of Strings) A list of facets to ignore when snapping
matchAllFacets
-
(Optional, Boolean) If true, the returned Filter will match all facet fields using "and". If false, the returned Filter will match any facet field using "or". Defaults to false
greedyMatch
-
(Optional, Boolean) If true, snap to the biggest possible facet for each token only, preventing any overlap between matches. If false, snap to every possible facet for each token, allowing overlapped matches. Defaults to false
maxDisambiguateOffset
-
(Optional, Integer) The maximum offset size to check when disambiguating. If -1, checks all tokens available on both sides. Defaults to -1
Tip
|
Input tokens for this action can be retrieved using the Tokenizer component. |
Tip
|
For faster query responses from the facet store, create indices for both name and value fields.
|
The response of the action is stored in the JSON Data Channel and besides outputting the filter, the Snap Filter action also provides the snapped facet objects and query ngrams that matched them, for later use as input on other actions.
{
"snap": {
"snappedFacets": [
{
"facet": { "name": "brand", "value": "nike", "properties": { "code": 123 } },
"ngram": {
"value": "nike",
"offset": { "start": 7, "end": 11 },
"tokens": [
{ "term": "nike", "offset": { "start": 7, "end": 11 } }
]
}
},
{
"facet": { "name": "size", "value": "7" },
"ngram": {
"value": "7",
"offset": { "start": 5, "end": 6 },
"tokens": [
{ "term": "size", "offset": { "start": 0, "end": 4 } },
{ "term": "7", "offset": { "start": 5, "end": 6 } }
]
}
}
],
"filter": {
"or": [
{ "in": { "field": "size", "values": [ "7" ] } },
{ "in": { "field": "brand", "values": [ "nike" ] } }
]
}
}
}
Note
|
The resulting snapped facets are ordered by ngram size, descending. If two ngrams have the same number of tokens they are ordered by appearance. |
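The relationship between snappedFacets, matchAllFacets and the resulting filter can be sketched as follows. This is a minimal illustration, assuming that values are grouped per facet name into "in" clauses and that the clauses are wrapped in "and" or "or"; the function name build_filter is hypothetical and the clause ordering of the real action may differ.

```python
# Hypothetical sketch: derive a Filter-like structure from snapped facets.
def build_filter(snapped_facets, match_all_facets=False):
    """Group snapped facet values by facet name and combine them."""
    values_by_field = {}
    for snapped in snapped_facets:
        facet = snapped["facet"]
        values = values_by_field.setdefault(facet["name"], [])
        if facet["value"] not in values:  # avoid repeating a value
            values.append(facet["value"])
    clauses = [
        {"in": {"field": field, "values": values}}
        for field, values in values_by_field.items()
    ]
    # matchAllFacets=true -> "and"; matchAllFacets=false -> "or"
    operator = "and" if match_all_facets else "or"
    return {operator: clauses}
```

Called with the two snapped facets from the response above (brand/nike and size/7), the sketch yields an "or" filter with one "in" clause per facet field.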
Action: mask
Processor that creates a masked query based on the snap results of the Snap Filter Action. It replaces facet matches (both name and value) with a given map of facet masks in the input query.
{
"type": "snap",
"name": "My Snap Mask Filter Action",
"config": {
"action": "mask",
"query": "#{ data(\"/httpRequest/queryParams/q\") }",
"tokens": "#{ data(\"/tokens\") }",
"snappedFacets": "#{ data(\"/snap/snappedFacets\") }",
...
},
...
}
Note
|
Note that snappedFacets reads from the output of a previously executed Snap Filter Action.
|
tokens
-
(Required, Array of Strings) The list of tokens to snap to
query
-
(Required, String) The search query to use
snappedFacets
-
(Required, List of Objects) The list of facets that matched an ngram value
Details
[ { "facet": { "name": "name", "value": "value", "properties": {} }, "ngram": { "value": "value", "offset": { "start": 0, "end": 7 }, "tokens": [ { "term": "term", "offset": { "start": 0, "end": 7 } }, ... ] } } ]
facet
-
(Required, Object) The snapped facet value
Details
name
-
(Required, String) The name of the facet
value
-
(Required, String) The value of the facet
properties
-
(Optional, Object) The facet properties. Useful to store additional information for the facet.
ngram
-
(Required, Object) The snapped ngram value
Details
value
-
(Required, String) The ngram value
offset
-
(Required, Object) The ngram query offset
Details
start
-
(Required, Integer) The offset start index
end
-
(Required, Integer) The offset end index
tokens
-
(Required, Array of Objects) The tokens that are part of the ngram
Details
term
-
(Required, String) The term for this token
offset
-
(Required, Object) The token offset
Details
start
-
(Required, Integer) The offset start index
end
-
(Required, Integer) The offset end index
entityMasks
-
(Optional, Object) Masks to apply to the given facets
Details
Each entry of the entityMasks object must be a pair of strings:
{ "size": "[SIZE]", "brand": "[BRAND]", ... }
The response of the action is stored in the JSON Data Channel as:
{
"snap": "[SIZE] [BRAND] sneakers"
}
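One way to picture the masking step is to replace, in the original query, the character span covered by each snapped ngram with the mask configured for its facet name. The sketch below is an assumption-based illustration: the function name mask_query is hypothetical, offsets are treated as start-inclusive and end-exclusive (as in the sample response of the Snap Filter Action), and overlapping matches are not handled.

```python
# Hypothetical sketch: mask snapped facets in the query using token offsets.
def mask_query(query, snapped_facets, entity_masks):
    """Replace each snapped ngram span with the mask of its facet name."""
    spans = []
    for snapped in snapped_facets:
        mask = entity_masks.get(snapped["facet"]["name"])
        if mask is None:
            continue  # no mask configured for this facet: leave text as is
        offsets = [token["offset"] for token in snapped["ngram"]["tokens"]]
        start = min(offset["start"] for offset in offsets)
        end = max(offset["end"] for offset in offsets)
        spans.append((start, end, mask))
    # Replace from right to left so earlier offsets remain valid.
    for start, end, mask in sorted(spans, reverse=True):
        query = query[:start] + mask + query[end:]
    return query
```

For the query "size 7 nike sneakers" and the masks { "size": "[SIZE]", "brand": "[BRAND]" }, this produces the masked query shown in the response above.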
Action: clear
Processor that creates a simplified query based on the snap results of the Snap Filter Action. It removes facet matches (both name and value) from the input tokens and joins the remaining tokens with whitespace.
{
"type": "snap",
"name": "My Snap Clear Action",
"config": {
"action": "clear",
"tokens": "#{ data(\"/tokens\") }",
"snappedFacets": "#{ data(\"/snap/snappedFacets\") }"
}
}
Note
|
Note that snappedFacets reads from the output of a previously executed Snap Filter Action.
|
tokens
-
(Required, Array of Strings) The list of tokens to snap to
snappedFacets
-
(Required, List of Objects) The list of facets that matched an ngram value
Details
[ { "facet": { "name": "name", "value": "value", "properties": {} }, "ngram": { "value": "value", "offset": { "start": 0, "end": 7 }, "tokens": [ { "term": "term", "offset": { "start": 0, "end": 7 } }, ... ] } } ]
facet
-
(Required, Object) The snapped facet value
Details
name
-
(Required, String) The name of the facet
value
-
(Required, String) The value of the facet
properties
-
(Optional, Object) The facet properties. Useful to store additional information for the facet.
ngram
-
(Required, Object) The snapped ngram value
Details
value
-
(Required, String) The ngram value
offset
-
(Required, Object) The ngram query offset
Details
start
-
(Required, Integer) The offset start index
end
-
(Required, Integer) The offset end index
tokens
-
(Required, Array of Objects) The tokens that are part of the ngram
Details
term
-
(Required, String) The term for this token
offset
-
(Required, Object) The token offset
Details
start
-
(Required, Integer) The offset start index
end
-
(Required, Integer) The offset end index
The response of the action is stored in the JSON Data Channel as:
{
"snap": "sneakers"
}
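The clearing step can be pictured as dropping every input token that belongs to a snapped ngram and joining what is left with whitespace. The sketch below is only an illustration: the function name clear_query is hypothetical, and matching tokens by their term (rather than by position) is an assumption.

```python
# Hypothetical sketch: remove snapped tokens and join the remaining ones.
def clear_query(tokens, snapped_facets):
    """Return the tokens not covered by any snapped ngram, space-joined."""
    matched_terms = {
        token["term"]
        for snapped in snapped_facets
        for token in snapped["ngram"]["tokens"]
    }
    return " ".join(token for token in tokens if token not in matched_terms)
```

With the tokens ["size", "7", "nike", "sneakers"] and the snapped facets from the Snap Filter example, only "sneakers" remains.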
Solr
Action: native
Processor that executes a native indexing query.
{
"type": "solr",
"name": "My Solr Processor Action",
"config": {
"action": "native",
"path": "/select",
"method": "POST",
"queryParams": {
"q": "description:Pureinsights"
},
...
},
"server": <Solr Server>,
...
}
path
-
(Required, String) The Solr operation path to be used for the request
method
-
(Required, String) The HTTP method for the request
queryParams
-
(Required, Map of String/String) The map of query parameters for the request
body
-
(Optional, Object) The JSON body to submit for the request
maxResponseMapDepth
-
(Optional, Integer) The maximum depth for response object deserialization. Defaults to
5
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"solr": {
...
}
}
Action: search
Processor that executes a standard search query.
{
"type": "solr",
"name": "My Solr Processor Action",
"config": {
"action": "search",
"query": "#{ data('my/query') }",
...
},
"server": <Solr Server>,
...
}
query
-
(Required, String) The search query to be executed
fields
-
(Optional, Array of Strings) The optional returned fields of the document. If not set, all the fields in the document are returned
highlight
-
(Optional, Boolean) Whether to enable highlighting in the resulting query or not
filterQueries
-
(Optional, String) The filter queries to be applied to the search
maxResponseMapDepth
-
(Optional, Integer) The maximum depth for response object deserialization. Defaults to
5
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"solr": {
...
}
}
Staging
Interacts with buckets and content from Discovery Staging.
Action: fetch
Gets a document from the given bucket.
{
"type": "staging",
"name": "My Staging Processor Action",
"config": {
"action": "fetch",
"bucket": "my-bucket",
"id": "my-document-id",
...
}
}
bucket
-
(Required, String) The bucket name
id
-
(Required, String) The ID of the document to fetch
fields
-
(Optional, Projection) The projection to apply on the document
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"staging": {
...
}
}
Action: store
Stores a document into the given bucket.
{
"type": "staging",
"name": "My Staging Processor Action",
"config": {
"action": "store",
"bucket": "my-bucket",
"document": {
...
},
...
}
}
bucket
-
(Required, String) The bucket name
document
-
(Required, Object) The document to store
id
-
(Optional, String) The ID of the document to store. If not provided, a random UUID will be used
allowOverride
-
(Optional, Boolean) Whether to allow overriding an existing document. Defaults to
false
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"staging": {
...
}
}
Action: search
Searches for documents in the given bucket.
{
"type": "staging",
"name": "My Staging Processor Action",
"config": {
"action": "search",
"bucket": "my-bucket",
...
}
}
bucket
-
(Required, String) The bucket name
actions
-
(Required, Array of Strings) The actions to filter the documents. Defaults to
STORE
projection
-
(Optional, Projection) The projection to apply on the search
filter
-
(Optional, DSL Filter) The filter to apply on the search
parentId
-
(Optional, String) The parent ID to match
pageable
-
(Optional, Object) The pagination object
Details
{ "page": 0, "size": 25, "sort": [ ... ] }
page
-
(Integer) The page number
size
-
(Integer) The size of the page
sort
-
(Array of Objects) The sort definition for the page
Details
{ "property" : "fieldA", "direction" : "ASC" }
property
-
(String) The property where the sort was applied
direction
-
(String) The direction where the sort was applied. Either ASC or DESC
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"staging": {
...
}
}
Template
Uses the Template Engine to transform a standard template with contextual structured data, generating a verbalized representation of the information. It can generate various types of documents as either plain text or JSON.
Processor Action: process
Processor that processes the provided template with the defined configuration.
{
"type": "template",
"name": "My Template Processor Action",
"config": {
"action": "process",
...
}
}
template
-
(Required, String) The template to process
bindings
-
(Required, Object) The bindings to replace in the template
Details
{ "bindingA": "#{ data('/my/binding/field') }", ... }
Can be later referenced in a template:
My bindingA value is ${bindingA}
outputFormat
-
(Optional, String) The output format of the processed template. Supported formats are: JSON and PLAIN. Defaults to PLAIN
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"template": {
...
}
}
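The ${bindingA} placeholder syntax used for bindings can be illustrated with Python's standard string.Template class, which happens to use the same notation. This is only an analogy for how bindings are substituted into a template; it is not the Template Engine used by Discovery, and the render helper is a hypothetical name.

```python
# Illustrative analogy only: ${name} substitution with the Python stdlib.
from string import Template


def render(template, bindings):
    """Replace every ${name} placeholder with its value from `bindings`."""
    # substitute() raises KeyError if a placeholder has no binding.
    return Template(template).substitute(bindings)


print(render("My bindingA value is ${bindingA}", {"bindingA": "42"}))
```

In a real action the binding value would typically come from an expression such as #{ data('/my/binding/field') } rather than a literal.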
Tokenizer
Tokenizes a specified text input using Lucene. Supported analyzers:
Note
|
All analyzers (except the custom that needs the tokenizer configuration) will be used as they are built by default if no configuration is specified. Further configurations under the field |
Note
|
Currently, the custom analyzer does not support parameters that require a file name for certain filters. For example, the stop filter, which expects an external file to specify the stop words, is not yet supported. |
Processor Action: process
Processor that tokenizes any entry provided.
{
"type": "tokenizer",
"name": "My Tokenizer Processor Action",
"config": {
"action": "process",
...
}
}
analyzer
-
(Optional, String or Map of String/Object) The analyzer to use for the tokenization. Defaults to standard.
Details
Custom analyzer configuration
tokenizer
-
(Required, String or Map of String/Object) Tokenizer for the custom analyzer. Params of the tokenizer can be configured.
Details
{ "analyzer": { "type": "custom", "tokenizer": "whitespace", ... }, ... }
{ "analyzer": { "type": "custom", "tokenizer": { "type": "standard", "maxTokenLength": 4 }, ... }, ... }
filters
-
(Optional, List of Objects) List of filters to be applied. Params of the token filters can be configured.
Details
{ "analyzer": { "type": "custom", "filters": [ "lowercase" ], ... }, ... }
{ "analyzer": { "type": "custom", "filters": [ "lowercase", { "type": "edgeNgram", "minGramSize": 2, "maxGramSize": 3 } ], ... }, ... }
Language analyzers configuration
stopwords
-
(Optional, List of Strings or Map of String/Object) A set of common words usually not useful for search.
Details
{ "analyzer": { "type": "english", "stopwords":{ "tokens": [ "the" ], "ignoreCase": true }, ... }, ... }
{ "analyzer": { "type": "french", "stopwords": [ "va" ], ... }, ... }
stemExclusion
-
(Optional, List of Strings or Map of String/Object) A set of words to not be stemmed.
Details
{ "analyzer": { "type": "spanish", "stemExclusion":{ "tokens": [ "nunca" ], "ignoreCase": true }, ... }, ... }
{ "analyzer": { "type": "english", "stemExclusion": [ "quick" ], ... }, ... }
Standard analyzer configuration
maxTokenLength
-
(Optional, Int) The maximum token length the analyzer will emit. Defaults to 255.
stopwords
-
(Optional, List of Strings or Map of String/Object) A set of common words usually not useful for search.
Details
{ "analyzer": { "type": "standard", "stopwords":{ "tokens": [ "the" ], "ignoreCase": true }, ... }, ... }
{ "analyzer": { "type": "standard", "stopwords": [ "va" ], ... }, ... }
Whitespace analyzer configuration
maxTokenLength
-
(Optional, Int) The maximum token length the analyzer will emit. Defaults to 255.
attributes
-
(Optional, List of strings) The attributes to include with each token. Supports term (adds the token itself) and offset (adds the relative start and end position of the token in the input text). Defaults to ["term", "term"]
Note
|
The attributes are added to the configuration as a list, all of those included will be added to the output. That list, if specified, cannot be empty. |
text
-
(Required, String) The text to tokenize.
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"tokens": {
...
}
}
Examples
{
"type": "tokenizer",
"name": "My Tokenizer Processor Action",
"config": {
"action": "process",
"text": "#{ data(\"/httpRequest/queryParams/q\") }"
}
}
{
"type": "tokenizer",
"name": "My Tokenizer Processor Action",
"config": {
"action": "process",
"analyzer": "whitespace",
"attributes": [
"term"
],
"text": "#{ data(\"/httpRequest/body/custom/field\") }"
}
}
{
"type": "tokenizer",
"name": "My Tokenizer Processor Action",
"config": {
"action": "process",
"analyzer": {
"type": "english",
"stopwords":{
"tokens": [
"the"
],
"ignoreCase": true
},
"stemExclusion": [
"quick"
]
},
"attributes": [
"term"
],
"text": "#{ data(\"/httpRequest/queryParams/q\") }"
}
}
{
"type": "tokenizer",
"name": "My Tokenizer Processor Action",
"config": {
"action": "process",
"analyzer": {
"type": "whitespace",
"maxTokenLength": 4
},
"attributes": [
"term"
],
"text": "#{ data(\"/httpRequest/queryParams/q\") }"
}
}
{
"type": "tokenizer",
"name": "My Tokenizer Processor Action",
"config": {
"action": "process",
"text": "Hi, my cat is INJURED in its paw.",
"analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filters": [
"lowercase"
]
},
"attributes": [
"term"
]
}
}
{
"type": "tokenizer",
"name": "My Tokenizer Processor Action",
"config": {
"action": "process",
"text": "Hi, my cat is INJURED in its paw.",
"analyzer": {
"type": "custom",
"tokenizer": {
"type": "standard",
"maxTokenLength": 4
},
"filters": [
"lowercase",
{
"type": "edgeNgram",
"minGramSize": 2,
"maxGramSize": 3
}
]
},
"attributes": [
"term"
]
}
}
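To make the term and offset attributes more concrete, the sketch below tokenizes a text on whitespace and attaches the requested attributes to each token. It is a plain-Python approximation and not the Lucene-based implementation: the function name whitespace_tokenize and the exact output shape are assumptions, modelled on the token objects shown in the Facet Snap response.

```python
# Hypothetical sketch: whitespace tokenization with term/offset attributes.
import re


def whitespace_tokenize(text, attributes=("term", "offset")):
    """Split `text` on whitespace and describe each token."""
    if not attributes:
        # The attribute list, if specified, cannot be empty.
        raise ValueError("attributes cannot be empty")
    tokens = []
    for match in re.finditer(r"\S+", text):
        token = {}
        if "term" in attributes:
            token["term"] = match.group()
        if "offset" in attributes:
            # start is inclusive, end is exclusive
            token["offset"] = {"start": match.start(), "end": match.end()}
        tokens.append(token)
    return tokens
```

For the text "size 7 nike", the second token would carry the term "7" with start 5 and end 6, which matches the offsets used in the Snap Filter example.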
Vespa
Action: native
Processor that executes an HTTP request to a Vespa service.
{
"type": "vespa",
"name": "My Vespa Native Action",
"config": {
"action": "native",
"method": "GET",
"path": "/state/v1/health",
...
},
"server": <Vespa Server>
}
method
-
(Required, String) The HTTP method for the request
path
-
(Required, String) The endpoint of the request, excluding schema, host, port and any path included as part of the connection
queryParams
-
(Optional, Map of String/String) The map of query parameters for the URL
body
-
(Optional, Object) The JSON body to submit
The response of the action is stored in the JSON Data Channel as returned by the invoked API:
{
"vespa": {
...
}
}
Voyage AI
Uses the Voyage AI integration to send requests to the Voyage AI API. Supports multiple actions for different endpoints of the service.
Action: reranking
Processor that, given a query and a list of documents, returns the relevance ranking of the documents with respect to the query. See Voyage AI Rerankers and the API Rerankers endpoint.
{
"type": "voyage-ai",
"name": "My Reranking Action",
"config": {
"action": "reranking",
"model": "rerank-lite-1",
"query": "Sample query",
"documents": ["Sample document 1", "Sample document 2"],
...
},
"server": <Voyage AI Server>,
...
}
model
-
(Required, String) The model to use for the request. See models.
query
-
(Required, String) The query as a string.
documents
-
(Required, List of Strings) Documents to be reranked, as a list of strings.
truncation
-
(Optional, Boolean) Whether to truncate the input to satisfy the context length limit on the query and the documents. Defaults to true.
topK
-
(Optional, Integer) The number of most relevant documents to return.
returnDocuments
-
(Optional, Boolean) Whether to return the documents in the response. Defaults to false.
Action: embeddings
Processor that, given an input string (or a list of strings) and other arguments such as the preferred model name, returns a response containing a list of embeddings. See Voyage AI Embeddings and the API Text embedding models endpoint.
{
"type": "voyage-ai",
"name": "My Embeddings Action",
"config": {
"action": "embeddings",
"model": "voyage-large-2",
"input": ["Sample text 1", "Sample text 2"],
...
},
"server": <Voyage AI Server>,
...
}
model
-
(Required, String) The model to use for the request. See models.
input
-
(Required, String or List of Strings) Documents to be embedded.
truncation
-
(Optional, Boolean) Whether to truncate the input texts to fit within the context length. Defaults to true.
inputType
-
(Optional, String) Type of the input text. One of: QUERY or DOCUMENT. Defaults to null.
outputDimension
-
(Optional, Integer) The number of dimensions for resulting output embeddings. Defaults to null.
outputDatatype
-
(Optional, String) The data type for the embeddings to be returned. One of: FLOAT, INT8, UINT8, BINARY or UBINARY. Defaults to FLOAT.
encodingFormat
-
(Optional, String) Format in which the embeddings are encoded. One of: base64. Defaults to null.
Action: multimodal-embeddings
Processor that, given a list of multimodal inputs consisting of text, images, or an interleaving of both modalities, and other arguments such as the preferred model name, returns a response containing a list of embeddings. See Voyage AI Multimodal Embedding and the API Text multimodal embedding models endpoint.
{
"type": "voyage-ai",
"name": "My Multimodal Embeddings Action",
"config": {
"action": "multimodal-embeddings",
"model": "voyage-multimodal-3",
"inputs": [
{
"content": [
{
"type": "text",
"text": "This is a banana."
},
{
"type": "image_url",
"imageUrl": "https://myimageurl.com"
},
...
]
},
...
],
...
},
"server": <Voyage AI Server>,
...
}
model
-
(Required, String) The model to use for the request. See models.
inputs
-
(Required, List of Objects) A list of multimodal inputs to be vectorized.
Details
type
-
(Required, String) The type. One of: text, image_url or image_base64.
text
-
(Optional, String) The text if the type text is chosen.
Details
{ "type": "text", "text": "This is a banana." }
imageUrl
-
(Optional, String) The image URL if the type image_url is chosen.
Details
{ "type": "image_url", "imageUrl": "https://raw.githubusercontent.com/voyage-ai/voyage-multimodal-3/refs/heads/main/images/banana.jpg" }
imageBase64
-
(Optional, Object) The Base64-encoded image if the type image_base64 is chosen.
Details
{ "type": "image_base64", "imageBase64": { "mediaType": "image/jpeg", "base64": true, "data": "/9j/4AAQSkZJRgABAQEAYABgAAD(...)" } }
mediaType
-
(Required, String) The data media type. Supported media types are: image/png, image/jpeg, image/webp, and image/gif.
base64
-
(Required, Boolean) Whether the data is encoded in Base64.
data
-
(Required, String) The data itself.
truncation
-
(Optional, Boolean) Whether to truncate the inputs to fit within the context length. Defaults to true.
inputType
-
(Optional, String) Type of the input text. One of: QUERY or DOCUMENT. Defaults to null.
outputEncoding
-
(Optional, String) Format in which the embeddings are encoded. One of: base64. Defaults to null.
Labeling a Configuration
Labels API
$ curl --request POST 'core-api:8080/v2/label' --data '{ ... }'
Body
{
"key": "My Label Key",
"value": "My Label Value"
}
key
-
(Required, String) The key of the label
value
-
(Required, String) The value of the label
$ curl --request GET 'core-api:8080/v2/label?page={page}&size={size}&sort={sort}'
Query Parameters
page
-
(Optional, Int) The page number. Defaults to 0.
size
-
(Optional, Int) The size of the page. Defaults to 25.
sort
-
(Optional, Array of String) The sort definition for the page.
$ curl --request GET 'core-api:8080/v2/label/{id}'
Path Parameters
id
-
(Required, String) The label ID.
$ curl --request PUT 'core-api:8080/v2/label/{id}' --data '{ ... }'
Path Parameters
id
-
(Required, String) The label ID.
Body
{
"key": "My Label Key",
"value": "My Label Value"
}
key
-
(Required, String) The key of the label
value
-
(Required, String) The value of the label
$ curl --request DELETE 'core-api:8080/v2/label/{id}'
Path Parameters
id
-
(Required, String) The label ID.
Note
|
Both |
Labels are simple key/value pairs that can help to reference user configurations. Any configuration can be tagged with labels either previously created in here, or during the CRUD process of the entity itself. Labels are limited to 45 characters max, for both key and value.
Note
|
When creating multiple labels during the CRUD process of other entities (e.g. a server or a credential), duplicates will be ignored. |
To create a new label directly from an entity configuration, the following property must be included as part of the body payload:
{
"labels": [
{
"key": "My Label Key",
"value": "My Label Value"
},
...
],
...
}
key
-
(Required, String) The key of the label
value
-
(Required, String) The value of the label
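The two label rules stated above (a maximum of 45 characters for both key and value, and duplicates being ignored when several labels are created together) can be sketched as a small client-side check. The helper name normalize_labels is hypothetical, and treating a duplicate as "same key and same value" is an assumption.

```python
# Hypothetical sketch: validate and de-duplicate labels before sending them.
MAX_LABEL_LENGTH = 45  # documented limit for both key and value


def normalize_labels(labels):
    """Reject oversized labels and drop repeated key/value pairs."""
    seen = set()
    unique = []
    for label in labels:
        key, value = label["key"], label["value"]
        if len(key) > MAX_LABEL_LENGTH or len(value) > MAX_LABEL_LENGTH:
            raise ValueError(f"Label {key!r} exceeds {MAX_LABEL_LENGTH} characters")
        if (key, value) in seen:
            continue  # duplicates are ignored
        seen.add((key, value))
        unique.append({"key": key, "value": value})
    return unique
```

The returned list could then be used as the "labels" property of an entity body.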
Backup & Restore
Core Backup API
$ curl --request GET 'core-api:8080/v2/export'
$ curl --request POST 'core-api:8080/v2/import?onConflict={onConflict}' --form 'file=@/../../export-20240319T1030.zip'
Query Parameters
onConflict
-
(Optional, String) The action to execute when there is a conflict with imported entities. Defaults to FAIL. Supported actions are: IGNORE, UPDATE and FAIL.
Queryflow Backup API
$ curl --request GET 'queryflow-api:8088/v2/export'
$ curl --request POST 'queryflow-api:8088/v2/import?onConflict={onConflict}' --form 'file=@/../../export-20240319T1030.zip'
Query Parameters
onConflict
-
(Optional, String) The action to execute when there is a conflict with imported entities. Defaults to FAIL. Supported actions are: IGNORE, UPDATE and FAIL.
Ingestion Backup API
$ curl --request GET 'ingestion-api:8080/v2/export'
$ curl --request POST 'ingestion-api:8080/v2/import?onConflict={onConflict}' --form 'file=@/../../export-20240319T1030.zip'
Query Parameters
onConflict
-
(Optional, String) The action to execute when there is a conflict with imported entities. Defaults to FAIL. Supported actions are: IGNORE, UPDATE and FAIL.
Each product (Core, QueryFlow, and Ingestion) has its own backup and restore API. The entity distribution is as follows:
Note
|
Labels are skipped as they will be handled during the creation of other entities. |
Note
|
Secrets are not part of this process due to security reasons. All credentials assume their referenced secret currently exists or will be created by different means. |
The backup and restore of the entities is done through a single export-{timestamp}.zip ZIP file that contains a Newline Delimited JSON (ndjson) file per entity type. Each configuration is exported in the correct order, so it can be imported back without missing-dependency problems.
Note
|
Manual modifications of the exported file might corrupt the backup. |
Since the ID of each exported entity is expected to remain the same after importing, conflicts might arise. The restore process has 3 different resolution strategies:
-
IGNORE: The input entity will be ignored, keeping the existing one unchanged.
-
UPDATE: The current entity will be updated with the input entity values.
-
FAIL: The current entity will not be modified, and an error will be thrown.
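The effect of the conflict resolution strategies on a restore can be sketched with a small in-memory model. This is an assumption-heavy illustration, not the restore implementation: entities are plain dictionaries keyed by "id", UPDATE is modelled as a field-by-field overwrite, and any value other than IGNORE or UPDATE is treated as the error-raising strategy (the documented default, FAIL). The function name restore is hypothetical.

```python
# Hypothetical sketch: applying onConflict while importing entities.
def restore(existing, incoming, on_conflict="FAIL"):
    """Merge `incoming` entities into a copy of `existing` (id -> entity)."""
    restored = {entity_id: dict(entity) for entity_id, entity in existing.items()}
    for entity in incoming:
        entity_id = entity["id"]
        if entity_id not in restored:
            restored[entity_id] = dict(entity)  # no conflict: plain insert
        elif on_conflict == "IGNORE":
            continue  # keep the existing entity unchanged
        elif on_conflict == "UPDATE":
            restored[entity_id].update(entity)  # overwrite with input values
        else:
            raise ValueError(f"Conflict on entity {entity_id}")
    return restored
```

Because the default raises on the first conflict, a restore into a non-empty environment would normally pass IGNORE or UPDATE explicitly.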
Appendix A: Pagination and Sorting
Any endpoint that paginates results receives the following optional query parameters:
page
-
(Optional, Integer) The page to retrieve. Defaults to
0
Note
|
If the provided value is invalid, it will be replaced by the default one |
size
-
(Optional, Integer) The size of the page. Must be an integer between 1 and 100. Defaults to
25
Note
|
If the provided value is invalid or out of range, it will be replaced by the default one |
sort
-
(Optional, String) The sorting fields, with an optional direction. Ascending by default:
sort=<string>[,(asc|desc)]
Note
|
This parameter can be used multiple times: |
The response of a paginated request is either an empty payload with a 204 - No Content status code, or a 200 - OK with the results page:
{
"content": [
{
...
},
...
],
"pageable": {
...
},
"totalSize": 1,
"totalPages": 1,
"numberOfElements": 1,
"pageNumber": 0,
"empty": false,
"size": 25,
"offset": 0
}
content
-
(Array of Objects) The page content
pageable
-
(Object) The page request information
Details
{ "page": 0, "size": 25, "sort": [ ... ] }
page
-
(Integer) The page number
size
-
(Integer) The size of the page
sort
-
(Array of Objects) The sort definition for the page
Details
{ "property" : "fieldA", "direction" : "ASC" }
property
-
(String) The property where the sort was applied
direction
-
(String) The direction where the sort was applied. Either ASC or DESC
totalSize
-
(Integer) The total number of records
totalPages
-
(Integer) The total number of pages
numberOfElements
-
(Integer) The number of elements on the returned slice of content
pageNumber
-
(Integer) The current page number
empty
-
(Boolean) true if the page has no content
size
-
(Integer) The size of the returned slice of content
offset
-
(Integer) The page offset
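The defaulting rules for the page and size query parameters and the sort syntax described at the start of this appendix can be sketched as follows. This is a client-side approximation under stated assumptions: the helper names are hypothetical, a negative page is treated as invalid, and an unrecognized sort direction falls back to ascending.

```python
# Hypothetical sketch: normalizing page, size and sort query parameters.
from urllib.parse import parse_qs


def _int_or_default(raw, default, minimum, maximum=None):
    """Parse an integer, falling back to `default` when invalid or out of range."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    if value < minimum or (maximum is not None and value > maximum):
        return default
    return value


def parse_page_request(query_string):
    params = parse_qs(query_string)
    page = _int_or_default(params.get("page", [None])[0], default=0, minimum=0)
    size = _int_or_default(params.get("size", [None])[0], default=25, minimum=1, maximum=100)
    sort = []
    for raw in params.get("sort", []):  # sort may be repeated
        prop, _, direction = raw.partition(",")
        direction = direction.strip().upper()
        if direction not in ("ASC", "DESC"):
            direction = "ASC"  # ascending by default
        sort.append({"property": prop, "direction": direction})
    return {"page": page, "size": size, "sort": sort}
```

For instance, the query string page=abc&size=500&sort=name,desc&sort=key would fall back to page 0 and size 25, and sort by name descending and then by key ascending.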
Appendix B: Date and Time Patterns
In some instances, string patterns may be required to represent dates. These patterns consist of a series of letters and symbols that represent the structure the date should follow as an output. To create them, use the following table of definitions:
Symbol | Meaning |
---|---|
G | The era (e.g. AD) |
u | The year |
y | The year of the era |
D | The day of the year |
M | The month of the year |
L | The month of the year |
d | The day of the month |
Q | Quarter of the year |
q | Quarter of the year |
Y | Week-based year |
w | Week of the week-based year |
W | The week of the month |
E | The day of the week |
e | The localized day of the week |
c | The localized day of the week |
F | The week of the month |
a | The am/pm of the day |
h | The clock hour (1-12) |
K | The clock hour (0-11) |
k | The clock hour (1-24) |
H | The hour of the day (0-23) |
m | Minute of the hour |
s | Second of the minute |
S | The fraction of the second |
A | The milliseconds |
n | The nanoseconds |
N | The nanoseconds of the day |
V | The time-zone ID |
z | The time-zone name |
O | The localized zone offset |
X | The zone offset |
x | The zone offset |
Z | The zone offset |
p | Pad the next |
' | Escape for text |
'' | A single quote |
[ | Start of an optional section |
] | End of an optional section |
Each symbol may be repeated 'n' consecutive times (e.g. uuuu), which determines whether a short or long form of the representation is used. The definition of these forms varies with the type a symbol represents; the following list shows the basic representations by type and by the number of times 'n' a symbol is repeated:
-
Text
-
n < 4: Abbreviation (e.g. Wed for Wednesday)
-
n = 4: Full
-
n = 5: Normally one letter (e.g. W for Wednesday)
-
-
Number
-
n: The number with zero padding for the extra quantity (e.g. 3 -> 001)
-
c, F -> n <= 1
-
d, H, h, K, k, m, and s -> n <= 2
-
D -> n <= 3
-
-
-
Number and Text (Combination of both)
-
n >= 3: Seen as a Text
-
n < 3: Seen as a Number
-
-
Fractions
-
n <= 9: The number of truncations to the value
-
-
Year
-
n = 2: Two numbers (e.g. 23 for 2023)
-
n <= 4, n != 2: The full year
-
-
ZoneId
-
n = 2: Outputs the zone id
-
-
Zone names
-
1 <= n <= 3: Short name
-
n = 4: Full name
-
-
Offset for 'X' and 'x'
-
n = 1: Just the hour if minute is zero, otherwise include minute
-
n = 2: Hour and minute
-
n = 3: Hour and minute with a colon
-
n = 4: Hour, minute and second
-
n = 5: Hour, minute and second with a colon
-
-
Offset for 'O'
-
n = 1: Short offset (e.g. GMT+1)
-
n = 4: Full offset (e.g. GMT+1:00)
-
-
Offset for 'Z'
-
n <= 3: Hour and minute
-
n = 4: The full offset (e.g. GMT+1:00)
-
n = 5: The hour and minute with colon
-
-
Pad
-
n: Number of the width
-
For example, the pattern "dd MM:ppppppuuuu" produces the date "30 11:  2023". Literal characters such as the ':' and the spaces can be combined with the symbols. Spacing can also be achieved with the pad symbol: in the example, the year is padded to a width of 6, so 2 additional spaces are added between the ':' and the year.
Appendix C: Error Messages
If a request to any API produces an error, a standard response will be returned:
{
"status": 409,
"code": 2001,
"messages": [
"Duplicate entry for field(s): name"
],
"timestamp": "2023-01-28T01:52:22.117244600Z"
}
Response Body
status
-
(Integer) The HTTP status code of the response
code
-
(Integer) The internal error code.
messages
-
(Array of String) An optional list of messages describing the error
timestamp
-
(Timestamp) The UTC timestamp when the error happened
Error Codes
The error code gives a more specific description of the error. It extends the information provided by the HTTP status code, as the same status could be caused by different problems.
Each code is composed of 2 digits that represent the category, and 2 digits that represent the specific error, where the category can be:
-
10 - Resources: access to entities or endpoints
-
20 - Data integrity: entities referencing other entities, or other constraints such as unique keys
-
30 - Data validation: input data (format, missing fields…)
-
40 - Execution: problems while invoking an action
-
70 - Security: access and permissions
-
80 - Third-party: communication with external services
-
99 - Others: any other issue
Code | Description |
---|---|
1001 | The endpoint or HTTP Method is undefined or disabled |
1002 | The requested bucket is missing |
1003 | The requested resource is missing |
2001 | The entity already exists (same name or any other combination of fields defined as unique) |
2002 | The entity to delete is referenced by other entities |
3001 | The input data is corrupted |
3002 | The input data is missing or invalid |
3003 | The input data is too large |
4001 | The action could not be executed due to the current state of the system |
4002 | The action was terminated due to a timeout |
4003 | The Core DSL expression could not be executed |
7001 | The action could not be executed due to the permissions of the user |
8001 | Could not establish connection to an external service |
8002 | The external service returned an error |
9901 | Custom user error |
9999 | Undefined error |
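Because the category is encoded in the first two digits, a client can split a code without a lookup table of every individual error. The sketch below illustrates this in Python; `split_error_code` and `describe` are hypothetical helper names, and only the category labels come from the list above.

```python
# Category labels taken from the list above.
CATEGORIES = {
    10: "Resources",
    20: "Data integrity",
    30: "Data validation",
    40: "Execution",
    70: "Security",
    80: "Third-party",
    99: "Others",
}


def split_error_code(code: int) -> tuple:
    """Split a 4-digit error code into (category, specific error)."""
    return divmod(code, 100)


def describe(code: int) -> str:
    """Return a short, human-readable label for an error code."""
    category, specific = split_error_code(code)
    name = CATEGORIES.get(category, "Unknown")
    return f"{name} (category {category:02d}, error {specific:02d})"


print(describe(2001))  # Data integrity (category 20, error 01)
print(describe(8002))  # Third-party (category 80, error 02)
```

Grouping by category can be handy when, for example, a client wants to handle all third-party errors (80xx) differently from validation errors (30xx).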
Appendix D: Metrics
Each Discovery component publishes metrics regarding health, performance, and Discovery-specific workloads. This page aims to give the user a head start in understanding these metrics by highlighting the most commonly used ones, their meaning, and the dimensions each of them has. Keep in mind that most of them have, by default, a time lapse of one minute.
Dimensions
Each metric may have dimensions, which can be used as metric filters. These filters help ensure that only certain published values are taken into account. For example, the component dimension, which most metrics have, can be used to narrow down metrics to a specific component or API, such as the Ingestion Script Component or the QueryFlow API. Below are some of the most commonly used dimensions. There are many more, so check the dimensions available on each metric you want to use.
Dimension | Description |
---|---|
component | The Discovery component that published the metric |
| Found in cache metrics, it is the specific type of cache that produced the metric, i.e. |
| Found in metrics that measure an operation, it refers to the operation’s result, such as a job status or a cache |
| Found in metrics related to an Ingestion Seed Execution, it indicates the seed’s ID |
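To illustrate how a dimension acts as a filter, here is a small Python sketch that works on in-memory samples rather than on a real monitoring backend. Everything in it is hypothetical except the `component` dimension name: the metric name, the dimension values, and the `filter_by_dimensions` helper are placeholders chosen for the example.

```python
# Hypothetical samples: the metric name and dimension values are placeholders,
# not real Discovery metric names.
SAMPLES = [
    {"name": "example.metric", "value": 3.0,
     "dimensions": {"component": "ingestion-script"}},
    {"name": "example.metric", "value": 5.0,
     "dimensions": {"component": "queryflow-api"}},
    {"name": "example.metric", "value": 2.0,
     "dimensions": {"component": "queryflow-api"}},
]


def filter_by_dimensions(samples, **wanted):
    """Keep only the samples whose dimensions match every requested value."""
    return [
        sample for sample in samples
        if all(
            sample["dimensions"].get(key) == value
            for key, value in wanted.items()
        )
    ]


# Only values published by the (hypothetical) "queryflow-api" component count.
queryflow = filter_by_dimensions(SAMPLES, component="queryflow-api")
total = sum(sample["value"] for sample in queryflow)
print(total)  # 7.0
```

In practice the filtering is done by the monitoring tool itself; the sketch only shows the effect of narrowing a metric down by one dimension.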
Common Metrics
These metrics are published by most Discovery Components and are related to more than one product.
Metric Name | Description | Dimensions |
---|---|---|
|
The number of threads currently being throttled. Mostly used to monitor the throttler service used in Ingestion Components |
|
|
The number of times a cache was called in the last time lapse |
Ingestion Metrics
These are metrics published by Ingestion Components that help monitor a Seed Execution.
Metric Name | Description | Dimensions |
---|---|---|
|
The number of jobs currently being executed |
|
|
The number of records collected (i.e., processed) in the last time lapse |
|
|
The average time, in milliseconds, that it took to execute jobs completed in the last time lapse |
|
|
The number of jobs completed in the last time lapse |
|
More information about the default metrics published by all Discovery products can be found in the Provided Binders sub-section of the Micronaut-Micrometer documentation.