Core module
Let us discuss the tools used to manipulate a JSON schema, which ease the implementation of the generators and the validator. More precisely, we explain how we handle the keywords that are defined as Boolean operations, and how we handle references inside a schema.
Boolean keywords
allOf
To handle allOf keywords (i.e., AND operators), we consider the array as one big conjunction and we merge together all the terms in the conjunction.
For instance, the schema
{
"type": "integer",
"multipleOf": 3,
"allOf": [
{
"multipleOf": 5
},
{
"multipleOf": 2
}
]
}
gets reduced to
{
"type": "integer",
"multipleOf": 30
}
because we want the integer to be divisible by 2, 3, and 5 at the same time, and 30 is their least common multiple.
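The merge above can be sketched as follows. This is only an illustration under stated assumptions: schemas are represented as plain maps rather than org.json objects, multipleOf values are assumed to be integers, and the class and method names (AllOfMergeSketch, mergeAllOf) are hypothetical, not the library's MergeKeys API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the library's MergeKeys implementation. */
final class AllOfMergeSketch {

    static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    static long lcm(long a, long b) {
        return a / gcd(a, b) * b;
    }

    /** Merges the terms of "allOf" into the enclosing schema. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> mergeAllOf(Map<String, Object> schema) {
        Map<String, Object> merged = new LinkedHashMap<>(schema);
        Object terms = merged.remove("allOf");
        if (!(terms instanceof List)) {
            return merged; // no conjunction to merge
        }
        for (Object term : (List<Object>) terms) {
            Map<String, Object> sub = mergeAllOf((Map<String, Object>) term);
            for (Map.Entry<String, Object> e : sub.entrySet()) {
                if (e.getKey().equals("multipleOf") && merged.containsKey("multipleOf")) {
                    // Assumption: integer values, so the merge is an LCM.
                    long current = ((Number) merged.get("multipleOf")).longValue();
                    long other = ((Number) e.getValue()).longValue();
                    merged.put("multipleOf", lcm(current, other));
                } else {
                    // Simplification: other keywords are copied, not merged.
                    merged.putIfAbsent(e.getKey(), e.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("type", "integer");
        schema.put("multipleOf", 3L);
        schema.put("allOf", List.of(Map.of("multipleOf", 5L), Map.of("multipleOf", 2L)));
        System.out.println(mergeAllOf(schema)); // {type=integer, multipleOf=30}
    }
}
```

A real merge needs one rule per keyword (for instance, keeping the largest minimum); only multipleOf is handled here.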
anyOf
Since an anyOf is equivalent to the OR operation, we do not have anything to do when retrieving the value.
Indeed, what must be done with the array depends on whether the generator or the validator is running.
Therefore, more information is given later.
not
Since our goal is to get as many constraints as possible for the generator, we handle not by propagating it as far as possible.
For instance, let us consider the following schema:
{
"type": "object",
"not": {
"properties": {
"key": {
"type": "integer"
}
},
"minProperties": 2
}
}
First, we apply the not over the conjunction induced by the sub-schema.
That is, we transform the “NOT AND” into an “OR NOT”:
{
"type": "object",
"anyOf": [
{
"not": {
"properties": {
"key": {
"type": "integer"
}
}
}
},
{
"not": {
"minProperties": 2
}
}
]
}
It is then easier to apply the not over each item.
Typically, a keyword imposing a minimum number of values becomes a bound on the maximum number of values, and vice versa.
In the case of properties or items, the not is propagated even further, into the sub-schemas.
In the end, we obtain the following schema:
{
"type": "object",
"anyOf": [
{
"properties": {
"key": {
"not": {
"type": "integer"
}
}
}
},
{
"maxProperties": 1
}
]
}
Notice that we cannot propagate the not over type for now.
When the generator or the validator processes the key named key, the not is taken into account to restrict the range of allowed types.
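The two propagation steps above can be sketched as follows. This is a minimal illustration, assuming schemas are plain maps; the class and method names (NotPropagationSketch, negateKeyword, negate, propagateNot) are hypothetical and only a few keywords are covered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "not" propagation; names are illustrative only. */
final class NotPropagationSketch {

    private static Map<String, Object> single(String key, Object value) {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put(key, value);
        return map;
    }

    /** Negates one keyword: bounds are flipped, "not" is pushed into properties. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> negateKeyword(String key, Object value) {
        switch (key) {
            case "minProperties":
                return single("maxProperties", ((Number) value).intValue() - 1);
            case "maxProperties":
                return single("minProperties", ((Number) value).intValue() + 1);
            case "minItems":
                return single("maxItems", ((Number) value).intValue() - 1);
            case "maxItems":
                return single("minItems", ((Number) value).intValue() + 1);
            case "properties": {
                // Assumption: a single property, as in the example above.
                Map<String, Object> negated = new LinkedHashMap<>();
                for (Map.Entry<String, Object> e : ((Map<String, Object>) value).entrySet()) {
                    negated.put(e.getKey(), single("not", e.getValue()));
                }
                return single("properties", negated);
            }
            default:
                // Keywords such as "type" keep an explicit "not".
                return single("not", single(key, value));
        }
    }

    /** Rewrites NOT (k1 AND k2 ...) as (NOT k1) OR (NOT k2) ... */
    static Map<String, Object> negate(Map<String, Object> subSchema) {
        List<Map<String, Object>> alternatives = new ArrayList<>();
        for (Map.Entry<String, Object> e : subSchema.entrySet()) {
            alternatives.add(negateKeyword(e.getKey(), e.getValue()));
        }
        if (alternatives.isEmpty()) {
            return single("not", subSchema); // nothing to propagate
        }
        if (alternatives.size() == 1) {
            return alternatives.get(0);
        }
        return single("anyOf", alternatives);
    }

    /** Replaces the top-level "not" of a schema by its propagated form. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> propagateNot(Map<String, Object> schema) {
        Map<String, Object> result = new LinkedHashMap<>(schema);
        Object negated = result.remove("not");
        if (negated instanceof Map) {
            // Simplification: assumes no key clash with the rest of the schema.
            result.putAll(negate((Map<String, Object>) negated));
        }
        return result;
    }
}
```

Applied to the example schema, propagateNot yields an anyOf whose first item pushes the not under the key named key and whose second item is a maxProperties of 1.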
oneOf
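The oneOf keyword is basically a XOR operation over the elements in the array, in the sense that exactly one element must hold.
Let us denote this operation by \(\oplus\) and assume the array contains the elements \(A\), \(B\), and \(C\).
Then,
\[A \oplus B \oplus C = (A \land \lnot B \land \lnot C) \lor (\lnot A \land B \land \lnot C) \lor (\lnot A \land \lnot B \land C).\]
That is, we can transform the oneOf into an anyOf containing allOf.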
For instance, the schema
{
"oneOf": [
A,
B,
C
]
}
(where A, B, and C are valid JSON schemas) becomes
{
"anyOf": [
{
"allOf": [
A,
{"not": B},
{"not": C}
]
},
{
"allOf": [
{"not": A},
B,
{"not": C}
]
},
{
"allOf": [
{"not": A},
{"not": B},
C
]
}
]
}
We then apply our procedures over the anyOf, allOf, and not keywords to ensure that all constraints are correctly taken into account.
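The oneOf rewriting generalizes to any number of elements: each element in turn is kept as is while all the others are negated. A minimal sketch, assuming schemas are plain maps and using the hypothetical names OneOfExpansionSketch and expandOneOf:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the oneOf -> anyOf/allOf/not rewriting. */
final class OneOfExpansionSketch {

    /** Builds {"anyOf": [{"allOf": [...]}, ...]} from the oneOf elements. */
    static Map<String, Object> expandOneOf(List<?> alternatives) {
        List<Object> disjuncts = new ArrayList<>();
        for (int i = 0; i < alternatives.size(); i++) {
            List<Object> conjuncts = new ArrayList<>();
            for (int j = 0; j < alternatives.size(); j++) {
                Object alternative = alternatives.get(j);
                // The i-th element must hold; every other one must not.
                conjuncts.add(i == j ? alternative : Map.of("not", alternative));
            }
            disjuncts.add(Map.of("allOf", conjuncts));
        }
        return Map.of("anyOf", disjuncts);
    }

    public static void main(String[] args) {
        Object a = Map.of("type", "string");
        Object b = Map.of("type", "integer");
        Object c = Map.of("type", "boolean");
        System.out.println(expandOneOf(List.of(a, b, c)));
    }
}
```

The result grows quadratically with the number of elements, since each of the n disjuncts contains n conjuncts.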
References
When encountering the keyword $ref, the implementation needs to load the referenced schema.
If that schema is in a completely different file, we must open that file and read the schema from it.
If the schema is actually a sub-schema of the current file, we simply have to “jump” to that part of the file.
In both cases, this operation is invisible to the generator and the validator.
That is, whenever a function retrieving a sub-schema is called (for the Boolean operators, properties, items, and so on), it immediately returns the referenced schema.
In other words, the generator and the validator never see the $ref keywords.
This allows generation and validation to always behave the same way, regardless of $ref.
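The transparent “jump” for references inside the current file can be sketched as follows. This is an assumption-laden illustration: schemas are plain maps, only local references of the form #/a/b are supported, JSON Pointer escapes and cyclic references are not handled, and the names RefResolutionSketch, resolve, and getProperty are hypothetical.

```java
import java.util.Map;

/** Hypothetical sketch of transparent, local "$ref" resolution. */
final class RefResolutionSketch {

    /** Follows "$ref" until a schema without reference is reached. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> resolve(Map<String, Object> root, Map<String, Object> schema) {
        Map<String, Object> current = schema;
        while (current.containsKey("$ref")) { // no cycle detection in this sketch
            String ref = (String) current.get("$ref");
            if (!ref.startsWith("#")) {
                // Another file would have to be opened and read here.
                throw new UnsupportedOperationException("Non-local reference: " + ref);
            }
            current = root;
            for (String token : ref.substring(1).split("/")) {
                if (token.isEmpty()) {
                    continue; // leading "/" or a bare "#"
                }
                Object next = current.get(token);
                if (!(next instanceof Map)) {
                    throw new IllegalArgumentException("Unresolvable reference: " + ref);
                }
                current = (Map<String, Object>) next;
            }
        }
        return current;
    }

    /** Retrieves a property sub-schema; the caller never sees "$ref". */
    @SuppressWarnings("unchecked")
    static Map<String, Object> getProperty(
            Map<String, Object> root, Map<String, Object> schema, String name) {
        Map<String, Object> properties = (Map<String, Object>) schema.get("properties");
        return resolve(root, (Map<String, Object>) properties.get(name));
    }

    public static void main(String[] args) {
        Map<String, Object> positive = Map.of("type", "integer", "minimum", 1);
        Map<String, Object> root = Map.of(
                "definitions", Map.of("positive", positive),
                "properties", Map.of("count", Map.of("$ref", "#/definitions/positive")));
        // Prints the referenced schema, not the {"$ref": ...} object.
        System.out.println(getProperty(root, root, "count"));
    }
}
```

Because the resolution happens inside the accessor, the calling code is the same whether or not the schema uses references.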
Abstractions induced by how the keywords are handled
The way not and allOf (for instance) are handled already abstracts the schema. Indeed, consider the following schema (let us call it A)
{
"type": "integer",
"allOf": [
{
"not": {
"minItems": 5
}
},
{
"not": {
"maxItems": 3
}
}
]
}
and apply our operations for “not” to obtain
{
"type": "integer",
"allOf": [
{
"maxItems": 4
},
{
"minItems": 4
}
]
}
Finally, apply the “allOf” to get the schema B
{
"type": "integer",
"maxItems": 4,
"minItems": 4
}
It is possible to generate documents that are valid against B (since minItems and maxItems do not impose any kind of constraints upon integers).
However, all of these generated documents are invalid against A, since the sub-schema
{
"not": {
"minItems": 5
}
}
evaluates to false for any integer (again, because minItems does not impose restrictions upon integers).
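The discrepancy between A and B can be checked with a toy validator. The sketch below is not the library's validator: it supports only the keywords used in this example, represents schemas as plain maps, and uses the hypothetical names AbstractionSketch, isValid, and hasType.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical toy validator covering only the keywords of the example. */
final class AbstractionSketch {

    @SuppressWarnings("unchecked")
    static boolean isValid(Map<String, Object> schema, Object document) {
        for (Map.Entry<String, Object> e : schema.entrySet()) {
            Object value = e.getValue();
            switch (e.getKey()) {
                case "type":
                    if (!hasType((String) value, document)) return false;
                    break;
                case "minItems": // only constrains arrays
                    if (document instanceof List
                            && ((List<?>) document).size() < ((Number) value).intValue()) return false;
                    break;
                case "maxItems": // only constrains arrays
                    if (document instanceof List
                            && ((List<?>) document).size() > ((Number) value).intValue()) return false;
                    break;
                case "not":
                    if (isValid((Map<String, Object>) value, document)) return false;
                    break;
                case "allOf":
                    for (Object term : (List<Object>) value) {
                        if (!isValid((Map<String, Object>) term, document)) return false;
                    }
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported keyword: " + e.getKey());
            }
        }
        return true;
    }

    static boolean hasType(String type, Object document) {
        switch (type) {
            case "integer":
                return document instanceof Integer || document instanceof Long;
            case "array":
                return document instanceof List;
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> a = Map.of(
                "type", "integer",
                "allOf", List.of(
                        Map.of("not", Map.of("minItems", 5)),
                        Map.of("not", Map.of("maxItems", 3))));
        Map<String, Object> b = Map.of("type", "integer", "maxItems", 4, "minItems", 4);
        System.out.println(isValid(a, 7)); // false
        System.out.println(isValid(b, 7)); // true
    }
}
```

The integer 7 is accepted by B but rejected by A, because minItems holds trivially for a non-array, so its negation fails.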
Implementation
The principles described here are implemented in three classes:
be.ac.umons.jsonschematools.JSONSchemaStore handles reading a schema from a file.
be.ac.umons.jsonschematools.JSONSchema implements the operations to manipulate a schema (i.e., retrieving constraints, applying the Boolean operations, and so on).
be.ac.umons.jsonschematools.MergeKeys is a helper class (thus, only accessible in the library code) implementing the actual merging of values. This class is used by JSONSchema to split the complexity into more readable parts.
On top of these classes, the classes JSONObject and JSONArray from org.json are extended to provide an implementation of the hashCode method: be.ac.umons.jsonschematools.HashableJSONObject and be.ac.umons.jsonschematools.HashableJSONArray.
This allows us to use sets and maps to implement many parts of the library efficiently.
To guarantee reproducible results, we rely on LinkedHashSet and LinkedHashMap, which iterate over their elements in insertion order.
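The benefit of combining hashable values with insertion-ordered collections can be illustrated as follows. This sketch does not reproduce HashableJSONObject: it wraps a plain map in a hypothetical class (HashableSchemaSketch) whose equals and hashCode depend on the content, then removes duplicates with a LinkedHashSet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical content-hashed wrapper; not the library's HashableJSONObject. */
final class HashableSchemaSketch {

    private final Map<String, Object> content;

    HashableSchemaSketch(Map<String, ?> content) {
        this.content = new LinkedHashMap<>(content);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof HashableSchemaSketch
                && content.equals(((HashableSchemaSketch) other).content);
    }

    @Override
    public int hashCode() {
        return content.hashCode(); // consistent with equals
    }

    @Override
    public String toString() {
        return content.toString();
    }

    /** Removes duplicates while keeping the order of first occurrence. */
    static List<HashableSchemaSketch> deduplicate(List<? extends Map<String, ?>> schemas) {
        Set<HashableSchemaSketch> seen = new LinkedHashSet<>();
        for (Map<String, ?> schema : schemas) {
            seen.add(new HashableSchemaSketch(schema));
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> schemas = List.of(
                Map.of("minItems", 4), Map.of("maxItems", 4), Map.of("minItems", 4));
        System.out.println(deduplicate(schemas)); // [{minItems=4}, {maxItems=4}]
    }
}
```

With a plain HashSet the duplicates would still be removed, but the iteration order would not be guaranteed, so two runs could process the schemas in different orders.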