Early development. Not everything described here has shipped yet.


Move Away from JSON Schema

Date: May 2023

Status: Accepted

Context

In ADR 006, we decided to use JSON Schema for our schemas. However, we have since found that using JSON Schema internally introduces accidental complexity.

This complexity comes from expressivity in JSON Schema that we do not need, and from a mismatch between JSON Schema and how our UIs represent and edit data.

Decision

We will move away from JSON Schema and instead use a custom schema format that is more closely aligned with our UIs.

This includes:

  • We will replace the array type with a "multiple" attribute.
  • We will implement required properties via a "required" attribute rather than the schema-level "required" list.
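
To illustrate the two changes, here is a minimal sketch that places a JSON Schema fragment next to a possible equivalent in the custom format. Only the "multiple" and "required" attributes come from this decision. The property names (name, aliases), the "properties" key, and the "type" values are assumptions made for illustration and may not match the actual format.

```python
# JSON Schema: multiplicity is a separate "array" type, and required
# properties are named in a schema-level "required" list.
json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "aliases": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name"],
}

# Hypothetical custom format: both concerns become attributes on the
# property definition itself. The overall layout is assumed.
custom_schema = {
    "properties": {
        "name": {"type": "string", "required": True},
        "aliases": {"type": "string", "multiple": True},
    },
}


def required_properties(schema: dict) -> list[str]:
    """Return the names of properties flagged as required (custom format)."""
    return [
        name
        for name, definition in schema["properties"].items()
        if definition.get("required", False)
    ]


def multiple_properties(schema: dict) -> list[str]:
    """Return the names of properties that accept multiple values."""
    return [
        name
        for name, definition in schema["properties"].items()
        if definition.get("multiple", False)
    ]
```

In this layout, code that needs to know whether a property is required or multi-valued reads a single flag on the property rather than inspecting a nested type or a separate list.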

Consequences

  • We can simplify our code dealing with schemas, especially code related to handling of multiple values.
  • We can no longer use standard JSON Schema-based validators to validate Subjects. However, we had already anticipated that these would not suffice.
  • We no longer expose schemas in a standard format. If such a requirement arises, we can still implement it via a web API that converts our custom schema format to JSON Schema.
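
The conversion mentioned in the last point could look roughly like the following sketch. It assumes a custom format in which each property carries a "type" plus optional "required" and "multiple" flags. The function name to_json_schema, the "properties" key, and the example property names are hypothetical.

```python
def to_json_schema(custom: dict) -> dict:
    """Translate a (hypothetical) custom schema into a JSON Schema object."""
    properties = {}
    required = []
    for name, definition in custom["properties"].items():
        value_schema = {"type": definition["type"]}
        # A "multiple" property maps back to JSON Schema's array type.
        if definition.get("multiple", False):
            value_schema = {"type": "array", "items": value_schema}
        properties[name] = value_schema
        # Per-property "required" flags are collected into the
        # schema-level "required" list that JSON Schema expects.
        if definition.get("required", False):
            required.append(name)

    result = {"type": "object", "properties": properties}
    if required:
        result["required"] = required
    return result


# Example input in the assumed custom format.
example = {
    "properties": {
        "name": {"type": "string", "required": True},
        "aliases": {"type": "string", "multiple": True},
    },
}
converted = to_json_schema(example)
```

Whether a property that is both required and multiple should also emit "minItems": 1 is not settled by this ADR, so the sketch leaves it out.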

Alternatives Considered

Continued usage of JSON Schema

We would need to maintain a higher level of internal complexity. Users who need JSON Schema can still easily be accommodated via a new API endpoint that performs a simple translation.

RDF shape languages: SHACL and ShEx

SHACL (a W3C standard) and ShEx (a grammar-based schema language, used by Wikidata for its EntitySchemas) both describe and validate the shape of RDF graphs.

We did not adopt either as our internal schema format, for the same reasons we moved away from JSON Schema: our editing UIs cannot handle all of their expressivity. Nearly all potential users are better served by a subset of that expressivity, which avoids unnecessary implementation and carrying costs as well as complexity in the UIs and elsewhere.

Additionally, our data model is not an RDF graph.

This does not rule out emitting SHACL or ShEx as a downstream artifact. We could generate shapes from our native schemas, for instance, at RDF-export time.
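
As a rough sketch of what such generation might involve, the function below emits a SHACL node shape in Turtle from a schema in an assumed custom format. The mapping is an assumption, not a decided design: "required" becomes sh:minCount 1, the absence of "multiple" becomes sh:maxCount 1, and property names are mapped onto a placeholder https://example.org/ namespace. The function name to_shacl and the example schema are hypothetical.

```python
# Prefix declarations; the ex: namespace is a placeholder.
PREFIXES = (
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <https://example.org/> .\n"
)


def to_shacl(shape_name: str, custom: dict) -> str:
    """Render a (hypothetical) custom schema as a SHACL node shape in Turtle.

    Assumes property names are valid Turtle local names.
    """
    property_shapes = []
    for name, definition in custom["properties"].items():
        constraints = [f"sh:path ex:{name}"]
        if definition.get("type") == "string":
            constraints.append("sh:datatype xsd:string")
        if definition.get("required", False):
            constraints.append("sh:minCount 1")
        # Single-valued properties get an upper bound of one value.
        if not definition.get("multiple", False):
            constraints.append("sh:maxCount 1")
        property_shapes.append(
            "    sh:property [ " + " ; ".join(constraints) + " ]"
        )
    body = " ;\n".join(property_shapes)
    return f"{PREFIXES}\nex:{shape_name} a sh:NodeShape ;\n{body} .\n"


example = {
    "properties": {
        "name": {"type": "string", "required": True},
        "aliases": {"type": "string", "multiple": True},
    },
}
shacl = to_shacl("PersonShape", example)
```

A real exporter would need a defined mapping from our data model to RDF terms first, since the data model itself is not an RDF graph.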