Ecto Embedded Schemas — Quick search through a JSONB array in PostgreSQL

Ecto Embedded Schemas — Quick search through a JSONB array in PostgreSQL - Coletiv Blog

Have you ever wanted or had the need to take a database table and cram a lot of information into a specific field using JSON?

This is possible and supported by both PostgreSQL and Ecto, and it can sometimes simplify your database in cases where you absolutely want to store some kind of nested data structure.

This post shortly describes how to create embedded schemas in Ecto to save a JSON data structure on your database, and how to implement a direct database search on this same field.

Why did I need a JSON field in the first place?

Recently, in a project I was working on, Backercamp, I had to implement a Shopify integration feature where some users would be attributed some coupons to use. I saw an opportunity here to, instead of just creating a new coupon table, to directly create a coupon field on the user table, saving me some trouble in extra queries and joins. Additionally, a JSON field might be useful when the structure of an object might vary.

In PostgreSQL, this can be done by defining a column as JSON. In this case, I actually used JSONB — a decomposed binary form of JSON — since it can be more efficient. The coupon, which included some closely related variables looked something like this:

Coupon — JSONB

{"code": "A2IFH1OI31DHQ08D", "expires_in": 12451353, "extra": true}

… which was then modified to an array, so that a user could have multiple coupons, which looks something like:

Coupon Array — JSONB[]

[
    {"code": "A2IFH1OI31", "expires_in": 12451353, "extra": true},
    {"code": "DF2EIUH2I4", "expires_in": 15135533, "extra": false},
    {"code": "SJDF9U292U", "expires_in": 46464143, "extra": true}
]

With this, I could save and retrieve the array directly from the database easily. The next step was to implement a search through this array — for this search, the input would be a “code” string, and the output would be the user that the coupon code belonged to. I basically had to search which user had a specific coupon.

And this is where I had to begin investigating…

1. Implementing a JSONB[] in Ecto

First things first, so an explanation on how to actually implement the embedded schema is still needed.

If you already know about embedded schemas and just want to know how to implement the search, feel free to skip ahead to the next section.

Embedded Schema

The proper way of implementing a complex field like this in Ecto is to “embed” it into the schema. There are 2 functions for this:

  • embeds_once: Field that includes a single JSON like object (JSONB)

  • embeds_many: Includes an array of JSON like objects (JSONB[])

schema "user" do
    field(:name, :string)
    field(:email, :string)
    timestamps()
    embeds_many(:coupons, Coupon)
end

The embedded schema itself needs to be defined, so it can describe what is stored inside the JSON.

defmodule Coupon
use Ecto.Schema
import Ecto.Changeset

@primary_key {:code, :string, autogenerate: false}    embedded_schema do
        field(:expires_in, :integer)
        field(:extra, :boolean)
    end
end

Migration

The migration to add this field is quite simple. The JSON part of the field is represented as :map

def change do
    alter table(:user) do
        add(:coupons, {:array, :map}, default: [])
    end
end

2. Building the search query in PostgreSQL

So, after all is said and done, the field worked like a charm. Whenever it was needed, I only had to query the user table and access that specific column, perfect!

When started to implement the “search by code” feature I just thought:

“Shenanigans! But how will this work inside a PostgreSQL query? 🤔🤔”

Turns out that, in Postgres, it is actually quite simple to access specific fields inside a JSON, as the following image shows:

Figure 1 — JSON Operators ([https://www.postgresql.org/docs/10/functions-json.html](https://www.postgresql.org/docs/10/functions-json.html)) Figure 1 — JSON Operators (https://www.postgresql.org/docs/10/functions-json.html)

The operator ->> was particularly interesting: for a single JSON object, I could immediately access the code if I wanted to:

{"code": “A2IFH1OI31", "expires_in": 12451353, "extra":  true} ->> 'code'

results in

“A2IFH1OI31"

This operand is nifty and useful, but here, I only used it directly on a single field. Now, to find a way to actually do this but for an entire array!

How do we do this for a whole array?

Searching through the same JSON functions documentation as shown above, I found an interesting function, which was supposed to unroll a JSONB into a set of values: jsonb_array_elements(jsonb). However, this function seems not to work with our JSONB[] type, only JSONBBummer 😔!

Internally, the function looks at the JSONB object and determines if there are any set of objects, but still, it does not work directly with the type we set for our coupon column…

Setting out to find an alternative, I found the unnest(anyarray) function, which does basically the same function, but this time it works directly with the array!

SELECT unnest(user.coupons)
FROM user

Well… this seems to work to retrieve all the user’s coupon JSON objects!

Now, to find a specific coupon with a code, we use this previous query as a subquery and use the JSON operator inside the WHERE condition:

SELECT * FROM
    (SELECT unnest(user.coupons) as obj
    FROM user) subq
WHERE obj->>'code' ilike '%search_term%'

We’re making progress here! The previous query returns to me all the codes matching my search term. However, the final goal is to return the users associated with these codes, so there is a final extra step.

I was able to do this by modifying the previous query a bit and using the EXISTS operator to test for the existence of rows in the subquery. For each user, it checks whether any existing code matches, and the final query is actually pretty simple!

SELECT *
FROM user
WHERE exists
    (SELECT * FROM unnest(user.coupons) obj
    WHERE obj->>'code' ilike '%search_term%')

Efficiency

With around 189k users, with around 6k of them having codes, the query takes somewhere between 800 and 1200msec to complete, which is not ideal

Query and query plan to obtain coupon codes from a user Query and query plan to obtain coupon codes from a user

Total query time Total query time while obtaining coupon codes for a user

Note: Assume project_backer is the same as user.

What about… indexing?

GIN indexes could technically be applied to an array! But it does not help much in this particular situation.

A GIN index here would only be useful if we wanted to search for the whole of the content of an object inside the array — we would have to search for the entire object (code, expires_in, extra) instead of just searching for the code. This is not feasible, since we do not know the values of the other parameters: expires_in and extra.

With a different data structure, an index might’ve been feasible, so try to think about your particular situation before you decide on whether you really want or need to use an array to store your many objects.

3. Translating the Query into Ecto

This is the easiest part, since Ecto has the fragment function for whenever we want to write a query that deals with a function that is not implemented directly in Ecto itself.

User
|> where([user],
  fragment(
   "exists (select * from unnest(?) obj where obj->>'code' ilike
           ?)",
   user.coupons,
   ^"%#{search}%")
|> Repo.all()

Final Thoughts and Considerations

And that’s it! If you really needed to know how to solve this, you have an example here — but take into account the following:

  • This is my personal solution to a specific problem. You might need to adapt the query to your needs as your field and schemas might have a different structure from the one presented.

  • Decide if you really need a JSON structure/embedded schema, or if it would be more organized to have your structure represented by a table — sometimes JSON with embedded schemas are the only way to save a piece of data in which the format is variable, but if the format is fixed, consider a more traditional solution!

  • Lastly, but maybe most importantly, measure your query performance. A query like this for a JSON[] might be heavy if you have many thousands of users, for instance, and more so if they have many objects within the array. Be wary of this when deciding whether or not to use an embedded schema.


Hope to have shed some insight into the topic of embedded schema arrays and search functions!

Thank you for reading!

Thank you so much for reading, it means a lot to us! Also don’t forget to follow Coletiv on Twitter and LinkedIn as we keep posting more and more interesting articles on multiple technologies.

In case you don’t know, Coletiv is a software development studio from Porto specialised in Elixir, iOS, and Android app development. But we do all kinds of stuff. We take care of UX/UI design, web development, and even security for you.

So, let’s craft something together?