A lyrics website that learns from you, built with Algolia and Kontent

So I’ve been messing around with Kontent.ai recently, and I’ve got to say, it’s one of my favorite headless CMSs. I was testing some things in the project it automatically creates for you on signup, and it occurred to me that I could model a song in a lyrics database:

Song model in Kontent.ai

And thus, this idea was born. I’m going to build a lyrics database, and make good use of Algolia’s AI-powered search API to learn from the person searching and the type of music they’d like. Note that we’d need a lot of data to actually do anything useful with what we learn. For example, you could use Algolia’s Recommend API to suggest songs that the user would probably enjoy based on the songs they’ve already searched for, but you’d need thousands of click events for this to work properly. It’s a great idea in concept, but I’ll start small for the sake of this article.

Let’s break this process down into a few steps, and walk through them one by one:

Let’s look at how modeling works in Kontent.ai, and fill our adorable little database with song data.
Then, we’ll need to build a lambda function in JavaScript that can get our data from Kontent.ai and put it in searchable-object form for Algolia.
Next is creating an Algolia index from our data, and hooking up the GUI to consume that index.
Lastly, we’ll want to make the search results lead to pages that display all the information about that song, even if some of that info wasn’t in the search result that led us to that page, so we’ll need to go back to Kontent.ai to request the data again.

Let’s get started!

How modeling works in Kontent.ai

This will actually be a fairly short section, since Kontent.ai aims to make this intuitive. Essentially, you’re meant to split your content into discrete types with explicit shapes. You can organize the attributes in those types into groups for convenience too. Definitely take a look at this docs article for more details. Here are its four main suggestions:

Group content elements based on their purpose — it lets folks edit only what is relevant to them.
Crumble content into smaller pieces — don’t go too far with this, but try to keep your units of content in the smallest reasonable chunks.
Verify your content model responds to your needs — the purpose of the model is to represent your app’s requirements, so don’t make something that’ll give your content creators a migraine down the line.
Ensure your content model won’t become a burden — it’s a good idea to refresh the model every once in a while because those business requirements are prone to change.

Sidenote: I love the font of this site. Apparently it’s GT Walsheim Pro.

For me, I just wanted to model a song, so I added these attributes:

Title — simple text
Artist — simple text
Lyrics — rich text
In-app URL — auto-generated slug
YouTube Music URL — simple text
Album Cover — image asset
Album Name — simple text

That should do it! I’m just going to add a few of my favorite songs to start fleshing out our data.

Adding a song called 319

Adding a song called Digital Witness

Transforming our data for Algolia to use

The next step is building a lambda function to consume this data and turn it into a searchable index for Algolia. Kontent.ai automatically provides us with a couple of APIs to work with, and I’m going to demo the GraphQL one. This is actually a public dataset too because I haven’t turned on “Secure Access”, so you can follow along with the project ID fd742f82-d991-00e6-8d84-5b22158c71b0. Just make sure that if you’re in production, you absolutely turn on Secure Access so that your data isn’t completely public. I don’t mind here because this data will be publicly searchable via Algolia anyway. If you’d like to just mess around with the dataset without creating a whole app, definitely try the online Hasura API explorer. The GraphQL URL our requests point to is just going to be https://graphql.kontent.ai/ plus our project ID above.

I’m going to use that Hasura explorer to model what my query will look like — it’s definitely a helpful tool even if you are building a full app. I’m thinking that right now, the only searchable attributes should be the title of the song, the artist’s name, the album name, and most importantly, the lyrics. So here’s the explorer page once I’ve discovered the right query to retrieve that data:

My big GraphQL query to Kontent.ai pulling up results in the Hasura GraphQL Explorer

Cool right? It automatically pulled in the schema and everything right off the bat. Kudos to Kontent.ai on making it so easy to use, and to Hasura for developing such a clean GraphQL reading UI.

In text form, here’s my GraphQL query:

query SearchIndex {
  song_All {
    items {
      title
      artist
      albumName
      lyrics {
        html
      }
      albumCover {
        items {
          url
        }
      }
      inAppUrl
    }
  }
}

Inside a JS lambda function, we’re looking at something like this:

import fetch from 'node-fetch';
import { convert } from 'html-to-text';

export const handler = async ev => {
    // queryStringParameters can be missing when the URL has no query string
    const lyricsAsHtml = ev.queryStringParameters?.lyricsAsHtml === 'true';

    const response = await fetch(
        "https://graphql.kontent.ai/fd742f82-d991-00e6-8d84-5b22158c71b0",
        {
            headers: {
                "Content-Type": "application/graphql"
            },
            method: "POST",
            body: `
                query SearchIndex {
                    song_All {
                        items {
                            title
                            artist
                            albumName
                            lyrics {
                                html
                            }
                            albumCover {
                                items {
                                    url
                                }
                            }
                            inAppUrl
                        }
                    }
                }
            `
        }
    );
    let results = (await response.json())
        .data
        .song_All
        .items
        .map(
            result => ({
                ...result,
                lyrics: lyricsAsHtml
                    ? result.lyrics.html
                    : convert(result.lyrics.html)
            })
        );

    return {
        statusCode: 200,
        body: JSON.stringify(results, null, '\t')
    };
};

If you’re not familiar with this code, that’s alright! It might seem a little daunting, but the purpose is simple: run that GraphQL query and spit whatever it gives you back out to the client as JSON. I added in a library called html-to-text as well that can convert the HTML output of Kontent.ai to plain text when we give this endpoint the right GET query parameter. You’ll see that becomes important later. When I go to the endpoint in my browser directly, I’ll get this back (assuming I’ve set the lyricsAsHtml parameter to false):

The results of my query returned by the Netlify function I set up

Perfect!
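One thing the function above skips is error handling. A GraphQL server can answer with an errors array instead of (or alongside) data, so a slightly more defensive version of the unwrapping step might look like the sketch below. The helper name extractSongs is made up for illustration, and the payload shape is assumed to match the query above.

```javascript
// Hypothetical helper: unwraps the GraphQL payload returned by Kontent.ai,
// assuming the shape produced by the SearchIndex query above.
function extractSongs(payload) {
    // GraphQL reports failures in an `errors` array
    if (Array.isArray(payload.errors) && payload.errors.length > 0) {
        const messages = payload.errors.map(error => error.message).join('; ');
        throw new Error(`GraphQL query failed: ${messages}`);
    }

    // fall back to an empty list if the expected path is missing
    return payload.data?.song_All?.items ?? [];
}
```

In the lambda, this would stand in for the .data.song_All.items chain, so the .map() call would run on extractSongs(await response.json()) instead.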

Now, I just want to package that up into something Algolia can search. We want each song represented by a singular object in the JSON array, each with all the data they need for Algolia to match search queries against. Luckily, we’ve engineered our content model in such a way that all that data is already broken down for us! I don’t need to go fishing around in some long description to find the artist’s name; it’s right in the artist property. So I just have to pick out the searchable properties in a GraphQL query (done), and feed the results straight to Algolia!
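To make "one object per song" concrete, here’s a rough sketch of what a record could look like on its way into Algolia. Algolia identifies each record by an objectID and generates one if it’s missing, so giving each song a stable ID of our own means a later re-sync can update records instead of duplicating them. The helper name toAlgoliaRecord and the example values are made up, and I’m assuming the inAppUrl slug is unique per song.

```javascript
// Hypothetical helper: turns one song from our endpoint into an Algolia record.
// Assumes `inAppUrl` is unique per song, so it can double as the objectID.
function toAlgoliaRecord(song) {
    return {
        ...song,
        objectID: song.inAppUrl
    };
}

// placeholder input, shaped like one item from the endpoint's JSON output
const exampleSong = {
    title: 'Example Song',
    artist: 'Example Artist',
    albumName: 'Example Album',
    lyrics: 'First line of the lyrics\nSecond line of the lyrics',
    albumCover: { items: [{ url: 'https://example.com/cover.jpg' }] },
    inAppUrl: 'example-song'
};

const exampleRecord = toAlgoliaRecord(exampleSong);
console.log(exampleRecord.objectID);
```

Everything else stays exactly as the endpoint returned it, which is the payoff of modeling the content in small pieces up front.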

Come with me and head to algolia.com! Create your account (or log in, if you’re way ahead of me 😉) and create your new application. I’ll be using the free tier, so it looks like this to me:

Creating an Algolia application

Follow the process through and create your application.

It’ll prompt you to set up an index — I named mine lyrics just like the application. It just makes things easier since I’ll only have one for this project. I saved the JSON from my endpoint’s output to a JSON file, so I can initialize the Algolia index with that file. Here’s me uploading it:

Uploading our song data to Algolia

For my project, I’m not going to worry too much about automatically updating the data. I’d imagine that any time I release a bunch of content in bulk in the CMS, I can just download it with this endpoint I’ve made and upload it into Algolia. It’s a minor pain, but thankfully Algolia has a crawler specifically to solve this problem, and it’s available on the paid plans. I’m sticking with the free tier since this is a demo project, but the crawler would remove this thorn from the rosebush, so to speak.
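If the download-and-upload routine ever gets old, another option is a small script that pushes the records through Algolia’s API client. The sketch below assumes version 4 of the algoliasearch JavaScript client, where an index object has a saveObjects method. The helper name syncSongs is made up, and the index is passed in as a parameter so the function doesn’t care how it was created.

```javascript
// Hypothetical helper: pushes songs to an Algolia index.
// `index` is assumed to be an algoliasearch v4 index object, e.g.
//     algoliasearch(appId, writeApiKey).initIndex('lyrics')
// The write key must stay on the server; never ship it to the browser.
function syncSongs(index, songs) {
    const records = songs.map(song => ({
        ...song,
        objectID: song.inAppUrl // assumes the slug is unique per song
    }));

    // saveObjects adds new records and replaces ones with a matching objectID
    return index.saveObjects(records);
}
```

This needs an API key with write access, which is a different key from the search-only one used on the front end later in this article.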

Lastly, under the Configuration tab, make sure you designate your searchable attributes as searchable:

Choosing which attributes are searchable in Algolia
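The same configuration can also live in code rather than in the dashboard. This is a minimal sketch assuming the algoliasearch v4 client’s setSettings method. The order of the list is my own guess, not necessarily what’s in the screenshot; Algolia treats attributes earlier in the list as more important, so it’s worth experimenting with. The helper name configureIndex is hypothetical.

```javascript
// Sketch of index settings mirroring the dashboard configuration above.
// The attribute order is an assumption; earlier attributes rank higher.
const lyricsIndexSettings = {
    searchableAttributes: ['title', 'artist', 'albumName', 'lyrics']
};

// Hypothetical helper: applies the settings to an algoliasearch v4 index object
function configureIndex(index) {
    return index.setSettings(lyricsIndexSettings);
}
```

Like any write operation, this needs a key with permission to edit settings, so it belongs in a server-side script.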

Using Algolia on the front end

Let’s make our front end now! I just spun up a simple Astro template with:

npm init astro -- --template blog

I had to go back in and add node-fetch to the package.json because Astro overwrote it, and I need that package for our lambda function, but everything else stayed put.

Now let’s create a search box and a place for the results to be shown. Astro lets me neatly package the whole search system into a Search component just like any other framework does. Take a look at this single-file component:

---
// Search.astro
---

<style is:inline>
    #search-input-container {
        height: 5vw;
        display: flex;
        justify-content: flex-end;
        align-items: center;
    }

    #search-input {
        border: 1px solid #111;
        border-radius: 5px;
        margin: 0 5px;
    }

    #search-hits > div > div > ol {
        list-style-type: none;
        min-width: 50vw;
    }

    #search-hits > div > div > ol > li {
        margin: 0;
        padding: 1vw;
        width: 100%;
    }

    .search-result {
        display: grid;
        grid-template-areas: 'img title'
                             'img artist'
                             'img album';
        grid-template-columns: 12vh 1fr;
        color: #111;
        text-decoration: none;
        cursor: pointer;
        border: 1px solid #888;
        box-shadow: 1px 1px 3px #888;
        padding: 2vw;
        border-radius: 10px;
    }

    .search-result-img {
        grid-area: img;
        height: 10vh;
    }

    .search-result-title {
        grid-area: title;
        font-size: 3vh;
        line-height: 4vh;
    }

    .search-result-artist {
        grid-area: artist;
        font-size: 2.5vh;
        line-height: 3vh;
    }

    .search-result-album {
        grid-area: album;
        font-size: 2.5vh;
        line-height: 3vh;
    }
</style>

<section id="search">
    <div id="search-input-container">
        <label for="search-input">Search Lyrics:</label>
    </div>

    <div id="search-hits"></div>

    <script
        src="https://cdn.jsdelivr.net/npm/algoliasearch@4.5.1/dist/algoliasearch-lite.umd.js"
        integrity="sha256-EXPXz4W6pQgfYY3yTpnDa3OH8/EPn16ciVsPQ/ypsjk="
        crossorigin="anonymous"
    ></script>
    <script
        src="https://cdn.jsdelivr.net/npm/instantsearch.js@4.8.3/dist/instantsearch.production.min.js"
        integrity="sha256-LAGhRRdtVoD6RLo2qDQsU2mp+XVSciKRC8XPOBWmofM="
        crossorigin="anonymous"
    ></script>
    <script src="../js/search.js"></script>
</section>

Let’s break that code sample down a bit:

At the beginning, you’ll notice there are two lines that are just ---. Those “fences” section off a piece of JavaScript that only runs when Astro builds this page. I don’t have any build-time logic to run in this component, so I just put a comment with the file name, but this functionality is incredibly helpful when you want to run as little as possible on the client.
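Next, I can include some CSS in a <style> tag. Astro normally scopes a component’s styles to just that component, so I don’t have to worry about any class names or anything bleeding into the rest of the app. Here, though, I’ve added is:inline, which tells Astro to leave the tag unprocessed and unscoped so the rules can also reach the markup that InstantSearch generates at runtime.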
Inside my actual HTML, I have a container for me to put my search input box, as well as a place for the results of the search query to go. You’ll see why these are currently empty and how they’ll be filled in a moment.
Lastly, I include a few scripts. The first two are jsDelivr-hosted Algolia JS scripts from the algoliasearch and instantsearch.js libraries. The last script references a JavaScript file that will be bundled into this component at build-time but currently lives at /src/js/search.js. This is where the actual search logic lives.

Here’s what that search.js looks like, with comments explaining what everything does:

// step 1. connect to algolia
const search = instantsearch({
    indexName: 'lyrics', // this is the index we made earlier
    searchClient: algoliasearch(
        '8WCMJCP5DG', // identifies this specific application
        '7654ba9896fe4286deca1195f47bb2a0' // this is my public api key, 
        // feel free to use these credentials if you want to test out this index!
    )
});

// step 2. create widgets, which are like little nuggets of functionality
search.addWidgets([
    // widget #1 is a custom SearchBox
    // I could've used the default Search, but I like recreating it
    // this gives me a little extra flexibility later if I want to
    //     add more functionality
    instantsearch.connectors.connectSearchBox(
        (renderOptions, isFirstRender) => {
            const { query, refine, clear, isSearchStalled, widgetParams } = renderOptions;

            if (isFirstRender) {
                // if we're just now rendering the page
                //     1. create the input for the search box
                //     2. give it an event listener
                //     3. add it to the container we specify later
                const input = document.createElement('input');
                input.type = "search";
                input.id = "search-input";
                input.addEventListener('input', event => {
                    if (event.target.value) refine(event.target.value);
                    else clear();
                });

                widgetParams.container.appendChild(input);
            }

            widgetParams.container.querySelector('input').value = query;
        }
    )({
        // specify the container where we'll add the search box
        container: document.getElementById('search-input-container')
    }),

    // widget #2 is where the results go
    instantsearch.widgets.hits({
        container: '#search-hits',
        templates: {
            item: hit => { 
                // I could do some logic here if I wanted, but I'll just return
                //     a template string with the results embedded into it
                return `
                    <a class="search-result" href="/songs/${hit.inAppUrl}">
                        <img src="${hit.albumCover.items[0].url}" class="search-result-img" />
                        <span class="search-result-title">${hit.title}</span>
                        <span class="search-result-artist">${hit.artist}</span>
                        <span class="search-result-album">from <i>${hit.albumName}</i></span>
                    </a>
                `;
            }
        }
    })
]);

// step 3: start searching!
search.start();

Hopefully those comments clarify what each piece of code does, but if you have any questions, feel free to reach out to me on Twitter!
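One optional hardening step: the hits template above drops record values straight into an HTML string. That’s fine while I’m the only one writing song data, but if records ever came from user submissions, escaping them first would be safer. Here’s a rough sketch; escapeHtml and renderHit are hypothetical helpers, not part of InstantSearch.

```javascript
// Hypothetical helper: escapes characters that have special meaning in HTML
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Sketch of the same result template, with every value escaped before use
function renderHit(hit) {
    return `
        <a class="search-result" href="/songs/${encodeURIComponent(hit.inAppUrl)}">
            <img src="${escapeHtml(hit.albumCover.items[0].url)}" class="search-result-img" />
            <span class="search-result-title">${escapeHtml(hit.title)}</span>
            <span class="search-result-artist">${escapeHtml(hit.artist)}</span>
            <span class="search-result-album">from <i>${escapeHtml(hit.albumName)}</i></span>
        </a>
    `;
}
```

With that in place, the item template in the hits widget could simply be item: renderHit.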

Now when I load up the home page, which mostly just contains the new Search component we just created, it looks like this:

The new lyrics homepage with the search box and results

Note that without a search query, all results are shown automatically. Perhaps there’s room in the future for some more sophisticated ranking algorithms that give each searchable object a number based on their popularity right now, and then order the song results by which ones have the highest number? Then when we aren’t searching, we’ll have just the first few very popular results on the homepage, but when we are searching, we’ll be prioritizing what the user is most likely to be searching for. This isn’t something I’ll try to implement now, but it’s a good question to mull over while you develop your search index: what factors affect the order of the search results to best match my users’ expectations? Perhaps Algolia’s personalization AI is a good fit here.
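For what it’s worth, the "give each song a number and sort by it" idea maps onto Algolia’s customRanking setting, which acts as a tie-breaker once textual relevance has been compared. With an empty query every record ties, so the custom ranking would decide the homepage order. The sketch below is only an illustration: popularity is a hypothetical attribute that doesn’t exist in the current index, and withPopularity is a made-up helper.

```javascript
// Sketch of index settings for popularity-based ordering.
// `popularity` is a hypothetical numeric attribute we would have to add
// to every song record ourselves; it is not part of the current index.
const popularitySettings = {
    customRanking: ['desc(popularity)']
};

// Hypothetical helper: attaches a popularity number (say, a click count)
// to a record before it gets uploaded
function withPopularity(record, clickCount) {
    return {
        ...record,
        popularity: clickCount
    };
}
```

Where the click counts would come from is a separate question that the ideas at the end of this article touch on.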

When I start searching for something in any of our four searchable attributes, the results start narrowing:

Searching by the query "fall"

Interestingly, Algolia is already doing a lot of the relevance logic behind the scenes. It’s already realizing that the first song, which has the search query in the title and the album name in addition to lyrics matches, is probably the best search result.

Another fascinating result: October Sky doesn’t actually contain any references to the search query, but I imagine Algolia’s AI realizes that “October” and “fall” are related words. How cool is that? Similar things are probably happening to the other songs, just with lower relevance scores — Digital Witness contains a reference to jumping (which is related to falling), 319 to December and snow (which are probably mentioned a lot with the winter, and so tangentially match the fall), and In My Groove to the sand (with the same reasoning as the winter result, but for summer this time).

If I want, I can actually output a message to the console based on the JSON of the search results to see where exactly they matched:

Console logs of the result match data

Sure enough, it’s matching the search query to the lyrics of all these songs despite the actual word fall not appearing in any of them. It’s just matching based on relevant terms like “October” showing up in the lyrics, which is just amazing if you ask me.
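If you want to try this kind of logging yourself, one way is to read the _highlightResult object that Algolia attaches to each hit, which reports a matchLevel and the matchedWords per attribute. This is a sketch of the general approach rather than the exact code behind the screenshot, and matchedAttributes is a made-up helper name.

```javascript
// Hypothetical helper: lists the top-level attributes of a hit that matched
// the query, based on the `_highlightResult` data Algolia attaches to hits.
function matchedAttributes(hit) {
    const highlights = hit._highlightResult ?? {};

    return Object.entries(highlights)
        // nested attributes (like albumCover) have no matchLevel at this level
        .filter(([, info]) => info && info.matchLevel && info.matchLevel !== 'none')
        .map(([attribute, info]) => ({
            attribute,
            matchLevel: info.matchLevel,
            matchedWords: info.matchedWords
        }));
}
```

One place to call it is the hits widget’s transformItems option, logging matchedAttributes(hit) for each item and then returning the items unchanged.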

Pulling song data from Kontent.ai for their own pages

You may have noticed that each of our search results is actually an <a> tag, and that they link to the inAppUrl link we pulled from Kontent.ai via that GraphQL query. Right now, it’s a 404 when we try to click on one of those links, but they are taking us to the right place:

The Astro 404 page

See the URL? It’s /songs/in-my-groove. The next step here is to create a generic, catch-all page for the /songs url, and then search for all the songs that match that last part in Kontent.ai. Then we can pull the data from Kontent.ai into a full-on lyric page for the song.

According to Astro’s routing docs, we must do two things to create dynamically routed pages like this:

put the slug in brackets in the filename — I named our new page /src/pages/songs/[inAppUrl].astro, so we’re good there
export the getStaticPaths function, which tells Astro what pages to create at build time

Our getStaticPaths function needs to return an array of objects like this:

{
    params: {
        dog: 'clifford'
    },
    props: {
        color: 'red',
        size: 'big'
    }
}

Something that tripped me up at first: the params object only contains the attributes that appear in the URL. The props object contains everything else that we want to make available to the page. So in our case, the attribute in the params object will be inAppUrl, which is part of the URL that the page Astro is building will eventually live on. Everything in props, however, will be given to us to use in our build-time JS, so let’s stick all of the data we’ll need in there now so we don’t have to run multiple GraphQL queries. Here’s the [inAppUrl].astro file that I came up with:

---
// import node-fetch so we can query for all our song data from Kontent.ai in a `getStaticPaths` function
import fetch from 'node-fetch';
import BaseHead from '../../components/BaseHead.astro';

export async function getStaticPaths () {
    const response = await fetch(
        "https://graphql.kontent.ai/fd742f82-d991-00e6-8d84-5b22158c71b0",
        {
            headers: {
                "Content-Type": "application/graphql"
            },
            method: "POST",
            body: `
                query SearchIndex {
                    song_All {
                        items {
                            title
                            artist
                            albumName
                            lyrics {
                                html
                            }
                            albumCover {
                                items {
                                    url
                                }
                            }
                            youtubeMusicUrl
                            inAppUrl
                        }
                    }
                }
            `
        }
    );
    return (await response.json())
        .data
        .song_All
        .items
        .map(result => ({
            params: {
                inAppUrl: result.inAppUrl
            },
            props: {
                ...result,
                lyrics: result.lyrics.html,
                albumCover: result.albumCover.items[0].url
            }
        }));
};

// now let's start building the page with the data from Astro.props

const {
    title,
    artist,
    albumName,
    lyrics,
    albumCover,
    youtubeMusicUrl,
    inAppUrl
} = Astro.props;

---

<html lang="en">
    <head>
        <BaseHead
            {title}
            description={`${title} by ${artist} from ${albumName}`}
        />

        <style>
            header {
                display: grid;
                grid-template-areas: 'img title'
                                     'img artist'
                                     'img album';

                border: 1px solid #888;
                box-shadow: 1px 1px 3px #888;
                padding: 2vw;
                border-radius: 10px;
                margin-bottom: 10vh;
                column-gap: 2vw;
                justify-content: center;
                align-items: center;
            }

            .song-img {
                grid-area: img;
                height: 25vh;
                object-fit: contain;
                object-position: right;
                width: 100%;
                grid-template-columns: 25vh 1fr;
            }

            .song-title {
                grid-area: title;
                font-size: 7vh;
            }

            .song-artist {
                grid-area: artist;
                font-size: 3.5vh;
            }

            .song-album {
                grid-area: album;
                font-size: 3vh;
            }

            main, footer {
                text-align: center;
                margin-bottom: 10vh;
            }
        </style>
    </head>

    <body>
        <header>
            <img class="song-img" src={albumCover} />
            <h1 class="song-title">{title}</h1>
            <p class="song-artist">by {artist}</p>
            <p class="song-album">from <i>{albumName}</i></p>
        </header>

        <main>
            <h2>Lyrics:</h2>
            <Fragment set:html={lyrics} />
        </main>

        <footer>
            <h2>Want to hear more?</h2>
            <a href={youtubeMusicUrl} target="_blank">Listen to the song on YouTube Music</a>
        </footer>
    </body>
</html>

It’s another long code sample, but I’ll break it down again:

The beginning is where we implement the getStaticPaths function I was just yammering on about. We query every piece of data about every song, and build our params and props objects as needed.
When Astro builds a page for each song, it’ll make our props object available to the main body of the page as Astro.props. Before the end of the JS section, I destructure that object into all of our variables. I’ve already modified lyrics and albumCover so they’re just strings at this point, making the destructuring much cleaner.
Then comes a sprinkling of CSS — you’ll notice it’s much the same as our search result CSS earlier. I wonder why that could be…
The <header> is much the same as our search result template from earlier.
In <main>, we use a special Astro component similar to <></> in React to render our already-marked-up lyrics. Injecting plain HTML into the page via expressions like a simple {lyrics} is frowned-upon due to possible security concerns, so it was deprecated in Astro a bit ago in favor of this approach.
Our <footer> is nothing special — just some extra information and the YouTube Music link.

And we’re done! The song page looks like this:

The song-specific page

This is exactly what I was imagining at the beginning — a tidy lyric display with a smart search. You can test it out for yourself at https://lyrics-algolia-kontent.netlify.app

Where we could go with this

We’re done for now, but we’re not finished for good. We could do a lot of things with this still! Most of it would begin with tracking click events on each song in the search results and counting clicks on the YouTube Music links as conversions. From there, we could:

Track analytics of the most popular songs straight from our Algolia dashboard
Recommend new songs to the user based on songs they previously have clicked on
Recommend new songs to the user based on what other people who liked a particular song also liked
Have a trending page with the most popular songs of the day or month
Personalize the search result rankings so they’re more likely to be what the user is searching for (i.e., if the user searches for rock songs a lot, show the rock song matches first even if a jazz song matches slightly better)
Quantitatively improve the search relevance algorithm by A/B testing slightly different versions against each other
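As a starting point for the tracking piece, here’s a rough sketch of what the click and conversion events could look like with Algolia’s search-insights library. The helper names and event names are made up. I’m assuming each hit comes from a search made with clickAnalytics enabled, so that InstantSearch attaches __queryID and __position to it, and I’m leaving the user token that Insights events need to the library’s own configuration.

```javascript
// Hypothetical helpers that build Insights event payloads.
// `hit` is assumed to be an InstantSearch hit fetched with clickAnalytics
// enabled, so that it carries `__queryID` and `__position`.
function buildClickEvent(hit, indexName = 'lyrics') {
    return {
        index: indexName,
        eventName: 'Song Clicked',
        queryID: hit.__queryID,
        objectIDs: [hit.objectID],
        positions: [hit.__position]
    };
}

function buildConversionEvent(hit, indexName = 'lyrics') {
    return {
        index: indexName,
        eventName: 'YouTube Music Link Clicked',
        queryID: hit.__queryID,
        objectIDs: [hit.objectID]
    };
}

// With the search-insights library loaded as `aa`, sending would look like:
//     aa('clickedObjectIDsAfterSearch', buildClickEvent(hit));
//     aa('convertedObjectIDsAfterSearch', buildConversionEvent(hit));
```

Since the YouTube Music link lives on the song page rather than in the search results, the query ID would have to travel along somehow, for example as a URL parameter. That part is left out of this sketch.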

Like I mentioned earlier, we’d need a lot more data for all of this, and it’s hard to get that data with 7 songs. Adding a significantly larger number of songs would need to be done by human submission (to avoid copyright infringement), and that would take far more time than I have for right now. But you know where to find me if you want to pick this project up 😏

So what did we learn today, class?

Kontent.ai provides easy-to-use APIs and a clean GUI for working with structured data.
I think The Oh Hellos and Jacob Collier are the best thing to happen to music in a while.
Algolia can index lots of data in advance and spit out intelligent results to search queries at hyper-speed.
fall has so many meanings and related words that it’s nearly useless as a search query.
Astro is super fun to work with! I only had to ask one question, and as usual, it was just me overthinking something (thanks for the help Ben), but otherwise it was smooth sailing.
Most lyrics sites today are probably infringing on somebody’s copyrights 24/7.
We’ve left the door open for so many AI-driven awesome features if we just get enough content into this app to generate the needed training data (legally, of course).

Wow! 7 takeaways in 4,000 words. I think that’s as good a place as any to leave it! If you have any questions, feel free to reach out to me, the Algolia team, the Kontent.ai team, or the Astro team — everyone is eager to help. As always, happy building!

Jaden's blog