Adding document search to my portfolio site

I think it’s pretty uncontroversial by now to say that Apple is great at UI design. They’ve got their flaws, of course, but the interfaces of iOS and macOS and the style Apple used on them have become iconic.

When I was tasked by my college professor with making a portfolio site that stood out amongst the crowd, I clicked the Minimize button on the browser window and stared at the Desktop... and then it clicked. A fantastic UI was already designed for me, and everybody already knew intuitively how to use it! I just had to make a website out of it. And thus, jaden.baptista.dev was born.

I took great care to implement the basics of the Finder, which is how folks can access “files” on my previous jobs and such. I didn’t finish it though, mainly because the search was so daunting to implement. Here’s what it looks like on my actual Mac:

Mac interface example

The above picture is me, searching on my Mac for a font I made of a left-handed friend’s handwriting.

And there’s my design blueprint! But like I said, I didn’t tackle this before because I didn’t see a simple way to get it done. Since then, I’ve worked a bit with Algolia, which now seems like the ideal solution for this oddly specific use case, so I’m going to circle back to this project and document my findings here.

Organizing my data

I recently added a very simple content management setup using Sanity, so I’m starting out with all my data in one neat little place. In fact, I’ve created a JavaScript file that looks like this:

const getTechnologies = async filterForName => {
    const query = !!filterForName
        ? `*[_type == "technology" && name == "${filterForName}"]{ name, usedhere, summary }`
        : `*[_type == "technology"]{ name, usedhere, summary }`;

    const response = await fetch(
        `https://${process.env.NEXT_PUBLIC_SANITY_PROJECT_ID}.api.sanity.io/v2021-06-07/data/query/production?query=${encodeURIComponent(query)}`
    );
    return (await response.json()).result;
};

const getProjects = async filterForName => {
    const query = !!filterForName
        ? `*[_type == "project" && name == "${filterForName}"]{ companyDescription, jobDescription, name, summary }`
        : `*[_type == "project"]{ companyDescription, jobDescription, name, summary }`;

    const response = await fetch(
        `https://${process.env.NEXT_PUBLIC_SANITY_PROJECT_ID}.api.sanity.io/v2021-06-07/data/query/production?query=${encodeURIComponent(query)}`
    );
    return (await response.json()).result;
};

export {
    getTechnologies,
    getProjects
};

This just exports functions for looking up the two main pieces of content in my Sanity content lake. This gives me a perfect foundation for setting up search.
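The two lookups above differ only in the document type and the projected fields, so the URL construction could be shared. Here is a minimal sketch of that idea, not the code the site actually runs: `buildSanityQueryUrl` and the project ID `abc123` are hypothetical names, and it assumes the name filter is passed as a GROQ parameter (`$name`, JSON-encoded in the query string) rather than interpolated into the query text.

```javascript
// Hypothetical helper: builds the Sanity query URL for either content type.
const buildSanityQueryUrl = (projectId, type, fields, filterForName) => {
    // Use a GROQ parameter ($name) instead of pasting the value into the query.
    const filter = filterForName
        ? `_type == "${type}" && name == $name`
        : `_type == "${type}"`;
    const query = `*[${filter}]{ ${fields.join(', ')} }`;

    let url =
        `https://${projectId}.api.sanity.io/v2021-06-07/data/query/production` +
        `?query=${encodeURIComponent(query)}`;

    // Parameter values are sent JSON-encoded, so a string keeps its quotes.
    if (filterForName) {
        url += `&${encodeURIComponent('$name')}=${encodeURIComponent(JSON.stringify(filterForName))}`;
    }
    return url;
};

// 'abc123' is a placeholder project ID used only for illustration.
const allTechnologiesUrl = buildSanityQueryUrl(
    'abc123', 'technology', ['name', 'usedhere', 'summary']
);
const oneProjectUrl = buildSanityQueryUrl(
    'abc123', 'project', ['name', 'summary'], 'TakeShape'
);
console.log(allTechnologiesUrl);
console.log(oneProjectUrl);
```

Passing the name as a parameter means a value containing a quote character cannot change the shape of the query, which plain string interpolation would allow.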

Just a heads-up: In this article, I’m going to be working with a very, very simple implementation of search. It’s essentially a toy/concept website, and so I might not use production-ready methods or fully think out how this would scale. Always defer to the Algolia docs if anything seems odd here. What I aim to demonstrate with this exploration though is the general way you can turn a database into well-formatted search material and then feed it into Algolia. If you have any questions, feel free to reach out to me on Twitter!

So to get started with the search, let’s briefly review how Algolia Search works! Generally, it expects you to have an endpoint which compiles all the information that you want to be searchable and returns it for Algolia to index. Then when somebody actually types something into the search box, your front-end quickly asks Algolia to thumb through the mounds of raw data we gave it earlier and return the most relevant results. Note that it doesn’t just match by text — it’s far more complex and nuanced than that. In my example, we’re not going to see much effect from AI personalization or intent detection, but know that once you start applying this at scale, you won’t get repeatable results given a certain query — you’ll still get relevant results, but now it’ll be catered toward the user who is searching and what they’re probably thinking about, based on training analytics you send it. I’m just going to skip over that AI part for now and come back to it in another article soon.

Here’s my JavaScript file gathering my searchable objects from my database:

import { getTechnologies, getProjects } from '../utilities/getFilenames.js';

const portableTextToPlainText = blocks =>
    blocks.map(
        block => (block._type !== 'block' || !block.children)
            ? ''
            : block.children.map(child => child.text).join('')
    ).join('\n\n');

// ES module export, to match the import syntax used at the top of this file
export const handler = async ev => {
    return {
        statusCode: 200,
        body: JSON.stringify([
            ...(await getTechnologies()).map(technology => ({
                name: technology.name,
                description: portableTextToPlainText(technology.summary)
            })),
            ...(await getProjects()).map(project => ({
                name: project.name,
                description: [
                    portableTextToPlainText(project.summary),
                    portableTextToPlainText(project.companyDescription),
                    portableTextToPlainText(project.jobDescription)
                ].join('\n\n')
            }))
        ])
    };
};

A little breakdown, since this may be a smidgeon abstruse:

  • At the beginning, I import my functions from the previous code sample which pull data in from Sanity. If you scroll back up a little bit, you’ll see that Project and Technology are two shapes that my data follows, one for the projects I’ve completed and one for the technologies I work with.
  • My Projects and Technologies don’t carry the same fields. Technology just has a brief summary, so I just need to convert that from Portable Text (Sanity’s amazing JSON-based rich text format) to plain text for Algolia to search. Note that this is one of those moments where my use case is likely so different from yours that this might not be a good example. Generally you want plain-text descriptions to be as short and concise as possible, and to include as much information as you can in key-value attributes. I simply don’t have many of those discrete pieces of data which I can pull out of the description and convert into attributes on the object, but I will soon! When I go back to this content and add more details, I’ll have dates and titles and other parcels of data which can be excised from the wall-of-text that is the description and stored as attributes, both in my datastore (Sanity) and in my search index (Algolia). For the visual learners, this is bad:

      {
          "name": "TakeShape",
          "description": "I loved helping TakeShape.io get their DevRel team off the ground from May to December 2020."
      }
    

    And this is good:

      {
          "name": "TakeShape",
          "position": "Developer Advocate",
          "startDate": "5/20",
          "endDate": "12/20",
          "experience": "positive",
          "website": "https://takeshape.io",
          "description": "The few things which cannot be split into keys and values, typed out in sentence form."
      }
    
  • Lastly, this is all happening inside a Netlify Function. If you’re not quite up on what that is, that’s alright! Hopefully some of the structure of the JavaScript above will make more sense after reading this.
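To make the Portable Text conversion described above more concrete, here is a small standalone sketch. It mirrors the `portableTextToPlainText` function from the endpoint and uses a default parameter so a missing field becomes an empty string; the sample blocks are hypothetical data shaped like Sanity's Portable Text, not real content from the site.

```javascript
// Mirrors the conversion in the endpoint; the default parameter guards
// against a field that is missing on a document.
const portableTextToPlainText = (blocks = []) =>
    blocks
        .map(block =>
            block._type !== 'block' || !block.children
                ? '' // non-text blocks (images, embeds) contribute nothing
                : block.children.map(child => child.text).join('')
        )
        .join('\n\n');

// Hypothetical sample data, shaped like Portable Text.
const sampleSummary = [
    {
        _type: 'block',
        children: [
            { _type: 'span', text: 'JavaScript is ' },
            { _type: 'span', text: 'a powerful language.' }
        ]
    },
    { _type: 'image' },
    {
        _type: 'block',
        children: [{ _type: 'span', text: 'I specialize in it.' }]
    }
];

const plainSummary = portableTextToPlainText(sampleSummary);
console.log(JSON.stringify(plainSummary));
```

One thing this makes visible: a skipped block still takes part in the join, so an image between two paragraphs leaves extra blank lines in the output. That is harmless for search, but filtering out empty strings before joining would tidy it up.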

So now, when I ping that endpoint, that JavaScript function looks up all the data one could possibly search and returns it as regularized objects in JSON:

[
    ... some technologies ...
    {
        "name": "JavaScript",
        "description": "JavaScript is a powerful language used to add interactivity to otherwise plain, static pages. It's the language I specialize in since it lets me exercise problem solving skills and iterate on my solutions to reach maximum efficiency and elegance."
    },
    ... some more technologies ...
    ... some projects ...
    {
        "name": "TakeShape",
        "description": "As the Developer Advocate at TakeShape, I brought their fantastic API mesh product into the spotlight of the web developer community with a well-timed article schedule, an array of conference and podcast appearances, and an active social media presence..."
    },
    ... some more projects ...
]

Notice that after our little compilation step described above, we no longer differentiate between Project or Technology or anything else I might split up inside my app — they’re all just searchable objects now with the same shape so that Algolia can compare them. Keep in mind that I haven’t added enough data to reliably split out things like position duration, job title, or experience into defined attributes — in practice, it’d be rare to be using such a powerful tool as Algolia on such a pointless and ill-defined dataset. If you run something like this at anything resembling scale, you’ll have far more than two attributes on each search object, all of them relevant to search.

Working with Algolia

The next step is to get our data indexed in Algolia. When you create an application in Algolia, it’ll ask you to create your first index — I named mine files — and this is where we’ll be sticking our data. If you’re working on one of the paid plans of Algolia, you can set up a crawler to come to your site and ping the data organization endpoint we just set up with some configurable frequency. In my case, the free plan will do just fine — the information on my site updates quite infrequently, and I don’t have very much of it. Let’s demonstrate uploading this manually for now: You’ll see this screen in the dashboard of your index in Algolia:

A screenshot of the option to upload records manually to some index in the Algolia dashboard

And now, all 12 of my projects and technologies have been transformed into regular records and stored in Algolia.
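Every Algolia record is identified by an `objectID`. If the upload is ever repeated, giving each record a stable ID means existing records get replaced rather than duplicated. The following is a minimal sketch under that assumption; `slugify` and `toAlgoliaRecords` are hypothetical helpers, not part of Algolia's library or of the site's code.

```javascript
// Hypothetical helper: turn a record name into a URL-safe, stable identifier.
const slugify = text =>
    text
        .toLowerCase()
        .trim()
        .replace(/[^a-z0-9]+/g, '-') // collapse anything non-alphanumeric
        .replace(/^-+|-+$/g, '');    // drop leading and trailing hyphens

// Hypothetical helper: add an objectID derived from the name to each record.
const toAlgoliaRecords = records =>
    records.map(record => ({
        objectID: slugify(record.name),
        ...record
    }));

// Sample records in the same shape the endpoint returns.
const sampleRecords = [
    { name: 'JavaScript', description: 'A powerful language...' },
    { name: 'TakeShape', description: 'As the Developer Advocate...' }
];

const indexedRecords = toAlgoliaRecords(sampleRecords);
console.log(indexedRecords.map(record => record.objectID));
```

The resulting array could be saved as a JSON file for the dashboard upload, or handed to the `saveObjects` method of Algolia's JavaScript API client. This only works if names are unique across projects and technologies; two records with the same name would overwrite each other.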

The completed process

Now that our data is in Algolia, we have to think about what we want to get back when we query for it. Remember, we’re searching based on queries which likely aren’t fully-formed — if we send our first query as soon as the user types the first letter in the search box, Algolia will just get something like r. This is the time to decide what to prioritize when multiple results match such a generic query as r. I would highly suggest reading over the guide on relevance because it goes far more in-depth about this topic than I will here. At the moment, my thought process is very simple: I don’t have enough searchable content where this prioritization will matter. Even typing in one letter will narrow the results enough that they all are within view. Perhaps one day I’ll have so much job experience that I’ll need to revisit this, but honestly, I’m not including everything I’ve ever done on this page anyways — that’s not usually the best strategy when pitching yourself to potential employers.

As far as the technical implementation, this part was actually much simpler than I thought it would be. I didn’t have to work with any HTTP requests or AJAX, because Algolia has created a system called InstantSearch that takes care of this for us. InstantSearch comes in all your favorite flavors: plain ol’ JS, React, Vue, Angular, iOS native, and Android native. I built this site with Next.js, so I just installed the right library and imported it into my Finder component:

npm install algoliasearch react-instantsearch-hooks-web

// at the top of my Finder component:
import algoliasearch from 'algoliasearch/lite';
import { InstantSearch, useSearchBox, Hits } from 'react-instantsearch-hooks-web';

Why those specific imports you ask? Good question! Here’s what they each do:

  • InstantSearch is a container for all our Algolia-related shenanigans. In my main Finder component, I just need to create a search client and then pass that into the <InstantSearch/> component:

      // inside my <Finder> component:
      const searchClient = algoliasearch(process.env.ALGOLIA_APPLICATION_ID, process.env.ALGOLIA_PUBLIC_KEY);
    
      return (
          // ... some stuff ...
          <InstantSearch
              indexName="files"
              searchClient={searchClient}
          >
              <SearchBox />
              <Hits hitComponent={Hit} />
          </InstantSearch>
          // ... some stuff ...
      );
    
  • Perhaps you’ve noticed that the previous code sample used two components we haven’t defined yet: SearchBox and Hit. For a little context, SearchBox is where the search input is (naturally), and Hits automatically updates whenever the value of that input has changed, displaying each result from the Algolia query as a Hit component. Now that code sample is quite simplified from what I actually included in my app; I define Hit as an anonymous function component right in the hitComponent prop definition, and I’ve got some quirky state stuff going on too. Because of that quirky state stuff, Algolia’s predefined SearchBox component wasn’t going to work for me, so I imported a hook to let me make my own: useSearchBox. It works like this:

      const SearchBox = () => {
          const { query, refine, clear, isSearchStalled } = useSearchBox();
    
          return (
              <input
                  type="search"
                  onInput={ev => {
                      if (ev.target.value) refine(ev.target.value);
                      else clear();
                  }}
              />
          )
      };
    

    I’ve defined that SearchBox component, and made sure to trigger the actual search with the refine function every time the input changes.

So in summary: I made a custom component to replace Algolia’s default SearchBox for my needs (likely the default will work for you). It’s just a dolled-up <input> that tells the InstantSearch API to query Algolia any time the user types in something. When that query completes, it updates the Hits component, which displays the results as a bunch of Hits.

And if you’re interested, that quirky state stuff is there to hide the results when we’re not actually searching for anything. Algolia assumes we’re on a dedicated search page, not a nerdy homage to macOS, so when the input box is empty, <Hits> displays all of the possible results. I happen to have a different design goal than the original intention, but I think it’s important to note here that the tool was flexible enough to let me do what I want with it. All of the Algolia-related stuff took about 15 minutes to implement, and most of that was just trying to correct my initial misunderstandings about the SearchBox component. It didn’t take me too much longer to finish the project because I had already done much of the work. For example, the Hit component is essentially the same thing as the component I use to display icons on the Desktop, so I just reused it. Thank goodness for composable architecture!
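The rule for hiding results on an empty query can be boiled down to one small function. This is a minimal sketch of the idea only, not the state handling the site actually uses; `hitsToDisplay` is a hypothetical helper, and it assumes the current query string and the list of hits are already available (for instance from InstantSearch's hooks).

```javascript
// Hypothetical helper: show nothing until the user has typed something.
const hitsToDisplay = (query, hits) =>
    query.trim() === '' ? [] : hits;

// Sample hits in the same shape as the records indexed earlier.
const sampleHits = [
    { name: 'React', description: '...' },
    { name: 'Rust', description: '...' }
];

console.log(hitsToDisplay('', sampleHits).length);  // empty box: nothing shown
console.log(hitsToDisplay('r', sampleHits).length); // real query: hits pass through
```

Keeping the rule in a plain function makes it easy to check on its own, separate from any React state.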

Here’s what it looks like now, with backward and forward buttons for good measure:

The interface now on my site

And when we search for something:

A screenshot of me searching for something on my site

And for comparison, our original inspiration:

The same screenshot as at the beginning of this article

Similar enough, no? It leaves me some room to grow for sure, but I like what we’ve done! It’s not too useful just searching for one letter, but my searchable objects are distinct enough that it narrows quickly:

Another screenshot of the search on my site but with fewer results because of a more specific query

And we’re done!

Takeaways

I learned several things from this experience:

  • Algolia’s marketing talks a lot about ecommerce. As it turns out, you can product-ize nearly anything, including my portfolio, and Algolia will help all the same.
  • I spent more time writing than developing for this article — the part where I actually integrated Algolia into my personal website only took a few minutes.
  • I probably ought to create more defined attributes on my data structures, because I could make them searchable. I probably ought to write another article about this…
  • Algolia isn’t just for big companies like I thought it was. It’s flexible enough to work at whatever scope the project calls for.
  • This whole thing was free! I didn’t even come close to the limit of the free tier, which is great because I’ve made a point of building this whole site entirely on free tiers!
  • This article was prompted by a collaboration with the folks from Algolia, but I went far beyond my mandate with the research and the idea generation here. So while I will definitely tell you that you should sign up for Algolia (insert a marketing pitch here), I will also say that you actually should sign up for Algolia and go build something with it. Ping me on Twitter when you do. I’m going to be working on another big project or two with the folks at Algolia soon, so we can learn together!