~/blog/"Why use just Google, when you can combine the power of multiple search engines?"

2023-02-03


There is an idea expressed in the book The Wisdom of Crowds: aggregating information from a large group tends to yield better decisions than those made by any single member of the group (as long as the group is diverse enough and each member is independent). There are some discussions regarding the conditions under which this holds, but there’s still a nugget of truth in there: looking at a problem from a diverse set of perspectives is beneficial.

Which leads me to: search engines.

Depending on the source, you’ll get slightly different values for the current market share of Google, but the general consensus is that it seems to be around 90%. This is concerning (to say the least), since the world is trusting a single member to dictate what’s relevant and/or true. Yes, Google aggregates the results from different sources, but the company is still responsible for designing the algorithm that decides which sources are displayed at the top (or displayed at all).

Even if you don’t really care about the possible consequences of concentrating all this power in a single company, you can still look at this issue from a purely self-interested perspective (which turns it into an opportunity). In the current competitive information age, having access to more varied sources of information than your peers can be an advantage. You may not notice it at first, but every so often you’ll find a website that was not as good at playing the Google SEO game, yet happens to have the information or the solution you were looking for. Exposing yourself to more variety will give you an edge.

My point is not “don’t use Google”, but rather “why use only Google?”. The fact is, Google is amazing at what they do, but there’s nothing stopping you from enhancing your search results with more sources. In other words, nothing stops you from creating your own wise crowd.

Which leads me to: metasearch engines.

A metasearch engine is basically a search engine of search engines. It’s a piece of software that sends your query to multiple search engines, aggregates the results, and presents them to you.
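
To make the idea concrete, here is a minimal shell sketch of the aggregation step. It assumes each engine's results have already been saved to a file with one URL per line, and it ranks a URL higher the more engines returned it. The function name aggregate_results and the ranking rule are made up for illustration; a real metasearch engine such as SearXNG scores results in a more involved way.

```shell
#!/bin/sh
# Illustrative only: merge per-engine result lists (one URL per line, one
# file per engine) and rank URLs by how many engines returned them.
# aggregate_results is a hypothetical helper, not part of SearXNG.
aggregate_results() {
    for f in "$@"; do
        sort -u "$f"            # drop duplicates within a single engine
    done |
        sort |                  # group identical URLs together
        uniq -c |               # count how many engines returned each URL
        sort -k1,1nr -k2 |      # most "votes" first, ties sorted by URL
        awk '{ print $2 }'      # keep only the URL
}
```

Calling aggregate_results engine1.txt engine2.txt would print the merged list, with the URLs that both engines agree on at the top.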

In this post, I’ll show you how you can run your own metasearch engine for free. More specifically, SearXNG.

SearXNG is a fork of Searx, with a few improvements that make it (in my opinion) easier to set up.

Setting up SearXNG

The first step is to install Docker. Make sure that you follow the post-installation steps as well, so that you don’t have to run docker with sudo. After you’re done, create a directory to store the startup script (or cd into the one you already use for your personal scripts):

mkdir -p ~/scripts
cd ~/scripts

Inside your scripts directory, create a new script called start_searxng, open it, and paste the following:

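#!/bin/sh

# Directory on the host where SearXNG keeps its configuration
mkdir -p "$HOME/searxng"

# Fetch the latest image
docker pull searxng/searxng

# Start the container in the background (-d) and remove it once it stops (--rm)
docker run -d --rm \
    -p 8080:8080 \
    -v "${HOME}/searxng:/etc/searxng" \
    -e "BASE_URL=http://localhost:8080/" \
    searxng/searxng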

Don’t forget to make the script executable:

chmod +x start_searxng

And run it:

./start_searxng

If you now open your browser and go to localhost:8080, you should be able to see the SearXNG page.
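
If you’d rather check from the terminal, something like the following should work. It’s a small sketch that assumes curl is installed; check_searxng is just a name I made up, and the URL defaults to the one used in the startup script.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if something answers HTTP requests at the
# given URL (defaults to the address used by start_searxng).
check_searxng() {
    url=${1:-http://localhost:8080/}
    # -s: no progress output, -f: treat HTTP errors (4xx/5xx) as failures,
    # -o /dev/null: discard the page body, --max-time: give up after 5 seconds
    if curl -sf -o /dev/null --max-time 5 "$url"; then
        echo "SearXNG is reachable at $url"
    else
        echo "Nothing is answering at $url (yet?)"
        return 1
    fi
}
```

Run check_searxng with no arguments to test the default address. The container may need a few seconds after startup before it answers, so a first failure is not necessarily a problem.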

It’s that easy. You’re running SearXNG now!

You can customize your preferences, and pick which search engines you’d like to use.

Be aware that your queries will get slower the more engines you add (especially if those engines are slow and/or unreliable), so you’ll have to balance diversity against speed.

You can also see that you have various search categories, including “science” (i.e. scientific papers) and “files” (i.e. torrents), which you can customize as well.
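
One practical note before moving on: the startup script runs the container in the background, so it keeps running after you close the terminal. A companion script along these lines could stop it. This is only a sketch: stop_searxng is a name of my choosing, and it assumes the container was started from the searxng/searxng image as shown earlier.

```shell
#!/bin/sh
# Hypothetical companion to start_searxng: stop every running container
# that was created from the searxng/searxng image.
stop_searxng() {
    # -q prints only container IDs; the ancestor filter matches on the image
    ids=$(docker ps -q --filter "ancestor=searxng/searxng")
    if [ -z "$ids" ]; then
        echo "No running SearXNG container found"
        return 0
    fi
    # $ids is left unquoted on purpose so several IDs become several arguments
    docker stop $ids
}
```

Since the container is started with --rm, stopping it also removes it. The configuration files in ~/searxng stay where they are, because that directory lives on the host.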

Setting it up in the search bar (optional)

You may also want your browser to use SearXNG, instead of Google, when you type a search in the address bar. How to do this depends on your browser.

Chromium-based browsers

For Chromium-based browsers (which include Chrome and Brave), you want to do the following:

  1. Open the settings page: chromium://settings/searchEngines (replace chromium with brave, chrome, etc., depending on your browser).
  2. Go into the “Search engines” section.
  3. Add a new search engine.
  4. Give it a name.
  5. In the “URL with %s in place of query” field, enter the following: http://localhost:8080/search?q=%s
  6. Set it as the default search engine.
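
The %s in that URL is a placeholder: the browser replaces it with whatever you typed, encoded so that it is safe inside a URL. The following shell sketch mimics that substitution so you can see which URL ends up being requested. Both function names are hypothetical, the encoder only handles plain ASCII input, and real browsers may encode some characters differently (for example, a space as + instead of %20).

```shell
#!/bin/sh
# Percent-encode a string for use in a URL query (ASCII input only).
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}             # everything after the first character
        c=${s%"$rest"}          # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out="$out$c" ;;            # safe as-is
            *) out="$out$(printf '%%%02X' "'$c")" ;;    # e.g. space -> %20
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

# Fill the %s placeholder of a search-engine URL template with a query.
fill_template() {
    template=$1
    query=$(urlencode "$2")
    # Text before the placeholder + encoded query + text after the placeholder
    printf '%s\n' "${template%%\%s*}${query}${template#*\%s}"
}

fill_template 'http://localhost:8080/search?q=%s' 'wisdom of crowds'
# prints: http://localhost:8080/search?q=wisdom%20of%20crowds
```

Pasting the printed URL into the address bar should show the same results page you get when searching from the SearXNG home page.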

Firefox-based browsers

For Firefox-based browsers (which include LibreWolf), the procedure is the following:

  1. Go into the settings: about:preferences#search
  2. Select the “Add search bar in toolbar” option (i.e. select the option to split the search bar from the address bar).
  3. Open localhost:8080
  4. In the search bar, the magnifying glass icon should now have a green “+” symbol.
  5. Click on the icon.
  6. You should see a SearXNG icon with the same green “+” symbol.
  7. Click on it. SearXNG is now among your search engines.
  8. Go back to about:preferences#search, and set it as default.

Final questions you might have

What if I want to use this on the go, while on my phone?

This is where the “for free” part of running your own metasearch engine stops. If you want to do this, you will usually have to buy a domain name and pay for hosting. If there’s interest, I might write a post in the future with a step-by-step guide. But for now, you should at least be able to play around with your new metasearch engine, and see if this is something that you’re willing to pay a few dollars per month to have available everywhere you go.

If you’re already familiar with how to host your own services, you should be able to easily deploy the docker containers yourself. SearXNG provides a docker-compose yaml file that will make your life much easier.

Are there any kind people running SearXNG instances that I could use for free?

There certainly are, but to be honest, I wouldn’t use them. You have no idea what a random internet stranger is running; they could be logging your search history. You can check the code you’re running on your own machine, but you can’t check theirs.

As with most computer-related things, I’d rather have more control. But you’re free to decide what’s best for you.