Presearch – A better search engine for We The People

tl;dr: Presearch is building a fully decentralised search engine where you, the user, are in control and get rewarded in PRE, its native token, for any value you add. Yes, even for simply searching with Presearch!

Get your own Presearch account & start earning: https://presearch.org/signup?rid=2462883*

* The link with my referral code gives you a 25 PRE signup bonus and, if you use Presearch for more than 30 days and earn more than 50 search-reward tokens, earns me a 25-token commission. If this offends, use this direct link.

For some time now, I’ve been following a project named Presearch, which is building a decentralised search engine ‘for the people’. Far from being yet another Google killer, it is their token-economy, value-driven approach to this mission that really makes them stand out.

I’ve been impressed with their team, the work going on in the back-end, and the open, engaged communication on their social channels. I run their search nodes, use presearch.org as my main search engine, and have never looked back.

Better than reading this from me (because all of this is just, like, my opinion) is to Do Your Own Research. A good starting point is the vision paper at presearch.io.

And because an image says more than a thousand words, and Presearch kindly allowed me to do so, you can view much more on 40two.tube’s new Presearch Channel (as well as at the source, if you prefer ads).

Building a Presearch Node Dashboard with Graylog

Over the past year I ran a Graylog server collecting any type of data I could find, to get some hands-on experience with threat detection, dashboarding, messaging and the correlation of data from multiple sources. Graylog proved stable, and with some tweaking I found the sweet spot between ElasticSearch eating all available resources and it crashing for lack of them.

It taught me a lot, but at some point you stop finding new stuff. Time for some fresh data.

I am an avid Presearch node operator; the node API gives me fresh metrics on all my nodes in 5-minute intervals, and these metrics combine into rewards using documented tokenomics. Feels like a project.

About those Presearch nodes:

Do you support decentralization and an open internet that isn’t dominated by a handful of Big Tech companies? Now you can be part of the solution by operating a Presearch Node and helping to power the Presearch decentralized search engine. Presearch Nodes are used to process user search requests, and node operators earn Presearch PRE tokens for joining and supporting the network.

Source: presearch.io

Dashboarding Presearch turned out to be very useful. I learned at what moments my nodes switch gateways, to what extent node latency or stake influences earnings and, especially, which nodes are just too darn expensive for their overall performance (looking at you, Azure).

some of the dashboards in action

It also helped me make informed decisions on how to spread my stake, what resources impact node performance and what all that translates to in Cold Hard $Tokens.

I will not share any of those findings here, because (a) Presearch nodes still have to go to mainnet, and then everything will change, and (b) Do Your Own (P)Research.

Pro-tip: anyone telling you how to make money on the Internet should be taken with a grain of salt. Most people who know how to make money on the Internet are doing just that, not writing blog posts.

No dashboards, no magic queries. Also, nothing on how to get your VPS, Docker engine and Graylog/ElasticSearch stack running. Or how to keep the latter from eating all your RAM again and again and again.

Instead, I describe how I send the log messages and API statistics of my Presearch nodes to Graylog, saving you hours and hours of (p)researching.

Do Your Own Prerequisites

You will need to have the following ready:

  1. A Graylog instance that is accessible from your nodes, preferably behind a proxy such as Traefik.
  2. A working Presearch Dashboard, Node Registration Code and Node API key.
  3. Some staked nodes.

Oh, and nothing in your setup will be exactly like mine. You might get an error here and there. Google Presearch it. Switch it off and on again. Get some fresh air. Blame ElasticSearch.

Your nodes are earning $PRE every 5 minutes, take as long as you need 😉

Docker Logs & API Metrics

One of the reasons for picking Graylog is that I can correlate log events with time-based metrics. Thus, for each node, I collect both the node logs (a.k.a. the Docker logs of each node) and the node performance metrics (from the node API).

You will need to set up Graylog inputs. Log collection is done through a single GELF input on the Graylog server; I use Docker’s native support for logging in this Graylog Extended Log Format over UDP to send the logs across the public internet. For the performance data from the node API, we need a separate JSON API input (and extractor) for each node.

You will also need to teach Graylog to handle scientific notation, because that is what some of the Presearch node API’s metrics are sent in.

First though, some administration.

A number of variables are used in the commands below; depending on your platform you can set them as environment variables (use underscores instead of hyphens in the names if you do) or simply note them down in notepad to paste when needed.

Variable | Description
--- | ---
$NODE-NAME | The name you give to your node. To correlate logs with metrics, this should be the same as the node’s host name.
$NODE-REGISTRATION-CODE | The registration code you can find in your Presearch Node Dashboard.
$API-KEY | The API key found in the Presearch Node Dashboard.
$URL-ENCODED-PUBLIC-KEY | The URL-encoded Public Key of the node; see below.
$GRAYLOG-SERVER-IP | The public IP address of your central Graylog server.

your presearch node dashboard

Use your Presearch Node Dashboard (and the tool below) to get the data you need: the Node Registration Code and Node API Key from your main dashboard, and the Public Key behind the “Stats” button next to your node.

URL-encoded Node Public Key

To fetch data from the Presearch Node API for just one particular node, you need to include a URL-encoded version of your Node Public Key.

To get this from your browser:

  1. Open the CyberChef Toolbox with the URL-Encode recipe: https://tools.nielsemmer.com/#recipe=URL_Encode(true)
  2. Make sure the option to encode all special characters is checked
  3. Copy the Public Key for your node (found under “Stats” for that node in the Presearch Node Dashboard)
  4. Paste your node’s Public Key into the Input box
  5. The URL-encoded key can now be copied from the Output box

This Output is what you will need as the $URL-ENCODED-PUBLIC-KEY variable.
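
If you prefer the command line over a browser tool, the one-liner below should produce an equivalent encoding. This is a hedged alternative, assuming python3 is available on your machine; the CyberChef recipe above works just as well:

python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=""))' 'PASTE-YOUR-PUBLIC-KEY-HERE'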

You can test whether you converted the Public Key properly by visiting:

https://nodes.presearch.org/api/nodes/status/$API-KEY?stats=true&public_keys=$URL-ENCODED-PUBLIC-KEY

replacing $API-KEY and $URL-ENCODED-PUBLIC-KEY with the appropriate values; you should receive a JSON response for just your one node.
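
The same check from a terminal, if you prefer (assuming curl and, optionally, jq are installed; replace the placeholders as above):

curl -s "https://nodes.presearch.org/api/nodes/status/$API-KEY?stats=true&public_keys=$URL-ENCODED-PUBLIC-KEY" | jq .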

Graylog server preparation

The Graylog server needs some preparation to properly parse node data. This is a one-time thing; when adding more nodes later, you can skip this step.

First we need to define a special Grok pattern to be able to parse some of the values returned by the Presearch node API, and then we set up a GELF input for all the nodes to send their log messages to.

Set up Grok pattern for scientific notation

The node API returns some values in scientific notation, which the stock Grok patterns do not extract. We need to add a new Grok pattern:

graylog menu > System / Grok Patterns > Create pattern

Field | Value
--- | ---
Name | NUMBER_SCI
Pattern | [-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?

This pattern is later referred to in the Grok extractor for each node.
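
To illustrate why this is needed: some metrics come in with an exponent, like the (made-up) value below, which the stock BASE10NUM pattern will not match but NUMBER_SCI will.

avg_staked_capacity_percent=1.0329e-7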

Set up GELF input on the Graylog server

Node logs are easy to capture, since Docker natively supports sending container logs to a GELF (Graylog Extended Log Format) receiver anywhere on the internet. Thus, we will set up a GELF UDP input on our Graylog instance.

Disclaimer: Smart people secure this with TLS but since this is a lot of work and worst case someone RickRolls my dashboard, I am using an insecure UDP input on my instance. Your mileage may vary. Not Technical Advice.

Using a single Graylog input for all nodes means the message source (the host name of the Docker host) will be how we know which node sent the update.

And since we want to correlate each node’s logs and metrics later, it is best to set the $NODE-NAME variable for the API input to the node’s host name, so it will be reported as the source for that input.

To set up the GELF input, on your Graylog console:

graylog menu > System / Inputs > Inputs > Select Input (drop-down) > GELF UDP

Field | Value
--- | ---
Input type | GELF UDP
Bind address | 0.0.0.0
Port | 12201
Receive Buffer Size | 262144

With the GELF UDP input running, the server is ready to receive node data.

You can test the GELF input from the Graylog host itself (assuming Graylog runs on Docker) by sending ‘hello world’ to your Graylog stream:

docker run --log-driver=gelf --log-opt gelf-address=udp://localhost:12201 alpine echo hello world

For each node:

For the first node, and for every node after it, you first need to restart the presearch-node container to connect its logs, and then create an input that talks to the node API.

Thus, the below steps are to be followed for each individual node.

Start node with GELF logging over UDP

If the Presearch node is running, first stop and remove it. Its keys will be left untouched, so it will come back with the same identity.

sudo docker stop presearch-node && sudo docker rm presearch-node

Then, spin up a new Presearch node with its logging directed at the Graylog server:

sudo docker run -dt --name presearch-node --restart=unless-stopped -v presearch-node-storage:/app/node -e REGISTRATION_CODE="$NODE-REGISTRATION-CODE" --log-driver gelf --log-opt gelf-address=udp://$GRAYLOG-SERVER-IP:12201 presearch/node

If all works as intended, the log messages for the new node should arrive in your Graylog event stream.
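
To double-check that the container actually picked up the GELF settings, you can inspect its log configuration (an optional sanity check):

sudo docker inspect presearch-node --format '{{json .HostConfig.LogConfig}}'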

log messages shown in your default stream

Take note of the SOURCE listed, as this should be used as $NODE-NAME going forward to correlate the logs with the appropriate node statistics.

Add a Node-API input for the node

On your Graylog console:

Graylog server > System / Inputs > Inputs > Select input (drop-down) > JSON path from HTTP API

Then use the values below, replacing the $NODE-NAME and $URL-ENCODED-PUBLIC-KEY variables with the ones generated for this specific node:

Field | Value
--- | ---
Input type | JSON path from HTTP API
Title | $NODE-NAME
URL of JSON resource | https://nodes.presearch.org/api/nodes/status/$API-KEY?stats=true&public_keys=$URL-ENCODED-PUBLIC-KEY
Interval | 5
Interval time unit | Minutes
JSON path of data to extract | $.nodes.*.period
Message source | $NODE-NAME

Save and start the input. It should immediately start receiving messages. Once the first message is in, an extractor can be built to get the metrics needed.
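
If you want to see exactly what that JSON path selects before building the extractor, you can preview it from any terminal (same placeholders as before; jq assumed installed):

curl -s "https://nodes.presearch.org/api/nodes/status/$API-KEY?stats=true&public_keys=$URL-ENCODED-PUBLIC-KEY" | jq '.nodes[].period'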

Add extractor from input

Graylog server > System / Inputs > Inputs > Manage extractors (next to newly created input)

In the Extractor screen for the input, under Add extractor, click Get started and then Load Message. If loading the message fails, the input is not running properly.

If you do see a message, click Select extractor type next to the result field, and pick Grok pattern.
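
For reference, the result field should look roughly like the single line below (illustrative values only; your numbers will differ):

uptime_percentage=100, avg_uptime_score=0.25, avg_latency_ms=61.4, avg_latency_score=0.2, total_requests=118, avg_success_rate=100, avg_success_rate_score=0.25, avg_reliability_score=0.7, avg_staked_capacity_percent=1.0329e-7, avg_utilization_percent=2.58e-8, total_pre_earned=0.0142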

Field | Value
--- | ---
Extractor type | Grok pattern
Source field | result
Grok pattern | uptime_percentage=%{BASE10NUM:uptime_pct;float}, avg_uptime_score=%{BASE10NUM:uptime_score;float}, avg_latency_ms=%{BASE10NUM:latency_ms;float}, avg_latency_score=%{BASE10NUM:latency_score;float}, total_requests=%{BASE10NUM:total_requests;float}, avg_success_rate=%{BASE10NUM:avg_success_rate;float}, avg_success_rate_score=%{BASE10NUM:avg_success_rate_score;float}, avg_reliability_score=%{BASE10NUM:avg_reliability_score;double}, avg_staked_capacity_percent=%{NUMBER_SCI:avg_staked_capacity_pct;float}, avg_utilization_percent=%{NUMBER_SCI:avg_utilization_pct;float}, total_pre_earned=%{BASE10NUM:total_pre_earned;float}
Condition | Always try to extract
Extraction strategy | Copy
Extractor title | $NODE-NAME

If this is a node that has been up for more than a few hours, clicking Test will parse all the needed fields.

If testing fails, this is most likely because the node is new or not functioning properly, so one of the values is presented as null, which confuses Grok.

Save the extractor.

Log messages should now be coming in from the API; clicking an entry expands it to see all the variables extracted for your dashboarding pleasure.
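
A quick sanity check is a Graylog search on one of the extracted fields, for example (replace the source value with your own $NODE-NAME):

source:$NODE-NAME AND _exists_:total_pre_earned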

Rounding up

Watch the main Graylog stream to see the data come in from the nodes.

If all works, you can start building your dashboards…