Building a Presearch Node Dashboard with Graylog

Over the past year I ran a Graylog server collecting any type of data I could find, to get some hands-on experience with Threat Detection, dashboarding, messaging and the correlation of data from multiple sources. Graylog proved stable, and with some tweaking I found the sweet spot between ElasticSearch eating all resources and crashing for lack thereof.

It taught me a lot but at some point you stop finding new stuff. Time for some fresh data.

I am an avid Presearch node operator; the node API gives me fresh metrics on all nodes at 5-minute intervals, and these metrics combine into rewards using documented tokenomics. Feels like a project.

About those Presearch nodes:

Do you support decentralization and an open internet that isn’t dominated by a handful of Big Tech companies? Now you can be part of the solution by operating a Presearch Node and helping to power the Presearch decentralized search engine. Presearch Nodes are used to process user search requests, and node operators earn Presearch PRE tokens for joining and supporting the network.

Source: presearch.io

Dashboarding Presearch turned out to be very useful. I learned at what moments my nodes switch gateways, to what extent node latency or stake influences earnings and especially which nodes are just too darn expensive for their overall performance (looking at you, Azure).

some of the dashboards in action

It also helped me make informed decisions on how to spread my stake, what resources impact node performance and what all that translates to in Cold Hard $Tokens.

I will not share any of those findings here, because (a) Presearch nodes still have to go mainnet, and then everything will change, and (b) Do Your Own (P)Research.

Pro-tip: anyone telling you how to make money on the Internet should be taken with a grain of salt. Most people that know how to make money on the Internet are doing just that, not writing blog posts.

No dashboards, no magic queries. Also, nothing on how to get your VPS, Docker engine and Graylog/ElasticSearch stack running. Or how to keep the latter from eating all your RAM again and again and again.

Instead, I describe how I send the log messages and API statistics of my Presearch nodes to Graylog, saving you hours and hours of (p)researching.

Do Your Own Prerequisites

You will need to have ready:

  1. A Graylog instance that is accessible from your nodes, preferably behind a proxy such as Traefik.
  2. A working Presearch Dashboard, Node Registration Code and Node API key.
  3. Some staked nodes.

Oh, and nothing in your setup will be exactly like mine. You might get an error here and there. Google Presearch it. Switch it off and on again. Get some fresh air. Blame ElasticSearch.

Your nodes are earning $PRE every 5 minutes, take as long as you need 😉

Docker Logs & API Metrics

One of the reasons for picking Graylog is that I can correlate log events with time-based metrics. Thus, for each node, I collect the node logs (i.e. the Docker logs of the node container) and the node performance metrics (from the Node API).

You will need to set up Graylog inputs. Log collection is done through a single GELF input on the Graylog server: I use Docker's native support for shipping container logs in this Graylog Extended Log Format over UDP across the public internet (see the disclaimer on TLS below). For the performance data from the Node API, we need a separate JSON API input (and extractor) for each node.

You will also need to teach Graylog to handle scientific notation, because that is what some of the Presearch node API's metrics are sent in.

First though, some administration.

A number of variables are used in the commands below; depending on your platform you can set them as environment variables or simply note them down in notepad to paste them when needed.

$NODE-NAME: the name you give to your node. To correlate logs with metrics, this should be the same as the node's host name.
$NODE-REGISTRATION-CODE: the registration code you can find in your Presearch Node Dashboard.
$API-KEY: the API key found in the Presearch Node Dashboard.
$URL-ENCODED-PUBLIC-KEY: the URL-encoded Public Key of the node; see below.
$GRAYLOG-SERVER-IP: the public IP address of your central Graylog server.
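
If you go the environment-variable route on a Linux shell, note that shell variable names cannot contain hyphens, so underscores are needed instead. A minimal sketch (all values are placeholders):

export NODE_NAME="my-first-node"
export NODE_REGISTRATION_CODE="your-registration-code"
export API_KEY="your-api-key"
export URL_ENCODED_PUBLIC_KEY="your-url-encoded-public-key"
export GRAYLOG_SERVER_IP="203.0.113.10"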
your presearch node dashboard

Use your Presearch Node Dashboard (and the tool below) to get the data you need: the Node Registration Code and Node API Key from your main dashboard, and the Public Key behind the “Stats” button next to your node.

URL encoded Node Public Key

To fetch data from the Presearch Node API for just one particular node, you need to include a URL-encoded version of your Node Public Key.

To get this from your browser:

  1. Open CyberChef Toolbox with the URL-Encode recipe: https://tools.nielsemmer.com/#recipe=URL_Encode(true)
  2. Make sure to check the option to encode all special characters
  3. Copy the Public Key for your node (found under “Stats” for that node in the Presearch Node Dashboard)
  4. Paste your node’s Public Key in the Input box
  5. The URL-encoded key can now be copied from the Output box

This Output is what you will need as the $URL-ENCODED-PUBLIC-KEY variable.
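
If you would rather skip the browser, the same URL-encoded key can be produced on the command line. A minimal sketch, assuming python3 is installed and the plain Public Key is in $PUBLIC_KEY (underscores, because shell variable names cannot contain hyphens):

python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$PUBLIC_KEY"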

You can test whether you converted the Public Key properly by visiting:

https://nodes.presearch.org/api/nodes/status/$API-KEY?stats=true&public_keys=$URL-ENCODED-PUBLIC-KEY

replacing $API-KEY and $URL-ENCODED-PUBLIC-KEY with the appropriate values, and you should receive a JSON response for just your one node.
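
From a shell, the same test looks roughly like this (the quotes matter because of the & in the URL; variable names again use underscores):

curl "https://nodes.presearch.org/api/nodes/status/$API_KEY?stats=true&public_keys=$URL_ENCODED_PUBLIC_KEY"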

Graylog server preparation

The Graylog server needs some preparation to properly parse node data. This is a one-time thing; additional nodes can skip this step.

First we need to define a special Grok pattern to be able to parse some of the values returned by the Presearch node API, and then we set up a GELF input for all the nodes to send their log messages to.

Setup Grok pattern for scientific notation

The Node API returns some values in scientific notation (SCI), which the default Grok patterns do not extract. We need to add a new Grok pattern:

Graylog menu > System / Input > Grok Patterns > Create pattern

Name: NUMBER_SCI
Pattern: [-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?

This pattern is later referred to in the Grok extractor for each node.
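
As a quick sanity check: the pattern should match plain decimals such as 99.58 as well as values in scientific notation such as 3.0557268e-7 (both numbers are just examples).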

Setup GELF input on the Graylog server

Node logs are easy to capture since Docker natively supports sending container logs to a GELF (Graylog Extended Log Format) receiver anywhere on the internet. Thus, we will set up a GELF UDP input on our Graylog instance.

Disclaimer: Smart people secure this with TLS, but since that is a lot of work and the worst case is someone RickRolling my dashboard, I am using an insecure UDP input on my instance. Your mileage may vary. Not Technical Advice.

Using a single Graylog input for all nodes means the message source (the hostname of the Docker host) is how we know which node sent the update.

And since we want to later correlate the node’s logs and metrics, it is best to set the $NODE-NAME variable for the API input to the node’s host name, so it will be reported as the source for that input.

To set up the GELF input, on your Graylog console:

Graylog menu > System / Inputs > Inputs > Select Input (drop-down) > GELF UDP

Input type: GELF UDP
Bind address: 0.0.0.0
Port: 12201
Receive Buffer Size: 262144

With the GELF UDP input running, the server is ready to receive node data.
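
If the Graylog host runs a firewall, remember to allow UDP port 12201 from your nodes. With ufw, for example, something along these lines (the IP is a placeholder for one of your node's addresses):

sudo ufw allow from 198.51.100.20 to any port 12201 proto udp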

You can test the GELF input from the Graylog host itself (assuming Graylog runs on Docker) by sending ‘hello world’ to your Graylog stream:

docker run --log-driver=gelf --log-opt gelf-address=udp://localhost:12201 alpine echo hello world
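
If you prefer a test that does not involve Docker, sending a minimal raw GELF message over UDP does the same job; a sketch assuming netcat is installed:

echo -n '{"version":"1.1","host":"gelf-test","short_message":"hello world"}' | nc -u -w1 localhost 12201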

For each node:

For the first node, and every node after it, you need to first re-create your presearch-node container so its logs are sent to Graylog, and then create an input to talk to the Node API.

Thus, the below steps are to be followed for each individual node.

Start node with GELF logging over UDP

If the Presearch node is running, first stop and remove it. The keys will be left untouched so it will return with the same identity.

sudo docker stop presearch-node && sudo docker rm presearch-node

Then, spin up a new Presearch node with logging directed at the Graylog server:

sudo docker run -dt --name presearch-node --restart=unless-stopped -v presearch-node-storage:/app/node -e REGISTRATION_CODE="$NODE-REGISTRATION-CODE" --log-driver gelf --log-opt gelf-address=udp://$GRAYLOG-SERVER-IP:12201 presearch/node

If all works as intended, the log messages for the new node should arrive in your Graylog event stream.

log messages shown in your default stream
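
If nothing shows up, first confirm the container really was started with the GELF driver; this should print gelf:

sudo docker inspect -f '{{.HostConfig.LogConfig.Type}}' presearch-node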

Take note of the SOURCE listed, as this should be used as $NODE-NAME moving forward to correlate the logs with the appropriate node statistics.

Add a Node-API input for the node

On your Graylog console:

Graylog server > System / Inputs > Inputs > Select input (drop-down) > JSON path from HTTP API

Then use the values below, replacing the $NODE-NAME and $URL-ENCODED-PUBLIC-KEY variables with the ones generated for this specific node:

Input type: JSON path from HTTP API
Title: $NODE-NAME
URL of JSON resource: https://nodes.presearch.org/api/nodes/status/$API-KEY?stats=true&public_keys=$URL-ENCODED-PUBLIC-KEY
Interval: 5
Interval time unit: Minutes
JSON path of data to extract: $.nodes.*.period
Message source: $NODE-NAME

Save and start the input. It should immediately start receiving messages. Once the first message is in, an extractor can be built to get the metrics needed.
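
For reference, the JSON path above plucks the period object out of an API response that looks roughly like the snippet below (heavily trimmed, values made up, field names taken from the extractor in the next step):

{
  "nodes": {
    "<node public key>": {
      "period": {
        "uptime_percentage": 100,
        "avg_latency_ms": 71.4,
        "total_requests": 1234,
        "avg_reliability_score": 9.97,
        "total_pre_earned": 0.42
      }
    }
  }
}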

Add extractor from input

Graylog server > System / Inputs > Inputs > Manage extractors (next to newly created input)

In the Extractor screen for the input, under Add extractor, click Get started and then Load Message. If loading the message fails, the input is not running properly.

If you do see a message, click Select extractor type next to the result field, and pick Grok pattern.

Extractor type: Grok pattern
Source field: result
Grok pattern: uptime_percentage=%{BASE10NUM:uptime_pct;float}, avg_uptime_score=%{BASE10NUM:uptime_score;float}, avg_latency_ms=%{BASE10NUM:latency_ms;float}, avg_latency_score=%{BASE10NUM:latency_score;float}, total_requests=%{BASE10NUM:total_requests;float}, avg_success_rate=%{BASE10NUM:avg_success_rate;float}, avg_success_rate_score=%{BASE10NUM:avg_success_rate_score;float}, avg_reliability_score=%{BASE10NUM:avg_reliability_score;double}, avg_staked_capacity_percent=%{NUMBER_SCI:avg_staked_capacity_pct;float}, avg_utilization_percent=%{NUMBER_SCI:avg_utilization_pct;float}, total_pre_earned=%{BASE10NUM:total_pre_earned;float}
Condition: Always try to extract
Extraction strategy: Copy
Extractor title: $NODE-NAME

If this is a node that has been up for more than a few hours, clicking test will parse all the needed fields.

If testing fails, it is usually because the node is new or not functioning properly, so one of the values is returned as null, which confuses Grok.

Save the extractor.

Messages should now be coming in from the API; clicking an entry expands it to show all the fields extracted, for your dashboarding pleasure.

Rounding up

Watch the main Graylog stream to see the data come in from the nodes.

If all works, you can start building your dashboards…

TiZu Tech – Everything Tech

Self-hosting is fun and easy to start with no more than a Raspberry Pi or an old desktop PC. To get anything serious done though, you’ll need a VPS with ample bandwidth, one or more fixed IP addresses and some scalability.

There are lots of cheap offers on lowendbox, some of them excellent value for money (I host many of my Presearch nodes on RackNerd), but for my main server I needed to find something that could do it all and still be affordable: somewhere on my side of the planet, in my timezone, and giving me full control over the whole chain from firewall to chosen distro.

I found all this at TiZu. Run by a good friend, their attention to detail, both in security and quality of hardware, has kept downtime to an absolute minimum. Not that nielsemmer.com ever gets any traffic, but it also runs my Graylog/ElasticSearch cluster and 40two.tube, both of which do.

Despite countless DDoS attacks and script-kiddies (with proxies) from across the globe, my services weather the Fediverse currents with ease.

He’s now started documenting his best practices for the rest of us at tizutech.com. One for your bookmarks, if you’re into self-hosting.

Snapdrop – cross platform AirDrop in your browser

Another one of those super light super anonymous super useful tools that quickly went from let’s-try to use-it-every-day.

SnapDrop solves my problem of quickly getting stuff from device to device without cables, USB or cloud drives. It does so from my browser, across platforms.

Where FilePizza lets you send files across the internet, SnapDrop works when you’re standing right next to each other. Or at least are in the same building.

Receiving a file from my phone

Sending files with SnapDrop couldn’t be easier. On your laptop, pc or phone, open snapdrop.40two.site from any recent browser. The bottom of your screen will now show what random (geeky) nickname your device has been given.

Now, on the other device, also open snapdrop.40two.site. That’s it.

Click on one of the devices in your screen to send one or more files to it. Right-click (or long-touch) to send a message.

That really is all there is to it. No install needed.  

The official site for this service should be snapdrop.net. It seems to be down (as of Jan ’23) though, which is why I am running my own instance. Bookmark: snapdrop.40two.site

Repo and info on self-hosting this at: https://github.com/RobinLinus/snapdrop

Matrix – Secure, decentralised and open

Matrix describes itself as an open standard for interoperable, decentralised, real-time communication over IP.

Yet Another Chat Tool? Some new User-less Network Doomed to Fail? Not quite.

Federation

Matrix is federated. Which means that – just like with email – you can set up a Matrix identity on any homeserver and then communicate with any other Matrix account on any other homeserver.

You do not need a phone number or email address to register, just a random username and a good strong password is enough.

End-to-end encryption

All direct – one-on-one – chats are encrypted end-to-end, and chat rooms can be configured the same way. This includes metadata, so not even the administrator can access your data*.

* Note that the IP address and time of your last connection are stored in the database for 28 days, so bad people accessing the database could get that particular bit of info.

I’m @40two:matrix.org

See you on the Matrix!