Over the past year I ran a Graylog server collecting any type of data I could find, to get some hands-on experience with threat detection, dashboarding, messaging and the correlation of data from multiple sources. Graylog proved stable, and with some tweaking I found the sweet spot between ElasticSearch eating all resources and crashing for lack of them.
It taught me a lot but at some point you stop finding new stuff. Time for some fresh data.
I am an avid Presearch node operator. The Node API gives me fresh metrics on all nodes at 5-minute intervals, and these metrics combine into rewards using documented tokenomics. Feels like a project.
About those Presearch nodes:
Do you support decentralization and an open internet that isn’t dominated by a handful of Big Tech companies? Now you can be part of the solution by operating a Presearch Node and helping to power the Presearch decentralized search engine. Presearch Nodes are used to process user search requests, and node operators earn Presearch PRE tokens for joining and supporting the network. Source: presearch.io
Dashboarding Presearch turned out to be very useful. I learned at what moments my nodes switch gateways, to what extent node latency or stake influences earnings and especially what nodes are just too darn expensive for their overall performance (looking at you, Azure).
It also helped me make informed decisions on how to spread my stake, what resources impact node performance and what all that translates to in Cold Hard $Tokens.
I will not share any of those findings here, because (a) Presearch nodes still have to go main-net and then everything will change, and (b) because Do Your Own (P)Research.
Pro-tip: anyone telling you how to make money on the Internet should be taken with a grain of salt. Most people that know how to make money on the Internet are doing just that, not writing blog posts.
No dashboards, no magic queries. Also, nothing on how to get your VPS, Docker engine and Graylog/ElasticSearch stack running. Or how to keep the latter from eating all your RAM again and again and again.
Instead, I describe how I am sending the log-messages and API statistics of my Presearch nodes to Graylog, saving you hours and hours of time (p)researching.
Do Your Own Prerequisites
You will need to get ready:
- A Graylog instance that is accessible from your nodes, preferably behind a proxy such as Traefik.
- A working Presearch Dashboard, Node Registration Code and Node API key.
- Some staked nodes.
Oh, and nothing in your setup will be exactly like mine. You might get an error here and there.
Your nodes are earning $PRE every 5 minutes, take as long as you need 😉
Docker Logs & API Metrics
One of the reasons for picking Graylog is that I can correlate log events with time-based metrics. Thus, of each node, I collect the node logs (aka the Docker logs of each node) and the node performance metrics (from the Node-API).
You will need to set up Graylog inputs. Log collection is done through a single GELF input on the Graylog server. I use Docker’s native support for logging in this Graylog Extended Log Format over UDP to send the logs across the public internet. For the performance data off the Node API, we need a separate JSON API input (and extractor) for each node.
You will also need to teach Graylog to handle scientific notation, because that’s what some of the Presearch Node API’s metrics are sent in.
First though, some administration.
A number of variables are used in the commands below; depending on your platform you can set them as environment variables or simply note them down in notepad to paste them when needed.
|$NODE-NAME|The name you give to your node. To correlate logs with metrics, this should be the same as the node’s host name.|
|$NODE-REGISTRATION-CODE|The registration code you can find in your Presearch Node Dashboard.|
|$API-KEY|The API key found in the Presearch Node Dashboard.|
|$URL-ENCODED-PUBLIC-KEY|The URL-encoded Public Key of the node; see below.|
|$GRAYLOG-SERVER-IP|The public IP address of your central Graylog server.|
Use your Presearch Node Dashboard (and the tool below) to get the data you need: the Node Registration Code and Node API Key from your main dashboard, and the Public Key behind the “Stats” button next to your node.
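If you go the environment-variable route, note that shell variable names cannot contain hyphens, so the placeholders above need underscores when used in a real shell. The sketch below assumes bash; the underscore names, the example values and the `require_vars` helper are my own illustration, not something Presearch or Graylog prescribes.

```shell
#!/usr/bin/env bash
# Placeholder values: replace each one with your own before use.
export NODE_NAME="my-node-01"
export NODE_REGISTRATION_CODE="replace-me"
export API_KEY="replace-me"
export URL_ENCODED_PUBLIC_KEY="replace-me"
export GRAYLOG_SERVER_IP="203.0.113.10"   # documentation-only example address

# Hypothetical helper: returns non-zero (and names the culprit on stderr)
# if any of the listed variables is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_vars NODE_NAME NODE_REGISTRATION_CODE API_KEY \
  URL_ENCODED_PUBLIC_KEY GRAYLOG_SERVER_IP
```

Running the check before any of the later commands catches an empty variable before it ends up in a Docker command line.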
URL encoded Node Public Key
To fetch data from the Presearch Node API for just one particular node, you need to include a URL-encoded version of your Node Public Key.
To get this from your browser:
- Open CyberChef Toolbox with the URL-Encode recipe: https://tools.nielsemmer.com/#recipe=URL_Encode(true)
- Make sure to check the option to encode all special characters
- Copy the Public Key for your node (found under “Stats” for that node in the Presearch Node Dashboard)
- Paste your node’s Public Key in the Input box
- The URL encoded key can now be copied from the Output box
This Output is what you will need as the $URL-ENCODED-PUBLIC-KEY variable.
You can test whether you converted the Public Key properly by visiting the Node API URL for your node, with $URL-ENCODED-PUBLIC-KEY (and any other placeholders) replaced by the appropriate values; you should receive a JSON response for just your one node.
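As an alternative to the browser tool, the same encoding can be done in the shell. This is a minimal sketch assuming bash and an ASCII key; `urlencode` is a hypothetical helper of my own that percent-encodes every character except letters, digits and `- . _ ~`, which should correspond to the "encode all special characters" option, though I have not compared the two byte for byte.

```shell
#!/usr/bin/env bash
# Percent-encode everything except RFC 3986 unreserved characters
# (A-Z a-z 0-9 - . _ ~). Intended for ASCII input such as a public key.
urlencode() {
  local LC_ALL=C                     # handle the input byte by byte
  local input="$1" out="" c hex i
  for (( i = 0; i < ${#input}; i++ )); do
    c="${input:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;  # safe character: copy as-is
      *) printf -v hex '%%%02X' "'$c" # anything else becomes %XX
         out+="$hex" ;;
    esac
  done
  printf '%s\n' "$out"
}
```

With the key pasted into a file (say `public_key.txt`, a name picked for this example), `urlencode "$(cat public_key.txt)"` prints the encoded form.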
Graylog server preparation
The Graylog server needs some preparation to properly parse node data. This is a one-time thing; additional nodes can skip this step.
First we need to define a special Grok pattern to be able to parse some of the values returned by the Presearch Node API, and then we set up a GELF input for all the nodes to send their log messages.
Set up Grok pattern for scientific notation
The Node API returns some values in scientific notation (SCI), which Grok does not extract by default. We need to add a new Grok pattern:
graylog menu >
System / Input >
Grok Patterns >
This pattern is later referred to in the Grok extractor for each node.
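Since a Grok pattern is a named regular expression, it can help to try a candidate regex against sample values before pasting it into Graylog. The expression below is an assumption of mine for what "a number in scientific notation" looks like, not necessarily the pattern used in this setup; the `^` and `$` anchors are there for the test only and would normally be dropped inside a Grok pattern.

```shell
#!/usr/bin/env bash
# Hypothetical regex for values such as 1.5e-07 or 3E+10:
# optional sign, digits, optional fraction, e/E, optional sign, digits.
SCI_REGEX='^[+-]?[0-9]+(\.[0-9]+)?[eE][+-]?[0-9]+$'

# Succeeds if the argument is written in scientific notation.
is_sci() {
  printf '%s\n' "$1" | grep -Eq "$SCI_REGEX"
}

# Try a few sample values, including two that should not match.
for value in 1.5e-07 3E+10 0.25 abc; do
  if is_sci "$value"; then
    echo "$value: scientific notation"
  else
    echo "$value: no match"
  fi
done
```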
Set up GELF input on the Graylog server
Node logs are easy to capture since Docker natively supports sending container logs to a GELF (Graylog Extended Log Format) receiver anywhere on the internet. Thus, we will set up a UDP GELF input on our Graylog instance.
Disclaimer: Smart people secure this with TLS but since this is a lot of work and worst case someone RickRolls my dashboard, I am using an insecure UDP input on my instance. Your mileage may vary. Not Technical Advice.
Using a single Graylog input for all nodes means the message source (the host name of the Docker host) will be how we know which node sent the update.
And since we want to later correlate the node’s logs and metrics, it is best to set the $NODE-NAME variable for the API input to the node’s host name, so it will be reported as source for that input.
To set up the GELF input, on your Graylog console:
graylog menu >
System / Input >
Select Input (drop-down) >
GELF UDP
|Receive Buffer Size|
With the GELF UDP input running, the server is ready to receive node data.
You can test the GELF input from the Graylog host itself (assuming Graylog runs on Docker) by sending ‘hello world’ to your Graylog stream. Since the input listens on UDP, the address must use udp:// as well:
docker run --log-driver=gelf --log-opt gelf-address=udp://localhost:12201 alpine echo hello world
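If you want to test from a machine without Docker, a hand-made GELF message sent with netcat should also show up in the stream. This is a rough sketch assuming bash and an `nc` binary that supports `-u` and `-w`; both function names are hypothetical, and the payload only carries the minimal GELF 1.1 fields (version, host, short_message).

```shell
#!/usr/bin/env bash
# Build a minimal GELF 1.1 JSON message.
# Only backslashes and double quotes in the text are escaped.
gelf_payload() {
  local host="$1" msg="$2"
  msg=${msg//\\/\\\\}
  msg=${msg//\"/\\\"}
  printf '{"version":"1.1","host":"%s","short_message":"%s"}' "$host" "$msg"
}

# Send one message over UDP: send_gelf_udp <server> <port> <text>
send_gelf_udp() {
  gelf_payload "$(hostname)" "$3" | nc -u -w1 "$1" "$2"
}
```

For example, `send_gelf_udp localhost 12201 "hello world"` on the Graylog host. UDP gives no delivery feedback, so the only confirmation is the message appearing in Graylog.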
For each node:
For the first node and every node after it, you first need to restart your presearch-node container to connect its logs, and then create an input to talk to the Node API.
Thus, the below steps are to be followed for each individual node.
Start node with GELF logging over UDP
If the Presearch node is running, first stop and remove it. The keys will be left untouched so it will return with the same identity.
sudo docker stop presearch-node && sudo docker rm presearch-node
Then, spin up a new Presearch node with logging directed at the Graylog server:
sudo docker run -dt --name presearch-node --restart=unless-stopped -v presearch-node-storage:/app/node -e REGISTRATION_CODE="$NODE-REGISTRATION-CODE" --log-driver gelf --log-opt gelf-address=udp://$GRAYLOG-SERVER-IP:12201 presearch/node
If all works as intended, the log messages for the new node should arrive in your Graylog event stream.
Take note of the SOURCE listed, as this should be used as $NODE-NAME moving forward to collate the logs with the appropriate node statistics.
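If nothing arrives, one quick check is whether the container was really started with the GELF driver. The sketch below wraps `docker inspect` in a small bash function; the function name is hypothetical, and the container name passed to it is the one used in the commands above.

```shell
#!/usr/bin/env bash
# Succeeds if the named container is configured with the gelf log driver.
uses_gelf_driver() {
  local driver
  # Ask Docker which log driver the container was created with.
  driver=$(docker inspect --format '{{.HostConfig.LogConfig.Type}}' "$1") || return 1
  [ "$driver" = "gelf" ]
}
```

Usage: `uses_gelf_driver presearch-node && echo "GELF logging is active"`. Depending on your setup you may need `sudo` for the Docker call, as in the commands above.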
Add a Node-API input for the node
On your Graylog console:
Graylog server >
System / Inputs >
Select input (drop-down) >
JSON path from HTTP API
Then use the values below, replacing the variables (such as $URL-ENCODED-PUBLIC-KEY) with the ones generated for this specific node:
|URL of JSON resource|
|Interval time unit|
|JSON path of data to extract|
Save and start the input. It should immediately start receiving messages. Once the first message is in, an extractor can be built to get the metrics needed.
Add extractor from input
Graylog server >
System / Inputs >
Manage extractors (next to newly created input)
In the Extractor screen for the input, under Add extractor, click Get started and then Load Message. If loading the message fails, the input is not running properly.
If you do see a message, click Select extractor type next to the result field, and pick Grok pattern.
If this is a node that has been up for more than a few hours, clicking test will parse all the needed fields.
If testing fails, this is because the node is new or not functioning properly, and one of the values is presented as NULL which confuses Grok.
Save the extractor.
Log messages should now be coming in from the API; clicking an entry expands it to see all the variables extracted for your dashboarding pleasure.
Watch the main Graylog stream to see the data come in from the nodes.
If all works, you can start building your dashboards…