HAProxy Log Parsing & Fluent-bit
I am finally doing something with the logs from my Anycast reverse proxy based on HAProxy. This epic started a few days ago and has led to other discoveries, but in this post we will cover just the HAProxy and Fluent… bits.
HAProxy Log Format
After looking at a custom log pattern and all sorts of other permutations, I have ended up simply using the HAProxy option httplog configuration. That spits out logs that look like this:
Mar 6 21:16:10 hap01.mgmt.etse.me haproxy[1222631]: 10.192.54.72:50501 [06/Mar/2025:21:16:10.686] https-in~ drop_pool/drop-1 0/0/1/45/46 207 887 - - ---- 3/3/0/0/0 0/0 "PROPFIND /remote.php/dav/files/jacko/ HTTP/1.1"
Great article HERE with more detail on HAProxy Logging config. Another good one HERE.
I am then using the HAProxy configuration option log fluent01.sdn.etse.me:514 local0 info
to ship these logs to Fluent Bit, which is the equivalent of Logstash in the ELK stack and is usually paired with OpenSearch.
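For context, the relevant pieces of a haproxy.cfg would look something like this. This is a minimal sketch rather than my full config; only the log target and option httplog lines are the point here, and where exactly they live (global/defaults vs. a specific frontend) will depend on your setup:
global
    log fluent01.sdn.etse.me:514 local0 info   # ship local0 messages up to "info" to Fluent Bit

defaults
    log global        # frontends and backends inherit the global log target
    mode http
    option httplog    # the standard HTTP log format parsed below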
Fluent Bit
First, follow the installation guide to get Fluent Bit up and running. I followed the instructions for Debian 12.
Basic Configuration
To get up and running, you can simply modify /etc/fluent-bit/fluent-bit.conf
to comment out the default INPUT for cpu and update the default OUTPUT to use stdout (echo to the screen) with match * so that it prints every log record it receives. Also add an @INCLUDE as below.
#[INPUT]
# name cpu
# tag cpu.local
# Read interval (sec) Default: 1
# interval_sec 1
[OUTPUT]
name stdout
#format json
match *
@INCLUDE haproxy.conf
Now we can create our /etc/fluent-bit/haproxy.conf
in which we will tell Fluent Bit to listen for syslog on port 514:
[INPUT]
Name syslog
Parser haproxy
Listen 0.0.0.0
Port 514
Mode udp
Aaaand edit /etc/fluent-bit/parsers.conf
to include the following stanza at the end of the file:
[PARSER]
Name haproxy
Format regex
Regex (?<day_time>\w+ \d+ \S+) (?<ps>\S+)\[(?<pid>\d+)\]: (?<c_ip>\S+):(?<c_port>\d+) \[(?<time>.+)\] (?<f_end>[\w-]+)(\~)?.(?<b_end>[\w-]*)\/(?<b_server><?[\w-]+>?) (?<tq>-?\d+)\/(?<tw>-?\d+)\/(?<tc>-?\d+)\/(?<tr>-?\d+)\/(?<tt>-?\d+) (?<status_code>\d+) (?<bytes>\d+) (?<req_cookie>\S+) (?<res_cookie>\S+) (?<t_state>[\w-]+) (?<actconn>-?[0-9])\/(?<feconn>-?\d+)\/(?<beconn>\d+)\/(?<srv_conn>\d+)\/(?<retries>\d+) (?<srv_queue>\d+)\/(?<backend_queue>\d+) (\"<?[\w]+>?\")*(\"((?<method>[\w]+) (?<path>[^#?\s]+)(\?)?(?<query>[^#?\s+]*).(?<version>[\w+\/\.]+))\")*
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S.%L
Time_Keep On
Time_Strict Off
Regular Expressions
The secret sauce here is the Regex
line, which parses each log line into its constituent fields. I couldn’t find a pre-canned regex for httplog that worked for me. Maybe I was being dim. So I pulled this together from a few sources and it has worked well so far.
The pattern was developed on https://regex101.com with the PCRE (PHP < 7.3) flavour and validated against the following sample log lines (a quick local check with grep is sketched after the samples):
Mar 6 13:33:45 haproxy[1222631]: 192.168.254.103:51340 [06/Mar/2025:13:33:45.184] https-in~ gitea_pool/gitea-1 0/0/1/15/16 200 132 - - ---- 2/2/0/0/0 0/0 "POST /api/actions/runner.v1.RunnerService/FetchTask HTTP/1.1"
Mar 6 13:33:45 haproxy[1222631]: 192.168.254.103:51340 [06/Mar/2025:13:33:45.184] https-in~ gitea_pool/gitea-1 0/0/1/15/16 200 132 - - ---- 2/2/0/0/0 0/0 "POST /api/actions/runner.v1.RunnerService/FetchTask?john=doe&doe=did HTTP/1.1"
Mar 6 13:33:45 haproxy[1222631]: 192.168.254.103:51340 [06/Mar/2025:13:33:45.184] https-in~ gitea_pool/gitea-1 0/0/1/15/16 200 132 - - ---- 2/2/0/0/0 0/0 "POST / HTTP/1.1"
Mar 6 13:33:45 haproxy[1222631]: 192.168.254.103:51340 [06/Mar/2025:13:33:45.184] https-in~ gitea_pool/gitea-1 0/0/1/15/16 200 132 - - ---- 2/2/0/0/0 0/0 "POST /index.html HTTP/1.1"
Mar 7 20:00:03 haproxy[1247607]: 192.168.254.97:56102 [07/Mar/2025:20:00:03.422] https-in~ https-in/<NOSRV> -1/-1/-1/-1/0 400 0 - - PR-- 4/4/0/0/0 0/0 "<BADREQ>"
Mar 7 20:00:03 haproxy[1247607]: 192.168.254.97:56102 [07/Mar/2025:20:00:03.422] https-in~ https_in/gitea-1 1/1/1/1/0 400 0 - - P-- 4/4/0/0/0 0/0 "POST / HTTP1.1"
Mar 7 20:00:03 haproxy[1247607]: 192.168.254.97:56102 [07/Mar/2025:20:00:03.422] https-in~ https-in/<NOSRV> 1/1/1/1/0 400 0 - - P-- 4/4/0/0/0 0/0 "POST / HTTP1.1"
Mar 7 20:00:03 haproxy[1247607]: 192.168.254.97:56102 [07/Mar/2025:20:00:03.422] https-in~ https-in/<NOSRV> 1/1/1/1/0 400 0 - - P-- 4/4/0/0/0 0/0 "<BADREQ>"
Mar 7 20:00:03 haproxy[1247607]: 192.168.254.97:56102 [07/Mar/2025:20:00:03.422] https-in~ https-in/<NOSRV> -1/1/1/1/0 400 0 - - P-- 4/4/0/0/0 0/0 "<BADREQ>"
Mar 8 08:46:17 haproxy[1348846]: 43.157.95.239:43770 [08/Mar/2025:08:46:17.808] http-in http-in/<NOSRV> 0/-1/-1/-1/0 301 88 - - LR-- 5/1/0/0/0 0/0 "GET / HTTP/1.1"
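If you want to sanity-check the pattern outside of regex101, one rough way is GNU grep in PCRE mode, since it understands the same named-group syntax. This is only a sketch: haproxy.pcre and samples.log are hypothetical files holding the regex (a single line, copied verbatim) and the sample lines above.
grep -cP -f haproxy.pcre samples.log   # count of sample lines the pattern matches
grep -oP -f haproxy.pcre samples.log   # show exactly what each line matched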
Time Formatting
The other bit of secret sauce is the Time_Format %d/%b/%Y:%H:%M:%S.%L
line, which converts the [06/Mar/2025:21:16:10.686]
field into a timestamp. I used the reference https://www.programiz.com/python-programming/datetime/strptime to help; it took an embarrassingly long time to work out. I couldn’t find the %L field documented, but I believe it represents milliseconds.
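For reference, this is how the format tokens line up against that field:
06 / Mar / 2025 : 21 : 16 : 10 . 686
%d / %b  / %Y   : %H : %M : %S . %L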
Testing it out
The best way I found to debug Fluent Bit was to set log_level debug
in /etc/fluent-bit/fluent-bit.conf
and then run Fluent Bit interactively using /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf:
root@fluent01:/etc/fluent-bit# /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf
Fluent Bit v3.2.8
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
______ _ _ ______ _ _ _____ _____
| ___| | | | | ___ (_) | |____ |/ __ \
| |_ | |_ _ ___ _ __ | |_ | |_/ /_| |_ __ __ / /`' / /'
| _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / \ \ / /
| | | | |_| | __/ | | | |_ | |_/ / | |_ \ V /.___/ /./ /___
\_| |_|\__,_|\___|_| |_|\__| \____/|_|\__| \_/ \____(_)_____/
[2025/03/08 14:12:37] [ info] Configuration:
[2025/03/08 14:12:37] [ info] flush time | 1.000000 seconds
[2025/03/08 14:12:37] [ info] grace | 5 seconds
[2025/03/08 14:12:37] [ info] daemon | 0
[2025/03/08 14:12:37] [ info] ___________
[2025/03/08 14:12:37] [ info] inputs:
[2025/03/08 14:12:37] [ info] syslog
[2025/03/08 14:12:37] [ info] ___________
[2025/03/08 14:12:37] [ info] filters:
[2025/03/08 14:12:37] [ info] ___________
[2025/03/08 14:12:37] [ info] outputs:
[2025/03/08 14:12:37] [ info] stdout.0
[2025/03/08 14:12:37] [ info] file.1
[2025/03/08 14:12:37] [ info] ___________
[2025/03/08 14:12:37] [ info] collectors:
[2025/03/08 14:12:37] [ info] [fluent bit] version=3.2.8, commit=, pid=28311
[2025/03/08 14:12:37] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2025/03/08 14:12:37] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/03/08 14:12:37] [ info] [simd ] disabled
[2025/03/08 14:12:37] [ info] [cmetrics] version=0.9.9
[2025/03/08 14:12:37] [ info] [ctraces ] version=0.6.0
[2025/03/08 14:12:37] [ info] [input:syslog:syslog.0] initializing
[2025/03/08 14:12:37] [ info] [input:syslog:syslog.0] storage_strategy='memory' (memory only)
[2025/03/08 14:12:37] [debug] [syslog:syslog.0] created event channels: read=25 write=26
[2025/03/08 14:12:37] [debug] [downstream] listening on 0.0.0.0:514
[2025/03/08 14:12:37] [ info] [in_syslog] UDP server binding 0.0.0.0:514
[2025/03/08 14:12:37] [debug] [stdout:stdout.0] created event channels: read=28 write=29
[2025/03/08 14:12:37] [debug] [file:file.1] created event channels: read=37 write=38
[2025/03/08 14:12:37] [ info] [output:stdout:stdout.0] worker #0 started
[2025/03/08 14:12:37] [debug] [router] match rule syslog.0:stdout.0
[2025/03/08 14:12:37] [ info] [output:file:file.1] worker #0 started
[2025/03/08 14:12:37] [debug] [router] match rule syslog.0:file.1
[2025/03/08 14:12:37] [ info] [sp] stream processor started
[2025/03/08 14:12:38] [debug] [task] created task=0x7a1d0be56560 id=0 OK
[2025/03/08 14:12:38] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] syslog.0: [[1741443158.030000000, {}], {"day_time"=>"Mar 8 14:12:38", "ps"=>"haproxy", "pid"=>"1359293", "c_ip"=>"192.168.254.103", "c_port"=>"48730", "time"=>"08/Mar/2025:14:12:38.030", "f_end"=>"https-in", "b_end"=>"gitea_pool", "b_server"=>"gitea-1", "tq"=>"0", "tw"=>"0", "tc"=>"0", "tr"=>"16", "tt"=>"16", "status_code"=>"200", "bytes"=>"132", "req_cookie"=>"-", "res_cookie"=>"-", "t_state"=>"----", "actconn"=>"4", "feconn"=>"4", "beconn"=>"1", "srv_conn"=>"1", "retries"=>"0", "srv_queue"=>"0", "backend_queue"=>"0", "method"=>"POST", "path"=>"/api/actions/runner.v1.RunnerService/FetchTask", "version"=>"HTTP/1.1"}]
[2025/03/08 14:12:38] [debug] [out flush] cb_destroy coro_id=0
[2025/03/08 14:12:38] [debug] [output:file:file.1] task_id=0 assigned to thread #0
[2025/03/08 14:12:38] [debug] [out flush] cb_destroy coro_id=0
[2025/03/08 14:12:38] [debug] [task] destroy task=0x7a1d0be56560 (task_id=0)
You can see on the line beginning [0]
that we have a nicely parsed event containing our key/value pairs. Looking good.
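If you don’t want to wait for real traffic to hit HAProxy, you can fire a synthetic datagram at the listener yourself. This is a sketch assuming netcat (nc) is installed on a host that can reach fluent01.sdn.etse.me; the payload is the "POST /" sample line from earlier, and <134> is the local0.info syslog priority that HAProxy would set. You should see a parsed record appear on stdout:
printf '<134>%s\n' 'Mar 6 13:33:45 haproxy[1222631]: 192.168.254.103:51340 [06/Mar/2025:13:33:45.184] https-in~ gitea_pool/gitea-1 0/0/1/15/16 200 132 - - ---- 2/2/0/0/0 0/0 "POST / HTTP/1.1"' \
  | nc -u -w1 fluent01.sdn.etse.me 514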
Output to file
Whilst I am waiting to sit down and configure an OpenSearch instance to ship these logs to, I am going to output the logs to a file in json format. To do that, update your /etc/fluent-bit/fluent-bit.conf
and stop it writing the logs to stdout:
#[OUTPUT]
# name stdout
# #format json
# match *
While we are there, let’s also switch log_level back to info
so that we don’t spam the logs on the system.
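In practice that just means flipping the value in the [SERVICE] section of /etc/fluent-bit/fluent-bit.conf, something like this (the rest of the stock section is left as shipped):
[SERVICE]
    # ... other settings as shipped ...
    log_level info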
Then create the /etselogs/haproxy
folder and edit your /etc/fluent-bit/haproxy.conf
to add:
[OUTPUT]
Name file
Match *
Path /etselogs/haproxy
Format plain
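Creating the folder itself is just:
mkdir -p /etselogs/haproxy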
Run Fluent-Bit with SystemD
If you haven’t already, enable the Fluent Bit service to run on system start using systemctl enable fluent-bit, start the service using systemctl start fluent-bit, AND check that all is well using systemctl status fluent-bit.service:
root@fluent01:/etc/fluent-bit# systemctl status fluent-bit.service
* fluent-bit.service - Fluent Bit
Loaded: loaded (/lib/systemd/system/fluent-bit.service; enabled; preset: enabled)
Active: active (running) since Sat 2025-03-08 14:23:22 AWST; 30s ago
Docs: https://docs.fluentbit.io/manual/
Main PID: 28445 (fluent-bit)
Tasks: 4 (limit: 76773)
Memory: 3.0M
CPU: 17ms
CGroup: /system.slice/fluent-bit.service
`-28445 /opt/fluent-bit/bin/fluent-bit -c //etc/fluent-bit/fluent-bit.conf
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [fluent bit] version=3.2.8, commit=, pid=28445
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [simd ] disabled
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [cmetrics] version=0.9.9
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [ctraces ] version=0.6.0
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [input:syslog:syslog.0] initializing
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [input:syslog:syslog.0] storage_strategy='memory' (memory only)
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [in_syslog] UDP server binding 0.0.0.0:514
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [sp] stream processor started
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [output:file:file.0] worker #0 started
Great stuff. Now if you want to review your logs you can simply:
tail -f /etselogs/haproxy/syslog.0
{"day_time":"Mar 8 14:25:02","ps":"haproxy","pid":"1359293","c_ip":"192.168.254.103","c_port":"34456","time":"08/Mar/2025:14:25:02.031","f_end":"https-in","b_end":"gitea_pool","b_server":"gitea-1","tq":"0","tw":"0","tc":"0","tr":"15","tt":"15","status_code":"200","bytes":"132","req_cookie":"-","res_cookie":"-","t_state":"----","actconn":"4","feconn":"4","beconn":"1","srv_conn":"1","retries":"0","srv_queue":"0","backend_queue":"0","method":"POST","path":"/api/actions/runner.v1.RunnerService/FetchTask","version":"HTTP/1.1"}
or view the last entry as nicely formatted JSON with tail -1 /etselogs/haproxy/syslog.0 | json_pp:
{
"actconn" : "4",
"b_end" : "gitea_pool",
"b_server" : "gitea-1",
"backend_queue" : "0",
"beconn" : "1",
"bytes" : "132",
"c_ip" : "192.168.254.103",
"c_port" : "39422",
"day_time" : "Mar 8 14:26:28",
"f_end" : "https-in",
"feconn" : "4",
"method" : "POST",
"path" : "/api/actions/runner.v1.RunnerService/FetchTask",
"pid" : "1359293",
"ps" : "haproxy",
"req_cookie" : "-",
"res_cookie" : "-",
"retries" : "0",
"srv_conn" : "1",
"srv_queue" : "0",
"status_code" : "200",
"t_state" : "----",
"tc" : "1",
"time" : "08/Mar/2025:14:26:28.029",
"tq" : "0",
"tr" : "37",
"tt" : "38",
"tw" : "0",
"version" : "HTTP/1.1"
}
To do
- There must be a way of rotating the log file.
- Export these logs to OpenSearch.