HAProxy Log Parsing & Fluent-bit

I am finally doing something with the Logs from my Anycast Reverse Proxy based on HAProxy. This epic started a few days ago and has led to other disoveries but in this we will cover just the HAProxy and Fluent…bits.

HAProxy Log Format

After looking at a custom log pattern and all sorts of other permutations, I have ended up simply using the HAProxy option httplog configuratoin. That spits out logs that look like this:

Mar  6 21:16:10 hap01.mgmt.etse.me haproxy[1222631]: 10.192.54.72:50501 [06/Mar/2025:21:16:10.686] https-in~ drop_pool/drop-1 0/0/1/45/46 207 887 - - ---- 3/3/0/0/0 0/0 "PROPFIND /remote.php/dav/files/jacko/ HTTP/1.1"

Great article HERE with more detail on HAProxy Logging config. Another good one HERE.

I am using then using the HAProxy configuration option log fluent01.sdn.etse.me:514 local0 info to ship these logs to Fluent-Bit which is the equivalent of logstash in the ELK stack, usually paired with OpenSearch.

Fluent Bit

Once you’ve followed the installation guide to get Fluent Bit up and running. I followed the instructions for Debian 12 to get it going.

Basic Configuration

To get up and running, you need can simply modify /etc/fluent-bit/fluent-bit.conf to comment out the default INPUT for cpu and update the default OUTPUT to use stdout (echo to the screen) and match * so it will output all logs received to stdout. Also add an @INCLUDE as below.

#[INPUT]
#    name cpu
#    tag  cpu.local

    # Read interval (sec) Default: 1
#    interval_sec 1

[OUTPUT]
    name    stdout
    #format json
    match   *

@INCLUDE haproxy.conf

Now we can create our /etc/fluent-bit/haproxy.conf in which we will tell Fluent Bit to listen for syslog on port 514:

[INPUT]
    Name     syslog
    Parser   haproxy
    Listen   0.0.0.0
    Port     514
    Mode     udp

Aaaand edit /etc/fluent-bit/paresers.conf to include the following stanza at the end of the file:

[PARSER]
    Name  haproxy
    Format  regex
    Regex  (?<day_time>\w+  \d+ \S+) (?<ps>\S+)\[(?<pid>\d+)\]: (?<c_ip>\S+):(?<c_port>\d+) \[(?<time>.+)\] (?<f_end>[\w-]+)(\~)?.(?<b_end>[\w-]*)\/(?<b_server><?[\w-]+>?) (?<tq>-?\d+)\/(?<tw>-?\d+)\/(?<tc>-?\d+)\/(?<tr>-?\d+)\/(?<tt>-?\d+) (?<status_code>\d+) (?<bytes>\d+) (?<req_cookie>\S+) (?<res_cookie>\S+) (?<t_state>[\w-]+) (?<actconn>-?[0-9])\/(?<feconn>-?\d+)\/(?<beconn>\d+)\/(?<srv_conn>\d+)\/(?<retries>\d+) (?<srv_queue>\d+)\/(?<backend_queue>\d+) (\"<?[\w]+>?\")*(\"((?<method>[\w]+) (?<path>[^#?\s]+)(\?)?(?<query>[^#?\s+]*).(?<version>[\w+\/\.]+))\")*
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S.%L
    Time_Keep   On
    Time_Strict  Off

Regular Expressions

The secret sauce here is the Regex line which is used to parse the log lines into the constituent fields. I couldn’t find a pre-canned Regex for httplog that worked for me. Maybe I was being dim. So I pulled this together from a few sources and it works well so far.

The pattern was developed using https://regex101.com using the PCRE (PHP < 7.3) parser and validated using the following patterns:

Mar  6 13:33:45 haproxy[1222631]: 192.168.254.103:51340 [06/Mar/2025:13:33:45.184] https-in~ gitea_pool/gitea-1 0/0/1/15/16 200 132 - - ---- 2/2/0/0/0 0/0 "POST /api/actions/runner.v1.RunnerService/FetchTask HTTP/1.1"
Mar  6 13:33:45 haproxy[1222631]: 192.168.254.103:51340 [06/Mar/2025:13:33:45.184] https-in~ gitea_pool/gitea-1 0/0/1/15/16 200 132 - - ---- 2/2/0/0/0 0/0 "POST /api/actions/runner.v1.RunnerService/FetchTask?john=doe&doe=did HTTP/1.1"
Mar  6 13:33:45 haproxy[1222631]: 192.168.254.103:51340 [06/Mar/2025:13:33:45.184] https-in~ gitea_pool/gitea-1 0/0/1/15/16 200 132 - - ---- 2/2/0/0/0 0/0 "POST / HTTP/1.1"
Mar  6 13:33:45 haproxy[1222631]: 192.168.254.103:51340 [06/Mar/2025:13:33:45.184] https-in~ gitea_pool/gitea-1 0/0/1/15/16 200 132 - - ---- 2/2/0/0/0 0/0 "POST /index.html HTTP/1.1"

Mar  7 20:00:03 haproxy[1247607]: 192.168.254.97:56102 [07/Mar/2025:20:00:03.422] https-in~ https-in/<NOSRV> -1/-1/-1/-1/0 400 0 - - PR-- 4/4/0/0/0 0/0 "<BADREQ>"
Mar  7 20:00:03 haproxy[1247607]: 192.168.254.97:56102 [07/Mar/2025:20:00:03.422] https-in~ https_in/gitea-1 1/1/1/1/0 400 0 - - P-- 4/4/0/0/0 0/0 "POST / HTTP1.1"
Mar  7 20:00:03 haproxy[1247607]: 192.168.254.97:56102 [07/Mar/2025:20:00:03.422] https-in~ https-in/<NOSRV> 1/1/1/1/0 400 0 - - P-- 4/4/0/0/0 0/0 "POST / HTTP1.1"
Mar  7 20:00:03 haproxy[1247607]: 192.168.254.97:56102 [07/Mar/2025:20:00:03.422] https-in~ https-in/<NOSRV> 1/1/1/1/0 400 0 - - P-- 4/4/0/0/0 0/0 "<BADREQ>"
Mar  7 20:00:03 haproxy[1247607]: 192.168.254.97:56102 [07/Mar/2025:20:00:03.422] https-in~ https-in/<NOSRV> -1/1/1/1/0 400 0 - - P-- 4/4/0/0/0 0/0 "<BADREQ>"

Mar  8 08:46:17 haproxy[1348846]: 43.157.95.239:43770 [08/Mar/2025:08:46:17.808] http-in http-in/<NOSRV> 0/-1/-1/-1/0 301 88 - - LR-- 5/1/0/0/0 0/0 "GET / HTTP/1.1"

Time Formatting

The other bit of secret sauce is the Time_Format %d/%b/%Y:%H:%M:%S.%L line which processes the [06/Mar/2025:21:16:10.686] field to a timestamp. I used the reference https://www.programiz.com/python-programming/datetime/strptime to help. It took an embarresingly long time to work out. Also the %L field I couldn’t find documented but I believe it to represent milliseconds.

Testing it out

The best way I found to debug Fluent Bit was to set the log_level debug in the /etc/fluent-bit/fluent-bit.conf and then to run fluent bit interactively using /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf:

root@fluent01:/etc/fluent-bit# /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf
Fluent Bit v3.2.8
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  _____
|  ___| |                | |   | ___ (_) |         |____ |/ __  \
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`' / /'
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \  / /
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /./ /___
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)_____/


[2025/03/08 14:12:37] [ info] Configuration:
[2025/03/08 14:12:37] [ info]  flush time     | 1.000000 seconds
[2025/03/08 14:12:37] [ info]  grace          | 5 seconds
[2025/03/08 14:12:37] [ info]  daemon         | 0
[2025/03/08 14:12:37] [ info] ___________
[2025/03/08 14:12:37] [ info]  inputs:
[2025/03/08 14:12:37] [ info]      syslog
[2025/03/08 14:12:37] [ info] ___________
[2025/03/08 14:12:37] [ info]  filters:
[2025/03/08 14:12:37] [ info] ___________
[2025/03/08 14:12:37] [ info]  outputs:
[2025/03/08 14:12:37] [ info]      stdout.0
[2025/03/08 14:12:37] [ info]      file.1
[2025/03/08 14:12:37] [ info] ___________
[2025/03/08 14:12:37] [ info]  collectors:
[2025/03/08 14:12:37] [ info] [fluent bit] version=3.2.8, commit=, pid=28311
[2025/03/08 14:12:37] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2025/03/08 14:12:37] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/03/08 14:12:37] [ info] [simd    ] disabled
[2025/03/08 14:12:37] [ info] [cmetrics] version=0.9.9
[2025/03/08 14:12:37] [ info] [ctraces ] version=0.6.0
[2025/03/08 14:12:37] [ info] [input:syslog:syslog.0] initializing
[2025/03/08 14:12:37] [ info] [input:syslog:syslog.0] storage_strategy='memory' (memory only)
[2025/03/08 14:12:37] [debug] [syslog:syslog.0] created event channels: read=25 write=26
[2025/03/08 14:12:37] [debug] [downstream] listening on 0.0.0.0:514
[2025/03/08 14:12:37] [ info] [in_syslog] UDP server binding 0.0.0.0:514
[2025/03/08 14:12:37] [debug] [stdout:stdout.0] created event channels: read=28 write=29
[2025/03/08 14:12:37] [debug] [file:file.1] created event channels: read=37 write=38
[2025/03/08 14:12:37] [ info] [output:stdout:stdout.0] worker #0 started
[2025/03/08 14:12:37] [debug] [router] match rule syslog.0:stdout.0
[2025/03/08 14:12:37] [ info] [output:file:file.1] worker #0 started
[2025/03/08 14:12:37] [debug] [router] match rule syslog.0:file.1
[2025/03/08 14:12:37] [ info] [sp] stream processor started
[2025/03/08 14:12:38] [debug] [task] created task=0x7a1d0be56560 id=0 OK
[2025/03/08 14:12:38] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] syslog.0: [[1741443158.030000000, {}], {"day_time"=>"Mar  8 14:12:38", "ps"=>"haproxy", "pid"=>"1359293", "c_ip"=>"192.168.254.103", "c_port"=>"48730", "time"=>"08/Mar/2025:14:12:38.030", "f_end"=>"https-in", "b_end"=>"gitea_pool", "b_server"=>"gitea-1", "tq"=>"0", "tw"=>"0", "tc"=>"0", "tr"=>"16", "tt"=>"16", "status_code"=>"200", "bytes"=>"132", "req_cookie"=>"-", "res_cookie"=>"-", "t_state"=>"----", "actconn"=>"4", "feconn"=>"4", "beconn"=>"1", "srv_conn"=>"1", "retries"=>"0", "srv_queue"=>"0", "backend_queue"=>"0", "method"=>"POST", "path"=>"/api/actions/runner.v1.RunnerService/FetchTask", "version"=>"HTTP/1.1"}]
[2025/03/08 14:12:38] [debug] [out flush] cb_destroy coro_id=0
[2025/03/08 14:12:38] [debug] [output:file:file.1] task_id=0 assigned to thread #0
[2025/03/08 14:12:38] [debug] [out flush] cb_destroy coro_id=0
[2025/03/08 14:12:38] [debug] [task] destroy task=0x7a1d0be56560 (task_id=0)

You can see on the line beginning [0] that we have a nicely parsed event log which has output our key and value pairs. Looking good.

Output to file

Whilst I am waiting to sit down and configure an OpenSearch instance to ship these logs to, I am going to output the logs to a file in json format. To do that, update your /etc/fluent-bit/fluent-bit.conf and stop it writing the logs to stdout:

#[OUTPUT]
#    name    stdout
#    #format json
#    match   *

While we are there, lets also switch log_level info so that we don’t spam the logs on the system.

Then create the /etselogs/haproxy folder and edit your /etc/fluent-bit/haproxy.conf to add:

[OUTPUT]
    Name   file
    Match  *
    Path   /etselogs/haproxy
    Format plain

Run Fluent-Bit with SystemD

If you haven’t already enabled the Fluent-Bit service to run on system start using systemctl enable fluent-bit and then start the service using systemctl start fluent-bit AND check that all is well using systemctl status fluent-bit.service:

root@fluent01:/etc/fluent-bit# systemct status fluent-bit.service
-bash: systemct: command not found
root@fluent01:/etc/fluent-bit# systemct status fluent-bi^C
root@fluent01:/etc/fluent-bit# systemctl status fluent-bit.service
* fluent-bit.service - Fluent Bit
     Loaded: loaded (/lib/systemd/system/fluent-bit.service; enabled; preset: enabled)
     Active: active (running) since Sat 2025-03-08 14:23:22 AWST; 30s ago
       Docs: https://docs.fluentbit.io/manual/
   Main PID: 28445 (fluent-bit)
      Tasks: 4 (limit: 76773)
     Memory: 3.0M
        CPU: 17ms
     CGroup: /system.slice/fluent-bit.service
             `-28445 /opt/fluent-bit/bin/fluent-bit -c //etc/fluent-bit/fluent-bit.conf

Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [fluent bit] version=3.2.8, commit=, pid=28445
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [simd    ] disabled
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [cmetrics] version=0.9.9
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [ctraces ] version=0.6.0
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [input:syslog:syslog.0] initializing
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [input:syslog:syslog.0] storage_strategy='memory' (memory only)
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [in_syslog] UDP server binding 0.0.0.0:514
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [sp] stream processor started
Mar 08 14:23:22 fluent01 fluent-bit[28445]: [2025/03/08 14:23:22] [ info] [output:file:file.0] worker #0 started

Great stuff. Now if you want to review your logs you can simply:

tail -f /etselogs/haproxy/syslog.0
{"day_time":"Mar  8 14:25:02","ps":"haproxy","pid":"1359293","c_ip":"192.168.254.103","c_port":"34456","time":"08/Mar/2025:14:25:02.031","f_end":"https-in","b_end":"gitea_pool","b_server":"gitea-1","tq":"0","tw":"0","tc":"0","tr":"15","tt":"15","status_code":"200","bytes":"132","req_cookie":"-","res_cookie":"-","t_state":"----","actconn":"4","feconn":"4","beconn":"1","srv_conn":"1","retries":"0","srv_queue":"0","backend_queue":"0","method":"POST","path":"/api/actions/runner.v1.RunnerService/FetchTask","version":"HTTP/1.1"}

or to view the last log as nicely formatted json tail -1 /etselogs/haproxy/syslog.0 | json_pp

{
   "actconn" : "4",
   "b_end" : "gitea_pool",
   "b_server" : "gitea-1",
   "backend_queue" : "0",
   "beconn" : "1",
   "bytes" : "132",
   "c_ip" : "192.168.254.103",
   "c_port" : "39422",
   "day_time" : "Mar  8 14:26:28",
   "f_end" : "https-in",
   "feconn" : "4",
   "method" : "POST",
   "path" : "/api/actions/runner.v1.RunnerService/FetchTask",
   "pid" : "1359293",
   "ps" : "haproxy",
   "req_cookie" : "-",
   "res_cookie" : "-",
   "retries" : "0",
   "srv_conn" : "1",
   "srv_queue" : "0",
   "status_code" : "200",
   "t_state" : "----",
   "tc" : "1",
   "time" : "08/Mar/2025:14:26:28.029",
   "tq" : "0",
   "tr" : "37",
   "tt" : "38",
   "tw" : "0",
   "version" : "HTTP/1.1"
}

To do

  • There must be a way of rotating the log file.
  • Export these logs to OpenSearch

© 2021. All rights reserved.

Powered by Hydejack v9.1.6