Monday, May 18, 2026

multi-file sort of NCSA combined format log files

problem: you want to grep and sort multiple NCSA combined format log files

For example, you have a directory with 30 log files for an apache2 vhost, some uncompressed and some compressed by logrotate. The months and years they cover might be arbitrary. Let's say you want to concatenate all the log entries, filter them with grep/awk on a specific keyword or value such as the year, and then sort the entries chronologically by their NCSA combined timestamp, which doesn't sort naturally as a single key/field...

The NCSA timestamp format: date +'[%d/%b/%Y:%H:%M:%S %z]' yields, for example, [18/May/2026:22:12:35 +0100] (in the C locale %b gives English month abbreviations).

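So 👆 this timestamp actually spans two whitespace-separated fields, [18/May/2026:22:12:35 and +0100], and its parts run day, month name, year, time. Neither a plain text sort nor a numeric sort of the field gives chronological order, so it needs sub-field (character position) sort keys.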
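A quick way to see the split is to hand a sample line to awk, which separates fields on runs of blanks by default. The line below is a made-up example in combined format, not taken from a real server:

```shell
#!/bin/sh
# Hypothetical sample line in NCSA combined format:
# host, identd, user, [timestamp offset], "request", status, bytes, "referer", "user agent"
line='203.0.113.7 - - [18/May/2026:22:12:35 +0100] "GET /file.php HTTP/1.1" 200 512 "-" "curl/8.5.0"'

# awk splits on blanks, so the bracketed timestamp lands in two fields
printf '%s\n' "$line" | awk '{ print "$4=" $4; print "$5=" $5 }'
# $4=[18/May/2026:22:12:35
# $5=+0100]
```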

impact: manual repetition required for each file

Without a multi-file pipeline, the operator will:

  1. have to check each file individually
  2. not have a single overview, e.g. a less buffer or a concatenated output file
  3. not have a single greppable text stream for further filtering/discovery

solution: zgrep pipeline

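  1. find ... -exec ... {} + hands the matching files to zgrep in as few invocations as possible, so a large file set never hits the "Argument list too long" limit that a shell glob can run into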
  2. zgrep -Fhi transparently reads both compressed and uncompressed files: -F searches for a fixed string (e.g. file.php), -h omits the filename prefixes and -i ignores case
  3. awk filters on one or more column values, e.g. the year in the timestamp field ($4)
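  4. LANG=C sort -s performs a stable sort (lines with identical keys keep their input order) in the C locale, where the M key expects English month abbreviations such as May. LC_ALL=C is the stricter choice if LC_ALL or LC_TIME is set in your environment, since either one overrides LANG.
    Multiple -k options define a cascading sort hierarchy (year, month, day, time, then UTC offset). Each -kF.C,F.C argument gives the start and end field.character positions plus a sort type (n numeric, M month name, none for plain text). Without -t, sort counts each field's leading blank as character 1, which is why the day in field 4 starts at position 3, after the blank and the [.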
  5. less -S provides a scrollable and searchable buffer for an overview and further discovery; -S chops long lines instead of wrapping them
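Items 1 and 2 can be tried in isolation on throwaway data. This is a minimal sketch, assuming GNU gzip's zgrep and GNU find; the scratch directory and the sample lines are hypothetical stand-ins for the real vhost logs:

```shell
#!/bin/sh
set -eu
# Scratch directory standing in for /var/log/apache2/sub.domain.tld (hypothetical data)
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Current, uncompressed log: one matching line, one non-matching line
printf '%s\n' \
  '203.0.113.7 - - [18/May/2026:22:12:35 +0100] "GET /file.php HTTP/1.1" 200 512 "-" "-"' \
  '203.0.113.8 - - [18/May/2026:22:13:01 +0100] "GET /index.html HTTP/1.1" 200 128 "-" "-"' \
  > "$dir/access.log"

# Rotated, gzip-compressed log; the upper-case path only matches because of -i
printf '%s\n' \
  '198.51.100.4 - - [02/Jan/2026:08:00:00 +0000] "GET /FILE.PHP HTTP/1.1" 200 64 "-" "-"' \
  | gzip > "$dir/access.log.2.gz"

# Same find/zgrep stage as the pipeline: both files are searched,
# and -h suppresses the "./access.log...:" prefixes
( cd "$dir" && find . -maxdepth 1 -name 'access.log*' -exec zgrep -Fhi file.php {} + )
```

The two matching lines come out in whatever order find visits the files, which is why the pipeline sorts afterwards.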
# subshell: avoids changing the current shell's working directory and provides a single text stream to awk
( cd /var/log/apache2/sub.domain.tld && \
find . -maxdepth 1 -name 'access.log*' -exec zgrep -Fhi file.php {} + ) \
| awk '$4 ~ /\/2026:/' \
| LANG=C sort -s -k4.10,4.13n -k4.6,4.8M -k4.3,4.4n -k4.15,4.22 -k5.3,5.7n \
| less -S
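The key positions in item 4 can be checked against a few sample lines. This sketch uses hypothetical, deliberately shuffled log lines spanning two years, skips the awk year filter so the year key has something to do, and uses LC_ALL=C rather than LANG=C so a locale set in the environment cannot change the month names. It first shows what each character range selects from field 4 (note the leading blank at position 1), then runs the same key cascade:

```shell
#!/bin/sh
# Hypothetical, deliberately shuffled sample lines spanning 2025 and 2026
sample() {
  printf '%s\n' \
    '203.0.113.7 - - [18/May/2026:22:12:35 +0100] "GET /file.php HTTP/1.1" 200 512 "-" "-"' \
    '203.0.113.8 - - [02/Jan/2026:08:00:00 +0000] "GET /file.php HTTP/1.1" 200 64 "-" "-"' \
    '203.0.113.9 - - [18/May/2026:09:05:00 +0100] "GET /file.php HTTP/1.1" 200 64 "-" "-"' \
    '203.0.113.10 - - [30/Dec/2025:23:59:59 +0000] "GET /file.php HTTP/1.1" 200 64 "-" "-"' \
    '203.0.113.11 - - [03/May/2026:12:00:00 +0100] "GET /file.php HTTP/1.1" 200 64 "-" "-"'
}

# sort's field 4 as seen without -t: its leading blank is character 1
f4=' [18/May/2026:22:12:35'
printf '%s\n' "$f4" | cut -c10-13   # 2026 (year, -k4.10,4.13n)
printf '%s\n' "$f4" | cut -c6-8     # May (month, -k4.6,4.8M)
printf '%s\n' "$f4" | cut -c3-4     # 18 (day, -k4.3,4.4n)
printf '%s\n' "$f4" | cut -c15-22   # 22:12:35 (time, -k4.15,4.22)

# Same key cascade as the pipeline; print only the timestamp field
sample | LC_ALL=C sort -s -k4.10,4.13n -k4.6,4.8M -k4.3,4.4n -k4.15,4.22 -k5.3,5.7n \
  | awk '{ print $4 }'
# [30/Dec/2025:23:59:59
# [02/Jan/2026:08:00:00
# [03/May/2026:12:00:00
# [18/May/2026:09:05:00
# [18/May/2026:22:12:35
```

Because wall-clock time is compared before the offset key, lines written under different UTC offsets (for example either side of a DST change) are ordered by local time rather than strictly by UTC instant. The last key also starts after the + or - sign, so it only orders offsets by their digits.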

This scenario reminded me of my 2010 post on Parsing NCSA combined log format - working with columns.

Tested with sort (GNU coreutils) 9.1.
