problem: you want to grep and sort multiple NCSA combined format log files
For example, you have a directory with 30 log files for an apache2 vhost, some uncompressed, some compressed (logrotated). Months and years might be arbitrary. Let's say you want to concatenate all the log entries, grep/awk for a specific keyword such as a year, and then sort the entries by the NCSA combined timestamp, which doesn't naturally sort as a single key/field.
The NCSA timestamp format: date +'[%d/%b/%Y:%H:%M:%S %z]' yields: [18/May/2026:22:12:35 +0100]
So 👆 this timestamp format actually spans two whitespace-separated fields and requires specific sub-field sorting.
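To make the split visible, here is a minimal sketch using a made-up log entry (the IP address, request and user agent are illustrative, not from a real log). With awk's default whitespace splitting, the date and time land in field 4 and the UTC offset in field 5.

```shell
# Hypothetical sample entry in NCSA combined format
line='203.0.113.7 - - [18/May/2026:22:12:35 +0100] "GET /file.php HTTP/1.1" 200 512 "-" "curl/8.0"'

# awk splits on runs of blanks, so the bracketed timestamp ends up in two fields
printf '%s\n' "$line" | awk '{ print "field 4: " $4; print "field 5: " $5 }'
# field 4: [18/May/2026:22:12:35
# field 5: +0100]
```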
impact: manual repetition required for each file
Without a multi-file pipeline, the operator will:
- have to check each file individually
- not have a single overview, e.g. a less buffer or concatenated output file
- not have a single grepable text stream for further filtering/discovery
solution: zgrep pipeline
- find ... -exec ... {} + efficiently avoids any glob argument limits
- zgrep -Fhi automatically handles both (un)compressed files, omitting filenames, ignoring case and searching for a fixed string, e.g. file.php
- awk filters based on one or more column values, e.g. timestamp year
- LANG=C sort -s performs stable sorting using the C locale. Multiple -k options define a cascading sort hierarchy, where each option's argument specifies the sub-field position and sort type (numeric/month/etc).
- The less -S pager provides a scrollable and searchable buffer for overview and further discovery
# subshell: avoids changing the shell's cwd and provides a single text stream to awk
( cd /var/log/apache2/sub.domain.tld && \
find . -maxdepth 1 -name 'access.log*' -exec zgrep -Fhi file.php {} + ) \
| awk '$4 ~ /\/2026:/' \
| LANG=C sort -s -k4.10,4.13n -k4.6,4.8M -k4.3,4.4n -k4.15,4.22 -k5.3,5.7n \
| less -S
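The character offsets in the -k options can look off by one at first glance. As a rough sketch of why they work: without -b or -t, sort treats the blank before each field as part of that field, so in field 4 character 1 is the space, character 2 is the "[" and the day starts at character 3. The three entries below are made up for illustration, and the sketch uses LC_ALL=C instead of LANG=C (my assumption, because LC_ALL takes precedence if it happens to be set in the environment).

```shell
# Three made-up entries, deliberately out of chronological order
sample() {
  printf '%s\n' \
    '203.0.113.7 - - [02/Jun/2026:08:00:00 +0100] "GET /file.php HTTP/1.1" 200 1 "-" "-"' \
    '203.0.113.7 - - [18/May/2026:22:12:35 +0100] "GET /file.php HTTP/1.1" 200 1 "-" "-"' \
    '203.0.113.7 - - [18/May/2025:23:59:59 +0100] "GET /file.php HTTP/1.1" 200 1 "-" "-"'
}

# Same key list as the pipeline: year, month, day, time, UTC offset digits.
# Field 4 with its leading blank is " [18/May/2026:22:12:35", hence:
#   4.3-4.4 = day, 4.6-4.8 = month, 4.10-4.13 = year, 4.15-4.22 = time
sorted=$(sample | LC_ALL=C sort -s -k4.10,4.13n -k4.6,4.8M -k4.3,4.4n -k4.15,4.22 -k5.3,5.7n)

# Print only the timestamp fields to check the resulting order
printf '%s\n' "$sorted" | awk '{ print $4, $5 }'
# [18/May/2025:23:59:59 +0100]
# [18/May/2026:22:12:35 +0100]
# [02/Jun/2026:08:00:00 +0100]
```

The year key sorts first, so the 2025 entry leads even though its day and time are later than those of the other entries; the month key then places May before June.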
This scenario reminded me of my 2010 post on Parsing NCSA combined log format - working with columns.
Tested with sort (GNU coreutils) 9.1.