Module trase.tools.aws.s3_access_logs
Parse AWS S3 server access logs into structured records.
Format reference: https://docs.aws.amazon.com/AmazonS3/latest/userguide/LogFormat.html
Each line is space-separated and uses three token styles: bracketed timestamps
([12/Mar/2026:09:00:56 +0000]), double-quoted fields (request line,
referrer, user-agent) and bare tokens. An empty field is a single -.
AWS appends new trailing fields over time, so we tokenise the whole line
(respecting quotes/brackets) and map only the first, stable set of fields
positionally; any extra trailing tokens are ignored.
Used by the s3_access_log_events dbt model to turn the raw log objects
under s3://trase-db-dumps/trase-storage-access-logs/ into a typed table.
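The tokenise-then-map approach described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: STABLE_FIELDS is a hypothetical stand-in for _FIELDS (here the first 18 fields of the documented AWS format), the names tokenise and parse_line_sketch are invented for the example, and converting a lone - to None is an assumption about how empty fields might be represented.

```python
import re

# Hypothetical stand-in for the module's _FIELDS: the first 18 fields of the
# documented S3 server access log format, in positional order.
STABLE_FIELDS = (
    "bucket_owner", "bucket", "time", "remote_ip", "requester", "request_id",
    "operation", "key", "request_uri", "http_status", "error_code",
    "bytes_sent", "object_size", "total_time", "turn_around_time",
    "referrer", "user_agent", "version_id",
)

# One alternative per token style: [bracketed], "double-quoted", bare.
_TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def tokenise(line):
    """Split a log line into tokens, stripping the brackets/quotes."""
    # Exactly one capture group matches per token; lastindex identifies it.
    return [m.group(m.lastindex) for m in _TOKEN.finditer(line)]


def parse_line_sketch(line):
    """Map the leading tokens onto STABLE_FIELDS; None if too few tokens."""
    tokens = tokenise(line)
    if len(tokens) < len(STABLE_FIELDS):
        return None  # malformed or partial line
    # zip() stops at the shorter input, so extra trailing tokens are ignored.
    # Assumption: a lone "-" (empty field) is represented as None.
    return {
        name: (None if tok == "-" else tok)
        for name, tok in zip(STABLE_FIELDS, tokens)
    }
```

Because only the leading tokens are consumed, a line that carries newer trailing fields should parse to the same record as one that does not.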
Functions
def fetch_and_parse_access_logs(keys, bucket, client) -> list[dict]
Download each access-log object and parse it. Per-object errors are logged and skipped so one bad object can't abort an incremental load.
Returns the concatenated records across all keys.
def parse_access_log_line(line: str) -> dict | None
Parse one access-log line into a dict keyed by _FIELDS, or None if the line has fewer tokens than the stable field set (malformed/partial).
def parse_access_log_lines(text: str) -> list[dict]
Parse a whole access-log file's text into a list of records, skipping blank and unparseable lines.
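The per-object error handling described for fetch_and_parse_access_logs might look something like the sketch below. It is illustrative only: the function name and the parse_text parameter are hypothetical (the real function takes only keys, bucket and client), and it assumes client behaves like a boto3 S3 client, whose get_object(Bucket=..., Key=...) response exposes a readable "Body". UTF-8 decoding is also an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def fetch_and_parse_sketch(keys, bucket, client, parse_text):
    """Download and parse each object, skipping any object that fails.

    parse_text is a hypothetical injected callable (text -> list of dicts),
    standing in for parse_access_log_lines.
    """
    records = []
    for key in keys:
        try:
            # Assumes a boto3-style S3 client and UTF-8 encoded log text.
            body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
            records.extend(parse_text(body.decode("utf-8")))
        except Exception:
            # Log with traceback and move on, so one bad object
            # cannot abort the whole load.
            logger.exception("Skipping access-log object %s", key)
    return records
```

Catching a broad Exception is deliberate in this sketch, since the stated goal is that no single object can abort an incremental load. A real implementation may prefer narrower exception types.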