Module trase.tools.aws.s3_access_logs

Parse AWS S3 server access logs into structured records.

Format reference: https://docs.aws.amazon.com/AmazonS3/latest/userguide/LogFormat.html

Each line is space-separated and mixes three token styles: bracketed timestamps ([12/Mar/2026:09:00:56 +0000]), double-quoted fields (request line, referrer, user-agent) and bare tokens. An empty field is a single -. AWS appends new trailing fields over time, so we tokenise the whole line (respecting quotes and brackets) and map only the first, stable set of fields positionally; any extra trailing tokens are ignored.
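The tokenising step described above could look roughly like the sketch below. It is an illustration, not the module's actual code: the names _TOKEN_RE and tokenise are hypothetical, the regex assumes quoted fields contain no embedded double quotes, and turning - into None is an assumed convention.

```python
import re
from typing import Optional

# Hypothetical pattern: a [bracketed] run, a "quoted" run, or a bare token.
# Assumes quoted fields never contain an embedded double quote.
_TOKEN_RE = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def tokenise(line: str) -> list[Optional[str]]:
    """Split one log line into tokens, dropping the brackets and quotes."""
    tokens: list[Optional[str]] = []
    for match in _TOKEN_RE.finditer(line):
        bracketed, quoted, bare = match.groups()
        if bracketed is not None:
            value = bracketed
        elif quoted is not None:
            value = quoted
        else:
            value = bare
        # Assumption: represent the "-" placeholder as None.
        tokens.append(None if value == "-" else value)
    return tokens


sample = 'OWNER my-bucket [12/Mar/2026:09:00:56 +0000] 192.0.2.3 - "GET /k HTTP/1.1" 200'
tokens = tokenise(sample)
```

Matching the bracketed and quoted forms before the bare-token alternative is what keeps the space inside the timestamp and the request line from splitting those fields.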

Used by the s3_access_log_events dbt model to turn the raw log objects under s3://trase-db-dumps/trase-storage-access-logs/ into a typed table.

Functions

def fetch_and_parse_access_logs(keys, bucket, client) -> list[dict]

Download each access-log object and parse it. Per-object errors are logged and skipped so one bad object can't abort an incremental load.

Returns the concatenated records across all keys.
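A minimal sketch of the log-and-skip pattern described above, under stated assumptions: client is taken to be a boto3-style S3 client whose get_object(Bucket=..., Key=...) returns a dict with a readable "Body"; fetch_and_parse_sketch, parse_text and _StubClient are hypothetical names used only for illustration, and UTF-8 decoding is assumed.

```python
import io
import logging

logger = logging.getLogger(__name__)


def fetch_and_parse_sketch(keys, bucket, client, parse_text):
    """Download and parse each object; log and skip any object that fails."""
    records = []
    for key in keys:
        try:
            body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
            # Assumption: log objects are UTF-8 text.
            text = body.decode("utf-8", errors="replace")
            records.extend(parse_text(text))
        except Exception:
            # One bad object must not abort the rest of the load.
            logger.exception("Skipping access-log object %s", key)
    return records


class _StubClient:
    """Hypothetical in-memory stand-in for an S3 client, for the example only."""

    def __init__(self, objects):
        self._objects = objects

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[Key])}


stub = _StubClient({"logs/a": b"line one\nline two\n"})
split_lines = lambda text: [{"line": l} for l in text.splitlines() if l]
result = fetch_and_parse_sketch(["logs/a", "logs/missing"], "any-bucket", stub, split_lines)
```

Here the missing key raises inside the loop, is logged, and the records from logs/a are still returned.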

def parse_access_log_line(line: str) -> dict | None

Parse one access-log line into a dict keyed by the names in _FIELDS. Returns None if the line has fewer tokens than the stable field set (a malformed or partial line).
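To illustrate the positional mapping, here is a hedged sketch. FIELDS_SKETCH is a hypothetical stand-in for _FIELDS: it lists the first 18 fields of the AWS format in order, but which fields the module actually treats as stable is an assumption, as are the name parse_line_sketch, the simple regex tokeniser and the conversion of - to None.

```python
import re
from typing import Optional

# Hypothetical stand-in for _FIELDS: the leading fields of the AWS log format.
FIELDS_SKETCH = (
    "bucket_owner", "bucket", "time", "remote_ip", "requester", "request_id",
    "operation", "key", "request_uri", "http_status", "error_code",
    "bytes_sent", "object_size", "total_time", "turn_around_time",
    "referer", "user_agent", "version_id",
)

# Assumes quoted fields contain no embedded double quotes.
_TOKEN_RE = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def parse_line_sketch(line: str) -> Optional[dict]:
    """Map the leading tokens onto FIELDS_SKETCH; None if the line is too short."""
    tokens = [
        next(group for group in match.groups() if group is not None)
        for match in _TOKEN_RE.finditer(line)
    ]
    if len(tokens) < len(FIELDS_SKETCH):
        return None  # malformed or partial line
    # zip stops at the end of FIELDS_SKETCH, so trailing tokens are ignored.
    return {
        name: (None if token == "-" else token)
        for name, token in zip(FIELDS_SKETCH, tokens)
    }


line = (
    'OWNER my-bucket [12/Mar/2026:09:00:56 +0000] 192.0.2.3 - REQ1 '
    'REST.GET.OBJECT logs/a.txt "GET /logs/a.txt HTTP/1.1" 200 - 512 512 20 10 '
    '"-" "curl/8.0" - EXTRA1 EXTRA2'
)
record = parse_line_sketch(line)
```

Because the mapping is driven by the length of the field tuple, fields that AWS appends later only add ignored trailing tokens and do not shift the existing keys.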

def parse_access_log_lines(text: str) -> list[dict]

Parse a whole access-log file's text into a list of records, skipping blank and unparseable lines.
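The file-level loop might be sketched as follows. This is an assumed shape only: parse_lines_sketch is a hypothetical name, and the per-line parser is passed in as a callable (here a toy one) so the example stays self-contained.

```python
from typing import Callable, Optional


def parse_lines_sketch(text: str, parse_line: Callable[[str], Optional[dict]]) -> list[dict]:
    """Parse every line of a file's text, skipping blank and unparseable lines."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank line
        record = parse_line(line)
        if record is not None:  # None marks a malformed or partial line
            records.append(record)
    return records


def toy_parse_line(line: str) -> Optional[dict]:
    """Toy parser for the example: needs at least two space-separated tokens."""
    parts = line.split()
    if len(parts) < 2:
        return None
    return {"first": parts[0], "second": parts[1]}


parsed = parse_lines_sketch("a b\n\nlonely\n  c d  \n", toy_parse_line)
```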