Module trase.tools.aws.aws_helpers_cached

Functions

def get_pandas_df_once(key, bucket='trase-storage', version_id=None, client=None, print_version_id=False, **kwargs) ‑> pandas.core.frame.DataFrame

Load a CSV file on S3 into a Pandas dataframe.

The file is downloaded only once: subsequent calls read it from a local cache managed by the joblib library. The cache key includes the object's ETag, so a change to the remote object's content causes a fresh download instead of a stale cache hit.

All other arguments are passed to get_pandas_df().
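The ETag-keyed caching described above can be illustrated with a minimal, standard-library-only sketch. It is not this module's implementation, which relies on joblib. The names `FAKE_S3`, `head_etag`, `download`, and `load_once` are hypothetical stand-ins, and the dictionary simulates S3 so that the example runs without credentials.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for S3: maps key -> (etag, body). Not a real API.
FAKE_S3 = {"data/example.csv": ("etag-1", b"a,b\n1,2\n")}
DOWNLOADS = []  # records each simulated download so cache hits are visible

# Throwaway cache directory for the sketch (assumption: any local dir works).
CACHE_DIR = Path(tempfile.mkdtemp(prefix="etag_cache_"))


def head_etag(key):
    """Cheap metadata lookup (stands in for an S3 HEAD request)."""
    return FAKE_S3[key][0]


def download(key):
    """Expensive fetch (stands in for an S3 GET request)."""
    DOWNLOADS.append(key)
    return FAKE_S3[key][1]


def load_once(key):
    """Return the object body, downloading only when the ETag is unseen."""
    etag = head_etag(key)
    # The cache entry is named after both the key and the current ETag,
    # so new remote content maps to a new, initially empty cache slot.
    digest = hashlib.sha256(f"{key}|{etag}".encode()).hexdigest()
    path = CACHE_DIR / f"{digest}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())
    body = download(key)
    path.write_bytes(pickle.dumps(body))
    return body
```

Calling `load_once` twice with an unchanged ETag performs a single download. Once the simulated ETag changes, the next call misses the cache and downloads again. Note that each call still needs one cheap metadata lookup to learn the current ETag.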

def read_geojson_once(key, bucket='trase-storage', version_id=None, client=None, print_version_id=False, **kwargs)

Cached version of read_geojson().

The file is downloaded once, then served from the local joblib cache (keyed by the object's S3 ETag). All other arguments are passed to read_geojson().

def read_parquet_once(key, bucket='trase-storage', version_id=None, client=None, print_version_id=False, **kwargs)

Cached version of read_s3_parquet().

The file is downloaded once, then served from the local joblib cache (keyed by the object's S3 ETag). All other arguments are passed to read_s3_parquet().

def read_polars_csv_once(key, bucket='trase-storage', version_id=None, client=None, print_version_id=False, **kwargs)

Load a CSV file on S3 into a Polars dataframe.

The file is downloaded only once: thereafter it is served from the local joblib cache. The cache is keyed by the object's S3 ETag, so a change to the remote object's content causes a fresh download instead of a stale cache hit.

All other arguments are passed to read_polars_csv().

def read_polars_parquet_once(key, bucket='trase-storage', version_id=None, client=None, print_version_id=False, **kwargs)

Load a Parquet file on S3 into a Polars dataframe.

The file is downloaded only once: thereafter it is served from the local joblib cache. The cache is keyed by the object's S3 ETag, so a change to the remote object's content causes a fresh download instead of a stale cache hit.

All other arguments are passed to read_polars_parquet().
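Every function in this module forwards its extra keyword arguments to an underlying reader. The sketch below shows one way such a cached wrapper might treat those arguments, using only the standard library. `read_csv_text` and `read_csv_text_once` are hypothetical names. Including the keyword arguments in the cache key is an assumption made for illustration, not confirmed behaviour of this module.

```python
import csv
import io

PARSE_CALLS = []  # records each real parse so cache hits are visible
_CACHE = {}       # in-memory stand-in for an on-disk cache


def read_csv_text(text, **kwargs):
    """Hypothetical underlying reader: parses CSV text with csv.reader."""
    PARSE_CALLS.append(kwargs)
    return list(csv.reader(io.StringIO(text), **kwargs))


def read_csv_text_once(text, etag, **kwargs):
    """Cached wrapper that forwards **kwargs to the underlying reader.

    Assumption: the reader options are part of the cache key, so the same
    content parsed with different options is cached separately.
    """
    cache_key = (etag, tuple(sorted(kwargs.items())))
    if cache_key not in _CACHE:
        _CACHE[cache_key] = read_csv_text(text, **kwargs)
    return _CACHE[cache_key]
```

With this design, repeating a call with the same ETag and the same options returns the cached result. Changing an option such as `delimiter` causes a new parse, so a result produced under different options is never returned by mistake.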