Module trase.tools.aws.aws_helpers_cached
Functions
def get_pandas_df_once(key, bucket='trase-storage', version_id=None, client=None, print_version_id=False, **kwargs) -> pandas.core.frame.DataFrame
Load a CSV file on S3 into a Pandas dataframe.
The file is downloaded only once; thereafter it is served from a local cache managed by the joblib library. The cache key includes the object's ETag, so a change to the remote object's content produces a new cache key and a fresh download rather than a stale cached copy.
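The ETag-keyed caching described above can be sketched as follows. This is a minimal illustration only, not the module's implementation: `FakeS3`, `load_once`, and `_cache` are hypothetical names, a plain dict stands in for joblib's on-disk cache, and a SHA-256 digest stands in for the ETag that S3 would return.

```python
import hashlib


class FakeS3:
    """Hypothetical in-memory stand-in for an S3 client (not boto3)."""

    def __init__(self):
        self.objects = {}
        self.downloads = 0  # counts how often a body is actually fetched

    def put(self, bucket, key, body):
        self.objects[(bucket, key)] = body

    def etag(self, bucket, key):
        # Assumption: a content digest plays the role of the S3 ETag here.
        return hashlib.sha256(self.objects[(bucket, key)]).hexdigest()

    def download(self, bucket, key):
        self.downloads += 1
        return self.objects[(bucket, key)]


_cache = {}  # stands in for joblib's local cache


def load_once(client, key, bucket="trase-storage"):
    """Fetch an object, reusing the cached body while its ETag is unchanged."""
    cache_key = (bucket, key, client.etag(bucket, key))
    if cache_key not in _cache:
        _cache[cache_key] = client.download(bucket, key)
    return _cache[cache_key]


s3 = FakeS3()
s3.put("trase-storage", "example.csv", b"a,b\n1,2\n")
first = load_once(s3, "example.csv")   # downloads the object
second = load_once(s3, "example.csv")  # same ETag: served from the cache
s3.put("trase-storage", "example.csv", b"a,b\n3,4\n")
third = load_once(s3, "example.csv")   # new ETag: downloads again
```

Because the ETag is part of the key, the old entry is never overwritten; it simply stops being looked up once the remote object changes.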
All other arguments are passed to get_pandas_df().
def read_geojson_once(key, bucket='trase-storage', version_id=None, client=None, print_version_id=False, **kwargs)
Cached version of read_geojson().
Downloaded once, then served from the local joblib cache (keyed by S3 ETag). All other arguments are passed through to read_geojson().
def read_parquet_once(key, bucket='trase-storage', version_id=None, client=None, print_version_id=False, **kwargs)
Cached version of read_s3_parquet().
Downloaded once, then served from the local joblib cache (keyed by S3 ETag). All other arguments are passed through to read_s3_parquet().
def read_polars_csv_once(key, bucket='trase-storage', version_id=None, client=None, print_version_id=False, **kwargs)
Load a CSV file on S3 into a Polars dataframe.
The file is downloaded only once; thereafter it is served from the local joblib cache. The cache is keyed by the object's S3 ETag, so a change to the remote object's content triggers a fresh download rather than a stale cached copy.
All other arguments are passed to read_polars_csv().
def read_polars_parquet_once(key, bucket='trase-storage', version_id=None, client=None, print_version_id=False, **kwargs)
Load a Parquet file on S3 into a Polars dataframe.
The file is downloaded only once; thereafter it is served from the local joblib cache. The cache is keyed by the object's S3 ETag, so a change to the remote object's content triggers a fresh download rather than a stale cached copy.
All other arguments are passed to read_polars_parquet().
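Each `_once` function above forwards its extra keyword arguments to an underlying reader. The sketch below shows one way such a wrapper could be written; it is an assumption for illustration, not the module's code. `OBJECTS`, `read_rows`, and `once` are hypothetical names, the standard-library `csv` reader stands in for the real readers, and including the keyword arguments in the cache key is a design choice made here, not something the documentation above states.

```python
import csv
import io

# Hypothetical in-memory "bucket": key -> (etag, body).
OBJECTS = {"data.csv": ("etag-1", b"a;b\n1;2\n")}
calls = []  # records every call that reaches the uncached reader


def read_rows(key, bucket="trase-storage", **kwargs):
    """Hypothetical uncached reader; extra kwargs go to csv.reader."""
    calls.append(key)
    _, body = OBJECTS[key]
    return list(csv.reader(io.StringIO(body.decode("utf-8")), **kwargs))


def once(reader):
    """Wrap a reader so repeated calls with the same ETag and kwargs hit a cache."""
    cache = {}

    def wrapper(key, bucket="trase-storage", **kwargs):
        etag, _ = OBJECTS[key]
        # kwargs values must be hashable for this simple dict-based key.
        cache_key = (bucket, key, etag, tuple(sorted(kwargs.items())))
        if cache_key not in cache:
            # All other arguments are passed straight through to the reader.
            cache[cache_key] = reader(key, bucket=bucket, **kwargs)
        return cache[cache_key]

    return wrapper


read_rows_once = once(read_rows)
rows = read_rows_once("data.csv", delimiter=";")
rows_again = read_rows_once("data.csv", delimiter=";")  # served from the cache
```

Keying on the keyword arguments as well as the ETag avoids returning a result that was parsed with different reader options.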