Module trase.tools.aws.aws_helpers

Functions for S3 access and parsing

Functions

def get_pandas_df(key, bucket='trase-storage', version_id=None, client=<botocore.client.S3 object>, track=True, sep=';', encoding='utf8', xlsx=False, print_version_id=False, **kwargs) ‑> pandas.core.frame.DataFrame
def get_s3_json(key, s3_resource=s3.ServiceResource(), bucket_name='trase-storage', track=True)
def get_s3_object_body(key, s3_client=<botocore.client.S3 object>, bucket_name='trase-storage', track=True)
def is_good_s3_key(key: str) ‑> bool

AWS S3 allows the key to be constructed from any UTF-8 character. However, in practice some characters like space or "!" cause problems.

To make problems less likely, we should stick to a limited set of characters:

0-9 a-z A-Z _ . ( )

Plus, of course, the directory separator. This is in line with AWS's recommendation (https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html).
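For illustration, the rule could be expressed as a regular expression. This is a minimal sketch of the idea, not necessarily the actual implementation:

import re

# Sketch only: a key is "good" when every character is in the
# recommended set above, plus the "/" separator.
GOOD_KEY_RE = re.compile(r"^[0-9a-zA-Z_.()/]+$")

def is_good_s3_key_sketch(key: str) -> bool:
    return bool(GOOD_KEY_RE.match(key))

is_good_s3_key_sketch("data/2020/soy_exports.csv")  # True
is_good_s3_key_sketch("data/soy exports!.csv")      # False: space and "!"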

def list_version_ids(key, bucket='trase-storage', client=<botocore.client.S3 object>, ascending=True) ‑> List[VersionIdResponse]

List Version IDs for a given S3 object.

These can be a mix of versions and delete markers. By default they are returned in ascending order of last-modified time, so the latest version is the last element of the list.

Args

key
a key to an S3 object
bucket
S3 bucket
ascending
sort the list by last-modified time, ascending (default) or descending

Returns: a list of VersionIdResponse named tuples (version_id, modified, is_delete) containing the string version ID, a datetime representing when the version was created, and a boolean indicating whether it is a delete marker.
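For example, the latest live version can be read off the end of the list (the key below is a placeholder):

from trase.tools.aws.aws_helpers import list_version_ids

versions = list_version_ids("my/data.csv")  # placeholder key
latest = versions[-1]  # ascending order, so the latest version is last
if not latest.is_delete:
    print(latest.version_id, "created", latest.modified)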

def make_good_s3_key(string) ‑> str
def parse_aws_object2(key: str, encoding='utf-8', separator=';', quoting=None, bucket='trase-storage', Range='bytes=-', track=True)
def read_csv(key, bucket='trase-storage', version_id=None, client=<botocore.client.S3 object>, track=True, print_version_id=False, **kwargs) ‑> pandas.core.frame.DataFrame

Read an S3 object containing CSV data to a DataFrame

Args

kwargs
passed through to pd.read_csv
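A hypothetical call (the key is a placeholder), showing keyword arguments reaching pd.read_csv:

from trase.tools.aws.aws_helpers import read_csv

# "sep" and "dtype" are forwarded to pd.read_csv.
df = read_csv("my/data.csv", sep=";", dtype={"code": str})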
def read_geojson(key, bucket='trase-storage', version_id=None, client=<botocore.client.S3 object>, track=True, print_version_id=False, **kwargs) ‑> pandas.core.frame.DataFrame

Read an S3 object containing geometry data to a DataFrame (GeoPandas is imported on demand)

Args

kwargs
passed through to geopandas.read_file
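A hypothetical call (placeholder key); keyword arguments are forwarded to geopandas.read_file:

from trase.tools.aws.aws_helpers import read_geojson

# "rows" is a geopandas.read_file parameter limiting how many features are read.
gdf = read_geojson("shapes/municipalities.geojson", rows=10)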
def read_json(key, bucket='trase-storage', version_id=None, client=<botocore.client.S3 object>, track=True, print_version_id=False, **kwargs)

Args

kwargs
passed through to json.load
def read_s3_csv(key, s3_client=<botocore.client.S3 object>, bucket_name='trase-storage', track=True)
def read_s3_folder(folder, prefix='', s3_resource=s3.ServiceResource(), bucket_name='trase-storage')
def read_s3_object(key, s3_resource=s3.ServiceResource(), bucket_name='trase-storage', track=True)
def read_s3_parquet(key, bucket='trase-storage', version_id=None, client=<botocore.client.S3 object>, track=True, print_version_id=False, **kwargs) ‑> pandas.core.frame.DataFrame

Read a Parquet dataset from S3 to a Pandas DataFrame. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_parquet.html

Args

kwargs
passed through to pd.read_parquet
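A hypothetical call (placeholder key); "columns" is a pd.read_parquet parameter:

from trase.tools.aws.aws_helpers import read_s3_parquet

# Only the listed columns are loaded.
df = read_s3_parquet("exports/soy.parquet", columns=["year", "volume"])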

def read_xlsx(key, bucket='trase-storage', version_id=None, client=<botocore.client.S3 object>, track=True, print_version_id=False, **kwargs) ‑> pandas.core.frame.DataFrame

Read an S3 object containing XLSX data to a DataFrame

Args

kwargs
passed through to pd.read_excel
def read_yaml(key, bucket='trase-storage', version_id=None, client=<botocore.client.S3 object>, track=True, print_version_id=False, **kwargs)

Args

kwargs
passed through to yaml.safe_load
def stream_object(key, bucket='trase-storage', version_id=None, client=<botocore.client.S3 object>, track=True, print_version_id=False, decode=True, encoding='utf8', decoding_errors='strict', **kwargs) ‑> ContextManager[IO]

Returns a stream containing the body of an S3 object. The return value is a context manager, which ensures that the underlying HTTP stream gets closed:

with stream_object("my-object", "my_bucket") as file:
    print("The contents are", file.read())

Args

decode
if true then the stream will contain text; otherwise it will contain bytes.
decoding_errors
see the "errors" parameter of https://docs.python.org/3/library/io.html#io.TextIOWrapper
track
if true then the S3 object will be added to trase.tools.aws.metadata.S3_OBJECTS_ACCESSED_IN_CURRENT_SESSION
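The same pattern with decode=False yields raw bytes, e.g. for binary formats (object and bucket names are placeholders):

with stream_object("my-object", "my_bucket", decode=False) as file:
    data = file.read()  # bytes, because decode=False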
def upload_pandas_df_to_s3(df, new_key, sep=';', encoding='utf8', float_format='%.2f', quotechar='"', bucket_name='trase-storage')

Upload a CSV dataset to S3 from a pandas DataFrame

:param df: pandas DataFrame object
:param new_key: s3 path
:param sep: separator
:param bucket_name: s3 bucket name
:param encoding: encoding str
:param float_format: format of float columns
:param quotechar: quoting character

:return: AWS ServiceResource object
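A hypothetical usage (the key is a placeholder; the defaults from the signature apply):

import pandas as pd
from trase.tools.aws.aws_helpers import upload_pandas_df_to_s3

df = pd.DataFrame({"year": [2019, 2020], "volume": [1.5, 2.25]})
# Uses ";" as separator, utf8 encoding and "%.2f" floats by default.
upload_pandas_df_to_s3(df, "exports/volumes.csv")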

def upload_s3_csv_buffer(csv_buffer, key_name, s3_client=<botocore.client.S3 object>, bucket_name='trase-storage')

Uploads an in-memory CSV buffer to S3

Classes

class VersionId (version_id, modified, is_delete)

VersionIdResponse(version_id, modified, is_delete)

Ancestors

  • builtins.tuple

Instance variables

var is_delete

Alias for field number 2

var modified

Alias for field number 1

var version_id

Alias for field number 0