Glider
Functions
read_report_csv (filename, chunk)
group_by_field (data, fields, snapshot)
build_snapshot (df, field, snapshot)
upload_mongo (event, final_path, snapshot, period)
search_snpashots_id (files, date)
create_snapshot (event, final_path)
Constants
f_path = __file__
index = f_path.find("report_generation/")
ACCESS_ID
AWS access key ID.
ACCESS_KEY
AWS secret access key.
mongo_conn = mongo_connection()
Class used to connect to MongoDB.
snap_collection = mongo_conn.mongo_conn_snapshots()
MongoDB collection used to create/update snapshots.
final_df = DataFrame()
int total_rows = 0
int total_local = 0
session = boto3.Session(aws_access_key_id=ACCESS_ID, aws_secret_access_key=ACCESS_KEY)
boto3 session used to connect to AWS.
update_at = dt.now().replace(hour=0, minute=0, second=0, microsecond=0)
Date, truncated to midnight, on which the snapshot was created or updated.
src.report_generation.ReportSnapshot.build_snapshot (df, field, snapshot)
Checks the field and fills the matching snapshot entry using the grouped data.
Args:
    df (pandas dataframe): current parquet file loaded as a dataframe
    field (str): current field used as filter
    snapshot (dict): will contain the info for all fields
Returns:
    snapshot (dict)
Definition at line 68 of file ReportSnapshot.py.
src.report_generation.ReportSnapshot.create_snapshot (event, final_path)
Executes the full procedure.
Args:
    event (dict): contains the file and client info
    final_path (str): path where the original csv is saved in S3
Returns:
    Nothing
Definition at line 136 of file ReportSnapshot.py.
src.report_generation.ReportSnapshot.group_by_field (data, fields, snapshot)
Groups the data by service_id, territory_code, artists and tracks.
Args:
    data (pandas dataframe): current parquet file loaded as a dataframe
    fields (list): list of fields that are included in the snapshot
    snapshot (dict): will contain the info for all fields
Returns:
    snap (dict)
Definition at line 53 of file ReportSnapshot.py.
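The grouping step described for group_by_field can be pictured with the minimal sketch below. It is not the module's code: the `quantity` column, the summing aggregation, and the layout of the resulting dict are assumptions made only for illustration.

```python
import pandas as pd


def group_by_field_sketch(data, fields, snapshot):
    """Fill `snapshot` with one aggregated entry per field (hypothetical layout)."""
    for field in fields:
        # Sum the hypothetical `quantity` column for each distinct value of `field`.
        grouped = data.groupby(field)["quantity"].sum()
        snapshot[field] = {str(key): int(value) for key, value in grouped.items()}
    return snapshot


# Tiny in-memory frame standing in for the loaded report.
df = pd.DataFrame(
    {
        "service_id": ["spotify", "spotify", "deezer"],
        "territory_code": ["US", "MX", "US"],
        "quantity": [10, 5, 7],
    }
)
snap = group_by_field_sketch(df, ["service_id", "territory_code"], {})
```

Here `snap` ends up with one key per requested field, each mapping a field value to its summed quantity.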
src.report_generation.ReportSnapshot.read_report_csv (filename, chunk)
Receives an S3 path and loads the data with awswrangler in chunks (1M lines per chunk).
Args:
    filename (str): S3 file location
    chunk (int): chunk size
Returns:
    final_df (pandas dataframe)
Definition at line 35 of file ReportSnapshot.py.
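As a rough illustration of chunked loading, the sketch below reads a CSV in fixed-size chunks with pandas and concatenates the pieces. It uses an in-memory buffer instead of S3 so that it runs anywhere; `read_csv_in_chunks` is a hypothetical helper, not the module's function. awswrangler's `wr.s3.read_csv` also accepts a `chunksize` argument, so the same loop shape would likely apply there, though that is an assumption about how read_report_csv is written.

```python
import io

import pandas as pd


def read_csv_in_chunks(source, chunk):
    """Read `source` in chunks of `chunk` rows and return (DataFrame, row count)."""
    frames = []
    total_rows = 0
    # With chunksize set, pandas yields DataFrames of at most `chunk` rows each.
    for part in pd.read_csv(source, chunksize=chunk):
        total_rows += len(part)
        frames.append(part)
    final_df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return final_df, total_rows


# Three data rows read two at a time, so the loop runs twice.
csv_text = "service_id,quantity\nspotify,10\ndeezer,7\nspotify,5\n"
final_df, total_rows = read_csv_in_chunks(io.StringIO(csv_text), chunk=2)
```

Collecting the chunks in a list and concatenating once at the end avoids repeatedly copying a growing DataFrame.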
src.report_generation.ReportSnapshot.search_snpashots_id (files, date)
Searches for all snapshot IDs in MongoDB.
Definition at line 126 of file ReportSnapshot.py.
src.report_generation.ReportSnapshot.upload_mongo (event, final_path, snapshot, period)
Creates or updates the snapshot in MongoDB.
Args:
    event (dict): contains the file and client info
    final_path (str): path where the original csv is saved in S3
    snapshot (dict): contains the info for all fields
    period (str): month and year when sales were processed
Returns:
    Nothing
Definition at line 97 of file ReportSnapshot.py.
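A create-or-update operation of this kind is commonly expressed in MongoDB as an upsert. The sketch below only builds the filter and update documents, so it runs without a database. Every key name (`client_id`, `file`, `snapshot`, `update_at`), the sample S3 path, and the choice of filter are hypothetical; the real document schema is not shown in this reference.

```python
from datetime import datetime


def build_upsert(event, final_path, snapshot, period, updated_at):
    """Return the (filter, update) pair for an upsert; all key names are hypothetical."""
    # One snapshot document per client and period is an assumption of this sketch.
    query = {"client_id": event["client_id"], "period": period}
    update = {
        "$set": {
            "file": final_path,
            "snapshot": snapshot,
            "update_at": updated_at,
        }
    }
    return query, update


query, update = build_upsert(
    {"client_id": "demo"},
    "s3://bucket/report.csv",  # placeholder path for illustration only
    {"service_id": {"spotify": 15}},
    "2023-01",
    datetime(2023, 2, 1),
)
# With a pymongo collection the pair would be applied as:
# snap_collection.update_one(query, update, upsert=True)
```

With `upsert=True`, pymongo inserts a new document when nothing matches the filter and updates the existing one otherwise, which matches the "creates or updates" behaviour described above.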
src.report_generation.ReportSnapshot.ACCESS_ID
AWS access key ID.
Definition at line 19 of file ReportSnapshot.py.
src.report_generation.ReportSnapshot.ACCESS_KEY
AWS secret access key.
Definition at line 19 of file ReportSnapshot.py.
src.report_generation.ReportSnapshot.f_path = __file__
Definition at line 11 of file ReportSnapshot.py.
src.report_generation.ReportSnapshot.final_df = DataFrame()
Definition at line 26 of file ReportSnapshot.py.
src.report_generation.ReportSnapshot.index = f_path.find("report_generation/")
Definition at line 12 of file ReportSnapshot.py.
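This reference does not show how index is used afterwards. One common use of this pattern is to slice the file path and recover the directory that contains report_generation/; the sketch below illustrates that under this assumption. `project_root` and the sample path are hypothetical.

```python
def project_root(file_path, marker="report_generation/"):
    """Return the part of `file_path` before `marker`, or None if the marker is absent."""
    index = file_path.find(marker)
    # str.find returns -1 when the marker does not occur, so check before slicing.
    if index == -1:
        return None
    return file_path[:index]


root = project_root("/opt/app/src/report_generation/ReportSnapshot.py")
missing = project_root("/opt/app/other/module.py")
```

Checking for -1 matters because slicing with `[:-1]` would silently drop the last character of the path instead of signalling that the marker was not found.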
src.report_generation.ReportSnapshot.mongo_conn = mongo_connection()
Class used to connect to MongoDB.
Definition at line 22 of file ReportSnapshot.py.
src.report_generation.ReportSnapshot.session = boto3.Session(aws_access_key_id=ACCESS_ID, aws_secret_access_key=ACCESS_KEY)
boto3 session used to connect to AWS.
Definition at line 31 of file ReportSnapshot.py.
src.report_generation.ReportSnapshot.snap_collection = mongo_conn.mongo_conn_snapshots()
MongoDB collection used to create/update snapshots.
Definition at line 24 of file ReportSnapshot.py.
int src.report_generation.ReportSnapshot.total_local = 0
Definition at line 28 of file ReportSnapshot.py.
int src.report_generation.ReportSnapshot.total_rows = 0
Definition at line 27 of file ReportSnapshot.py.
src.report_generation.ReportSnapshot.update_at = dt.now().replace(hour=0, minute=0, second=0, microsecond=0)
Date, truncated to midnight, on which the snapshot was created or updated.
Definition at line 33 of file ReportSnapshot.py.
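The effect of the `replace` call in update_at can be seen in the short sketch below, which applies the same truncation to a fixed timestamp. `midnight` is a hypothetical helper introduced only for this example.

```python
from datetime import datetime as dt


def midnight(moment):
    """Drop the time-of-day part so the value identifies the calendar day only."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


morning = midnight(dt(2023, 5, 17, 8, 15, 0))
afternoon = midnight(dt(2023, 5, 17, 14, 32, 8, 123456))
```

Both values equal `dt(2023, 5, 17, 0, 0)`, so two runs on the same day produce the same timestamp. Because update_at is evaluated at module level, its value is fixed when the module is imported rather than when a snapshot is written.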