Glider
src.report_generation.ReportSnapshot Namespace Reference

Functions

 read_report_csv (filename, chunk)
 
 group_by_field (data, fields, snapshot)
 
 build_snapshot (df, field, snapshot)
 
 upload_mongo (event, final_path, snapshot, period)
 
 search_snpashots_id (files, date)
 
 create_snapshot (event, final_path)
 

Constants

 f_path = __file__
 
 index = f_path.find("report_generation/")
 
 ACCESS_ID
 Access keys for AWS.
 
 ACCESS_KEY
 Access keys for AWS.
 
 mongo_conn = mongo_connection()
 Instance of the class used to connect to MongoDB.
 
 snap_collection = mongo_conn.mongo_conn_snapshots()
 MongoDB collection used to create and update snapshots.
 
 final_df = DataFrame()
 
int total_rows = 0
 
int total_local = 0
 
 session = boto3.Session(aws_access_key_id=ACCESS_ID, aws_secret_access_key=ACCESS_KEY)
 Connection for AWS using boto3.
 
 update_at = dt.now().replace(hour=0, minute=0, second=0, microsecond=0)
 Records when the snapshot was created or updated.
 

Function Documentation

◆ build_snapshot()

src.report_generation.ReportSnapshot.build_snapshot ( df,
field,
snapshot )
Checks the field and fills the corresponding snapshot entry using the grouped data.

Args:
    df (pandas dataframe): current parquet file loaded as dataframe
    field (str): current field used as filter
    snapshot (dict): will contain the info for all fields
Returns: snapshot (dict)

Definition at line 68 of file ReportSnapshot.py.

◆ create_snapshot()

src.report_generation.ReportSnapshot.create_snapshot ( event,
final_path )
Executes the full snapshot-creation procedure.

Args:
    event (dict): contains the file and client info
    final_path (str): path where the original CSV is saved in S3
Returns: Nothing

Definition at line 136 of file ReportSnapshot.py.

◆ group_by_field()

src.report_generation.ReportSnapshot.group_by_field ( data,
fields,
snapshot )
Groups the data by service_id, territory_code, artists and tracks.

Args:
    data (pandas dataframe): current parquet file loaded as dataframe
    fields (list): list of fields to include in the snapshot
    snapshot (dict): will contain the info for all fields
Returns: snap (dict)

Definition at line 53 of file ReportSnapshot.py.
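
The grouping step can be sketched as below. This is an illustration under stated assumptions, not the code in ReportSnapshot.py: the column names `service_id` and `territory_code` come from the description above, while the `quantity` column, the helper names `sum_by_field` and `group_fields`, and the shape of the resulting dict are hypothetical.

```python
import pandas as pd


def sum_by_field(df, field, snapshot, value_col="quantity"):
    """Store per-value totals for one field; skip fields missing from df."""
    if field not in df.columns:
        return snapshot
    grouped = df.groupby(field)[value_col].sum()
    # Convert numpy integers to plain ints so the dict is easy to serialize.
    snapshot[field] = {str(key): int(total) for key, total in grouped.items()}
    return snapshot


def group_fields(data, fields, snapshot):
    """Apply sum_by_field to each requested field in turn."""
    for field in fields:
        snapshot = sum_by_field(data, field, snapshot)
    return snapshot


# Hypothetical sample rows; "quantity" is an assumed value column.
data = pd.DataFrame(
    {
        "service_id": ["spotify", "spotify", "deezer"],
        "territory_code": ["US", "MX", "US"],
        "quantity": [10, 4, 7],
    }
)
snap = group_fields(data, ["service_id", "territory_code", "artists"], {})
```

In this sketch a requested field that is absent from the data (here `artists`) is skipped rather than raising, so one missing column does not abort the whole snapshot.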

◆ read_report_csv()

src.report_generation.ReportSnapshot.read_report_csv ( filename,
chunk )
Receives an S3 path and loads the data with awswrangler in chunks (1M lines per chunk).

Args:
    filename (str): s3 file location
    chunk (int): chunk size
Returns: final_df (pandas dataframe)

Definition at line 35 of file ReportSnapshot.py.
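
A minimal sketch of chunked loading, assuming the chunks are simply concatenated into one DataFrame. It uses `pandas.read_csv` on an in-memory buffer so that it runs without AWS access; `awswrangler.s3.read_csv` also accepts a `chunksize` argument and yields DataFrames when it is set. The helper name `read_in_chunks` and the sample columns are hypothetical.

```python
import io

import pandas as pd


def read_in_chunks(source, chunk):
    """Read a CSV in pieces of `chunk` rows and return one DataFrame."""
    frames = []
    total_rows = 0
    # With chunksize set, read_csv yields DataFrames instead of returning one.
    for part in pd.read_csv(source, chunksize=chunk):
        total_rows += len(part)
        frames.append(part)
    # ignore_index gives the combined frame a continuous 0..n-1 index.
    final_df = pd.concat(frames, ignore_index=True)
    return final_df, total_rows


# Hypothetical sample data standing in for a report stored in S3.
sample = io.StringIO(
    "service_id,territory_code,quantity\n"
    "spotify,US,10\n"
    "spotify,MX,4\n"
    "deezer,US,7\n"
)
final_df, total_rows = read_in_chunks(sample, chunk=2)
```

Reading in chunks bounds the memory used by the parser at any one time, although the concatenated result still has to fit in memory.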

◆ search_snpashots_id()

src.report_generation.ReportSnapshot.search_snpashots_id ( files,
date )
Searches for all snapshot IDs in MongoDB.

Definition at line 126 of file ReportSnapshot.py.

◆ upload_mongo()

src.report_generation.ReportSnapshot.upload_mongo ( event,
final_path,
snapshot,
period )
Creates or updates the snapshot in MongoDB.

Args:
    event (dict): contains the file and client info
    final_path (str): path where the original CSV is saved in S3
    snapshot (dict): contains the info for all fields
    period (str): month and year when sales were processed
Returns: Nothing

Definition at line 97 of file ReportSnapshot.py.
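
"Creates or updates" is commonly implemented as an upsert. The sketch below only builds the filter and update documents as plain dicts, so it runs without a database. The document keys (`file`, `period`, `snapshot`, `updated_at`), the helper name `build_upsert`, and the period format are assumptions, not the schema used by ReportSnapshot.py.

```python
from datetime import datetime


def build_upsert(final_path, period, snapshot, updated_at):
    """Return (filter, update) documents for an upsert of one snapshot."""
    # Hypothetical key: one snapshot document per file and period.
    query = {"file": final_path, "period": period}
    update = {"$set": {"snapshot": snapshot, "updated_at": updated_at}}
    return query, update


query, update = build_upsert(
    final_path="s3://example-bucket/reports/sales.csv",  # hypothetical path
    period="2023-01",  # assumed format for "month and year"
    snapshot={"service_id": {"spotify": 14}},
    updated_at=datetime(2023, 2, 1),
)
# With a pymongo collection, the write would then be:
# collection.update_one(query, update, upsert=True)
```

With `upsert=True`, pymongo's `update_one` inserts a new document when no document matches the filter and updates the matching one otherwise, which covers both the create and the update case in a single call.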

Constant Documentation

◆ ACCESS_ID

src.report_generation.ReportSnapshot.ACCESS_ID

Access keys for AWS.

Definition at line 19 of file ReportSnapshot.py.

◆ ACCESS_KEY

src.report_generation.ReportSnapshot.ACCESS_KEY

Access keys for AWS.

Definition at line 19 of file ReportSnapshot.py.

◆ f_path

src.report_generation.ReportSnapshot.f_path = __file__

Definition at line 11 of file ReportSnapshot.py.

◆ final_df

src.report_generation.ReportSnapshot.final_df = DataFrame()

Definition at line 26 of file ReportSnapshot.py.

◆ index

src.report_generation.ReportSnapshot.index = f_path.find("report_generation/")

Definition at line 12 of file ReportSnapshot.py.
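
The page does not say how `index` is used. One plausible use, shown here purely as an assumption, is slicing `f_path` to obtain the directory prefix that precedes `report_generation/`. The helper name `project_prefix` and the sample path are hypothetical.

```python
def project_prefix(path, marker="report_generation/"):
    """Return the part of `path` before `marker`, or "" if it is absent."""
    index = path.find(marker)
    # str.find returns -1 when the marker is missing; slicing with -1 would
    # silently drop the last character, so handle that case explicitly.
    if index == -1:
        return ""
    return path[:index]


# Hypothetical file location.
f_path = "/opt/app/src/report_generation/ReportSnapshot.py"
prefix = project_prefix(f_path)
```

Checking for `-1` matters because `str.find` does not raise when the marker is missing, unlike `str.index`.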

◆ mongo_conn

src.report_generation.ReportSnapshot.mongo_conn = mongo_connection()

Instance of the class used to connect to MongoDB.

Definition at line 22 of file ReportSnapshot.py.

◆ session

src.report_generation.ReportSnapshot.session = boto3.Session(aws_access_key_id=ACCESS_ID, aws_secret_access_key=ACCESS_KEY)

Connection for AWS using boto3.

Definition at line 31 of file ReportSnapshot.py.

◆ snap_collection

src.report_generation.ReportSnapshot.snap_collection = mongo_conn.mongo_conn_snapshots()

MongoDB collection used to create and update snapshots.

Definition at line 24 of file ReportSnapshot.py.

◆ total_local

int src.report_generation.ReportSnapshot.total_local = 0

Definition at line 28 of file ReportSnapshot.py.

◆ total_rows

int src.report_generation.ReportSnapshot.total_rows = 0

Definition at line 27 of file ReportSnapshot.py.

◆ update_at

src.report_generation.ReportSnapshot.update_at = dt.now().replace(hour=0, minute=0, second=0, microsecond=0)

Records when the snapshot was created or updated.

Definition at line 33 of file ReportSnapshot.py.
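
The `replace(...)` call in `update_at` truncates a timestamp to midnight of the same day. The sketch below applies the same call to a fixed datetime so the result is reproducible; the helper name `start_of_day` is hypothetical.

```python
from datetime import datetime as dt


def start_of_day(moment):
    """Return `moment` with its time component reset to midnight."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


stamp = start_of_day(dt(2023, 5, 17, 14, 30, 45, 123456))
```

Two consequences follow from the definition shown above: all snapshots written on the same day share one timestamp, and because a module-level value is evaluated once at import time, a long-running process keeps the date on which the module was loaded.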