Gustavo A.
Data Engineer

Skills

Airflow

Spark

SQL

Python

Microsoft Azure

Amazon Web Services (AWS)

Google Cloud

Gustavo is available for hire

All Howdy candidates are vetted for skills and English proficiency.

Bio

Specializing in the development of data-driven solutions tailored for business applications, with a focus on identifying key problems, uncovering insights, and validating findings through data. Demonstrated expertise in implementing strategies that not only address business needs but also positively impact users.

Data Engineer
7/1/2022 - Present

Built a Data Lakehouse on S3 and the Glue Data Catalog. Integrated data ingestion from multiple providers and sources, including REST APIs, GraphQL, CDC, DMS, PostgreSQL, and MongoDB. Designed and implemented pipeline templates for AWS Glue using AWS CloudFormation. Migrated pipelines from SQL to PySpark and moved infrastructure-as-code from CloudFormation to Terraform. Ran end-to-end pipelines in AWS Glue, using Workflows, Triggers, Crawlers, and Jobs to aggregate data from MongoDB, MySQL, and PostgreSQL, then created a Glue Data Catalog so the data could be accessed through Redshift Spectrum and Athena or directly within Redshift storage. Developed streaming ingestion pipelines with Amazon Kinesis and Spark Structured Streaming, and near-real-time pipelines with Amazon MSK and Spark Structured Streaming.

Data Engineer
4/1/2020 - 2/1/2022

Designed and implemented a Data Lakehouse on S3 and the Glue Data Catalog. Led the migration of data pipelines from on-premises servers and legacy graphical-interface tools to AWS Glue, using Python and PySpark jobs orchestrated with Triggers, Crawlers, and Workflows. Migrated 10 TB of data from SQL Server (2008-2012) to AWS S3, Redshift Spectrum, and Athena. Developed data models that give business analysts a self-service schema, and integrated these models with Power BI through Redshift. Optimized a critical daily pipeline, cutting processing time from 8 hours to 20 minutes while handling around 60 GB of daily data ingestion. Ran data journey workshops for business analysts and created pipeline templates to simplify maintenance. Built a CI/CD pipeline with AWS CodePipeline and CloudFormation. Integrated diverse data sources through REST APIs, GraphQL, and Change Data Capture (CDC). Used Amazon MWAA (Amazon Managed Workflows for Apache Airflow) for pipeline orchestration, with EMR clusters and Lambda functions for processing. Migrated an on-premises Oracle Database 12c with OLTP and OLAP models to AWS: the OLTP model to DynamoDB, which shortened development time to market, and the OLAP model to a single-node Redshift cluster.

Data Analytics Engineer
3/1/2018 - 3/1/2020

Promoted a data-driven culture by migrating all Excel reports to Python and PySpark, increasing the performance of Pentaho pipelines by 800%. Created a Data Lake on Google Cloud Storage and Google BigQuery, with data models that give business analysts a self-service schema integrated with Power BI. Developed an automated pipeline using AWS Step Functions, Lambda, and EMR to deliver more than 10,000 personalized reports to customers. Orchestrated pipelines with Airflow (Google Cloud Composer) and Dataproc to ingest 3 TB of data per month from SQL Server and FTP servers, creating multiple star schemas and denormalized models for customer consumption.

Data Analytics Engineer
7/1/2017 - 3/1/2018

Developed a data warehouse using SQL Server and PySpark to extract data from Salesforce, enabling key performance indicators (KPIs) that track the customer journey. Designed and implemented KPIs and dashboards in Power BI, providing insight into the organization's forecasting process and helping align strategic objectives. Migrated legacy VBA and Excel reports to Python, streamlining reporting and improving data analysis capabilities.

Data Analytics Engineer
8/1/2014 - 6/1/2017

Automated Excel reports, significantly reducing errors and inconsistencies. Improved the flow of information by developing a centralized pipeline built on SQL Server triggers and stored procedures. Created key performance indicators (KPIs) and dashboards to sharpen strategic focus.