Thiago Rodrigues da Silva

Data Engineer
Contagem, MG

Summary

I currently work as a Data and ML Engineer. I hold a degree in Computer Engineering from the Federal Center for Technological Education of Minas Gerais (CEFET-MG) and am pursuing a Master's degree in Computer Science at the Federal University of Minas Gerais (UFMG). My work has covered cloud architecture (AWS and GCP), data pipelines with quality assurance, computer vision, machine learning, anomaly detection, and product recommendation.

Overview

7 years of professional experience
9 years of post-secondary education
2 languages

Work History

Tech Lead - Data Engineer & AI Engineer

SevenX Gaming
Minas Gerais
09.2024 - Current

Successfully led the iGaming regulation project for SevenX Gaming in Brazil, involving:

  • Creation of Python pipelines to extract data from MySQL;
  • Data processing in Python to partition the data, generate XML files, digitally sign them, compress them, and encode them in Base64;
  • Implementation of a logging system to confirm that each submission succeeded;
  • Back-end development of a web application for manually sending files after persistent failures;
  • Design of a robust AWS infrastructure using services such as IAM, EC2, Glue, Lake Formation, Athena, RDS, Redshift, IAM Identity Center, and CloudWatch, together with Python (Pandas, Polars, PySpark) and Java.
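
The processing steps listed above (partitioning, XML generation, compression, and Base64 encoding) could look roughly like the sketch below. It is only an illustration using the Python standard library: the element names, the batch size, and the helper names are hypothetical, gzip is assumed as the compression format, and the digital-signing step is left out because it depends on the certificate and signing library actually used.

```python
import base64
import gzip
import xml.etree.ElementTree as ET


def partition(rows, size):
    """Yield consecutive batches of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_xml(batch):
    """Serialize one batch of row dicts into an XML document (bytes)."""
    root = ET.Element("records")  # hypothetical root element name
    for row in batch:
        record = ET.SubElement(root, "record")  # hypothetical element name
        for key, value in row.items():
            ET.SubElement(record, key).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def package(xml_bytes):
    """Gzip-compress the XML, then Base64-encode it as ASCII text."""
    return base64.b64encode(gzip.compress(xml_bytes)).decode("ascii")


def unpack(payload):
    """Reverse `package`: Base64-decode, then decompress."""
    return gzip.decompress(base64.b64decode(payload))


# Made-up rows standing in for data extracted from the database.
rows = [{"id": i, "amount": i * 10} for i in range(5)]
payloads = [package(build_xml(batch)) for batch in partition(rows, 2)]
```

Base64 is an encoding rather than a compression format, so the sketch compresses first and encodes second; `unpack` recovers the original XML from a payload.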

Achievements include:

  • Creation and management of Data Lakes and Data Warehouses (ETL/ELT);
  • Pipeline design for bronze, silver, and gold stages;
  • 47% cost reduction through SQL optimization, resource restructuring, and usage control.

Machine Learning Initiatives:

  • Behavioral and satisfaction analysis (clustering and logistic regression);
  • Fraud detection (outlier analysis and neural networks);
  • Campaign optimization using recommendation systems.

Key Automation Projects:

  • Automated reporting via WhatsApp (Meta API), email, and Slack;
  • Crawlers for report optimization;
  • Automated data submission applications;
  • Data migration between UX platforms;
  • Automated communication using Meta solutions.

Big Data Streaming with Kafka:

  • Improved performance and fault tolerance, and reduced analysis time.

Data Engineer & AI Engineer

SevenX Gaming
Minas Gerais
01.2024 - 09.2024

Architected a robust AWS infrastructure using services such as IAM, EC2, Glue, Lake Formation, Athena, RDS, Redshift, IAM Identity Center, and CloudWatch, together with Python (Pandas, Polars, PySpark) and Java.

Designed scalable and maintainable data models to support business intelligence initiatives and reporting needs.

Achievements include:

  • Creation and management of Data Lakes and Data Warehouses (ETL/ELT);
  • Pipeline design for bronze, silver, and gold stages;
  • 47% cost reduction through SQL optimization, resource restructuring, and usage control

Key Automation Projects:

  • Automated reporting via WhatsApp (Meta API), email, and Slack;
  • Crawlers for report optimization;
  • Automated data submission applications;
  • Data migration between UX platforms;
  • Automated communication using Meta solutions.

Data Engineer

i-Cherry
Curitiba
09.2022 - 04.2024
  • Managed approximately 20 cloud computing projects for major companies such as Honda, Hering, SBT, Globo, TIM, and Ipiranga.
  • Developed data collection and processing pipelines for clients using Google Cloud Storage, Scheduler, Composer (Airflow), SQL, Python, PySpark, Functions, BigQuery, and messaging (Pub/Sub).
  • Handled data engineering, ETL of data from media platforms, CI/CD, Git, and client management tools on Google Cloud Platform (GCP).
  • Implemented data warehouses and data lakes.
  • Worked with tagging and analytics tools such as Google Analytics, Tag Manager, and Optimize, along with JavaScript.
  • Increased efficiency of data-driven decision making by creating user-friendly dashboards that enable quick access to key metrics.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.

Data Engineer Intern

i-Cherry
Curitiba
08.2021 - 09.2022
  • Developed data collection and processing pipelines for clients using Google Cloud Storage, Scheduler, Composer (Airflow), SQL, Python, PySpark, Functions, BigQuery, and messaging (Pub/Sub).
  • Handled data engineering, ETL of data from media platforms, CI/CD, Git, and client management tools on Google Cloud Platform (GCP).
  • Implemented data warehouses and data lakes.
  • Worked with tagging and analytics tools such as Google Analytics, Tag Manager, and Optimize, along with JavaScript.
  • Prioritized scalability in all developed solutions, anticipating future growth and accommodating it through modular design principles.
  • Managed approximately 4 cloud computing projects for major Brazilian companies such as TIM and Ipiranga.

Undergraduate Research Fellow (Iniciação Científica)

CEFET-MG - Centro Federal de Educação Tecnológica de Minas Gerais
Minas Gerais
08.2018 - 07.2020
  • Contributed to an ongoing research project by building data pipelines and analyzing microdata from socioeconomic surveys such as the Census, RAIS, and PNAD, in order to generate detailed information on the location and categorization of jobs in Belo Horizonte and the Metropolitan Region of Belo Horizonte (RMBH).
  • The ultimate goal is for this information, together with the pipelines and analyses, to help demonstrate that integrating accessibility into urban planning practices can make cities more sustainable.
  • Built the pipeline behind the analysis in Python, primarily using the Pandas library for data extraction, processing, and storage.
  • One of the biggest challenges was processing large volumes of data (Big Data) without access to scalable computing infrastructure, which meant running every operation locally.
  • To work around this limitation, we adopted a partitioned processing approach that handled the data efficiently without exceeding the available computing capacity.
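
As an illustration of the partitioned processing idea described above, the sketch below reads a file in fixed-size chunks and keeps only a running aggregate in memory. It is not the project's actual code: the column names, the sample values, and the helper names are hypothetical, and it uses only the standard library instead of Pandas (whose `read_csv` accepts a `chunksize` argument that plays a similar role).

```python
import csv
import io
from collections import Counter
from itertools import islice


def iter_chunks(reader, size):
    """Yield lists of at most `size` rows so only one chunk is held at a time."""
    while True:
        chunk = list(islice(reader, size))
        if not chunk:
            return
        yield chunk


def count_jobs_by_region(csv_file, chunk_size=2):
    """Aggregate job counts per region, processing one chunk at a time."""
    totals = Counter()
    reader = csv.DictReader(csv_file)
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            totals[row["region"]] += int(row["jobs"])
    return totals


# Tiny in-memory stand-in for a large microdata file (made-up values).
sample = io.StringIO("region,jobs\nA,3\nB,5\nA,2\nC,1\nB,4\n")
totals = count_jobs_by_region(sample)
```

Because each chunk is discarded after it is folded into `totals`, peak memory depends on the chunk size rather than on the size of the file.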

In the future, this information can be used as the basis for developing Machine Learning models with the following objectives:

  • Prediction of job demand;
  • Prediction of the unemployment rate;
  • Socioeconomic inequality model;
  • Job segmentation by socioeconomic characteristics;
  • Salary prediction by location and sector;
  • Analysis of regional inequality

Education

Master of Science - Computer Science

Universidade Federal De Minas Gerais
Minas Gerais
09.2024 - Current

Bachelor's degree - Computer Engineering

Centro Federal De Educação Tecnológica De Minas Gerais
Minas Gerais
01.2017 - 06.2023

Integrated High School with a Technical Degree - Technology in Computer Science

Escola Politécnica De Minas Gerais Polimig
Minas Gerais
01.2011 - 01.2013

Skills

Python

Data Pipelines

Machine Learning

Cloud Computing

SQL

Source and version control: Git, GitHub

Timeline

Tech Lead - Data Engineer & AI Engineer

SevenX Gaming
09.2024 - Current

Master of Science - Computer Science

Universidade Federal De Minas Gerais
09.2024 - Current

Data Engineer & AI Engineer

SevenX Gaming
01.2024 - 09.2024

Data Engineer

i-Cherry
09.2022 - 04.2024

Data Engineer Intern

i-Cherry
08.2021 - 09.2022

Undergraduate Research Fellow (Iniciação Científica)

CEFET-MG - Centro Federal de Educação Tecnológica de Minas Gerais
08.2018 - 07.2020

Bachelor's degree - Computer Engineering

Centro Federal De Educação Tecnológica De Minas Gerais
01.2017 - 06.2023

Integrated High School with a Technical Degree - Technology in Computer Science

Escola Politécnica De Minas Gerais Polimig
01.2011 - 01.2013