José Pena

Senior Data Scientist, LLM Engineer
São Paulo

Summary

Senior Data Scientist currently working at Epharma. I use my background as a mathematician to build Machine Learning and AI models grounded in business needs in order to generate insights, optimize processes across the company and create innovative solutions to problems, boosting performance KPIs and fostering a data-driven culture across organizations. Experienced in statistical analysis, data preparation, Machine Learning and Deep Learning techniques, and data visualization. Deeply curious and keen to learn, I have a passion for Mathematics and the sciences in general, and I am always eager to find answers to the questions I am asked.

Overview

4 years of professional experience
10 years of post-secondary education

Work History

Senior Data Scientist

Epharma
05.2024 - Current

I hold a leadership role in the following three projects:

  • Development of a Generative AI solution (using OpenAI's ChatGPT and Microsoft Visual Studio APIs) that reads and interprets handwritten or digital medicine prescriptions in real time through an API in Epharma's software, aiming to reduce the risk of fraud and make discount authorization more precise and secure (Current status: deployment in production, approved by business stakeholders).
  • Development of a Generative AI solution that detects adverse events related to medicine use in chatbot conversations with clients and reports them to the respective stakeholder team for evaluation (Current status: data preparation; Next: modelling using prompt engineering or fine-tuning of LLMs).
  • Segmentation of patients who use a certain drug, based on usage profile and medical features, aiming to increase customer retention and inform supply estimates (Current status: under approval by stakeholders).

In general, I am responsible for:

  • Optimization and production deployment of ML models that feed Epharma software present in 40,000+ pharmacies across Brazil. The software provides discounts on medicines to build patient loyalty, impacting the lives and health of millions of people.
  • Technical mentoring of junior Data Scientists through their projects, promoting professional growth and skill development across Epharma's DS team.
  • Building a data-oriented culture at Epharma by presenting complex findings to non-technical stakeholders through clear visualizations and concise reports.

Data Scientist

Mastercard
06.2022 - 04.2024

As a Data Scientist at Mastercard, I worked as a technical leader on a range of internal and external projects that improved the user experience of 10+ million clients across several large banks in Brazil and improved KPIs such as churn rate (-10% in one bank) and card activation rate (+8%). These projects included:

  • Development of supervised Machine Learning and Deep Learning models for classification (e.g. credit risk), regression (e.g. the credit limit a bank should grant a customer) and time series modelling.
  • Automation and production deployment of these models (in AWS and GCP environments) in order to reduce the risk of unexpected errors.
  • Development of unsupervised ML models using K-means and Expectation-Maximization algorithms (e.g. clustering of customers based on their spending behavior), aiming to personalize CRM initiatives.
  • Data analysis to extract insights, leading to recommended personalized actions (e.g. marketing efforts, credit card limit policy adjustments) aimed at enhancing customers’ experience and improving clients’ KPIs.

Data Scientist

HDI Seguros
05.2020 - 05.2022

As part of the Analytics team of an insurance company, I have:

  • Developed prediction models for eventualities such as floods and thefts,
  • Performed A/B tests to assess models’ efficiency and
  • Built data visualization tools (e.g., KPIs dashboards, maps) to present results from models developed in the Analytics department

leading to improved risk identification and consequently increasing efficiency of the underwriting processes of the company.

Education

Online Certificate - Statistics And Data Science

Udemy
Online
05.2001 -

Ph.D. - Mathematics

University of Campinas
Campinas, Brazil
03.2016 - 03.2020

Master of Science - Mathematics

University of Campinas
Campinas, Brazil
03.2014 - 03.2016

Bachelor of Science - Mathematics

University of Campinas
Campinas, Brazil
03.2010 - 12.2013

Skills

Generative AI (LLMs, LangChain, Hugging Face, RAG, ChatGPT, Llama, Gemini, API calls, Fine-Tuning)

Deep Learning (Neural Networks, TensorFlow, PyTorch)

Machine Learning Techniques and Algorithms (Supervised Learning: XGBoost, Random Forest, CatBoost, SVM, Multiple Linear Regression; Unsupervised Learning: K-means, Expectation-Maximization; Scikit-Learn)

Optimization (Gradient Descent, Simulated Annealing)

Data Visualization

Agile Methodologies

Software

Python, PySpark, C, R (scripting languages)

Scikit-Learn (Python library for ML)

TensorFlow, Keras, PyTorch (Python libraries for Deep Learning)

SQL

AWS (Amazon Web Services)

GCP (Google Cloud Platform)

Microsoft Office (Excel, Power BI)

SAS

Databricks

Azure

Hadoop

Languages

English (fluent)
Italian (fluent)
Portuguese (mother tongue)

Timeline

Senior Data Scientist

Epharma
05.2024 - Current

Data Scientist

HDI Seguros
05.2020 - 05.2022

Ph.D. - Mathematics

University of Campinas
03.2016 - 03.2020

Master of Science - Mathematics

University of Campinas
03.2014 - 03.2016

Bachelor of Science - Mathematics

University of Campinas
03.2010 - 12.2013

Online Certificate - Statistics And Data Science

Udemy
05.2001 -

Data Scientist

Mastercard
06.2022 - 04.2024