Dr. Florian Hinzpeter
Collaborative and pragmatic data science consulting, from Big Data and Machine Learning to Responsible AI and beyond.
About me
Hi, I’m Florian! I’m a passionate and experienced data science consultant specializing in machine learning, cloud data platforms, big data technologies, and responsible AI. My mission is to help organizations leverage their data to make better decisions and drive growth.
Over the years, I’ve had the privilege of working on some truly exciting data projects in a variety of industries, from health insurance to automotive. Whether I’m building predictive models, designing data pipelines, or performing exploratory data analysis, I always bring my best to the table and strive to deliver exceptional results.

One of my heartfelt concerns is ensuring that the impact of machine learning on our society is sustainable and positive. That is why I’ve made it my mission to help companies create more reliable, responsible, and ethical machine learning systems. To achieve this, I’ve acquired extensive knowledge in bias & fairness in machine learning, explainable AI, and modern MLOps best practices. I’m truly excited about the cutting-edge technologies that help us make machine learning fair, transparent, and trustworthy. So if you’re looking to audit your machine learning models for discrimination, or want to gain a deeper understanding of how your system arrives at its decisions, I’m here to help! I am also highly experienced in assessing compliance with the upcoming AI Act of the European Union.
Before starting my career as a data science consultant in 2019, I worked as a researcher in theoretical physics at the Technical University of Munich, where I also earned my PhD in 2018. During my research I had the opportunity to study spatial aspects of biochemical reactions. The discoveries I made were truly fascinating and even led to a publication in Nature Physics.
My services
Data Science & Machine Learning
Custom development of data science solutions that use statistical modelling and machine learning to extract actionable insights from complex data and help you make data-driven decisions.
Cloud
Data Platform
Holistic design and implementation of modern data platforms that enable efficient data storage, processing and analysis. This forms the basis for the development of scalable data products.
Big Data
Engineering
Expert advice on your data infrastructure. From building data ingestion and ETL pipelines to ensuring data governance and quality solutions, I can help you orchestrate and manage your data to unlock its full value.
Responsible
Artificial Intelligence
Tailored advice on developing machine learning systems that are robust, transparent and free from discriminatory bias. My expertise ensures that your systems are compliant with the EU AI Act.
Portfolio
Project Description
In this project I leveraged Explainable AI techniques to provide model explanations to end users (internal staff). Local model explanations were visualized on a dashboard, allowing users to investigate the model’s decision making on individual data instances.
Industry
Automotive Industry
Project roles
Explainable AI Expert, Senior Data Scientist
Tasks & Technologies
For model explanations we leveraged the SHAP library together with its rich visualization options. For dashboard development we used Streamlit. Software development was done in Python, and deployment with Docker.
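A minimal sketch of this pattern, with a bundled scikit-learn dataset and a stand-in model in place of the client’s confidential assets:

```python
# Minimal sketch of a local-explanation view in Streamlit. The dataset
# and model are stand-ins; the client's assets are confidential.
import matplotlib.pyplot as plt
import shap
import streamlit as st
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

explainer = shap.Explainer(model)  # resolves to a TreeExplainer here
idx = st.number_input("Row to explain", min_value=0, max_value=len(X) - 1, value=0)

shap_values = explainer(X.iloc[[int(idx)]])  # local explanation for one row
fig = plt.figure()
shap.plots.waterfall(shap_values[0], show=False)
st.pyplot(fig)
```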
Project Description |
In this role, I helped a client build a complete data science platform on Microsoft Azure and Databricks.
Industry |
Automotive Industry |
Project roles |
Cloud Architect, Solution Architect |
Tasks & Technologies |
Deployment of an Azure Databricks data science platform including all required Azure resources (Azure Data Factory, Azure Data Lake, Azure Key Vault, Azure Databricks, MSSQL). Provisioning and management of Azure and Databricks resources via Infrastructure as Code (IaC) using Bicep templates, the Azure CLI, and the Databricks CLI, as well as implementation of a Continuous Deployment pipeline in Azure DevOps. Design of data ownership concepts and implementation of data permission groups using ACLs on Azure Data Lake Storage and Azure Active Directory (AAD). Synchronization of AAD and Databricks identity management via SCIM.
Project Description |
The aim of this project was to assess a company’s data science platform and workflow with respect to responsible AI. The assessment consisted of reviewing the company’s maturity along five dimensions: (1) fairness & bias of machine learning models, (2) transparency & trust of machine learning solutions, (3) technical reliability, (4) data governance and data quality, and (5) ethical and sustainable AI.
Industry |
Wealth Management |
Project roles |
Explainable & Responsible AI Expert |
Tasks & Technologies |
Creation of a self-assessment framework including a survey to determine the maturity level in the area of Responsible AI, tailored to the requirements of the EU AI Act. |
Project Description |
The goal of this project was to integrate on-premises data into the Azure cloud using the Lakehouse paradigm.
Industry |
Automotive Industry |
Project roles |
Explainable AI Expert, Senior Data Scientist |
Tasks & Technologies |
Integration of on-premises data into a cloud-hosted data lake with ETL and ELT using Azure Data Factory and Databricks. Implementation of data pre-processing and aggregation pipelines following the Lakehouse architecture, using the Spark and Photon engines along with Spark SQL and PySpark. Data governance implemented with the Delta Lake engine and the Hive Metastore.
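A minimal sketch of one bronze-to-silver refinement step under this architecture; table and column names are illustrative, and `spark` is the session that Databricks provides:

```python
# Sketch of a bronze-to-silver step; runs on Databricks, where `spark`
# is predefined. Table and column names are illustrative only.
from pyspark.sql import functions as F

# Bronze: raw on-premises extracts landed by Azure Data Factory.
bronze = spark.read.table("bronze.vehicle_telemetry")

# Silver: deduplicated, typed, quality-checked records.
silver = (
    bronze.dropDuplicates(["record_id"])
    .withColumn("event_date", F.to_date("event_ts"))
    .filter(F.col("event_date").isNotNull())
)

silver.write.format("delta").mode("overwrite").saveAsTable("silver.vehicle_telemetry")
```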
Project Description |
In this project, I developed the data science and software components for a machine learning system to determine recommendations for health insurance products for policyholders. |
Industry |
Health Insurance |
Project roles |
Lead Software Developer & Senior Data Scientist |
Tasks & Technologies |
Software development with Python. The suitability of a product for an insured person was determined using probabilistic graphical models (Bayesian networks). The software was unit- and integration-tested with Pytest, GitLab was used for version control and CI/CD pipelines, and Poetry for dependency management and virtual environment organisation.
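The project’s network structure and tooling are client-specific; purely as an illustration of the inference pattern, here is a toy Bayesian network built with pgmpy (the library choice and all variables are assumptions for this sketch):

```python
# Toy Bayesian network for product suitability. The structure, variables,
# and the pgmpy library are illustrative assumptions, not the client's system.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Age group and existing dental coverage influence product suitability.
model = BayesianNetwork([("age_group", "suitable"), ("has_dental", "suitable")])

cpd_age = TabularCPD("age_group", 2, [[0.6], [0.4]])
cpd_dental = TabularCPD("has_dental", 2, [[0.7], [0.3]])
cpd_suitable = TabularCPD(
    "suitable", 2,
    [[0.9, 0.6, 0.5, 0.2],   # P(suitable=0 | age_group, has_dental)
     [0.1, 0.4, 0.5, 0.8]],  # P(suitable=1 | age_group, has_dental)
    evidence=["age_group", "has_dental"], evidence_card=[2, 2],
)
model.add_cpds(cpd_age, cpd_dental, cpd_suitable)

# Query the suitability of a product for one policyholder.
posterior = VariableElimination(model).query(
    ["suitable"], evidence={"age_group": 1, "has_dental": 0}
)
print(posterior)
```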
Project Description |
The aim of this project was to predict the length of stay of insured patients in hospital using machine learning methods. With accurate predictions, the insurance company was able to control the length of stay in a more targeted way. |
Industry |
Health Insurance |
Project roles |
Lead Data Scientist, Senior Software Developer |
Tasks & Technologies |
Software development in Python, feature engineering and machine learning pipeline assembly with scikit-learn, gradient boosted tree regression with CatBoost, oversampling of minority samples (SMOTE-NC) with imbalanced-learn, code version control with GitLab, dependency management and virtual environment organisation with Poetry.
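Since SMOTE-NC requires a discrete target, the sketch below assumes the length of stay was bucketed into classes for the oversampling step; the data, features, and cut-offs are synthetic:

```python
# Sketch of the SMOTE-NC oversampling step with synthetic data. The
# bucketing of length of stay into classes is an assumption for this demo.
import numpy as np
from imblearn.over_sampling import SMOTENC

rng = np.random.default_rng(0)
ward = rng.integers(0, 3, size=500)             # categorical feature (column 0)
age = rng.normal(55, 15, size=500)              # numeric feature
X = np.column_stack([ward, age])
los_days = rng.exponential(scale=5, size=500)   # length of stay in days
los_bucket = np.digitize(los_days, [3, 7, 14])  # long stays are rare classes

# Oversample the rare (long-stay) buckets; column 0 is categorical.
smote = SMOTENC(categorical_features=[0], random_state=42)
X_res, y_res = smote.fit_resample(X, los_bucket)
print(np.bincount(los_bucket), "->", np.bincount(y_res))
```

The regression itself was then fitted with `catboost.CatBoostRegressor` on the engineered features within the scikit-learn pipeline.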
Project Description |
The aim of this project was to modernise the methodology of estimating used car prices. To do this, large amounts of data from real transactions and online exchanges were used to train a machine learning system. The machine learning system was then used to assist the valuation experts. |
Industry |
Automotive Industry |
Project roles |
Lead Data Scientist, Machine Learning Engineer |
Tasks & Technologies |
Software development in Python, gradient boosted tree regression with CatBoost, API design and implementation with FastAPI, data preparation with Spark, model versioning and experiment tracking with MLflow; Databricks was used as the data platform.
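A minimal sketch of the serving layer: a FastAPI endpoint wrapping a model pulled from the MLflow registry. The model name, stage, and request fields are placeholders, not the production schema:

```python
# Sketch of a FastAPI prediction endpoint backed by the MLflow registry.
# "used-car-price", the stage, and the feature fields are placeholders.
import mlflow.pyfunc
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = mlflow.pyfunc.load_model("models:/used-car-price/Production")

class Car(BaseModel):
    make: str
    model_name: str
    mileage_km: float
    first_registration_year: int

@app.post("/predict")
def predict(car: Car) -> dict:
    features = pd.DataFrame([car.dict()])
    price = float(model.predict(features)[0])
    return {"estimated_price_eur": price}
```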
Project Description |
In this project I leveraged topic modeling approaches (Latent Dirichlet Allocation) to extract generic car accident scenarios using car repair data that contained information about repaired or replaced parts and working positions. |
Industry |
Automotive Industry |
Project roles |
Lead Data Scientist, Senior Software Developer |
Tasks & Technologies |
Software development in Python, training of Latent Dirichlet Allocation models with SparkML, deployment with Docker, model versioning and orchestration with MLflow.
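A minimal, self-contained sketch of the approach with toy repair records; in the project, each LDA topic corresponded to a generic accident scenario:

```python
# Sketch of LDA topic modeling on repair records with SparkML.
# The records and topic count below are toy examples.
from pyspark.sql import SparkSession
from pyspark.ml.feature import CountVectorizer
from pyspark.ml.clustering import LDA

spark = SparkSession.builder.getOrCreate()

# Each row lists the repaired/replaced parts and labor items of one repair.
repairs = spark.createDataFrame(
    [(["front_bumper", "headlight_left", "paint_labor"],),
     (["rear_bumper", "tailgate", "paint_labor"],),
     (["front_bumper", "radiator", "headlight_right"],)],
    ["parts"],
)

cv_model = CountVectorizer(inputCol="parts", outputCol="features").fit(repairs)
vectors = cv_model.transform(repairs)

# Each LDA topic is interpreted as a generic accident scenario.
lda_model = LDA(k=2, maxIter=20, featuresCol="features").fit(vectors)
lda_model.describeTopics(maxTermsPerTopic=5).show(truncate=False)
```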
Project Description |
The goal of this project was to design and provision a generic pipeline that automates the process of continuous integration and continuous deployment (CI/CD). |
Industry |
Health Insurance
Project roles |
DevOps Expert, Software Developer |
Tasks & Technologies |
The CI/CD pipeline was implemented in GitLab using GitLab Runners. For continuous integration, unit and integration tests, linting, and Docker linting were implemented; for continuous deployment, load, acceptance, and performance tests.
Project Description |
The goal of this project was to use natural language processing techniques to classify emails into different categories of related content. |
Industry |
Automotive Industry |
Project roles |
Data Scientist |
Tasks & Technologies |
For email preprocessing and tokenization we used the NLTK Python library; for modelling we used a support vector machine from scikit-learn.
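A minimal sketch of such a pipeline; the TF-IDF vectorizer is an assumption, since the project only specifies NLTK tokenization and a scikit-learn SVM:

```python
# Sketch of the classification pipeline: NLTK tokenization feeding a
# TF-IDF + linear SVM. TF-IDF is an assumed vectorizer choice.
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

nltk.download("punkt", quiet=True)      # tokenizer models
nltk.download("punkt_tab", quiet=True)  # required by newer NLTK releases

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(tokenizer=nltk.word_tokenize, lowercase=True)),
    ("svm", LinearSVC()),
])
# pipeline.fit(train_emails, train_labels)
# pipeline.predict(["Hello, I would like to reschedule my service appointment."])
```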
Project Description |
This workshop consisted of three one-hour sessions. The goal of this workshop was to train data science consultants in the area of auditing and mitigating discriminatory bias in machine learning models. Each workshop session consisted of a theoretical part and a hands-on coding part. |
Industry |
Consulting |
Project roles |
Explainable & Responsible AI Expert |
Tasks & Technologies |
In this workshop we presented different bias metrics and ways to quantify discriminatory behavior, as well as various approaches to mitigating those biases. For the coding demos we used the Python packages AI Fairness 360 and Aequitas.
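A minimal sketch of a group-fairness check with AI Fairness 360, in the spirit of the workshop demos; the data and protected attribute are toy examples:

```python
# Sketch of a group-fairness check with AI Fairness 360 (aif360).
# The six-row dataset and the protected attribute are toy examples.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

df = pd.DataFrame({
    "sex":   [0, 0, 0, 1, 1, 1],   # protected attribute (0 = unprivileged)
    "score": [1, 0, 0, 1, 1, 0],   # model decision / label
})
dataset = BinaryLabelDataset(
    df=df, label_names=["score"], protected_attribute_names=["sex"]
)
metric = BinaryLabelDatasetMetric(
    dataset, unprivileged_groups=[{"sex": 0}], privileged_groups=[{"sex": 1}]
)
print("Statistical parity difference:", metric.statistical_parity_difference())
print("Disparate impact:", metric.disparate_impact())
```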
Project Description |
This project consisted of introducing MLOps best practices such as code, data, and model versioning, as well as automated deployment pipelines.
Industry |
Automotive Industry |
Project roles |
Machine Learning Engineer |
Tasks & Technologies |
For code versioning we used Git as a software development best practice; for model versioning we used MLflow and its model registry functionality; for data versioning we used Delta Lake. The deployment pipelines were built using Azure DevOps, and model containerization was done with Docker.
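A minimal sketch of the versioning pattern; names are illustrative, and registering a model assumes an MLflow tracking backend with model-registry support:

```python
# Sketch of model and data versioning. Assumes a registry-enabled MLflow
# tracking backend; model and table names are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.dummy import DummyRegressor

# Log a stand-in model (the project logged real training runs).
with mlflow.start_run() as run:
    model = DummyRegressor().fit([[0.0], [1.0]], [0.0, 1.0])
    mlflow.sklearn.log_model(model, artifact_path="model")

# Model versioning: promote the logged artifact into the registry.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "demo-model")

# Data versioning: Delta Lake time travel (on Spark/Databricks), e.g.
# spark.read.format("delta").option("versionAsOf", 12).table("silver.training_data")
```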
Certifications
Databricks Certified Associate Developer for Apache Spark 3.0
Databricks
Issued Feb 2023
CI/CD YAML Pipelines with Azure DevOps
Udemy
Issued Jan 2023
Databricks Certified Data Engineer Associate
Databricks
Issued Jan 2023
Databricks Certified Machine Learning Associate
Databricks
Issued Dec 2022
Deployment of Machine Learning Models
Udemy
Issued Jan 2022
PyTorch for Deep Learning and Computer Vision
Udemy
Issued Mar 2021
Build Better Generative Adversarial Networks (GANs)
deeplearning.ai
Issued Dec 2020
Fundamentals of Reinforcement Learning
University of Alberta
Issued Oct 2020
Build Basic Generative Adversarial Networks (GANs)
deeplearning.ai
Issued Nov 2020
Applied Plotting, Charting & Data Representation in Python
University of Michigan
Issued Apr 2020
Introduction to Big Data
UC San Diego
Issued Mar 2020
Big Data Modeling and Management Systems
UC San Diego
Issued Mar 2020
Version Control with Git
Atlassian
Issued Feb 2020
Applied Machine Learning in Python
University of Michigan
Issued Feb 2020
Introduction to Data Science in Python
University of Michigan
Issued Nov 2019
SQL Bootcamp
Udemy
Issued Oct 2019
Deep Learning Specialization
deeplearning.ai
Issued Jul 2019
Machine Learning Specialization
deeplearning.ai
Issued Apr 2019