My journey to statistics and machine learning

First research projects

Classes préparatoires aux Grandes Écoles d’Ingénieurs

As someone with strong interests in science and mathematics, I moved from Lyon to Paris at 18 to pursue the French Classes Préparatoires aux Grandes Écoles d’Ingénieurs at Lycée Janson de Sailly, with a strong focus on mathematics, physics and computer science. There, I worked on my first research project, exploring chaos theory in mathematics and physics through the logistic map and the forced pendulum (presentation).
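To give a flavour of that first project, the logistic map is a one-line recurrence whose behaviour shifts from convergence to chaos as its parameter grows. Here is a minimal Python sketch written for this post, not the code from the original project:

```python
import numpy as np

def logistic_orbit(r, x0=0.2, n=100):
    """Iterate x_{n+1} = r * x_n * (1 - x_n) and return the trajectory."""
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return np.array(xs)

# r = 2.5 converges to the fixed point 1 - 1/r = 0.6,
# while r = 4.0 behaves chaotically.
print(logistic_orbit(2.5)[-3:])
print(logistic_orbit(4.0)[-3:])
```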

Science at École Centrale de Lyon

I then joined École Centrale de Lyon in 2019 and specialised in applied mathematics, numerical methods, statistics, probability and machine learning. At the same time, I followed core modules in science, such as quantum physics and chemistry, and in engineering, such as electrical engineering and fluid mechanics, which gave me a comprehensive understanding of various scientific fields. I have always been fascinated by modelling things to understand and explain how they work, whether in physics, chemistry or biology.

My very first machine learning project

During my first year at École Centrale de Lyon, I conducted my first research project, which consisted of designing a machine learning approach to a selection and ranking problem, supervised by Pr Christian de Peretti. We reproduced state-of-the-art algorithms from scratch instead of using libraries, including principal component analysis [1], self-organising maps [2] (also referred to as Kohonen networks) and a novel hierarchical clustering approach based on graph theory [3]. We then applied the end-to-end approach to financial data to build an efficient investment portfolio, a problem in which one first needs to pick a finite number of assets from a possibly (very) large set and then rank these assets to compute their respective weights. This is a good example of a setting where usual portfolio optimisation techniques cannot be applied, since estimating covariances in high dimensions is challenging, a difficulty referred to as the curse of dimensionality [4]. This research project was recognised by both academia, with our work being published in the university library [5], and industry, with our team ranking first in a national competition organised by Google and others. I would approach the problem in many different ways now that I have more background in the field, but I do believe it was a very good starting point!
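To illustrate what reproducing these algorithms from scratch looks like, principal component analysis reduces to an eigendecomposition of the sample covariance matrix. This is a minimal NumPy sketch for illustration, not the original project code:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via eigendecomposition."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # top components first
    return X_centered @ eigvecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
print(pca(X, n_components=2).shape)  # (200, 2)
```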

Studying machine learning

Statistics and Machine Learning at École Centrale de Lyon

My introduction to machine learning came through the Centrale Lyon Machine Learning module, taught by Pr Yohann de Castro and shared with École Normale Supérieure de Lyon. The lectures dived deep into fundamental concepts such as linear models, regularisation, decision trees, ensemble methods, dimensionality reduction and clustering. I really liked how each topic was first explained from a big-picture perspective and then explored in mathematical detail through tutorials, and I think this is what gave me a passion for understanding how machine learning algorithms work and how to reproduce them. Pr de Castro was also responsible for the core module in Probability and Statistics, and I will always remember the introductory example on estimating bus interarrival waiting times! It gave me a vision of statistics as a tool to model and extract information from collected data.
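I do not remember the exact numbers from that lecture, but the classic form of this example is the inspection paradox: if buses have exponential interarrival times with a mean of 10 minutes, a passenger arriving at a random time still waits 10 minutes on average, not 5, because random arrivals are more likely to fall into long gaps. A quick simulation, written for this post, makes the point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Buses arrive with exponential interarrival times (mean 10 minutes).
arrivals = np.cumsum(rng.exponential(scale=10.0, size=100_000))

# Passengers show up uniformly at random and wait for the next bus.
passengers = rng.uniform(0, arrivals[-1], size=50_000)
next_bus = arrivals[np.searchsorted(arrivals, passengers)]
print((next_bus - passengers).mean())  # close to 10, not 5
```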

MSc in Statistics at Imperial College London

I realised that statistics was a very broad field of mathematics and wanted to dedicate more time to studying it, so I pursued the MSc in Statistics at Imperial College London. I learned more about statistical inference, computational and applied statistics, machine learning and deep learning. Among my favourite modules were the Machine Learning course taught by Dr Sarah Filippi and the Deep Learning with TensorFlow module taught by Dr Kevin Webster. This last course really showed me how flexible and complex deep learning architectures can be, while at the same time explaining the details of basic tools like automatic gradient computation and backpropagation. It covered deep learning fundamentals as well as recent advanced topics, such as computer vision, sequential modelling for natural language processing, and generative approaches, with both their mathematical background and coding implementation.
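Automatic gradient computation is a good example of such a basic tool: TensorFlow records operations on a "tape" and backpropagates through them in reverse. A minimal sketch for illustration:

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x  # y = x^2 + 2x
# Backpropagation through the recorded operations: dy/dx = 2x + 2.
print(tape.gradient(y, x).numpy())  # 8.0 at x = 3
```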

Learning new concepts every day

I am fuelled by learning new things every day, so even after completing the MSc, I keep updating my knowledge in machine learning and deep learning by reading and reproducing scientific papers. I am interested in model explainability and the idea of building unified frameworks for explaining the output of machine learning algorithms, such as LIME [6], SHAP [7] or counterfactual explanations [8]. I am reading about novel Bayesian approaches to hyperparameter tuning using Gaussian processes, as implemented in scikit-optimize, which I find more elegant than classic grid-based cross-validation. I am curious about novel advances in natural language processing based on transformers and how they are used in large language models such as GPT [9] to generate text or BERT [10] to encode text. Novel advances in computer vision are also exciting, such as diffusion models [11] for image generation. In fact, I find it really fascinating how these fields are now overlapping, for example with the Stable Diffusion model relying on prompt conditioning via natural language processing, or with transformers being used directly for image classification in place of classic convolutional architectures [12]. Overall, I am aware that things move very fast, and assimilating them little by little helps me integrate them into my daily work at Amazon.
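As a concrete example of the Bayesian approach, scikit-optimize's gp_minimize fits a Gaussian process to past evaluations and uses it to choose the next hyperparameter to try. Here is a minimal sketch; the dataset, model and search space are only illustrative choices:

```python
from skopt import gp_minimize
from skopt.space import Real
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

def objective(params):
    (alpha,) = params
    model = Ridge(alpha=alpha)
    # Negative because gp_minimize minimises the objective.
    return -cross_val_score(model, X, y, cv=5, scoring="r2").mean()

result = gp_minimize(objective,
                     [Real(1e-4, 1e2, prior="log-uniform")],
                     n_calls=25, random_state=0)
print(result.x, result.fun)  # best alpha found and its score
```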

Master’s Thesis on a novel approach for changepoint detection

The changepoint detection problem

During the summer term at Imperial, I conducted my Master’s Thesis research project on online changepoint detection, supervised by Dr Dean Bodenham within the Statistics section of the Department of Mathematics. In a nutshell, changepoint detection refers to the problem of identifying the times at which the properties of a time series change. This well-known problem in statistical analysis has applications in a wide range of fields, such as health monitoring, cybersecurity, finance and climate science. Offline changepoint detection aims to detect changepoints retrospectively, once the whole series has been observed, whereas online changepoint detection must detect changepoints sequentially, as fast as possible after they occur. Some algorithms are parametric, in the sense that they make assumptions about the sampling distribution, whereas other methods are nonparametric.
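To make the setting concrete, here is a toy stream with a single change in mean. An offline method gets to analyse the full array at once (below, a crude scan over candidate splits, just to illustrate the idea), while an online method must process observations one at a time; a sequential detector is sketched further below.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 500  # true changepoint
stream = np.concatenate([rng.normal(0, 1, tau), rng.normal(2, 1, 500)])

# Offline: the full series is available, so we can scan every candidate
# split and pick the one maximising the gap between the two sample means.
splits = np.arange(50, len(stream) - 50)
gaps = [abs(stream[:s].mean() - stream[s:].mean()) for s in splits]
print("offline estimate:", splits[int(np.argmax(gaps))])
# An online detector instead receives stream[0], stream[1], ... one at a
# time and must raise an alarm as soon as possible after tau.
```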

Novel approach, package and publication

In this thesis, we review state-of-the-art parametric and nonparametric streaming changepoint detection algorithms [13-16], as well as a recent approach based on sequentially learning neural networks [17]. While some of the reviewed methods have extensions to multivariate data streams, many do not. We propose a novel approach [18] to detecting changepoints in multivariate data streams containing changes in sampling distribution, together with a default deep learning convolutional architecture that does not require any pre-training or fine-tuning. We also introduce a modified version of the EWMA algorithm [14] to be used alongside the neural network in order to detect changepoints in an online manner. Overall, our contribution shows that a multivariate changepoint detection problem can be summarised into a univariate bump detection problem. We compare the method with well-known approaches on synthetic datasets and also apply it to real-world data streams, where it successfully detects changepoints in both univariate and multivariate time series. I am currently writing a paper to summarise the theoretical foundations and the results of this new approach on images (computer vision) and text data (natural language processing), where classic methods would not be as efficient due to the complexity and dimensionality of the inputs.

At the same time, I developed and published an open-source Python package, ocpdet, to make these methods easily accessible and reproducible. Indeed, I believe software development is just as important as theory in research, as it allows more people to benefit from novelty. Developing such a package was a great experience in writing clean, high-quality code that can be understood, re-used and even improved by the community. The code is written in a scikit-learn API style and comes with documentation written in Jupyter Notebook format for simplicity.
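For intuition about the EWMA component, here is the textbook EWMA control chart [14] applied to a stream like the one simulated earlier. This is a simplified sketch written for this post, not the modified version from the thesis nor the ocpdet implementation:

```python
import numpy as np

def ewma_detector(stream, mu0, sigma0, lam=0.1, L=4.0):
    """Raise an alarm when the EWMA statistic leaves its control limits."""
    z = mu0
    for t, x in enumerate(stream, start=1):
        z = (1 - lam) * z + lam * x
        # Exact time-dependent standard deviation of the EWMA statistic.
        sd = sigma0 * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        if abs(z - mu0) > L * sd:
            return t  # first alarm time
    return None

rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, 500), rng.normal(2, 1, 500)])
print(ewma_detector(stream, mu0=0.0, sigma0=1.0))  # alarms shortly after t = 500
```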

Working as a scientist at Amazon

In November 2022, I joined Amazon in Edinburgh as a Data Scientist, my first job, working on machine learning and targeting for online advertising. I chose Edinburgh because it is known as one of the best places in Europe to work and do research in machine learning, with strong connections between academia (the University of Edinburgh) and AI/ML research labs and companies. At a high level, my work and research at Amazon focus on extracting useful contextual information and representations of users in order to predict their interests. I therefore work on a wide range of problems, such as representation learning, statistical modelling and experimentation. This is very challenging and exciting, since it requires innovative solutions that rely on a minimal amount of user data. I am learning a lot every day about representation learning architectures, language models and even how to productionise machine learning models to do real-time inference at Amazon scale.

References

[1] Karl Pearson. 1901. On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine.

[2] Teuvo Kohonen. 1982. Self-organized formation of topologically correct feature maps. Biological Cybernetics.

[3] Harald Lohre et al. 2020. Hierarchical Risk Parity: Accounting for Tail Dependencies in Multi-Asset Multi-Factor Allocations. Machine Learning and Asset Management.

[4] Richard E. Bellman. 2010. Dynamic Programming. Princeton Landmarks in Mathematics and Physics.

[5] Victor Khamesi et al. 2020. Quantitative Management of Fund of Funds using Machine Learning. École Centrale de Lyon.

[6] Marco Tulio Ribeiro et al. 2016. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. ACM KDD.

[7] Scott M. Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. NeurIPS.

[8] Riccardo Guidotti. 2022. Counterfactual explanations and how to find them: literature review and benchmarking. Data Mining and Knowledge Discovery.

[9] Alec Radford et al. 2019. Language Models are Unsupervised Multitask Learners. OpenAI.

[10] Jacob Devlin et al. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL.

[11] Robin Rombach et al. 2021. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv.

[12] Alexey Dosovitskiy et al. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR.

[13] Ewan Page. 1954. Continuous Inspection Schemes. Biometrika.

[14] S. W. Roberts. 1959. Control Chart Tests Based on Geometric Moving Averages. Technometrics.

[15] Ryan Adams and David MacKay. 2007. Bayesian Online Changepoint Detection. arXiv.

[16] Gordon Ross. 2015. Parametric and Nonparametric Sequential Change Detection in R: The cpm Package. Journal of Statistical Software.

[17] Mykhailo Hushchyn et al. 2020. Online Neural Networks for Change-Point Detection. arXiv.

[18] Victor Khamesi. 2022. ocpdet: A Python package for online changepoint detection in univariate and multivariate data. Zenodo.