Take your Spark Big Data workloads to the next level by leveraging the power and flexibility of Kubernetes.

In this article, we are going to learn why it may be relevant to use Spark on Kubernetes and how to do it.

For the sake of our demo, we will use a Minikube cluster to run a basic Spark-Pi job.

Why use Spark on Kubernetes?

Apache Spark is a framework that can quickly perform processing tasks on very large data sets, and Kubernetes is a portable, extensible, open-source platform for managing and orchestrating the execution of containerized workloads and services across a cluster of multiple…

In this article, I share my experience with the AWS Machine Learning Specialty exam (which I cleared with 95%).

Why should you take it?

Take this exam if you want to boost your Data Science / Machine Learning Engineering career within the world of AWS or any other cloud provider. The high-level concepts learned while preparing for the certification can easily be applied to other cloud providers as well.


Since its first 1.x release, Spark has become the de facto unified Big Data processing engine. 9 out of 10 companies chose Spark for their data processing thanks to its speed, ease of use, modularity and extensibility. The aim of this series of articles is to:

  • Introduce you to the evolution of Big Data and how Spark came into existence.
  • Provide a high-level overview of all the components of Spark’s distributed architecture.
  • Provide best practices for optimizing Spark job code.
  • Provide best practices for tuning, debugging and inspecting Spark applications.

We assume that you are already familiar with Spark Structured High Level…

The Coronavirus crisis has been an unprecedented experience for all of us, one that will leave a lasting mark. For two months, we gave up the most fundamental part of our way of life: our freedom of movement. We gave it up and confined ourselves at home, against our will. No more outings with friends. No more family gatherings. No more walks. No more parties. No more cafés or restaurants. No more vacations.

Fortunately for us, the virus is gradually disappearing. Its mutations are making it less and less dangerous. The number of new…

I am currently taking the Deep Learning Specialization, which I highly recommend to anyone looking for a great place to start with deep learning. The Specialization is offered on Coursera, and I have just finished its third course: Structuring Machine Learning Projects.

The course explains how to be systematic when it comes to thinking about Deep Learning projects and gives you an array of tools that will help you make the right decisions and move forward with your machine learning project.

Most of the ideas of this course are not included in university deep learning…


In this tutorial, you will learn how to build your own Meme Detector from scratch like this one. By the end of the tutorial, you will be able to:

  • Create your own dataset using Google Images.
  • Choose and retrain the dense layers of a CNN architecture.
  • Deploy your model as a web app to

Libraries and major dependencies

  1. Google Colab for GPU usage
  2. Fastai v1.0.52
  3. PyTorch v1

Fastai is an amazing library built on top of PyTorch that makes deep learning more intuitive and requires fewer lines of code. …


I’m currently taking a deep learning course. To apply the concepts I learned in the first lesson, I used a really small dataset created during a Machine Learning Lab at ENSEEIHT to do image classification.

The dataset consists of 264 photos of hands making “Paper”, “Rock” and “Scissors” gestures.

The photos were taken with a smartphone by a group of 9 students (I was part of the dataset creation process) pursuing the ENSEEIHT (Toulouse) MS in Data Science. …

Disclaimer: I am new to machine learning and also to blogging (this is my first post). So, if there are any mistakes, please do let me know. All feedback is appreciated.

If you go to Google and type “Best online Machine Learning classes”, chances are that Andrew Ng’s Machine Learning class, offered on Coursera, will be at the top of most of the rankings on the Internet. With more than 2.5 …

Chouaib Nemri

Data and AI specialist. Data Engineer @ Devoteam. Data Science & Machine Learning Instructor @ OpenClassrooms
