
Push Spark DataFrames to an Elasticsearch index

Chouaieb Nemri
4 min read · Sep 7, 2021

Introduction

This short article shows how to migrate data from a source that Spark can connect to (HDFS, for instance) into an Elasticsearch index using the Elasticsearch for Apache Hadoop connector (ES-Hadoop, published as elasticsearch-hadoop). We will start with a short presentation of Elasticsearch and Spark and the typical use cases of these two popular data engineering tools. Then we will move on to the demo.
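The migration described above can be sketched in PySpark. This is a minimal sketch, not the article's code. It assumes Spark 3.x with the ES-Hadoop Spark connector (`elasticsearch-spark`) on the classpath. The HDFS path, host, index name, and the helper names `es_write_options`, `push_dataframe` and `run_job` are hypothetical placeholders. The `es.*` keys are standard ES-Hadoop settings, and `org.elasticsearch.spark.sql` is the connector's Spark SQL data source name.

```python
# Minimal sketch (assumption, not the article's code): copy data Spark can read
# (here Parquet on HDFS) into an Elasticsearch index through the ES-Hadoop connector.
# Assumes the connector is on the classpath, e.g. for Spark 3.x / Scala 2.12:
#   spark-submit --packages org.elasticsearch:elasticsearch-spark-30_2.12:<version> job.py


def es_write_options(nodes, port=9200, index="my-index", id_field=None, wan_only=False):
    """Build ES-Hadoop settings for writing a DataFrame (hypothetical helper)."""
    options = {
        "es.nodes": nodes,                           # comma-separated Elasticsearch hosts
        "es.port": str(port),                        # REST port, 9200 by default
        "es.resource": index,                        # target index
        "es.nodes.wan.only": str(wan_only).lower(),  # "true" disables node discovery (cloud, Docker)
    }
    if id_field is not None:
        options["es.mapping.id"] = id_field          # use this column as the document _id
    return options


def push_dataframe(df, options, mode="append"):
    """Write a Spark DataFrame to Elasticsearch via the connector's Spark SQL data source."""
    (df.write
        .format("org.elasticsearch.spark.sql")
        .options(**options)
        .mode(mode)
        .save())


def run_job(spark, source_path, options):
    """Read Parquet files from a (hypothetical) HDFS path and index them into Elasticsearch."""
    df = spark.read.parquet(source_path)
    push_dataframe(df, options)
```

Inside a job launched with `spark-submit`, you would call something like `run_job(SparkSession.builder.getOrCreate(), "hdfs:///data/events", es_write_options("es-host", index="events", id_field="event_id"))`. Setting `es.mapping.id` makes a rerun overwrite documents that have the same ID instead of creating duplicates.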

What is Elasticsearch?

Elasticsearch is a distributed search and analytics engine. It provides near real-time search and analytics for all types of data: structured, unstructured, numeric, geospatial, and text. Elasticsearch stores and indexes your data so that retrieval and aggregation queries run efficiently. Because it is distributed, it scales horizontally by adding nodes to a cluster. It can also be scaled vertically by giving existing nodes more resources.
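To make "store and index" concrete: documents typically reach an index through Elasticsearch's REST API. Batches go through the `_bulk` endpoint, which takes newline-delimited JSON made of an action line followed by the document source. The elasticsearch-hadoop connector also writes through bulk requests. The sketch below only builds such a request body with the Python standard library and does not contact a cluster. The index name `places`, the sample document, and the helper `to_bulk_ndjson` are illustrative assumptions.

```python
import json


def to_bulk_ndjson(index, docs, id_field=None):
    """Serialize documents into an Elasticsearch _bulk request body (hypothetical helper).

    Each document produces two lines: an "index" action with metadata, then the source.
    """
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if id_field is not None:
            meta["_id"] = str(doc[id_field])  # reuse a field as the document ID
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    # The bulk API expects every line, including the last, to end with "\n"
    return "".join(line + "\n" for line in lines)


# Illustrative document mixing text, numeric, and geospatial fields
# ("location" is treated as a geo_point only if the index mapping declares it so)
body = to_bulk_ndjson(
    "places",
    [{"id": 1, "name": "Sample cafe", "rating": 4.5, "location": {"lat": 36.8, "lon": 10.18}}],
    id_field="id",
)
print(body, end="")
```

The resulting string would be sent as `POST /_bulk` with the header `Content-Type: application/x-ndjson`.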

Elasticsearch is at the heart of the Elastic Stack, which also includes Kibana, Logstash, and Beats. Logstash and Beats collect, aggregate, and enrich your data and store it in Elasticsearch, while Kibana enables you to interactively explore, visualize…
