Push Spark DataFrames to an ElasticSearch index
Introduction
The aim of this short article is to show you how to migrate data from any source that Spark can connect to (HDFS, for instance) into an ElasticSearch index, using the elasticsearch-hadoop connector. We will start with a short presentation of ElasticSearch and Spark and the possible use cases of these two well-known data engineering tools, and then move on to the demo.
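To make the goal concrete, here is a minimal PySpark sketch of the kind of write described above. It is an illustration under assumptions, not the exact code of the demo: the helper names `es_write_options` and `write_to_es` are hypothetical, and it assumes the elasticsearch-hadoop connector jar is on the Spark classpath and that the cluster is reachable on the host and port you pass in. The `org.elasticsearch.spark.sql` format and the `es.nodes`, `es.port` and `es.mapping.id` settings come from the connector.

```python
def es_write_options(nodes, port=9200, id_column=None):
    """Build elasticsearch-hadoop connector settings for a DataFrame write.

    Hypothetical helper: only the setting names come from the connector.
    """
    options = {
        "es.nodes": nodes,      # comma-separated ElasticSearch host(s)
        "es.port": str(port),   # REST port; 9200 is the ElasticSearch default
    }
    if id_column is not None:
        # Use the values of this DataFrame column as the document _id
        options["es.mapping.id"] = id_column
    return options


def write_to_es(df, index, options, mode="append"):
    """Write a Spark DataFrame to an ElasticSearch index via the connector."""
    (
        df.write.format("org.elasticsearch.spark.sql")
        .options(**options)
        .mode(mode)
        .save(index)
    )
```

With a running `SparkSession` named `spark`, a call could look like `write_to_es(spark.read.parquet("hdfs:///path/to/data"), "my-index", es_write_options("localhost"))`, where the path and index name are placeholders. Keeping the settings in a plain dictionary makes them easy to inspect or override before the write is triggered.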
What is ElasticSearch?
ElasticSearch is a distributed search and analytics engine. It provides near real-time search and analytics for all types of data (structured, unstructured, numeric, geospatial and text). ElasticSearch stores and indexes your data so that retrieval and aggregation queries stay fast. Because it is distributed, it scales horizontally by adding nodes to the cluster, in addition to scaling vertically by giving each node more resources.
ElasticSearch is at the heart of the Elastic Stack, which also includes Kibana, Logstash and Beats. While Logstash and Beats facilitate collecting, aggregating, and enriching your data and storing it in ElasticSearch, Kibana enables you to interactively explore, visualize…