What is Terraform?

Terraform is an infrastructure-as-code tool, comparable to AWS CloudFormation, that allows the user to create, update, and version their Amazon Web Services (AWS) infrastructure.

Why Terraform?

Terraform uses the cloud provider’s APIs (application programming interfaces) to provision infrastructure, so it needs no authentication mechanism beyond the one the customer already uses with the cloud provider. This makes it one of the best options in terms of maintainability, security and ease of use.

The motivation behind this post is to illustrate an example of:

  1. Creating an AWS IAM role using Terraform.
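
Every IAM role needs a trust policy that states who is allowed to assume it. As a minimal sketch (not the code from the post itself), the Python snippet below builds such a policy document; in Terraform, JSON of this shape is what goes into the `assume_role_policy` argument of the `aws_iam_role` resource. The helper name `build_trust_policy` and the choice of EC2 as the trusted service are assumptions made purely for illustration.

```python
import json


def build_trust_policy(service: str) -> str:
    """Return an IAM trust policy (as JSON) that lets an AWS service assume a role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Assumption: the role is meant for EC2; swap in another service principal as needed.
trust_policy_json = build_trust_policy("ec2.amazonaws.com")
print(trust_policy_json)
```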


Photo by Quinten de Graaf on Unsplash

I was listing some of the AWS services that I hadn’t personally touched on, and that’s when I stumbled upon AWS Data Pipeline. So I’ll give a basic intro to what I have learned about this service, and also walk through a practical scenario that I worked on in order to understand its usage.

What is AWS Data Pipeline?

Extracted from the AWS docs itself:

AWS Data Pipeline is a web service that helps you to reliably process and offload data between different AWS Compute and Storage services, as well as on-premise data sources, at specified intervals.

In this particular article I’m…


In this blog post, I will walk through the steps for using the presigned URL feature to upload files into AWS S3. Serverless will be used to spin up the necessary AWS resources for this post.

Why do we need a Presigned URL in the first place?

A presigned URL is useful when a customer/user wants to upload a file into an S3 bucket that he/she doesn’t have access privileges for. The mechanism is a secure way of allowing users without their own AWS permissions to upload to or download from S3 for a limited time. This releases the burden from the user’s…
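
To make the idea concrete, here is a toy sketch of the principle behind a presigned URL: whoever holds the secret signs the HTTP method, the path and an expiry time, and the storage side later checks that signature instead of asking the caller for credentials. This is deliberately NOT the AWS Signature Version 4 algorithm that S3 really uses; the secret key, the host name and the function names are all made up for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"server-side-secret"  # hypothetical secret; never shared with the client


def _signature(method, path, expires):
    """Sign the method, path and expiry so none of them can be altered."""
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(method, path, ttl_seconds, now=None):
    """Build a URL that grants `method` access to `path` for `ttl_seconds`."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    query = urlencode(
        {"expires": expires, "signature": _signature(method, path, expires)}
    )
    return f"https://example-bucket.invalid{path}?{query}"


def verify(method, url, now=None):
    """Check the signature and the expiry of an incoming request URL."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    expected = _signature(method, parsed.path, expires)
    return hmac.compare_digest(signature, expected)


url = presign("PUT", "/uploads/report.csv", ttl_seconds=300, now=1_000)
print(verify("PUT", url, now=1_100))  # True: valid signature, still in the window
print(verify("GET", url, now=1_100))  # False: the URL was signed for PUT only
print(verify("PUT", url, now=2_000))  # False: the URL has expired
```

The fixed `now` values only keep the example deterministic; in real use the current time would be taken from the clock.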


Steps:

  1. Creating a job to submit as a step to the EMR cluster.

1. Creating a job to submit as a step to the EMR cluster

For this post, I’m going to create a simple Java program which copies a file from one S3 bucket into another. I’m using the aws-java-sdk in order to access the S3-related APIs.


What is Docker?

Docker is a tool created to assist both system administrators and developers, making it a component of many DevOps toolchains. Developers can focus on writing code without worrying about the system that it will eventually run on. Docker also packages, provisions and runs containers, which behave the same regardless of how the host operating system is set up.

What is a Docker container?

A container is a standardized unit that can be created on the fly in order to deploy a specific environment or even an application. A container basically wraps an application’s software into a self-contained box with everything the…


What is Spark?

Spark is a data processing engine that suits a wide range of situations. Data scientists and application developers integrate Spark into their own implementations in order to transform, analyze and query data at scale. The tasks most commonly associated with Spark include queries over huge data sets, machine learning problems and processing of streaming data from various sources.

What is PySpark?

PySpark is the interface that provides access to Spark from the Python programming language. In other words, PySpark is a Python API for Spark.

What is EMR?

Amazon Elastic MapReduce, also known as EMR…


INTRODUCTION

A data lake is a single platform that combines data governance, analytics and storage. It’s a secure, durable and centralized cloud-based storage platform that lets you ingest and store structured and unstructured data. It also allows us to make the necessary transformations on the raw data assets as needed. A comprehensive portfolio of data exploration, reporting, analytics, machine learning and visualization can be built on top of this data lake architecture.

DATA LAKE VS DATA WAREHOUSE

While a data warehouse can also be a large collection of data, it is highly organized and structured. In a data warehouse…


Introduction to Machine-Learning:

Machine learning is the practice of using existing algorithms to ingest data, learn from it, and then make a decision or forecast about something. So rather than developing software procedures with a fixed set of instructions to achieve a specific task, the machine is trained using huge amounts of data and algorithms that give it the capability to learn how to accomplish the task.

As an example, one type of algorithm is a classification algorithm. This basically sorts data into different classes, or rather segments. …
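
As a rough illustration of what a classification algorithm does, the sketch below implements a nearest-centroid classifier in plain Python: it averages the training points of each label, then assigns a new point to the label whose average is closest. This is just one simple classifier picked for brevity, and the data and the function names are made up for the example.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Average the training points of each label into a single centroid."""
    grouped = defaultdict(list)
    for point, label in zip(samples, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }


def predict(centroids, point):
    """Return the label whose centroid is nearest to the given point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Made-up training data: (height_cm, weight_kg) measurements with their labels.
samples = [(20, 4), (25, 5), (60, 30), (65, 35)]
labels = ["cat", "cat", "dog", "dog"]

centroids = fit_centroids(samples, labels)
print(predict(centroids, (22, 4.5)))  # cat
print(predict(centroids, (58, 28)))   # dog
```

No rule such as "anything under 10 kg is a cat" was written by hand; the boundary between the two classes comes entirely from the training data.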

Kulasangar Gowrisangar

Machine Learning has kept me thriving…https://about.me/kulasangar
