What is Hadoop?
Following my high-level write-up of Hadoop and Big Data, this article
presents the components and projects that make up Hadoop, with a technical
description of each.
First, what is Hadoop?
Hadoop stores and processes large volumes of a wide variety of data that
changes rapidly. It analyses and summarizes the data, for example a city
census, web page analytics, threat analysis, risk models or network failures.
Hadoop is redundant, reliable and powerful, and it is focused on batch
processing.
Hadoop divides a large data processing job into many smaller tasks that can
be distributed across all the nodes.
Hadoop comprises two main components:
MapReduce: The tasks that analyse the data and summarize the results.
HDFS: The distributed file system, running on commodity server hardware, that
holds the data.
On each server there is a task tracker and a data nod... (more)
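The idea of dividing one large job into smaller map tasks and then summarizing the results can be sketched in a few lines of plain Python. This is only a single-process illustration, not the Hadoop API: the function names (`split_job`, `map_task`, `shuffle`, `reduce_task`, `word_count`) and the sample data are assumptions made for the example, and a real cluster would run the tasks on separate nodes against data held in HDFS.

```python
from collections import defaultdict
from itertools import chain


def split_job(records, num_tasks):
    """Divide one large input into num_tasks smaller tasks (hypothetical helper)."""
    return [records[i::num_tasks] for i in range(num_tasks)]


def map_task(lines):
    """Map step: emit a (word, 1) pair for every word in this task's lines."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce step: summarize all the values collected for one key."""
    return key, sum(values)


def word_count(records, num_tasks=3):
    """Run the whole job: split, map each task, shuffle, then reduce."""
    tasks = split_job(records, num_tasks)
    mapped = chain.from_iterable(map_task(task) for task in tasks)
    return dict(reduce_task(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data node", "task tracker data"]
    print(word_count(sample))
```

Each call to `map_task` depends only on its own slice of the input, which is what makes it possible to hand the tasks to different machines.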
I previously wrote a review of the Microsoft Azure public cloud that
included a comparison between Azure and AWS (Amazon Web Services). I will
now compare OpenStack and VMware vCloud. For a review of IaaS (Infrastructure
as a Service), see my blog post and video.
This table provides a simple, high-level comparison of OpenStack and
VMware vCloud:
Feature: Virtualization layer
OpenStack: Type 2 virtualization - Libvirt layered on top of Linux. Supports
various hypervisors: Xen, KVM, Hyper-V...
VMware vCloud: Type 1 virtualization - bare metal; vSphere hypervisor only.
Management Open API... (more)
Cloud computing is a general term for computing services delivered over the
Internet, as opposed to computing services hosted inside your own network, on
your own premises.
These computing services can be as simple as Internet-based email or as
complex as a Customer Relationship Management (CRM) application.
Cloud computing offers cost savings because users don't have to spend
capital budget on hardware and software, or pay the operating costs of
electric power, space and cooling for the hardware, or the employee costs of
maintaining the hardware and software.
The maj... (more)
In prior blog posts, I described Infrastructure as a Service (IaaS) and
Platform as a Service (PaaS).
If I use IaaS, I get servers onto which I can load software and applications,
which I then maintain, though I don't need to maintain the hardware. I can
customize the applications and software running on the servers at will. If I
use PaaS, I get a platform of ready-to-use web servers, application servers,
databases etc. I write my own software application and host it at the PaaS
provider. I maintain the software I write, but not the application servers,
databases or ha... (more)
Traditional IT environments built on physical servers can only scale and
grow by buying new hardware and software, then taking time to rack and
install the hardware and configure the software and the application. When
the excess capacity is not needed, the servers stand idle, consuming power,
cooling and rack space. This is inefficient and a waste of money.
Amazon Web Services (AWS) allows customers to scale elastically with demand.
Just as a rubber band stretches to accommodate more items, AWS
provides elastic computing to allow a customer to scale up (or down); to... (more)
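The contrast between a fixed fleet of physical servers and elastic capacity can be shown with some simple arithmetic. The sketch below is not an AWS API: the function names, the request rates and the per-server capacity are all hypothetical values chosen for illustration.

```python
import math


def instances_needed(requests_per_second, capacity_per_instance, min_instances=1):
    """Smallest number of servers that can carry the load (hypothetical sizing rule)."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    return max(min_instances, math.ceil(requests_per_second / capacity_per_instance))


def idle_instances(fixed_fleet_size, requests_per_second, capacity_per_instance):
    """Servers in a fixed, physically installed fleet that stand idle at this load."""
    needed = instances_needed(requests_per_second, capacity_per_instance)
    return max(0, fixed_fleet_size - needed)


if __name__ == "__main__":
    # Assumed demand over three periods; each server is assumed to handle 100 req/s.
    demand = [120, 900, 300]
    capacity = 100
    # A traditional fleet has to be sized for the peak period.
    peak_fleet = max(instances_needed(load, capacity) for load in demand)
    for load in demand:
        elastic = instances_needed(load, capacity)
        idle = idle_instances(peak_fleet, load, capacity)
        print(f"load={load}: elastic fleet={elastic}, idle in fixed fleet={idle}")
```

With these assumed numbers the fixed fleet must hold 9 servers to cover the peak, so 7 of them stand idle in the quietest period, while an elastic fleet would run only 2.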