Bigdata General

EEDC Seminars – Big and Open Data

Last Thursday (April 19th, 2012) the first day of the EEDC Seminars was held in Barcelona, as part of the Master in Computer Architecture, Networks and Systems (CANS-UPC). The first day focused on Big and Open Data; you can see the program here.
The seminar was divided into two presentations:

  •  Big Data by Marc de Palol (@lant)

Among the interesting things covered, he talked about the three main actions in Big Data: inserting, processing and serving/using the data. During the presentation Marc enumerated several free software projects; a small sample of the list: Hadoop, Hive, NoSQL databases, Memcached, Thrift, Mahout, …

David Sánchez explained that Open Data is the idea that certain data (such as scientific and government data) should be freely available for everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. He also explained that, in Spain, a business model based on Open Data analysis is still a challenge today. Here you can see a useful example of a Big Data application.

In a few days, the next EEDC Seminar will cover IaaS, SaaS and Mobile Apps. See you there!

General Social

Services following the Moon

We present a very interesting technique: the "follow the moon" concept.

The "follow the moon" concept means reducing energy consumption and expenditure by taking advantage of nighttime temperatures and lower electricity rates: computing resources chase the day/night boundary, i.e., workloads migrate to datacenters where it is currently night. After all, it is always night somewhere in the world.

These techniques have certain limitations, however: the datacenters involved need similar configurations and visibility of one another, and network latency increases. So the theory must be studied carefully, as many companies have done.

The key technologies that enable follow-the-moon are virtualization, modularization, consolidation and appropriate outsourcing. These are the key strategies for achieving eco-efficient IT, and cloud admins can make it happen.
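The scheduling step itself can be sketched in a few lines of shell. This is only an illustration: the region names are hypothetical, and the 22:00–06:00 "night" window is our own assumption.

```shell
# Pick the first candidate datacenter where it is currently night
# (hypothetical region names; night window assumed to be 22:00-06:00).
for entry in "us-west:America/Los_Angeles" "sa-east:America/Sao_Paulo" \
             "eu-central:Europe/Berlin" "ap-east:Asia/Tokyo"; do
    region=${entry%%:*}
    zone=${entry#*:}
    hour=$(TZ=$zone date +%H)   # local hour at that datacenter
    if [ "$hour" -ge 22 ] || [ "$hour" -lt 6 ]; then
        echo "night in $region ($zone, local hour $hour): migrate workload here"
        break
    fi
done
```

With candidate zones spread across enough time-zone offsets, at least one of them is always inside the night window, so the workload always has somewhere to go.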

Science fiction or reality?

In detail

Flexible RAM at no extra cost

A recipe for the Cloud Admin:
The other day at work we were compiling an application inside a virtual machine we had in the Cloud, and we found that the compilation was slow and failed due to lack of RAM.
So we put together a small hack to get a bit more RAM temporarily, since we did not like the option of resizing the machine because of its additional cost.
The solution is very basic: it hot-adds swap space in an easy, convenient, flexible way and, above all, at no additional cost.
We started with the following:

# free -m
             total       used       free     shared    buffers     cached
Mem:           486        125        360          0          1          6
-/+ buffers/cache:        118        368
Swap:            0          0          0
We create a 1 GB disk file inside our volume.
# dd if=/dev/zero of=disk.swap bs=1M count=1000

We format this file as swap.

# mkswap disk.swap
Setting up swapspace version 1, size = 1023996 KiB
no label, UUID=94ace6ea-2c95-4d47-8287-a2b062b4ce52
Then we activate the newly created swap:
# swapon disk.swap
Now we can work much more comfortably:
# free -m
             total       used       free     shared    buffers     cached
Mem:           486        479          6          0          0        350
-/+ buffers/cache:        128        357
Swap:          999          0        999

As you can see, the solution is very simple, and it reminds us that we sometimes overcomplicate problems that have easier solutions. Many thanks to Jummi for this idea; thanks to him we did not need to reboot our virtual machine to add RAM, or to create a duplicate machine just to compile the application.
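For repeated use, the same recipe can be wrapped in a small helper. This is only a sketch with two additions of our own: a 100 MiB size instead of the post's 1000 MiB (just to keep the example light), and a chmod, since the kernel warns about world-readable swap files. Activation still requires root.

```shell
# The recipe above wrapped as a small helper (sketch only).
PATH=$PATH:/sbin:/usr/sbin   # mkswap lives in /sbin on some systems
SWAPFILE=disk.swap
SIZE_MB=100                  # the post used 1000 MiB; smaller here

dd if=/dev/zero of="$SWAPFILE" bs=1M count="$SIZE_MB" 2>/dev/null
chmod 600 "$SWAPFILE"        # swap files should not be world-readable
mkswap "$SWAPFILE"

# Activation (and later removal) still require root:
echo "as root, run:   swapon $SWAPFILE"
echo "to undo later:  swapoff $SWAPFILE && rm $SWAPFILE"
```

To make the extra swap survive a reboot, the standard approach is to list the file (with its absolute path) in /etc/fstab with the options `none swap sw 0 0`.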

General Social

Traditional On-Premise vs Cloud Computing:

The business motivation for Cloud computing is that on-demand resources promise a more flexible, dynamic, timely and green solution than traditional on-premise computing.

Therefore, we must bear in mind that migrating certain parts, or all, of a classic on-premise IT infrastructure to the Cloud can provide scalability, reduce the cost of physical growth, cut overall costs and reduce energy use.

On-premise computing requires an initial capital investment, plus maintenance and the cost of future upgrades. In contrast, the Cloud requires no significant upfront spending: the initial investment is lower because the Cloud offers elasticity and a pay-as-you-go cost model.

It is interesting to find out which solution is better, but it is even more interesting to use both solutions together, keeping the best features of each.

The paper contains an interesting cost and performance analysis comparing traditional on-premise computing with Cloud computing, classifying the various costs as CapEx (CAPital EXpenditures) or OpEx (OPerational EXpenditures) depending on the attribute being analyzed (infrastructure, business, physical resources, network, performance, energy, budget, etc.). In short, Cloud computing leans towards OpEx, while traditional on-premise leans towards CapEx.
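As a toy illustration of that CapEx/OpEx balance (every figure below is hypothetical), the break-even point between the two models can be estimated by comparing cumulative spending:

```shell
# Toy CapEx/OpEx break-even estimate; all figures are hypothetical.
CAPEX=24000        # on-premise: hardware and installation, paid upfront
ONPREM_OPEX=200    # on-premise: monthly power, cooling and maintenance
CLOUD_OPEX=1200    # cloud: monthly pay-as-you-go bill, no upfront CapEx

# Spending is equal when CAPEX + ONPREM_OPEX*m = CLOUD_OPEX*m,
# i.e. after m = CAPEX / (CLOUD_OPEX - ONPREM_OPEX) months.
MONTHS=$(( CAPEX / (CLOUD_OPEX - ONPREM_OPEX) ))
echo "on-premise catches up with the cloud after about $MONTHS months"
```

Before the break-even month the cloud's pure-OpEx model is cheaper; after it, the amortized on-premise investment wins, provided the infrastructure actually stays busy.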

Nowadays on-premise infrastructure generally runs at low utilization, sometimes dropping to 5 to 10 percent on average. Data centers that use Cloud technologies are more efficient than traditional data centers: energy is usually wasted through server under-utilization, because most of the time those servers sit idle, whereas in a Cloud environment the system is managed to run at the highest efficiency. In addition, data center planning allows better power utilization. Traditional data centers can suffer cooling problems and run out of space for more servers. There is also a consortium of Cloud providers whose members commit to optimizing their data centers to minimize power consumption.

An on-premise solution can be better whenever we have constant, full utilization of the IT infrastructure. This often happens in large companies that offer services around the world around the clock. For example, Facebook initially used Amazon services, but eventually, due to its large growth in business, it built its own data centers, adapted to its needs.

Cloud solutions are highly recommended in most areas, but an important factor to consider is that network latency negatively affects the response time of Cloud solutions. Traditional on-premise computing usually enjoys better network latency and therefore better response times.

Many companies also prefer to use on-premise infrastructure for data privacy and protection reasons. In this project, however, we do not focus on Cloud security.

In conclusion, it is necessary to analyze the CapEx/OpEx balance and the energy consumption of each particular case. As we have said, our focus is energy consumption; in the next chapter we present our hybrid-architecture solution and show how it reduces energy consumption without losing too much performance.

Paper: Predicting the Energy, Cost and Performance of Cloud Computing

By Cloud Admin