Hadoop MapReduce Demo

Versions:
  • Hadoop 3.1.1 
  • Java 10
Set the following environment variables:
  • JAVA_HOME 
  • HADOOP_HOME

For Windows

Download the Hadoop 3.1.1 binaries for Windows from https://github.com/s911415/apache-hadoop-3.1.0-winutils. Extract them into HADOOP_HOME\bin and make sure to override the existing files.

For Ubuntu

Set up passwordless SSH to localhost (the Hadoop start scripts need it):

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

The following instructions will set up Hadoop in pseudo-distributed mode, that is, a single node where each daemon runs in its own process.

1.) Create the following folders (the command after this list creates them in one go):
HADOOP_HOME/tmp
HADOOP_HOME/tmp/dfs/data
HADOOP_HOME/tmp/dfs/name
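
From a shell, assuming HADOOP_HOME is set (on Windows, create the folders in Explorer or with md):

$ mkdir -p $HADOOP_HOME/tmp/dfs/data $HADOOP_HOME/tmp/dfs/name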

2.) Set the following properties in core-site.xml and hdfs-site.xml (both live in HADOOP_HOME/etc/hadoop, and the properties go inside the <configuration> element):

core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9001</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>HADOOP_HOME/tmp</value>
</property>

hdfs-site.xml

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///HADOOP_HOME/tmp/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///HADOOP_HOME/tmp/dfs/data</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
3.) Run hadoop namenode -format. Don't forget the file:/// prefix in hdfs-site.xml on Windows; otherwise, the format will fail.

4.) Run HADOOP_HOME/sbin/start-dfs.sh (start-dfs.cmd on Windows).

5.) If all goes well, you can find the web UI port in the console log. In my case it's http://localhost:9870.


6.) You can now upload any file through the URL from step 5.
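
If you prefer the terminal over the web UI, the standard HDFS shell does the same thing (the folder and file names here are just examples):

$ hdfs dfs -mkdir /demo
$ hdfs dfs -put somefile.txt /demo
$ hdfs dfs -ls /demo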



Now let's create a project that will test our Hadoop setup, or download an existing one, for example this project: https://www.guru99.com/create-your-first-Hadoop-program.html. It comes with a nice explanation, so let's try it. I've repackaged it into a pom project and uploaded it to GitHub at https://github.com/czetsuya/Hadoop-MapReduce. (The same steps as a terminal session follow the list below.)
  1. Clone the repository.
  2. Open the HDFS URL from step 5 above and create an input folder (the job creates the output folder itself and fails if it already exists).
  3. Into the input folder, upload the file SalesJan2009 from the project's root folder.
  4. Run hadoop jar hadoop-mapreduce-0.0.1-SNAPSHOT.jar /input /output.
  5. Check the output from the URL and download the resulting file.
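
Here is the whole flow as a terminal session. It's a sketch: the jar is produced by the Maven build, and the CSV filename is assumed to be SalesJan2009.csv, so adjust if the repository differs.

$ git clone https://github.com/czetsuya/Hadoop-MapReduce.git
$ cd Hadoop-MapReduce
$ mvn package
$ hdfs dfs -mkdir /input
$ hdfs dfs -put SalesJan2009.csv /input
$ hadoop jar target/hadoop-mapreduce-0.0.1-SNAPSHOT.jar /input /output
$ hdfs dfs -cat /output/part-r-00000

part-r-00000 is the conventional name of the first reducer's output file.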

To run Hadoop in standalone mode, download and unpack it as-is. Go to our project's folder, build it with Maven, then run the hadoop command below:
>$HADOOP_HOME/bin/hadoop jar target/hadoop-mapreduce-0.0.1-SNAPSHOT.jar input output

input - a directory that should contain the CSV file
output - a directory that will be created on launch; the output file will be saved here

The common causes of problems:

  • An improperly configured core-site.xml or hdfs-site.xml (the name node and data node directories)
  • File/folder permissions

References

  • https://www.guru99.com/create-your-first-hadoop-program.html
  • https://github.com/czetsuya/Hadoop-MapReduce
  • https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation

Heads Up for the BET Token on the EOS Platform

Airdrops are all the rage on the EOS platform, but some tokens have not taken the airdrop route. One such token is the BET token from eosbet.io. Their first game, EOS Dice, is live on the EOS platform. You will need Scatter to play, and the game will demonstrate the power and speed of the EOS platform. Use my link here:

https://dice.eosbet.io/?ref=gmzdenrygage

In only 4 days, they have had 200,000 bets and 2,350,000 EOS wagered! This shows the runaway success of their game; however, this figure is mostly due to the way they have rolled out their BET token.

There are a total of 88 million tokens, and 10% (8.8 million) are being "airdropped" on the dice game at a ratio of 1:5 on bets wagered. The ICO is only available to individuals who can afford to invest 1,000 EOS at a price of USD 0.20 per token, meaning that for everybody else the only way to get hold of these tokens is to bet on the dice game, or to wait until they list on the exchanges.

I am not a gambling person myself, but I found that the best way to get the tokens is to bet at the lowest-risk rate of 96. Continuously betting at this return will give you approximately the equivalent of $0.20 per token. The idea is not to win but to trade your EOS for BET tokens.

As Dice is their first game, we can only conclude that the future is bright for this company, and owning BET tokens means owning a share of the revenue stream of this business, paid out on a quarterly basis.

The BET tokens will also be listed on an exchange soon, and I suspect that the price could be much higher than the USD 0.20 charged at the moment.

This is a very risky investment in many ways, and as with anything, with great risk come great rewards. Just make sure that you only risk what you can afford.

Oh yes, if you use a referral link you get an extra 0.5% payout, and the referrer also gets a 0.5% return on your bets for life, which is why many people are enticing others to use their referral link. Mine is below. While the offer is on, this is a good bet; it is the only way to obtain BET unless you are a whale.

Without referral:
https://dice.eosbet.io/

With referral:
https://dice.eosbet.io/?ref=gmzdenrygage

If you are not lucky at gambling and you want to get hold of some BET tokens, the best way I can think of is to place a constant bet at 96.

This will average out to about 74 rolls, which is about 7 BET tokens. The advantage is that you will be able to receive monthly dividends from the profit that this EOS contract makes, forever or until you sell. Remember that you can increase your expected return by using a referral code. If you use mine, you have my appreciation and thanks in advance.

React Todo with Middleware

This blog post explains the updates made to the React Todo app, such as adding middleware, enhancers, API calls, etc.

Things to notice:
  • Naming convention: inspired by Angular, I suffixed the files with .x.js depending on the type, for example .container.js.
  • I added the action code in the .container.js files.
  • I use classes for components that extend React.Component.
The most important change is in the configuration of the store. We introduced two new files, one each for the development and production store configurations. Each configuration has its own set of middlewares and enhancers.

Let's see the code:

index.js file

import React from 'react';
import ReactDOM from 'react-dom';
import './index.css';
import App from './App';
import registerServiceWorker from './registerServiceWorker';
import { Provider } from 'react-redux';
import configureStore from './core/store/configureStore';

const store = configureStore();

ReactDOM.render(
  <Provider store={store}>
    <App />
  </Provider>,
  document.getElementById('root')
);
registerServiceWorker();

configureStore.js - selects the store configuration based on the environment

if (process.env.NODE_ENV && process.env.NODE_ENV.trim() === 'production') {
module.exports = require('./configureStore.prod');
} else {
module.exports = require('./configureStore.dev');
}

Development configuration

import { createStore, applyMiddleware } from 'redux';
import thunk from 'redux-thunk';
import { createLogger } from 'redux-logger';
//import api from '../middleware/api'
import rootReducer from '../../reducers';
import promise from 'redux-promise-middleware';
import { composeWithDevTools } from 'redux-devtools-extension';
import { crashReporter } from '../Middlewares';
import { monitorReducerEnhancer } from '../Enhancers';
import { apiMiddleware } from 'redux-api-middleware';

const promiseMiddleware = promise();

const configureStore = preloadedState => {
  const store = createStore(
    rootReducer,
    preloadedState,
    composeWithDevTools(
      applyMiddleware(
        thunk,
        apiMiddleware,
        promiseMiddleware,
        createLogger(),
        crashReporter
      ),
      monitorReducerEnhancer
    )
  );

  if (process.env.NODE_ENV !== 'production' && module.hot) {
    // Enable Webpack hot module replacement for reducers
    module.hot.accept('../../reducers', () => {
      store.replaceReducer(rootReducer);
    });
  }

  return store;
};

export default configureStore;

Production configuration

import { createStore, applyMiddleware } from 'redux';
import thunk from 'redux-thunk';
//import api from '../middleware/api'
import rootReducer from '../../reducers';
import promise from 'redux-promise-middleware';
import { crashReporter } from '../Middlewares';
import { apiMiddleware } from 'redux-api-middleware';

const promiseMiddleware = promise();

const configureStore = preloadedState =>
  createStore(
    rootReducer,
    preloadedState,
    applyMiddleware(thunk, apiMiddleware, promiseMiddleware, crashReporter)
  );

export default configureStore;
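
crashReporter (from ../Middlewares) and monitorReducerEnhancer (from ../Enhancers) are small custom files that aren't shown in this post; see the repository for the real ones. As a hint of the shape, here is a minimal sketch of what a crash-reporting middleware could look like; this is hypothetical, not the repo's actual code:

// Middlewares.js (sketch) - a Redux middleware is a curried function that
// receives the store, the next dispatcher, and finally the action.
export const crashReporter = store => next => action => {
  try {
    // Pass the action down the middleware chain.
    return next(action);
  } catch (err) {
    // Log the failing action so crashes are easy to trace, then rethrow.
    console.error('Caught an exception while dispatching', action.type, err);
    throw err;
  }
};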

The complete source code is available at https://github.com/czetsuya/React-MyTodo/releases/tag/1.0.1

Note: I've added implementation code for using a router and API middleware in the master branch.

If you're looking for customization, I'm always available for consultation :-)

Hibernate OGM for MongoDB

So lately I've been playing with the latest version of Hibernate OGM for MongoDB, 5.4.0.Beta2, but I wasn't able to run a demo project created from the wildfly-javaee7-war archetype by following the documentation.

Here are the changes I've made to make it run the Arquillian test:


import org.jboss.arquillian.container.test.api.Deployment;
import org.jboss.shrinkwrap.api.Archive;
import org.jboss.shrinkwrap.api.ShrinkWrap;
import org.jboss.shrinkwrap.api.asset.EmptyAsset;
import org.jboss.shrinkwrap.api.asset.StringAsset;
import org.jboss.shrinkwrap.api.spec.WebArchive;
import org.jboss.shrinkwrap.descriptor.api.Descriptors;
import org.jboss.shrinkwrap.descriptor.api.spec.se.manifest.ManifestDescriptor;

@Deployment
public static Archive<?> createTestArchive() {
    // Declare the Hibernate OGM modules as WildFly module dependencies in the
    // manifest instead of using jboss-deployment-structure.xml.
    String manifest = Descriptors.create(ManifestDescriptor.class)
            .attribute("Dependencies", "org.hibernate.ogm:5.4 services, org.hibernate.ogm.mongodb:5.4 services")
            .exportAsString();

    return ShrinkWrap.create(WebArchive.class, "test.war") //
            .addClasses(Member.class, MemberRegistration.class, Resources.class) //
            .addAsResource(new StringAsset(manifest), "META-INF/MANIFEST.MF") //
            .addAsResource("META-INF/test-persistence.xml", "META-INF/persistence.xml") //
            // doesn't work on this version
            // .addAsResource("jboss-deployment-structure.xml", "WEB-INF/jboss-deployment-structure.xml") //
            .addAsWebInfResource(EmptyAsset.INSTANCE, "beans.xml") //
            // Deploy our test datasource
            .addAsWebInfResource("test-ds.xml");
}

Notice that instead of using a jboss-deployment-structure.xml file, we declare the module dependencies in the manifest. Maybe it's a bug in this release.
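
For reference, a minimal test-persistence.xml for Hibernate OGM with MongoDB could look like the following. The unit and database names are assumptions; the property keys come from the OGM documentation:

<persistence-unit name="primary" transaction-type="JTA">
  <provider>org.hibernate.ogm.jpa.HibernateOgmPersistence</provider>
  <properties>
    <property name="hibernate.ogm.datastore.provider" value="mongodb" />
    <property name="hibernate.ogm.datastore.database" value="test" />
    <property name="hibernate.ogm.datastore.host" value="localhost" />
  </properties>
</persistence-unit>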

You can download the complete source code from:

  • https://github.com/czetsuya/Hibernate-OGM-MongoDB-Demo

State subsidies for bitcoin are a waste

Subsidies for bitcoin mining could be avoided by requiring that the activity be economically beneficial to society. But in that case, no business would qualify for the tax exemption.

Norwegian taxpayers are thus subsidizing bitcoin mining at Dale to the tune of 28 million kroner a year in reduced electricity tax. Paradoxically, the purpose of bitcoin's enormous energy consumption is to prevent too many transactions. Miners, such as Kryptovault's machines at Dale, are rewarded with bitcoin if they win a lottery, and are then allowed to add a block of transactions to the blockchain.

If updates become too frequent, the risk of manipulation increases. The machines at Dale and the other miners are therefore kept busy with an otherwise meaningless task: finding the secret winning number by trial and error. The difficulty is adjusted regularly so that it takes around ten minutes to find the solution.

The only consequence of doubling processing capacity is therefore that energy consumption doubles. Nothing happens to bitcoin's transaction capacity. The facility at Dale is thus pure waste of energy, with no utility value whatsoever. If the facility shuts down, the blockchain lottery simply becomes slightly easier, and a little less energy is wasted in the world.

This is one of bitcoin's fundamental design elements. The only way bitcoin can become environmentally friendly is if the price plummets, making mining less profitable.

That bitcoin is not going to become a widespread currency is obvious to most people. In addition to being unsustainable, its transaction capacity is too low and its volatility too high. Bitcoin's messiah, the Lightning Network, can only send small amounts, and the chance that a transfer succeeds is only seventy percent. Why not just use VISA?

Nor will the facility at Dale produce any technological ripple effects. Running a bitcoin mining operation requires no special expertise. The hopes for what the underlying technology might accomplish also seem exaggerated.

With thousands of developers having worked for years to find alternative applications of blockchain technology, it is surprising how little has come of it. Blockchain can of course be used for all sorts of things, but the question is whether blockchain is better than other solutions. So far, the applications that have been developed do not appear to have outcompeted anything at all.

A blockchain is a tamper-evident ledger of transactions, in which transactions are added in blocks and copies of the blockchain are distributed across a large number of servers.

The innovation is that each block carries a unique number computed from all previous transactions. If you change one of the earlier transactions, a new computation will produce a different number. The stamp on the block will then no longer match the computed number, and all later blocks are rendered invalid. Since the rule says that the longest valid blockchain wins, an attempt at manipulation simply makes your copy of the blockchain useless.
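
To make the chaining concrete, here is a toy sketch in JavaScript. It is not how bitcoin actually works; it only illustrates the stamping idea:

const crypto = require('crypto');

// Each block's stamp is a hash of the previous stamp plus its transactions.
const stamp = (prev, transactions) =>
  crypto.createHash('sha256').update(prev + JSON.stringify(transactions)).digest('hex');

const block1 = stamp('0', ['A pays B 5']);
const block2 = stamp(block1, ['B pays C 2']);

// Altering an old transaction yields a different stamp, so block2 no longer
// matches its recorded predecessor and the rest of the chain becomes invalid.
console.log(stamp('0', ['A pays B 50']) === block1); // false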

Blockchain technology, then, is a technology that prevents historical data from being altered when many copies of the database exist. Nothing more, nothing less. For the technology to spread beyond cryptocurrency, it has to be commercially useful. But commercial actors already have a solution to this problem: they keep a single central database.

That blockchain technology will revolutionize the world simply does not seem very likely. There is quite a difference between a technology that prevents manipulation of a distributed database and a technology that connects all the world's computers.

Kryptovault thus contributes nothing beyond competing for a fixed supply of bitcoin. That is not value creation. Without value creation, the economic loss to society from the operation at Dale is not just the 28 million kroner subsidy. The entire electricity bill is a loss to society. The power would have created far greater value at a business not engaged in bitcoin mining. A business that needs no subsidies would create even more.

So shouldn't the law at least require that a business be economically beneficial to society in order to qualify for the tax exemption? The problem is that no business dependent on subsidized power passes that test.

WordPress directory and file permissions

The issue of setting the correct permissions hits me every time I install WordPress, so I've created a script that sets its contents to the correct permissions. Below is part of the script:


# Make sure the proper group is set. Since I'm deploying to Amazon this time, my user is bitnami, but normally we use wordpress.
>sudo chgrp -R bitnami /WORDPRESS_DIR

# Find all directories and set their permissions to 775 (find recurses by default).
>sudo find /WORDPRESS_DIR -type d -exec chmod 775 {} \;

# Find all files and set their permissions to 664.
>sudo find /WORDPRESS_DIR -type f -exec chmod 664 {} \;
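
Wrapped up as a small reusable script. The default group bitnami and the usage line are my own additions; adjust them for your install:

#!/bin/bash
# fix-wp-perms.sh - normalize WordPress group, directory, and file permissions.
# Usage: sudo ./fix-wp-perms.sh /path/to/wordpress [group]
WP_DIR="${1:?usage: $0 WORDPRESS_DIR [GROUP]}"
WP_GROUP="${2:-bitnami}"  # assumed default; often www-data or wordpress

chgrp -R "$WP_GROUP" "$WP_DIR"
find "$WP_DIR" -type d -exec chmod 775 {} \;  # directories: rwxrwxr-x
find "$WP_DIR" -type f -exec chmod 664 {} \;  # files: rw-rw-r--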

Apache Cassandra Clustering

This tutorial will help us configure an Apache Cassandra ring with two nodes. It will not explain what Cassandra is; use Google for that.

There are actually not many properties that we must update in order to set up the cluster. Note that in this particular example, we will configure two nodes: one seed and one client.

Configuration

The Seed

"The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes."

Open and edit CASSANDRA_HOME/conf/cassandra.yaml (a sketch of the result follows this list):
  • rpc_address - set to the IP address of the node
  • seed_provider / parameters / seeds - set to the IP address of the node
  • listen_address - set to the IP address of the node
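
For example, if the seed machine's address is 192.168.0.43 (an assumption; substitute your own), the relevant cassandra.yaml lines are:

rpc_address: 192.168.0.43
listen_address: 192.168.0.43
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.0.43"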

Client Node

"All nodes in Cassandra are peers. A client read or write request can go to any node in the cluster. When a client connects to a node and issues a read or write request, that node serves as the coordinator for that particular client operation.
The job of the coordinator is to act as a proxy between the client application and the nodes (or replicas) that own the data being requested. The coordinator determines which nodes in the ring should get the request based on the cluster configured partitioner and replica placement strategy."

Open and edit CASSANDRA_HOME/conf/cassandra.yaml (sketch below):
  • rpc_address - set to the IP address of the node
  • seed_provider / parameters / seeds - set to the IP address of the seed node
  • listen_address - set to the IP address of the node
As you can see, the only difference is the value of the seeds.
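
On the client node (say 192.168.0.44, the address that shows up in the log below), only the seeds value points elsewhere:

rpc_address: 192.168.0.44
listen_address: 192.168.0.44
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.0.43"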

Now start the Cassandra instance on the seed node, followed by the client node. You should see a log like the following on the seed machine:
INFO  [HANDSHAKE-/192.168.0.44] 2018-08-02 10:53:24,412 OutboundTcpConnection.java:560 - Handshaking version with /192.168.0.44
INFO [GossipStage:1] 2018-08-02 10:53:25,421 Gossiper.java:1053 - Node /192.168.0.44 has restarted, now UP
INFO [GossipStage:1] 2018-08-02 10:53:25,431 StorageService.java:2292 - Node /192.168.0.44 state jump to NORMAL
INFO [GossipStage:1] 2018-08-02 10:53:25,441 TokenMetadata.java:479 - Updating topology for /192.168.0.44
INFO [GossipStage:1] 2018-08-02 10:53:25,442 TokenMetadata.java:479 - Updating topology for /192.168.0.44
INFO [HANDSHAKE-/192.168.0.44] 2018-08-02 10:53:25,472 OutboundTcpConnection.java:560 - Handshaking version with /192.168.0.44
INFO [RequestResponseStage-1] 2018-08-02 10:53:26,216 Gossiper.java:1019 - InetAddress /192.168.0.44 is now UP
WARN [GossipTasks:1] 2018-08-02 10:53:26,414 FailureDetector.java:288 - Not marking nodes down due to local pause of 79566127100 > 5000000000

Can you guess which IP is the seed?
