BLOG MIGRATED !!!!!!!!: 2012

Wednesday, June 13, 2012

Setting up Log4PHP with Code-Igniter

No BS, the steps are as follows -

1. Clone the github project at https://github.com/fukata/ci-log4php

>: git clone https://github.com/fukata/ci-log4php.git

2. The ci-log4php directory has 2 folders. Copy the ci_log4php folder into /application/third_party/

3. From the other folder, copy the MY_Log.php file. Open config.php and check the subclass prefix.

If $config['subclass_prefix'] = 'MY_';

then place the MY_Log.php file in /application/libraries/

4. Similarly, place log4php.properties in the /application/config folder and log4php_helper.php in the /application/helpers folder

5. Edit the log4php.properties file. Set the logs folder path:

log4php.appender.default.file = /path/to/ci-app/application/logs/%s.log

6. Set $config['log_threshold'] = 4; in config.php, according to this scale:

| 0 = Disables logging, Error logging TURNED OFF
| 1 = Error Messages (including PHP errors)
| 2 = Debug Messages
| 3 = Informational Messages
| 4 = All Messages

7. Go inside the application folder and run > chmod -R 777 ./logs

8. Use these commands for logging -

// log_error('thiserror');

// log_info('thisinfo');

// log_debug('thisdebug');

Enabling SSL on Mac OS X Snow Leopard

cd /private/etc/apache2/

openssl req -keyout privkey-$(date +%Y-%m).pem -newkey rsa:2048 -nodes -x509 -days 365 -out cert-$(date +%Y-%m).pem

Country Name (2 letter code) [AU]:CH
State or Province Name (full name) [Some-State]:Zurich
Locality Name (eg, city) []:Zurich
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Entropy
Organizational Unit Name (eg, section) []:Secure Server Administration
Common Name (eg, YOUR name) []:www.entropy.ch
Email Address []:liyanage@access.ch

Make sure to enter the site name correctly in the Common Name field.

Make sure that TextEdit is not running, then type these lines into the terminal window:

chmod 600 privkey-YYYY-MM.pem

chown root privkey-YYYY-MM.pem

open -a TextEdit /etc/apache2/httpd.conf

Uncomment the lines -

LoadModule ssl_module libexec/apache2/mod_ssl.so
Include /private/etc/apache2/extra/httpd-ssl.conf

open -a TextEdit /etc/apache2/extra/httpd-ssl.conf

Edit these lines -

# General setup for the virtual host
DocumentRoot "/Library/WebServer/Documents/new-ui/app/webroot"
ServerName www.gohachi.com:443
ServerAdmin mayank@hachilabs.com

SSLCertificateFile /etc/apache2/cert-YYYY-MM.pem
SSLCertificateKeyFile /etc/apache2/privkey-YYYY-MM.pem

Restart apache

Hit https://localhost/

Media Coverage!

Today we released our Twitter Integration.

Hachi on Techcrunch
Hachi on BetaKit
Hachi on Pandodaily

Other media coverage -

http://www.talenthq.com/2012/05/4-great-tools-for-recruiting-sales/
http://teleinfobd.blogspot.in/2012/05/hachi-new-social-networking-dimension.html
http://www.the33tv.com/community/facebook/kdaf-hachi-social-contacts-portal-story,0,3990138.story
http://pandodaily.com/news/hachi-one-social-contacts-portal-to-rule-them-all/
The Spanish TechCrunch - explains the large number of Spanish users currently in our system: http://wwwhatsnew.com/2012/04/17/hachi-para-encontrar-el-camino-mas-corto-que-nos-separa-con-cualquier-persona/

Today I was moved to Symynd [http://www.symynd.com], a project built in Django, a web development framework built on Python. Setting up the project on my Mac took quite some effort, since I had to install a lot of Python and Django libraries to get the project running.

I am thinking of writing a script to make the setup process easy. Will probably work on that tomorrow.

Hadoop!

Hadoop is a large-scale distributed batch processing infrastructure. Batch processing is execution of a series of programs (jobs) on a computer without manual intervention.

Hadoop includes a distributed file system which breaks up input data and sends fractions of the original data to several machines in your cluster to hold. As a result, the problem is processed in parallel on all of the machines in the cluster, and the output is computed as efficiently as possible.

Hadoop is designed to handle hardware failure and data congestion issues very robustly.

In a Hadoop cluster, data is distributed to all the nodes of the cluster as it is being loaded in.

The Hadoop Distributed File System (HDFS) splits large data files into chunks which are managed by different nodes in the cluster. In addition, each chunk is replicated across several machines, so that a single machine failure does not make any data unavailable. Even though the file chunks are replicated and distributed across several machines, they form a single namespace, so their contents are universally accessible.
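
To make the chunking and replication idea concrete, here is a toy sketch in Python. It is not how HDFS actually places blocks: the round-robin placement, the function names (place_chunks, available_chunks) and the node names are all assumptions made up for illustration. It only shows why replicating each chunk on several machines keeps every chunk readable after a single machine fails.

```python
import math


def place_chunks(file_size, chunk_size, nodes, replication=3):
    """Split a file into chunks and assign each chunk to `replication` nodes.

    Hypothetical round-robin placement, for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_chunks = math.ceil(file_size / chunk_size)
    return {
        chunk: [nodes[(chunk + r) % len(nodes)] for r in range(replication)]
        for chunk in range(num_chunks)
    }


def available_chunks(placement, failed_nodes):
    """Return the chunks that still have at least one replica on a live node."""
    return [
        chunk
        for chunk, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


if __name__ == "__main__":
    placement = place_chunks(250, 100, ["n1", "n2", "n3", "n4"])
    print(placement)
    print(available_chunks(placement, {"n1"}))
```

With three replicas per chunk, losing any one node still leaves two copies of every chunk it held.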

Hadoop will not run just any program and distribute it across a cluster. Programs must be written to conform to a particular programming model, named "MapReduce."

Plan for the day

The plan for today is simple -

Read 2 chapters of TiJ
Make notes for the 2 chapters
Read Yahoo! tutorial on Hadoop
Optimize the codebase for the PlaceIQ project
Read an essay from Hackers and Painters
Solve some algorithm questions

Bloom Filters

Bloom filters are probabilistic data structures built to handle specific use cases on huge datasets while keeping memory consumption to a minimum. A Bloom filter is built by parsing the dataset once; after the entire dataset has been parsed, the filter can be used to quickly query whether a particular data item is in the dataset or not.

An important thing to note is that Bloom filters can return false positives but never false negatives. That means if a query on a Bloom filter reports that an item doesn't exist in the dataset, we can be sure of this result; but if the query reports that the item exists in the dataset, there is a small chance that it does not.

Bloom filters consist of a number of hash tables of fixed sizes, each of which is an array of bits. Initially all the bits are set to zero in all the tables.

Each word of the dataset is hashed individually into the Bloom filter using a single hash function or separate hash functions. The value returned by the hash is taken modulo (%) the size of the hash table, and the bit at the resulting index is set to 1.

The data structure is probabilistic because there is a low but nonzero probability that a word collides with other words in every one of the hash tables, so a query might return true for it even though it is not present in the original dataset. However, the chances of such errors are very low, and Bloom filters can be customized, configured and optimized to better suit the given dataset.

Many variations exist. One uses a single hash function for all tables but gives the tables different lengths, so the % result differs from table to table. Another uses multiple hash functions but just a single hash table, in which the bits for all the generated hashes are set.

From http://www.javamex.com/tutorials/collections/bloom_filter.shtml -

we allocate m bits to represent the set data;
we write a hash function that, instead of a single hash code, produces k hash codes for a given object;
to add an object to the set, we derive bit indexes from all k hash codes and set those bits;
to determine if an object is in the set, we again calculate the corresponding hash codes and bit indexes, and say that it is present if and only if all corresponding bits are set.
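
The four steps quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the javamex implementation: the class name BloomFilter, the method names, and the trick of salting SHA-256 with the hash number to get k hash codes are all assumptions of this sketch. It also spends a whole byte per bit to keep the code short.

```python
import hashlib


class BloomFilter:
    """Single bit table of m bits with k hash codes per item (illustrative sketch)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per "bit", all zero initially

    def _indexes(self, item):
        # Derive k hash codes by salting SHA-256 with the hash number (assumed scheme).
        data = item.encode("utf-8")
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for index in self._indexes(item):
            self.bits[index] = 1

    def might_contain(self, item):
        # True can be a false positive; False is always correct.
        return all(self.bits[index] for index in self._indexes(item))


if __name__ == "__main__":
    words = BloomFilter(m=1024, k=3)
    for word in ["hadoop", "mapreduce", "bloom"]:
        words.add(word)
    print(words.might_contain("hadoop"))
    print(words.might_contain("django"))
```

A larger m lowers the false-positive rate at the cost of memory; k trades hashing time against how quickly the table fills up.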

Java vs. C++

I've been reading the book "Thinking in Java", and the author keeps stressing that Java is an improved version of C++ and other object-oriented languages in all aspects. According to him, the Java way of doing things is always better and correct.

Though he explains well why certain decisions were taken by the developers of the Java language, and how they benefit developers, I would have appreciated the book more if he had also talked about the flip side of those decisions. Understanding the negative consequences of a design decision is as important as knowing what benefits it provides.

UPDATE: On page 103, finally, the author says and I quote :

... As you progress in this book, you'll see that many parts are simpler, and yet in other ways Java isn't much easier than C++ ...

Thinking in Java

Started reading the book Thinking in Java by Bruce Eckel. Aiming to complete the book in 3 weeks. Here is a list of chapters in the book -

Will strike out the names as I complete them.

Introduction 13 May 5
Introduction to Objects 2 May 5
Everything Is an Object 61 May 5
Operators 93 May 8
Controlling Execution 135 May 8
Initialization & Cleanup 155 May 8
Access Control 209
Reusing Classes 237
Polymorphism 277
Interfaces 311
Inner Classes 345
Holding Your Objects 389
Error Handling with Exceptions 443
Strings 503
Type Information 553
Generics 617
Arrays 747
Containers in Depth 791
I/O 901
Enumerated Types 1011
Annotations 1059
Concurrency 1109
Graphical User Interfaces 1303

Getting started with Hadoop and MapReduce

MapReduce is a programming paradigm developed for writing large-scale data crunching programs by dividing the workload among several machines working in parallel. Hadoop MapReduce is the framework on which such programs are written.

Input data is fed in as key-value pairs and the output is also in the form of key-value pairs, which enables chaining multiple MapReduce jobs one after the other.
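
Here is a rough single-machine sketch of that key-value flow in plain Python. It does not use the Hadoop API; run_job and the mapper and reducer names are hypothetical helpers written for this post. The first job counts words, and because its output is again a list of key-value pairs, it can be fed straight into a second job that groups words by their count.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one map -> group-by-key -> reduce pass over (key, value) records."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    results = []
    for key in sorted(grouped):
        results.extend(reducer(key, grouped[key]))
    return results


def count_mapper(doc_id, text):
    # Emit (word, 1) for every word in the document.
    for word in text.lower().split():
        yield word, 1


def count_reducer(word, ones):
    yield word, sum(ones)


def invert_mapper(word, count):
    # Second job: swap key and value so words are grouped by count.
    yield count, word


def invert_reducer(count, words):
    yield count, sorted(words)


if __name__ == "__main__":
    docs = [("doc1", "the quick fox"), ("doc2", "the lazy dog")]
    counts = run_job(docs, count_mapper, count_reducer)
    print(counts)
    print(run_job(counts, invert_mapper, invert_reducer))
```

In a real cluster the map and reduce calls run on different machines and the grouping step happens over the network; here everything happens in one process.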

This is what I'll be reading to get started -

http://developer.yahoo.com/hadoop/tutorial/

http://developer.yahoo.com/blogs/hadoop/

Ruby Java Bridge

A week ago, while working on a Ruby on Rails project, I had to generate a highly complex Excel file for certain reporting requirements of the client. Previously I had used the Spreadsheet gem for generating the xls templates, but it was clear that Spreadsheet was not going to be enough, since it works well on predefined templates only. We also looked at some other gems like WriteExcel but could not find a gem which was robust enough for our purpose.

When all hope was lost, we had to fall back on Java, and fortunately Java had some jars which we could use for our requirement. We wrote the code in Java and, using the Ruby Java Bridge gem, successfully generated the required Excel report by reusing the Java code.

Link [Tutorial for RJB gem] http://www.ibm.com/developerworks/java/tutorials/j-rjb/index.html

The 6 URLs

This morning I got an email from my manager instructing me to read and learn about Hadoop and MapReduce, along with six URLs to help me get started -

Seems like an interesting project is coming up :)

TopCoder !

A friend talked about his idea of opening a startup today. His idea is to create a website similar to Pagalguy.com for engineering students (Pagalguy is for MBA). I agreed to develop the website for him if he could wait till July.

At Kuliza, I am researching Groovy and Grails, a web development framework similar to RoR.

In other news, the awesome Topcoder tshirt finally arrived today; had lunch at Meghna Biryani; had a huge brawl with the house owner and the building watchman and finally decided to shift to a new home. Probably will shift to Bohmannahalli area. Travelling in Bangalore really sucks.

Hello World!

++++++++++[>+++++++<-]>++.>++++++++++[>++++++++++<-]>+.>++++++++++[>++++++++++<-]>++++++++.>++++++++++[>++++++++++<-]>++++++++.>++++++++++[>+++++++++++<-]>+.>++++++++++[>+++<-]>++.>++++++++++[>++++++++<-]>+++++++.>++++++++++[>+++++++++++<-]>+.>++++++++++[>+++++++++++<-]>++++.>++++++++++[>++++++++++<-]>++++++++.>++++++++++[>++++++++++<-]>.>++++++++++[>+++<-]>+++.>

That's Hello World! for you in the Brainfuck programming language. To run the code, use the interpreter provided at http://brainfuck.tk/ :)
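
If the online interpreter is not handy, a tiny interpreter is easy to sketch in Python. This is my own minimal version, not the one behind the link: the function name run_bf and the 30000-cell tape are assumptions, cells wrap at 256, and the input command (,) is simply ignored because the program above never reads input.

```python
def run_bf(code, tape_len=30000):
    """Interpret a Brainfuck program and return its output as a string."""
    # Precompute the matching position for every bracket.
    jump, stack = {}, []
    for pos, ch in enumerate(code):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jump[start], jump[pos] = pos, start

    tape = [0] * tape_len
    ptr = pc = 0
    out = []
    while pc < len(code):
        ch = code[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == "[" and tape[ptr] == 0:
            pc = jump[pc]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jump[pc]  # go back to the start of the loop
        pc += 1
    return "".join(out)


if __name__ == "__main__":
    # 8 * 8 + 1 = 65, the character code for "A".
    print(run_bf("++++++++[>++++++++<-]>+."))
```

Passing the program above to run_bf as a string should print the greeting.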