MPI - Message Passing Interface

MPI uses a collection of objects called Communicators and Groups to define which processes may communicate with each other.

Think of a Communicator as holding a pool of processes. There can be several such Communicators.

Two costs are involved in distributed processing:

  1. Bandwidth - determines how fast we can move large amounts of data through the network, and thus the practical size limit on messages.
  2. Latency - places a lower limit on the transmission time of any message, no matter how small.

A Blocking Call will not return until the function arguments are safe for re-use in the program. A blocking synchronous send, in particular, returns only when a matching receive has been posted and data transmission has completed; the calling process sits idle waiting to be paired up with a matching receive.

- MPI_Ssend(buffer,...) : blocking synchronous send - it returns only after a matching receive has started, so careless send/receive orderings can deadlock!!

- MPI_Send : if system buffers are available, it completes asynchronously; otherwise it acts just like MPI_Ssend.

- MPI_Bsend : explicitly user-buffered, therefore asynchronous. The user must attach and detach the buffer (MPI_Buffer_attach / MPI_Buffer_detach).

Collective Communications
Broadcast, Scatter, Gather - these move data from one process to many (and collect it back), which is how you'd split, say, different iterations of a loop among various nodes!
Reduction - mathematical operations on a common variable of which every node maintains an individual copy. Eg: finally summing together all the copies [maintained by the different nodes] of a particular variable.
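Since these notes don't have a runnable MPI setup inline, here's a plain-Python sketch of what a sum-reduction conceptually does (the per-rank values and the helper name are made up for illustration; this is not real MPI, which would use MPI_Reduce with MPI_SUM):

```python
# Conceptual sketch of a sum-reduction: every rank holds its own private
# copy of a variable, and the reduction combines all copies at the root.
def mpi_reduce_sum(local_values):
    # local_values[r] stands in for rank r's private copy of the variable
    total = 0
    for value in local_values:
        total += value
    return total  # what the root rank would end up with

local_copies = [10, 20, 30, 40]      # pretend 4 ranks each computed a partial sum
print(mpi_reduce_sum(local_copies))  # 100
```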

Derived Data Types

Machine Learning Definitions

Bias and Variance: Blog Post

Further intuition from: Mathematics

Covariance: Quora post

Decision Trees [unpruned] using weka

Pruning - also tells you what kinds of algorithms the J48 classifier uses to form decision trees.

The problem of Over-fitting: [wiki]
  1. You've got the same number of parameters as observations, or more.
  2. The model performs perfectly on the training set [because it has memorised it rather than learning a generalised concept] but not so well on the test set.
  3. Pruning helps reduce over-fitting.
Cross-Validation: [Source]
10-fold: Divide the training set into 10 parts; use 9 for training and the remaining one for testing. Repeat the process 10 times so that the part used for testing is never repeated, then average the 10 results! Weka then produces an evaluation result and an estimate of the error. The model is then run one last time on the entire dataset.
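The fold-splitting idea above can be sketched in plain Python (a hand-rolled version, just to make the mechanics concrete - Weka or scikit-learn would do this for you):

```python
# Hand-rolled 10-fold cross-validation index splitting: divide n examples
# into 10 parts; each part is used exactly once as the test fold.
def k_fold_indices(n, k=10):
    folds = []
    fold_size, remainder = divmod(n, k)
    start = 0
    for i in range(k):
        # spread any remainder over the first few folds
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

folds = k_fold_indices(100, k=10)
for test_fold in folds:
    train = [i for i in range(100) if i not in set(test_fold)]
    # train on 9 parts (90 examples), test on the held-out part (10)
    assert len(train) == 90 and len(test_fold) == 10
```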

How to apply your model [formed using the training set] on the test set: [Source]

OpenCV - A start!

I need to detect the edges of an image using the Sobel kernels.

1. How do I convert an image into a 2D matrix for further processing? And what's the best way to view that matrix?


So I need to get the following running on my system soon [ubuntu 13.04]
OpenMP, MPI, OpenCV, OpenCL, Java, Python, Google App Engine, C++ and C.

Hope Eclipse is able to support all this - the main reason for going for Eclipse: IT'S GOT AUTO-COMPLETE :D - its suggestions are really helpful, especially for a language like Java that has so many classes.

Downloaded Eclipse 4.3.1 Standard from here:

Install it. Then go to Help - Install New Software - From the 'Work With' drop down list, select 'All Available Sites'. I selected my C/C++ libraries from here.

OpenCV Integration
Follow this tutorial for how to integrate OpenCV with Eclipse. Youtube video to go along with it.


I just want to start this blog post with - 'OPENMP IS SUPER AWESOME!!'

Remember, highly optimized serial code can beat naively parallelized code.

Note about how the for loop gets parallelized: static sharing [search for 'for loop', although there is lots of useful stuff in this article].

It's the programmer's responsibility to avoid any race conditions by

  • using locks or 
  • avoiding shared resources as much as possible.
This is important:
'parallel' creates a parallel region - which means every thread will execute that particular block of code.

'for' is a work-sharing directive. When used within a parallel region, it tells OpenMP to divide the loop's iterations among the thread team. Beware of the barrier synchronization at the end of the parallel region: all threads block there until the last thread completes. If the code doesn't use '#pragma omp for', then each thread executes the complete for loop. When parallelizing your loops, you must make sure your loop iterations do not have dependencies.


The variable is initialized to the value listed in the reduction operators table [Fig. 4] for each thread. At the end of the code block [not at the end of each iteration, I'm guessing], the reduction operator is applied to each of the private copies of the variable, as well as to the original value of the variable.
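The reduction mechanics can be simulated in Python (a conceptual sketch only - real OpenMP would be `#pragma omp parallel for reduction(+:total)` in C; the chunking scheme and worker count here are illustrative):

```python
# Simulating OpenMP's reduction(+:total): each worker gets a private copy
# initialized to the identity element (0 for +), and at the end of the
# region the private copies are combined with the original value.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    private_total = 0          # per-thread private copy, init to identity
    for x in chunk:
        private_total += x
    return private_total

data = list(range(1, 101))               # 1 + 2 + ... + 100 = 5050
chunks = [data[i::4] for i in range(4)]  # static-style split over 4 "threads"
original_value = 0
with ThreadPoolExecutor(max_workers=4) as pool:
    total = original_value + sum(pool.map(partial_sum, chunks))
print(total)  # 5050
```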

Dynamic & guided scheduling are good for load balancing in those situations where:

  • each iteration has a variable amount of work, or
  • some processors are faster than others.
Guided usually does better than dynamic due to the lower overhead associated with scheduling.
Guided & dynamic do load balancing, whereas static does none at all.

Nested For Loops
This link is good.

At the entrance to the inner parallel [for] directive, the OpenMP runtime library [libgomp] detects that a team already exists and, instead of a new team of N threads, creates a team consisting of only the calling thread. Different versions of the gcc compiler handle this in different ways. Avoiding it as much as possible is what I'm going to do.

Android Development

Download Eclipse IDE from here. I've gone for the Linux 32-bit version.

I will be following the instructions as specified here to create my first Android app. Anything that I find hard, or any details I feel the link lacks, I shall include here.

Networking concepts

Uses of a Firewall on a Router
Protects you from evildoers who try to control your computer over the Internet. Any service you expose to the Internet should be secured with a strong password.

Network Address Translation
There are not enough unique IP addresses to connect all computers directly to the internet. Hence, to the outside world, all computers within a network are represented by the gateway's IP address.

Difference between Static, Dynamic & Overloading NAT
Static NAT is a one-to-one mapping: an inside local address translates to a fixed outside local address (inefficient - you'll need n IP addresses for n internal machines). 

Dynamic is when you have a pool of available addresses to use as outside local addresses, and internal clients simply use the first available one. Ideal when each client needs its own internet presence but you want to share the addresses out (useful when not all clients are online at the same time).

Overloading, also known as PAT (Port Address Translation), is useful when you have only one outside local address behind which all internal clients need to be seen. Using PAT, each client can be assigned an individual socket (port/protocol/IP) and have that mapped to the single internet address. PAT is used on almost all home routers today.


[1] - Article on NAT, port forwarding & port triggering.

Home networking basics

How to set up a Web Server at home

[1] & [4] have been useful in writing this blog post.

Web Server
Software that continuously runs on a computer and allows other computers to download documents from it - serves web pages to clients who view them using a web browser app.

Apache2 Web Server
Users enter a Uniform Resource Locator (URL) to point to a Web server by means of its Fully Qualified Domain Name (FQDN) and a path to the required resource. For example, to view the home page of the Ubuntu Web site a user will enter only the FQDN. To request specific information about paid support, a user will enter the FQDN followed by a path.

The most common protocol used to transfer Web pages is the Hyper Text Transfer Protocol (HTTP). Protocols such as Hyper Text Transfer Protocol over Secure Sockets Layer (HTTPS), and File Transfer Protocol (FTP), a protocol for uploading and downloading files, are also supported.

Apache Web Servers are often used in combination with the MySQL database engine, the HyperText Preprocessor (PHP) scripting language, and other popular scripting languages such as Python and Perl. This configuration is termed LAMP (Linux, Apache, MySQL and Perl/Python/PHP) and forms a powerful and robust platform for the development and deployment of Web-based applications.


Apache2 is configured by placing directives in plain-text configuration files. These directives are separated among the following files and directories under /etc/apache2:

The file names describe their contents:
  • apache2.conf: the main Apache2 configuration file. Contains settings that are global to Apache2.
  • ports.conf: houses the directives that determine which TCP ports Apache2 is listening on.
  • envvars: file where Apache2 environment variables are set.
  • httpd.conf: historically the main Apache2 configuration file, named after the httpd daemon. The file can be used for user-specific configuration options that globally affect Apache2.
Apache Modules
  • mods-available: this directory contains configuration files to both load modules and configure them. Not all modules will have specific configuration files, however.
  • mods-enabled: holds symlinks to the files in /etc/apache2/mods-available. When a module configuration file is symlinked it will be enabled the next time apache2 is restarted.
More configuration files
  • conf.d: contains configuration files which apply globally to Apache2. Other packages that use Apache2 to serve content may add files, or symlinks, to this directory.
1 Apache Server -> Multiple sites
  • sites-available: this directory has configuration files for Apache2 Virtual Hosts. Virtual Hosts allow Apache2 to be configured for multiple sites that have separate configurations.
  • sites-enabled: like mods-enabled, sites-enabled contains symlinks to the /etc/apache2/sites-available directory. Similarly when a configuration file in sites-available is symlinked, the site configured by it will be active once Apache2 is restarted.
In addition, other configuration files may be added using the Include directive, and wildcards can be used to include many configuration files. Any directive may be placed in any of these configuration files. Changes to the main configuration files are only recognized by Apache2 when it is started or restarted. Refer to the basic settings in [2].

Symbolic Link
A special type of file that contains a reference to another file or directory in the form of an absolute or relative path, and that affects pathname resolution.

Virtual Host
refers to the practice of running more than one web site on a single machine. Virtual hosts can be "IP-based", meaning that you have a different IP address for every web site, or "name-based", meaning that you have multiple names running on each IP address. The fact that they are running on the same physical server is not apparent to the end user.
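For a site like my jaytest one, a minimal name-based virtual host file in sites-available might look something like this (the ServerName and paths are made up for illustration; my actual file differs):

```apache
<VirtualHost *:80>
    # Hypothetical name - substitute your own domain or dynamic-DNS name
    ServerName example.home.lan
    # Hypothetical path - point this at your web server root folder
    DocumentRoot /var/www/network_test
    <Directory /var/www/network_test>
        # Allow .htaccess files in the document root to override settings
        AllowOverride All
    </Directory>
</VirtualHost>
```

Enable it with a2ensite (which creates the symlink in sites-enabled) and restart Apache2 for it to take effect.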

Files modified:
apache2 -  sites-available -> jaytest
network_test (Web Server root folder) - .htaccess

All the changes that I made in the respective files can be viewed here:

Changes to the router to enable Port Forwarding - a screen shot to give you an idea:

Useful Links:
[1] - 'How to set up a personal home web server' from LifeHacker
[2] - 'Apache 2 web server' from
[3] - Info about Virtual Hosts
[4] - 'How to access a home server behind a router/firewall' from LifeHacker


Heaps

A container for objects that have keys. 2 operations - Insert and Extract-Min (or Extract-Max); both take O(log n) time, n being the no. of keys.

Conceptually - think of a heap as a tree - rooted, binary.

Heap Property - at every node x, key[x] <= all keys of x's children

Consequence - object at root must have minimum key value

Array Implementation

parent(i) = floor(i/2)  [i.e. exactly i/2 when i is even]
and children of i = 2i, 2i+1  [with 1-based array indexing]

Use of an array is good in terms of storage - no space wasted on pointers - and traversing is simple index arithmetic.

Insert and Bubble-Up (given key k) Runtime = O(logn)

  1. Stick k at end of last level
  2. Bubble-up k until heap property is restored (i.e. key of k's parent is <= k)
No. of levels in heap tree == log(n) [base 2]  -> n is no. of items in heap

Extract-Min and Bubble-Down Runtime = O(logn)

  • Delete root
  • Move last leaf to be new root
  • Iteratively bubble-down until heap property has been restored [always swap with smaller child]
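The two operations above can be sketched as a minimal array-based min-heap (a sketch to match the notes' 1-indexed arithmetic; in practice Python's standard heapq module does all this for you):

```python
# Minimal array-based min-heap: 1-indexed array so that
# parent(i) = i // 2 and children of i are 2i and 2i + 1.
class MinHeap:
    def __init__(self):
        self.a = [None]  # index 0 unused so the index arithmetic works

    def insert(self, k):                 # O(log n)
        self.a.append(k)                 # 1. stick k at end of last level
        i = len(self.a) - 1
        while i > 1 and self.a[i // 2] > self.a[i]:  # 2. bubble up
            self.a[i // 2], self.a[i] = self.a[i], self.a[i // 2]
            i //= 2

    def extract_min(self):               # O(log n)
        root = self.a[1]                 # object at root has the min key
        self.a[1] = self.a[-1]           # move last leaf to be the new root
        self.a.pop()
        i, n = 1, len(self.a) - 1
        while 2 * i <= n:                # bubble down: swap with smaller child
            c = 2 * i
            if c + 1 <= n and self.a[c + 1] < self.a[c]:
                c += 1
            if self.a[i] <= self.a[c]:
                break                    # heap property restored
            self.a[i], self.a[c] = self.a[c], self.a[i]
            i = c
        return root

h = MinHeap()
for k in [5, 3, 8, 1]:
    h.insert(k)
print([h.extract_min() for _ in range(4)])  # [1, 3, 5, 8]
```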


  1. Sorting: the canonical use of a heap - a fast way to do repeated minimum computations.
     Runtime = O(nlogn) (each heap operation has a logarithmic runtime; do it n times!)
     • Insert all n array elements into a heap.
     • Extract-min to pluck out elements in sorted order.
  2. Event Manager: a synonym for heap is Priority Queue.
     Example use: game simulation
     • objects = event records [action/update to occur at a given time in the future]
     • key = time the event is scheduled to occur
     • Extract-min yields the next scheduled event
  3. Median maintenance: at each time step i, compute the median of i numbers in O(logi) time.
     • maintain two heaps - Hlow supports EXTRACT_MAX; Hhigh supports EXTRACT_MIN
  4. Speeding up Dijkstra from O(nm) to O(mlogn)
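The median-maintenance trick with the two heaps can be sketched with Python's heapq (heapq only provides a min-heap, so Hlow is simulated as a max-heap by negating values - a standard workaround, and an assumption of this sketch rather than anything from the course):

```python
# Median maintenance: Hlow holds the smaller half (max-heap via negation),
# Hhigh holds the larger half (min-heap). The median sits on top of Hlow.
import heapq

def running_medians(stream):
    low, high = [], []   # low: max-heap via negated values; high: min-heap
    medians = []
    for x in stream:
        if low and x > -low[0]:
            heapq.heappush(high, x)
        else:
            heapq.heappush(low, -x)
        # rebalance so len(low) is len(high) or len(high) + 1
        if len(low) > len(high) + 1:
            heapq.heappush(high, -heapq.heappop(low))
        elif len(high) > len(low):
            heapq.heappush(low, -heapq.heappop(high))
        medians.append(-low[0])  # the ith median (lower of the two middles)
    return medians

print(running_medians([5, 1, 3, 2, 4]))  # [5, 1, 3, 2, 3]
```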

Data Structures - Intro

DSs are used to organize data so that it can be accessed quickly and usefully.
e.g.: list, stack, queue, heap, search tree, hash table, bloom filter, union-find, etc.

Why so many? Different DSs support different sets of operations -> suitable for different types of tasks.

Rule of Thumb: Choose the minimal DS that supports all the operations that you need.

When you think about what DS to use, ask 2 questions:

  • What are the operations it supports?
  • What is its running time?

'sizeof()' of a 'struct' data structure

A struct is a datatype that composes a fixed set of labelled fields or members.
[In C++, the only difference between a struct and a class is the default access level, which is private for classes and public for structs.]

sizeof() returns the size in bytes.

The compiler inserts (padding) unused memory (extra bytes) into a 'struct' so that data members are optimally aligned for better performance. Many processors perform best when fundamental data types are stored at byte-addresses that are multiples of their sizes.

Example code: here

sizeof(int): 4
sizeof(char): 1
sizeof(item): 8
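Since the example code isn't inlined here, the same padding effect can be reproduced from Python with ctypes (the struct layout - a char followed by an int - is my assumption about what 'item' looks like, and the sizes assume a typical platform with 4-byte ints):

```python
# A struct with a 1-byte char followed by a 4-byte int: the compiler
# inserts 3 padding bytes so the int starts at a 4-byte-aligned offset,
# giving sizeof 8 rather than 5.
import ctypes

class Item(ctypes.Structure):
    _fields_ = [("c", ctypes.c_char),   # 1 byte
                ("n", ctypes.c_int)]    # 4 bytes, wants 4-byte alignment

print(ctypes.sizeof(ctypes.c_int))   # 4
print(ctypes.sizeof(ctypes.c_char))  # 1
print(ctypes.sizeof(Item))           # 8 on typical platforms, not 5
```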

Dijkstra's Algorithm

Can only be applied to a directed graph with each edge having (strictly) positive lengths.

BFS can only be used to find the shortest path in a graph with unit-length edges.

Implementation Example

In each iteration, the shortest path to one new vertex will be found from the source vertex.

- X      = {S}   [vertices processed so far]
- A[S] = 0       [computed shortest path distances]
- B[S] = 0       [computed shortest paths]

Computing Strong Components in a Graph

A directed graph is said to be strongly connected if you can get to any node from any other node. More generally, a graph contains maximal strongly connected regions - SCCs, within which there is a path from u to v and from v to u.

If you call DFS from the right node of the graph, it will give you exactly one SCC; otherwise it might capture a union of SCCs and not give you any useful information at all.

Kosaraju's Two-Pass Algorithm

How do you process the node in the decreasing order of finishing times in the 2nd pass?
You don't want to sort the nodes cos that would take O(nlogn) time. You need to remember the finishing times [from the first pass] in some fashion so that all you need is a linear scan.

Pseudo Code:
DFS-Loop (graph G)
-global variable t = 0 [to compute finishing times in Pass 1]
-global variable s=NULL [to keep track of leaders in Pass 2]
-Assume nodes labelled 1 to n
-for(i=n; i>=1; i--)
  - if i not yet explored
  - s = i
  - DFS(G,i)

DFS(graph G, node i)
- mark i as explored
- set leader(i) = node s
- for each arc (i,j) that is part of G:
  - if not yet explored:
    - DFS(G,j)

- t++
- set f(i) := t 

Source code in C: here
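Since the C source isn't inlined here, a rough Python version of the same two-pass idea looks like this (the first pass is written iteratively to avoid recursion limits - an implementation choice of this sketch, not part of the pseudocode above):

```python
# Kosaraju's two-pass algorithm: pass 1 computes finishing times via DFS
# on the reversed graph; pass 2 runs DFS on the original graph in
# decreasing finishing-time order, grouping nodes by leader.
def kosaraju_sccs(graph):
    rev = {u: [] for u in graph}
    for u, outs in graph.items():
        for v in outs:
            rev.setdefault(v, []).append(u)

    # pass 1: finishing order on the reversed graph (iterative DFS)
    order, seen = [], set()
    for start in rev:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(rev[start]))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(rev.get(nxt, []))))
                    advanced = True
                    break
            if not advanced:
                order.append(stack.pop()[0])   # node's finishing time

    # pass 2: decreasing finishing order is just a reversed linear scan
    leader, seen = {}, set()
    for s in reversed(order):
        if s in seen:
            continue
        seen.add(s)
        stack = [s]
        while stack:
            u = stack.pop()
            leader[u] = s                      # s is this SCC's leader
            for v in graph.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)

    sccs = {}
    for node, lead in leader.items():
        sccs.setdefault(lead, set()).add(node)
    return list(sccs.values())

g = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [6], 6: [4]}
print(sorted(sorted(s) for s in kosaraju_sccs(g)))  # [[1, 2, 3], [4, 5, 6]]
```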

Topological Sorting

A topological ordering of a directed graph G is a labelling f of G's nodes such that:

  1. the f(v)'s are the set {1,2...n}
  2. if (u,v) belongs to G -> f(u) < f(v)
Use: When creating sequence tasks while respecting all precedence constraints.

The graph must be acyclic for a topological ordering to exist. Running time: O(m+n).


Topological sort via DFS
  • Follow the out-going arcs from a node
  • Once you've reached the end, backtrack!
  • Label nodes along the backtrack path if they have no more out-going arcs else follow the arcs
  • If you meet an explored node on the way, ignore it!

DFS_loop(graph G)
- mark all nodes unexplored
- current_label = n    
- for each vertex v in G
  - if v not yet explored
    - DFS(G, v)

DFS (graph G, start vertex s)
- mark s as explored
- for every edge (s,v):
  - if v unexplored
    - DFS (G,v)
- set f(s) = current_label

- current_label --
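The pseudocode above translates almost directly into Python (a sketch; I use a one-element list for current_label so the nested function can decrement it):

```python
# Topological ordering via DFS: labels are assigned from n downward as
# each vertex finishes, so every edge (u, v) ends up with f(u) < f(v).
def topological_sort(graph):
    n = len(graph)
    label = {}
    current = [n]                    # current_label, mutable for nested dfs
    explored = set()

    def dfs(s):
        explored.add(s)
        for v in graph[s]:
            if v not in explored:
                dfs(v)
        label[s] = current[0]        # set f(s) = current_label
        current[0] -= 1

    for v in graph:
        if v not in explored:
            dfs(v)
    return label

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
f = topological_sort(g)
# every edge must point from a smaller label to a larger one
assert all(f[u] < f[v] for u in g for v in g[u])
print(f["a"], f["d"])  # 1 4
```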

Graph Search


  • Find everything findable from a given start vertex.
  • Don't explore anything twice.
  • Run time - O(m+n) time; this is linear w.r.t. the size of the graph.
Generic Algorithm 
Given graph G, vertex S
- initially s explored, all other vertices unexplored. [1 boolean per node]
- while possible:
  - choose an edge (u,v) with u explored and v unexplored   [If none, halt]
  - mark v explored

Breadth-First Search (BFS)
O(m+n) time using a Queue (FIFO)
  1. explore nodes in layers - if there is a path BFS will find it.
  2. can compute shortest paths
  3. can compute connected components of an undirected graph

For shortest path calculation -> changes to the alg.
initialize dist(v) [0 if v == s, +infinity if v != s]
- when considering edge (v,w):
  - if w unexplored, then set dist(w) = dist(v) + 1

For computing connected components:
Applications: To check if a network is disconnected, graph visualization, clustering.
Input: List of all nodes along with the list of nodes each one is connected to.
- Initially all nodes are unexplored, assume nodes are labelled 1 to n
- for i=1 to n:
  - if i not yet explored in some previous BFS
    - BFS(G, i)  -> discovers the connected component containing i.
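Both uses of BFS above can be sketched together (graph as an adjacency-list dict, which is an assumption of this sketch):

```python
# BFS with a FIFO queue: computes dist(v) = fewest edges from s, and the
# outer loop over unexplored nodes discovers the connected components.
from collections import deque

def bfs_distances(graph, s):
    dist = {s: 0}                       # dist(s) = 0; others start unset
    q = deque([s])
    while q:
        v = q.popleft()
        for w in graph[v]:
            if w not in dist:           # w unexplored
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

def connected_components(graph):
    seen, components = set(), []
    for i in graph:                     # for i = 1 to n
        if i not in seen:               # not found by some previous BFS
            comp = bfs_distances(graph, i).keys()
            seen.update(comp)
            components.append(sorted(comp))
    return components

g = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2], 5: [6], 6: [5]}
print(bfs_distances(g, 1))        # {1: 0, 2: 1, 3: 1, 4: 2}
print(connected_components(g))    # [[1, 2, 3, 4], [5, 6]]
```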

Depth-First Search (DFS)
O(m+n) time using a stack (LIFO or via recursion)
  1. explore aggressively like a maze, backtrack only when necessary
  2. compute topological ordering of directed acyclic graph
  3. compute connected components in directed graphs.
Stack version:
mimic BFS code, use stack (push, pop) instead of queue (+minor other modifications)

Recursive version (elegant):
DFS (graph G, start vertex s)
- mark s as explored
- for every edge (s,v):
  - if v unexplored
    - DFS (G,v) 

Graphs [Contraction Algorithm]

Graphs contain vertices and edges; can be directed or undirected.

  1. Road networks: all mapping software and web apps store road networks in some graph representation, where the vertices correspond to intersections and the edges correspond to individual roads.
  2. The Web: Vertices - individual web pages & Edges: hyperlinks. Eg: First vertex [tail] is going to be a page that contains a hyperlink and the second vertex [head] is going to be the page the hyperlink points to.
  3. Social Networks: Vertices - individuals, Edges - relationships.
Cuts of Graphs
Grouping or partition of the vertices of the graph into two non-empty groups, A and B. 
When we talk about min-cut of a graph, we're mainly talking about the minimum number of crossing edges between these 2 groups.

  1. Identify network bottlenecks/weaknesses. 
  2. Community detection in social networks. - look for small regions that are tightly knit amongst themselves and weakly connected to the rest of the graph.
  3. Image segmentation -
    Input: 2D array (graph) of pixels
    For two pixels that are immediately next to each other [left->right or top->bottom], you put an edge there. That gives us a Grid Graph. We also define edge weights i.e. how likely one expects two pixels to be coming from a common object [eg: perhaps, if they have the same colour map..etc].
Min-cut is then applied [a few times] on this grid graph with weighted edges to identify contiguous [major] objects in the picture.

How many cuts does a graph with n vertices have? 
Ans: (2^n)-2
- Each vertex can go into either set A or set B - 2 choices! Therefore we have 1 binary degree of freedom per vertex, giving 2^n assignments.
- The two assignments in which every vertex lands in the same set [leaving A or B empty] are not valid cuts, hence the '-2'.
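The (2^n)-2 count is easy to sanity-check by brute force (enumeration only, no actual graph needed, since the count depends only on n):

```python
# Enumerate every assignment of n vertices to sets A/B and keep those
# where both sets end up non-empty - these are exactly the cuts.
from itertools import product

def count_cuts(n):
    cuts = 0
    for assignment in product("AB", repeat=n):  # one binary choice per vertex
        if "A" in assignment and "B" in assignment:  # both sets non-empty
            cuts += 1
    return cuts

for n in range(2, 8):
    assert count_cuts(n) == 2 ** n - 2
print(count_cuts(4))  # 14
```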

Graph Implementations
[n: # of vertices, m: # of edges]

Adjacency Matrix
Represent G by an n×n (0 or 1 element) matrix A, where A(i,j) = 1 iff G has an i-j edge.
[In the case of directed graphs, the value stored could be +1 or -1 depending on the direction the edge points in]
Space required: theta(n^2)

Adjacency List
  • array (or list) of vertices - theta(n)
  • array (or list) of edges - theta(m)
  • each edge points to its end points - theta(m)
  • each vertex points to edges incident on it - theta(m)
Space required: theta(m+n); after removing constants

Which is better? - depends on graph density and operations needed.

Adjacency lists are 
  • good for representing massive networks
  • good for performing graph searches
  • don't take up as much space as an Adj. Matrix.
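The two representations can be built side by side for the same undirected graph (a sketch; the vertex labelling 0..n-1 and the edge-pair input format are my choices):

```python
# Same graph, two representations: theta(n^2) space for the matrix,
# theta(m + n) for the adjacency lists.
def adjacency_matrix(n, edges):
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1        # undirected, so the matrix is symmetric
    return A

def adjacency_list(n, edges):
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)             # each edge recorded at both endpoints
        adj[j].append(i)
    return adj

edges = [(0, 1), (1, 2), (2, 3), (0, 3)]   # a 4-cycle
print(adjacency_matrix(4, edges)[0])  # [0, 1, 0, 1]
print(adjacency_list(4, edges)[0])    # [1, 3]
```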

Random Contraction Algorithm
While there are more than 2 vertices:
  • pick a remaining edge (u,v) uniformly at random.
  • merge (or contract) u and v into a single vertex
  • remove self-loops
return cut represented by final 2 vertices. Each iteration is going to decrease the no. of vertices by 1.

The source code can be found here
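Since the source isn't inlined, here is a rough Python sketch of one contraction run (the union-find-style merging is my implementation choice; the test graph - two triangles joined by one bridge edge - is made up, and because a single run only succeeds with some probability, the algorithm is repeated and the best cut kept):

```python
# One run of the random contraction algorithm on an edge list. Merged
# vertices are tracked with a small union-find; edges whose endpoints
# land in the same merged vertex are self-loops and get removed.
import random

def contract(edges, n_vertices):
    parent = list(range(n_vertices))

    def find(v):                              # follow merges to the root
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    remaining = n_vertices
    edges = edges[:]
    while remaining > 2:
        u, v = random.choice(edges)           # pick a remaining edge at random
        parent[find(v)] = find(u)             # contract u and v into one vertex
        remaining -= 1
        edges = [e for e in edges if find(e[0]) != find(e[1])]  # drop self-loops
    return len(edges)                         # crossing edges of the final cut

random.seed(1)
# two triangles joined by a single bridge edge (2, 3); the min cut is 1
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
best = min(contract(edges, 6) for _ in range(50))
print(best)
```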

CV using LaTeX

So for a while I've been wanting to redesign my CV with LaTeX, the main reason being its typesetting makes the CV look so professional, plus formatting a document properly in Microsoft Word is such a faff. And what better way to write documents than to actually code them [used Kate] and compile [used the command line]. [Reasons why LaTeX is awesome #1 #2]

Right! Let's get cracking!

  1. Few setup instructions: I've got the TeX Live environment set up in Ubuntu 12.04. TeX Live contains all the LaTeX packages you will require, like fonts, mathematical symbols, etc.
  2. My CV (the output): My CV in pdf format gets downloaded if you click the link.
  3. The LaTeX code: At the moment, it's a very hacked form of the code found in this wonderfully written blog. When my code compiles it complains about incorrect sequences and missing item elements but hey, TeX Live still manages to give me the desired output. I'd like to revisit the code and fix all these issues at some point. But for the time being, it's Job Hunt time! :D

Also, I'd like to try out Lyx at some point - supposed to be a good LaTeX editor/GUI [which can be easily downloaded on Ubuntu via the Synaptic Package Manager, but I'm a big fan of the Command Line].

Quick Sort


Running Time

Choosing the pivot uniformly at random is known to be the best simple strategy for quicksort because, on average, it achieves a running time of O(nlogn). 

Question: Define the recursion depth of QuickSort to be the maximum no. of successive recursive calls before it hits the base case [note: the recursion depth is a random variable, which depends on which pivots get chosen]. What are the minimum-possible and maximum-possible recursion depths of QuickSort?
Best Case: the algorithm always picks the median as the pivot, in which case the recursion is essentially identical to that in MergeSort - Θ(logn)
Worst Case: the min or the max is always chosen as the pivot, resulting in linear depth - Θ(n)

Source Code

  1. Taking the first element as the pivot: code
  2. Taking the last element as the pivot: code
  3. Taking the median (amongst the first, last and middle element) as the pivot: code
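Since those links aren't inlined, here's a sketch of the random-pivot variant discussed above (the in-place partition scheme is the usual one from the course; the sample input is made up):

```python
# In-place quicksort with a uniformly random pivot - the variant that
# achieves O(n log n) running time on average.
import random

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return                            # base case: 0 or 1 elements
    p = random.randint(lo, hi)            # choose pivot uniformly at random
    a[lo], a[p] = a[p], a[lo]             # move pivot to the front
    pivot, i = a[lo], lo + 1
    for j in range(lo + 1, hi + 1):       # partition around the pivot
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[lo], a[i - 1] = a[i - 1], a[lo]     # put pivot in its final place
    quicksort(a, lo, i - 2)               # recurse on the smaller-than side
    quicksort(a, i, hi)                   # recurse on the larger-than side

xs = [3, 8, 2, 5, 1, 4, 7, 6]
quicksort(xs)
print(xs)  # [1, 2, 3, 4, 5, 6, 7, 8]
```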