Suraj Atreya

Functional programming, Scala, Haskell, Distributed Systems


Elastically adding and removing nodes using Akka cluster

In the previous post, we saw the internals of Akka cluster and how its nodes communicate with each other using the gossip protocol, with heartbeats for failure detection. I had also promised to explain Akka cluster with a use case. This post explores a pull-based master/slave architecture in which the master accepts work and the slaves ask the master for work. This use case suits anyone looking to elastically provision nodes when the load is higher than normal and de-provision them when the load is below normal.



Akka cluster

For anyone who is serious about distributed systems or building one, certain issues come up again and again: replication, consistency, availability and partition tolerance (CAP)[1]. In real life, network partitions are inevitable, so the system must be able to tolerate them when there are network outages. Therefore, the P in CAP is a must for any distributed system. This is echoed by Peter Deutsch in his Eight Fallacies of Distributed Computing.



Data Analysis using pandas

I’ve been meaning to learn pandas ever since I started looking into data analysis tools. Although I don’t have extensive experience in data analysis, I’ve been writing Scala for quite some time. Much as I love Scala and its awesome libraries, it falls short for quick data analysis, especially exploratory data analysis (EDA). I was fishing for a good EDA tool, and almost all the blog posts and the community around data science suggested R as the primary tool. But I decided to learn pandas anyway, since I have already used Python. I have nothing against R and will probably learn it sometime. I have checked out RStudio, and it amazed me the first time I used it. But R will have to wait.



Reactive file handling using Akka and inotify

At Glassbeam, we deal with terabytes of data every day. As part of the SCALAR platform, we have to ingest many files and parse them in real time. The files arrive asynchronously and fast, and they are picked up for parsing in the order in which they arrive. For such a demanding requirement, we had to design a system that is not only concurrent and asynchronous but also scalable.



Function composition

Function composition is a great tool when you want to combine several functions into a new function. The new function takes a parameter and produces the result. Function composition has the type (b -> c) -> (a -> b) -> a -> c. In other words, it takes two functions: the first of type (b -> c) and the second of type (a -> b). A quick look reveals that the output of the second function, b, is the input of the first. When these types match, the two functions can be composed.
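To make the type concrete, here is a minimal Haskell sketch. The names `compose`, `wordCount`, `isLong` and `isLongSentence` are made up for illustration and do not come from the post; `compose` is simply a hand-written version of the Prelude's `(.)` operator.

```haskell
-- Hand-written composition with the same type as the Prelude's (.).
compose :: (b -> c) -> (a -> b) -> a -> c
compose f g x = f (g x)

-- Plays the role of (a -> b): a String goes in, an Int comes out.
wordCount :: String -> Int
wordCount s = length (words s)

-- Plays the role of (b -> c): an Int goes in, a Bool comes out.
-- The threshold of 3 is an arbitrary choice for this example.
isLong :: Int -> Bool
isLong n = n > 3

-- The output type of wordCount (Int) matches the input type of isLong,
-- so the two compose into a single String -> Bool function.
isLongSentence :: String -> Bool
isLongSentence = compose isLong wordCount
```

Loaded into GHCi, `isLongSentence "hello world"` evaluates to `False`, since the sentence has only two words. Swapping the arguments, as in `compose wordCount isLong`, is rejected by the type checker because `isLong` returns a `Bool` while `wordCount` expects a `String`.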
