Saturday, July 29, 2017

On eventually consistent file listing


Introduction 


Cheap S3 storage comes with an unusual cost: correctness. One of the key problems when working with highly partitioned data stored in S3 is eventual consistency in file listing. What exactly is this problem, and how can we think about mitigating its impact? That is what we will discuss in this post.

File listing has been a very common operation since the invention of files. Given a directory or path, it gives us the list of files under that path. Tons of code written on top of file systems depends on the correctness of this operation. Unfortunately, this assumption breaks when the listing is done on S3 or, for that matter, on any eventually consistent blob store.

Deconstructing file listing 


One way to think about eventual consistency of file listing is to argue that we get a wrong answer. This is correct to some extent but not powerful enough to do something about it. To do something about it, we need to dig a bit deeper and understand the nature of this wrongness. I find it useful to characterise this in the following form:
  • Ghost files 
  • Conceived files 
Let's try to understand what they mean. Ghost files are files which are returned by the listing operation but have actually been deleted from the file system. They are a very common reason for job failures in Spark and Hive. They are called ghosts for obvious reasons. Conceived files, on the other hand, are files which actually exist but were not returned by the listing API. In the happy (immediately unhappy) path, eventual consistency causes jobs to fail, because further operations on ghost files keep failing, irrespective of the number of retries. In the unhappy (short-term happy) path, we have data loss because of conceived files: they are simply missing from the final result, which gives incorrect answers.

Given these two constructs, we can argue that the wrongness of a listing operation will occur either because of ghost files (files which shouldn't be present in the listing but are) or because of conceived files (files which should be present in the listing but are not). We can now have separate solutions for detecting these two file classes and dealing with their consequences.

Dealing with Ghost files


Ghost files are files which shouldn't have appeared in the listing to start with. Once they show up, they cause different problems depending on what operations we are doing with these files. The most common problem is a subsequent file-not-found error. One way to deal with this is to do a fresh listing operation and do a set subtraction.
Let A be the result of listing operation at time t1
and B be the result of listing operation at time t2,
where t2 > t1.
The set A - B, i.e. the files which are in A but not in B, is essentially the list of currently known ghost files. Once detected, we can choose to deal with them in some form. One simple way is to ignore the failures caused by ghost files, because we know they should fail. The other option is to remove them from our task queue, because we know they are not part of the final solution set. We might need to iterate multiple times (say, till a fixed point) to find all the ghost files.
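The set subtraction above can be sketched in a few lines of Python. This is only an illustration: the file names and the helper names find_ghost_files and drop_ghosts are made up for this example, and the listings are plain lists rather than results from a real listing API.

```python
def find_ghost_files(listing_t1, listing_t2):
    """Files that were in the earlier listing (A) but not in the later one (B)."""
    return set(listing_t1) - set(listing_t2)


def drop_ghosts(task_queue, ghosts):
    """Remove known ghost files from the task queue, keeping the original order."""
    return [path for path in task_queue if path not in ghosts]


# Hypothetical listings taken at t1 and t2 (t2 > t1).
a = ["data/part-0000", "data/part-0001", "data/part-0002"]
b = ["data/part-0000", "data/part-0002"]

ghosts = find_ghost_files(a, b)
remaining_tasks = drop_ghosts(a, ghosts)
```

Here the second listing no longer reports data/part-0001, so it is treated as a ghost and removed from the queue.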

Dealing with Conceived files


Conceived files are the files which didn't even show up in the listing.
Let's again let A be the result of the listing operation at time t1
and B be the result of listing operation at time t2,
where t2 > t1.
The set B - A, i.e. the files which are in B but were not in A, is essentially the list of currently known conceived files. These are files which we would have missed if we had done only a single listing operation. Once detected, we can choose to deal with them in some form. Handling conceived files is relatively simple. We just need to add them to our working set. We might need to iterate multiple times (say, till a fixed point) to find all the conceived files and treat them as if they were part of the original listing operation.
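The mirror image of the ghost file check might look like the sketch below. Again, the helper names and file names are invented for illustration, and the listings are plain Python lists.

```python
def find_conceived_files(listing_t1, listing_t2):
    """Files that are in the later listing (B) but were missing from the earlier one (A)."""
    return set(listing_t2) - set(listing_t1)


def add_conceived(working_set, conceived):
    """Treat conceived files as if they had been part of the original listing."""
    return set(working_set) | set(conceived)


# Hypothetical listings taken at t1 and t2 (t2 > t1).
a = ["data/part-0000", "data/part-0001"]
b = ["data/part-0000", "data/part-0001", "data/part-0002"]

conceived = find_conceived_files(a, b)
working_set = add_conceived(a, conceived)
```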

It is tempting to ask: why not wait until the system has become consistent before starting the work? In theory that works; in practice we don't know how much time it will take. By starting with whatever information we can get from the listing API, we get a head start and can keep revising our task set depending upon what further listings reveal. What we get through this approach is correctness, without paying the performance penalty of waiting up front.
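Putting the two checks together, the "keep revising till a fixed point" idea could be sketched roughly as follows. Everything here is an assumption made for illustration: list_files stands in for whatever listing call is being used, make_fake_lister simulates a store that converges after a few calls, and stopping when two consecutive listings agree is a heuristic, not a proof that the store has become consistent.

```python
def revise_tasks(list_files, max_rounds=10):
    """Re-list until two consecutive listings agree or max_rounds is reached.

    Returns the final task set and the (ghosts, conceived) pair seen in each round.
    """
    tasks = set(list_files())
    revisions = []
    for _ in range(max_rounds):
        latest = set(list_files())
        ghosts = tasks - latest      # listed earlier, gone now
        conceived = latest - tasks   # missing earlier, visible now
        if not ghosts and not conceived:
            break                    # fixed point: nothing changed
        revisions.append((ghosts, conceived))
        tasks = (tasks - ghosts) | conceived
    return tasks, revisions


def make_fake_lister(snapshots):
    """Hypothetical stand-in for a listing API: returns each snapshot in turn,
    then keeps returning the last one."""
    remaining = iter(snapshots)
    last = [None]

    def list_files():
        last[0] = next(remaining, last[0])
        return last[0]

    return list_files


lister = make_fake_lister([
    ["a", "ghost"],      # first listing: one ghost, two files missing
    ["a", "b"],          # ghost gone, "b" appears
    ["a", "b", "c"],     # "c" appears
    ["a", "b", "c"],     # stable
])
final_tasks, revisions = revise_tasks(lister)
```

In a real job the work on the first listing would already be running while these revisions arrive; the sketch only shows how the task set changes.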

In conclusion, we can deal with eventual consistency in file listing operations by repeating the listing operation, detecting ghost and conceived files and modifying our work queues to take our new knowledge about the listing status into account.



Saturday, July 08, 2017

Don't screw yourself

It is a good principle to practice in life. In other terms it would mean don't do something stupid, or something that is not good for you. Here and now, this is easier to practice. Add time and space to it and we have no clue what it means.

One of the popular ways of screwing yourself involves the passage of time. Smoking is a classic example. Not saving, eating unhealthy food, not exercising, not learning new things: the list is endless.

Self-screwing is not the only option. We can screw ourselves through others also. One of the simplest ways to punch yourself in the face is to punch someone else. Or consider not respecting the right of other people to join the traffic. When we block someone, they block someone else. Since roads are inherently connected, these small steps help in creating larger deadlocks.

The point I wanted to make is: we are interconnected, and these interconnections make it difficult to view our actions in isolation. These connections connect us not only to one another now but also to our future selves. Screwing others is just another way to screw yourself... not now, not here... but someday and somewhere.

Monday, February 01, 2016

Apartment Security: Can we make it smarter using smartphones

We love our families, and one place it shows up is when it comes to securing our apartments. What are our expectations? Within reasonable limits, we want to prevent unauthorised people from entering our apartment. Specifically, we want to answer the following two questions about visitors before they can step inside:
  1. Who are you?
  2. Are you invited?
We do a good job of answering the second question by enforcing the rule that, for every visitor, security should call up the resident and confirm before letting anyone in, assuming we have an intercom. But we really don’t know much about the first. At best we ask the visitor to make an entry in the register, which cannot be proved or disproved. We just hope it is correct and useful, in case we need it. In spite of these measures, a few important questions go unanswered.
  • What if something goes wrong? How do we help the police in finding the culprits? CCTV helps, but we know that in spite of CCTV footage, the accused in the Bangalore ATM assault case is still absconding after 2 years. VIDEO: CCTV vs Phone Number
  • We do take care of the entry of visitors, but are we absolutely sure about what happens between them leaving the flat and showing up at the main gate? How can we prevent visitors from loitering around in the apartment complex once they are done with the primary visit?
Unfortunately we don’t know much. SiftApps is trying to solve this problem. Most of the time when we bring in additional security measures, we are faced with the dilemma of the additional time that both the security guard and the visitor have to spend in establishing credentials and authorisation. For example, to ensure that we have a valid phone number for the visitor, we can come up with a rule that every visitor needs to make a call to the security guard’s phone, and it is the responsibility of the security guard to enter the phone number himself in the visitor’s register. Obviously this will increase the time the visitor has to spend at the gate. Moreover, we need “better” security guards. This still leaves us with the problem of knowing about or preventing loitering by visitors after they leave the flat and before they show up at the main gate.

SiftApps has given considerable thought to this problem. How can we make security not only effective but also efficient in terms of the time spent admitting any visitor? Another area we focused on is making the process simple enough that ordinary security guards can understand and follow it. Specifically, we tried to answer the following questions:
  • Can we use smartphones instead of servers and computer terminals? 
  • How does instant connectivity help us become smarter collectively, without requiring non-trivial effort from any of us?
The solution that we have come up with solves all the problems that I have described above. It consists of two apps: one for the security guard and the other for residents. The two fundamental questions that we want to answer about the visitor remain:
  • Who are you?
  • Are you invited?
The resident app allows residents to “invite” visitors. It requires residents to input the visitor's details, including the visitor's phone number. Using this information, the app generates a unique code and sends it to the visitor by SMS. This code is for one-time use only and encodes all the information we need about the visitor, including the resident who invited him. When the visitor arrives, all he has to do is share the code with the security guard, who enters it in the app. The security guard’s app validates the code, records the entry time of the visitor and displays the result of the validation to the security guard, who can then allow or deny entry. VIDEO: How Sift Works? Note that we don’t need visitors to have smartphones, nor do they need to install any app.

At this stage we have a validated phone number for the visitor, and we have made sure that any visitor can be admitted or denied entry in less than 10 seconds by checking the code in the security guard’s app.
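To make the entry flow concrete, here is a minimal sketch of a one-time code registry in Python. It is not how the Sift apps are implemented. In particular, this sketch keeps the invite details in an in-memory lookup table keyed by a random 6-digit code, rather than encoding the details in the code itself, and it returns the code instead of sending an SMS. The class name InviteRegistry, its methods and the sample visitor are all hypothetical.

```python
import secrets
import time


class InviteRegistry:
    """Hypothetical in-memory store of pending invites and recorded entries."""

    def __init__(self):
        self._pending = {}   # code -> invite details
        self.entries = []    # admitted visitors, with entry time

    def invite(self, resident, visitor_name, visitor_phone):
        """Create a one-time code for a visitor. A real system would SMS it."""
        while True:
            code = f"{secrets.randbelow(10**6):06d}"
            if code not in self._pending:
                break
        self._pending[code] = {
            "resident": resident,
            "visitor_name": visitor_name,
            "visitor_phone": visitor_phone,
        }
        return code

    def admit(self, code, now=None):
        """Validate a code at the gate. Returns the entry record, or None if invalid.

        The code is removed on first use, so it cannot be used twice.
        """
        invite = self._pending.pop(code, None)
        if invite is None:
            return None
        record = dict(invite, entry_time=time.time() if now is None else now)
        self.entries.append(record)
        return record


registry = InviteRegistry()
code = registry.invite("Flat 101", "Asha", "9999999999")
first_attempt = registry.admit(code, now=100.0)   # valid: visitor is admitted
second_attempt = registry.admit(code, now=200.0)  # same code again: rejected
```

The same idea would extend to the exit code described later in the post: one more code, one more timestamp.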

We went ahead and created another, optional flow which takes care of the other problem I mentioned earlier, namely how to prevent unwanted loitering by visitors once they leave the flat and before they show up at the main gate.

We created a notion of exit code in the app, which is very similar to the entry code I explained above. It can be generated via the resident app when the visitor is about to leave. It is also SMSed to the visitor. When the visitor shows up at the main gate, we know how much time he has spent between generation of the exit code and its actual usage. This way we have complete information about the visitor:
  • his validated phone number 
  • time of entry 
  • time of leaving the flat 
  • time of showing up at the gate 
  • and the resident who invited the visitor 
It is not far fetched to imagine how this information can be used to create real time alerts for security to manage security in a proactive manner. Consider how the following will help in better security:
  • Ability to know exactly how many visitors are in the apartment at any point in time
  • Ability to raise an alert to security if a delivery person has not completed the visit within some pre-configured time
  • Ability to notify security if a person has left the flat, but not shown up at the main gate within some pre-configured time 
  • Ability to track multi-flat visitors and tracking of exit time at each of the flats 
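As a rough illustration of how the recorded timestamps could drive the first and third alerts, here is a small Python sketch. The Visit record, its field names and the threshold are assumptions made for this example, not the actual Sift data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Visit:
    """Hypothetical visit record; times are seconds on any common clock."""
    visitor_phone: str
    resident: str
    entry_time: float
    left_flat_time: Optional[float] = None   # when the exit code was generated
    gate_exit_time: Optional[float] = None   # when the exit code was used at the gate


def visitors_inside(visits: List[Visit]) -> List[Visit]:
    """Visitors who have entered but not yet shown up at the main gate."""
    return [v for v in visits if v.gate_exit_time is None]


def loitering(visits: List[Visit], now: float, max_walk_seconds: float) -> List[Visit]:
    """Visitors who left the flat but have not reached the gate within the limit."""
    return [
        v for v in visits
        if v.left_flat_time is not None
        and v.gate_exit_time is None
        and now - v.left_flat_time > max_walk_seconds
    ]


visits = [
    Visit("9000000001", "Flat 101", entry_time=0, left_flat_time=600, gate_exit_time=700),
    Visit("9000000002", "Flat 202", entry_time=100, left_flat_time=500),
    Visit("9000000003", "Flat 303", entry_time=200),
]
inside = visitors_inside(visits)
overdue = loitering(visits, now=1500, max_walk_seconds=600)
```

With these sample records, two visitors are still inside the complex, and only the second one has been on the way to the gate for longer than the configured limit.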

It is a very good solution which gives us the information we need, without requiring any extra effort on the part of the residents or the security guard. It doesn’t require any additional infrastructure, except maybe a smartphone for the security guard. All our visitor information is available in digital format, easy to search and look up. The privacy of our visitors is maintained, as this information is kept securely on servers and not in some paper register which is open to the scrutiny of any visitor who cares to read it.

I would love to hear your feedback and would be happy to answer any questions. Please get in touch with us if you would like to see a demo or start your free trial. Thanks for your time. Looking forward to helping you secure your apartment.

Email: iamrohit  at  gmail.com
Watch this 95-second video to see Sift in action. VIDEO: How Sift Works?

Saturday, September 12, 2015

One time password

What is the point of hiding one time passwords?

  • Since I am entering it for the first and last time in my life, let me see it so that I don't type something wrong.
  • It is OK for other people to see it. It is anyway useless immediately after use. 
I wish Apple would open a bank so that banks have someone to copy.

Friday, September 11, 2015

Generosity of the few

For a long time I felt that the only solution to the tragedy of the commons is structure, based on money or authority. I have a new contender now. This one is called generosity of the few. It doesn't always work, but in many situations generosity of the few is good enough to keep the system in balance. A few examples:

  • After being stuck in a traffic jam for some time and before the police show up, someone who is probably not even driving any vehicle steps up and starts managing the traffic, all the while listening to the shouts of angry drivers.
  • On roads without any traffic lights, someone just stops the car to let some pedestrians cross the road.
  • Some people actually answer questions on StackOverflow.