Friday, October 21, 2011

Use cases for Cacheismo

Cacheismo is a scriptable, in-memory Lua object and key-value cache. It is written in C and works on most POSIX-compliant systems. It is based on the simple idea of exposing objects with methods instead of opaque values.

  • The simplest use case is to use cacheismo as a key-value cache, just like memcached. It supports the memcached ASCII protocol over TCP. There is no need to specify or tune slab sizes. You may be surprised by the improvement in your cache hit rate, which comes from a special memory management scheme that optimizes memory usage. See: Cacheismo Memory Management
  • The second use case is storing server-side state that is modified frequently. Typically this is where you would use CAS operations in memcached. Examples include counters, rate limiting, last N users, etc. If you use memcached for storing session objects, you have to map users to specific machines so that session information is not corrupted by concurrent modifications to session state. With cacheismo you can keep the session object in the cacheismo server and let any app server handle any user request. Cacheismo allows creating server-side objects in the Lua scripting language, which can be modified atomically by get operations on special keys called virtual keys. Cacheismo running on a single core can do at most about 80K get operations per second, and this includes running a minimal script on every get operation. Your mileage will depend on the complexity of the script and the size of your objects. See: Sliding Window Counter, Quota With Cacheismo, Cacheismo Lua API
  • Cacheismo also supports talking to other cacheismo servers from the scripting environment. This is based on two functions, getFromServer and getInParallel. These can be used to build synchronous replication, proxy functionality that routes requests to different servers, and in-memory parallel computations. See: Cacheismo Cluster, Using Cacheismo Cluster For Real Time Analytics
  • Cacheismo also supports querying for existing keys through scripts. This can be used to create a replica of an existing system if the current system needs to be shut down for any reason. See: Finding Keys In Cacheismo
  • Cacheismo supports server-side consistent hashing. If you find it difficult to update all your app machines when new servers are added, you can use this feature to do seamless upgrades. This comes with a network cost: one hop to the first cacheismo server, which finds the right node, and a second hop to the node that owns the key.
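
To make the consistent hashing idea in the last point concrete, here is a minimal sketch of a hash ring in plain Lua. It is not cacheismo's implementation: the hash function, the server names, the replica count and the helper names (hash, buildRing, lookup) are all assumptions chosen for illustration.

```lua
-- Minimal consistent-hashing ring (illustrative only, not cacheismo's code).

-- Simple polynomial string hash, kept below 2^32.
local function hash(s)
  local h = 0
  for i = 1, #s do
    h = (h * 31 + s:byte(i)) % 4294967296
  end
  return h
end

-- Place `replicas` points on the ring for every server name.
local function buildRing(servers, replicas)
  local ring = {}
  for _, name in ipairs(servers) do
    for r = 1, replicas do
      ring[#ring + 1] = { point = hash(name .. "#" .. r), server = name }
    end
  end
  -- Order by ring position; break ties by name so the order is deterministic.
  table.sort(ring, function(a, b)
    if a.point ~= b.point then return a.point < b.point end
    return a.server < b.server
  end)
  return ring
end

-- A key belongs to the first point at or after its own hash.
local function lookup(ring, key)
  local h = hash(key)
  for _, entry in ipairs(ring) do
    if entry.point >= h then return entry.server end
  end
  return ring[1].server  -- wrap around to the start of the ring
end

-- Hypothetical server names.
local ring = buildRing({ "cache1:11211", "cache2:11211", "cache3:11211" }, 100)
print(lookup(ring, "quota$myKey"))
```

When a server is added, a key either stays on its old server or moves to the new one, which is what makes upgrades seamless. A real implementation would use a better hash and a binary search instead of the linear scan.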

Thursday, October 20, 2011

Education for Education

Education loans are based on the same model as other loans, the difference being that interest rates are lower than for other kinds of loans. The model makes sense since banks are in it for the money. But what if this was done by some non-profit organization whose only function is to make it easy for people to get educated?

Let's say it costs C1 at time T1 to get education E. Let's say that at some time T2 in the future the cost of the same education is C2. Let's say the "payback" amount for an education loan is simply a function of the current cost of that education. You can "payback" at any point in time, after 2 years or 10 years or 20 years, or make partial payments. Basically what you get at the start is the "cost of education" and what you pay back is also the "cost of education". Essentially I am decoupling the "cost of money", which is the interest that banks charge over time, from the "cost of education".
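
One way to read the partial payments is that the loan is tracked in units of education instead of money: a payment made when the education costs C retires amount / C of what is owed. Here is a minimal sketch of that accounting in Lua. The function names and all the numbers are made up for illustration.

```lua
-- Loan tracked in "units of education" rather than money (illustrative).
local function newLoan()
  return { owed = 1.0 }  -- one full education outstanding
end

-- A payment of `amount`, made when the education costs `currentCost`,
-- retires amount / currentCost of the outstanding education.
local function pay(loan, amount, currentCost)
  local fraction = math.min(amount / currentCost, loan.owed)
  loan.owed = loan.owed - fraction
  return loan.owed
end

local loan = newLoan()       -- borrowed when the course cost 100000
pay(loan, 60000, 120000)     -- cost has risen to 120000: retires half
pay(loan, 75000, 150000)     -- cost is now 150000: retires the other half
print(loan.owed)             -- nothing left to repay
```

The borrower received 100000 and paid back 135000 in total, but in education terms exactly one education went out and one came back.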

Under normal circumstances, this would have ensured that a non-profit organization which has funds to support the education of N persons will continue to have funds to support N persons at all times in the future. When I looked up the cost of education in the US on Wikipedia, it looks like this would have been a very bad deal, given that the cost of education in the US has increased 2.5 times more than inflation (which is a good proxy for interest rates). I don't know what is going on here.

The economic feedback loop is pretty big here.
 => High cost of education
 => Student debt
          => Unable to pay
                => Economic instability or political unrest
                     => Depression ?
 => Choose not to go for higher education
          => Less qualified people for the job
              => More salaries for qualified people
                     => Now debt for higher education is justified
 => Universities make losses, fewer students enrol
        => Universities decrease prices
            => Stability - cost of education is justified

It takes years before we recognize the mistakes and rectify them. I keep coming back to the same point: money is a bad abstraction. We need a better model for getting people to help other people now, in the belief that they will be helped when they need it in the future.

Index Based Salaries

Prices change. Some change every day, others maybe once a month. Salaries are fixed, unless the company decides to change them.

When the government decides to increase petrol prices, the price of almost everything increases because almost every business depends on transportation. But salaries don't change. When SBI increases the cash reserve ratio or base rate, EMIs increase and production decreases, but salaries don't change.

It is kind of unfair that everyone in business can increase/decrease prices either based on market conditions or increase/decrease in cost of doing business but salaried people don't have that choice. The only choice they have is - talk to your manager or find a new job.

What if salary was a function? Say Hari Sadu has a home loan, drives a car and has children studying in school. If his salary was linked to his home loan rate, the cost of petrol and the cost of education, he would be a very happy employee. Just as businesses can adjust their prices, if salaries could be adjusted the same way, it would create a much faster feedback loop for the economy to adjust to new conditions. If a business decides that it cannot support the salary function of some employee and wants to decrease his salary, it will have to come out clean and open. The current system simply says things are the same (the salary stays the same), but actually the employee is taking a hit, and it stays unsaid and unacknowledged. A 10% salary raise at the yearly performance evaluation might be less than inflation, or much less than the "impact of inflation" for a given employee. In this new system, a raise is a raise: it gives the employee more spending power.
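
As a toy illustration of "salary as a function", here is a sketch in Lua where part of the salary follows the indices an employee is exposed to. The weights, the index values and the linear formula are all assumptions made up for this example, not a proposal for the actual function.

```lua
-- Salary as a function of the price indices an employee is exposed to.
-- weights: share of salary linked to each index (hypothetical values).
local function indexedSalary(baseSalary, weights, baseIndex, currentIndex)
  local factor, covered = 0, 0
  for name, w in pairs(weights) do
    factor = factor + w * (currentIndex[name] / baseIndex[name])
    covered = covered + w
  end
  -- The part of the salary not linked to any index stays fixed.
  return baseSalary * (factor + (1 - covered))
end

local weights = { homeLoan = 0.25, petrol = 0.25, education = 0.25 }
local base    = { homeLoan = 8,  petrol = 64, education = 100 }
local now     = { homeLoan = 10, petrol = 80, education = 100 }

-- Home loan rate and petrol are both up 25%, education is unchanged.
print(indexedSalary(100000, weights, base, now))
```

With these numbers half of the salary is exposed to a 25% increase, so the salary moves from 100000 to 112500 without any negotiation.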

This is obviously a bad move for companies because it makes one of the fixed costs of business variable. But it also brings transparency. Just as businesses understand and adjust to the prices of other businesses, it makes sense for them to understand and adjust to the impact of prices on their employees. If a home loan rate increase by HDFC is eating into the margins of Wipro or Infosys, they are in a better position to negotiate with HDFC than each employee on his own, and then maybe HDFC will figure out a deal where Wipro or Infosys charges less for software. I guess all I am talking about is businesses being more people-aware than money-aware, because money in itself can do nothing, only people can.

Tuesday, October 18, 2011

Finding Keys In Cacheismo

In the last post I talked about how to do map-reduce-like stuff with cacheismo. It occurred to me that there is no method to get all the keys. The global hashmap object now supports a new method, getPrefixMatchingKeys, which as the name implies returns the keys that match a certain prefix. Typically objects are stored using the template objectType$objectKey. Thus an object created by the file quota.lua with key myKey will be stored with key quota$myKey. To do something with all the quota objects:
   
local keys = getHashMap():getPrefixMatchingKeys("quota$")
for k,v in pairs(keys) do 
   print(v)
end

If you would like the quota object itself to support, say, a getAllKeys method, add this to quota.lua in the scripts directory.



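function getAllKeys() 
    -- collect every key stored under the quota$ prefix
    local keys = getHashMap():getPrefixMatchingKeys("quota$")
    local result = ""
    for k,v in pairs(keys) do 
       -- append each key followed by a newline
       result = result..v.."\n"  
    end
    return result
end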


Now a get request with the key quota:getAllKeys would return a newline-separated list of all active quota objects in the map.
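
To experiment with this outside the server, the global hashmap can be replaced by a stand-in backed by a plain Lua table. This is only a sketch: fakeMap and the sample keys are made up, and it assumes (as the loop above suggests) that getPrefixMatchingKeys returns an array of key strings.

```lua
-- Stand-in for cacheismo's global hashmap, for experiments outside the server.
-- Only getPrefixMatchingKeys is mimicked; the real object comes from cacheismo.
local store = {
  ["quota$alice"]  = true,
  ["quota$bob"]    = true,
  ["counter$hits"] = true,
}

local fakeMap = {}
function fakeMap:getPrefixMatchingKeys(prefix)
  local matches = {}
  for key in pairs(store) do
    -- plain string comparison, so "$" is not treated as a pattern character
    if key:sub(1, #prefix) == prefix then
      matches[#matches + 1] = key
    end
  end
  table.sort(matches)  -- stable order for the example
  return matches
end

local function getHashMap() return fakeMap end

-- Variant of getAllKeys that joins the keys with table.concat
local function getAllKeys()
  return table.concat(getHashMap():getPrefixMatchingKeys("quota$"), "\n")
end

print(getAllKeys())
```

Using table.concat avoids building many intermediate strings and leaves no trailing newline.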

This is good enough but probably not very interesting. I am planning to support indexes as first-class objects to make it quicker to reach interesting objects. These indexes will automatically drop objects which are deleted or forced to expire because of LRU caching. So if you need to find all the quota objects that are close to their quota limit, you would create an index on the quota-used value.

Thursday, October 13, 2011

Using Cacheismo Cluster for Real Time Analytics


Cacheismo now supports invoking requests on other servers.
  - getFromServer
    This function takes a server name and a key as arguments and returns the result to the script.
  - getInParallel
    This function takes a map of server names to lists of keys and gets the values for all the keys from all the servers in parallel. Once all the results are received, the script gets a map of server names to maps of keys and received values.
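
The input and output shapes of getInParallel can be shown with a sequential stand-in. This is a sketch of the data shapes only, not of the real function: fakeGetInParallel, the fetch callback and the server names are hypothetical.

```lua
-- Sequential stand-in that mimics the input and output shapes described above.
-- fetch(server, key) is a hypothetical placeholder for the network call.
local function fakeGetInParallel(query, fetch)
  local results = {}
  for server, keys in pairs(query) do
    results[server] = {}
    for _, key in ipairs(keys) do
      results[server][key] = fetch(server, key)
    end
  end
  return results
end

-- Input: map of server name -> list of keys
local query = {
  ["server1:11211"] = { "a", "b" },
  ["server2:11211"] = { "c" },
}

-- Output: map of server name -> map of key -> value
local results = fakeGetInParallel(query, function(server, key)
  return key .. "@" .. server
end)

print(results["server1:11211"]["b"])
```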

Here is some simple example code which gets the top accessed keys from the cacheismo cluster using a simplified, framework-less map-reduce. The contents below belong in a file named mapreduce.lua

-- function called on every cacheismo node to get 
-- the top keys by access count
-- virtual key : mapreduce:topKeys:


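function topKeys(count) 
    -- accessCounter is not built in; it is assumed here to return an
    -- array of { key = ..., count = ... } entries
    local keys  = accessCounter:getKeyAccessCount()
    count = tonumber(count)
    -- sort keys by access count, highest first
    table.sort (keys, function(a, b) return a.count > b.count end)
    local result = {}
    for i = 1, math.min(count, #keys) do 
       result[i] = keys[i]
    end
    return result
end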


-- function called on one of the cacheismo nodes 
-- which distributes work to all other nodes using getInParallel
-- virtual key : mapreduce:manageGetTop:10 


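function manageGetTop(count) 
     -- servers is assumed to be a table keyed by server name, defined elsewhere
     count = tonumber(count)
     local query = {}
     for k,v in pairs(servers) do 
        -- getInParallel expects a list of keys for each server
        query[k] = { "mapreduce:topKeys:"..count }
     end
     local results = getInParallel(query)
     -- merge the per-server results into one list; each received value is
     -- assumed to be a list of { key = ..., count = ... } entries
     local total = {}
     for server,values in pairs(results) do 
           for key,entries in pairs(values) do 
              for _,entry in ipairs(entries) do
                 total[#total + 1] = entry
              end
           end
     end
     -- sort the merged list by access count, highest first
     table.sort (total, function(a, b) return a.count > b.count end)
     local topkeys = {}
     for i = 1, math.min(count, #total) do 
       topkeys[i] = total[i]
     end
     return topkeys
end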


I have taken some liberties with the code, and there is no accessCounter object by default in cacheismo, but one is fairly easy to create in a script. Note that the above implementation doesn't have any intermediate nodes, but these are trivial to add by calling get for mapreduce:manageGetTop instead of mapreduce:topKeys.

A single cacheismo server could be serving thousands of such queries at once, because everything happens in a non-blocking manner, while Lua coroutines give the script author the illusion of a synchronous stack.
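
The coroutine trick can be sketched in a few lines of plain Lua. This is a toy event loop, not cacheismo's implementation: getFromServerSketch, startRequest and deliverReplies are hypothetical names, and the "replies" are made up instead of being read from sockets.

```lua
-- Toy event loop: handlers look synchronous but yield while waiting for a reply.
local pending = {}   -- coroutines waiting for a reply

-- Looks like a blocking call to the script, but only suspends the coroutine.
local function getFromServerSketch(server, key)
  return coroutine.yield({ server = server, key = key })
end

-- Run a handler until it finishes or asks for a remote value.
local function startRequest(handler, log)
  local co = coroutine.create(handler)
  local ok, request = coroutine.resume(co, log)
  if coroutine.status(co) == "suspended" then
    pending[#pending + 1] = { co = co, request = request }
  end
end

-- Hand a reply to every waiting coroutine. A real server would do this
-- when the corresponding socket becomes readable.
local function deliverReplies()
  local waiting = pending
  pending = {}
  for _, p in ipairs(waiting) do
    local ok, nextRequest = coroutine.resume(p.co, "value-of-" .. p.request.key)
    if coroutine.status(p.co) == "suspended" then
      pending[#pending + 1] = { co = p.co, request = nextRequest }
    end
  end
end

local log = {}
local function handler(log)
  local v = getFromServerSketch("server1", "a")  -- suspends here
  log[#log + 1] = v                              -- resumes here with the reply
end

startRequest(handler, log)
startRequest(handler, log)
print(#pending)          -- 2: both requests are in flight, nothing is blocked
deliverReplies()
print(log[1], log[2])
```

The handler reads like ordinary sequential code, while the loop is free to start other requests between the yield and the resume.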

Links:
Cacheismo Introduction
Cacheismo Lua API
Cacheismo Sliding Window Counter Implementation in Lua
Cacheismo Quota Implementation in Lua
Cacheismo Memory Allocation
Cacheismo Source Code
Cacheismo Support