Tuesday, August 11, 2009

Friendfeed and aggregator

FriendFeed accepts Facebook's friend request.

I think it is the best thing that has happened to Facebook in a long while. I said long ago (2 years?) that FriendFeed completes my social network needs.

I was excited when FriendFeed made its first Facebook app. Now I could aggregate my online identity into a single place for friends who care. And I wished to see the deeper side of my friends too (what books they put in their Amazon wishlists, what they blog, what they digg, etc.). The critical mass of Facebook made it so useful (if only Facebook's platform were more friendly to FriendFeed applications).

Technically, aggregation is the way to go for the ultimate network effect. However large you are, you cannot cover everything.

Sure, if everyone plugs in to you, it will be easiest. (Facebook apps, Windows apps, iPhone apps)

And some of the major Waterloos in technology happen when a large company believes it can make the existing best of breed change for it, and makes that the only option. One example is WinFS, which was designed to be useful only once everyone changed their file formats to join WinFS. Oracle's Database Filesystem is another. Same for CardSpace and other identity services.

Google search (and Desktop search) is the most representative example of successful aggregation. Instead of dictating how a webpage should look, Google, the Aggregator, invests in the bridging, acts as the gopher, and pulls in everything. It works better for standards-conforming pages, but still works for the others. Desktop search (from both companies) is much less than the vision of WinFS, but it works because it doesn't require you to change Microsoft Word's file format for it to be useful.

I predict Google Wave is another major Waterloo. The UX of Google Wave is superb, and the scenario is simply convincing. But instead of aggregating, it requires you to embed the Wave app and dictates Google as the only storage. It would be nice for them and for users if they can pull it off, but I doubt it. Unless they figure out a way to include the best of breed, to come or already out there, I don't see that they can go very far.

Saturday, February 21, 2009

Dreams of an "Engine Company"

Maybe Dreams of Engineers too!?

It is a well-made "commercial". I feel bad calling it so, but I don't know what else to call it. It definitely improves my view of Honda. Totally inspirational. I love it.

I think I am failing every day. Since the day I started on the ideas for leafsoft.com, there has never been a day when I felt I finished enough. Sometimes I lost to distraction; lost to urgent things that are not as important; lost to tiredness; sometimes I just wanted to finish more. I struggle to become more productive, to ignore unimportant things, to find the right balance of time between tools and end product, to stay focused (which is one of the hardest).

I doubt every day whether I am smart enough for my goals. I think the difference between stupidity and admirable endurance is hair thin. One is that you do exactly the same thing and never stop; the other is that you do almost exactly the same thing and never stop.

The video gave me a powerful push today. I know I won't finish as much as I wanted today. I know I am not going to settle for less. I tried to ask myself for less; I couldn't.

Thanks, Honda!

Saturday, January 17, 2009

A computer in everyone's hands

The 500 million apps downloaded is amazing. It has not even been two years since it came out.

It just reminds me of the days when Windows 3.0 came out. All kinds of little utilities flourished. There were even CDs on sale that came with something like 300 shareware programs. I wish the innovation at Apple continues no matter what Steve's health will be...

Tuesday, January 13, 2009

A new year reflection

Welcome to 2009. A year's just started! Time to renew the blog habit, and do some new year reflection.

This blog was most active at the beginning of 2006. At that time, I was working for a mid-size software company on their Eclipse IDE product in Fremont, Seattle. (I miss the days I walked to work, and the coffee shops around.) While I loved the tools, working on UI wasn't exactly what I wanted to build my long-term career on at the time.

I liked to dive deep and was interested in Transaction, Caching, O/R kind of stuff, which was what I learned from my first paying jobs.

I wanted to continue on that, so I registered http://cacheca.com (it is like my 8th idea) and worked a bit on my own on distributed caching. I read and thought a lot about the topic, and that was why I had much to write on it.

I joined a large software company, on a clustering project, on the manageability team. I was hoping my knowledge from my previous work would be useful. In the beginning, I was really wishing to join the engine side of the team. (I am still on manageability but no longer under the stealth project. The org structure makes sense: manageability at scale has much broader application than scaled-up or scaled-out servers. Manageability at scale is the manageability problem.) It is only natural for a critical server product to work on scalability, but it was a stealth project so I didn't try to blog about work. Often, a lot of the inspiration comes from work. Without blogging about that, I simply blogged much less.

After joining the company, on my own time, I shifted a little and developed a hobby project related to social networks. I thought an open platform could be game changing even against fierce competitors. I was looking at the online-identity (e.g., openid.net) stuff for that idea. On a thick stack of loose-leaf paper I drew the idea, algorithm, page flow, etc. In code, I didn't go much beyond the login authentication logic (separate login server, passing an obfuscated token with page redirection, etc. It was interesting to understand what all these HTTP 3xx codes are about.) Well, I have witnesses for the open-platform social-network idea. When Facebook came out with their own, and I looked at their initial API, I knew they got it. (I only had the idea. I was not even close enough to any result to feel sour about it. Result is everything.) Online identity actually became less relevant because of it, imo. Had the killer app at the time used online identity, the landscape might look quite different today. Success is path dependent. It is also the network effect.

Busy at times for different reasons, and phew! 3 years have passed.

Looking forward, there are a lot more interesting problems calling (and even screaming) for solutions.

1) Phone hardware (and even OS) is certainly coming to the ripe time. The time has come for a computer in everyone's hand.

2) As more interesting web apps emerge independently, there is also a data scatter problem.

3) We are also tearing ourselves apart by letting too many irrelevant notifications interrupt us, for fear of missing an important one.

4) Alright, this is the last one, I need to throw some food to those curious minds:
I am starting to think that the last piece of the puzzle of the Turing Test is not AI (Artificial Intelligence). It is not a problem about intelligence. (Intelligence is like a cocky joke at the right moment that makes everyone laugh.) The last piece of the puzzle is something much more predictable and readily extracted from the memory we have. Someone can be very human even when he doesn't make any cocky jokes, but shares some common memory with you. You will feel closer (and find them more human) if someone reminds you that the two of you shared a feeling about an anecdote.

This year, I am working on solutions to these problems. They are not as complicated to solve as they appear, and they overlap a lot. Those are going to be blog topics this year. Stay tuned!

Tuesday, June 24, 2008

Google vs. Live Search

With English as my second language, I occasionally use “Internet Search” to check an expression that I am about to write down. I am aware that sometimes a phrase I come up with might not be the way a native speaker would express it. Sometimes, what feels a bit odd to me might be quite regular to him/her.

This time, the phrase is
“Knock on the heart”


http://www.google.com/search?q=knock+on+the+heart

http://search.live.com/results.aspx?q=knock+on+the+heart


I tried it on Google and “Microsoft’s Live Search”.
It demonstrates again that Google has superior linguistic analysis of the query (and of what is indexed) compared to Live Search.

With Google, the first link is a video that is an exact match. The other links have titles like “Knock outs sweet heart”, “Knock out my heart” and “Knock against my heart”, etc., which *mean* something similar to my input query. With Live Search, all the links are random pages with the words “Knock” and “Heart”.

Imagine it is a Turing test and I ask “show me something that has the expression of knock on the heart”.

Google, as a robot wrapped in a human-like skin, replies: I know a video on “YouTube” has the exact same title. I know Skechers has a line of shoes called “Knock Outs Sweet Heart”. I know Deidre wrote a blog entry about an Auto Show at Geneva. The title is “K.O. Cars Knock Out My Heart”.

It would be remarkable. It would be almost scary. I would say, “Wow, you’re so knowledgeable!!”

Live Search, as a pretty lady, replies, I know a joke “Knock-Knock Jokes for the young-at-heart.” I also know “Maisie is the heart of Knock Knock.” My reply? Eh? Conversation ended!

Wednesday, November 07, 2007

on Google Android

I won’t bet on Google Android yet.

Has Google convinced me that it understands the messy aspects of building a platform for third-party development? For a platform to gain momentum, it really needs some killer apps with it. The killer apps and the platform are a chicken-and-egg problem that needs to be solved at the same time. I am not saying Android cannot succeed, and Google may be cooking something right under the covers. But nothing is shown. Also, observe that those who can build a killer app are not on board. None of Sony, Nokia, RIM, MSFT, or Apple is on board.

Sun tried it with Java. It was a great platform, and Sun really knows how to write good APIs and docs. But, …

Apple's iPhone has the most proof so far. The iPhone has succeeded as a killer app, and it is inevitable that it becomes a great platform, even though Apple tried to resist becoming a platform. The iPhone even has a killer app using the Google Maps service. If Android gains any momentum, the iPhone just needs to drop its price.

Windows Mobile always has the old bag of killer apps: Pocket Word and Excel, and, most important, deep integration with Outlook (Contacts, Calendar, Corp mail). Its ability to stay in the game cannot be questioned.

RIM, Nokia and Sony are still making products that interest some segments of customers without Android. They’re probably going to stay in the game until they make very big mistakes on their own.

Monday, December 11, 2006

Switched to google blog

Labels? Not Tags? :-)

Wednesday, October 18, 2006

How can I talk to Kim?

Well, to get across the message about “portability”, first I have to suffer the lack of it.

I was trying to add a link or a trackback to Kim's blog thread on
BBAuth and OpenID move identity forward

First, it wasn't the fault of CardSpace. I sent him a message using the message post page on his site on September 20. The message was not answered and I had no way to tell if it was a problem with 2idi.com or a spam filter. (I hope he wasn't trying to ignore me. Even if I didn't ask the question in the right way, at least I think my idea was pretty original. He has got to give me credit for saying something new. I bet.)

Now, his relevant post about BBAuth reminded me to try again. The private way didn’t work. Maybe it should be a blog-to-blog discussion to begin with anyway. He would read user comments on his blog, I said.

Ar, it required another login (not the 2idi.com one that was required to send him a message). Maybe it was better, cus 2idi.com didn't work for me anyway. It was an annoying fact of life on the web without a federated identity system.

Now, trying to post a comment, I got this:
https://www.identityblog.com/wp-login.php

Alright, I found no link for creating a new account. Tried with Firefox first. It tried to fetch info for the required plugin, but didn’t suggest how to get the CardSpace plugin for it.

Alright, let's try IE then. It didn’t work. Hum, I thought maybe IE 7 would. I downloaded it, gave my trust to a Beta, and restarted my computer. (It was pretty scary indeed. The download page asked me to back up all my data before proceeding, to avoid losing it.) And, going to the site again, this was what I got:
To install Windows CardSpace, install .NET Framework Runtime 3.0.

Another download (I think .NET is a big thing), another restart?, taking another risk of losing all my data?

At least until a new system is widely adopted, all I want to say is that I wish there were an easier way to get a message across.

Questions to Kim Cameron on Identity

Kim,


I appreciate your work on identity and the way you devote it to the public.


Introduction
--------------
I have a few questions (and some scattered ideas) about CardSpace. I have briefly read most of the documents/demos/examples on your site, but other than that I am new to CardSpace.


Problem Space
------------------
I am looking at it because I am investigating how to aggregate information for the same user from multiple sites that each use different authentication. (It is a personal project that I have been working on since before joining my current company. :-)


Fixing Passport
------------------
My first question is the following:
What do you think about fixing Microsoft Passport, instead of introducing CardSpace? Please see the post on my blog:

identity-crisis


The Laws
------------
For the seven “laws” that you defined, many of them can be satisfied without the radical move from Passport to CardSpace.

For example, “User Control and Consent” can be built in, and so can “Minimal Disclosure for a Constrained Use”, “Justifiable Parties”, and “Pluralism of Operators and Technologies”.


Adoption of CardSpace
----------------------------
While I see CardSpace as a good solution in theory, I remain doubtful about its adoption, even though I am aware that Firefox and Safari demos were shown.


Accessibility that I am not willing to give up
----------------------------------------------------
I access my web email at work, and on my home desktop, laptop, cell phone, and friends’ computers. All of them run on a Microsoft platform (including my cell phone), yet I don’t foresee all of them supporting CardSpace soon enough. For example, a friend of mine still uses Windows 95, and my Windows smart phone is not upgradeable. I don’t think it is convincing to ask a user to move to a new mechanism and lose accessibility that he already enjoys.


Passport-like mechanism is not unique to Microsoft
---------------------------------------------------------------
In fact, other major portals are using a similar authentication mechanism (forward to id server, request user/pass, forward back). They’re doing so in a more controlled manner and haven't attracted as much bad publicity as Microsoft has. For example, Flickr.com uses the Yahoo id server to authenticate. I am not saying they don’t have security problems of their own. But an authentication mechanism like Passport is already there, and it is worth the effort to fix it instead of scrapping it altogether.


Spoofing and Key Trapping
---------------------------------
You mentioned a few times that spoofing is a major problem. However, the concept of having a USB drive to store my CardSpace cards concerns me much more than spoofing. How can I trust a computer (in an internet café, for example) not to steal all the CardSpace cards on my USB drive once I plug it in? If it requires a master password to open my CardSpace cards, then I need to worry about key trapping software in the internet café.

To me, the key trapping problem can be safely solved by disposable passwords like those generated by an RSA token. But CardSpace doesn’t address this problem, which is also part of the adoption problem. (Of course, the RSA token has an adoption problem of its own… because of the cost?)

What do you think about adoption?


Saturday, September 09, 2006

Flexible Rails (New Book on Ruby on Rails and Macromedia Flex)

I am very excited to relay this news from Peter Armstrong:
Flexible Rails Alpha Version Released!

This book is about using Macromedia’s Flex 2.0 and Ruby on Rails 1.1 together. The book presents the technologies as a tutorial. It gives a brief introduction and covers the entire Web 2.0 application development: front end (Flex), web tier (Rails), database and installation. It goes beyond typical tutorial books in that you actually get a working and usable application at the end.

Peter is a great friend of mine. He graduated from UVic a bit earlier than me. He went to work in the Bay Area (and back to the Northwest) a bit earlier than me. But he learned to appreciate Macallan a lot earlier than me. He is an early adopter of technologies and a very passionate software engineer.

Excellent job, Peter!

Wednesday, September 06, 2006

Reading List -- Sept 06 -- Google Research

Links in my reading list:
http://labs.google.com/papers/gfs.html
http://labs.google.com/papers/bigtable.html

Identity Crisis

A couple of weeks ago, I spent a couple of days learning about login/identity. I didn’t get around to blogging it until today.


Revisits of Microsoft Passport
-------------------------------------
Although Microsoft has almost declared Passport Network a defeat, I think it can be made useful with a few twists. I feel that what it needs most is actually not technical. First, it should be more transparent to the user about how it works. It should require the user's explicit consent before a third-party site can identify a user id, and it should allow easy modification of the allow list. It should not assume a user has only one identity. It should also get rid of the centralized data store in the MSN sites.

Otherwise, the login delegation mechanism and the ability to use the same login for multiple sites are worth keeping, at least until something else comes along.
The login delegation works like this:
1/ The user visits a site that supports login delegation. (Let’s call it the action site for now.)
2/ The site forwards the user to the login site (such as Passport).
3/ The login site shows a login page to ask the user for a password (if not yet logged in).
4/ The login site forwards the user back to the action site with some information that identifies the user as successfully logged in.
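
The four steps above can be sketched in a few lines of Java. This is only an in-memory illustration, not how Passport is implemented: the class names, the example URLs and the one-time random token are all my assumptions, and real redirects would be HTTP 3xx responses rather than returned strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch only: all class names, URLs and the token format are hypothetical.
class LoginDelegationSketch {

    /** Plays the role of the login site (Passport in the text). */
    static class LoginSite {
        private final Map<String, String> passwords = new HashMap<>();
        private final Map<String, String> issuedTokens = new HashMap<>(); // token -> user id

        void register(String user, String password) {
            passwords.put(user, password);
        }

        /** Steps 3 and 4: check the password, then build the URL that sends the user back. */
        String loginAndRedirect(String user, String password, String returnUrl) {
            if (!password.equals(passwords.get(user))) {
                return returnUrl + "?login=failed";
            }
            String token = UUID.randomUUID().toString();
            issuedTokens.put(token, user);
            return returnUrl + "?token=" + token;
        }

        /** Lets the action site trade a token for a user id; each token works only once. */
        String redeem(String token) {
            return issuedTokens.remove(token);
        }
    }

    /** Plays the role of the action site. */
    static class ActionSite {
        private final LoginSite loginSite;
        private final String homeUrl;

        ActionSite(LoginSite loginSite, String homeUrl) {
            this.loginSite = loginSite;
            this.homeUrl = homeUrl;
        }

        /** Step 2: where the action site sends a visitor who is not yet identified. */
        String redirectToLogin() {
            return "https://login.example/signin?return=" + homeUrl;
        }

        /** Handles the request coming back in step 4; returns the user id, or null. */
        String handleReturn(String url) {
            String marker = "?token=";
            int at = url.indexOf(marker);
            return at < 0 ? null : loginSite.redeem(url.substring(at + marker.length()));
        }
    }

    public static void main(String[] args) {
        LoginSite login = new LoginSite();
        login.register("alice", "secret");
        ActionSite site = new ActionSite(login, "https://action.example/home");
        System.out.println("Step 2 redirect: " + site.redirectToLogin());
        String back = login.loginAndRedirect("alice", "secret", "https://action.example/home");
        System.out.println("Identified as: " + site.handleReturn(back));
    }
}
```

The action site never sees the password; it only receives a token that it can check with the login site.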

If a user has already logged in to the Passport network with the same browser and then visits an action site (one that supports Passport), the site can ask the Passport network to identify him by forwarding him to the login site.

The user might expect to browse the site anonymously but is being identified. The problem can be fixed by letting the user decide which sites may identify him and which may not. The first time a user is forwarded by an action site, he should be shown an agreement and a warning. The user should be able to choose between “Always Allow”, “Allow for this session only”, and “Disallow”. If the user chooses Disallow, every time the action site asks for the user to be identified, the login site should forward back as if no user is logged in on that computer. The user should be able to modify his/her choice later by going to the login site directly.
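
A minimal sketch of that consent rule, under my own assumptions: the names ConsentSketch, Consent and identifyFor are hypothetical, the choices live in an in-memory map, and "session" simply means "until logOut() is called".

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: names and storage are hypothetical, not part of Passport.
class ConsentSketch {
    enum Consent { ALWAYS_ALLOW, SESSION_ONLY, DISALLOW }

    private final Map<String, Consent> choices = new HashMap<>(); // key: user + "|" + site
    private String loggedInUser; // null when nobody is logged in

    void logIn(String user) {
        loggedInUser = user;
    }

    void logOut() {
        // "Allow for this session only" ends with the session.
        choices.values().removeIf(c -> c == Consent.SESSION_ONLY);
        loggedInUser = null;
    }

    /** Records the choice the logged-in user made on the agreement page. */
    void choose(String site, Consent consent) {
        if (loggedInUser == null) {
            throw new IllegalStateException("nobody is logged in");
        }
        choices.put(loggedInUser + "|" + site, consent);
    }

    /** Returns the user id the action site may see, or null as if nobody were logged in. */
    String identifyFor(String site) {
        if (loggedInUser == null) {
            return null;
        }
        Consent c = choices.get(loggedInUser + "|" + site);
        // No choice yet is treated like Disallow: the site learns nothing until the user answers.
        if (c == null || c == Consent.DISALLOW) {
            return null;
        }
        return loggedInUser;
    }
}
```

From the action site's point of view, a disallowed user and an empty computer look exactly the same.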

It should not assume the user has only one identity. This problem is also helped by the above modification. When the user is logged in at work, an action site shouldn’t be able to identify his work id without his consent. When the same user is logged into his Hotmail account, the action site may identify him if the user approves.


New to InfoCard
--------------------
Microsoft takes InfoCard as the next step after Passport Network.

Folks in the group identified a few critical rules for identity and authentication in general: The LAWS OF IDENTITY. While all seven of them are totally critical, I don’t think they cover everything.

The omission is portability (or should I say accessibility). Without total portability, users are bound to two choices. First, not accessing the feature they want where they need it (for example, don’t check email on a public computer, don’t use a cell phone to check WAP mail); or second, using a different mechanism wherever InfoCard is not supported. The first choice is painful. The second choice… the strength of a chain is its weakest link. Either way, it doesn’t solve the original problem. It either restricts access, or leaves the user alone when he needs the security most.

It is a chicken and egg problem. The lack of portability limits adoption. And limited adoption keeps new devices from supporting the mechanism, which in turn limits portability.

The omission leaks from the laws into InfoCard, and I think it is fatal.

Very interesting conference about Grid Computing

Too bad that I have to miss it this year.

Those "Topics in Grid Management" interest me most. I hope they will make the papers/PowerPoint slides/videos available later.

Saturday, June 24, 2006

Economy of Scale vs. Scaling Economy

Sometimes, large scale can bring disadvantages. Let’s start with one of the largest clusters: Yahoo! Mail.

When Gmail came out, I realized how deeply I was integrated with Y! Mail.
- Pop mail,
- SMTP,
- web mail,
- spam filter,
- another yahoo account for spam
- disposable email address,
- address book synchronization with outlook,
- email notification to Yahoo Messenger,
- message archive,
- mobile email alert,
- stock alert,
- weather alert,
- custom email address support,
- multiple account support,
- color coded email by account,
- ads free Yahoo Plus,
- WAP Yahoo mail, and
- Support reading multiple mails using browser’s tabs.

Yes, I used all of the listed features on a daily basis.

I have a lot of sympathy for Yahoo when people mistakenly think Gmail is better. No, Gmail is years behind.

When Gmail switched from 1G of storage to 2G, it hurt Y! Mail badly. At the time, the number of Yahoo Mail users vs. Gmail’s was probably 100:1. (The ratio these days may be closer to 15:1.)

To match Gmail's average storage for each user (most users don’t use anything close to 2G), Yahoo pays 100 times more. Each additional MB for 300 million users is substantial money, both in acquisition cost for the hardware and in operational costs like electricity.

To scale to 300 million users, it takes much more than adding machines. It takes a lot of tricks and R&D to get close to linear scaling. Anything less than linear scaling costs disproportionately more as the user count grows. It makes or breaks the economics and feasibility of providing the service.

Werner Vogels [Amazon CTO] talked a lot about Dark Art, scaling and its pain.

In the famous “we [google] are a $100 billion company” financial conference, Eric Schmidt [Google CEO] said that the know-how, software and infrastructure to scale to a massive number of users is one of Google’s key strengths, and one where they are expanding their lead. However, Gmail (and Yahoo too) has periods when performance slows down badly. That drives away users, not just growth.

Then Steve Ballmer [Microsoft CEO] used an analogy about “data centers” built everywhere in the world like power stations. Later, Microsoft announced it would give up billions of future profit to build the infrastructure to compete.

The dot com race has reached the point where an idea alone is far from enough. It is also about the ability to make economic sense out of providing the service to a massive number of users. The ability to scale computing power plays a big part in determining the success of a business.

Kelowna

I took two days off. Adding a weekend and friends, I got a great 4-day trip to Kelowna, BC, Canada. Okanagan Lake was amazing, and so were the scenic wineries along it. Warm weather, clear sky, mild breeze from the lake: life is good! It was 4 hours from Vancouver. But the highway was great. Two lanes in each direction for almost the whole trip (will be). If you like Napa, you "must" give Kelowna a chance. It can become one of your favorites too.

Monday, May 29, 2006

Quest to a more efficient LockSet (volatile field, semaphore, reentrant lock)

While I am staying up late, I recall an article about an anti-sleeping pill that I read a couple of weeks ago. Somehow, I tend to have a much clearer mind late at night (I mean after I have been awake for many hours :-P). It explains why many of my blog posts were written late at night.

The feeling that I can stay up as late as I want on a Sunday is very good. A long weekend means that I can have two really long late nights of my own. Memorial Day wasn't only an extra weekend day; it also doubled the productive time for my project.

And I spent it on coding a small part of the cluster cache project: the quest to a more efficient LockSet. It actually started while I was driving 100 miles north on Friday. I was driving alone and turned off the stereo in my car to think about the problem. After about an hour of running through my 4-step synchronized block LockSet in my head, I realized the only way to get a more efficient lock set (in terms of how much synchronization I need to do) requires the ability to enter one semaphore before leaving the other.

I read the pseudocode for locks in the Gray/Reuter book. The sequence looks something like this:
1/ semaphore get on the data structure for the lock set (ie, spinlock S),
2/ find the node that represents the lock, or create one if it does not exist
3/ semaphore get on the node (ie, spinlock N)
4/ semaphore give on the lock set (ie, unlock S)
5/ acquire the lock (may block the thread)
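
As a rough illustration of that sequence, here is a small Java sketch. It is not the Gray/Reuter code: the class and method names are hypothetical, the spinlock is built on AtomicBoolean.compareAndSet, and step 5 is simplified to a polling wait on an owner field instead of a real blocking queue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: a non-reentrant lock set following the five steps; names are hypothetical.
class LockSetSketch {

    /** Tiny spinlock built on compare-and-set. */
    static final class SpinLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        void lock() {
            while (!held.compareAndSet(false, true)) {
                Thread.yield();
            }
        }

        void unlock() {
            held.set(false);
        }
    }

    static final class Node {
        final SpinLock guard = new SpinLock(); // spinlock N
        Thread owner;                          // only touched while guard is held
    }

    private final SpinLock setGuard = new SpinLock();        // spinlock S
    private final Map<String, Node> nodes = new HashMap<>(); // only touched while S is held

    void acquire(String key) {
        setGuard.lock();                                         // 1/ get S
        Node node = nodes.computeIfAbsent(key, k -> new Node()); // 2/ find or create the node
        node.guard.lock();                                       // 3/ get N before leaving S
        setGuard.unlock();                                       // 4/ give S
        while (node.owner != null) {                             // 5/ acquire (polling, not blocking)
            node.guard.unlock();
            Thread.yield();
            node.guard.lock();
        }
        node.owner = Thread.currentThread();
        node.guard.unlock();
    }

    /** Assumes the caller currently holds the lock for this key. */
    void release(String key) {
        setGuard.lock();
        Node node = nodes.get(key);
        node.guard.lock();
        setGuard.unlock();
        node.owner = null;
        node.guard.unlock();
    }

    /** Two threads increment a plain counter under the same key; returns the final count. */
    static int demo() {
        LockSetSketch locks = new LockSetSketch();
        int[] counter = {0};
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                locks.acquire("row-1");
                counter[0]++;
                locks.release("row-1");
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("final count: " + demo());
    }
}
```

The point of steps 3 and 4 is the overlap: S is released only after N is held, so no other thread can slip in between finding the node and guarding it, yet S is free again while a thread waits for the lock itself.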

It sounds simple, but it cannot be done “efficiently” with Java’s synchronized blocks.

Spinlock (compare and store) is a primitive pillar of the concurrent world that cannot be reduced. Of course, you can simulate a spinlock using a Java synchronized block. But a synchronized block is itself built on a spinlock.

Java 1.5 provided a set of new concurrent utilities. I knew that it has a lock interface that would allow me to do what the Gray/Reuter lock implementation did. So, I dug deep into it.

After digging deep into the code, I found ReentrantLock is itself pretty expensive. Semaphore is a bit lighter, because it doesn’t maintain a linked list of threads, but it still maintains more state than I need. But, in the process, I found what I wanted, the spinlock.

It is exposed in the API through AbstractQueuedSynchronizer.compareAndSetState(int, int) (the AtomicXYZ classes offer the equivalent compareAndSet()). The “spin-unlock” is achieved by AbstractQueuedSynchronizer.setState(). The javadoc didn’t mention the memory model constraints. However, the way Semaphore is implemented using those methods implies that compareAndSetState() and setState() act as a read-barrier and a write-barrier respectively.
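
For illustration, a spinlock built directly on those two methods could look like the sketch below. compareAndSetState(), setState() and getState() are the protected methods of AbstractQueuedSynchronizer; the class name, the method names and the Thread.yield() in the loop are my own assumptions, and none of the queueing machinery is used.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Sketch only: state 0 means free, 1 means held. Non-reentrant, no fairness, no blocking.
class AqsSpinLock extends AbstractQueuedSynchronizer {

    /** Spin until the state changes from 0 to 1 atomically. */
    public void spinLock() {
        while (!compareAndSetState(0, 1)) {
            Thread.yield();
        }
    }

    /** The "spin-unlock": a plain write to the volatile state field. */
    public void spinUnlock() {
        setState(0);
    }

    public boolean isHeld() {
        return getState() == 1;
    }

    public static void main(String[] args) {
        AqsSpinLock lock = new AqsSpinLock();
        lock.spinLock();
        System.out.println("held after lock: " + lock.isHeld());
        lock.spinUnlock();
        System.out.println("held after unlock: " + lock.isHeld());
    }
}
```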

I would have expected setState() to call a method in another class that is declared native. To my surprise, it simply sets the volatile field, the same field that compareAndSetState() sets using a native method.

Why does it require setting a volatile field only? I remembered volatile as read/write ordering on that specific field. But a write-barrier is a different guarantee.

It was because I rarely found a useful case for volatile fields. Because the guarantee was on the field only, you could not use it to guard other data. While I was aware of the JMM changes in Java 5, and followed the mailing list for a few good months, I didn’t pay much attention to the volatile field changes.

Brian Goetz explained it very well with his article in developerWorks.

The semantics of volatile fields were updated in Java 5 to carry a memory barrier guarantee.
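
A small sketch of what that buys you, with hypothetical names: a plain field is written before a volatile flag, and a reader that sees the flag is then guaranteed (from Java 5 on) to see the plain field too. In other words, the volatile field can now guard other data.

```java
// Sketch only: class and method names are hypothetical.
class VolatilePublish {
    private int payload;            // plain, non-volatile field
    private volatile boolean ready; // volatile flag that guards payload

    void publish(int value) {
        payload = value; // ordinary write...
        ready = true;    // ...made visible by the volatile write that follows it
    }

    int awaitPayload() {
        while (!ready) { // volatile read; spins until the writer has published
            Thread.yield();
        }
        return payload;  // guaranteed to see the value written before ready = true
    }

    /** One thread publishes 42, the calling thread waits for it and returns what it read. */
    static int demo() {
        VolatilePublish box = new VolatilePublish();
        Thread writer = new Thread(() -> box.publish(42));
        writer.start();
        return box.awaitPayload();
    }

    public static void main(String[] args) {
        System.out.println("read: " + demo());
    }
}
```

Under the older memory model, the reader could see ready as true and still read a stale payload, which is why volatile alone was of little use for guarding anything.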

Now, after this quest to a more efficient lock set, I have gained an understanding of not only how to implement it efficiently, but also what was missing in older versions of Java and why the JMM change was needed, along with a way better understanding of the JMM itself.

It is a good feeling that what I need is already created by someone else before I need it.

Now, I think I know JMM very well, come challenge me with tough questions! :-P

Wednesday, May 24, 2006

Good load test to expose the vulnerabilities of the cache

In response to a user question on load testing, I think it is worth a blog post by itself. Those are excellent questions. :-)

Coherence is a pretty mature product. I would think it should work pretty well for the read-only cases.

I can think of 4 areas where the cached system can choke:
a/ network load for cache synchronization,
b/ cpu load for cache management and synchronization,
c/ cpu load for doing deserialization of your data,
d/ and database access

Depending on the way the application accesses the data, the system might still choke on the last two before the cache management overhead becomes a problem.



Database might still be the bottleneck
--------------------------------------
For example, if I have an application that needs to scale to a high number of users who don’t share much data among them (an HR application where most users care mostly about their own data), I would want to watch the CPU, file I/O, and network utilization of the database as I add more machines to the cache cluster, especially if it is a single database (or a cluster of databases) that all machines connect to. It is good to do a little projection of how many cache machines the single database can support.

Deserialization
---------------
If I have an application where most machines share the same set of data, then I would watch the time that each machine spends on deserialization. If each machine requests the same cached entry, the data will be sent over the wire and deserialized on each machine. The time spent on deserialization might be significant. I am not sure whether Coherence’s near-cache holds objects or the serialized form. It would be good to check. Even if the near-cache keeps objects, with moderate changes to the data you might still see quite a bit of deserialization, because the cache entries will need to be re-fetched.

Really large cluster
--------------------
If you have a really large cluster (say 64 machines or more), then you might need to profile the first two as well. The overhead is believed to be small, but the total time spent on communication is at least on the order of O(n^2), where n is the number of machines. Overhead that is unnoticeable with 4 machines might show up as significant when you have 64, for example.
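
To put rough numbers on the O(n^2) claim, here is a back-of-the-envelope sketch. It assumes the worst case where every machine talks to every other machine, so the count of communicating pairs is n(n-1)/2; whether a given cache product really synchronizes that way is something to measure, not something this sketch shows.

```java
// Sketch only: counts machine pairs under an assumed all-to-all communication pattern.
class PairwiseOverhead {

    /** Number of distinct machine pairs in a cluster of the given size: n(n-1)/2. */
    static long pairs(int machines) {
        return (long) machines * (machines - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 16, 64}) {
            System.out.println(n + " machines -> " + pairs(n) + " pairs");
        }
    }
}
```

Going from 4 to 64 machines is 16 times the hardware, but 6 pairs become 2016 pairs, which is 336 times the pairwise chatter.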

Tuesday, May 23, 2006

Economy of Scaling

Echo everywhere.

Friendster as a counter example.

Google's financial statement. Names scaling as its core strength.

Amazon Werner's interview. Scaling economically.

Microsoft incentive. Putting data centers everywhere. The missing $2 billion.

Yahoo mail. 2GB of storage. 10 times the user.

Monday, May 08, 2006

Questions about Context Switch in VM

Blogging is often about opinions, solutions, and feedback. But, what if I have questions?

While watching the MySQL video, I was surprised when Stewart Smith said the storage node daemon runs on a single thread. It has its own context switching that is more efficient than using threads from the OS.

Speaking of OS context switching not working best in some situations, I think of another one: when the OS runs inside a VM like VMware. Even when the primary OS is mainly idle, the guest VM is still not very responsive.

Could it be because we have too many context switches happening in the primary OS, and they make context switches in the guest OS happen at bad times?

What are VM systems doing to help this situation? Will we have a configuration flag for Linux (or another OS) to let the OS context switch differently when it is a guest? (Of course, the guest machine is not supposed to know it is a guest, unless you configure it as such.)

MySQL Cluster

Relay the news from Ramblings. :-)

Saturday, April 29, 2006

Relational and jCache Model's differences

Much of Cameron’s presentation also echoes my experience (in the Castor JDO cache, the cluster work at the BPM company, the recent work I do on distributed cache, and even my reading of the Transaction Processing book that I have mentioned a few times).

But he reminded me of an old problem I dealt with in Castor JDO. The cache [and lock] was local, but we were trying to respect isolation such that if another machine made an incompatible change, data integrity would not be compromised and the transaction would roll back instead. After years, I now understand the problem better; I know that I didn’t achieve it. To be specific, I didn’t achieve the Serializable level of isolation (the one that prevents phantom reads).

Let's use class registration as an example. A student is allowed to add as many as 18 credits for a quarter. So, we run one query first, and insert only if that query returns a result that meets our rule. First,
SELECT sum(credits) FROM student_course_table WHERE student=? AND quarter=this
Now, if the sum returned by the query plus the credits of the new course is not more than 18, we let the new course be added.

In this case, we either disallow other thread to insert another course, or, we want to cause this transaction to fail.
The solution is pretty hard to implement efficiently (to allow parallelism). Because we read a range of value to get the result, we need to lock more than the just new row to insert, to ensure result is correct. So, we need lock set.
  • 1/ A simple solution is for every read to also hold a shared lock on the table and the item. If an insert is issued, the table lock is upgraded to an exclusive lock.

  • 2/ A more efficient implementation is for readers to hold IS (intent share) or IX (intent exclusive) locks on the table.

  • 3/ More efficient still is to use IS or IX predicate locks (locks on a range).


    Cameron didn’t mention lock sets with Coherence. And I thought the only way to get isolation right was to use a lock set. So, I had a discussion with him.

    It turned out that the problem spaces are different. jCache uses get() and put(), which exempts it from caring about the inter-dependencies between data items. So, we don’t need a lock set. The specification is different.

    So, does it mean the jCache model is easier? Not necessarily. They are difficult in different ways. Cameron explained to me why even locks cannot be guaranteed to be enough (because of out-of-order messages and the lack of absolute time). On the other hand, a database has a log (journal) that essentially defines the absolute time (or order of events).

    However, ORM product designers should be aware of the differences between the relational model and the jCache model when they use jCache to scale out, especially if the Serializable isolation level is desired. One way is to pick (or let the user pick) the right level of granularity. In the course registration example, choosing the student as the lock and the relationship as dependent objects will work (assuming courses are stable). But in some cases, these are difficult problems that require analysis of the trade-offs.
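
A minimal sketch of the granularity idea above, assuming a single JVM and an in-memory map standing in for the cache or table. The names `Registrar`, `tryAdd`, `totalCredits`, and `MAX_CREDITS` are hypothetical and only serve the illustration. Holding one lock per student makes the credit-sum read and the insert a single atomic step, so two concurrent adds for the same student cannot both pass the 18-credit check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration: the student is the unit of locking, and the
// student's course list is the dependent data guarded by that lock.
class Registrar {
    static final int MAX_CREDITS = 18;

    // student id -> credits of each course registered for the quarter
    private final Map<String, List<Integer>> coursesByStudent = new ConcurrentHashMap<>();

    /** Adds a course only if the student's total stays within MAX_CREDITS. */
    boolean tryAdd(String student, int credits) {
        List<Integer> courses = coursesByStudent.computeIfAbsent(student, s -> new ArrayList<>());
        // Lock at student granularity: the sum (read) and the add (insert)
        // happen under the same lock, so no other thread can slip in between.
        synchronized (courses) {
            int sum = 0;
            for (int c : courses) {
                sum += c;
            }
            if (sum + credits > MAX_CREDITS) {
                return false; // the rule would be violated, so reject the insert
            }
            courses.add(credits);
            return true;
        }
    }

    /** Returns the credits currently registered for the student (0 if none). */
    int totalCredits(String student) {
        List<Integer> courses = coursesByStudent.getOrDefault(student, new ArrayList<>());
        synchronized (courses) {
            int sum = 0;
            for (int c : courses) {
                sum += c;
            }
            return sum;
        }
    }
}
```

With this sketch, adding 10 and then 8 credits for one student succeeds, and a further 1-credit add is rejected. Different students never contend with each other, which is the parallelism a table-level lock would give up.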


    Distributed Caching: Essential Lessons

    Deadline was looming. My most productive (day)time in a week is often Wednesday and Thursday late afternoon. Fremont’s Peet’s Coffee was giving out free coffee to celebrate the one-year anniversary of the store, and I was a bit over-caffeinated. :-P Tempted to get more work done, I almost skipped the ISTA meeting. It was a talk about distributed caching by Cameron Purdy from Tangosol.

    Glad that I was there! Besides scaring me with a poor joke at the beginning, he gave a great talk. (I can no longer remember the joke.)

    I remember his presentation as four parts.
  • 1/ Introduction of himself, the company, the problem space, and the product name (i.e., what “coherence cache” means technically).

  • 2/ The evolution (what, how, why) of the distributed cache (from replication cache, partition, failover cache, local cache, standalone cache server, to write-behind cache).

  • 3/ Highlights of technical challenges in the product implementation (a finite state machine to model the communication; edge cases in the partition local cache; the gap between loading the data and propagating it to the cache; no absolute time in a distributed system; network constraints: 12 ms latency, out-of-order delivery; cannot be proved correct, but can't be found incorrect; nodes leaving the cluster, etc.)

  • 4/ Cluster system design guidelines (13 of them: [Java] serialization/externalization, identity, defining equals, idempotency, etc.)

    With his great wealth of experience with real-life systems and full knowledge of the product since the beginning, the talk was very interesting. (And no one fell from his/her chair even though the talk ran pretty long. :-)


    Monday, April 24, 2006

    Summary of On Clustering Articles

    A summary of my previous blog entries on clustering:

    High-volume computing (On Clustering Part I)

    Cluster (On Clustering Part II)

    Database Driven and Entity Tier (On Clustering Part III of VII)

    Stateful Session (On Clustering Part IV of VII)

    Cache (On Clustering Part V of VII)

    In Depth look at Data-Driven Cluster (On Clustering Part VI of VII)

    Future (On Cluster Part VIII of VII :-)


    In Depth look at Data-Driven Cluster (On Clustering Part VI of VII)

    In Depth look at Data-Driven Cluster
    ------------------------------------
    Let’s focus on the scalability of data-driven applications. The demand for scaling this kind of application is increasing, but solutions remain expensive.

    To understand why scaling out this kind of application is challenging, let's start with a nominal view of data operations and categorize them into two kinds: read and update (including create and remove). A system's performance (P) can be represented as the sum of the rates of Read (V) and Update (U):
    P = V + U

    Ideally, we would like the performance (P[n]) of a system to grow linearly as the number (n) of machines increases:
    P[n] = nP = n(V + U) -- ideally

    However, this is not possible. To ensure data integrity, each update must be propagated to all machines, so that the next relevant read operation obtains the newest values.

    Now, consider a two-machine cluster. Its performance is theoretically limited to
    P[2] = 2P - 2U

    This is because, for every data update on one machine, the other machine needs to be updated as well. It is the penalty we pay for scaling.

    The Equation
    ------------
    Similarly, for n machines, the (simplified) performance can be defined as
    P[n] = nP - n(n-1)U

    For a small U (close to zero), scaling can be very linear. With load balancers, round-robin DNS, and co-located data replication, we do achieve very linear scalability for read-only data in the real world.

    However, for a larger U, the performance peaks quickly. The penalty grows on the order of bigO(n^2).

    (Note that the equation above is simplified: as more capacity is spent on updates, we actually have less left for reads as well. The proper equation rebalances the V:U ratio, and the actual penalty is slightly lower for larger n, so that the performance never becomes negative.)
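
To see how quickly the penalty term takes over, the simplified equation can be evaluated directly. The sketch below only implements the simplified P[n] = nP - n(n-1)U from the text, not the rebalanced version mentioned in the note. The class and method names, and the sample rates (99 reads and 1 update per second per machine), are assumptions chosen for illustration.

```java
// Evaluates the simplified model from the text: P[n] = nP - n(n-1)U, with P = V + U.
class ScalingModel {
    /** Simplified cluster throughput for n machines. */
    static double clusterPerformance(int n, double readRate, double updateRate) {
        double p = readRate + updateRate;                  // P = V + U
        return n * p - (double) n * (n - 1) * updateRate;  // nP - n(n-1)U
    }

    /** Machine count (searched from 1 up to maxN) at which the simplified model peaks. */
    static int bestClusterSize(int maxN, double readRate, double updateRate) {
        int best = 1;
        for (int n = 2; n <= maxN; n++) {
            if (clusterPerformance(n, readRate, updateRate)
                    > clusterPerformance(best, readRate, updateRate)) {
                best = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double v = 99.0; // assumed sample read rate per machine
        double u = 1.0;  // assumed sample update rate per machine
        for (int n : new int[] {1, 2, 10, 50, 100}) {
            System.out.printf("n=%d  P[n]=%.0f%n", n, clusterPerformance(n, v, u));
        }
        System.out.println("peak at n=" + bestClusterSize(200, v, u));
    }
}
```

With these sample rates only 1% of the operations are updates, yet the model peaks at 50 machines, and at 100 machines it is back to the throughput of a single machine.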

    IP Multicast
    ------------
    Some may be tempted to think that using IP multicast will eliminate the bigO(n^2) performance hit. It does not. Even if multicast is used, each machine receiving an update packet from another machine needs to update its own copy of the data. Consider a cluster of 100 machines, and assume each machine makes 1 update per second. Every second, each of the 100 machines sends out 1 multicast about its own update, and receives and processes 99 multicasts from the others (100 x 99 = 9,900 updates processed across the cluster). The bigO(n^2) term doesn't go away.

    Reduce U
    --------
    Multicast does, however, reduce U, in some cases significantly. Similarly, there are other fancy techniques that reduce U but do not eliminate the bigO(n^2) term. These techniques include data invalidation (instead of a full update on each node), lock tables, voting, and centralized updates. All of these techniques are important, but each comes with its own trade-offs. For example, centralized updates basically force us to rely on a single massive machine (the very thing we want to replace with commodity hardware).

    Not Just Data Replication
    -------------------------
    The equation might appear to apply only to data replication setups. That is not true. Invalidating data often means we need to read all data from a centralized machine. In that case, we are just pushing the updates and the scaling problem onto a single machine, which is the opposite of the goal of scaling out.

    Ad-hoc Cluster Cache
    --------------------
    Caching might appear to relieve the single-machine problem for non-replicated setups. However, the same cannot be said for clustering. Applying the invalidation technique to a non-cluster-aware cache (some might call it a clustered cache) works for small numbers of machines and low frequencies of updates. When either or both values get large, the hit rate of the cache quickly approaches zero, because many more machines are trying to invalidate the cache, which leaves the cache empty most of the time. (Of course, a true clustered cache design is aware of this problem and tries to do better.)

    Reduce N
    --------
    If the bigO(n^2) term cannot be reduced, the next best thing is to reduce n. In fact, it is an important consideration in real-life tuning. To reduce n, we want to spin out any read that is not relevant to the data application. To ensure a serializable level of data integrity, we need to keep track of relevant reads to avoid read->write dependencies (see Chapter 7.6 of Gray's book), or use exclusive locks on tables. This makes the performance penalty of spinning out relevant data very high. Only data that has no dependencies on other data can be spun out.

    Two the Parallel
    ----------------
    If a parallel set of data can be isolated, we can run two cluster systems instead of one. For example, if the two sets are about equally intensive, we will get
    2 x P[n/2] = 2[(n/2)P - (n/2)(n/2 - 1)U] = nP - n(n/2 - 1)U

    The penalty term drops from n(n-1)U to n(n/2 - 1)U, which is less than half of the original. This approach does not apply to all data. It also takes symmetry out of the system, which increases design and administration complexity and cost.

    Partition
    ---------
    Similarly, we might be able to partition the data to reduce n as well. Data can be partitioned by data range, hash code, lookup table, or another algorithm. This approach also depends on the data schema, and it increases administration complexity and cost. For instance, it might require a periodic administrative task to load-balance between the partitions (or require other software).
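
As an illustration of the hash code option, here is a minimal sketch that routes each key to one of a fixed number of partitions. `HashPartitioner` and `partitionFor` are hypothetical names, and the sketch assumes keys with stable `hashCode()` values (such as `String` or `Integer`).

```java
// Hypothetical sketch: route each key to one of a fixed number of partitions by hash code.
class HashPartitioner {
    private final int partitions;

    HashPartitioner(int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        this.partitions = partitions;
    }

    /** Returns a partition index in [0, partitions). */
    int partitionFor(Object key) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions);
    }
}
```

The same key always lands on the same partition, so each partition can run as its own smaller cluster. Changing the number of partitions remaps most keys, which is one reason the periodic rebalancing task mentioned above exists.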

    Isolation
    ---------
    The ideas of division and partition are the same: to exploit parallelism. The concept of exploiting parallelism can go even further: much further. In some cases, it can be automated, with some restrictions that are acceptable to most applications. I will share some of them in a later post.

    Execution
    ---------
    These are not silver bullets, either. But putting them together helps. I tend to think that good-enough solutions have already been discovered. The challenging problem of scaling data-driven applications is awaiting cost-effective implementations. The execution matters!


    Wednesday, April 12, 2006

    Do not format your harddrive

    I have been reading a book about "information theory". This idea came to my mind:

    Do not format your harddrive,
    because erasing memory always increases entropy,
    and increasing entropy is a bad thing.


    Tuesday, March 14, 2006

    Java / Tomcat / Virtualization

    I was talking about Virtualized Linux/BSD distribution with Java and Tomcat.

    And, I am glad to discover that it is there.

    I noticed eApps.com before I wrote the previous blog entry in December. However, only recently did they reach a price point that is very interesting: $20 a month.

    For that price, I got my own virtual server, set up to run as many domain names as I want. To my surprise, it meets almost all my criteria. The HD footprint was about 63M with core Linux, JDK and JRE, iptables, Tomcat, mail, ftp, MySQL, ssh, and various other software. Additional software can be installed by simply checking a checkbox. They also provide the XFree86 X11 libraries. I verified that the JRE is able to use them: I was able to run a Swing app on the virtual server and display it on my home desktop.

    The performance is rather unacceptable for interactive UI applications, but I don’t expect to need any kind of UI performance from a hosted server anyway. It is also slow for shell or ftp operations. However, it seems to work reasonably well for serving web pages.

    The management software of eApps is provided by SWsoft. The control panel, HSPComplete, is very intuitive. The virtual server infrastructure is obviously also licensed from SWsoft.

    eApps rocks! In terms of features, it certainly beat my expectations. Highly recommended!

    [Update June 24, 06] I upgraded to the $30 plan, and the performance of eApps is getting pretty good. I am not sure whether it is because of the plan upgrade or because of their ongoing performance improvements in general.



    Monday, March 13, 2006

    Moved!

    I am dedicating this blog to Clustering, Grid Computing and Virtualization.


    I am moving this blog to its own domain: TheBigGrid.com.


    Please update your link and site feed!

    Wednesday, March 01, 2006

    HSQL and Memory Database

    The quest continues on HSQL.

    Creating a new database was a snap. I simply obtained a connection by passing in the JDBC string with the filename appended at the end (i.e., jdbc:hsqldb:file:db). The database files were created relative to the "working directory".

    I wrote the database table schema in SQL. The SQL syntax was pretty standard. It was also nice that it supports IDENTITY on a field, for a sequence-generated primary key.

    The doc explains clearly how to use its SQL tool interactively or non-interactively. Tables were created without any problem. After restarting, the non-interactive sqltool still saw the tables I had created.

    However, when it came to the next step, creating sample data, the behaviour was unexpected. The rows I inserted disappeared after a restart. The table was created as "MEMORY" type by default. According to the doc, that should be fine: the whole data set is kept in memory but, unlike the TEMP type, the data is persisted and reconstructed when the database is restarted. However, when I launched the sqltool again, a select all found no rows. Switching the table to CACHED or TEXT type didn't help. After a couple of hours, I still hadn't got any further.

    OK, with a tight schedule, the next in line would be Derby from Apache.


    HSQL and Read Uncommitted

    As a database person, my second step is to design the database schema. I am going to push as much as I can to the database. The JSP will be a thin presentation layer that presents different views of the database.

    The initial deployment will not require high volume. But I expect it will in the future. After a brief sketch of the ER diagram, I started picking a test database.

    I came across HSQL, the Java embedded database, a few times when I was working on Castor JDO. So, it was my first choice. The User Guide is very good: clean, direct, and straightforward, without any marketing talk. It is a small download and it claims good performance.

    As I was reading it, I found myself a little disappointed with the transaction support. It supports only the READ_UNCOMMITTED level, level 0, which means that users who want to guarantee data integrity have to access data in a more restricted way or do quite a lot of work themselves.

    Whether it matters depends on the application. Ensuring data integrity with weak isolation from the database can be quite challenging as the complexity of the database schema grows. However, stronger isolation incurs a performance disadvantage and decreases concurrency. It might not be feasible for a high-volume app.

    I would start with stronger isolation first and have the database do more work for me. I would also want the flexibility to tune by changing the isolation level. So, I know I will need to switch databases before my web app is deployed to the real world.


    Web Project

    After years of working on very backend components (O/R mapper, clustering framework), I recently started a web project of my own.

    Of course, I picked the Java camp. JSP/Servlet, Tomcat, and Eclipse WTP are my technologies, container, and developer tool. I expect to incorporate other J2EE technologies later. But it is always good to start simple, with only what you need.

    I first coded up some login logic. For my application, it is important to roll my own. I am going to reuse authentication done by others, such as Microsoft Passport, AOL Screen Name, or Yahoo, when it is available.

    My main finding with the login logic is that I need redirection a lot. I learned the differences between a few of the HTTP 30x responses. I hit a bug in IE: if it is redirected by HTTP 302 (moved temporarily), and then redirected back (with user input) by HTTP 301 (moved permanently), the URL in the “Address” bar is not updated correctly and still points to the temporary address.

    Yahoo’s own login doesn’t seem to suffer from the problem. I need to learn its workaround. :-)


    Wednesday, December 21, 2005

    Virtualized Linux/BSD distribution with Java and Tomcat

    I have been searching for a Linux/BSD distribution with Java and Tomcat support that is suitable for virtualization. I spent a few weekends on it (over the last few months), but I haven't found any that suits the task.

    Both the installation and the runtime memory footprint should be very small, so that as many instances as possible can fit into the same machine, and a VM instance can be activated or passivated quickly.

    1/ The kernel should boot really fast (less than a minute on a P2-500MHz class machine),
    2/ It loads as few drivers as possible,
    3/ A firewall is optional but welcome,
    4/ A few file transfer protocols and SSH are essential,
    5/ Support for popular SAN and NAS clients,
    6/ To qualify as full Java support, it must also be able to support Swing (so it requires XWin of some sort). Ideally, the distribution has only the XWin client, but not the server part, to save space,
    7/ A total installation of 100MB with JDK 1.5.x and Tomcat 5.5.x would be ideal,
    8/ Ant, CVS client, and SVN client support (to obtain source or binaries for app deployment),
    9/ Kernel working set footprint of 32MB or less,
    10/ Out-of-the-box Java support, Tomcat (and even an open source JMS), and Type 4 JDBC drivers for popular databases.

    BEA JRockit's BareMetal sounds very interesting in that respect. However, very little information has been released, and it is hard to guess its availability. I think I had better put my hope in a Linux/BSD distribution with Java, at least for now.

    I think such a distribution, if available, would be an enabling technology that can change the game: it would make Java much more popular compared with PHP, Ruby, etc. Java has been focused on scaling big, and it has been successful at it. But it is losing ground as the development platform for weekend projects. It really shouldn’t be. Most projects start small. Simple projects are the ground for the bigger ones. Java hosting has always been limited and a few years behind in terms of availability, features, and price. The hosting offerings are even worse than those for .Net, which only became suitable for web programming a few years after Java/Servlet became popular.

    I don’t think the demand for Java hosting is low to begin with. The uncompetitive hosting options reflect that a Java system is hard to maintain cheaply. Indeed, individuals, corporations, and system administrators face the same problem. Because of it, the Java hosting market has never matured.

    I believe that when such a Java distribution is available, together with VMware (Microsoft Virtual Server, or XenSource), the game will change in favor of Java.

    I tried to resist it, and I often prefer writing code to doing integration. But maybe it is time to roll my own Linux/BSD distribution. I am reading up on the T2 Project and the Debian Developers' Corner.


    Thursday, December 15, 2005

    Visa Gift Card

    I saw a banner ad on a news site for “Visa Gift Card” a few days ago. O yeah. It was a neat idea. Why didn't they come up with this idea earlier?

    The answer probably goes back fifteen years. At the time, the majority of merchants used physical devices to make an imprint of a client’s credit card in order to charge it. The process didn’t involve electronics at all. It was probably a few days later that the physical imprint was sent to the bank for deposit and verification. A merchant might call in to verify a card, but they could not always do it. I once saw a cashier actually check a client's card number against a thick book with thousands of counterfeit numbers to protect the store against fraud. Even just a few years ago, credit card companies still made you pay a big penalty if you spent beyond your limit.

    They certainly can’t make the purchaser of a Visa gift card pay a penalty for going over the limit, or no one would buy it. The arrival of the Visa gift card signifies that the physical imprint device is totally obsolete. Chick-Cuck.

    Links on Clustering Design Docs


    Oracle Cluster File System


    Oracle released a cluster file system implementation for Linux as an open source project in late 2003. Its design document unveils many typical clustering concerns and solutions. Oracle Cluster File System Design

    Among all, I found the file system header design most interesting:


    OCFS assumes a shared storage architecture to host a database in the same cluster. The file header is the data structure that nodes use to determine which chunk of data to access, to perform isAlive checks, to do voting between nodes, etc.




    MySQL Cluster Architecture


    MySQL has a MySQL Cluster Architecture Overview document on its site (requires your email). It has an interesting separation of Data nodes and Server nodes. Each machine is assumed to have its own storage. The Data nodes keep as much information in memory as possible, and communicate with each other over the network. It looks appropriate to be tailored to the "Database is the app server" model.








    Monday, December 05, 2005

    Database is the App Server?

    A few weeks ago, at ISC2005 (Supercomputer Conference), Bill Gates mentioned his vision of Grid computing. According to news.com, his vision was to bring the computation closer to the data. The article didn’t mention how and why. Google didn’t yield much else on Gates’s speech.

    Even though I didn’t know more about Bill’s version of data grid, I tended to agree.

    Sun’s Grid
    ----------
    For example, Sun’s current Utility offering ($1/cpu hour) is rather limiting. It is only suitable for low-I/O, computation-intensive applications. It rules out most applications that require a database, which most enterprise applications and research analyses do. There is no option to rent long-term storage, such as a SAN, that is local to the grid. Does the fact that the machines are rented mean the software must be reinstalled every time? What if I want to form a cluster with a lot of machines? What speed can I expect from the inter-machine connection? Will they share the same LAN (switch and router)? Is the network shared with computers that other people rent? In fact, the white paper I read a few weeks ago on Sun’s site suggested something about secure connections to and from your company and didn’t even mention clustering. It worried me.

    It is true that owning and maintaining machines is expensive and a large capital investment. However, Sun’s added value is limited to leasing and maintaining the physical hardware and the lower-level OS, which is hardly a big part of the TCO. The simplistic view of computation power, the limitations of remote administration (bandwidth, for example), and the temporary nature of renting sound like they add a lot to the system maintenance cost. Sun and Jonathan simply need to come up with a more convincing story.

    EGA
    ---
    In contrast to Sun's current offering, Enterprise Grid Alliance's "Reference Model" better captures the complexity of what is required to make Grid a reality for the enterprise. (To be fair, Sun is also on board. The current offering is weak on its own and doesn't necessarily capture Sun's vision for the future.)

    Data Grid
    ---------
    Now, back to Gates’ vision of data grid. Over the weekends, I read a few articles by Jim Gray, the authority on Transaction Processing who now works for Microsoft Research. They reveal what had gone into Gates’ mind.

    Distributed Computing Economics by Jim Gray.
    And,
    A Call to Arms -- Avalanche of Information by Jim Gray and Mark Compton.

    Active Database
    ---------------
    My hobby of implementing distributed locks and caches also made me aware of how hard it is to ensure data integrity all the way up to the phantom level. Together with Jim’s articles, this means my vision of future high-volume enterprise computing calls for modification. Maybe the database will take a much more active role: applications live inside a database, instead of being split into different tiers. It is a dangerous thought.

    I am also surprised that it is Jim Gray from Microsoft who has this vision, instead of marketing from Oracle. Oracle has been an active advocate of database triggers; it has put a JVM into the database since the early days, and added CLI to it recently. But if Jim Gray represents the unified vision of Microsoft, that vision is more database-centric than anyone else's.


    Sunday, December 04, 2005

    IS, IX and SIX

    Deadlock and IS, IX and SIX
    ---------------------------
    Occasionally, I hit a deadlock when developing a database application. When I enter the Oracle error code, a page about Oracle lock modes comes up: IS, IX, SIX, S, and X. Most people recognize S as Share and X as eXclusive. They map well to read and write locks.
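
For reference, the five modes can be written down as a compatibility table. The sketch below encodes the standard multi-granularity compatibility matrix from the transaction processing literature. It is not taken from Oracle's documentation and does not model any Oracle-specific behavior. The `LockMode` enum and `isCompatibleWith` method are hypothetical names.

```java
// Standard multi-granularity lock compatibility matrix (textbook form).
// true means the two modes can be granted to different transactions at once.
enum LockMode {
    IS, IX, S, SIX, X;

    // Rows and columns follow the declaration order above.
    private static final boolean[][] COMPATIBLE = {
        //            IS     IX     S      SIX    X
        /* IS  */ { true,  true,  true,  true,  false },
        /* IX  */ { true,  true,  false, false, false },
        /* S   */ { true,  false, true,  false, false },
        /* SIX */ { true,  false, false, false, false },
        /* X   */ { false, false, false, false, false },
    };

    /** Returns whether this mode can coexist with a lock already granted in the other mode. */
    boolean isCompatibleWith(LockMode other) {
        return COMPATIBLE[ordinal()][other.ordinal()];
    }
}
```

The intent modes are what let two writers that touch different rows share a table: IX is compatible with IX, while S on the whole table is not compatible with IX.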

    LockSet
    -------
    On other occasions, I developed an in-memory lock set. Read/write locks, and even update locks, are really easy, and I used them as a starting block. On the other hand, maintaining a set is more difficult to do efficiently. The main difficulty lies in obtaining the specified read/write lock struct from the lock set. If the specified lock doesn’t exist, a new struct representing an individual lock needs to be added to the lock set. Two threads trying to acquire the same lock must resolve to the same struct instance. So, obtaining a lock struct from the lock set must be guarded by a semaphore S(t). After a thread obtains the lock from the set, it then tries to acquire the lock. If the thread is acquiring the lock in a mode that conflicts with what has been granted to another thread, it waits on the lock. The acquiring is protected by another semaphore S(r) to allow concurrency. In this way, threads acquiring different locks wait on different semaphores. Similarly, when a thread is finished with the lock, it goes into S(t) again to see if the lock can be removed from the set. Based on this thinking, I developed this algorithm (of course, the actual code looks different):

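    Lock lock;
    synchronized (lockSet) {
        // Resolve the id to a single shared Lock instance, creating it if needed.
        lock = lockSet.get(id);
        if (lock == null) {
            lock = new Lock();
            lockSet.put(id, lock);
        }
        // Count every thread that is about to use this lock, not only its creator.
        lock.incrementVisitor();
    }
    synchronized (lock) {
        lock.acquire(id, mode);
    }
    synchronized (lockSet) {
        lock.decrementVisitor();
        boolean free = false;
        if (lock.hasNoVisitor()) {
            synchronized (lock) {
                free = lock.isFree();
            }
        }
        // Remove the entry only when nobody is visiting it or holding it.
        if (free) {
            lockSet.remove(id);
        }
    }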

    I believe this is working code. However, it takes 4 synchronized blocks to achieve it. This is pretty inefficient: there must be a better way.
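
One possible tighter variant is sketched below. It is not the code described above, and it assumes Java 8 or later, since it relies on `ConcurrentHashMap.compute`, which runs atomically per key. Each entry carries a visitor count covering both waiting and holding threads, so find-or-create and remove-if-unused each collapse into a single atomic map operation. `LockSet`, `Entry`, and the boolean `exclusive` flag are hypothetical names, and a `ReentrantReadWriteLock` stands in for the read/write lock struct.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: a lock set whose entries are created on first use and
// removed when the last interested thread is done with them.
class LockSet<K> {
    private static final class Entry {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        int visitors; // only mutated inside compute(), which is atomic per key
    }

    private final ConcurrentHashMap<K, Entry> entries = new ConcurrentHashMap<>();

    /** Blocks until the lock for id is granted in the requested mode. */
    void acquire(K id, boolean exclusive) {
        // Atomically find or create the entry and register this thread as a visitor.
        Entry e = entries.compute(id, (k, old) -> {
            Entry cur = (old == null) ? new Entry() : old;
            cur.visitors++;
            return cur;
        });
        // Waiting happens outside the map operation, so other ids are not blocked.
        if (exclusive) {
            e.lock.writeLock().lock();
        } else {
            e.lock.readLock().lock();
        }
    }

    /** Releases the lock and drops the entry once no thread holds or waits on it. */
    void release(K id, boolean exclusive) {
        entries.compute(id, (k, cur) -> {
            if (cur == null) {
                throw new IllegalStateException("not locked: " + k);
            }
            if (exclusive) {
                cur.lock.writeLock().unlock();
            } else {
                cur.lock.readLock().unlock();
            }
            cur.visitors--;
            return cur.visitors == 0 ? null : cur; // returning null removes the mapping
        });
    }

    /** Number of lock entries currently kept in the set. */
    int size() {
        return entries.size();
    }
}
```

A caller pairs `acquire(id, mode)` with `release(id, mode)` from the same thread, typically in a try/finally. The design choice is to count a thread as a visitor for the whole time it waits on or holds the lock, which is what makes the separate "is it free" check unnecessary.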




    Tuesday, November 29, 2005

    Microsoft always thought the hardware should be free. Intel always thought the software should be free...

    Economy is interesting. Consumers seek cheaper and better products. Producers seek bigger markets.

    Barney Pell made a “meeting minutes” on his blog. Sam Jadallah made an interesting quote: “I talked to CEO of GM, trying to get media inside the car. He said: 'at $250/month of ads or services, I can actually give you a car'. That's believable! These devices, phones, cars, are just vehicles for delivering advertising.”

    Car manufacturers and dealers have been the big advertisers. They account for virtually a third of TV ads at peak hours. Now if GM's CEO “thinks” it is better off competing with Google for the ad dollars, who is going to buy the ads? Maybe Toyota? Or, maybe Google?


    Monday, November 28, 2005

    Future (On Cluster Part VIII of VII :-)

    The future of high-volume computing lies in flexibility and integration. Grid computing and virtualization will come into play.

    The utility computing model today is very limited. While Sun is trying very hard to convince people that computing power can be purchased as easily as electricity, its offering is based on a much simplified model. The current model only applies to computing-intensive tasks, such as simulation and data analysis.

    It is not a suitable model for most enterprise applications, which generate very high network IO. It is critical for these applications to stay close to the data. The amount of data is also so huge that it cannot be moved easily. Those applications might rely on a large array of other applications or supporting systems. Security concerns also make it more feasible for those applications and supporting systems to be put behind the same set of corporate firewalls for access control. Once network and disk IO performance, backup, large amounts of configuration, security, and data have to be synced between a company and a computing power provider, it is no longer as simple as measuring megawatts of electric power. With these constraints, the leasing model simply doesn’t make sense.

    Grid computing and virtualization that remain inside the corporate firewall will result in very significant savings, without the cost and disadvantages of the utility model. Virtualization products like VMware provide a lot of flexibility. Each service can be set up in its own virtual machine. Depending on demand, services can be moved into or out of a single physical machine, suspended, or activated. The three server tiers of a four-tier system (Web Server, Application (business logic) Server, and Database Server) can begin in one physical machine with 3 VM instances. In fact, multiple four-tier systems serving different departments can begin in one physical machine. As traffic increases, the most heavily loaded tier can be moved out to live in its own machine. A service can also occupy more machines during peak hours. The application can shrink to fewer machines during off-peak hours, leaving more machines in the Grid pool for offline reporting and data analysis. (VMware has the functionality of moving a VM instance from one machine to another in real time, with less than 5 seconds of downtime, even with a heavy system like Microsoft SQL Server running and constantly taking more than 50% of CPU time. The machine to which the VM moves can also have a different configuration. It is wickedly cool. Check it out at your next trade show.)

    Today, Oracle already claims to support Grid computing in the database tier (previously known as Oracle Parallel Server). Machines can easily be added to an existing cluster to support higher loads.

    In the future, application servers will support similar Grid computing capacity. Such an application server will be able to spawn new instances of itself onto other physical machines on demand. For that to happen, a scalable distributed cache and lock will be integrated into the application server. Applications need to be developed on a slightly different programming model, in which the application accesses the database indirectly (through an ORM layer that is integrated with the distributed cache and lock), locks resources through the provided lock framework, and subscribes to incoming message endpoints indirectly via the service provided by the application server. Today's component models already provide most of the abstraction, so the change to the programming model is minor. The model does not incur a performance penalty for single-node operation, nor does it limit programming flexibility. In single-node operation, remote functionality is turned off. A small synchronization overhead is incurred only when the application server resides on multiple machines. The Web tier or Web Service tier can pick a random instance of the application server from the pool to make a request. Each application server instance has the ability to fulfill the request or forward it by considering the state of the cache and locks, so that the pool works in a completely parallel manner.

    Sunday, November 27, 2005

    Cache (On Clustering Part V of VII)

    Caching
    -------
    A cache is to an application what a palette is to a painter. A set of data is temporarily put in the cache as the application runs, the same way a painter puts a few colors on the palette as he works on the painting. Data might be combined and modified in the cache, the same way colors are mixed on the palette. The cache keeps some frequently used data in memory and saves the application from accessing the disk drive or database all the time, which saves time. The palette keeps the most frequently used colors and reduces the painter’s trips to the color tub. Of course, no analogy goes all the way. In this case, data does change and needs to be stored back. But a painter doesn’t put the color back into a tub when he discovers a new color that he likes.
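
    As a rough sketch of the palette idea, the Python class below keeps a few recently used entries in memory and writes modified ones back to the store only when they are pushed out. The class name `WriteBackCache`, the plain dict standing in for the disk or database, and the tiny capacity are assumptions for illustration.

```python
from collections import OrderedDict


class WriteBackCache:
    """Small least-recently-used cache that stores changes back on eviction."""

    def __init__(self, store, capacity=2):
        self.store = store            # dict standing in for the disk or database
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> (value, dirty flag)
        self.store_reads = 0          # how often we had to go to the store

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            return self.entries[key][0]
        self.store_reads += 1              # cache miss: fetch from the store
        value = self.store[key]
        self._put(key, value, dirty=False)
        return value

    def set(self, key, value):
        self._put(key, value, dirty=True)  # modified only in memory for now

    def _put(self, key, value, dirty):
        self.entries[key] = (value, dirty)
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            old_key, (old_value, was_dirty) = self.entries.popitem(last=False)
            if was_dirty:
                self.store[old_key] = old_value  # store the change back
```

    Reading the same key twice touches the store once, and a value changed with `set` reaches the store only when the entry is evicted.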

    Memory is several orders of magnitude faster than disk drives and databases. By avoiding drive accesses and keeping some data in memory, the application runs a few times faster. Of course, if requests are extremely random and don’t tend to repeat, and/or the data set is extremely large compared with the memory size, the cache might not be well utilized and incurs unnecessary overhead. But that should be regarded as the exception.
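
    A little arithmetic shows why the hit ratio matters so much. The timings below (1 microsecond for a cached read, 10,000 microseconds for a trip to the drive or database) are made-up round numbers, and `average_access_us` is a hypothetical helper.

```python
MEMORY_US = 1        # assumed cost of a cached read, in microseconds
STORE_US = 10_000    # assumed cost of a disk or database read


def average_access_us(hit_ratio, memory_us=MEMORY_US, store_us=STORE_US):
    """Average read time when a fraction hit_ratio of reads is served from memory."""
    return hit_ratio * memory_us + (1 - hit_ratio) * store_us


for ratio in (0.0, 0.5, 0.75):
    print(ratio, average_access_us(ratio))
```

    With these assumed numbers, a 75% hit ratio brings the average from 10,000 down to about 2,500 microseconds, roughly four times faster, while a hit ratio near zero gains nothing.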

    If multiple machines are used and data needs to be stored back, keeping the data in each machine’s cache consistent can become a challenge. When we have one machine, we always know whether the data is up to date. If we modified it, then ours is the new data, and we need to store it back. If we didn’t modify it, then it is up-to-date. With multiple machines, even if we didn’t modify it, some other machine might have. A synchronization mechanism is needed, and it must handle machines that try to modify the same data at the same time. A distributed cache is designed for just that. In the Java world, multiple JCache implementations are available. Tangosol Coherence appears to be the leader of the space and claims deployed customers in multiple industries.
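
    One common way to catch two machines modifying the same data is a version check on write (optimistic locking). The sketch below is my own simplified illustration, not how Coherence or any particular product works; `VersionedStore` and its methods are hypothetical names.

```python
class VersionedStore:
    """Shared store that rejects writes based on a stale copy of the data."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        # A key that was never written is reported as version 0 with no value.
        return self.data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False  # another machine got there first; reload and retry
        self.data[key] = (current_version + 1, value)
        return True


store = VersionedStore()
version_seen_by_a, _ = store.read("stock")   # machine A caches version 0
version_seen_by_b, _ = store.read("stock")   # machine B caches version 0
first = store.write("stock", version_seen_by_a, 9)    # A wins
second = store.write("stock", version_seen_by_b, 8)   # B is stale and refused
```

    Machine B learns from the refused write that its cached copy is out of date, reloads the entry, and tries again with the new version number.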

    Turning the cache off can be a painful answer. It means we now need a few times more machines just to achieve the performance we had with one. One strategy is to cache at the data store level, which helps. It is like having a palette, but instead of carrying it, it is fixed on the table. That is often what the “Share-Nothing Architecture” does.

    Distributed cache is relatively new and requires additional integration. I envision that distributed cache will become an integrated part of the application server in the future and will be part of J2EE and .NET offerings. I also saw a job post from ActiveGrid, a LAMP stack company, for an engineer to implement a distributed cache.

    In my opinion, the use of distributed cache is preferable to the share-nothing architecture, and it will be the model of the future. I am actually developing one myself as a hobby. We will see how the industry unfolds on this.

    Saturday, November 26, 2005

    Stateful Session (On Clustering Part IV of VII)

    Of course, high-volume computing didn’t stop at real-time updates. Let’s continue!

    Stateful Session
    ----------------
    Applications like Y! Mail can use the stateless session approach to achieve scaling. However, in some applications it is highly desirable, from a programming perspective, to have stateful sessions. Information about the currently logged-in user is a good candidate to be stored in a session. Some sites might enable GUI hints that allow editing of some attributes. In other cases, stateful sessions also make it easy to design web applications that involve multiple steps. A questionnaire might involve multiple pages. With a stateful session, the previous pages’ answers can be retained in the session. Some new highly interactive sites that provide a full desktop-like experience use stateful sessions heavily to store the current active view, opened documents, table sorting order, etc. A session does not tend to survive forever. It is often discarded after a timeout or when the user logs off. Scaling such applications requires different strategies.

    (Side note: many developer sites use the shopping cart as an example of stateful session. However, it is probably not the right approach. A user might log out for various reasons, but it is often desirable for the cart to still be there when the user logs back in.)

    An application server or web server framework is often used for session management. Scalability is often built into the application server or framework itself. For web applications, HTTP load-balancers (dedicated hardware) are often used to aid the task. An HTTP load-balancer understands the HTTP session ID or rewritten URL and always routes the same client to the same server (unless the server has failed), so that the session can live in that physical server. Failover of sessions has been a major selling point of application servers.
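
    The sticky routing a load-balancer performs can be sketched in a few lines. This is a simplified illustration, not the algorithm of any particular product: `pick_server`, the server names, and the CRC-based hashing are my assumptions.

```python
import zlib


def pick_server(session_id, servers, failed=()):
    """Route a session to the same server every time, unless that server failed."""
    alive = [server for server in servers if server not in failed]
    if not alive:
        raise RuntimeError("no server available")
    bucket = zlib.crc32(session_id.encode())   # stable hash of the session ID
    preferred = servers[bucket % len(servers)]
    if preferred in alive:
        return preferred                       # normal case: sticky routing
    return alive[bucket % len(alive)]          # failover to a surviving server


servers = ["app1", "app2", "app3"]
home = pick_server("JSESSIONID-abc123", servers)
fallback = pick_server("JSESSIONID-abc123", servers, failed={home})
```

    Note that after a failover the session data still has to exist on the fallback server, which is exactly the session failover feature that application servers advertise.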

    It is important to note that the functionality of a stateful session can be simulated by moving all the transient state to the browser (probably as hidden fields in an HTML form), by always storing it back to the data layer, or both. In a way, stateful session can be viewed as a "division" strategy. By factoring out data that is transient and short-lived and keeping it in memory, it reduces the hits to the data layer and increases performance. The division also enables the use of an HTTP load-balancer.

    Database Driven and Entity Tier (On Clustering Part III of VII)

    Database Driven
    ---------------
    A database provides query, indexing, transaction, storage management, data security, and other support. Database-driven applications are very common. The database tier is inserted between the application and storage. Query optimization and indexing enhance performance and help the application achieve higher volume. However, the database's power and convenience usually turn into extra features for the application and rarely leave the user with an impression of performance. The share-nothing strategy can be used. Bandwidth between the application tier and the database tier is critical to overall performance. Before Gigabit Ethernet became common, high-volume applications used multiple 100Mbps Ethernet cards per machine to achieve the desired bandwidth.

    The scaling and tuning of databases is an art in itself. Different optimization models and expertise are widely available, some at significant cost.

    When performance is absolutely critical, a subset of the tables can be split off into a memory-only database, like TimesTen (now an Oracle product). It provides advanced functionality like other DBMSs. However, it keeps all its data in memory and requires no disk access for queries.

    Entity Tier
    -----------
    In the J2EE world, EJB was famous for its EntityBean concept. An EntityBean is a component-model representation of a single entity in a (relational) database. (Loosely speaking, a component is an object whose lifecycle is managed by a container.) With this model, an application is divided into two tiers: the presentation tier and the business logic tier. The EJB model represents the latter. Direct access by the client (through RMI or RPC) has pretty much gone out of fashion. The presentation tier is most likely a web server that serves HTML or exposes the business logic functionality as a Web Service. The scaling and load-balancing of entities are enabled by the remote Stub and Skeleton model. Clustering is provided by the application server with the EJB container. The Stub on the client side, when instantiated, is assigned to one of the servers.

    The database access (persistence) of the entity is sometimes provided by the container. However, every serious application developer has complained about the performance of this container-managed persistence (CMP). The first EJB design was simply too heavyweight. EJB 3.0 moves to a much lighter-weight model. It appears to be good enough for long-term success. However, a very wide selection of programming models and frameworks exists.

    Ethernet is pretty much dead

    I still remember that an instructor at my college favored token ring and thought it was the better technology. He envisioned that if token ring had had better marketing at the time, it would have dominated the market. The cost difference, he argued, would disappear because of the huge economies of scale once a technology like that became mainstream. But it was too late. Ethernet had already dominated the market. I thought he was right.

    That was eight years ago. In the end, both protocols vanished. People still refer to Ethernet all the time. However, is Ethernet really Ethernet anymore?

    Since the market moved to 100Mbps, the hub has been phased out. Because a hub does not wait for the whole frame (the layer-two packet) to arrive before forwarding it, a hub has the advantage of lower latency. However, that advantage becomes less significant as the speed increases tenfold. When the frame size remains the same, the latency is cut by 90%.
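
    The 90% figure is easy to check. The sketch below assumes a 1500-byte frame (my choice of a typical large frame) and computes how long a store-and-forward device must wait to receive it at each speed; `transmit_time_ms` is a hypothetical helper.

```python
FRAME_BYTES = 1500  # assumed frame size, for illustration


def transmit_time_ms(frame_bytes, mbps):
    """Time to put a whole frame on the wire, in milliseconds."""
    bits = frame_bytes * 8
    bits_per_ms = mbps * 1000  # 1 Mbps is 1000 bits per millisecond
    return bits / bits_per_ms


slow = transmit_time_ms(FRAME_BYTES, 10)    # about 1.2 ms at 10Mbps
fast = transmit_time_ms(FRAME_BYTES, 100)   # about 0.12 ms at 100Mbps
saving = 1 - fast / slow                    # about 0.9, i.e. 90%
```

    So the delay a switch adds by waiting for the full frame shrinks from roughly 1.2 ms to roughly 0.12 ms, which is why the hub's head start stops mattering.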

    The beauty, or uniqueness, of Ethernet actually lies in layer 2. A station waits a random period, sends the frame, detects collisions, and resends if necessary. Now, Ethernet is connected port to port from a computer to the switch. All links are full duplex; Cat-5 twisted-pair cable uses different wires for transmission and reception. This essentially guarantees that no collision is possible. Say bye to the ALOHA-style protocol that powered corporate networks for more than 10 years.
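
    The random wait in classic shared-medium Ethernet follows a rule known as truncated binary exponential backoff: the range of the random wait doubles with each collision, up to a limit. A minimal sketch, with `backoff_slots` as a hypothetical function name:

```python
import random

MAX_ATTEMPTS = 16    # classic CSMA/CD gives up on the frame after 16 tries
BACKOFF_LIMIT = 10   # the random range stops doubling after 10 collisions


def backoff_slots(collisions, rng=random):
    """Number of slot times to wait after the given number of collisions."""
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions, frame dropped")
    exponent = min(collisions, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform in 0 .. 2**exponent - 1


waits = [backoff_slots(n) for n in range(1, 6)]  # waits after collisions 1..5
```

    On a full-duplex switched link none of this ever runs, because the two directions never share a wire.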

    Using a switch actually brings a significant performance advantage. In a busy network, the theoretical network throughput is about 57% (if I remember correctly). With full duplex and the use of a switch, we are getting pretty close to 200Mbps.

    Most switches (at least entry-level ones) use a 5-port switch chip. The interconnection is likely to be a star network. The MAC layer packet is forwarded exactly once to reach the destination.

    An 8-port switch can be constructed by combining two chips. One port of each chip is connected to the other at the circuit board level. A packet is forwarded 1 or 2 times. A 16-port switch can be constructed using 5 such chips, with one port of each of the first 4 chips connecting to a port on the fifth chip. A packet is forwarded 1 or 3 times.
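
    The forwarding counts can be worked out with a small sketch. The port numbering is my own assumption: external ports are numbered from 0, and each chip exposes 4 of its 5 ports externally (ports 0-3 on the first chip, 4-7 on the second, and so on), with the fifth port used as the link between chips.

```python
PORTS_PER_CHIP = 4  # external ports per 5-port chip; the fifth port is the uplink


def forwards_8_port(src, dst):
    """Two chips wired back to back: 1 forward on the same chip, 2 across chips."""
    same_chip = src // PORTS_PER_CHIP == dst // PORTS_PER_CHIP
    return 1 if same_chip else 2


def forwards_16_port(src, dst):
    """Four edge chips around a fifth chip: edge, center, edge when crossing."""
    same_chip = src // PORTS_PER_CHIP == dst // PORTS_PER_CHIP
    return 1 if same_chip else 3


print(forwards_8_port(0, 3), forwards_8_port(0, 4))     # prints: 1 2
print(forwards_16_port(0, 1), forwards_16_port(0, 15))  # prints: 1 3
```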

    Tree network won the final battle.

    Friday, November 25, 2005

    Cluster (On Clustering Part II)

    Cluster
    -------
    When the volume cannot be satisfied by waiting and division, and it is more economical to spend engineers’ time than to buy a massive computer, multiple machines are used to form a cluster, and software and configuration are modified to accommodate it.

    As the data scenario gets more complex, more scaling strategies are employed. Let’s try to categorize the nature of high-volume applications.

    Static data
    ------------
    Scaling static data to high volume is easiest: put up multiple machines, replicate the data, and have them serve clients at random.

    Non real-time updates
    ---------------------
    Data in most useful applications changes; however, it doesn’t necessarily change frequently. Even if the data changes all the time, a user need not be given the most up-to-date version of the data. Such applications can be divided into two sets of computers: low-hit updates and high-hit queries. Updates are made as frequently as needed to the update set. All queries are routed to the second set, which has its own copy of the data that is a snapshot of the past. It scales the same way as static data. Once in a while, changes are aggregated from the first set and applied to the second set as a batch operation. The batch operation can be done periodically or during lower-traffic periods. The majority of high-volume applications fall into this category.
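
    The split between an update set and a query set can be sketched like this. `SnapshotService`, its method names, and the two in-memory replicas are hypothetical stand-ins for real machines.

```python
class SnapshotService:
    """Updates go to a master copy; queries read replicas refreshed in batches."""

    def __init__(self, replica_count=2):
        self.master = {}                                    # low-hit update set
        self.replicas = [{} for _ in range(replica_count)]  # high-hit query set

    def update(self, key, value):
        self.master[key] = value       # visible to queries only after publish()

    def query(self, key, replica=0):
        return self.replicas[replica].get(key)  # may be stale by design

    def publish(self):
        # The batch operation: hand every replica a fresh snapshot of the master.
        self.replicas = [dict(self.master) for _ in self.replicas]


service = SnapshotService()
service.update("package-1", "delivered")
before = service.query("package-1")   # None: the snapshot is from the past
service.publish()                     # e.g. run nightly or in a quiet period
after = service.query("package-1")    # "delivered"
```

    Because the replicas never change between batches, they scale exactly like static data.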

    For example, USPS tracking is used by thousands of users every day. The updates are brought from the low-hit set to the high-hit set every night. During the day, no matter how many times you refresh the page, it is last night’s data that shows in your browser. UPS updates the data much more frequently, but it is still not real time. The “delivered” status showed on the site only an hour after I had signed for my package.

    Many e-tailers show their stock status as “in stock”, “low”, or “out-of-stock”, but the status is not updated instantly, only a few times a day, even though real-time data would be highly desirable.

    Likewise, search engines are divided into an update set and a query set. In this case, both sets require sophisticated clustering strategies of their own. Readers can refer to The Anatomy of a Search Engine for details.

    But there is another reason this approach is so popular. The low-hit set (which might well be one computer) already exists before the business goes online. For example, a computer store might have its own stock management system. Adding a new set of computers to serve web customers’ queries about stock makes a lot of sense and brings minimal modification and burden to the existing system.

    Real-time updates
    -----------------
    When both real-time data and high volume are needed, customized software is needed. For example, Web mail falls into this category. Scaling of mail is a common enough problem. Finished customized software for mass mail is relatively easy to find.

    However, early companies like Yahoo highly customized their mail servers to achieve the volume. Yahoo employed a number of strategies to scale, including partition, division, and memory-only data. First, Yahoo partitioned its servers by geographic location. User ids are unique globally, except for a few countries. Japan is one such exception. For most sites, like Y! Mail, logging in through another country’s page will route you back to the server of your country. Even though the Y! id is unique, the email addresses are not interchangeable. For example, sending email to helloworld@yahoo.ca will bounce if the account was opened with yahoo.com.

    The second strategy is to use an in-memory database. Y! had 300 million users worldwide, the last time I heard. Even with this huge number, it is still possible to store the entire id list in the memory of a single server. Only the most primitive data, such as the country code, is stored along with the id. Of course, the data itself is persisted back to storage and is fully backed up. The list changes relatively slowly. Multiple such servers are used, and the other servers send their queries to them. The data replicates among them. It takes a few minutes for a newly created account to propagate globally. In this case, that is acceptable.
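
    A back-of-the-envelope calculation shows why the whole id list fits in one server. The per-entry sizes below (16 bytes for an id, 2 bytes for a country code) are my own guesses for illustration, not figures from Yahoo, and real data structures add overhead on top.

```python
USERS = 300_000_000
BYTES_PER_ID = 16            # assumed average id length
BYTES_PER_COUNTRY_CODE = 2   # assumed two-letter country code


def id_list_gigabytes(users=USERS, per_id=BYTES_PER_ID,
                      per_code=BYTES_PER_COUNTRY_CODE):
    """Raw size of the id list in gigabytes (10**9 bytes), ignoring overhead."""
    return users * (per_id + per_code) / 1e9


size_gb = id_list_gigabytes()  # about 5.4 GB with the assumed sizes
```

    A few gigabytes is within reach of a single well-equipped server, as long as only the id and the most primitive attributes are kept.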

    The in-memory database alone doesn’t provide enough information to make a useful app. Some data might also change more rapidly: for example, when the user last logged in, when the session expires, and whether the user has logged out. Another layer of division is used: the login server. The number of users who are currently logged in is far smaller than the total number of users, even though Yahoo allows a user to log in several times simultaneously from multiple browsers or computers. A cluster of computers is used just to keep track of logged-in sessions. Web apps like Y! Mail check the page cookies against the logged-in sessions to determine whether the operation is valid. The login server is high-volume itself.

    The application itself is handled by another cluster of servers. Those servers are designed to be stateless, so any operation can be routed at random to any of the servers, which then serves the user’s request. This poses some limitations and challenges to the developers of the application, but it works.
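
    Here is a rough sketch of how a stateless application server can lean on a separate login server. The class and function names, the cookie token, and the 30-minute lifetime are all hypothetical; the point is only that the handler keeps no state of its own, so any server in the cluster can run it.

```python
class LoginServer:
    """Keeps track of logged-in sessions only, not of user data."""

    def __init__(self):
        self.sessions = {}  # cookie token -> (user id, expiry time in seconds)

    def login(self, token, user_id, now, ttl=1800):
        self.sessions[token] = (user_id, now + ttl)

    def logout(self, token):
        self.sessions.pop(token, None)

    def check(self, token, now):
        user_id, expires_at = self.sessions.get(token, (None, 0))
        return user_id if now < expires_at else None


def handle_request(login_server, cookie_token, now):
    """Stateless handler: everything it needs arrives with the request."""
    user_id = login_server.check(cookie_token, now)
    if user_id is None:
        return "401 please log in"
    return f"200 inbox for {user_id}"


logins = LoginServer()
logins.login("cookie-1", "helloworld", now=0)
ok = handle_request(logins, "cookie-1", now=10)         # valid session
expired = handle_request(logins, "cookie-1", now=2000)  # past the 1800s lifetime
```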

    Scaling: The Share-Nothing Architecture
    ---------------------------------------
    The Y! Mail architecture captures the core design of the "Share-Nothing Architecture". The LAMP (Linux, Apache, MySQL, PHP/Python) stack, which is increasingly popular, uses this as its scaling model. The same can be said for Ruby on Rails.

    Another layer that requires clustering itself is the storage. Because the application itself is stateless, all data comes from the storage layer directly. The storage receives an extremely high number of hits. Cache and RAID techniques are certainly used. Y! uses a NetApp solution to scale its storage layer.

    High-volume computing (On Clustering Part I)

    It has been a long way coming. I started my programming career as the technical lead of an object-relational mapping software, Castor JDO. Concurrent programming, transactions, real-time data, data integrity, caching, distribution, clustering, scalability, load-balancing, high availability, and high volume have interested me ever since my first job. All of these are old terms that have existed since the early days of computing. I call them the essences of Enterprise Computing. But after all these years, such software is still a huge challenge and a great endeavor.

    High-volume
    -----------
    High-volume applications aren’t new at all, but the web makes many applications high-volume. In the early days, high volume was often handled by massive computers and highly customized software. Just as water always finds its way downward, the market finds a cheaper way to do what needs to be done. As water gathers in a meadow and the water level rises, it finds (or breaks out) new and faster ways to go down. As high-volume applications become common, new and cheaper ways are designed to handle them.

    One machine
    -----------
    The first method to scale is to fully utilize one machine: adding more memory, faster disks, and more CPUs if the system supports it. Multi-core and multi-CPU systems are becoming common. Sun provides some really good resources in this area. You can begin with this video: Sun Nettalk (Scale).

    Wait
    ----
    The second cheapest way to scale to higher volume is sometimes to wait. Of course, that is unacceptable in most cases. But computers double their speed every 1.5 years. The market constantly introduces new technologies that help increase system performance. Who said “waiting never solves the problem”? The same apps will run twice as fast if you spend the same hardware budget again 18 months later. If your volume doubles less frequently than every 1.5 years, your problem will eventually solve itself.
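
    The argument can be put into numbers. Assuming hardware speed doubles every 18 months and your volume doubles every so many months, the ratio of the two tells you whether the same hardware budget buys you headroom or leaves you further behind. `headroom` is a hypothetical helper.

```python
HARDWARE_DOUBLING_MONTHS = 18  # "doubling its speed every 1.5 years"


def headroom(months, volume_doubling_months,
             hardware_doubling_months=HARDWARE_DOUBLING_MONTHS):
    """Hardware speed growth divided by volume growth after the given months."""
    hardware_growth = 2 ** (months / hardware_doubling_months)
    volume_growth = 2 ** (months / volume_doubling_months)
    return hardware_growth / volume_growth


slow_growth = headroom(36, volume_doubling_months=24)  # about 1.41: waiting wins
fast_growth = headroom(36, volume_doubling_months=12)  # 0.5: waiting loses
```

    A ratio above 1 means the problem is solving itself; a ratio below 1 means the volume is outrunning the hardware and one of the other strategies is needed.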

    Making software is very expensive. A man-month of engineer time costs about the same as a decent mid-range server, and one man-month doesn’t really get that much done. This is especially true if the software developed cannot be reused. Good engineers are always overworked and should be considered a limited resource. However, salary is often considered a fixed cost to the business, so management sometimes favors spending engineer time over buying hardware. Of course, acquired hardware isn’t maintenance-free and incurs cost too.

    Division
    --------
    Division is arguably the most important aspect of high-volume computing. As you will see in the following paragraphs and later posts, high performance often comes from the right division. The first step is to divide different services onto different machines. For example, move the HTTP service off the machine that serves mail. The second is to partition a service along natural boundaries. For example, the mail of a heavily used site can be singled out onto another machine. Going deeper, you might separate the HTTP server that serves images from the server that serves HTML pages.

    An application might also be divided according to its scaling behavior. The part of an application that serves static information is easiest to scale. The static part can be factored out onto a different machine or set of machines, leaving as much CPU as possible to the part that changes frequently. Move storage onto a separate machine or dedicated hardware. Adopt a multi-tier architecture and split each tier onto its own machines, connected by a high-speed network.
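
    Division often shows up as nothing more than a routing rule in front of the machines. The sketch below is a made-up example: the pool names, the URL layout, and the `route` function are assumptions, chosen only to show images, static pages, and mail going to different sets of machines.

```python
IMAGE_SUFFIXES = (".png", ".gif", ".jpg")
STATIC_SUFFIXES = (".html", ".css")


def route(path):
    """Pick the pool of machines that should serve the given URL path."""
    if path.startswith("/mail/"):
        return "mail-pool"        # a heavy service gets machines of its own
    if path.endswith(IMAGE_SUFFIXES):
        return "image-pool"       # images are split from HTML pages
    if path.endswith(STATIC_SUFFIXES):
        return "static-pool"      # static content is the easiest to replicate
    return "app-pool"             # everything dynamic


targets = [route(p) for p in ("/mail/inbox", "/logo.png", "/about.html", "/search")]
```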

    On Maintainability

    My first computer came with MS-DOS 3.1. Books were very expensive for a 5th-grade kid, there was no internet, and the “/?” parameter hadn’t yet been incorporated into the DOS commands. I knew only the “cd”, “c:”, and “echo” commands, so the menu was for display only. I made a bunch of batch files with single-letter names to achieve menu-like behaviour. I ran into software maintainability shortly after my first menu system was done: I got a new game from a friend. The new game became one of my favorites, and I wanted to put it ahead of the other games. I renamed each batch file and edited each of the menu’s “echo” lines. Before long, I gave up the idea of having a menu. It was my first encounter with software maintainability. Even for a simple menu like this, the cost of ownership exceeded the time spent creating the system in the first place.

    Ok, enough stories of the good old days. :-)

    PC-compatible XT


    I still remember the day that my mother bought our first PC for me to share with my brother and sister. I was in grade 5, if I remember correctly.

    My mother was a businesswoman. She saw that computers were becoming more widely used and wanted us to learn early on. At the time, the original “red-and-white” Nintendo was fiercely popular. A lot of our friends had it. She rejected the idea of a Nintendo and bought us the computer. The monitor alone cost about a month of most people’s salary. As recommended by the salesperson, we got a higher-end Thompson CGA monitor. The speed was in the middle of the pack: a 5MHz PC-compatible XT machine with a “Turbo” switch that could bump the speed to 10MHz. The rest of the configuration was pretty advanced. It had two 1.2MB floppy drives (double-sided), a 30MB self-compressed hard drive from Seagate, and 640KB of memory.

    Some software and games were included. When the day came to pick up the computer, the salesperson showed us how to launch a few of the programs: put the floppy in, type “a:”, type “dir”, then type something ending with “exe”, and change disks when it asks you to. It looked easy!

    When we got home, I was confused about “dir” and “dos”. Some of the other software was actually self-booting. So, for a few days, I kept booting into the simple games, until I went back and asked about the “dos” command. Then I started typing whatever words showed up in the dir list, and spent days on the stack of disks.

    After a while, I learned about batch files and tried to create a menu so that we could jump directly to a game with a letter and the Enter key. I call it my first “programming” experience. It was a long way coming.