SimpleDB. Simple? well…. DB? umm…

Amazon has announced its next software-as-a-service play – SimpleDB – and the technical world is all a-flutter. You can see the breathless reporting all over the net. Is this the beautiful panacea of unleashed database power it’s being reported as?

A few months back, I examined S3 and EC2 (two of Amazon’s earlier web service offerings) and came away with the sense that Amazon is changing the rules of the game in a big way, but that there is still no good way to implement an online scalable database with these services. (I know that there are attempts to put a relational database on EC2, but they seem to be quite painful.)

In short, S3 is a great scalable online file system, and EC2 is a great scalable processor. Neither one of them allows us the sort of slicing, dicing, remixing, and re-serving of data that the world has come to expect from a database.

So when the news of SimpleDB hit the wires, I figured that Amazon was stepping up to answer my cry and providing an online scalable database. Well, they are, and they aren’t.

They are providing a way to store structured information online, but it’s hard to call it a database. In fact, it’s a bit disingenuous of them to do so.

Those who are concerned about the details can quickly find out for themselves that SimpleDB has nothing to do with the Relational Model that has been the basis for databases for the past 40 years. To call something that doesn’t even smell like the relational model a database is pure marketing.

But let’s leave the marketing aside – the service is in the field, it’s a totally new beast, and it’s called what it’s called. What does it look like? It looks like a place to store object instances. No classes, no schema, they-are-just-what-they-look-like, object instance. Besides coming with a whole new metamodel, it comes without a lot of the sugar that mature database systems have led us to expect. There’s no fulltext search, queries are lexicographic (so they don’t deal well with numbers or dates), the set of operators on a query is more limited than we’re used to, there’s no verification of the data, no triggers, etc.
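To make the lexicographic problem concrete, here is a minimal Python sketch. It does not touch SimpleDB at all; it only shows how values behave when they are compared as strings, plus a common workaround of storing numbers zero-padded and dates in ISO 8601 form. The helper names `pad_number` and `sortable_date` are made up for illustration, and the padding width is an arbitrary assumption.

```python
from datetime import date


def pad_number(value: int, width: int = 10) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order.

    Hypothetical helper for illustration; the width is an arbitrary choice.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    text = str(value)
    if len(text) > width:
        raise ValueError("value does not fit in the chosen width")
    return text.zfill(width)


def sortable_date(day: date) -> str:
    """Render a date as YYYY-MM-DD, which sorts chronologically as a string."""
    return day.isoformat()


raw_prices = ["9", "10", "100", "25"]

# Compared as strings, "10" and "100" land before "25" and "9".
print(sorted(raw_prices))  # ['10', '100', '25', '9']

# Zero-padded, the string order agrees with the numeric order.
padded_prices = [pad_number(int(p), width=5) for p in raw_prices]
print(sorted(padded_prices))  # ['00009', '00010', '00025', '00100']

# ISO dates compare correctly as plain strings.
print(sortable_date(date(2007, 12, 14)) < sortable_date(date(2008, 1, 2)))  # True
```

The catch is that every writer and every reader has to agree on the padding width and the date format up front, which is exactly the kind of bookkeeping a schema used to do for you.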

All this is fun, but the real kicker is this – reading data from SimpleDB immediately after a write may not reflect the latest updates. This is called eventual consistency. That’s what you tell your customers – it’ll get there eventually.
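Here is a rough Python sketch of what eventual consistency means for the code that sits on top of it. Everything in it is an assumption made for illustration: `EventuallyConsistentStore` is a toy in-memory stand-in (not SimpleDB’s API) in which a write only becomes visible after a fixed number of reads, and `read_until_visible` is one naive way a developer might cover the gap by polling.

```python
import time


class EventuallyConsistentStore:
    """Toy stand-in: the first `lag` reads after a write still see old data."""

    def __init__(self, lag: int = 2):
        self._lag = lag
        self._visible = {}   # what readers can currently see
        self._pending = []   # (reads_left, key, value) not yet visible

    def put(self, key, value):
        if self._lag <= 0:
            self._visible[key] = value  # no lag: behaves like a normal dict
        else:
            self._pending.append((self._lag, key, value))

    def get(self, key):
        value = self._visible.get(key)  # answer from the current (maybe stale) view
        remaining = []
        for reads_left, k, v in self._pending:
            if reads_left <= 1:
                self._visible[k] = v    # the write finally "arrives"
            else:
                remaining.append((reads_left - 1, k, v))
        self._pending = remaining
        return value


def read_until_visible(store, key, expected, max_attempts=5, delay=0.0):
    """Poll until the expected value shows up; return the attempt that saw it."""
    for attempt in range(1, max_attempts + 1):
        if store.get(key) == expected:
            return attempt
        time.sleep(delay)
    raise TimeoutError(f"{key!r} not visible after {max_attempts} attempts")


store = EventuallyConsistentStore(lag=2)
store.put("order-1", "shipped")
print(read_until_visible(store, "order-1", "shipped"))  # 3
```

Polling like this is the bluntest possible answer. It only works when the caller already knows what value to expect, and it says nothing about two writers racing each other. That is the sort of gap your developers end up covering.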

What happens now? What happens now is that the developers start to relearn the way they handle data. No existing database applications – back-end, front-end, or middleware – can be easily ported to run on top of this new beast. You can’t tell your customers that the data will get there eventually, so you tell your developers to cover the gap. This service might save you a database administrator, but in the near term, you’ll need another developer to take his place.

In the long term, we’ll start seeing SimpleDB, S3, and EC2 aggregated under another layer – one that presents the tried and true relational model. SimpleDB will handle the tuples, S3 will handle the BLOBS, EC2 will grind the queries, and the application developer won’t have to worry about it. Whether Amazon delivers it or someone else does, it’s coming – the reliable online scalable database.

(Some of the sharper analyses: O’Reilly compares pricing of SimpleDB to S3, Marcelo smells a familiar data model, Inside Looking Out lays out some of the technical hurdles, rc3 wonders how to tune it, and the comment from daveadams sings a love song to the relational model.)

Joy of the Job

I just spoke with a friend about the mission and structure of an organization he’s setting up. He floated a few ideas past me. On some of them, his eyes lit up and he almost jumped out of his seat. On others, he slumped in his seat, and looked as if he would much rather be somewhere else. Why would he even bother considering the latter? All sorts of external pressure – what he thinks the funders want, pressures from the other people mixed in to the organization, a nagging sense of what’s supposed to be done. All sorts of nonsense.

It’s going to take a whole lot of energy to make an organization work. Without excitement about what you’re doing, you’re dead in the water. When you’re planning out a business, excitement is the compass. When you’re running it, excitement is the gas. Maybe some people get excited about the money; I suppose such things are possible (if a bit twisted). The best workers in any job are the ones who are excited about the job. Certainly, the leaders of a non-profit had better be all sorts of excited about the mission and about the organization.

Other factors come in, for sure – you need to build an organization that can succeed, you need to keep up a good relationship with your funders, you need to build a team that can work well together. None of these things should be done in a way that squelches the excitement that the leaders of the organization have. If you allow that to happen, you’re hobbling the horse before it leaves the gate.

Global Development

With cheap software talent all over the globe, it’s tempting to spread a development project around the world and around the clock. You can skype over to those guys for free, it’ll be like you’re in the same room, and hey, your developers are sleeping a full third of the day; you’ve got to be able to improve on that!

The pitfalls of adding people to a team are well known – you add communication and learning overhead that will often end up delaying, rather than hurrying, the completion of a project. But let’s say you manage that issue – the teams are brought on line together at the beginning of the project, or the new guys have experience in the field, or you budget in a healthy load of learning time – you still run into the tragic poetry of Conway’s Law.

Simply stated, Conway’s Law says that any team that produces a product is constrained to produce a product that is a mirror image of its communications structure. It sounds like black magic at first, but it really makes all the sense in the world. Creating a single thing – anything – takes a lot of communication and shared understanding. Put a team in two different buildings and you’ll already start to see trouble. It will be easier to make assumptions and keep coding than to walk outside in the cold. Introduce time zone differences, language barriers, and shoddy IP telephony, and you’re lucky if the two teams manage to communicate at all. Throw in one more team on another continent, and you’re doomed.

When my little company was swallowed by IBM, we were brought under a team in Massachusetts that was also trying to swallow a team in Boca, coordinate with a team in Texas (that had a person working remotely in California), manage the offshore folk in Malaysia, and be managed by the suits in Silicon Valley. Together, we were tasked with putting together a system that really had to function as one unit. A full half of the time was spent in throwing pieces back and forth over the walls between the groups and trying to figure out who was responsible for what. Only when we disengaged a bit, and plowed forward on a piece that could really stand alone, did we make any progress. In the end, we created as many different pieces of the puzzle as we had teams, and they worked together just as well as our teams did.

There are situations where having multiple teams can be helpful. Have the offshore team do the translation work, write plug-ins against a well documented API, or do the data-entry. As long as the assignments and the architecture are in line with the implicit team dynamics, you’ve got a good shot.

Let’s assume for a second that what I’ve said is right. The big question comes from the development methods of the major open-source projects. They have people spread out to the four winds, and they manage to come out with some really great stuff. Granted, a lot of these projects have a very small team at the lead with the rest of the folk filing in as testers and patchers, but some of the biggest and most successful are entirely distributed. How does that work? I’m thinking that there may be another dynamic at play.

If a team has a home in a physical location, bringing on a remote team will be a serious chore. But if the only home for the project is on the Internet, then no one is remote. If everyone knows that the core communications happen online, then the project has just one home – online. Remote communications are no substitute for face-to-face communications, but if they are all that you’ve got, you make the best of them. If there is no team apart from the ‘remote’ team, then the whole distributed team can really function as one.

Is it possible to carry the mindset of the entirely distributed team into a situation where there are people physically concentrated together? I haven’t seen it happen. My intuition tells me that it’s a tall order, but I’m happy to be proven wrong.

Of Content, we have Plenty

Nicholas Carr argues that a key factor in Kindle’s downfall is the lack of an already living and healthy market for free reading material. I share his misgivings about the Kindle, but I don’t agree with his reasoning. Although new books are generally locked up in copyright, there’s no lack of free reading material. That’s what we (as humans) have been busy developing for the past 15 years or so. It’s called the Internet.

And that’s really where Amazon is missing the ball. By locking down their device to access a tiny percentage of the potential content, and making it difficult to get the free content on and off, they’ve painted themselves into a proprietary corner. What they could have done was put a lightweight web browser into our hands – now that would have been fun.