Morgan is looking into MySQL on EC2, so I thought I'd pipe up about stuff I've been playing with.
The ephemeral nature of the data is troubling, because at best you're going to have some lag before you can back stuff up to S3 or some other place. (Unless that was happening continuously... but we'll come back to that.) On the other hand, if you're doing app sharding or something similar, this essentially just forces you to plan for any of your machines dying at any time. If you used Google's semi-sync replication patch, you could easily spin up little replication clusters as needed.
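To make the "plan for machines dying" point concrete, here's a rough sketch of the kind of shard map an application might carry around. This is only an illustration under my own assumptions: the class name, the host names, and the hash-modulo scheme are all made up for the example, and it doesn't talk to MySQL or EC2 at all.

```python
import hashlib


class ShardMap:
    """Maps keys to shards; each shard is an ordered list of hosts.

    Hypothetical sketch: host names are placeholders, and the first
    host in each list is treated as the preferred one.
    """

    def __init__(self, shards):
        self.shards = [list(hosts) for hosts in shards]
        self.dead = set()

    def shard_for(self, key):
        # Stable hash, so the same key always lands on the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def mark_dead(self, host):
        # Called when an instance vanishes, which on EC2 can be any time.
        self.dead.add(host)

    def host_for(self, key):
        # Return the first host in the key's shard that is still alive.
        for host in self.shards[self.shard_for(key)]:
            if host not in self.dead:
                return host
        raise RuntimeError("no live host left for this shard")
```

Something like `ShardMap([["a1", "a2"], ["b1", "b2"]]).host_for("user:42")` picks a host, and after `mark_dead` on that host the same key falls through to the next replica in the same shard. Promoting a replica and rebuilding a new one is the hard part, and this sketch doesn't attempt it.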
Hm. Clusters. Well, I'm also a fan of MySQL Cluster. What if you ran MySQL Cluster on a single EC2 node (both data and SQL nodes)? What if, further, you wrote (and by you, I mean me... code coming soon, I promise) an AsyncFile implementation for Cluster that read from and wrote to S3 instead of local disk? Cluster itself is already decoupled from disk write latency. Sounds like a good UC talk...
Then you could do the same thing with a multi-node cluster, but Amazon doesn't let you control the network between EC2 nodes, so the latency there could kill you.
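To show what I mean by being decoupled from write latency, here's a toy write-behind buffer. To be clear, this is not Cluster's AsyncFile interface (that lives in C++ inside the data node) and it is not real S3 code: `WriteBehindFile`, `DictStore`, and the `put(key, data)` method are names I invented for the sketch, with an in-memory dict standing in for S3. The caller's write returns as soon as the data is queued, and a background thread pushes it to the slow store.

```python
import queue
import threading


class DictStore:
    """In-memory stand-in for a slow remote object store (hypothetical)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data


class WriteBehindFile:
    """Accepts writes immediately and ships them to the store in the background.

    Sketch only: no retries and no error handling, and anything still
    queued when the machine dies is lost.
    """

    def __init__(self, store):
        self._store = store
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, key, data):
        # Returns right away; the caller never waits on the remote store.
        self._queue.put((key, data))

    def _drain(self):
        while True:
            item = self._queue.get()
            try:
                if item is None:  # shutdown sentinel
                    return
                key, data = item
                self._store.put(key, data)
            finally:
                self._queue.task_done()

    def flush(self):
        # Block until everything queued so far has reached the store.
        self._queue.join()

    def close(self):
        self._queue.put(None)
        self._worker.join()
```

The gap between `write` returning and the data landing in the store is exactly the lag I was worrying about above, so this only moves the problem around unless something else (like replication to another node) covers that window.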
I have put together a few scripts for spinning up an EC2 node with MySQL (and Cluster) ready to go, but I'm sitting in an airport right now, so I'll have to post the code later.
I think the possibilities for scale-out here are fantastic, but like all application partitioning approaches, they do require some engineering of the application to take advantage of them.