MySQL Camp Google Notes

After speaking at Yahoo earlier, I drove a few miles down the street to Google for MySQL Camp. I caught the last session of the day, by Googlers saying how they used MySQL internal to Google. (I assume for the Adwords application.) Here are the stream-of-talking notes I took. The most fascinating bits I took out of it is how they take a partitioning/sharding strategy similar (but notably different in some ways) to WordPress.com and that they use DNS to manage all load balancing, high availability, datacenter failover, etc. DNS is a pretty powerful building block.

Application level partitioning of databases. Put the identical schema on multiple databases, the application knows how to find the data across different databases. Split by account ID.

Primary, commodity RAID subsystem, maybe a battery cache

Replicas are on identical hardware.

Slim replicas are on cheap whitebox hardware, cheapest they can get. Horizontially scaled in shards.

Really simple extensions to SQL implemented in Python. Looks like MySQL command line, but it executes on all shards. No two-tier replication, everyone talks to the master.

Try to avoid multi-megabyte inserts and such. Funny things with truncated binlogs. (Hundreds a day.) Also bandwidth limitations. Some data is on all shards, but typically data is unique to the shards.

High availability and failover – require application to respect DNS. Use DNS to steer the load of the database systems. Given application can tolerate hard failures. Traffic light – red, yellow, green. Red, take everything off. Yellow, existing stuff stays, no new stuff. Python script that runs on cron. Balancing betwene read slaves. All hosts have all data.

Replicas are within 5-10 seconds in regular operations. Discourage all reads from master. Largely IO bound, nothing in memory. Usually innodb sizes are above 15 times the memory.

DB – shard number. Physical name, logical name. Example: db accounting, shards 0-5.

Read/write ratio. Order of magnitude, 10x more reads than writes, easily more than that.

Failover for masters. Speed up the planned failover. DB3 points at an IP, want to switch to a new IP. They have a way to do that with about 15 seconds of downtime, they modify the DNS TTL. Use priviledges as a barrier to write access. Create a copy, it’s writing a binlog, it’s read-only. Make the change, let DNS propogate, kill everyone without REPL, turn off writes on the old master, when everything is clean switch to the new master. Usually < 10 seconds of downtime. Automated way to bring slave databases up. They all talk to the same master. Unplanned failover, list of replicas in DNS. Don't need a sync, just move the DNS and push permissions that allow people to write to the new master. Ongoing production operations: complete backup 24 hours of all primaries. They use ibbackup. Taking the backup, restore it automatically to another place as a snapshot. It's slow. It's one command to set up a new replica. We'd have to lose a large number of machines for it to touch our data. MySQL guy: They're releasing to MyISAM++ will do everything InnoDB does, but way better. Readers don't block writes, etc etc. Different from Falcon. Both will be out next year. Schema changes. Fairly formal schema change process, goes through a committee that looks at the schema changes. They get checked into special source control data. Schema changes cause problems. Planned downtime once a month. That's when they do something that will lock a big table for an hour. Commonly: add keys, add fields, creating tables, creating indicies. MySQL guy: adding index, add/del column live in future version. Query snipers, they're supposed to be killing queries that violate policy in one way or another. 3 names of CNAMES going from normal name, logical name, and physical name. Thousands and thousands of machines. Production monitoring, PROCESSLIST, INNODB STATUS, etc. Small piece of glue that extract that data and put it into company-wide monitoring systems. Access controls, lots of users, lots of tables, very percise permissions. Modify MySQL tables directly, do FLUSH PRIVLEDGES. Something about Fedora Core 3 statically links the glibc fragments memory per connection, causes bad out of memory errors. Fixed in MySQL.com binaries. Roles in MySQL? System growth and scaling. Adding new partitions is easy, set up new partitions and put new data in it. What's not easy is migrating existing data. Create new shards relatively rarely. Shards are about the size that fit on 2U commodity hardware, I would estimate a few hundred gigs. They shard different types of data differently and separately, and scale each independently. They add more replicas to the mix, frequently. Always moving in and out of datacenters, constant churn. Track problem queries. What takes the most resources, who are they, why are they using? Also track number of queries. Individual tables all grow at different rates. Historical data can be moved to different types of systems. Capacity planning and trending. How different machines perform vs each other. Everything grows at "google pace". On-call rotation of 6 guys. Customer Data dumper News reporter Turbocharger Accountant Explorer Packrat Leech Trying to decentralize support process, scale slaves etc, a single machine running cron jobs can get stuck. Bad clients that leave queries running. If a client disconnects, MySQL keeps the query running. Diferent guy, talking about InnoDB: Mark Callaghan. Was at Oracle for 8 years. InnoDB is a joy, some of the best database internals code he has seen. In 5.1, page-level compression is moving along. OLTP? He's IO bound, but he can easily trade-off performance for data size etc. Shrink DB by 2x. One feature he'd like, 5 has a lot of enhancements for managing integrity between X and binlog. A few issues, InnoDB was written for single disk and single CPU systems. Was written in mid-90s. Doesn't know about SMP problems, because they're IO bound. IO is not much work to fix either, InnoDB uses background threads to do IO because it doesn't trust Linux async IO. You get four threads. Each thread is doing sync IO. OLTP, you're bound flushing dirty pages to disk. You want one or more write threads per disk. Concurrent sequential scans, innodb can issue prefetch requests, the buffering that the OS cache doesn't help much. You want more than one pending IO at a time. How can you get more? It looks like all you have to do is modify one file, make some simple changes. Someone started, but never finished. Simple perf benfits, was able to saturate PCI bus, 120MB/s with more write threads, in default config could only get 60 MB/s. Random IO benefit isn't as much. Linux 2.6 has 4 IO schedulers, you can utilize the system a lot more. Noticed a specific scheduler that works the best? Uses software RAID on multi-disk (1-10 disks), deadline and participory are about the same. CFQ and O-op is much worse. RAID 0, 1 meg stripe size.

25 thoughts on “MySQL Camp Google Notes”

Matt Brotherson says:

November 10, 2006 at 7:49 pm

Awesome post. Seriously, good info! I think one day Google will become the web. I read the article in eWeek about their datacenters. Simply amazing.

Patrick Havens says:

November 10, 2006 at 9:50 pm

Wow… a little bouncing around, but great notes. Thanks for the write up… no mattered how scattered.

Pingback: Living in the Whine Country » Matt Mullberg of Wordpress goes to Google’s MySQL Camp
Jeremy Zawodny says:

November 11, 2006 at 12:17 am

It’s good to see more of this getting out in public. Thanks, Matt (and Google folks).

ludo says:

November 11, 2006 at 4:13 am

Great notes Matt, why don’t you follow them up with a series of posts on how you manage these things at WP.com? 🙂

Raj says:

November 11, 2006 at 4:49 am

Thank you for the notes 🙂 Yea, lot of stuff bouncing stuff… but really great.

Andy Smith says:

November 11, 2006 at 9:29 am

man, it sucks that this is the kind of information i need to be interested in again, awesome notes though i am sure i will be referring to them constantly.

Garam Chai says:

November 11, 2006 at 5:01 pm

Cool stuff, they’re doing. Thanks for taking notes and publishing.

Pingback: STEEVdotnet » diigo-links 11/11/2006
Ozh says:

November 12, 2006 at 2:20 am

(hey ? nice new theme !)

Matt says:

November 12, 2006 at 4:30 am

Shhhh, it’s not quite public yet. 🙂

Pingback: Around the web | alexking.org
Dante says:

November 12, 2006 at 1:12 pm

(Pssst, Matt! Nice redesign! I was wondering when you were gonna get around to a new one…)

Carthik says:

November 12, 2006 at 2:25 pm

Great Design Matt! Your blog looks beautiful. I was just thinking yesterday about how your blog hasn’t had a redesign in a few years and here I am, looking at an awesome new Photomatt!

danithew says:

November 12, 2006 at 4:30 pm

Matt, it looks great! Totally surprised me.

D. Saito says:

November 12, 2006 at 8:42 pm

Great reporting, one of the session that I wanted to see and hear at MySQL camp. Unfortunately I couldn’t go, but this article pretty much sums it all up. Thanks Matt!

James Oppenheim says:

November 13, 2006 at 1:30 am

Hey Matt, thanks for the notes. I am sure I will use them in the future. Nice work!

Pingback: æœ¬æ—¥æ›¸ç±¤ « penk - Keep on rockin’ in the free world
Pingback: MySQL inside Google - alexmoskalyukblog
D. Schmidt says:

November 14, 2006 at 8:54 am

Great notes, Matt. Thanks for the insight. BTW the blog looks great.

jeff says:

November 19, 2006 at 11:10 am

nice, very nice

Ruturaj K. Vartak says:

December 3, 2006 at 10:56 pm

I’m waiting for MyISAM++
It’ll solve my problem of Write/read ratios of 10:1

Thanks matt !

Ankur Agrawal says:

January 12, 2007 at 4:46 pm

Its a great reporting Matt!! Thanks for guiding me for MySQL importance. I will continuously refer to notes man..Great work!!

D. Rivers says:

March 11, 2007 at 3:34 am

Terrific synopsis. Interesting read. I’m considering using WordPress myself.

Pingback: Jengates Blog » links for 2007-05-07

Matt Mullenweg

Unlucky in Cards

MySQL Camp Google Notes

Related

25 thoughts on “MySQL Camp Google Notes”

SHARE YOUR THOUGHTSCancel reply

Share this:

Related

25 thoughts on “MySQL Camp Google Notes”

SHARE YOUR THOUGHTSCancel reply