Website Reviews

August 27, 2008

Amazon review & architecture

Filed under: review, webstore — admin @ 10:36 am

Amazon grew from a tiny online bookstore to one of the largest stores on earth. They did it while pioneering new and interesting ways to rate, review, and recommend products. Greg Linden shared is version of Amazon’s birth pangs in a series of blog articles

Site: http://amazon.com

Information Sources

  • Early Amazon by Greg Linden
  • How Linux saved Amazon millions
  • Interview Werner Vogels - Amazon’s CTO
  • Asynchronous Architectures - a nice summary of Werner Vogels’ talk by Chris Loosley
  • Learning from the Amazon technology platform - A Conversation with Werner Vogels
  • Werner Vogels’ Weblog - building scalable and robust distributed systems

    Platform

  • Linux
  • Oracle
  • C++
  • Perl
  • Mason
  • Java
  • Jboss
  • Servlets

    The Stats

  • More than 55 million active customer accounts.
  • More than 1 million active retail partners worldwide.
  • Between 100-150 services are accessed to build a page.

    The Architecture

  • What is it that we really mean by scalability? A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added. Increasing performance in general means serving more units of work, but it can also be to handle larger units of work, such as when datasets grow.
  • The big architectural change that Amazon made was to move from a two-tier monolith to a fully-distributed, decentralized, services platform serving many different applications.
  • Started as one application talking to a back end. Written in C++.
  • It grew. For years the scaling efforts at Amazon focused on making the back-end databases scale to hold more items, more customers, more orders, and to support multiple international sites. In 2001 it became clear that the front-end application couldn’t scale anymore. The databases were split into small parts and around each part and created a services interface that was the only way to access the data.
  • The databases became a shared resource that made it hard to scale-out the overall business. The front-end and back-end processes were restricted in their evolution because they were shared by many different teams and processes.
  • Their architecture is loosely coupled and built around services. A service-oriented architecture gave them the isolation that would allow building many software components rapidly and independently.
  • Grew into hundreds of services and a number of application servers that aggregate the information from the services. The application that renders the Amazon.com Web pages is one such application server. So are the applications that serve the Web-services interface, the customer service application, and the seller interface.
  • Many third party technologies are hard to scale to Amazon size. Especially communication infrastructure technologies. They work well up to a certain scale and then fail. So they are forced to build their own.
  • Not stuck with one particular approach. Some places they use jboss/java, but they use only servlets, not the rest of the J2EE stack.
  • C++ is uses to process requests. Perl/Mason is used to build content.
  • Amazon doesn’t like middleware because it tends to be framework and not a tool. If you use a middleware package you get lock-in around the software patterns they have chosen. You’ll only be able to use their software. So if you want to use different packages you won’t be able to. You’re stuck. One event loop for messaging, data persistence,
    AJAX, etc. Too complex. If middleware was available in smaller components, more as a tool than a framework, they would be more interested.
  • The SOAP web stack seems to want to solve all the same distributed systems problems all over again.
  • Offer both SOAP and REST web services. 30% use SOAP. These tend to be Java and .NET users and use WSDL files to generate remote object interfaces. 70% use REST. These tend to be PHP or PERL users.
  • In either SOAP or REST developers can get an object interface to Amazon. Developers just want to get job done. They don’t care what goes over the wire.
  • Amazon wanted to build an open community around their services. Web services were chosed because it’s simple. But hat’s only on the perimeter. Internally it’s a service oriented architecture. You can only access the data via the interface. It’s described in WSDL, but they use their own encapsulation and transport mechanisms.
  • Teams are Small and are Organized Around Services
    - Services are the independent units delivering functionality within Amazon. It’s also how Amazon is organized internally in terms of teams.
    - If you have a new business idea or problem you want to solve you form a team. Limit the team to 8-10 people because communication hard. They are called two pizza teams. The number of people you can feed off two pizzas.
    - Teams are small. They are assigned authority and empowered to solve a problem as a service in anyway they see fit.
    - As an example, they created a team to find phrases within a book that are unique to the text. This team built a separate service interface for that feature and they had authority to do what they needed.
    - Extensive A/B testing is used to integrate a new service . They see what the impact is and take extensive measurements.
  • Deployment
    - They create special infrastructure for managing dependencies and doing a deployment.
    - Goal is to have all right services to be deployed on a box. All application code, monitoring, licensing, etc should be on a box.
    - Everyone has a home grown system to solve these problems.
    - Output of deployment process is a virtual machine. You can use EC2 to run them.
  • Work From the Customer Backwards to Verify a New Service is Worth Doing
    - Work from the customer backward. Focus on value you want to deliver
    for the customer.
    - Force developers to focus on value delivered to the customer instead of building technology first and then figuring how to use it.
    - Start with a press release of what features the user will see and work backwards to check that you are building something valuable.
    - End up with a design that is as minimal as possible. Simplicity is the key if you really want to build large distributed systems.
  • State Management is the Core Problem for Large Scale Systems
    - Internally they can deliver infinite storage.
    - Not all that many operations are stateful. Checkout steps are stateful.
    - Most recent clicked web page service has recommendations based on session IDs.
    - They keep track of everything anyway so it’s not a matter of keeping state. There’s little separate state that needs to be kept for a session. The services will already be keeping the information so you just use the services.
  • Eric Brewer’s CAP Theorem or the Three properties of Systems
    - Three properties of a system: consistency, availability, tolerance to network partitions.
    - You can have at most two of these three properties for any shared-data system.
    - Partitionability: divide nodes into small groups that can see other groups, but they can’t see everyone.
    - Consistency: write a value and then you read the value you get the same value back. In a partitioned system there are windows where that’s not true.
    - Availability: may not always be able to write or read. The system will say you can’t write because it wants to keep the system consistent.
    - To scale you have to partition, so you are left with choosing either high consistency or high availability for a particular system. You must find the right overlap of availability and consistency.
    - Choose a specific approach based on the needs of the service.
    - For the checkout process you always want to honor requests to add items to a shopping cart because it’s revenue producing. In this case you choose high availability. Errors are hidden from the customer and sorted out later.
    - When a customer submits an order you favor consistency because several services–credit card processing, shipping and handling, reporting–are simultaneously accessing the data.

    Lessons Learned

  • You must change your mentality to build really scalable systems. Approach chaos in a probabilistic sense that things will work well. In traditional systems we present a perfect world where nothing goes down and then we build complex algorithms (agreement technologies) on this perfect world. Instead, take it for granted stuff fails, that’s
    reality, embrace it. For example, go more with a fast reboot and fast recover approach. With a decent spread of data and services you might get close to 100%. Create self-healing, self-organizing lights out operations.
  • Create a shared nothing infrastructure. Infrastructure can become a shared resource for development and deployment with the same downsides as shared resources in your logic and data tiers. It can cause locking and blocking and dead lock. A service oriented architecture allows the creation of a parallel and isolated development process that scales feature development to match your growth.
  • Open up you system with APIs and you’ll create an ecosystem around your application.
  • Only way to manage as large distributed system is to keep things as simple as possible. Keep things simple by making sure there are no hidden requirements and hidden dependencies in the design. Cut technology to the minimum you need to solve the problem you have. It doesn’t help the company to create artificial and unneeded layers of complexity.
  • Organizing around services gives agility. You can do things in parallel is because the output is a service. This allows fast time to market. Create an infrastructure that allows services to be built very fast.
  • There’s bound to be problems with anything that produces hype before real implementation
  • Use SLAs internally to manage services.
  • Anyone can very quickly add web services to their product. Just implement one part of your product as a service and start using it.
  • Build your own infrastructure for performance, reliability, and cost control reasons. By building it yourself you never have to say you went down because it was company X’s fault. Your software may not be more reliable than others, but you can fix, debug, and deployment much quicker than when working with a 3rd party.
  • Use measurement and objective debate to separate the good from the bad. I’ve been to several presentations by ex-Amazoners and this is the aspect of Amazon that strikes me as uniquely different and interesting from other companies. Their deep seated ethic is to expose real customers to a choice and see which one works best and to make decisions based on those tests.Avinash
    Kaushik calls this getting rid of the influence of the HiPPO’s, the highest paid people in the room. This is done with techniques like A/B testing and Web Analytics. If you have a question about what you should do code it up, let people use it, and see which alternative gives you the results you want.
  • Create a frugal culture. Amazon used doors for desks, for example.
  • Know what you need. Amazon has a bad experience with an early recommender system that didn’t work out: “This wasn’t what Amazon needed. Book recommendations at Amazon needed to work from sparse data, just a few ratings or purchases. It needed to be fast. The system needed to scale to massive numbers of customers and a huge catalog. And it needed to enhance discovery, surfacing books from deep in the catalog that readers wouldn’t find on their own.”
  • People’s side projects, the one’s they follow because they are interested, are often ones where you get the most value and innovation. Never underestimate the power of wandering where you are most interested.
  • Involve everyone in making dog food. Go out into the warehouse and pack books during the Christmas rush. That’s teamwork.
  • Create a staging site where you can run thorough tests before releasing into the wild.
  • A robust, clustered, replicated, distributed file system is perfect for read-only data used by the web servers.
  • Have a way to rollback if an update doesn’t work. Write the tools if necessary.
  • Switch to a deep services-based architecture (http://webservices.sys-con.com/read/262024.htm).
  • Look for three things in interviews: enthusiasm, creativity, competence. The single biggest predictor of success at Amazon.com was enthusiasm.
  • Hire a Bob. Someone who knows their stuff, has incredible debugging skills and system knowledge, and most importantly, has the stones to tackle the worst high pressure problems imaginable by just leaping in.
  • Innovation can only come from the bottom. Those closest to the problem are in the best position to solve it. any organization that depends on innovation must embrace chaos. Loyalty and obedience are not your tools.
  • Creativity must flow from everywhere.
  • Everyone must be able to experiment, learn, and iterate. Position, obedience, and tradition should hold no power. For innovation to flourish, measurement must rule.
  • Embrace innovation. In front of the whole company, Jeff Bezos would give an old Nike shoe as “Just do it” award to those who innovated.
  • Don’t pay for performance. Give good perks and high pay, but keep it flat. Recognize exceptional work in other ways. Merit pay sounds good but is almost impossible to do fairly in large organizations. Use non-monetary awards, like an old shoe. It’s a way of saying thank you, somebody cared.
  • Get big fast. The big guys like Barnes and Nobel are on your tail. Amazon wasn’t even the first, second, or even third book store on the web, but their vision and drive won out in the end.
  • In the data center, only 30 percent of the staff time spent on infrastructure issues related to value creation, with the remaining 70 percent devoted to dealing with the “heavy lifting” of hardware procurement, software management, load balancing, maintenance, scalability challenges and so on.
  • Prohibit direct database access by clients. This means you can make you service scale and be more reliable without involving your clients. This is much like Google’s ability to independently distribute improvements in their stack to the benefit of all applications.
  • Create a single unified service-access mechanism. This allows for the easy aggregation of services, decentralized request routing, distributed request tracking, and other advanced infrastructure techniques.
  • Making Amazon.com available through a Web services interface to any developer in the world free of charge has also been a major success because it has driven so much innovation that they couldn’t have thought of or built on their own.
  • Developers themselves know best which tools make them most productive and which tools are right for the job.
  • Don’t impose too many constraints on engineers. Provide incentives for some things, such as integration with the monitoring system and other infrastructure tools. But for the rest, allow teams to function as independently as possible.
  • Developers are like artists; they produce their best work if they have the freedom to do so, but they need good tools. Have many support tools that are of a self-help nature. Support an environment around the service development that never gets in the way of the development itself.
  • You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.
  • Developers should spend some time with customer service every two years. Their they’ll actually listen to customer service calls, answer customer service e-mails, and really understand the impact of the kinds of things they do as technologists.
  • Use a “voice of the customer,” which is a realistic story from a customer about some specific part of your site’s experience. This helps managers and engineers connect with the fact that we build these technologies for real people. Customer service statistics are an early indicator if you are doing something wrong, or what the real pain points are for your customers.
  • Infrastructure for Amazon, like for Google, is a huge competitive advantage. They can build very complex applications out of primitive services that are by themselves relatively simple. They can scale their operation independently, maintain unparalleled system availability, and introduce new services quickly without the need for massive reconfiguration.
  • Youtube Review & architecture

    Filed under: review, videos — admin @ 10:30 am

    YouTube adds a new rich set of APIs in order to become your video platform leader–all for free. Upload, edit, watch, search, and comment on video from your own site without visiting YouTube. Compose your site internally from APIs because you’ll need to expose them later anyway.

    YouTube grew incredibly fast, to over 100 million video views per day, with only a handful of people responsible for scaling the site. How did they manage to deliver all that video to all those users? And how have they evolved since being acquired by Google?

    Information Sources

  • Google Video

    Platform

  • Apache
  • Python
  • Linux (SuSe)
  • MySQL
  • psyco, a dynamic python->C compiler
  • lighttpd for video instead of Apache

    What’s Inside?

    The Stats

  • Supports the delivery of over 100 million videos per day.
  • Founded 2/2005
  • 3/2006 30 million video views/day
  • 7/2006 100 million video views/day
  • 2 sysadmins, 2 scalability software architects
  • 2 feature developers, 2 network engineers, 1 DBA

    Recipe for handling rapid growth


    while (true)
    {
    identify_and_fix_bottlenecks();
    drink();
    sleep();
    notice_new_bottleneck();
    }
    This loop runs many times a day.

    Web Servers

  • NetScalar is used for load balancing and caching static content.
  • Run Apache with mod_fast_cgi.
  • Requests are routed for handling by a Python application server.
  • Application server talks to various databases and other informations sources to get all the data and formats the html page.
  • Can usually scale web tier by adding more machines.
  • The Python web code is usually NOT the bottleneck, it spends most of its time blocked on RPCs.
  • Python allows rapid flexible development and deployment. This is critical given the competition they face.
  • Usually less than 100 ms page service times.
  • Use psyco, a dynamic python->C compiler that uses a JIT compiler approach to optimize inner loops.
  • For high CPU intensive activities like encryption, they use C extensions.
  • Some pre-generated cached HTML for expensive to render blocks.
  • Row level caching in the database.
  • Fully formed Python objects are cached.
  • Some data are calculated and sent to each application so the values are cached in local memory. This is an underused strategy. The fastest cache is in your application server and it doesn’t take much time to send precalculated data to all your servers. Just have an agent that watches for changes, precalculates, and sends.

    Video Serving

  • Costs include bandwidth, hardware, and power consumption.
  • Each video hosted by a mini-cluster. Each video is served by more than one machine.
  • Using a a cluster means:
    - More disks serving content which means more speed.
    - Headroom. If a machine goes down others can take over.
    - There are online backups.
  • Servers use the lighttpd web server for video:
    - Apache had too much overhead.
    - Uses epoll to wait on multiple fds.
    - Switched from single process to multiple process configuration to handle more connections.
  • Most popular content is moved to a CDN (content delivery network):
    - CDNs replicate content in multiple places. There’s a better chance of content being closer to the user, with fewer hops, and content will run over a more friendly network.
    - CDN machines mostly serve out of memory because the content is so popular there’s little thrashing of content into and out of memory.
  • Less popular content (1-20 views per day) uses YouTube servers in various colo sites.
    - There’s a long tail effect. A video may have a few plays, but lots of videos are being played. Random disks blocks are being accessed.
    - Caching doesn’t do a lot of good in this scenario, so spending money on more cache may not make sense. This is a very interesting point. If you have a long tail product caching won’t always be your performance savior.
    - Tune RAID controller and pay attention to other lower level issues to help.
    - Tune memory on each machine so there’s not too much and not too little.

    Serving Video Key Points

  • Keep it simple and cheap.
  • Keep a simple network path. Not too many devices between content and users. Routers, switches, and other appliances may not be able to keep up with so much load.
  • Use commodity hardware. More expensive hardware gets the more expensive everything else gets too (support contracts). You are also less likely find help on the net.
  • Use simple common tools. They use most tools build into Linux and layer on top of those.
  • Handle random seeks well (SATA, tweaks).

    Serving Thumbnails

  • Surprisingly difficult to do efficiently.
  • There are a like 4 thumbnails for each video so there are a lot more thumbnails than videos.
  • Thumbnails are hosted on just a few machines.
  • Saw problems associated with serving a lot of small objects:
    - Lots of disk seeks and problems with inode caches and page caches at OS level.
    - Ran into per directory file limit. Ext3 in particular. Moved to a more hierarchical structure. Recent improvements in the 2.6 kernel may improve Ext3 large directory handling up to 100 times, yet storing lots of files in a file system is still not a good idea.
    - A high number of requests/sec as web pages can display 60 thumbnails on page.
    - Under such high loads Apache performed badly.
    - Used squid (reverse proxy) in front of Apache. This worked for a while, but as load increased performance eventually decreased. Went from 300 requests/second to 20.
    - Tried using lighttpd but with a single threaded it stalled. Run into problems with multiprocesses mode because they would each keep a separate cache.
    - With so many images setting up a new machine took over 24 hours.
    - Rebooting machine took 6-10 hours for cache to warm up to not go to disk.
  • To solve all their problems they started using Google’s BigTable, a distributed data store:
    - Avoids small file problem because it clumps files together.
    - Fast, fault tolerant. Assumes its working on a unreliable network.
    - Lower latency because it uses a distributed multilevel cache. This cache works across different collocation sites.
    - For more information on BigTable take a look at Google Architecture, GoogleTalk Architecture, and BigTable.

    Databases

  • The Early Years
    - Use MySQL to store meta data like users, tags, and descriptions.
    - Served data off a monolithic RAID 10 Volume with 10 disks.
    - Living off credit cards so they leased hardware. When they needed more hardware to handle load it took a few days to order and get delivered.
    - They went through a common evolution: single server, went to a single master with multiple read slaves, then partitioned the database, and then settled on a sharding approach.
    - Suffered from replica lag. The master is multi-threaded and runs on a large machine so it can handle a lot of work. Slaves are single threaded and usually run on lesser machines and replication is asynchronous, so the slaves can lag significantly behind the master.
    - Updates cause cache misses which goes to disk where slow I/O causes slow replication.
    - Using a replicating architecture you need to spend a lot of money for incremental bits of write performance.
    - One of their solutions was prioritize traffic by splitting the data into two clusters: a video watch pool and a general cluster. The idea is that people want to watch video so that function should get the most resources. The social networking features of YouTube are less important so they can be routed to a less capable cluster.
  • The later years:
    - Went to database partitioning.
    - Split into shards with users assigned to different shards.
    - Spreads writes and reads.
    - Much better cache locality which means less IO.
    - Resulted in a 30% hardware reduction.
    - Reduced replica lag to 0.
    - Can now scale database almost arbitrarily.

    Data Center Strategy

  • Used manage hosting providers at first. Living off credit cards so it was the only way.
  • Managed hosting can’t scale with you. You can’t control hardware or make favorable networking agreements.
  • So they went to a colocation arrangement. Now they can customize everything and negotiate their own contracts.
  • Use 5 or 6 data centers plus the CDN.
  • Videos come out of any data center. Not closest match or anything. If a video is popular enough it will move into the CDN.
  • Video bandwidth dependent, not really latency dependent. Can come from any colo.
  • For images latency matters, especially when you have 60 images on a page.
  • Images are replicated to different data centers using BigTable. Code
    looks at different metrics to know who is closest.

    Lessons Learned

  • Stall for time. Creative and risky tricks can help you cope in the short term while you work out longer term solutions.
  • Prioritize. Know what’s essential to your service and prioritize your resources and efforts around those priorities.
  • Pick your battles. Don’t be afraid to outsource some essential services. YouTube uses a CDN to distribute their most popular content. Creating their own network would have taken too long and cost too much. You may have similar opportunities in your system. Take a look at Software as a Service for more ideas.
  • Keep it simple! Simplicity allows you to rearchitect more quickly so you can respond to problems. It’s true that nobody really knows what simplicity is, but if you aren’t afraid to make changes then that’s a good sign simplicity is happening.
  • Shard. Sharding helps to isolate and constrain storage, CPU, memory, and IO. It’s not just about getting more writes performance.
  • Constant iteration on bottlenecks:
    - Software: DB, caching
    - OS: disk I/O
    - Hardware: memory, RAID
  • You succeed as a team. Have a good cross discipline team that understands the whole system and what’s underneath the system. People who can set up printers, machines, install networks, and so on. With a good team all things are possible.
  • August 17, 2008

    Hello world!

    Filed under: Uncategorized — admin @ 7:23 pm

    Welcome to WordPress. This is your first post. Edit or delete it, then start blogging!

    Powered by WordPress