Webpulp.tv
By Josh Owens
Podcast Description
A podcast about the latest and greatest technology used to serve web apps. From Memcache to Varnish, Ruby to Java, and many other technologies, we explore "the guts of the web".
| # | Name | Description | Released | Price | |
|---|---|---|---|---|---|
| 1 | Glean gems of search knowledge about Lucene and Solr | Join us for a deep dive into Websolr with Nick Zadrozny. He covers all kinds of gems about Lucene and Sunspot, so brush up on your search knowledge! * Websolr is a hosted service for the [Apache Solr](http://lucene.apache.org/solr/) search engine and provides tools for its users. Apache Solr is an HTTP wrapper around the [Lucene Search Indexing Library](http://lucene.apache.org/), the industry standard for top-quality search. * Websolr is hosted on [Amazon EC2](http://aws.amazon.com/ec2/), mostly in EC2 East, and runs on around 30 instances. * Websolr makes heavy use of Solr's replication functionality, both for live failover and for migrating data when provisioning or de-provisioning servers, balancing clusters, etc. * Most of this happens in the middle routing layer of Websolr's systems, a custom routing and proxying layer. * Many of Websolr's customers are also on [Heroku](http://www.heroku.com/); Websolr actually started off as a Heroku add-on. * A Websolr stack starts at a URL beginning with index.websolr.com, which points to an [Amazon ELB](http://aws.amazon.com/elasticloadbalancing/), the first level of the stack. * Requests are sent through two proxies at this level: ELB and [HA Proxy](http://haproxy.1wt.eu/). Having two proxies might sound odd, but it gives Websolr the freedom to control which instances they talk to. * The next level is a proxy Websolr wrote called Proxy Machine, which makes it easier to match a request using regular expressions against the entire body of that request. * Nick and [Kyle](https://twitter.com/#!/kmx) both have side projects, including adding Solr to RubyGems, and consulting. * Nick helps maintain [Sunspot](http://sunspot.github.com/). * Solr helps because of its flexible ability to match terms to others that don't share exactly the same characters (e.g., 'ran' and 'running'). * Lucene's job is that of any search engine: to build an index of words. * [Faceting](http://wiki.apache.org/solr/SolrFacetingOverview) is a browsing pattern; refining a search on a site such as Amazon or eBay is an example of faceting. * Faceting is often the feature that clients asking for full-text search really want. It's a less obvious search feature, but search engines are perfect tools for building it. * Location search in Solr comes down to filtering points within a given radius, sorting results by their distance from a certain point, or combining distance with the relevance of a result to a search term. * Solr 4 will have [geohashing](http://wiki.xkcd.com/geohashing/) implemented, because it can be mixed in with full-text searches to help sort by distance. * Stemming takes all the different variations of a word and reduces them to a standard root form. * Dictionaries come into play through synonym expansion, which is a great way to support search across multiple languages. | 1/3/12 | Free | View In iTunes |
| 2 | Learn how Tropo built a huge voice cloud while using DNS for a key/value store | Join me as I talk with Jose De Castro from Tropo, a powerful yet simple API that adds voice, SMS, Twitter, and IM support to the programming languages developers already know. Tropo makes it easy for developers to build communications applications. Tropo started out with voice calls and SMS, and over the last couple of years they've added almost every instant messaging network, as well as Twitter. You can think of it as a real-time communications mashup platform: you can receive phone calls, play something to your callers, put them in a conference, have a call trigger a tweet, or have a tweet trigger an SMS. Tropo supports Ruby, Groovy, JavaScript, Python, and more. Voxeo is the parent company behind Tropo and has been doing telephony and call-center work for 10 years. Tropo has massive telephony infrastructure in their seven data centers. Tropo's provisioning API and website are currently written in Java, using the Jersey framework. Tropo also uses RabbitMQ and ActiveMQ for all of their internal queuing infrastructure. Tropo's core media server, which does all of the audio processing, is written in C++ with some Java sprinkled in. The company started off in VoIP and has spent the last 10 years perfecting high-quality audio and making that experience as good as possible. Tropo can do cool things like speech recognition, so users can actually talk to their apps. Tropo can synthesize speech in 24 languages and can do audio conferencing, answering-machine detection, and fax detection. End users only have to learn a very simple API. Tropo owns and operates their own data centers and deals with several VoIP carriers, both nationally and internationally. Tropo uses low-latency data pipes, and from there calls go into a network role called the session border controller. They can route calls dynamically to different customer clusters and spin up dedicated servers for a customer on demand. Every role in the network is completely virtualized, including the proxy servers and the billing system. Tropo uses DNS as a key/value store. Tropo launched almost 3 years ago, when the company had an idea of what a next-generation real-time telephony and telecommunications stack would look like. Logging is probably Tropo's biggest challenge, and they've partnered with an outside company to handle it. In a Splunk setup, there are three roles: a light forwarder that grabs a file and ships it off to the index servers, a huge farm of index servers, and UI servers. Tropo sponsors the Adhearsion project, a Ruby framework for building voice apps. Ryo is a real-time message-oriented server. Check out labs.voxeo.com and the blog; they have cool APIs coming out and sponsor cool API projects. Stay in touch via their blog at blog.tropo.com or on Twitter at @Tropo. | 12/8/11 | Free | View In iTunes |
| 3 | Scaling with JRuby and Thrift at Outright | Join me as I talk with Ben Curren from Outright, whose company has developed a hassle-free way to keep track of your small business finances. Learn how they use Thrift and Ruby to run their app. Outright helps small businesses organize their finances so owners don't have to do it all themselves. They simplify bookkeeping, so you can focus on your business, not your books. When Ben started Outright three years ago, he used jQuery and rolled a bunch of custom JavaScript for templating; now they're looking at moving to Backbone. Outright is geared towards sole proprietorships. It can work for LLCs, but it lacks multi-user access, which can be a problem for a partnership. Outright is hosted entirely on Amazon Web Services: their servers are on EC2, and they use Chef. Their app is split into core pieces: JavaScript and jQuery on the front end, and a backend process that goes out and aggregates data. One service collects your sales information from eBay, and another connects to your bank account. Once all that data is collected, it needs a lot of analysis and processing, including deduplicating across sources and reconciling, and Outright turns it into clean-looking transactions. On the backend, their aggregation service can aggregate other aggregators: Outright uses Yodlee, which supplies most of their bank and credit card connections, and also connects to sources like FreshBooks, eBay, Etsy, and Google Checkout, anywhere income might come in. This lets development stay fairly organized and move fast, and lets the app scale. Outright launched a partnership with Etsy about two weeks ago. In general, they've tried to focus on building a great company and answering customer questions. Outright uses MySQL with a bit of Memcache and is working on iPhone and Android apps. From Ruby, they can directly call methods on the JRuby stack. Outright created a Thrift collection that extends the base collection. Ben believes in strong typing for public interfaces that are heavily used across multiple codebases, where well-defined structures are helpful. Outright uses Sprockets to combine all the JavaScript in the right require order, and recently moved to Mustache because it plugs directly into Backbone. Security is extremely important for Outright; they encrypt API tokens on the backend and create policies around data access. Outright uses Jenkins for a build process that compresses the JavaScript and CSS and fingerprints them with SHA hashes so browsers can cache them long-term. The build then goes through unit, functional, integration, and regression testing before being deployed to QA. From QA, it is merged into production, where it sits until someone is ready to deploy it. Contact Ben on Twitter at Ben Curren. | 11/15/11 | Free | View In iTunes |
| 4 | Instant in-app CRM with Rails and JavaScript using Intercom | Watch as we sit down and talk to Des Traynor and Ciaran Lee about Intercom, Exceptional, and awesome things like Backbone.js. Intercom is a customer relationship management (CRM) tool for people who create web apps or websites. It lets you see which customers use your products, along with information about them, so web app owners can get to know their users and communicate with them. They use Rails 3 and MySQL, but are considering switching from MySQL to MongoDB for better sorting of data. Intercom runs on Heroku. To use Intercom, users just need to include a small JavaScript snippet in their templates. They use jQuery as their JavaScript library. They believe in proactive support and getting to know your users before someone else does. One of the dangers of community-based forums is the development of a cult mentality. One of Intercom's challenges is storing data. They are considering more options for customizing widgets. They get 250 million API requests per month. If listeners want an invite code, tweet @webpulptv. | 10/7/11 | Free | View In iTunes |
| 5 | Managing 1.4 petabytes of video data | Watch Jared Klett talk about blip.tv, dealing with 200 servers, and managing 1.4 petabytes of video! Blip.tv helps content creators distribute and monetize web shows and has around 50,000 active show creators. Blip.tv built its own storage solution, since AWS wasn't around when they started. They use mod_perl to run the website and Java to help with uploading and encoding. They also use Squid and MySQL. Blip.tv's CTO is working on an HTML5 embeddable player that will work with the iPad. Blip.tv uses MogileFS to handle file storage, and Jared loves it: thanks to MogileFS, Blip.tv has 1.4 PB of data scattered across multiple datacenters. Jared mentioned that they use Ghost for Linux. They use Webistrano, a deployment tool based on Capistrano. Blip.tv uses CVS for source code management. | 8/18/11 | Free | View In iTunes |
| 6 | Learn how thoughtbot built Airbrake to handle millions of errors a day using Rails, MySQL, and MongoDB | Join us as we talk with Harold Giménez about thoughtbot's Airbrake app. We also touch on using Rails and Backbone, migrating off Engine Yard, and how they handle a large API load. Thoughtbot does mostly web app consulting. They have a few products of their own and also hold workshops, from getting started with Rails all the way up to scaling. They have three focuses: workshops, products, and consulting. Airbrake was originally called Hoptoad. Airbrake is an application that collects exceptions and errors from your apps. Whenever an exception occurs, it gets sent to the service, where you can check the backtrace, how many times the error has occurred, etc. Supported platforms include Rails and iOS, and there is an open API as well. Third-party developers have built libraries for many languages, including Erlang, Scala, PHP, Python, and Java. They have experimental support for JavaScript as well. Inside Airbrake, they use Rails, with MySQL and MongoDB for storage, and they heavily use Redis as well. Thoughtbot looks for ways to evolve their infrastructure. They use nginx, Passenger, and REE 1.8.7. Their web frontend is handled by two servers, but the API endpoints are handled by seven. They started out at Engine Yard but then moved their infrastructure to a new datacenter, with the goal of zero downtime; Bluebox.net was helpful during the move. They made the DNS switch, then put up a configuration on the old environment to keep from losing data. When an error comes in, they store a small amount of data in MySQL and the rest in MongoDB, where it is immediately available to users. In MongoDB, data goes into collections; a capped collection has a fixed limit on how much it will store. They use Nagios, Bluepill, and New Relic RPM. Follow thoughtbot on Twitter @thoughtbot. | 8/16/11 | Free | View In iTunes |
| 7 | Managing 8 petabytes and virtual slices for 40k customers at Linode | Join me as I interview Jed Smith from Linode. Learn how they manage 8 petabytes and 40,000 customers! Linode is a virtual private server company that provides infrastructure as a service. They were among the first companies to offer cloud services. Their customers use Linode in a variety of ways. ColdFusion is their founder's favorite language; Linode was built as a precursor to ColdFusion hosting. Linode has documentation for setting up WordPress or almost anything you can think of; they recently reached 400 guides. Linode has 5 data centers, so if someone wanted to build something that fails over and is highly scalable and available, they could do it with Linode. Within a facility, all the bandwidth is pooled. They use very basic tools to monitor their servers, including the Nagios monitoring system and PagerDuty. Jed mostly uses Python, and everyone uses iTerm. They've hand-rolled a lot of code from the beginning. They have a team that follows all the distros and makes a few modifications so they work under Xen. They've long prided themselves on their community. They have over 40,000 customers and 8,000 terabytes of raw storage, and all of Linode's equipment is colocated. In the last year and a half, they introduced an automated backup solution for servers, with many options for restoring your data, including restoring your snapshots to any other Linode. The backup system is based on a custom stack written in Perl. NodeBalancer is basically load balancing as a service; you can use it to upgrade your server. The Onion is hosted entirely on Linode. People use Linode for VoIP; a service Jed's been using is Twilio. They do their best to plan for outages, but they're reliant on the power grid, and they use their mistakes to get better. They just celebrated their 8th birthday. The boss is working on a super-secret project that will change how people think about Linodes. Check out linode.com or linode.com/community, or contact them on IRC. You can also contact them via email at service AT linode.com or sales AT linode.com, and on Twitter: @linode. | 6/30/11 | Free | View In iTunes |
| 8 | Build native apps from JavaScript web apps using SproutCore and Strobe | Join me as I interview Yehuda Katz about SproutCore. He also dives into what Strobe, Inc. does and how the company is building a general-purpose platform for people who want to deploy applications for the web as well as for native platforms. PhoneGap lets you take a web application and put it in a simple iOS skin. You build a web app with HTML, CSS, and JavaScript as usual instead of using Xcode, and PhoneGap then builds a program for the App Store that has access to native functionality, such as the address book or photo gallery. This eliminates the core problem of "I really want to tweet a link, but also access the photo gallery or monetize." People should be able to build a single app and have it work on native platforms. Before Rails, people knew about MVC, but Rails provided a package so people could focus on building an application. What Strobe is trying to do is take the state of the art in all these areas and put it into a package. All you're doing at the end of the day is saying "Deploy," so maybe more people will do it, and you can think about the problem in a way that lets you be more productive. The easiest way to build apps on the Strobe platform is to use SproutCore as the framework. The platform has tools that make it easy to put an app behind a CDN, for example. You use SproutCore to develop your web application, then go to the command line and type "Strobe deploy." The Strobe platform also lets you choose to deploy to native platforms like Android and iOS, and there is a "preview" mode that emulates all the functionality. They're still working out pricing; they're thinking of basing it on reliability. How much are you willing to pay to get your app into more places? Some people may just want it to work; others want more features. SproutCore was created by Charles Jolley for the Mailroom app, which was used by the Obama campaign; Apple then hired him to write apps for their technology. The thinking was that Cocoa seemed to have gotten this right, so why not use Cocoa's event system rather than invent their own? That worked well for people who really wanted something native-like for the web. It was assumed people would use the higher-level functionality, but there was clearly a middle ground between raw jQuery and the web-style of interaction. Strobe asked, "What's the smallest part of SproutCore, from an API perspective, that will help people?" They recently released SproutCore 2.0, so people can now more easily play with it. SproutCore epitomizes the idea of declaratively tracking data flow as opposed to using callbacks; it's a 29K app. In SproutCore 2.0, a template is a bunch of HTML that gets rendered by SproutCore and inserted into the DOM. With a template, you're not only saying how the DOM should look, you're defining how things should be updated later. Say you have a to-do list with 20 items: a callback-based list counts down and loops through all the items as they are done, which is significantly less efficient. With SproutCore, you can just type "isdone." You need to start thinking about your Rails app as a simple data store and less as a place where your business logic resides. It's good to use technology that you're used to, but there needs to be a better ecosystem for plugins. SproutCore has a data store: on an error, you invoke a callback that sets the query to an error state, and you can push arbitrary objects to the data store that say what you want. He thinks CoffeeScript is great for app code, but not so much for framework code. People can learn more about Strobe Corp at StrobeCorp.com and SproutCore at SproutCore.com. Keep an eye on the SproutCore blog for version 2.0 news. | 6/21/11 | Free | View In iTunes |
| 9 | Build your own cloud platform using Cloud Foundry | Listen as we talk with Ezra Zygmuntowicz about Cloud Foundry. Sit back and learn how Cloud Foundry works, what technologies they use, and how they enable cloud platforms. We also look at what Ezra did at Engine Yard and why he left to join VMware! He left Engine Yard because the growing sales side of things didn't interest him as much; he wanted to build a platform-as-a-service type of abstraction. They started Engine Yard before "cloud" meant what it means today. He went to Velocity and met Adam Jacob from Opscode, who was working on Chef. Ezra was building a project called Stem Cell, and he built the whole platform on top of Chef. Derek Collison at VMware wrote TIBCO Rendezvous, then worked at Google. In July 2010, Ezra took a two-week vacation and then decided to leave the company to work for VMware on the Cloud Foundry project. Cloud Foundry is "the Linux kernel for the cloud": just as the Linux kernel abstracts over different CPU architectures, Cloud Foundry abstracts over different infrastructure clouds. It's a full platform, right out of the box. The program is open source and written in Ruby. Cloud Foundry scales easily; you can even run a complete microcloud on your laptop. VMC makes it easy to switch targets, so you can control multiple clouds. Think of Cloud Foundry as "Rack for deployment." There are three ways to extend Cloud Foundry. First, you can define declarative APIs for adding language runtimes and frameworks. Out of the box, there is support for Ruby, Java, and Node.js, and he's added Erlang, Scala, and Python. There are also services: MySQL, Redis, MongoDB, Memcache, or any type of service that your app uses over a socket. Anyone who's a member of that cloud can choose VMC services, and then "Create service, bind it to my app." It runs anywhere that runs Ubuntu 10.04. The VCAP repo is the main platform. There are problems with Heroku: you can have a bad-neighbor effect, and once you get in, it's hard to get out. With Cloud Foundry, you can start on their hosted platform for free, and then, if you start to hit a bottleneck, take the open source and instantiate a cloud on Amazon or Rackspace, giving you complete access to your own cloud. You'll have a lot of trouble doing that with one of these black-box services. A problem he's seen with black boxes is that each application is unique and grows to become an ecosystem with its own scaling bottlenecks; in Cloud Foundry, you can custom-tune the whole platform just for your app. There are three ways of detecting Ruby apps: detection for Rails apps, Rack apps, and Sinatra apps. VMware wanted to be loosely coupled to the source control system but still wanted differential deploys: your client only sends what the cloud controller doesn't already have, so a file never has to be pushed to the cloud twice. Out of the box, they have MySQL, Redis, MongoDB, RabbitMQ, and PostgreSQL. If you want to host your own service, you can do that by running a Sinatra app and using the same tools to bind to it. You can pick up your cloud and move it somewhere else and still have the same functionality. They have plans to add a utility tier. Rather than charge per process, they provide a "pool of resources": free accounts get 2GB of memory, a certain number of CPU cores, a certain amount of disk space, etc. You can get a table of your processes, how much memory your apps are using, etc., so you know when you're close to running out of resources. CloudFoundry.com is the hosted service, CloudFoundry.org is the open-source community, and GitHub.com/CloudFoundry is the source code. Contact him @ezmobius on Twitter, or at ez AT vmware.com. | 6/13/11 | Free | View In iTunes |
| 10 | An inside look at how Heroku handles downtime | Join us as we sit down with Mark Imbriaco again, this time working for Heroku! Sit back and learn how Heroku works, what technologies they use, and how they handle downtime. We also look at how the recent EBS outage affected Heroku and what they are doing to mitigate that. Mark switched to Heroku in August 2010. Heroku offers Node.js support in a private beta. Heroku makes deploying easy: you just git push. Heroku is opinionated, supporting only Ruby, Node.js, and Postgres, but they have add-ons for all kinds of things like Mongo, Redis, Memcache, Websolr, Chargify, New Relic, and many more. Support for Heroku is handled through online ticketing, but they don't have an SLA at the moment. Mark takes us through the recent Amazon EC2 downtime, caused by EBS problems. Heroku has continuous backups for dedicated database customers, and they are rolling them out for shared database customers too. Heroku uses Nginx, a Varnish cluster, a custom Erlang routing proxy, Thin, and Node.js. They use RabbitMQ, Redis, and Postgres for state tracking and stats. Opsdash is a custom monitoring dashboard that the Heroku team wrote. Heroku also uses Nagios, Chef, SNMPd, and collectd with Graphite. Railguns are the app servers and each app is a slug: Railguns launch slugs! Every engineer at Heroku participates in the on-call support rotation, and the ops team are Incident Commanders! Whoa, Mark was on vacation when EC2 started having problems! Heroku has a status app that they keep up to date. Keep up with Heroku via their blog and Twitter account. Heroku is hiring! Mark is speaking at Velocity and is going to attend Surge and RubyConf. Mark loves Percona! Check out Percona Live and use the code "PULP" to get $50 off. | 5/25/11 | Free | View In iTunes |
| 11 | Learn how to deliver pixel-perfect fonts to all browsers using Selenium testing | The process of delivering custom fonts to your website may be more complex than you thought, as Ryan Carver, Founder & Head of Technology of Typekit, explains in today's episode of Webpulp.tv. Typekit is a hosted service that enables web designers to use custom fonts on their websites. Historically, web designers have been limited to a handful of common fonts pre-installed on every machine. Typekit's service takes care of font file formatting, distribution, browser/system compatibility, and rendering. It also handles font licensing, providing a vast choice of custom fonts to its users. The service uses JavaScript for browser negotiation, font compatibility, format determination, etc. Specifically generated CSS files are then used to serve the custom fonts in the web pages. This multi-level integration demands high reliability and efficiency, which Typekit provides via a worldwide CDN hosted by Edgecast. Typekit's service infrastructure is divided into two levels. The website 'Typekit.com' lets users browse fonts, configure font kits, get API access, etc., while '[your-domain].typekit.com' hosts the selected fonts and works with the JavaScript file. Typekit.com has a standard Ruby stack; it is currently a Rails 3 application. Besides MySQL, the stack also contains Redis and MongoDB. Redis is used for stashing Resque data, Vanity metrics, etc. MongoDB is used for storing CDN logs, basic analytics data, traffic-tracking data, etc. Typekit uses Nginx with Unicorn for its web and application servers, with HAProxy load-balancing the whole thing. Ryan says Typekit currently has about a dozen servers in total, hosted on Slicehost. Typekit has a unique revenue-share deal with its type foundry partners, distributing revenue based on the popularity and usage of font faces. MongoDB, with its built-in MapReduce framework for reporting, is used in particular for this usage-based data collection and calculation. Ryan thinks on-the-fly report generation is technically very possible with MapReduce. Typekit plans to move to EC2 in the near future because of its easy scaling and flexibility. They are currently preparing a cloud formation with Chef, rebuilding Typekit's operations infrastructure. They want to build a separate, independent stack for a fully automated subsystem, which will let them re-architect 1) the font-serving infrastructure and 2) the font-processing pipeline. The pipeline takes care of tasks like hinting, font-browser optimization, subsetting, format conversion, etc. Next comes testing all the fonts in an environment spanning different platforms and browsers. Their testing system was originally written in Selenium; it is now a queue-based, Python-script-powered, crawler-type custom system named 'FontSpider'. RSpec and Cucumber are used for further testing and development, and they even have a CI Joe-based CI server. For monitoring and alerting, they use Munin and Pingdom. Thanks to a CDN-based architecture, Typekit hasn't had much trouble scaling, since the CDN's pre-generation and remote serving of files minimizes runtime hassles. Handling the font files, on the other hand, has been a much bigger challenge due to the complicated nature of such files. Firefox 4, released recently, supports ligatures and OpenType font features, making life a lot easier for people like Ryan. Typekit users can also purchase additional, out-of-catalogue fonts from its foundry partners' websites (FontShop, Process Type) and then integrate the license and import the fonts into their Typekit accounts. | 5/18/11 | Free | View In iTunes |
| 12 | Instantaneous information using JavaScript and Python | Can Sar, the co-founder and CTO of Apture, joins us today as we get to know Apture, a unique product written in Python and built around JavaScript. Apture is a product that helps publishers inform their readers better. Using Apture, publishers can bring content from the rest of the web into their websites, providing instantaneous access to relevant information on the same webpage. Apture increases revenue opportunities for publishers by keeping readers engaged with rich, compelling multimedia content and a search experience related to the actual content of the webpage. The concept of Apture was born as the founders were looking for a way to enhance the then-stale online news experience. The initial Apture product served only as a sorting tool for publishers, integrating with APIs from Google, Flickr, Wikipedia, etc., and letting people working for the publishers manually choose related content for their pages. Analyzing these trends has led the product to become a much more automated solution. Apture currently works with brands like The Washington Post, The Economist, Reader's Digest, Times of India, etc. The client side of the product is very JavaScript-heavy, while all the back-end components are written in Python. Can Sar comes from an operating-systems background and is now most familiar with Python as a back-end language. There are two ways to experience Apture: publishers can make their pages Apture-ready by integrating its JavaScript into their web templates, and readers can independently add Apture to their browsing by installing the Apture browser add-on. 'Apture Highlights' is a featured add-on for the newly launched Firefox 4 browser. Apture can fetch information from Twitter profiles, CrunchBase pages, Google Books, YouTube videos, etc. Apture started with Django as its only framework; currently they have numerous services that use other custom frameworks besides Django. With The Washington Post as the first launch partner, the Apture team had to pay close attention to scaling architecture from the beginning of the project. Their web tier is currently based entirely on Django. Searching is handled by a custom Eventlet-based WSGI server that fetches information from the 'search server' and then serves it to the web browser. The closest Ruby equivalent to Eventlet is EventMachine. Spawning, which itself uses Eventlet, is used as the web server. WSGI stands for 'Web Server Gateway Interface'; it is the Python equivalent of Ruby's Rack. The Apture team even had Tyler Croy from Slide join them to help with more event-based services. Apture uses a standard MySQL database with no sharding: a single MySQL instance with two slaves for backup purposes. Caching on the servers is done with Memcached. The service currently runs on 25 to 50 servers hosted by Contegix, which also handles all the operations and some of the monitoring. Before Contegix, Apture relied on Rackspace, which, unlike Contegix, does not provide the flexible, hands-on support that Apture requires. In addition to general performance monitoring, every single server and process is individually monitored by both Contegix and Apture's own monitoring. Nagios is used for alerting via SMS and email. As a third-party script, Apture has faced constant pressure from publishers since the very beginning to keep its script small and light. The team also had to make sure their script does not collide with the sites' own scripts and CSS, which are often poorly written. Apture initially used Flash to communicate between channels; today most of that is done via iframe POSTing. The team has been using a modified version of the MooTools framework since the early development of Apture. Additional modifications were added over… | 5/3/11 | Free | View In iTunes |
| 13 | VideoInstant web analytics and scaling Clicky using php | For today’s show, we have Sean Hammons with us from Clicky. We cover topics like web analytics, content distribution networks, geo traffic management, etc. Clicky is an advanced web analytics service. What distinguishes Clicky from other popular web analytics services is its real-time support: Clicky’s analytics results are instantaneous, whereas a service like Google Analytics usually takes 24 hours. The service is only 4 years old and is already serving over 300,000 websites. Clicky runs a LAMP stack: Linux, Apache, MySQL and PHP. The service currently depends on a total of 50 servers, 40 of which are MySQL databases. The database is sharded, with each DB server handling about 8,000 websites. They also have 5 tracking servers, which log the data sent by each website’s tracking code. These servers sit behind a load balancer. All the servers run Linux. Clicky also has its own CDN (Content Distribution Network) built on globally distributed VPS.net servers. The CDN handles all the static data (tracking code, images, stylesheets, etc.), leaving the main servers to serve only the HTML. This use of a CDN dramatically speeds up the site. One reason Clicky built its own CDN is that SSL-capable traditional CDNs were simply too expensive for their budget. Traditional CDNs also did not suit Clicky because of the complicated way its servers were originally set up. Clicky currently tracks about 300 million page views per day, which is about 1,000 page views per website on average. Because the service is so focused on individual visitor metrics, they had to cap each website at a maximum of 500,000 page views per day. Most of the analytics data is pre-processed, and the user data from the tracking servers is processed in batches. 
To achieve real-time results, the service processes and summarizes this data immediately. The only thing that is not pre-processed is data segmentation requests. Cloud solutions are just not suitable for some of Clicky’s needs; all 50 of its servers are hosted in a local data center in Oregon. When asked about alternative storage technologies, Sean says he would have gone for a NoSQL solution if Clicky were starting again from scratch. Under current circumstances, though, the sheer volume of stored data makes migrating to a new system seem unrealistic to him. When a visitor loads a website that has Clicky installed, the tracking code communicates with the nearest Clicky CDN server. Clicky has numerous such servers throughout the US, Europe and Asia (Singapore). These servers then report the visitor statistics back to the Clicky home server in the US. One of Clicky’s newest features is Spy: real-time streaming of page view data. Munin runs on some of the servers to monitor server load, disk usage, memory usage, etc. It has helped the team measure the effect of their tuning and improve efficiency. As an extra monitoring measure, a custom PHP script pings all the servers every minute and can even report server downtime via text message. Memcached is used for database caching. They also have a custom HTML caching system, written in PHP and served by Apache, to keep things speedy. Sean says the biggest challenge for the team is keeping up with the constant growth of the service. Clicky uses DynDNS’s geolocation-based traffic management system, which serves content to each client machine from the nearest, fastest available server. The system also has a failsafe: it can automatically switch servers if a designated server goes offline. Getclicky.com currently offers both free and paid plans. 
You can get a free 21-day trial of the ‘Pro plan’ just by signing up. This plan usually costs $10/month or $60/year. Custom plans are also available according | 3/31/11 | Free | View In iTunes |
| 14 | VideoScaling Postrank using Ruby and Eventmachine | Ilya Grigorik, CTO of PostRank, joins us for today’s show as we talk about PostRank, a unique analytics service based on social networks and other popularity algorithms. We cover more interesting ground as the chat goes on. PostRank mainly offers a social web analytics service, aggregating engagement data (posts) from popular social networking sites so that publishers can monitor their content, even in real time. Although it started as a fun summer project in mid-2007, PostRank quickly grew into a full-fledged company because of the practical demand for such a service. PostRank is something like a modern-day adaptation of the PageRank concept. Thanks to the recent boom in social networks, publishers now need much more dynamic analytics tools for their URLs and content. This is where PostRank comes in. The company currently has a workforce of 15, most of whom are developers. Until very recently the team relied on MySQL for all of their databases, even as the service continued to grow rapidly. The amount of information PostRank parses grows by 60% per year. Ruby is used for the entire front-end API layer. For their various sub-products, they use Rails and Sinatra. Python is also used in cases where its libraries are better suited to the task. The system also runs numerous Java and JRuby processes. As of the end of 2010, PostRank has a total of 65 instances on EC2, which, as Ilya explains, really isn’t a big number considering how many tasks the system performs. Ilya says that most of PostRank’s blog posts reflect the things they’re currently working on. Nagios is used to monitor the servers and their services, whereas Ganglia is used to collect operational metrics. They are currently using Splunk too. PostRank’s search engines and search servers are powered by Solr and Lucene. 
Companies like Apple and Twitter also use Solr and Lucene. The PostRank team has developed its own search and indexing architecture, in which data is stored at variable compression ratios depending on how old or new it is. Ilya names EventMachine and RabbitMQ as his two favorite tools of the trade, favorites by necessity. The team evaluated a few other protocols, such as XMPP, before settling on AMQP, which proved better suited to PostRank’s message-queuing needs. AMQP supports a modular, queue-based approach to data processing instead of a monolithic one, which makes the pipeline easier to monitor and helps avoid bottlenecks and pile-ups. AMQP is also essential when the data pipeline consists of different programs (e.g. Python and JRuby) running on different servers. When a pipeline stays within a single server, a simpler local queue such as Beanstalk is used instead of AMQP. The newest feature the team is working on is the ability to store the actual (social) conversations associated with each piece of content or URL. They chose a Cassandra cluster for indexing all the conversation data; the cluster currently ingests about 20 GB of content every day. Cassandra was chosen mainly for its incremental scalability. It also takes manual data migration out of the equation, which saves a lot of time and effort. The Cassandra cluster runs on ephemeral storage, whereas EBS is used for the MySQL databases and Solr indexes, which require additional protection. PostRank is about to open-source its fully asynchronous API web stack. Ilya hopes developers will play with it to build scalable infrastructure services. He says what they are about to release is more a set of services than a framework: basically a Rack-based API stack geared toward JSON APIs. 
Ilya thinks JSON is getting increasingly popular because of its responsiveness, something Rails and other frameworks should start to take into consideration. | 3/29/11 | Free | View In iTunes |
| 15 | VideoPragmatic devops the rails ways | On today’s episode, we have Alex Nobert, Operations lead at Shopify. I get to learn what Shopify is, the bits and pieces of how the whole thing works, and ultimately how Shopify became one of the most distinguished e-commerce platforms out there. Shopify is an e-commerce platform that helps people sell their products online. Shopify was born when Tobias Lütke, its current CEO, couldn’t find a suitable e-commerce solution for his snowboard shop and decided to build his own. Shopify’s motto is a beautiful, elegant, simple, easy-to-use platform for the masses. Shopify can easily be counted among the pioneers of today’s ever-expanding e-commerce industry. Vast as the online e-commerce industry already is, Alex thinks it still has plenty of room to develop and bloom. Shopify currently runs around 20 servers that serve various purposes. Of these, 10 are dedicated app servers handling the web application. The rest include database servers, log servers, dedicated search servers, utility servers, etc. The cloud is also part of Shopify’s infrastructure: the Shopify App Store and Shopify Theme Store rely on services like Rackspace Cloud and Amazon EC2. Shopify also uses Amazon S3 to host its static assets. Shopify stores are Ruby on Rails shop portals with Nginx web servers in front. Its app servers use Unicorn, and Passenger is also used in some cases. SSL termination is handled by hardware load balancers. All of Shopify’s databases are MySQL-based; they currently have 3 active database servers. For search, Shopify used to use Solr but now uses Sphinx. Alex says they switched because of Sphinx’s better Ruby support, and he particularly mentions the Ultrasphinx library. They currently have 2 dedicated search servers. 
HAProxy load balances between the search servers. Load balancing across the app servers is mostly done with F5 hardware load balancers, and Alex speaks highly of their iRules feature. Shopify’s 10 app servers currently handle a total of about 8,000 requests per minute, which, according to Alex, is tiny compared with their total capacity; the team deliberately over-provisioned. Apparently, the biggest challenge for operations is dealing with the frequent threat of DDoS attacks, frequent because of the sheer size of Shopify, which currently hosts about 12,000 paying shops. Operations also has to worry about keeping performance steady. Most application-targeted DDoS attacks can be stopped with Layer-7 filtering, which they do at the firewall/load balancer level. For more advanced attacks like Slowloris, the team uses more aggressive TCP timeouts. Shopify uses Munin to collect data and provide statistics on all the servers. They also use Nagios for similar monitoring and alerting purposes. Pingdom monitors app downtime and sends SMS notifications accordingly. New Relic RPM is another tool they use for real-time monitoring of the whole service, which helps them prevent service degradation. According to Alex, the best thing about working at Shopify is that the company does not hesitate to experiment with leading-edge technologies and methods. Alex then shares his own and his company’s views on open-sourcing software. When hiring, Shopify puts great importance on the open-source work that candidates have done. Alex thinks it’s only fair that they give back to the community from time to time. Open-sourcing also means the community gets to make the product even better. Delayed_job and active_merchant are two prime examples of this. Alex doesn’t consider open-sourcing a threat to Shopify’s business. 
As he puts it, a home-brew e-commerce developer doesn | 3/25/11 | Free | View In iTunes |
| 16 | VideoRealtime feed updates with Pubsubhubbub and XMPP | Webpulp.tv is back in the saddle again! We took a longer break than expected because of a move. This time, I had the opportunity to chat with Julien Genestoux, the CEO and founder of Superfeedr. Julien goes into great detail to explain the whole concept of the product and how it works. Superfeedr is a hosted infrastructure that transforms static RSS web feeds into realtime web feeds. If your app publishes or subscribes to RSS feeds, Superfeedr can help you turn them into realtime, instant feeds. Realtime delivery is achieved by working closely with publishers and with the help of the PubSubHubbub protocol. Superfeedr is used by publishers like Tumblr, Posterous, Typepad, etc. The main objective of Superfeedr is to make consuming feeds easy. To that end, Superfeedr offers various services such as Atom-based normalization, digest notifications, feed retrieval, etc. Data normalization is one of Superfeedr’s signature features; it solves quite a lot of issues and also boosts the overall efficiency of the system. Julien believes developers can concentrate more on app development when they don’t have to worry about the feed publishing and consumption process. PubSubHubbub is a protocol developed by Google engineers that delivers feed updates in realtime, and Google Reader already has it integrated. Julien says Superfeedr’s infrastructure can be compared to a botnet: it mainly consists of many XMPP workers that can connect to other worker instances. The infrastructure is mostly written in Ruby, with some C. The XMPP servers and a few other components are based on Erlang. Superfeedr uses Redis for data storage. Redis is preferred for its speed and its ability to keep all the data in server memory. Another reason they prefer Redis is that it allows atomic operations on complex data structures. 
Among these complex data structures are the ‘rings’, which are also key components of the infrastructure. The rings act as schedulers that decide when to pull feeds and lists, and they are rotated as fast as possible in order to pull feeds sooner. The Ruby instances use the EventMachine library to handle all the networking. Although Ruby wasn’t necessarily the best fit here, the team chose it because they come from the Ruby community. Julien also explains why XMPP was preferred over RabbitMQ for messaging. According to Julien, RabbitMQ is better suited to queue-based tasks, while XMPP is richer and more flexible, supporting queries, server presence, etc. Feeds and XMPP are both XML-based, so XMPP made more sense for the team. The system does use RabbitMQ for small tasks, but most of the messaging is handled by XMPP. Superfeedr offers two subscription protocols: XMPP and PubSubHubbub. XMPP helps when the subscriber is behind a stubborn firewall or when feeds update frequently. If the subscriber follows a large number of feeds that update relatively rarely, PubSubHubbub is more suitable. Both protocols deliver feeds in an identical format; the only difference between the XMPP and PubSubHubbub setups is the transport layer. Through normalization, Superfeedr can support all kinds of custom feed setups. Custom blogs can notify Superfeedr about new content by pinging it or through other creative methods, such as tweet monitoring. The last resort for custom blogs is to check ETags and ‘If-Modified-Since’ headers, which Superfeedr can do every 15 minutes. ETags and If-Modified-Since headers let a client make conditional requests to the server and so check for new updates. 
These can easily avoid content generation on the | 3/10/11 | Free | View In iTunes |
| 17 | VideoRunning mongodb in the cloud with 33 aws instances, chef, and sheer grit for MongoHQ | Join me for a great chat with Ben Wyrosdick and Jason McCay about MongoHQ and how they built it! The goal of MongoHQ is to lower the barrier to entry for getting going with Mongo. They created MongoHQ about 2 years ago. MongoHQ is hosted on AWS (Amazon Web Services). With MongoHQ, people can essentially have their own IO. MongoHQ lets users grow in incremental steps, scaling cost-effectively. Their advice: use MongoHQ effectively, and you will have success with it. With MongoHQ, people can visualize their data in a way that matches how they use it. MongoHQ is powered by MySQL, MongoDB, and Redis, and they use Resque as a background job processor. Right now, they have 33 instances on Amazon, and they manage instances for other people as well. They use Chef for automation. They run Nagios on a server outside of AWS, Cacti for graphing, and Moto for sending text alerts. With the new version of MongoHQ, you can round-robin your reads across multiple sets. Replica sets also bring the benefit of automatic failover. Recently, they were able to acquire funding and | 12/31/10 | Free | View In iTunes |
| 18 | VideoGet the inside scoop on how StillAlive works to notify you if your website is up AND functioning | I recently had a late-night chat with Mikel Lindsaar from Rubyx. Besides Rubyx and the team, we talked mostly about the StillAlive app, their submission for the 2010 Rails Rumble. Mikel works as a consultant for Rubyx, a web development consultancy based in Australia. With only three members (Dave Cruikshank, Ivan Vanderbyl and Mikel Lindsaar), Rubyx is a small shop. They build high-end Rails web apps to each customer’s requirements. The team also provides certain kinds of website monitoring. Mikel says their American clients love the service because of the time zone difference. He then tells the story behind the ‘StillAlive’ app and how it all started. The team was originally looking for a way to monitor web apps, which led to a different app named TellThemWhen. TellThemWhen is designed to notify your users about events on your website, in their own time zone. The team then realized they also needed to check the status of the website or app itself, and the solution had to be more than just a ping check. Mikel specifically needed an app that would let him run Cucumber tests against the server while it was in production. Along came Rails Rumble. The team (which by now had a new member, Rex Chung) already had an idea; they just gave it a name: ‘StillAlive’. In short, StillAlive takes Cucumber-like Webrat stories and runs them against your web server: users define a set of recipes, each of which either passes or fails. Whenever a test fails (meaning your site or app is down), StillAlive notifies the user via email and SMS. Even though it hasn’t officially launched yet, StillAlive has gathered over 300 users monitoring close to 500 websites since the Rails Rumble. 
The approach behind StillAlive was to make it app-agnostic, meaning it works with all kinds of web applications (PHP, Rails, Perl, .NET, Java, etc.). Mikel says this approach has allowed all kinds of users to come forward and create some pretty impressive recipes for their own apps and sites. Free from the pressure and constraints of Rails Rumble, the team now plans to add more features to StillAlive as plugins. Besides ping checks, they plan to introduce API-checking tools, given how dependent apps have become on APIs. They plan to follow the Google Reader model for these API-check modules, so that results for the same API can be shared among users for efficiency. Mikel openly admits that naming the concept ‘Social-Monitoring’ failed to give it a unique identity. The team is currently working on making the recipe editor much more interactive and flexible. StillAlive is meant to be a market-driven product, and they will always be open to making changes to meet demand. When it comes to pricing, they intend to follow the GitHub model: things like basic blog checking will be completely free. Premium paid accounts will have additional features, such as shorter intervals between checks and higher limits on the number of monitored sites and recipes. Given the app’s popularity and demand, the team is working hard to get everything up and live as early as possible. Ivan Vanderbyl is in charge of the UI, Rex Chung handles the engineering side, and Dave Cruikshank takes care of all the “front-end stuff”. Mikel expects it will be a month or two before StillAlive runs at full throttle. StillAlive was initially hosted on Linode, but they later moved to a host that suited their setup better, and they are now switching hosts again for the same reason. 
During the Rails Rumble, the team didn’t hesitate to use various tools to make their jobs easier: the latest Rails 3.0, Intridea’s OmniAuth, etc. All these with two Rails apps, plus the gems: it was ar | 12/9/10 | Free | View In iTunes |
| 19 | VideoPush realtime updates to your web apps | I talked with Martyn Loughran from the Pusher App team. Besides Pusher App and its services, he talked about the new WebSocket protocol. Pusher App provides a cloud-based, real-time push-to-browser service for existing websites. Instead of the conventional Comet model, Pusher App uses the brand-new WebSocket protocol for real-time pushes. WebSocket lets a browser establish a socket-like, bidirectional connection to a server; it is a new addition in HTML5. For browsers that don’t support WebSocket, they also have a Flash fallback. Pusher App is basically a hosted WebSocket solution for your website, running on Amazon Web Services (AWS). Although Pusher App is still in beta, it can already handle thousands of concurrent users. Martyn then breaks down their stack, which consists mostly of open-source software. Pusher App uses a Ruby EventMachine-based WebSocket server named ‘em-websocket’. On the API side, they use Sinatra, which keeps the whole thing light. Redis is used for lookups, authentication, etc. For messaging middleware they depend on the AMQP protocol, and Martyn specifically mentions RabbitMQ. Pusher App uses Amazon’s Elastic Load Balancer to load balance the API components, and HAProxy to load balance the socket processes. Martyn goes on to explain their RabbitMQ topology in detail. He then talks about some of Pusher App’s interesting clients. Pusher App is often used for tasks like collaborative text editing, and Martyn has seen people pushing tweets, score updates, feeds and all sorts of things. Pusher App is used in Scrabb.ly, a popular MMO-style crossword game. Pusher App intends to stick to WebSocket-type connections for pushing; Martyn says their current objective is to improve the existing product rather than explore other push methods. 
They currently have 4 instances running on AWS, 2 of which run the WebSocket services. Pusher App provides its users with dedicated analytics tools: the same stats that help users monitor their site and app performance are what Pusher App uses to bill them. Compared with Pusher App Stats, Google Analytics is often quite delayed. Pusher App runs a MySQL database on the Amazon RDS service. Martyn encourages anyone interested to take a look at their site, especially the pricing page. As part of the beta program, it is currently free to sign up and get started with the ‘Sandbox’ plan. Pusher App offers a wide range of libraries covering pretty much every language. | 11/20/10 | Free | View In iTunes |
| 20 | VideoDeep dive into MySQL with the experts at Percona | I interviewed Baron Schwartz from Percona. We talked mostly about their field of expertise, MySQL. Percona provides MySQL and LAMP-related services and consulting to companies. It was founded by Peter Zaitsev and Vadim Tkachenko. Baron joined Percona in April 2008 as a consultant and subsequently served as ‘Director of Consulting’ and ‘VP of Consulting’. He is now Percona’s ‘Chief Performance Architect’. The current VP of Consulting is Espen Braekken. Percona has its own version of MySQL, and counts numerous ‘big-name’ clients across the USA, Canada and much of Europe. Percona Server started out as a performance-enhancing patch written by their Polish consultant Maciek Dobrzanski. The patch was later developed into a much more complete product, XtraDB. There is no compiling or plugin hassle with XtraDB, as Percona Server comes with XtraDB compiled in. Choosing XtraDB over InnoDB and Percona Server over MySQL can give you operational enhancements as well as better performance and scalability. It also provides additional instrumentation, making the server easier to monitor. Out of the box, Percona Server with XtraDB is faster than MySQL. When web companies need to upgrade their database systems, they go to Percona. Percona is also called in when a company doesn’t have an in-house DBA. As database hardware got more powerful and systems grew bigger, people started to notice new kinds of problems. Baron has written a toolkit called Aspersa that gathers diagnostic data to help solve unusual problems like server lock-ups. Baron says that when solving a problem, simple operating system tools can provide precious diagnostic information. But a common mistake many DBAs make is to draw conclusions only from whatever data happens to be available. 
That’s why the Percona team has ended up building many efficient diagnostic tools of their own (like Aspersa and Maatkit) to serve their particular needs. Aspersa is basically a collection of scripts; Baron names two of them, summary and mysql-summary. These tools summarize the system status: what’s installed, how it’s configured, etc. Aspersa even understands most of the popular RAID controllers. Baron then talks about server memory, memory-tuning tools and the cache hit ratio. Server performance is mostly about time: it’s not about how many operations hit a cache, it’s about how expensive it is to miss that cache. Most tuning tools are based on simple rules of thumb, telling you to add more memory or increase the cache size, when other, more effective solutions could improve performance. Writing better indexes and improving query execution plans can also improve performance dramatically. His book High Performance MySQL, Second Edition goes deep into this topic. The book also covers replication, high availability, scaling, backup and recovery, operations, helpful tools, etc. For backup and restore, Percona Server comes with a tool called XtraBackup. Although designed for XtraDB, it works fine with InnoDB as well. XtraBackup is basically an open-source reimplementation of the InnoDB Hot Backup tool. Because it is not proprietary, it links directly against the standard MySQL client libraries. Baron then explains why you should choose XtraBackup over mysqldump. With mysqldump, once the dump grows past a couple hundred gigabytes, restoring can take days if not months. XtraBackup, on the other hand, handles such large data sets fine. For people who want to learn more about high-performance MySQL, Baron recommends visiting Mysqlperformanceblog.com. | 11/1/10 | Free | View In iTunes |
| 21 | VideoLearn about the posterous infrastructure and how they handled their DDOS attack! | I had an interesting chat with Vince Chu from Posterous. The discussion was mainly about the Posterous service, its infrastructure, database configuration, etc. We also talked about a rather sensitive subject: DDoS attacks. Posterous is, as Vince puts it, the easiest way to post anything online. It was launched in July 2008. Posterous’s core feature is its smart ‘post-by-email’ option, easy to use yet very powerful. Posterous makes managing different social accounts easier through its auto-post integration with Facebook, Twitter, Flickr and similar websites. Vince then explains how a typical web request to a Posterous blog is handled. Nginx sits at the front end, accepts all incoming requests and proxies them back to the Varnish cache. Varnish is a recent addition to their system, and they’ve already seen a 60% overall improvement in page-load performance. If the requested page is not in the cache, the request is passed on to HAProxy, which load balances requests across their back-end servers. These are Unicorn servers handling Rails requests. For the database, Posterous relies on MySQL, and they use a lot of Memcached to shield the databases from load. For various background processing tasks, they use Delayed_Job and Resque: Delayed_Job is backed by MySQL and Resque is backed by Redis. Posterous originally started on cloud servers provided by Slicehost and has recently switched over to Rackspace dedicated hosting. Vince says the switchover was a big project that took months of rigorous planning, but in the end it was totally worth it, as the new setup suits their needs perfectly. Posterous has recently experienced a couple of DDoS (distributed denial-of-service) attack attempts. 
Vince talks about the whole experience and how to deal with such situations. Although DDoS attacks can do substantial damage to websites, they don’t require many resources. Almost anybody can rent a botnet capable of generating huge amounts of traffic, easily exceeding your bandwidth limits and taking your site down overnight. Vince thinks that because such malicious tools are so easy to get and attacks are so likely, webmasters can no longer afford to overlook the threat of DDoS attacks. As your site grows, you need to be able to handle such threats and take precautions against them. One of the only ways to deal with DDoS threats is to work with third parties that specialize in DDoS mitigation. DDoS mitigation is basically the process of filtering out malicious traffic so that only genuine requests reach your front end. Nowadays, many hosting providers offer DDoS mitigation services; Posterous currently uses Preventier™, a DDoS mitigation service provided by Rackspace. One thing to always remember is that DDoS mitigation and filtering will often severely degrade the service you provide. Sometimes even legitimate activity from genuine users can be flagged as malicious by the filter. That is why you want to turn off your DDoS protection as soon as possible. Posterous started using Varnish after their DDoS experience. Varnish being as beefy as it is, they currently need only 2 Varnish servers. They have about 20 servers at Rackspace. Vince then moves on to the database side, walking through Posterous’s database setup. Posterous’s MySQL setup is one master and a couple of slaves in the back end, with the slaves taking load off the master. Vince talks about how much MySQL is still capable of. 
Although there are other alternatives and NoSQL solutions out there, he doesn’t see the importance of MySQL disappearing anytime soon. For application monitoring purposes, Posterous use | 10/23/10 | Free | View In iTunes |
| 22 | VideoPeer into PCI Compliance and learn how Chargify handles it for you | I interviewed Lance Walley about his company Chargify. We also touched on several related topics like merchant accounts, online payment, credit card billing, and PCI compliance. Chargify is a system that makes it easy to bill your customers for recurring charges. Lance is the CEO and founder of Chargify. People have recently started to prefer Chargify over PayPal and other merchant accounts for reliability. For payment gateways, Chargify supports Authorize.net, TrustCommerce, Beanstream and PaymentExpress. These gateways then connect you to a number of banks. Lance also mentions a blog post he wrote six months ago that explains these technical terms in an approachable way. As a merchant, you never want to store or handle credit card data yourself, because it makes you an easy target for hackers. Security concerns like these have led to the birth of applications like Chargify and Authorize.net, which take care of the risky business for you. PCI compliance is an IT security standard defined by the Payment Card Industry Security Standards Council. It certifies that services like Chargify and Authorize.net are safe and secure. A system has to fulfill 12 requirements before it can achieve PCI compliance. Lance says Chargify will soon be examined by an auditor, after which it will be officially PCI compliant. When running a PCI service like Chargify, there is nothing wrong with cloud servers (AWS, Engine Yard, Rackspace, etc.), and Lance thinks there is a lot of misunderstanding and misinformation about this topic that needs to be cleared up. Still, relying on cloud servers does conflict with some of the PCI requirements, which is why Chargify has chosen physical servers. Chargify initially started out on Engine Yard cloud servers and has recently moved to a data center company outside of Kansas that specializes in PCI compliance requirements. Lance Walley is actually one of the co-founders of Engine Yard: he and his friend Tom Mornini founded Engine Yard back in 2006, and he was also its CEO. Once they brought in a professional management team, Lance left his post as CEO and started focusing on other problems he had noticed while working at Engine Yard. He noticed that as Engine Yard's customers grew, their bookkeepers got buried in frequent billing-management tasks, so he went looking for a better, automated recurring billing system. He preferred a company with a Basecamp-like (freemium) business model, and that is how he ended up at Chargify. As far as the code goes, Chargify is 100% Ruby on Rails. In their data center, they use VMware for virtualization and CentOS as the operating system. Chargify also uses Nginx and Phusion Passenger. They have 2 servers running the app and 2 more running the database, a total of 4 servers plus a shared DB in the data center. Lance says their data center is SAS 70 compliant. Lance says PCI compliance has its downsides: because of the regulations, they can no longer develop something and push it to production as easily as they used to. But being PCI compliant means Chargify is less vulnerable to security threats and malicious incidents, and it gives people the confidence to trust Chargify with their credit card data. Compliance is very important from a marketing point of view as well. Lance says he can see a pattern emerging where credit card data ends up in fewer and fewer hands. Instead of a million places, it is better to have credit card information in only a handful of highly secured places. Even if you use Chargify or a similar service to store your credit card data, if your website has a page that displays or accepts credit card numbers, make sure the page is SSL encrypted. An alternative solution would be to use the secured p | 10/12/10 | Free | View In iTunes |
| 23 | VideoBudget scaling a web app for DNS | I talked with Anthony Eden about DNSimple and how it works. We also delve a bit into why DNS is so important. DNSimple provides a first-of-its-kind service: DNS hosting through a simple, easy-to-use web interface and API. DNSimple also provides second-level domain registration and domain transfer services. The main motive behind starting DNSimple was to make the whole procedure (domain registration, renewal, transfer, etc.) as simple as possible. DNSimple is a partnership between Anthony and his brother Darrin Eden. Darrin handles web operations, and Anthony handles application development. They added an API to their system specifically for automation, and it also simplifies the registration process. Anthony thinks that with all the automation happening in systems management right now, DNS hosting needed to be redefined as well. That is why they came up with DNSimple. DNSimple's specialty is its response time: whenever a record is added or modified, DNSimple updates its DNS servers almost instantaneously. Unlike other DNS management companies, DNSimple lets users set the TTL as low as 1 minute, which is very helpful when working with EC2 instances. For the DNS server itself, they use open source software called PowerDNS. Anthony says it is very flexible, reliable and suitable for their needs. MySQL is used to store all the DNS records. Anthony describes DNSimple as a "bootstrap startup" project. They currently rely on several server hosts for redundancy and to keep maintenance costs low. DNSimple started out with servers on Linode, and now they're using the cloud services provided by Rackspace. Anthony describes this strategy as spreading things out and adding redundancy as the service grows. They host the DNS of their own website on DNSimple. Anthony then breaks down a typical web request to their website. When someone types in DNSimple's web address, the lookup goes through their DNS server to get the A record for the site's web front end. After hitting the web front end, the request is processed by Rails, which queries a Postgres database when needed. After a series of queries, the site generates a response and sends it back to the user. DNSimple currently uses Unicorn to host Rails. They try to keep their web front end as snappy as possible, so work is often pushed to the back end. For this background processing, they use tools like Redis and Resque. As the backbone of the internet, DNS is the most crucial element for any website project: if the DNS server is down, it can take down the whole site along with all its services. Unlike normal caching of web requests, DNS caching is a much more complex system that requires careful tuning of values such as the TTL to suit one's needs. With DNS caching, when you modify a record, you not only have to update it locally but also have to wait for other servers throughout the internet to update as well. That's why configuring the TTL value is crucial. PowerDNS has its own caching system, designed to handle enormous loads, so it doesn't need to hit MySQL in the back end for every query. DNSimple is currently working on putting all of the domain registration and transfer processes into an API, which Anthony says is the next big thing for them. They also plan to run some marketing and sales campaigns in the near future to spread their name and win the trust of the developer communities. DNSimple's goal is to be responsive to its customers' needs, providing the best possible service and a 'money-well-spent' feeling. Anthony says every internet entrepreneur's target should be to start small and become profitable early on. This is the only | 9/30/10 | Free | View In iTunes |
| 24 | VideoWatch John Allspaw explain how they do devops right at Etsy | I talked with John Allspaw about Etsy, Flickr and the various aspects of WebOps. John also talks about his new book, 'Web Operations', and much more. The book covers topics like databases, storage, handling outages, communication and postmortems, pretty much every topic in the field of web operations. What makes the book special is that it was written by 17 different writers who are leaders in their respective fields and positions. John runs operations at Etsy. He previously held a similar position at Flickr. As VP of Technical Operations at Etsy, his responsibilities include managing the infrastructure, provisioning, metrics collection, monitoring, capacity planning, etc. John points out the differences between working at Yahoo (Flickr) and working at Etsy: the priorities and challenges are very different. Storage is the main priority for Flickr. They have about 10 data centers all over the world serving over 10 petabytes of photos. Not only that, Flickr has to make sure that all the photos last forever. Although Etsy does handle photo storage to some extent, storage is certainly not among their main concerns, and unlike Flickr's, their storage system is focused on recency. Flickr's photo caches are usually constantly full, with eviction done at the storage level, whereas Etsy tries to fit the whole workload, or at least its most active part, in cache. According to John, Etsy's network architecture is undergoing something of an evolution as they experiment with different products and tools to see what works best for them. The Etsy website is served with PHP running under Apache. Memcached is used for database caching, and Squid for image caching. Etsy's image servers originally ran on Python Twisted, but now they're using plain ol' Apache. Squid is also used for reverse-proxy caching. Etsy started off on Postgres but has recently started migrating to MySQL. They also use MongoDB for specific features of the site, with Scala sitting in front of MongoDB as a REST interface. Search queries are currently handled by Solr, but they plan to introduce a Thrift interface in front of Solr to make the whole search process even more flexible. Etsy's code deployment system, 'Deployinator', is written in Ruby, and there is a detailed blog post about it on the Etsy engineering blog. One reason they are moving from Postgres to MySQL is that most of their employees have more experience with MySQL. Replication and building a Flickr-style database architecture are the two other reasons for the migration. John sums up the migration as "making the database dumber, but making it more efficient and suitable." Etsy's deployment cycle is a continuous process: they deploy small, frequent code changes a couple of times a day, or as needed. Sites like Flickr and Kaching use the same method, and John mentions a related article posted by Kaching. John recently gave a talk about continuous deployment at the Velocity conference with Paul Hammond. For source control, Etsy is using SVN at the moment. For configuration management, they use Chef, which itself uses a Git repository. John talks about 'dark deployment', a very strategic approach to code deployment in which code or a feature is first deployed or turned on for only a specific group of people. John describes this technique as 'liberating'. Etsy uses Nagios for general website monitoring. For gathering metrics from SNMP-driven devices (e.g. routers, switches), Etsy uses Cacti. Ganglia is used for gathering metrics on CPU usage, disk usage, network status, and Apache, Postgres, MySQL and Memcached status. For Etsy specific custom metrics, they us | 9/21/10 | Free | View In iTunes |
| 25 | VideoMonitoring web apps with sharded mysql | I talked with Lew Cirne about New Relic, why monitoring is important, switching their data collector to Java, and tweaking MySQL heavily. * DISCLAIMER * New Relic is a sponsor of the show, but I believe their product plays a very important role in the devops space. New Relic is a web application performance management company focused on providing runtime visibility into your web apps. Unlike Scout or FiveRuns, New Relic is very easy to configure: all you need to do is install the plug-in, and it should be ready for your app. New Relic is very specific about the platforms it supports, mainly Rails apps. It has also sacrificed customizability to keep the tool very simple and easy to use. The company has a 'contrib' version on GitHub where people can add instrumentation for MongoDB or AWS APIs that New Relic couldn't cover in their library. As new code and platforms emerge every day, business demands are changing too. New Relic lets customers take on the risks of this rapid change responsibly through instrumentation. New Relic is basically a Rails app on the front end and a Java app on the back end. This multi-threaded architecture provides extra efficiency and better performance. Multi-threading also allowed them to triple their traffic in the last year without adding any new hardware. The company uses highly modified versions of Nginx and HAProxy for load distribution in front of its Java app system. For its data collection system, New Relic uses JDEE, with Nginx in front of it. According to Lew Cirne, New Relic's secret for persistent data processing is nothing more than optimized database sharding and good old-fashioned MySQL. When New Relic first started, they had about 4-5 shards. The company now has 11 shards in production. The New Relic team has its own custom sharding library. They thought about open sourcing the code but decided not to, as it had become specific to their own requirements. Lew Cirne thinks New Relic cannot afford to experiment with Cassandra or other large-scale NoSQL database systems, since their customers are running serious businesses. So when it comes to data storage, they plan to use only proven technologies. Whenever a new account is created at New Relic, a special algorithm finds the least busy shard and assigns the new account to it, balancing the load. New Relic uses a dedicated clustered architecture for their servers. Lew Cirne thinks dedicated servers suit New Relic's app better, as EC2 just isn't efficient for the way New Relic works. According to him, whether you need EC2 or a dedicated server depends solely on what your app does and what its requirements are. The primary tool for keeping the New Relic website running well is the New Relic app itself. When it isn't busy helping solve a performance issue on the site, it helps them tweak the product and make it even better. Lew Cirne says RPM is very important for your apps and sites because it provides vital information and measurements about your product, which ultimately help you make the right tweaks to improve it. For most of the MySQL-intensive shard and config issues, New Relic relies on the expertise of Percona. New Relic has recently expanded into Java instrumentation. While preparing for it, they created an interface that lets them start supporting other platforms as well, so they are now thinking about supporting Sinatra, .NET and many other platforms. The ultimate vision of New Relic is to provide an easy performance management solution for your web app regardless of its platform. The pricing of New Relic's different packages has always been an interesting issue, and Lew Cirne dives into it. According to him, all the features that New Relic provides make the price tag | 9/9/10 | Free | View In iTunes |
| 26 | VideoUsing Varnish to scale NYtimes.com | I had a chat with Jacob Harris about the New York Times, working on the interactive newsroom team, and how they use Varnish to scale their apps without breaking a sweat. The interactive newsroom team makes interactive apps that go up on the NYTimes home page: elections, the Oscars, the Olympics, etc. The team has built apps on topics such as elections, water polluters, football players and travel. The work brings a lot of charms and challenges, but it's fun. They have been running Rails apps on EC2 for 3 years, with a couple of proxies in front and a Varnish cache handling the traffic. They use MySQL for databases, and according to Jacob it's awesome. MongoDB is very useful for photo-upload forms and for data that could be anything, such as pictures and colors. It has also been used for Twitter hacks, but mostly at the New York Times. On EC2, they have a few back-end application servers, a load balancer and 1 Varnish cache, plus miscellaneous machines. For MySQL, they mostly use RDS. They are looking into using Red-Hot Proxy Season and Proxy Spot. For events like election night and the Oscars, they add 3 or 4 EC2 instances as additional web servers to handle the traffic, and the EC2 proxy setup varies from day to day. The New York Times has about 40 live applications. They use Varnish as a cache, and it takes a good deal of the traffic. Varnish is easy to set up, and its main feature is its configuration language, VCL, a DSL. When a request comes in, VCL decides what action to take, and if the content has to be retrieved from the back end, it is easy to store it in the cache. Varnish is technically a single point of failure, but there are ways of correcting for that. As a central cache in front of the applications, it is very quick: cached requests do not need to go any further. Varnish is very powerful for decomposing a page into smaller fragments that are stitched back together in the cache, each with a separate timeout (TTL), so even fully dynamic parts of the site can be served from cache. For apps like the Olympics, the New York Times has a graphics group: people who are good at JavaScript and do a lot of work with Flash apps and maps. A single Varnish instance handles all of that traffic, which runs at around 30 to 60 hits per second. Varnish is open source and can be used to tackle the C10K problem. Cache-Control can mark content as private. Browsers can cache things too, and sometimes JavaScript or a Flash app adds a timestamp to the request. In general, Varnish follows the same logic as a browser cache, driven by the HTTP headers. It caches by the full URL, and the cache key can also include other particular information. Jacob said that Varnish is his favorite caching tool right now. | 9/1/10 | Free | View In iTunes |
| 27 | VideoDeep dive into Riak and how it works | I chatted with Kevin Smith about Basho, how Riak works, and what the CAP theorem is. Riak is a distributed key/value store that is meant to be clustered. Riak simplifies data applications, with performance that improves as you add more nodes to the cluster. Kevin said that Riak is meant to be robust, pretty fast and operations friendly. Riak comes in two editions, open source and enterprise, and it can scale to any number of nodes to serve the biggest applications. The history of the CAP theorem starts in 1999 with Dr. Eric Brewer, who is on Basho's board of directors. The CAP theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: C (consistency), A (availability) and P (partition tolerance). Riak is designed so that the cluster keeps working even when nodes die, with the data distributed across the nodes. In CAP terms, there are major families of databases, such as the Dynamo family, which came out of Amazon's Dynamo paper, and the BigTable family. Riak definitely belongs to the Dynamo family, which emphasizes A (availability) over C (consistency) and accepts some inconsistency. Buckets and keys are the only way to organize data inside Riak. User data is stored and referenced by bucket/key pairs. In general, if your data is naturally stored and looked up by key, that is a really good indication that Riak fits. Riak can also be used for analytics: for example, you might run an OLTP cluster on fast machines and replicate to a separate cluster that does the analytics. MapReduce was first popularized by Google. In Riak, MapReduce operates on the cluster: the inputs are a list of bucket/key pairs, and Riak takes the function you want to run, whether to extract data or to perform a calculation, and sends that code to the nodes that hold the data. MapReduce jobs are written in JSON, listing the keys or the name of a bucket, and the functions can be written in Erlang or in JavaScript, which runs on the SpiderMonkey virtual machine. Josh asked about some funny tweets about Riak, and Kevin explained that Basho's founders came from Akamai. In the late '90s, Akamai worked on distributed systems, namely CDNs (content delivery networks), at a really large scale that had to be reliable, consistent and easy to operate, and a lot of those design lessons are included in Riak. Riak's features include programming interfaces, management tools, Protocol Buffers, SNMP support, inter-data center replication, multi-node clustering, a search interface and MapReduce. All Riak nodes are equal: there is no master node, and every node in the cluster does the same work. | 8/27/10 | Free | View In iTunes |
| 28 | VideoAutomating the cloud | Join me as I chat with Joshua Timberman about the Opscode Platform and Chef. Chef was written by Adam Jacob and expanded by the Opscode team and the open source community. The Opscode Platform is a version of Chef available as a service, or "CMaaS" (Configuration Management as a Service). The Opscode Platform does all kinds of crazy magic with different application software. Opscode develops fully automated server infrastructure: it provides a data storage system for node, role and other data, and a file server for Chef cookbooks. You write your recipes, put them in cookbooks and upload them to the server. Then you connect your clients as nodes, which retrieve their configuration, including a list of recipes to run. For example, a web server role comes with the recipes that perform that role, and when a node runs Chef, it can store and retrieve data about itself and about the rest of the infrastructure. The Chef server also indexes the stored data so that it is available for search. Among the Chef server's advantages are a lightweight architecture, the ability to scale with volume, and security for the infrastructure. Recipes use the flexible functionality of a third-generation language (Ruby), and configuration is specified in an ordered manner. Opscode has published about 100 cookbooks supporting common technologies such as Apache, MySQL, Nagios, a bunch of different NoSQL data stores (Redis, Riak and Cassandra), Tomcat, JBoss and nginx, plus application cookbooks that cover full infrastructure automation. Each cookbook provides a complete set of recipes for managing different aspects of a technology. Opscode's support staff manage the Windows cookbooks. The Chef server uses a bunch of components, including an API that provides all the endpoints and access, CouchDB, which serves as data storage, and the Chef indexing process, a basic queueing system that currently feeds a Solr service. The Chef server API stores cookbooks, which you upload with the knife command-line tool, and S3 is used for HTTP data storage. They use Merb for the API and the web UI, which is optional. Application cookbooks are available, including Rails recipes and templates. To use the Opscode Platform, you sign up and configure your account. The Opscode Platform works with RightScale, Rackspace, Linode and Terremark, and you can also use your own servers. Blog posts and webcasts with more detail are available for those who use cookbooks. To get started with Chef, go to the Opscode site, whose Help tab has all the details. Then you can upload and download cookbooks. Cookbooks are released as library code, and this tooling makes recipes easy to find. | 8/10/10 | Free | View In iTunes |
| 29 | VideoLearn why the cloud is always cheaper | I had a chat with Joe Stump about SimpleGeo, Cassandra, and Digg. Joe also talks about why the cloud is cheaper, even when you get to a bigger scale. SimpleGeo is a cloud service for location-based stuff, similar to Twilio or SendGrid. The idea for SimpleGeo came out of a location game. Real-time background location services push 4mb/sec of data for 500k users. Joe thinks Amazon Web Services is cheaper overall because of reserved instances and the hidden personnel costs of speccing hardware and negotiating contracts for bare-metal servers. SimpleGeo uses Cassandra for its database back end, along with Hadoop and HBase, Scribe, Python and Tornado, RabbitMQ, and Django. SimpleGeo indexes 1,500 to 3,000 points a minute. StickyBits uses SimpleGeo, and Joe uses it for a travel log with pictures of where he has been. SimpleGeo offers a marketplace, which allows you to share your GeoData if you want. Cassandra node clusters are scarily simple to set up. Cassandra offers a "rack-aware" setup to ensure copies of data go into different racks, and SimpleGeo bastardized that to make it AWS Availability Zone aware. Twitter is using Cassandra now. Digg was a traditional LAMP stack with a big Solr/Lucene setup and made heavy use of Gearman. Digg has switched to using Cassandra and Thrift. Joe is digging SQS and S3, and he really loves them. | 7/23/10 | Free | View In iTunes |
| 30 | VideoLearn the secrets to build scalable apps | I had a chat with Theo Schlossnagle about OmniTI, the Surge Scalability Conference, and Circonus. Theo has also written and contributed to books such as Scalable Internet Architectures and Web Operations. Theo loves Beautiful Code, and he feels Web Operations has a similar feel. Surge is a conference for sharing experiences in building web apps. The goal of Circonus is to bring monitoring, alerting, and trending functions together into one app. Circonus can connect to and check the health of MySQL and Postgres database servers. OmniTI and Clearleft partnered up to launch Fontdeck. Theo wanted to use a commercially available CDN for Fontdeck, but they had to build their own. The Fontdeck CDN uses Dynect to do geolocation lookups. Theo gave a talk at Velocity. Josh has been working on breaking down the Tweethopper daemon into smaller, simpler parts. Circonus uses Citrix NetScaler, mod_perl, Postgres, Snowth, Perl, C, Java, Lua, and RabbitMQ. Interested in going to Surge? Just tweet "Follow @webpulptv and @surgecon for a chance to win a ticket to SurgeCon 2010! Details: http://bit.ly/wp-surge". Stay tuned for a live-streaming Webpulp.tv episode from Surge (maybe). Contact Jason Dixon with more questions about Circonus, OmniTI or the Surge Conference. | 7/15/10 | Free | View In iTunes |
| 31 | VideoReal-time data tracking with Hummingbird | I had a chat with Michael Nutt about Hummingbird, Market.io, and Gilt Groupe. Michael wrote Hummingbird, which uses node.js and MongoDB, so Gilt Groupe could monitor real-time traffic as it happened during their flash sales every day at noon. Gilt Groupe hosts with Joyent. Their infrastructure started with Rails, moved to a custom DB-less JRuby framework, and now uses Zeus to handle traffic. Gilt uses MongoDB to track Gift requests, then uses map/reduce to determine the best-performing recommendations. Gilt uses Voldemort to handle the checkout process. Hummingbird is a real-time traffic analytics tool built to show off shopping and sales to in-office visitors at Gilt Groupe. Hummingbird's node pixel tracker should handle 4k requests/sec, while Express, a Sinatra-like DSL for node.js, cut throughput to 1k requests/sec. Gilt still uses Postgres for most things. The Hummingbird live viewer uses Canvas and WebSockets. Michael would probably add XMPP to the node/WebSockets stack if the Hummingbird stats were viewed by a lot of people. Michael suggests giving Mongoose a try for easier MongoDB/Node.js integration. | 7/1/10 | Free | View In iTunes |
| 32 | VideoLearn the secret to GitHub's success: Open Source! | I had a chat with Tom Preston-Werner about GitHub and their server/software architecture. Tom is one of the cofounders of GitHub, and he has written a ton of open source software like Jekyll, Ernie, Chronic, and God. A GitHub request hits the load balancer and goes to the front end, which runs Nginx. Nginx then shuttles the request to Unicorn, which runs the Rails app. If the Rails app needs a Git repo, it makes a request to the sharded file system servers using Grit. Tom explains Grit communications with "hand gestures" ;) GitHub uses BERT and Ernie to let the Grit front end and back end communicate. Tom decided to use Erlang because of his previous work on Fuzed with Dave Fayram. GitHub uses Linux HA load balancers, which they've set up to rewrite the TCP headers so that responses bypass the load balancers on the trip back to the client. GitHub uses Anchor Systems to handle lower-level server tasks. Larry Wright asked "What does GitHub use Redis and MongoDB for?" Tom wrote ProxyMachine to work with Redis to do sharding on the file system. Redis is also used for Resque (job worker) and stats tracking. MongoDB is used for exception tracking in a closed source project named Haystack. Wikis will become Git backed soon! GitHub has made open source commits every day this year except two. A Twitter follower asked what they use for a shared FS or backups: they build their own stuff! Backups are stored offsite in Australia, just in case the feds raid GitHub. GitHub moved to Rackspace to use custom hardware and to move away from GFS. Jack Dempsey wanted to know what a normal Rails site could learn from GitHub. Tom recommends using Unicorn, and he also recommends using Resque. If you are writing a server, either don't use Ruby or use EventMachine. | 6/17/10 | Free | View In iTunes |
| 33 | VideoDeep dive into Mongo and MongoMapper | I had a chat with John Nunemaker about Ordered List, the Harmony app, and MongoDB. John is the guy behind MongoMapper, the popular MongoDB Ruby ORM. Harmony was born out of their work together at Notre Dame on a project called Conductor. MongoDB was built for speed and simplicity. MongoDB offers an alpha version of sharding, which they are still working on. Mongo uses BSON. Embedded documents can offer speed advantages by requiring fewer queries. John chose Mongo because they wanted to store all their documents together. Mongo replication is done by BSON log replay. John wrote Joint for GridFS, and Mike Dirolf wrote nginx-gridfs. John is starting to play with Varnish for Harmony. ESI is how you do dynamic content with something like Varnish. Harmony runs on one server, since they are trying to grow slowly :) Ruby can't hurt Mongo! | 5/24/10 | Free | View In iTunes |
| 34 | VideoGlimpse the inner workings of 37signals and how they manage their web apps | I sat down with Mark Imbriaco to talk about life as a server admin at 37signals. Mark was the first official server admin 37signals hired and has overseen much of their current architecture. 37signals has four main apps: Basecamp, Backpack, Campfire, and Highrise. They also have a ton of small apps, including Job Board, Signal vs. Noise, Changelog, and Queen Bee. 37signals has 40 servers and a total of 125 OSes running. 3 server admins work at 37signals. They use Capistrano and Chef to automate a ton of stuff, though Mark tweeted that it is hard to automate something with Chef when he can do it quickly on his own. Bare-metal servers showed a 15% increase over KVM under production traffic. 37signals uses a Cisco load balancer, Nginx, HAProxy, Unicorn, and MySQL. Campfire uses Redis now. Jamis Buck and Mark Imbriaco switched the Campfire poller to Erlang, which allowed them to go from 300 FastCGI processes to 3 Erlang processes. Josh uses EventMachine and RabbitMQ to power Tweethopper. Randy wanted to know if 37signals was hiring, and they aren't right now. 37signals uses Nagios, Website Pulse, Circonus, and Ganglia to monitor their apps, and they now use a weekly on-call rotation. Downtime is managed by fixing the app first and analyzing later, which is sometimes hard for a developer to do. 37signals has around 60TB of data in Amazon S3. Mark says the cloud gets expensive when you need to scale up. They purchased a 150TB SAN to start replacing their S3 setup. Deploys push to both the web servers and the Rails servers, so Nginx has the static content. They use MogileFS, having switched from NFS. Josh Owens has used GlusterFS. Dreamhost has cheap bandwidth! They are using Schooner MySQL appliances and seeing a significant performance increase, around 300%. 37signals uses the Percona flavor of MySQL (XtraDB). Mark needs to do more Nuts and Bolts videos on SvN! | 5/5/10 | Free | View In iTunes |
| Total: 34 Episodes |
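Episode 21 mentions that Posterous uses a lot of Memcached to protect its MySQL databases. Below is a minimal Python sketch of the cache-aside pattern that this kind of setup usually implies. It assumes an in-process dict in place of Memcached and a plain callable in place of the database query. `CacheAside`, `get`, and `invalidate` are hypothetical names, not Posterous code.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside reads: serve from the cache, fall back to the database on a miss."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float = 60.0) -> None:
        self._loader = loader  # hypothetical stand-in for a MySQL query
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the database is not touched
        value = self._loader(key)  # miss: one trip to the database
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after a write so the next read reloads fresh data.
        self._store.pop(key, None)
```

In a real deployment the dict would be a shared Memcached cluster, so a value cached by one app server is a hit for all of them.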
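Episode 23 explains that when a DNS record changes, you have to wait for other servers to pick up the change, which is why the TTL matters. Below is a minimal Python sketch of a caching resolver that honors TTLs. The authoritative source (standing in for something like PowerDNS) and the clock are injected callables, so expiry can be simulated. `Record` and `CachingResolver` are hypothetical names, and the IP addresses are documentation examples.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Record:
    value: str  # e.g. the IP address in an A record
    ttl: int  # seconds a resolver may cache this answer


class CachingResolver:
    """Caches answers from an authoritative source until their TTL runs out."""

    def __init__(self, authoritative: Callable[[str], Record], clock: Callable[[], float]) -> None:
        self._authoritative = authoritative  # hypothetical stand-in for an authoritative server
        self._clock = clock  # injected so expiry can be simulated
        self._cache: Dict[str, Tuple[float, str]] = {}  # name -> (expires_at, value)

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and cached[0] > now:
            return cached[1]  # may be stale if the record changed upstream
        record = self._authoritative(name)
        self._cache[name] = (now + record.ttl, record.value)
        return record.value
```

With a 60-second TTL, a changed record reaches this resolver no more than about a minute later. With a TTL of 3600, the stale answer could last up to an hour, which is why a low TTL helps when servers such as EC2 instances change addresses.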
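Episode 24 describes Etsy's 'dark deployment': code is deployed but turned on only for a specific group of people. Below is a minimal Python sketch of one common way to build such a switch. It assumes a named group allowlist (for example, staff) plus a stable percentage rollout based on a hash of the user id. `FeatureFlag` and its parameters are hypothetical, and this is not Etsy's implementation.

```python
import hashlib
from typing import Iterable


class FeatureFlag:
    """Dark-launch switch: on for named groups, plus a stable percentage of users."""

    def __init__(self, name: str, percent: int = 0, groups: Iterable[str] = ()) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.name = name
        self.percent = percent
        self.groups = set(groups)  # e.g. {"staff"} for an internal-only launch

    def _bucket(self, user_id: str) -> int:
        # Hashing flag name + user id keeps each user in the same bucket on every
        # request and every server, without storing any per-user state.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def enabled_for(self, user_id: str, user_groups: Iterable[str] = ()) -> bool:
        if self.groups.intersection(user_groups):
            return True
        return self._bucket(user_id) < self.percent
```

A typical rollout turns a feature on with `percent=0, groups=["staff"]`, then raises `percent` step by step. Because buckets are stable, users who already see the feature keep seeing it as the percentage grows.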
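Episode 25 says New Relic assigns each new account to the least busy shard, but it does not describe the actual algorithm. Below is a minimal Python sketch of the simplest version of that idea. It assumes "busy" can be measured as a per-shard load counter, with an optional weight per account. `ShardDirectory` and the shard names are hypothetical.

```python
from typing import Dict, Iterable


class ShardDirectory:
    """Assigns each new account to the currently least-loaded shard."""

    def __init__(self, shard_names: Iterable[str]) -> None:
        self.load: Dict[str, int] = {name: 0 for name in shard_names}
        self.account_to_shard: Dict[str, str] = {}

    def assign(self, account_id: str, weight: int = 1) -> str:
        # Existing accounts keep their shard, so no data has to move.
        if account_id in self.account_to_shard:
            return self.account_to_shard[account_id]
        # Least busy shard wins; ties are broken by name so results are deterministic.
        shard = min(self.load, key=lambda name: (self.load[name], name))
        self.load[shard] += weight
        self.account_to_shard[account_id] = shard
        return shard

    def shard_for(self, account_id: str) -> str:
        return self.account_to_shard[account_id]
```

With 11 empty shards, 22 equal-weight accounts land two per shard. A real system would base the load on measured metrics and keep the directory in a durable store rather than in memory.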
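Episode 26 describes Varnish splitting a page into smaller fragments that are stitched back together in the cache, each with its own TTL. Episode 33 names ESI as the mechanism. Below is a minimal Python sketch of that idea, not of Varnish itself. It assumes `<esi:include src="..."/>` tags, handles only one level of includes, and uses a hypothetical `render` back end that returns each fragment's HTML along with its TTL.

```python
import re
from typing import Callable, Dict, Tuple

# Matches tags of the form <esi:include src="/path"/>
ESI_INCLUDE = re.compile(r'<esi:include src="([^"]+)"\s*/>')


class FragmentCache:
    """Caches each fragment under its own TTL and stitches pages together on read."""

    def __init__(self, render: Callable[[str], Tuple[str, float]], clock: Callable[[], float]) -> None:
        self._render = render  # hypothetical back end: path -> (html, ttl_seconds)
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, html)

    def fragment(self, path: str) -> str:
        now = self._clock()
        hit = self._cache.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]
        html, ttl = self._render(path)
        self._cache[path] = (now + ttl, html)
        return html

    def assemble(self, path: str) -> str:
        # One level of includes: each is served from cache or re-rendered independently.
        page = self.fragment(path)
        return ESI_INCLUDE.sub(lambda match: self.fragment(match.group(1)), page)
```

The payoff is that a page shell cached for five minutes can embed a scoreboard that expires every ten seconds. Only the small fragment is re-rendered, so most of a dynamic page is still served from cache.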
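Episode 27 stresses that Riak stores data by bucket/key pairs and that all Riak nodes are equal, with no master. Below is a minimal Python sketch of the Dynamo-style consistent-hashing ring behind that design. The key is hashed with SHA-1 onto a fixed number of equal partitions, and the replicas are the next distinct nodes clockwise. The `bucket/key` string encoding and the round-robin partition ownership are simplifying assumptions, not Riak's actual code.

```python
import hashlib
from typing import List

RING_SIZE = 2 ** 160  # size of the SHA-1 output space


class Ring:
    """Fixed number of equal-sized partitions; every node owns some, none is a master."""

    def __init__(self, nodes: List[str], partitions: int = 64) -> None:
        if not nodes or partitions < len(nodes):
            raise ValueError("need at least one node and one partition per node")
        self.partitions = partitions
        # Simplified round-robin claim: partition i belongs to nodes[i % len(nodes)].
        self.owners = [nodes[i % len(nodes)] for i in range(partitions)]

    def partition_for(self, bucket: str, key: str) -> int:
        digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
        position = int.from_bytes(digest, "big")  # point on the ring
        return position * self.partitions // RING_SIZE

    def preference_list(self, bucket: str, key: str, n_val: int = 3) -> List[str]:
        """Walk clockwise from the key's partition, collecting n_val distinct nodes."""
        start = self.partition_for(bucket, key)
        nodes: List[str] = []
        for step in range(self.partitions):
            owner = self.owners[(start + step) % self.partitions]
            if owner not in nodes:
                nodes.append(owner)
            if len(nodes) == n_val:
                break
        return nodes
```

Any node can compute the same preference list for a key, which is what lets every node accept requests without a master. The same locality is what lets MapReduce ship code to the nodes holding the data.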
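Episode 29 mentions that Cassandra's rack-aware setup puts copies of data in different racks, and that SimpleGeo adapted it to AWS Availability Zones. Below is a minimal Python sketch of that placement rule, not Cassandra's or SimpleGeo's code. It assumes the ring is a list of (node, zone) pairs in token order and walks clockwise from the primary node, taking at most one replica per zone before reusing zones. `place_replicas` is a hypothetical name, and the zone strings are examples.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # (node name, availability zone)


def place_replicas(ring: List[Node], start: int, replicas: int) -> List[str]:
    """Walk the ring from `start`, preferring nodes in zones not yet used."""
    if replicas > len(ring):
        raise ValueError("not enough nodes for the requested replica count")
    ordered = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    chosen: List[str] = []
    used_zones = set()
    # First pass: at most one replica per zone.
    for name, zone in ordered:
        if len(chosen) == replicas:
            break
        if zone not in used_zones:
            chosen.append(name)
            used_zones.add(zone)
    # Second pass: with fewer zones than replicas, fill from the remaining nodes in ring order.
    for name, _zone in ordered:
        if len(chosen) == replicas:
            break
        if name not in chosen:
            chosen.append(name)
    return chosen
```

With nodes spread across three zones and three replicas, losing an entire zone still leaves two copies of every key.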
Customer Reviews
Webpulp.tv is a great resource.
Webpulp.tv is a great resource. It's a revealing, behind-the-scenes look at "the stack" of hardware, software, and services that support emerging companies in industries like media, SaaS, and e-commerce. Check out the New Relic, Pusher App, and OpsCode episodes. If, like me, you're listening in your car, get a pen and paper handy. You'll want to jot some things down!
Viewers also subscribed to

- DevOps Cafe Podcast (John Willis & Damon Edwards)
- The Changelog (Adam Stacoviak and Wynn Netherland)
- Node Tuts (Pedro Teixeira)
- YUI Theater (ericmiraglia@yahoo.com)
- Pivotal Labs Tech Talks (Pivotal Labs)

