Saturday, 24 October 2009

BarCampLondon7: Non-relational Databases

Simon Willison

Back channel notes on etherpad

  • why?
    • scalability issues — have to do bizarre things to get to Flickr/Google size
    • some models don’t fit schemas
  • Voldemort — used by LinkedIn
    • needs at least four servers to get started!
  • CouchDB, MongoDB, etc
    • download and type make
    • MongoDB was much faster, tho’ CouchDB has improved
  • whenever you hit a tag page on Flickr, you hit a search
    • if you hit “my photos, tagged X” you hit a relational database
  • Programming the Semantic Web
    • by the guy who wrote Programming Collective Intelligence, which is very good: all people who like X will like Y
  • Redis
    • key-value store, network accessible
    • ridiculously fast
    • doesn’t write each change to disk; instead, every 15 seconds it dumps the entire database to disk
    • can improve reliability by replicating
    • e.g. live stats services
    • can have a key-set, with add to set, set intersection
  • Git
    • has shown that it can scale to the size of the Linux kernel
    • so can scale to storing your desktop settings!
    • Git is not just an RCS; it’s a file system with revision control
    • there’s also git# and JGit
  • Jaiku migrated to App Engine
    • including all the history
    • need to think of queries at design time, otherwise you’re stuck and have to do a big MapReduce to extract data
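
The key-set idea from the Redis notes (add to set, set intersection) can be sketched in plain Python. This is only an in-memory illustration, not the Redis client API: the class TinySetStore and the key names are made up for the example, and the method names merely echo the Redis SADD and SINTER commands.

```python
from collections import defaultdict


class TinySetStore:
    """Hypothetical in-memory stand-in for a key -> set store (not a Redis client)."""

    def __init__(self):
        self._sets = defaultdict(set)

    def sadd(self, key, *members):
        """Add members to the set at key; return how many were newly added."""
        before = len(self._sets[key])
        self._sets[key].update(members)
        return len(self._sets[key]) - before

    def sinter(self, *keys):
        """Return members present in every named set; a missing key counts as empty."""
        if not keys:
            return set()
        result = set(self._sets.get(keys[0], set()))
        for key in keys[1:]:
            result &= self._sets.get(key, set())
        return result


# Assumed key layout: one set of photo ids per tag and one per user.
store = TinySetStore()
store.sadd("tag:london", "photo:1", "photo:2", "photo:3")
store.sadd("user:simon", "photo:2", "photo:3", "photo:9")

# "My photos, tagged london" becomes a set intersection instead of a query.
my_london_photos = store.sinter("tag:london", "user:simon")
print(sorted(my_london_photos))
```

With sets keyed this way, a "my photos, tagged X" style lookup is a single intersection rather than a relational join.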
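
The "file system with revision control" point about Git rests on content addressing: each file's contents are stored under a hash of those contents. A minimal sketch, assuming Git's default SHA-1 object format; git_blob_id and TinyObjectStore are hypothetical names for this example, and real Git additionally compresses objects and stores trees and commits.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the id Git assigns to a blob: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class TinyObjectStore:
    """Hypothetical content-addressed store: the key is derived from the value."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        """Store content under its blob id and return that id."""
        oid = git_blob_id(content)
        self._objects[oid] = content  # identical content always lands on the same key
        return oid

    def get(self, oid: str) -> bytes:
        """Fetch content by id; raises KeyError if the id is unknown."""
        return self._objects[oid]


objects = TinyObjectStore()
first = objects.put(b"theme = dark\n")
second = objects.put(b"theme = dark\n")  # same bytes, so same id and no duplicate
print(first == second, objects.get(first))
```

Because the key is derived from the content, storing the same settings file twice costs nothing extra, which is part of why the same model stretches from desktop settings to a kernel-sized tree.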
