Links

Notes

Formatted in a question / answer style

Introduction & Background

What is PowerDrill (PD)?

  • A web-based analysis tool built by the Google AdWords team
  • The columnar storage backend and execution engine is called "PD Serving"; it is the focus of this paper

What types of analysis can you do in PD?

  • Drilldown: start with the entire dataset and perform slice/filter/aggregate operations
  • UI consists of bar graphs (GROUP BY) and selection/filters (WHERE)
    • Bias towards discrete/categorical data (strings, dates, etc.)
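
As a rough illustration of what a single drilldown step amounts to, the sketch below filters rows (the WHERE part) and then sums a measure per category (the GROUP BY part). The table, the column names and the `drill_down` helper are all made up for these notes and are not taken from the paper or from PD itself.

```python
from collections import Counter

# Hypothetical toy table stored column-wise: one list per column.
columns = {
    "country": ["US", "DE", "US", "FR", "DE", "US"],
    "device": ["mobile", "desktop", "desktop", "mobile", "mobile", "mobile"],
    "clicks": [3, 1, 4, 1, 5, 9],
}


def drill_down(columns, group_by, where=None, measure=None):
    """Keep rows matching `where` (column -> required value), then
    aggregate per distinct value of the `group_by` column.

    Sums `measure` if given, otherwise counts rows.
    """
    where = where or {}
    totals = Counter()
    for i in range(len(columns[group_by])):
        # WHERE: every selected filter must match this row.
        if all(columns[col][i] == val for col, val in where.items()):
            # GROUP BY: accumulate into the bucket for this row's category.
            totals[columns[group_by][i]] += columns[measure][i] if measure else 1
    return dict(totals)


# Start with the whole dataset, then narrow it with a filter.
by_country = drill_down(columns, "country", measure="clicks")
mobile_by_country = drill_down(
    columns, "country", where={"device": "mobile"}, measure="clicks"
)
```

Each bar graph in the UI would correspond to one such call, and clicking a bar would add another entry to `where`. Note that only the columns named in the query are ever read, which is part of why a column-wise layout suits this workload.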

What kind of data is being analyzed?

  • Paper is not specific about this, but video is
  • The most important AdWords datasets
  • Log data
    • lots of string columns (e.g. search query text)
    • Wide datasets: thousands of columns
  • Use cases given:
    • responding to user requests (support requests?)
    • spam analysis (somewhat interactive)
    • generating alerts for mission-critical systems (click fraud, according to the video)

Who is using PD?

  • Google internal only
  • 800 monthly users, 4 million monthly queries (c. 2012)

Why use columnar storage?

  • Compression: values in a column share a type and tend to be similar, so they have lower entropy and compress at a higher ratio
    • specialized compression techniques for certain datatypes (e.g. dictionary encoding, RLE)
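
To make the two techniques named above concrete, here is a minimal sketch of dictionary encoding followed by run-length encoding (RLE) on a single string column. It shows the general idea only and is not PD Serving's actual on-disk format; the function names and sample data are invented for these notes.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(run) for run in runs]


def decode(dictionary, runs):
    """Invert both encodings to recover the original column."""
    return [dictionary[code] for code, length in runs for _ in range(length)]


# Hypothetical column with few distinct values and long runs.
column = ["DE", "DE", "DE", "US", "US", "FR", "FR", "FR", "FR"]

dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)

print(dictionary)  # ['DE', 'FR', 'US']
print(codes)       # [0, 0, 0, 2, 2, 1, 1, 1, 1]
print(runs)        # [(0, 3), (2, 2), (1, 4)]
```

Nine strings shrink to a three-entry dictionary plus three (code, length) pairs. Both steps rely on the column holding few distinct values, and RLE additionally needs equal values to sit next to each other, which a row-oriented layout that interleaves columns would not provide.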