-
October 2009 –
Present
Mountain View, CA - Google, Inc. - Google Books / OCR
I work on the Tesseract open source OCR engine as part of the Google
Books project.
While with the project, I've implemented an advanced text matching algorithm used for OCR evaluation and cross-document correction,
streamlined our evaluation and training process, and introduced the use of
standardized (BCP 47) language and script tags.
I'm currently working on a new structure extraction algorithm and tinkering with training Tesseract on Old English.
-
January 2008 –
October 2009
Mountain View, CA - Google, Inc. - Traffic Team (SRE)
Traffic Team runs Google's DNS, front-line web servers, and software
load balancing infrastructure to direct web requests to the
closest data center with capacity. While on Traffic Team, I:
- Increased Google site reliability by consolidating our internal
DNS into anycasted pools.
- Designed and implemented a tool to do capacity planning based
on historical load trends and simulations of common failure scenarios.
- Worked with Dave Presotto and Dan Eisenbud on our load-balancing
external DNS infrastructure, collapsing it from two tiers to one.
- Co-ordinated disruptive networking upgrades with a wide range of
teams within Google.
- Did all manner of system administration work: building monitoring,
handling maintenance, deploying new binaries, tracking down and fixing
bugs and regressions, bringing up and tearing down frontend services at
clusters, and helping services launch.
-
October 2005 –
December 2007
New York, NY and Mountain View, CA - Google, Inc. - Google Checkout
Google Checkout is a payments and order processing system which
consolidates much of the hassle of doing business online.
It was my first experience building a "cloud" application, and
I learned a lot about all that entails: zero-downtime upgrades,
strategies for multihoming, internationalization, and monitoring
a running system. My major contributions were:
- Scalability: I led the design and implementation effort
that moved a large proportion of Checkout's data from SQL to Bigtable.
This project covered schema management for the new data store, the
design of a replication strategy, and the provision of Java handles
to the backends correlated with access type (read-only or read-write)
and proximity.
The move to Bigtable helped with scalability, both because of the limitations
of our SQL instances and because of the single-row discipline that Bigtable
enforces. Each Order links a buyer with a merchant, and it's tempting to
update indices associated with both parties whenever an Order is changed.
This doesn't scale well, for reasons Pat Helland has described rather
eloquently in his CIDR 2007 paper.
- Checkout Merchant integration (web UI, API, features).
- Monitoring, logs analysis, whole-system debugging and production support.
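As a rough illustration of the single-row discipline mentioned under Scalability (a minimal sketch, not Checkout's actual code): saving an Order touches exactly one row, and the buyer and merchant indices are brought up to date later, one row at a time. The names RowStore, save_order, and apply_pending are hypothetical, and an in-memory dict stands in for a store that only offers atomic single-row writes.

```python
from collections import deque


class RowStore:
    """Hypothetical in-memory stand-in for a store with atomic single-row writes."""

    def __init__(self):
        self.rows = {}
        self.writes = 0  # number of single-row mutations performed so far

    def write_row(self, key, value):
        self.rows[key] = value
        self.writes += 1

    def read_row(self, key, default=None):
        return self.rows.get(key, default)


def save_order(store, pending, order):
    """Write only the order row; queue the index updates instead of doing them inline.

    Assumption: in a real system the queue would have to be durable (for
    example, recorded in the order row itself); here it is a plain deque.
    """
    store.write_row(("order", order["id"]), dict(order))
    pending.append(("buyer_index", order["buyer"], order["id"]))
    pending.append(("merchant_index", order["merchant"], order["id"]))


def apply_pending(store, pending):
    """Apply queued index updates, each as its own single-row write.

    Adding an order id to a set is idempotent, so replaying an update is harmless.
    """
    while pending:
        kind, owner, order_id = pending.popleft()
        ids = set(store.read_row((kind, owner), frozenset()))
        ids.add(order_id)
        store.write_row((kind, owner), frozenset(ids))
```

The trade-off in this sketch is that the indices lag briefly behind the order row, in exchange for never needing a write that spans the buyer, the merchant, and the order at once.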
-
June 2003 –
August 2003
Poughkeepsie, NY - IBM - Linux Technology Center
- Developed Python scripting hooks for DCIM System Management Protocol
- Found and patched PowerPC spinlock bug in glibc
- Tracked down and patched memory leaks in C/C++ libraries
-
June 2002 –
August 2002
Atlanta, GA - Georgia Tech - Mathematics Department
NSF VIGRE-supported reading in optimization problems and proof systems.
-
June 2001 –
August 2001
Austin, TX - IBM - Extreme Blue Internship with Austin Research Lab
Developed a Linux port to, and application software for, the prototype PowerPC 405LP
- Wrote PCI and embedded drivers for an LCD Controller
- Optimized video-related software stack (Linux Framebuffer / Nano-X / FLTK)
-
January 2000 –
August 2000
Dallas, TX - Hewlett-Packard - High Performance Systems Lab
Helped develop firmware for IA64-based Hewlett-Packard supercomputers
-
June 1999 –
December 1999
Atlanta, GA - Georgia Tech - GVU Animation Lab
Developed improvements to a physical simulation system for researchers
- Ported simulation system from SGI/sproc() based threading to POSIX threads
- Worked on a Perl scripting system
-
May 1996 –
August 1996
Seaford, DE - Interactive Mathematics Online
Designed and led the four-month development of Interactive Mathematics Online by a team of three high school students
- Won the ThinkQuest Gem Award
- Received the JARS Top 1% Applet Award for my technical paper on Stereograms
- My teammate Amay Champaneria now also works at Google.
Patents, Papers, and Public Code
Papers: Consistency, Survivability, Speed: Pick Two (November 2008)
Patents: "Access Using Images", submitted in December 2008, for a new form of CAPTCHA.
Code:
Tesseract OCR engine;
an implementation of the DC3 suffix array construction algorithm, helpful for aligning OCR output with the correct text.
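To show how a suffix array can help align OCR output with the correct text, here is a minimal sketch. It builds the array by naively sorting suffixes, which is not DC3 and is much slower on large inputs, and then binary-searches it for the places where a snippet occurs exactly. The function names are hypothetical and are not taken from the published implementation.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text in lexicographic order.

    Naive construction by sorting suffix strings; DC3 builds the same array
    in linear time.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted offsets in text where pattern occurs exactly."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first rank whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])
```

One way to use this: index the correct text once, then look up short snippets of OCR output. Snippets that occur at exactly one offset can serve as anchors between which the remaining text is aligned.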
Education:
Awards and Distinctions
- Fulbright Scholar, Budapest, Hungary, 2003-2004
- NSF VIGRE Grant Recipient for Summer Research in Mathematics, 2002
- Albert C Jacobs, Phi '21 Award, 2001
- Psi Upsilon, Gamma Tau Chapter President, 2000
- ThinkQuest Gem Award, 1997
- Lots of little recognitions and scholarships for being a good student
Skillsets:
- Sharper: C/C++, Unicode, DNS, Python, UNIX Shell
- Google: Bigtable, cluster monitoring software, replication strategies
- Rustier: LaTeX, OpenGL, POSIX systems and (Linux) kernel work, Perforce, PostScript, SML/NJ, SQL, HTML, Java, JavaScript
References available upon request.