Contact Info

Areas of Expertise

  • High Availability

  • Low Latency Services

  • Business Risk Analysis (SLA/KPI)

  • Team Cohesion

Work Experience

Distributed Systems Engineer for Hire - Referential Labs - August 2016 to Present

  • Work with high-throughput (>100,000 hits/minute), low-latency (<2 ms at 99.999%) applications to reduce jitter and automate outlier analysis

  • Identify gaps in application monitoring to improve system visibility and customer experience

  • Reduce cloud operating expenditure with infrastructure deployment automation

  • Reduce business risk by defining security policies for cloud infrastructure deployment

  • Improve team cohesion by defining success metrics and identifying friction points in the delivery cycle

Principal Engineer - Lookout - November 2014 to August 2016

Cloud operations architect and performance engineer. Improved operations delivery quality through automation and infrastructure validation.

  • Improved hiring practices to encourage strong candidates that would grow the organization

  • Issue escalation for critical infrastructure (Network, Cassandra, MySQL, Elasticsearch, Kafka, etc.)

  • Failure testing procedures and training

  • Migration to AWS from physical DC

  • Monitoring stack improvements (self service!)

  • Infrastructure automation advocacy, training, documentation, and implementation

Principal Engineer - Jive Software - August 2012 to November 2014

Team lead for SaaS operations group. Worked with multiple engineering teams and product groups to enable self-service engineering application deployment.

  • CI/CD pipeline design and implementation

  • Monitoring stack design and improvements

  • Automation architecture and implementation (Puppet, Ansible)

  • Escalation problem solving for devops team

  • Ops tool set standardization

  • Engineering Advocate inside TechOps Organization

Systems Architect - Boltnet, Inc. - February 2010 to August 2012

First employee. Designed, implemented, and scaled infrastructure to three data centers serving >15,000 hits/second across billions of landing pages.

  • Worked with engineers to resolve scale/load issues

  • Designed and implemented the architecture that runs the BO.LT application (cloud and in-house)

  • Performance monitoring (BGP visualization, OpenNMS, Keynote, log parsing)

  • Release Engineering (git integration, test automation)

  • Configuration management (revision control, deployment)

Operations Architect - TiVo Inc. - June 2008 to February 2010

Worked with multiple operations and engineering groups to help architect and improve several different applications deployed internationally to over 700,000 concurrent clients.

  • Designed and implemented numerous improvements to the tool-set used by administrators to diagnose problems.

  • Improved monitoring coverage through configuration changes and by helping guide development of the in-house monitoring application.

  • Improved procedures used by operations and engineering to deploy new applications and test performance changes.

  • Senior Operations Escalation point for real-time application problems.

  • Liaison with multiple engineering groups for future state architecture steering, bug diagnostics and prioritization.

  • Significantly reduced on-call 'fires' by tracking recurring problems and helping focus limited resources on the most beneficial fixes.

Operations Engineer - Atomz - January 2001 to June 2008

Designed, built, scaled, upgraded, and maintained a large, highly available network serving over 50,000 customers out of 4 data centers in 2 countries.

  • Achieved application availability of 99.999% by building a fail-safe BGP/DNS load sharing system

  • Developed real-time (<60 s delay) dashboard for monitoring network statistics (traffic levels, request resource utilization, CPU/IO/Mem queue wait times, etc.)

  • Built tools to improve eBGP peering, reduce overall customer latency, and monitor BGP events on the internet that affected our network or the vendor networks we relied on

  • Disaster recovery design and implementation

  • Reduced down-time and improved incident response by designing clustering software and automated fail-over procedures

  • Set up monitoring and trend graphing (Cricket, Nagios, Smokeping, Cflowd/flowtools, centralized syslog)

  • Created many cost- and time-saving tools for network and system maintenance (e.g., RT<->IRC interface for easy ticket management)

  • Designed automatic provisioning system to reduce configuration mistakes and build-out time

  • Helped improve QA procedures for more thorough and automated testing (test case design and tool research)

  • Escalated issue resolution/troubleshooting for multiple business units 24x7

Cisco Systems, Nortel, Sanmina - freelance - 2000

  • Oracle disaster recovery architecture implementation

  • Trained staff in UNIX diagnostics

  • Lab design and setup for training, using Cisco routers and Sun machines

Hobbies

I love photography, especially sharing ephemeral street art.

I also brew beer with a focus on old beer styles that are higher gravity and age well (24% ABV is my current personal best).