Jonathan Kratter
The latest version of this resume is available at www.jonathankratter.com
If you are receiving this resume as a forwarded or printed document, check the website to ensure you have the most recent, canonical version.
Areas of Expertise
Languages
Python, Ruby, bash
Orchestration
Kubernetes, kops, Apache Spark, Docker Compose
Config Management
Helm, boto, Chef, Puppet
AWS Services
EC2, EKS, IAM, VPC, RDS, S3, DynamoDB, CloudWatch, ElastiCache, Route53, CloudFront
Metrics, Monitoring, Logs and Visualization
Prometheus, Grafana, graphite/carbon-cache, Kibana, statsd, collectd, DataDog, Pingdom, PagerDuty
Persistent Data Stores
Postgres, MySQL, Redis, Cassandra, Elasticsearch, Zookeeper
Queues
Kafka, RabbitMQ
Web Servers, Clients, Libraries, Proxies and Frameworks
nginx, Apache httpd, curl, Requests, HAProxy, Charles Proxy, Ruby on Rails, WordPress, Flask
OS
Debian/Raspbian/Ubuntu, iOS, macOS
Workflow Management Tools
Odoo, Jira/Confluence, Slack, Discord
Version Control Systems
git, GitHub, GitLab, Stash
Media and Creative
Stable Diffusion, Deforum, automatic1111, OBS, Photoshop, Lightroom, After Effects, Traktor, Ableton Live, Kontakt, Logic Pro, MadMapper, VDMX, Artmatic
Extracurriculars
Woodworking, photography, lighting design, projection mapping, LED art, Raspberry Pi, DJing and electronic music production
Experiences
Last updated August 2024
Principal Engineer / Producer, anodyne.io
April 2017 - present
anodyne.io is my S corporation, focused on - but not limited to - work at the intersection of cloud technology and digital media
Recent Clients:
Neal's CNC
- Worked with the owners of this small cutting & fabrication shop to identify requirements and estimate the effort required to create a bespoke Kanban-style job management / CRM solution using the Odoo suite of business software and Odoo.sh, their cloud-based solution for custom apps.
Messing Adam & Jasmine LLP
- As the sole technical member of staff, wrote Python code that dramatically simplified a complex document review effort by applying a single Bates coding scheme to 60k+ documents. Collaborated with the support and engineering staff of the firm's e-discovery platform, Everlaw, to explore capabilities that might allow the effort to be automated, eventually using the platform's Excel spreadsheet import/export function as an ad-hoc API. Identified blocking bugs and communicated needed feature requests to the Everlaw engineering team in a language they could immediately understand and act on. Completed in two weeks what would have taken several months if done manually.
- Using Adobe Photoshop and After Effects, created technical animations to visualize the movement of personnel through a large worksite for use in mediation. The mediators called out the animations as being highly persuasive in their finding in favor of the client, leading to the successful resolution of the largest class action wage and hour settlement in California state court history.
Sr SRE / Systems Engineer, Uber Technologies
2018 - 2019
as a member of the SRE team responsible for corporate infrastructure, authored and maintained Puppet manifests to deploy and configure internal and third-party applications and services, provided architectural guidance and engineering analysis to internal customers, and participated in a PagerDuty rotation supporting 1600 VMs across AWS and VMware on-prem.
piloted first experiments with Kubernetes on Uber corporate infrastructure, using kops and AWS EKS.
directly responsible for Uber's deployment of BigID, a GDPR compliance solution scanning over 150 corporate data stores for PII. Using Requests and boto, created automation to discover data sources via AWS and Puppet and configure those data sources in BigID. Wrote Puppet manifests to modify and merge Docker Compose files used in the deployment of this container-based solution.
worked closely with BigID engineering and support to identify, diagnose, and remediate issues discovered by virtue of Uber being an early, large customer.
domain expert for Grafana and Graphite. Created automation to store dashboard JSON definitions in git and provision dashboards from that repo. Managed rollouts and migrations for major upgrades and provided advice and engineering assistance to internal customers creating metrics or consuming them via dashboards and API.
Head of Infrastructure and DevOps, Kumofox
2016 - 2017
together with Bitcasa's former CEO, architect, backend lead developer, and VP of Engineering, founded a new company, Kumofox, to continue to develop Bitcasa's IP through a series of contract engagements with Intel's "New Technology Group."
as part of proof-of-concept efforts, worked with the developers to migrate the existing CloudFS backend from its previous operational context - a set of Apache and mod_wsgi Python apps running on "bare" EC2 instances, deployed via Chef - to a modern, containerized stack. Using Docker Compose, nginx and uWSGI, reduced the server-side footprint sixfold, enabling the entire platform to be run on a single t2.large instance.
following the success of these proofs-of-concept, migrated the platform from Docker Compose to Kubernetes, using kops to deploy dev and test clusters on EC2 in preparation for a production launch.
established cluster monitoring, alerting, and visualization infrastructure. With Prometheus as the backing data store and alerting engine, and Grafana as the visualization frontend, it collected host metrics using node-exporter, container metrics with cAdvisor, Kubernetes resource metrics with kube-state-metrics, and service metrics from Cassandra, Kafka, and Spark using the Prometheus JMX exporter.
authored Helm charts for platform services developed in-house and open source infrastructure services for which public charts were not available; modified and maintained publicly available charts to meet the unique needs of the platform.
managed Docker container builds for the platform, modifying public and in-house developer images to add instrumentation and platform-specific configuration
deployed centralized cluster logging using the Elasticsearch/Fluentd/Kibana stack
Sr. DevOps Engineer, Bitcasa Inc.
2014 - 2016
joined the organization immediately prior to a complete migration of the service, including 45 PB of user data, 10k users, and the entire backend infrastructure. Worked with AWS architects and engineers to ensure its success.
saved the company two million dollars in S3 charges by catching a critical oversight in data migration speed calculations
as the one remaining DevOps engineer on staff after Bitcasa entered into receivership, was solely responsible for the availability and performance of the CloudFS service throughout the 30-day shutdown period provided for users to retrieve their data from the service.
in addition to winding down the production service, managed other IT wind-down processes including backup and archival, so that the company retained the option of restarting operations if the IP was purchased by a solvent entity.
using statsd, collectd, CloudWatch, carbon-cache, Graphite, and Grafana, built a comprehensive set of system dashboards to provide insight into key system metrics and trends
reduced monthly AWS spend by > $50k by identifying underutilized resources and redeploying rightsized replacements
initiated a migration of the monitoring infrastructure from Nagios to DataDog and built out a comprehensive set of alerts integrated with Slack and PagerDuty
managed and refactored existing Chef cookbooks and roles as application stack needs changed and new services were brought online
technical lead on strategic partner integrations and deployments for telcos and OEMs, handling integrations at the DNS, web, and database layers.
developed a pipeline using rsyslogd to deliver > 50k messages per second from application servers to analytics ingest endpoints
worked with platform scrum teams to shepherd multi-service backend releases through staging and test environments into production
spun up new application stacks and staging environments for development work and external partner beta tests
using the ELK stack, investigated abuse of the service by pirates, maintained blacklists, and, in conjunction with platform developers, designed and deployed a code-level fix that reduced daily EC2 transfer costs from $1000 to $200
advised engineering on AWS best practices, to provide cost-effective performance and good instance-to-application fit
authored naming & config conventions to ease cognitive load and speed management of > 200 AWS instances across three regions
worked with developers and QA to create a prototype of the application stack inside Docker containers
participated in 24/7 weekly PagerDuty rotation
Infrastructure Engineer, IT and Internal Tools Lead, Life360
2014
using HAProxy and OpenSwan, developed a novel mechanism to allow multiple EC2 instances to interface with a 3rd-party billing network over a point-to-point VPN
planned and executed migration of critical APNS Push Notification infrastructure from a manual, "artisanal" deployment to an HA / DR cluster configured entirely via Chef
responsible for network systems company-wide, including in-office WiFi and Ethernet networks, wired and wireless WAN links, and cloud-to-customer VPNs
responsible for in-house Atlassian systems, including JIRA, Confluence, Crowd, Stash, and HipChat, and the development of custom JIRA workflows and Kanban boards
participated in 24/7 weekly PagerDuty rotation
DevOps Engineer, Tout Industries
2012 - 2014
I was hired as a QA manager, then grew into a DevOps / Infra role to meet the organization's needs
DevOps Work:
managed infrastructure provisioning for both production and pre-production environments, developing and implementing release plans for new services and features
led configuration, deploy, and test efforts for numerous mission-critical backend services including video ingest, analytics, messaging, and A/B test frameworks
responsible for the Chef ecosystem; author and maintainer of critical cookbooks for HAProxy, nginx, Node.js, multiple JDKs, and ffmpeg
extended and became primary maintainer of multiple generations of automated deployment frameworks in bash and Python
developed end-to-end monitoring of video ingest and publication pipeline, including a suite of video generation and calibration tools
participated in 24/7 weekly PagerDuty rotation
QA Work:
managed hiring for the QA group, including a new Director of QA and additional QA Engineers
as the first full-time QA resource hired by Tout, established and documented the "QA DNA" of the engineering organization, its processes and best practices
built a prototype web testing framework using Selenium IDE, and expanded automated HTTP API coverage using RSpec and Tout's public Ruby client, trubl (https://github.com/Tout/trubl)