class: center, middle # Breaking Bottlenecks # 🍾 Victoria Drake --- class: center, middle # OWASP.org ## Open Web Application Security Project --- # The Trouble with Broken Links - Your customers don't like them - Attackers do like them - Broken link hijacking - Subdomain takeover - Application Security Verification Standard (ASVS) - [V10.3 Deployed Application Integrity Controls](https://github.com/OWASP/ASVS/blob/d9e0ac99828ef3c1e9233bd8a1f691f2a6958aa3/4.0/en/0x18-V10-Malicious.md#v103-deployed-application-integrity-controls) --- # The Job Design constraints: - Find and enumerate all broken links - Keep track of parent pages - Must run as part of CI/CD --- ## Prototyping a Web Crawler 1. Fetch data of page one from a URL 2. Check all the links on page one - Keep track of "seen" links so we don't visit them twice - Record any broken links 3. Fetch data from valid links on page one if they're in the same domain 4. Repeat #2 until all the links on the site have been checked --- background-image: url(./execution_flow.png) ## Execution Flow --- # How Fast Can You Do the Thing? Approximate timing on a typical PC: | | Task | Time | | ------- | ----------------------------------- | ------------------------------- | | CPU | execute typical instruction | 1/1,000,000,000 sec = 1 nanosec | | CPU | fetch from L1 cache memory | 0.5 nanosec | | CPU | branch misprediction | 5 nanosec | | CPU | fetch from L2 cache memory | 7 nanosec | | RAM | Mutex lock/unlock | 25 nanosec | | RAM | fetch from main memory | 100 nanosec | | RAM | read 1MB sequentially from memory | 250,000 nanosec | | Disk | fetch from new disk location (seek) | 8,000,000 nanosec (8ms) | | Disk | read 1MB sequentially from disk | 20,000,000 nanosec (20ms) | | Network | send packet US to Europe and back | 150,000,000 nanosec (150ms) | .footnote[Source: Peter Norvig _[Teach Yourself Programming in Ten Years](http://norvig.com/21-days.html#answers)_] --- class: center, middle # Bottleneck: Network Latency # 🍾 --- # Time and Memory .center[![Lambda chart](./lambda-chart.png)] .footnote[Source: Mayank Lahiri _[Understanding and Controlling AWS Lambda Costs](https://serverless.com/blog/understanding-and-controlling-aws-lambda-costs/)_] --- count: false # Taco Tuesday! .art[ ```text 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 🌮 🌳 🌮 🌮 ╔══════════════╗ 🌮 Hi I'm Bob 🌳 🌮 ╚══════════════╝ \ 🌮 🐹 🌮 🌮 🌮 🌮 🌳 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 ``` ] --- count: false # Taco Tuesday! .art[ ```text 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 🌮 🌳 🌮 🌮 🌮 🌳 🌮 🐹 🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵 🌮 🌮 🌮 🌮 🌳 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 ``` ] --- count: false # Taco Tuesday! .art[ ```text 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 🌮 🌳 🌮 🌮 🌮 🌳 🌮 🐹 🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵 🌮 🌮 🌮 🌮 🌳 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 ``` ] ## (Single thread) --- count: false # Taco Tuesday! .art[ ```text 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 🌮 🌳 🌮 🌮 🧑💵🧑💵🧑💵🧑💵 🌮 🌳 🌮 🐹 🌮 🌮 🧑💵🧑💵🧑💵🧑💵🧑💵 🌮 🌮 🌳 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 ``` ] --- count: false # Taco Tuesday! .art[ ```text 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 🌮 🌳 🌮 🌮 🧑💵🧑💵🧑💵🧑💵 🌮 🌳 🌮 🐹 🌮 🌮 🧑💵🧑💵🧑💵🧑💵🧑💵 🌮 🌮 🌳 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 ``` ] ## (Concurrency) --- count: false # Taco Tuesday! .art[ ```text 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 🌮 🌳 🌮 🌮 🌳 🌮 🐹 🧑💵🧑💵🧑💵🧑💵 🌮 🌳 🌮 🐹 🧑💵🧑💵🧑💵🧑💵🧑💵 🌮 🌳 🌮 🌮 🌳 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 ``` ] --- count: false # Taco Tuesday! .art[ ```text 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 🌮 🌳 🌮 🐹 🧑💵🧑💵 🌮 🌳 🌮 🐹 🧑💵🧑💵 🌮 🌳 🌮 🐹 🧑💵🧑💵 🌮 🌳 🌮 🐹 🧑💵🧑💵🧑💵 🌮 🌳 🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮 ``` ] ## (Multithreading) --- # Doing the Thing Fast ```go type Checker struct { startDomain string brokenLinks []Result visitedLinks map[string]bool workerCount, maxWorkers int sync.Mutex } ... // Page allows us to retain parent and sublinks type Page struct { parent, loc, data string } // Result adds error information for the report type Result struct { Page reason string code int } ``` Helper functions use `Unlock()` and `Lock()` to add visited links and broken links to lists --- ```go // Extract links from HTML func parse(parent, data string) ([]string, []string) { doc, err := html.Parse(strings.NewReader(data)) if err != nil { fmt.Println("Could not parse: ", err) } goodLinks := make([]string, 0) badLinks := make([]string, 0) var f func(*html.Node) f = func(n *html.Node) { if n.Type == html.ElementNode && checkKey(string(n.Data)) { for _, a := range n.Attr { if checkAttr(string(a.Key)) { j, err := formatURL(parent, a.Val) if err != nil { badLinks = append(badLinks, j) } else { goodLinks = append(goodLinks, j) } break } } } for c := n.FirstChild; c != nil; c = c.NextSibling { f(c) } } f(doc) return goodLinks, badLinks } ``` --- # Doing the Thing Fast ```go func main() { ... startURL := flag.String("url", "http://example.com", "full URL of site") ... firstPage := Page{ parent: *startURL, loc: *startURL, } toProcess := make(chan Page, 1) toProcess <- firstPage var wg sync.WaitGroup ``` --- # Doing the Thing Fast ```go for i := range toProcess { wg.Add(1) checker.addWorker() 🐹 go worker(i, &checker, &wg, toProcess) if checker.workerCount > checker.maxWorkers { time.Sleep(1 * time.Second) // throttle down } } wg.Wait() ``` Goroutine: thread managed by Go runtime --- # Did We Do the Thing Fast? ```text time python3 slow-link-check.py https://victoria.dev real 17m34.084s user 11m40.761s sys 0m5.436s time python3 hydra.py https://victoria.dev real 1m13.358s user 0m13.161s sys 0m2.826s time ./go-link-check --url=https://victoria.dev real 0m7.926s user 0m9.044s sys 0m0.932s ``` --- # Interesting Bits - [OWASP.org](https://owasp.org) - [Graphviz](https://graphviz.org/) - [Story of `left-pad`](https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/) # Find Me .social[
hello@victoria.dev
https://victoria.dev ] .social[
@victoriadrake
@victoriadotdev ]